Get keys and values from a Scala Map

scala> val states = Map("AK" -> "Alaska", "AL" -> "Alabama", "AR" -> "Arkansas")
states: scala.collection.immutable.Map[String,String] = Map(AK -> Alaska, AL -> Alabama, AR -> Arkansas)

scala> states.keySet
res0: scala.collection.immutable.Set[String] = Set(AK, AL, AR)

scala> states.keys
res1: Iterable[String] = Set(AK, AL, AR)

scala> states.values
res2: Iterable[String] = MapLike.DefaultValuesIterable(Alaska, Alabama, Arkansas)

scala> states("AK")
res3: String = Alaska

scala> states + ("NY" -> "New York")
res7: scala.collection.immutable.Map[String,String] = Map(AK -> Alaska, AL -> Alabama, AR -> Arkansas, NY -> New York)
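The session above looks up a key with states("AK"), which throws a NoSuchElementException when the key is missing. Going slightly beyond the original session, here is a small sketch of the standard library's safer lookups (get, getOrElse, contains) and of how + and - return new maps instead of modifying the original. The object name StatesExample is only for illustration.

```scala
// Safe lookups and non-destructive updates on an immutable Map.
object StatesExample {
  val states: Map[String, String] =
    Map("AK" -> "Alaska", "AL" -> "Alabama", "AR" -> "Arkansas")

  // get returns an Option instead of throwing on a missing key
  val alaska: Option[String] = states.get("AK")  // Some(Alaska)
  val missing: Option[String] = states.get("XX") // None

  // getOrElse supplies a default when the key is absent
  val orDefault: String = states.getOrElse("XX", "Unknown")

  // contains checks for a key without retrieving its value
  val hasAL: Boolean = states.contains("AL")

  // + and - return new maps; states itself is left unchanged
  val withNY: Map[String, String] = states + ("NY" -> "New York")
  val withoutAR: Map[String, String] = states - "AR"

  // Iterate over key/value pairs, sorted by key for a stable order
  val lines: List[String] =
    states.toList.sortBy(_._1).map { case (code, name) => s"$code: $name" }

  def main(args: Array[String]): Unit = lines.foreach(println)
}
```

Running main prints one "code: name" line per state. Prefer get or getOrElse whenever a key may legitimately be absent.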

Idempotence

The more I work with systems that exchange messages, the more aware I become of idempotency.

Formally speaking, in mathematics, an idempotent function has the same effect no matter how many times it is applied. Think of a remote that locks and unlocks a car: most remotes will simply lock the doors no matter how many times you press the lock button. In mathematical notation:

f(f(x)) = f(x), for every x
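The car remote can be written down as a pair of functions on the car's state. This is a minimal sketch under an assumed model (the Car class and the lock and toggle functions are hypothetical): a dedicated lock button is idempotent, while a single toggle button is not, and isIdempotentAt checks f(f(x)) = f(x) for one given input.

```scala
// Hypothetical model of a car: only the lock state matters here.
final case class Car(locked: Boolean)

object Idempotence {
  // Idempotent: pressing "lock" any number of times leaves the car locked.
  def lock(car: Car): Car = car.copy(locked = true)

  // Not idempotent: a single toggle button flips the state on every press.
  def toggle(car: Car): Car = car.copy(locked = !car.locked)

  // Checks f(f(x)) == f(x) for one particular input x.
  def isIdempotentAt[A](x: A)(f: A => A): Boolean = f(f(x)) == f(x)
}
```

Note that isIdempotentAt only tests a single input; idempotency proper requires the equality to hold for every x.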

Why does it matter so much? In a set of decoupled systems you can experience network partitions or other failures that prevent messages from being processed successfully. When that happens, the first thing you need is the ability to retry sending the message without worrying about a duplicate effect on the world.
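To make the retry problem concrete, here is a small simulation under assumed, hypothetical names (Server, deliverWithRetry, run); no real messaging library is involved. The server applies the message, but the acknowledgement of the first attempt is lost, so the client sends the same message again. A non-idempotent handler ("add 10") ends up applied twice, while an idempotent one ("set to 10") does not care.

```scala
// Simulates at-least-once delivery with a lost acknowledgement.
object RetryDemo {
  // Hypothetical server holding a single integer of state.
  final class Server(handler: Int => Int) {
    var state: Int = 0
    def receive(): Unit = state = handler(state)
  }

  // Sends until an acknowledgement arrives; ackLost(n) decides whether the
  // ack for attempt n (counting from 1) never reaches the client.
  def deliverWithRetry(server: Server, ackLost: Int => Boolean, maxAttempts: Int): Int = {
    var attempt = 0
    var acked = false
    while (!acked && attempt < maxAttempts) {
      attempt += 1
      server.receive()          // the effect happens on the server...
      acked = !ackLost(attempt) // ...even when the client never hears back
    }
    attempt
  }

  // Runs one delivery where only the first acknowledgement is lost.
  def run(handler: Int => Int): Int = {
    val server = new Server(handler)
    deliverWithRetry(server, ackLost = attempt => attempt == 1, maxAttempts = 3)
    server.state
  }
}
```

With this setup, RetryDemo.run(_ + 10) leaves the state at 20 instead of 10, whereas RetryDemo.run(_ => 10) leaves it at 10.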

How to implement idempotency

There are two ways to achieve idempotency:

store state at the server so that it recognises a duplicate request and simply ignores it, usually by storing the IDs of the messages already processed; or
make the algorithms themselves idempotent, for instance by having the message encode an intention rather than the way to carry it out. One example in a game would be the instruction "grab sword": no matter how many times it is processed, the effect is the same.
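Both options can be sketched in a few lines. This is an illustration under simplifying assumptions: Message, DedupAccount and Player are hypothetical names, and the set of processed IDs lives in memory, whereas a real service would need a durable (and usually expiring) store updated atomically with the effect.

```scala
import scala.collection.mutable

// Hypothetical message: a unique ID plus an amount to add to a balance.
final case class Message(id: String, amount: Int)

// Option 1: the server remembers which message IDs it has already processed.
final class DedupAccount {
  private val processedIds = mutable.Set.empty[String]
  var balance: Int = 0

  def handle(msg: Message): Unit =
    // add returns false if the ID was already present, so duplicates are skipped
    if (processedIds.add(msg.id)) balance += msg.amount
}

// Option 2: the message carries an intention ("hold the sword"), applied as a
// set insertion, which is idempotent by construction.
final case class Player(inventory: Set[String]) {
  def grab(item: String): Player = copy(inventory = inventory + item)
}
```

The first option works for any handler but costs state on the server; the second needs no extra state but only applies when the operation itself can be expressed idempotently.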

Final remarks

I strongly recommend writing idempotent code from the start, even if you cannot see the benefits right now. I always expect idempotency to emerge as essential, because the alternative is expensive work to avoid unintended duplicates.

Message attributes in messaging systems

For me the best way to create micro-services is to follow a small set of rules:

highly decoupled – each service does not know anything about other services;
asynchronous communication – reactive system;
stateless wherever possible; and
small services – a single developer should be able to maintain the entire micro-service in her head.

There is no better way to achieve decoupling than to use a messaging system. A service subscribes to one or several topics and publishes to one or more topics. This way a particular service does not know where messages originate and is not aware of what happens downstream.
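The decoupling described above can be sketched with a toy in-memory bus. This is a hypothetical stand-in, not Kafka's or any broker's API, and it delivers synchronously for simplicity where a real broker would be asynchronous. Each service only knows the topics it reads and writes, never the other services.

```scala
import scala.collection.mutable

// Toy in-memory publish-subscribe bus (hypothetical, for illustration only).
final class Bus {
  private val subscribers = mutable.Map.empty[String, List[String => Unit]]

  def subscribe(topic: String)(handler: String => Unit): Unit =
    subscribers(topic) = subscribers.getOrElse(topic, Nil) :+ handler

  // Synchronous delivery keeps the sketch simple.
  def publish(topic: String, message: String): Unit =
    subscribers.getOrElse(topic, Nil).foreach(handler => handler(message))
}

object PubSubDemo {
  def run(): List[String] = {
    val bus = new Bus
    val received = mutable.ListBuffer.empty[String]

    // "Normaliser" service: reads "raw", writes "normalised"; it does not
    // know who produced the message or who will consume the result.
    bus.subscribe("raw")(msg => bus.publish("normalised", msg.toUpperCase))

    // "Audit" service: reads "normalised", unaware of the normaliser.
    bus.subscribe("normalised")(msg => received += msg)

    bus.publish("raw", "order created")
    received.toList
  }
}
```

Adding a third service is just another subscribe call; neither existing service needs to change.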

Point-to-point systems

Traditional systems, in which components communicate directly, rely on consumers knowing the servers' endpoints. This is the philosophy behind REST and gRPC. Each service waits passively for requests, exposing endpoints for clients to call, and the client usually keeps the connection open while it waits for a response.

Event-driven reactive systems

Event-driven reactive systems communicate via asynchronous messages. These messages flow through channels in a publish-subscribe pattern: a service subscribes to some topics and publishes to others. It is that simple. A message is an event, and the system works by reacting to the flow of messages, with no need for a coordinator or orchestrator.

Canonical data model

The first step is to create a canonical data model that messages must conform to. In Kafka this role is played by the schema registry. Each service must translate between its internal representation of events and the canonical data model when consuming or producing messages.
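The translation each service performs at its boundary can be sketched as a pair of adapter functions. All names here (OrderPlaced, BillingOrder, BillingAdapter) are hypothetical, and the canonical model is just a case class rather than a schema held in a registry.

```scala
// Hypothetical canonical event agreed by all services.
final case class OrderPlaced(orderId: String, amountCents: Long, currency: String)

// One service's internal representation, shaped for its own needs.
final case class BillingOrder(id: String, total: BigDecimal, currency: String)

object BillingAdapter {
  // Consuming: canonical model -> internal representation.
  def fromCanonical(e: OrderPlaced): BillingOrder =
    BillingOrder(e.orderId, BigDecimal(e.amountCents) / 100, e.currency)

  // Producing: internal representation -> canonical model.
  // toLongExact throws if the total has more than two decimal places.
  def toCanonical(o: BillingOrder): OrderPlaced =
    OrderPlaced(o.id, (o.total * 100).toLongExact, o.currency)
}
```

Keeping the adapters at the edge means the service's internals can change freely as long as the round trip to the canonical model still holds.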

Message attributes

A very important aspect is the distinction between payload and attributes (in Kafka they are called headers). The payload is the content of the message, the event itself, for instance the result of a computation or the reply to a query. Attributes give context, for instance a timestamp.

Attributes are a fundamental piece of a micro-services architecture, yet they are seldom mentioned and rarely receive the attention they deserve.

Monitoring and observability

The most important use of attributes is to build a monitoring and observability layer. A service mesh leads to a complex web of paths, and the decoupling of its components makes it hard to understand the system at rest. It is by monitoring the live system that its patterns emerge.

Every message should be enriched with metadata, and the place for metadata is the attributes. Typical attributes are:

timestamp;
message ID;
correlation ID (if it is a reply to a message or part of a set of messages);
relevant time to reply;
sender;
owner; and
whether it is a probe message (for instance, a message sent periodically to assess system health).
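One way to keep the attributes listed above next to the payload is a typed envelope. This is a sketch with assumed names and types (Attributes, Envelope, replyBy and so on are hypothetical, not a broker's header format); it shows how a reply gets its correlation ID from the message it answers.

```scala
import java.time.Instant
import java.util.UUID

// Metadata kept apart from the payload; field types are assumptions.
final case class Attributes(
  timestamp: Instant,
  messageId: String,
  correlationId: Option[String], // set on replies or grouped messages
  replyBy: Option[Instant],      // relevant time to reply, if any
  sender: String,
  owner: String,
  probe: Boolean                 // true for periodic health-check messages
)

final case class Envelope[P](attributes: Attributes, payload: P)

object Envelope {
  def create[P](payload: P, sender: String, owner: String, now: Instant): Envelope[P] =
    Envelope(Attributes(now, UUID.randomUUID().toString, None, None, sender, owner, probe = false), payload)

  // A reply points back to the original message through its correlation ID;
  // owner and probe flag are carried over from the original.
  def reply[Q](to: Envelope[_], payload: Q, sender: String, now: Instant): Envelope[Q] =
    Envelope(
      to.attributes.copy(
        timestamp = now,
        messageId = UUID.randomUUID().toString,
        correlationId = Some(to.attributes.messageId),
        replyBy = None,
        sender = sender
      ),
      payload
    )
}
```

Because the payload type is a parameter, the same attributes travel with every kind of event, which is what lets downstream tooling treat them uniformly.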

Attributes can then be processed by other systems, for instance Elasticsearch, to provide metrics, warnings and errors.

You may be thinking that this is usually solved with logs. I argue that collecting logs is an invaluable post-mortem tool and useful for some alerting, but relying entirely on logs is an antipattern. I have seen it happen, for instance, when logs are used to surface the time it takes to process a message.
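Processing time is a good example of something that falls out of attributes directly, with no log parsing. The sketch below assumes a hypothetical, minimal view of the attributes (message ID, optional correlation ID, timestamp) and pairs each reply with its request through the correlation ID.

```scala
import java.time.{Duration, Instant}

// Hypothetical minimal view of message attributes.
final case class Meta(messageId: String, correlationId: Option[String], timestamp: Instant)

object Latency {
  // Returns, per request message ID, the time until its reply was produced.
  def processingTimes(messages: Seq[Meta]): Map[String, Duration] = {
    val sentAt: Map[String, Instant] = messages.map(m => m.messageId -> m.timestamp).toMap
    messages.flatMap { reply =>
      for {
        requestId <- reply.correlationId
        start     <- sentAt.get(requestId)
      } yield requestId -> Duration.between(start, reply.timestamp)
    }.toMap
  }
}
```

Messages without a correlation ID, or whose request was never seen, are simply skipped.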

If there is metadata to associate with a message, attributes are the correct place for it. An event should be directly linked to its metadata, so that metadata becomes data fed to control systems in a very direct way. As the system matures, metadata may be processed in ever more complex ways, increasing our understanding of the system and our control over it.