<h2>SWIM: The scalable membership protocol</h2>
<p><em>Brian Storti, 2017-10-17, www.brianstorti.com/swim</em></p>
<p>In a distributed system we have a group of nodes that need to collaborate and
send messages to each other. To achieve that they need to first answer a simple
question: <em>Who are my peers?</em><br />
That’s what membership protocols do. They help each node in this system to
maintain a list of nodes that are alive, notifying them when a new node joins
the group, when someone intentionally leaves and when a node dies (or at least
appears to be dead). <code class="language-plaintext highlighter-rouge">SWIM</code>, or <strong>S</strong>calable <strong>W</strong>eakly-consistent
<strong>I</strong>nfection-style Process Group <strong>M</strong>embership Protocol, is one of these
protocols.</p>
<h4 id="understanding-the-name">Understanding the name</h4>
<p><strong>S</strong>calable <strong>W</strong>eakly-consistent <strong>I</strong>nfection-style Process Group
<strong>M</strong>embership Protocol is quite a long name for a protocol that seems to do
such a simple thing, so let’s break down the name and understand what each piece
means. This will also help us understand why this protocol was created in the
first place.</p>
<p><strong>Scalable</strong>: Prior to <code class="language-plaintext highlighter-rouge">SWIM</code>, most membership protocols used a heart-beating
approach, that is, each node would send a heartbeat (i.e. an empty message that
just means “I’m alive!”) to every other node in the cluster, every interval <code class="language-plaintext highlighter-rouge">T</code>.
If a node <code class="language-plaintext highlighter-rouge">N1</code> doesn’t receive a heartbeat from node <code class="language-plaintext highlighter-rouge">N2</code> after a certain
period, it declares this node dead. This works fine for a small cluster, but as
the number of nodes in this cluster increases, the number of messages that need
to be sent increases quadratically. If you have 10 nodes, it may be fine to send
100 messages every second, but with 1,000 nodes you would need to send
1,000,000 messages, and that would not scale very well.</p>
<p><strong>Weakly-consistent</strong>: That means that at a given point in time different nodes
can have a different view of the world. They will eventually converge to the
same state but we cannot expect strong consistency.</p>
<p><strong>Infection-style</strong>: That’s what is also commonly known as a <em>gossip</em> or
<em>epidemic</em> protocol. It means that a node shares some information with a subset
of its peers, which then share it with a subset of their own peers, until the
entire cluster receives that information. A node therefore doesn’t need to send
a message to all of its peers; it just tells a few nodes, which will gossip
about it.</p>
<p><strong>Membership</strong>: Well, that basically means we will ultimately answer the
question “Who are my peers?”</p>
<h4 id="swim-components">SWIM Components</h4>
<p>Heart-beating protocols usually solve 2 different problems with the heartbeat
messages: They detect when a node fails (because it stops sending the
heartbeat), and they keep the list of the peers in the cluster (that is, every
node that is sending a heartbeat).<br />
<code class="language-plaintext highlighter-rouge">SWIM</code> took the novel approach of splitting these 2 problems into
separate components, so it has a failure detection and a dissemination module.</p>
<h5 id="failure-detection">Failure Detection</h5>
<p>Each node in the cluster will choose a node at random (say, <code class="language-plaintext highlighter-rouge">N2</code>) and will send
a <code class="language-plaintext highlighter-rouge">ping</code> message, expecting to receive an <code class="language-plaintext highlighter-rouge">ack</code> back. This is simply a probe
message and in normal circumstances it would receive this <code class="language-plaintext highlighter-rouge">ack</code> message and
confirm that <code class="language-plaintext highlighter-rouge">N2</code> is still alive.<br />
When that doesn’t happen, though, instead of immediately marking this node as
dead, it will try to probe it <em>through</em> other nodes. It will randomly select <code class="language-plaintext highlighter-rouge">k</code>
other nodes from its membership list and send a <code class="language-plaintext highlighter-rouge">ping-req(N2)</code> message.</p>
<p><img src="/assets/images/swim/failure-detection.png" /></p>
<p>This helps to prevent false-positives when for some reason <code class="language-plaintext highlighter-rouge">N1</code> cannot get a
response directly from <code class="language-plaintext highlighter-rouge">N2</code> (maybe because there’s network congestion between
the two), but the node is still alive and accessible by <code class="language-plaintext highlighter-rouge">N4</code>.</p>
<p>If the node cannot be accessed by any of the <code class="language-plaintext highlighter-rouge">k</code> members, though, it’s marked as
dead.</p>
<p><img src="/assets/images/swim/failure-detection2.png" /></p>
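<p>The probe round described above can be sketched in a few lines. This is only an illustration, not SWIM’s actual wire protocol; the <code>send_ping</code> and <code>send_ping_req</code> callbacks are hypothetical and assumed to return whether an <code>ack</code> came back in time:</p>

```python
import random

def probe(target, peers, k, send_ping, send_ping_req):
    """One SWIM probe round (a sketch). `send_ping(target)` and
    `send_ping_req(helper, target)` are assumed to return True when
    an ack comes back before the timeout."""
    # Direct probe: ping the target and wait for an ack.
    if send_ping(target):
        return "alive"
    # Indirect probe: ask k other members to ping the target for us.
    helpers = random.sample([p for p in peers if p != target], k)
    if any(send_ping_req(helper, target) for helper in helpers):
        return "alive"
    # Neither we nor any of the k helpers could reach the target.
    return "dead"
```

<p>With stubbed-out network callbacks, a target that is unreachable directly but reachable through a helper still comes back as alive, which is exactly the false-positive case the indirect probe protects against.</p>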
<h5 id="dissemination">Dissemination</h5>
<p>Upon detecting a node as dead, the protocol can then just multicast this
information to all the other nodes in the cluster, and each node would remove
<code class="language-plaintext highlighter-rouge">N2</code> from its local list of peers. Information about nodes that are voluntarily
leaving or joining the cluster can be multicast in a similar way.</p>
<p><img src="/assets/images/swim/dissemination-multicast.png" /></p>
<h4 id="making-swim-more-robust">Making SWIM more robust</h4>
<p>This makes for a quite simple protocol, but there are a few modifications that
the original SWIM paper suggests to make it more robust and efficient:</p>
<ul>
<li>Change the dissemination component to use an infection-style approach, instead of multicasting (after all, that’s in the name of the protocol);</li>
<li>Use a suspicion mechanism for the failure detection to reduce the false positives;</li>
<li>Use a round-robin probe target selection instead of randomly selecting nodes.</li>
</ul>
<p>Let’s explore each of these points to understand what they mean and why that
would be an improvement over the protocol that we have discussed thus far.</p>
<h5 id="infection-style-dissemination">Infection-style dissemination</h5>
<p>There are (at least) two issues that we need to be aware of when using this
multicast primitive to disseminate information:</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/IP_multicast">IP multicast</a>, although generally available, is usually not enabled in most environments. For example, if you are running on Amazon VPC, <a href="https://aws.amazon.com/vpc/faqs/#Routing_Topology">you are out of luck</a>. You would then need to use a pretty inefficient point-to-point solution;</li>
<li>Even if you can use this type of multicast, it will usually use <code class="language-plaintext highlighter-rouge">UDP</code>, which is a best-effort protocol, meaning the network can (and probably will) drop packets, making it hard to maintain a reliable membership list.</li>
</ul>
<p>A better (and, in my opinion, quite elegant) solution suggested in the SWIM
paper is to forget about this multicast idea, and instead use the <code class="language-plaintext highlighter-rouge">ping</code>,
<code class="language-plaintext highlighter-rouge">ping-req</code> and <code class="language-plaintext highlighter-rouge">ack</code> messages that we use for failure detection to <em>piggyback</em>
the information we need to disseminate. We are not adding any new messages, just
leveraging the messages that we already send, and “reusing” them to also
transport some information about membership updates.</p>
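<p>One way to picture this piggybacking is a small buffer of pending membership updates that gets attached to every outgoing <code>ping</code>, <code>ping-req</code> and <code>ack</code>. This is a sketch; the message shape, the per-message limit and the retransmit count are assumptions, not the paper’s exact parameters:</p>

```python
class Gossip:
    """Piggyback membership updates on failure-detection messages (a sketch)."""
    def __init__(self, max_piggyback=3, retransmit_limit=4):
        self.updates = []  # [update, times_sent] pairs waiting to be gossiped
        self.max_piggyback = max_piggyback
        self.retransmit_limit = retransmit_limit

    def queue(self, update):
        # e.g. "N2 joined", "N5 suspected", "N7 left"
        self.updates.append([update, 0])

    def attach_to(self, message):
        """Called for every outgoing ping / ping-req / ack: attach a few
        pending updates and count how often each one has been sent."""
        chosen = self.updates[: self.max_piggyback]
        for entry in chosen:
            entry[1] += 1
        # Drop updates that have already been gossiped enough times.
        self.updates = [e for e in self.updates if e[1] < self.retransmit_limit]
        return {**message, "updates": [e[0] for e in chosen]}
```

<p>The key property is that no extra messages are created: the update rides along until it has been retransmitted a bounded number of times, by which point it has very likely infected the whole cluster.</p>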
<h5 id="suspicion-mechanism-for-failure-detection">Suspicion mechanism for failure detection</h5>
<p>Another optimization is to first <em>suspect</em> a node is dead, before declaring it
dead. The goal here is to minimize false positives, as it is usually preferable
to take longer to detect a failed node than it is to wrongly mark a healthy node
as dead. It is a trade-off, though, and depending on the specific case this
might not make sense.</p>
<p>It works like this: When a node <code class="language-plaintext highlighter-rouge">N1</code> cannot receive an <code class="language-plaintext highlighter-rouge">ack</code> message from node
<code class="language-plaintext highlighter-rouge">N2</code> (neither directly, through a <code class="language-plaintext highlighter-rouge">ping</code> message, nor indirectly, through a
<code class="language-plaintext highlighter-rouge">ping-req</code> message), instead of disseminating that <code class="language-plaintext highlighter-rouge">N2</code> is dead and should be
removed from the membership list, it just disseminates that it suspects that
<code class="language-plaintext highlighter-rouge">N2</code> is dead.<br />
This suspected node is treated like a non-faulty node for all intents and purposes, and it
keeps receiving <code class="language-plaintext highlighter-rouge">ping</code> messages like any other member. If any node can get an
<code class="language-plaintext highlighter-rouge">ack</code> from <code class="language-plaintext highlighter-rouge">N2</code>, it’s marked again as alive and this information is
disseminated. <code class="language-plaintext highlighter-rouge">N2</code> itself can also receive a message saying it’s suspected to be
dead and tell the group that they are wrong and that it never felt better.<br />
If after a predefined timeout we don’t hear from <code class="language-plaintext highlighter-rouge">N2</code>, it’s then confirmed to be
dead and this information is disseminated.</p>
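<p>The suspicion lifecycle can be sketched as a small tracker. The timeout value, the injectable clock and the method names are all made up for illustration:</p>

```python
import time

class SuspicionTracker:
    """Suspect-before-declaring-dead, as a sketch."""
    def __init__(self, timeout=5.0, clock=time.monotonic):
        self.suspected = {}  # node -> time at which suspicion started
        self.timeout = timeout
        self.clock = clock

    def suspect(self, node):
        # Keep the earliest suspicion time if the node is already suspected.
        self.suspected.setdefault(node, self.clock())

    def refute(self, node):
        # An ack from the node, or the node's own "I'm alive" message,
        # clears the suspicion and the node is marked alive again.
        self.suspected.pop(node, None)

    def expired(self):
        # Nodes whose suspicion timeout elapsed are confirmed dead.
        now = self.clock()
        return [n for n, t in self.suspected.items() if now - t >= self.timeout]
```
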
<h5 id="round-robin-probe-target-selection">Round-Robin probe target selection</h5>
<p>In the original protocol definition the node that is selected to be probed (i.e.
the node we send a <code class="language-plaintext highlighter-rouge">ping</code> message, expecting an <code class="language-plaintext highlighter-rouge">ack</code> in return) is picked at
random. Although we can guarantee that eventually a node failure will be
detected by every non-faulty node, this can take a relatively long time if we
are unlucky with our target selection. A way to minimize this problem is to
maintain a list of the members we want to probe and go through this list in a
round-robin fashion, adding new joiners at a random position in this
list.<br />
Using this approach we can have time-bounded failure detection, where in the
worst case it will take <code class="language-plaintext highlighter-rouge">probe interval * number of nodes</code> to detect a faulty
node.</p>
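<p>A minimal sketch of this round-robin selection, with new members inserted at a random position; the class and method names are assumptions, not taken from any real implementation:</p>

```python
import random

class ProbeList:
    """Round-robin probe target selection. New members are inserted at a
    random position so a fresh node isn't always probed last (a sketch)."""
    def __init__(self, members):
        self.members = list(members)
        self.index = 0

    def add(self, member):
        self.members.insert(random.randint(0, len(self.members)), member)

    def next_target(self):
        # Shuffle once per full traversal so there is no fixed global order,
        # while still guaranteeing every member is probed once per cycle.
        if self.index >= len(self.members):
            self.index = 0
            random.shuffle(self.members)
        target = self.members[self.index]
        self.index += 1
        return target
```

<p>Because every member is visited exactly once per traversal, a faulty node is probed within <code>probe interval * number of nodes</code> in the worst case, which is the time bound mentioned above.</p>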
<h4 id="summary">Summary</h4>
<ul>
<li><code class="language-plaintext highlighter-rouge">SWIM</code> is a membership protocol that helps us know which nodes are part of a cluster, maintaining an updated list of healthy peers;</li>
<li>It divides the membership problem into 2 components: Failure detection and dissemination;</li>
<li>The failure detection component works by selecting a random node and sending it a <code class="language-plaintext highlighter-rouge">ping</code> message, expecting an <code class="language-plaintext highlighter-rouge">ack</code> in return;</li>
<li>If it does not receive an <code class="language-plaintext highlighter-rouge">ack</code>, it selects <code class="language-plaintext highlighter-rouge">k</code> peers to probe this node through a <code class="language-plaintext highlighter-rouge">ping-req</code> message;</li>
<li>An optimization of this failure detection is to first mark a node as “suspected”, and mark it as dead just after a timeout;</li>
<li>We can disseminate membership information by piggybacking the failure detection messages (<code class="language-plaintext highlighter-rouge">ping</code>, <code class="language-plaintext highlighter-rouge">ping-req</code> and <code class="language-plaintext highlighter-rouge">ack</code>) instead of using a multicast primitive;</li>
<li>A way to improve the failure-detection time is to select nodes in a round-robin fashion instead of selecting nodes at random.</li>
</ul>
<h4 id="other-resources">Other resources</h4>
<p>The original
<a href="https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf">paper</a>
is short and very readable. <code class="language-plaintext highlighter-rouge">Serf</code>, a HashiCorp tool, uses <code class="language-plaintext highlighter-rouge">SWIM</code> and has a
<a href="https://www.serf.io/docs/internals/gossip.html">nice overview</a> in their
documentation. They also talk about some modifications they made to increase
propagation speed and convergence rates, and the <a href="https://github.com/hashicorp/memberlist">code is open source</a>.<br />
Armon Dadgar, one of HashiCorp’s co-founders, also gave a very nice talk
explaining the protocol and these modifications:</p>
<iframe width="760" height="415" src="https://www.youtube.com/embed/bkmbWsDz8LM" frameborder="0" allowfullscreen=""></iframe>
<h2>Raft: Consensus made simple(r)</h2>
<p><em>Brian Storti, 2017-10-12, www.brianstorti.com/raft</em></p>
<p>Consensus is one of the fundamental problems in distributed systems. We want
clients to perceive our system as a single coherent unit, but at the same time
we don’t want to have a single point of failure. We need to have several
machines collaborating in a way that they can all agree on the state of the
world, even though a lot of things can go wrong. Nodes can crash, messages can
be delivered out of order or not be delivered at all, and different nodes can
have a different idea of what the world looks like. Making a distributed system
behave like a coherent unit in face of these failures can be a challenge, and
that’s why we sometimes need a consensus algorithm, like <code class="language-plaintext highlighter-rouge">Raft</code>, that gives us
some guarantees about the properties that we can expect of this system.</p>
<h3 id="what-is-raft">What is Raft</h3>
<p><code class="language-plaintext highlighter-rouge">Raft</code> is a consensus algorithm that was created with the goal of being
understandable. This is a direct response to <code class="language-plaintext highlighter-rouge">Paxos</code>, which is probably the most
well-known algorithm in this space. <code class="language-plaintext highlighter-rouge">Paxos</code> solves the same type of problem, but
it’s a fairly complicated algorithm, and <code class="language-plaintext highlighter-rouge">Raft</code> promises to give us the same
guarantees, while being a lot simpler.</p>
<p>It’s currently used in several large-scale systems, like
<a href="https://www.consul.io/">Consul</a>, <a href="https://github.com/coreos/etcd">etcd</a> and
<a href="https://www.influxdata.com/">InfluxDB</a>, so it’s pretty mature and
battle-tested.</p>
<h3 id="how-it-works">How it works</h3>
<p><code class="language-plaintext highlighter-rouge">Raft</code> works by keeping a replicated log. This log is an append-only data
structure where new entries are added, and only a single server, the leader, is
responsible for managing this log. Every <code class="language-plaintext highlighter-rouge">write</code> request is sent to the leader
node, and this node will distribute it to the follower nodes and make sure the
client receives a confirmation for this write only when the data is safely
stored. Let’s get into the details.</p>
<p>The consensus problem is divided into three sub-problems: Leader election,
Replication and Safety.</p>
<h4 id="leader-election">Leader election</h4>
<p>Every node will always be in one of these three states: Leader, Follower or
Candidate, and we should never have more than one leader at the same time. Time
in <code class="language-plaintext highlighter-rouge">Raft</code> is divided into <em>terms</em>, which are arbitrary periods of time, each
identified by a sequentially incremented number.</p>
<p>A server always starts as a follower, and it expects a <em>heartbeat</em> from the
leader. The follower will wait for this heartbeat for some time (defined as the
<code class="language-plaintext highlighter-rouge">election timeout</code>), and if it does not receive it, it will assume the leader is
dead and transition to the Candidate state. After it goes to this state, the
first thing it will do is to vote for itself, and then send a vote request to
all the other nodes (this request is an RPC called <code class="language-plaintext highlighter-rouge">RequestVote</code>). If it receives
a confirmation for this request from the majority of the nodes in this cluster
(e.g. 3 out of 5), it transitions to the Leader state.</p>
<p><img src="/assets/images/raft/leader_election.png" /></p>
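<p>A rough sketch of a single candidate round. The <code>request_vote(peer, term)</code> callback is hypothetical and assumed to return whether the peer granted its vote; the timeout range follows common examples, not a value fixed by the algorithm:</p>

```python
import random

def election_timeout(base_ms=150, spread_ms=150):
    # Randomized per node, so followers rarely time out simultaneously.
    return base_ms + random.uniform(0, spread_ms)

def run_election(node_id, term, request_vote, cluster_size):
    """One candidate round (a sketch)."""
    term += 1            # starting an election begins a new term
    votes = 1            # the candidate votes for itself first
    for peer in range(cluster_size):
        if peer != node_id and request_vote(peer, term):
            votes += 1
    # A strict majority of the whole cluster makes this node the leader;
    # otherwise it stays a candidate and will retry after another timeout.
    won = votes > cluster_size // 2
    return ("leader" if won else "candidate", term)
```

<p>Note that the majority is counted over the full cluster size, not just the reachable nodes: in a 5-node cluster a candidate needs 3 votes, its own included.</p>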
<p>There are some interesting things that can happen here, though, and it’s where
<code class="language-plaintext highlighter-rouge">Raft</code>’s focus on understandability becomes apparent.</p>
<p>First, if all nodes start at the same time, they would all also timeout at the
same time, meaning every node would trigger this same <code class="language-plaintext highlighter-rouge">RequestVote</code> RPC, making
it a lot harder for a single node to obtain the majority of the votes. <code class="language-plaintext highlighter-rouge">Raft</code>
mitigates this issue by using a randomized election timeout for each node,
meaning one of the followers will usually timeout before the others, likely
becoming the new leader.</p>
<p>Even having this randomized timeout, we can still have a <em>split vote</em> situation,
where none of the nodes have the majority of the votes. For example, in a
cluster of 5 nodes when the leader dies we would end up with 4 nodes, and if 2
of these nodes timeout roughly at the same time, each one could get 2 votes, so
none of them can become the leader. The solution is as simple as it can be: Just
wait for another timeout, which will most likely solve the issue. When this
timeout happens and the term doesn’t have a leader, a new term will be
initiated, and each node will pick a new random timeout value for the next
election, making another collision unlikely. We pay a small performance
penalty because of that, but this timeout is usually just a few milliseconds,
and a <em>split vote</em> situation should be quite rare.</p>
<h4 id="log-replication">Log Replication</h4>
<p>This is the part that we really care about: How to keep this replicated log.<br />
After we have an elected leader, every request is sent to this node. If a
follower node receives a request it can just redirect it to the leader or return
an error to the client, indicating which node is the leader.</p>
<p>When the leader receives a request, it first appends it to its log, and then sends
a request to every follower so they can do the same thing. This RPC is called
<code class="language-plaintext highlighter-rouge">AppendEntries</code>. Although the message was appended to the log, it was not
committed yet, and the client didn’t get a confirmation that the operation
succeeded. Just after the leader gets a confirmation from the majority of the
nodes it can actually commit the message, knowing it’s safely stored, and then
respond to the client. When the followers receive the next heartbeat message
(that is just an empty <code class="language-plaintext highlighter-rouge">AppendEntries</code> RPC) they know they can also commit this
message.</p>
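<p>The leader-side write path can be sketched like this; <code>append_entries(follower, entry)</code> is a hypothetical callback returning whether that follower stored the entry:</p>

```python
def replicate(entry, followers, append_entries):
    """Leader-side write path (a sketch): append locally, replicate,
    and commit only after a majority of the cluster has the entry."""
    log = [entry]                      # appended locally, but not yet committed
    acks = 1                           # the leader's own copy counts toward the majority
    for follower in followers:
        if append_entries(follower, entry):
            acks += 1
    cluster_size = len(followers) + 1
    committed = acks > cluster_size // 2
    # Only when committed may the client be told the write succeeded;
    # followers learn the commit point from the next AppendEntries/heartbeat.
    return committed
```
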
<p>Other than the command sent by the client, each log entry also has a <em>term</em> number
and an <em>index</em>. The <em>term</em> just defines a unit of time (and, remember, each term
has no more than one leader), and the <em>index</em> is the position in the log. Let’s
understand why recording these two values is important.</p>
<h4 id="safety">Safety</h4>
<p>To ensure that every log is correctly replicated and that commands are executed
in the same order, some safety mechanisms are necessary.</p>
<h5 id="the-log-matching-property">The Log Matching Property</h5>
<p><code class="language-plaintext highlighter-rouge">Raft</code> maintains the <em>Log Matching Property</em>, which says that if two
distinct log entries have the same term number and the same index, then they
will:</p>
<ul>
<li>Store the exact same command;</li>
<li>Be identical in all the preceding entries.</li>
</ul>
<p>As the leader will never create more than one entry with the same index in the
same term, the first property is fulfilled.</p>
<p>The second property, guaranteeing that all the preceding entries are identical,
is achieved by a consistency check that the followers perform when they receive
an <code class="language-plaintext highlighter-rouge">AppendEntries</code> RPC.<br />
It works like this: The leader keeps track of the highest index that is
committed in its log, and sends that information in every <code class="language-plaintext highlighter-rouge">AppendEntries</code> RPC
(even heartbeats). If the follower does not find an entry with that index in
its local log, it will reject the request, so if the <code class="language-plaintext highlighter-rouge">AppendEntries</code> RPC returns
successfully, the leader knows that its log and the follower’s are identical.</p>
<p>When the nodes are operating normally, these logs will always be consistent.
When a leader crashes, though, this log can be left inconsistent, and that’s
when <code class="language-plaintext highlighter-rouge">AppendEntries</code>’s consistency check will help us. Imagine this scenario:</p>
<ul>
<li>We have three nodes, <code class="language-plaintext highlighter-rouge">N1</code>, <code class="language-plaintext highlighter-rouge">N2</code> and <code class="language-plaintext highlighter-rouge">N3</code>, <code class="language-plaintext highlighter-rouge">N1</code> being the leader;</li>
<li><code class="language-plaintext highlighter-rouge">N1</code> replicates the messages <code class="language-plaintext highlighter-rouge">term=1; index=1; command=x</code> and <code class="language-plaintext highlighter-rouge">term=1; index=2; command=y</code> with <code class="language-plaintext highlighter-rouge">N2</code>, but <code class="language-plaintext highlighter-rouge">N3</code> never gets these messages;</li>
<li>Now <code class="language-plaintext highlighter-rouge">N1</code> crashes and <code class="language-plaintext highlighter-rouge">N2</code> becomes the new leader;</li>
<li>If <code class="language-plaintext highlighter-rouge">N2</code> tries to replicate the message <code class="language-plaintext highlighter-rouge">term=2; index=3; command=z</code> to <code class="language-plaintext highlighter-rouge">N3</code>,
the consistency check will reject this message, as the highest committed
index (<code class="language-plaintext highlighter-rouge">3</code>) is not present in <code class="language-plaintext highlighter-rouge">N3</code>’s log;</li>
<li><code class="language-plaintext highlighter-rouge">N2</code> will then go back in the log and transmit all the entries after the
latest entry present in <code class="language-plaintext highlighter-rouge">N3</code>, making the logs consistent again.</li>
</ul>
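<p>A sketch of this consistency check from the follower’s side, using the simplified rule described above (real Raft implementations track a bit more state, comparing the term and index of the entry preceding the new ones):</p>

```python
def handle_append_entries(local_log, leader_committed_index, new_entries):
    """Follower-side consistency check (a sketch of the simplified rule):
    reject the RPC when the leader's highest committed index is not present
    in the local log. Log entries are (term, index, command) tuples."""
    local_indexes = {index for _term, index, _cmd in local_log}
    if leader_committed_index != 0 and leader_committed_index not in local_indexes:
        # The logs have diverged; the leader must go back and resend
        # everything after the latest entry this follower has.
        return False, local_log
    return True, local_log + new_entries
```

<p>In the scenario above, <code>N3</code>’s empty log rejects the new entry, while <code>N2</code>’s complete log accepts it.</p>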
<h5 id="election-restriction">Election Restriction</h5>
<p>This property guarantees that a candidate will never win the leader election if
it does not have all the committed entries in its own log. As an entry needs to
be present in the majority of the nodes to be considered committed, when an
election is taking place at least one node will have the latest committed entry.
If a follower node receives a <code class="language-plaintext highlighter-rouge">RequestVote</code> RPC from a candidate that is behind
in the log (meaning a smaller term number, or same term number but smaller
index), it will not grant its vote to this candidate.</p>
<p><img src="/assets/images/raft/election_restriction.png" /></p>
<p>In the example above we have three logs, each entry represented with the term
number in which it was created.<br />
In this case, <code class="language-plaintext highlighter-rouge">Node 1</code> was the leader, and was able to commit up to index 5,
where it got a confirmation from the majority of the nodes (itself and <code class="language-plaintext highlighter-rouge">Node
2</code>). If <code class="language-plaintext highlighter-rouge">Node 1</code> dies and a new election starts, maybe <code class="language-plaintext highlighter-rouge">Node 3</code> can be the first
to transition to the Candidate state and try to become the leader. This would be
a problem, as its log does not have the latest committed entry (term 3, index
5). When it sends a <code class="language-plaintext highlighter-rouge">RequestVote</code> to <code class="language-plaintext highlighter-rouge">Node 2</code>, this node will notice that its
own log is more up to date than <code class="language-plaintext highlighter-rouge">Node 3</code>’s, and therefore will not grant its
vote, making it impossible for <code class="language-plaintext highlighter-rouge">Node 3</code> to become the leader.</p>
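<p>The vote-granting rule boils down to comparing the last entries of the two logs. A sketch (in a real implementation this comparison lives inside the <code>RequestVote</code> handler):</p>

```python
def grant_vote(own_last, candidate_last):
    """Decide whether to grant a vote (a sketch). Each argument is the
    (term, index) of the last entry in a log; a vote is granted only when
    the candidate's log is at least as up to date as our own."""
    own_term, own_index = own_last
    cand_term, cand_index = candidate_last
    if cand_term != own_term:
        return cand_term > own_term     # the higher last term wins
    return cand_index >= own_index      # same term: the longer log wins
```

<p>Replaying the example: <code>Node 2</code>’s last entry is from term 3, so a candidate whose log ends in an earlier term is refused.</p>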
<h4 id="summary">Summary</h4>
<ul>
<li><code class="language-plaintext highlighter-rouge">Raft</code> is divided into 3 parts: Leader election, log replication and safety;</li>
<li>A node can be in one of these three states: Follower, Candidate or Leader;</li>
<li>Every node starts as a Follower, and after an election timeout transitions to the candidate state;</li>
<li>A Candidate will vote for itself and send <code class="language-plaintext highlighter-rouge">RequestVote</code> RPCs to all the other nodes;</li>
<li>If it gets votes from the majority of the nodes, it becomes the new Leader;</li>
<li>The leader is the only node responsible for managing the log, followers just add new entries to their logs in response to the leader <code class="language-plaintext highlighter-rouge">AppendEntries</code> RPC;</li>
<li>When the leader receives a command from the client, it first saves this uncommitted message, then sends it to every follower;</li>
<li>When it gets a successful response from the majority of nodes, the command is committed and the client gets a confirmation;</li>
<li>In the next <code class="language-plaintext highlighter-rouge">AppendEntries</code> RPC sent to the follower (that can be a new entry or just a heartbeat), the follower also commits the message;</li>
<li>The <code class="language-plaintext highlighter-rouge">AppendEntries</code> RPC implements a consistency check, to guarantee its local log is consistent with the leader’s;</li>
<li>A follower will just grant its vote to a candidate that has a log at least as up to date as its own.</li>
</ul>
<h3 id="other-resources">Other resources</h3>
<p>There are a lot of details left out in this post, so I really encourage you to
check out the entire <a href="https://raft.github.io/raft.pdf">Raft paper</a>, which is
quite readable. There’s also a <a href="https://raft.github.io/">Raft website</a> with a
lot of resources, including implementations in several different languages.
<a href="http://thesecretlivesofdata.com/raft">This Raft visualization</a> also helps a lot
to understand how the leader election and replication works, going step by step
and explaining everything that is happening, even though it will not cover every
scenario described in the paper.</p>
<p>And this is a great lecture by one of the authors:</p>
<iframe width="760" height="415" src="https://www.youtube.com/embed/vYp4LYbnnW8?rel=0" frameborder="0" allowfullscreen=""></iframe>
<h2>TCP Flow Control</h2>
<p><em>Brian Storti, 2017-06-30, www.brianstorti.com/tcp-flow-control</em></p>
<p><code class="language-plaintext highlighter-rouge">TCP</code> is the protocol that guarantees we can have a reliable communication
channel over an unreliable network. When we send data from a node to another,
packets can be lost, they can arrive out of order, the network can be congested
or the receiver node can be overloaded. When we are writing an application,
though, we usually don’t need to deal with this complexity, we just write some
data to a socket and <code class="language-plaintext highlighter-rouge">TCP</code> makes sure the packets are delivered correctly to the
receiver node. Another important service that <code class="language-plaintext highlighter-rouge">TCP</code> provides is what is called
<em>Flow Control</em>. Let’s talk about what that means and how <code class="language-plaintext highlighter-rouge">TCP</code> does its magic.</p>
<h4 id="what-is-flow-control-and-what-its-not">What is Flow Control (and what it’s not)</h4>
<p>Flow Control basically means that <code class="language-plaintext highlighter-rouge">TCP</code> will ensure that a sender is not
overwhelming a receiver by sending packets faster than it can consume. It’s
pretty similar to what’s normally called <em>Back pressure</em> in the Distributed
Systems literature. The idea is that a node receiving data will send some kind
of feedback to the node sending the data to let it know about its current
condition.</p>
<p>It’s important to understand that this is <strong>not</strong> the same as <em>Congestion
Control</em>. Although there’s some overlap between the mechanisms <code class="language-plaintext highlighter-rouge">TCP</code> uses to
provide both services, they are distinct features. Congestion control is about
preventing a node from overwhelming the network (i.e. the links between two
nodes), while Flow Control is about the end-node.</p>
<h4 id="how-it-works">How it works</h4>
<p>When we need to send data over a network, this is normally what happens.</p>
<p><img src="/assets/images/tcp-flow-control/layers.png" /></p>
<p>The sender application writes data to a socket, the transport layer (in our
case, <code class="language-plaintext highlighter-rouge">TCP</code>) will wrap this data in a segment and hand it to the network layer
(e.g. <code class="language-plaintext highlighter-rouge">IP</code>), that will somehow route this packet to the receiving node.</p>
<p>On the other side of this communication, the network layer will deliver this
piece of data to <code class="language-plaintext highlighter-rouge">TCP</code>, that will make it available to the receiver application
as an exact copy of the data sent, meaning it will not deliver packets out of
order, and will wait for a retransmission in case it notices a gap in the byte
stream.</p>
<p>If we zoom in, we will see something like this.</p>
<p><img src="/assets/images/tcp-flow-control/buffers.png" /></p>
<p><code class="language-plaintext highlighter-rouge">TCP</code> stores the data it needs to send in the <em>send buffer</em>, and the data it
receives in the <em>receive buffer</em>. When the application is ready, it will then
read data from the receive buffer.</p>
<p>Flow Control is all about making sure we don’t send more packets when the
receive buffer is already full, as the receiver wouldn’t be able to handle them
and would need to drop these packets.</p>
<p>To control the amount of data that <code class="language-plaintext highlighter-rouge">TCP</code> can send, the receiver will advertise
its <em>Receive Window (rwnd)</em>, that is, the spare room in the receive buffer.</p>
<p><img src="/assets/images/tcp-flow-control/rwnd.png" /></p>
<p>Every time <code class="language-plaintext highlighter-rouge">TCP</code> receives a packet, it needs to send an <code class="language-plaintext highlighter-rouge">ack</code> message to the
sender, acknowledging it received that packet correctly, and with this <code class="language-plaintext highlighter-rouge">ack</code>
message it sends the value of the current receive window, so the sender knows if
it can keep sending data.</p>
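<p>The receiver side can be pictured as a fixed-size buffer whose spare room is the advertised window. This is a simplified model for illustration only; the capacity and method names are made up:</p>

```python
class ReceiveBuffer:
    """Receiver-side flow control (a sketch): the advertised window is
    the spare room left in a fixed-size receive buffer."""
    def __init__(self, capacity=45000):
        self.capacity = capacity
        self.buffered = 0

    def receive(self, nbytes):
        # Data arrives from the network and sits here until read.
        self.buffered += nbytes

    def application_read(self, nbytes):
        # The application consuming data frees up room in the buffer.
        self.buffered -= min(nbytes, self.buffered)

    def rwnd(self):
        # The receive window value sent back with every ack.
        return self.capacity - self.buffered
```

<p>When the application reads slowly, <code>buffered</code> grows, <code>rwnd</code> shrinks, and the acks carry that smaller window back to the sender, which must slow down.</p>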
<h4 id="the-sliding-window">The sliding window</h4>
<p><code class="language-plaintext highlighter-rouge">TCP</code> uses a sliding window protocol to control the number of bytes in flight it
can have. In other words, the number of bytes that were sent but not yet <code class="language-plaintext highlighter-rouge">ack</code>ed.</p>
<p>Let’s say we want to send a 150000-byte file from node A to node B. <code class="language-plaintext highlighter-rouge">TCP</code> could
break this file down into 100 packets of 1500 bytes each. Now let’s say that when
the connection between nodes A and B is established, node B advertises a receive
window of 45000 bytes, because it really wants to help us with our math here.</p>
<p>Seeing that, <code class="language-plaintext highlighter-rouge">TCP</code> knows it can send the first 30 packets (1500 * 30 = 45000)
before it receives an acknowledgment. If it gets an <code class="language-plaintext highlighter-rouge">ack</code> message for the first
10 packets (meaning we now have only 20 packets in flight), and the receive
window present in these <code class="language-plaintext highlighter-rouge">ack</code> messages is still 45000, it can send the next 10
packets, bringing the number of packets in flight back to 30, which is the limit
defined by the receive window. In other words, at any given point in time it can
have at most 30 packets in flight, sent but not yet <code class="language-plaintext highlighter-rouge">ack</code>ed.</p>
<p><img src="/assets/images/tcp-flow-control/sliding-window.png" /></p>
<div class="image-description">
Example of a sliding window. As soon as packet 3 is acked, we can slide
the window to the right and send packet 8.
</div>
<p>Now, if for some reason the application reading these packets in node B slows
down, <code class="language-plaintext highlighter-rouge">TCP</code> will still <code class="language-plaintext highlighter-rouge">ack</code> the packets that were correctly received, but as
these packets need to be stored in the receive buffer until the application
decides to read them, the receive window will shrink. So even when <code class="language-plaintext highlighter-rouge">TCP</code>
receives the acknowledgment for the next 10 packets (meaning there are currently 20
packets, or 30000 bytes, in flight), if the receive window value received in
this <code class="language-plaintext highlighter-rouge">ack</code> is now 30000 (instead of 45000), it will not send more packets, as
the number of bytes in flight is already equal to the latest receive window
advertised.</p>
<p>The sender will always keep this invariant:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>LastByteSent - LastByteAcked <= ReceiveWindowAdvertised
</code></pre></div></div>
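<p>To make the arithmetic concrete, here is a minimal Go sketch of that invariant, using the numbers from the example above (a 45000-byte window and 1500-byte packets). This is only an illustration of the bookkeeping, not how a real TCP stack is written:</p>

```go
package main

import "fmt"

// windowAllows reports how many more bytes the sender may put in flight while
// keeping the invariant LastByteSent - LastByteAcked <= ReceiveWindow.
func windowAllows(lastByteSent, lastByteAcked, rwnd int) int {
	inFlight := lastByteSent - lastByteAcked
	if inFlight >= rwnd {
		return 0 // window is full, nothing can be sent
	}
	return rwnd - inFlight
}

func main() {
	const packet = 1500
	rwnd := 45000

	// Connection just opened: nothing in flight, the whole window is usable.
	fmt.Println(windowAllows(0, 0, rwnd) / packet) // 30 packets

	// 30 packets sent, the first 10 acked: 20 in flight, room for 10 more.
	fmt.Println(windowAllows(30*packet, 10*packet, rwnd) / packet) // 10 packets
}
```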
<h4 id="visualizing-the-receive-window">Visualizing the Receive Window</h4>
<p>Just to see this behavior in action, let’s write a very simple application that
reads data from a socket and watch how the receive window behaves when we make
this application slower. We will use <code class="language-plaintext highlighter-rouge">Wireshark</code> to see these packets,
<code class="language-plaintext highlighter-rouge">netcat</code> to send data to this application, and a <code class="language-plaintext highlighter-rouge">go</code> program to read data from
the socket.</p>
<p>Here’s the simple <code class="language-plaintext highlighter-rouge">go</code> program that reads and prints the data received:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">package</span> <span class="n">main</span>
<span class="k">import</span> <span class="p">(</span>
<span class="s">"bufio"</span>
<span class="s">"fmt"</span>
<span class="s">"net"</span>
<span class="p">)</span>
<span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
<span class="n">listener</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">net</span><span class="o">.</span><span class="n">Listen</span><span class="p">(</span><span class="s">"tcp"</span><span class="p">,</span> <span class="s">"localhost:3040"</span><span class="p">)</span>
<span class="n">conn</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">listener</span><span class="o">.</span><span class="n">Accept</span><span class="p">()</span>
<span class="n">reader</span> <span class="o">:=</span> <span class="n">bufio</span><span class="o">.</span><span class="n">NewReader</span><span class="p">(</span><span class="n">conn</span><span class="p">)</span> <span class="c">// create the reader once, so buffered bytes aren’t discarded each iteration</span>
<span class="k">for</span> <span class="p">{</span>
<span class="n">message</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">reader</span><span class="o">.</span><span class="n">ReadBytes</span><span class="p">(</span><span class="sc">'\n'</span><span class="p">)</span>
<span class="n">fmt</span><span class="o">.</span><span class="n">Println</span><span class="p">(</span><span class="kt">string</span><span class="p">(</span><span class="n">message</span><span class="p">))</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This program will simply listen for connections on port <code class="language-plaintext highlighter-rouge">3040</code> and print the
strings received.</p>
<p>We can then use <code class="language-plaintext highlighter-rouge">netcat</code> to send data to this application:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ nc localhost 3040
</code></pre></div></div>
<p>And we can see, using <code class="language-plaintext highlighter-rouge">Wireshark</code>, that the connection was established and a
window size advertised:</p>
<p><a href="/assets/images/tcp-flow-control/conn-established.png" target="_blank">
<img src="/assets/images/tcp-flow-control/conn-established.png" />
</a></p>
<div class="image-description">
Click on the image to enlarge it.
</div>
<p>Now let’s run this command to create a stream of data. It will repeatedly append the
string “foo” to a file, which we will then pipe to this application:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="k">while </span><span class="nb">true</span><span class="p">;</span> <span class="k">do </span><span class="nb">echo</span> <span class="s2">"foo"</span> <span class="o">>></span> stream.txt<span class="p">;</span> <span class="k">done</span>
</code></pre></div></div>
<p>And now let’s send this data to the application:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">tail</span> <span class="nt">-f</span> stream.txt | nc localhost 3040
</code></pre></div></div>
<p>Now if we check <code class="language-plaintext highlighter-rouge">Wireshark</code> we will see a lot of packets being sent, and the
receive window being updated:</p>
<p><a href="/assets/images/tcp-flow-control/win-decreasing-1.png" target="_blank">
<img src="/assets/images/tcp-flow-control/win-decreasing-1.png" />
</a></p>
<p><a href="/assets/images/tcp-flow-control/win-decreasing-2.png" target="_blank">
<img src="/assets/images/tcp-flow-control/win-decreasing-2.png" />
</a></p>
<p>The application is still fast enough to keep up with the work, though. So let’s
make it a bit slower to see what happens:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">package main
</span>
import (
"bufio"
"fmt"
"net"
"time"
<span class="err">)</span>
<span class="p">func main() {
</span> listener, _ := net.Listen("tcp", "localhost:3040")
conn, _ := listener.Accept()
reader := bufio.NewReader(conn)
for {
message, _ := reader.ReadBytes('\n')
fmt.Println(string(message))
<span class="gi">+ time.Sleep(1 * time.Second)
</span> }
<span class="err">}</span>
</code></pre></div></div>
<p>Now we are sleeping for 1 second before we read data from the receive buffer. If
we run <code class="language-plaintext highlighter-rouge">netcat</code> again and observe <code class="language-plaintext highlighter-rouge">Wireshark</code>, it doesn’t take long until the
receive buffer is full and <code class="language-plaintext highlighter-rouge">TCP</code> starts advertising a 0 window size:</p>
<p><a href="/assets/images/tcp-flow-control/zero-window.png" target="_blank">
<img src="/assets/images/tcp-flow-control/zero-window.png" />
</a></p>
<p>At this moment <code class="language-plaintext highlighter-rouge">TCP</code> will stop transmitting data, as the receiver’s buffer is
full.</p>
<h4 id="the-persist-timer">The persist timer</h4>
<p>There’s still one problem, though. After the receiver advertises a zero window,
if it doesn’t send any other <code class="language-plaintext highlighter-rouge">ack</code> message to the sender (or if that <code class="language-plaintext highlighter-rouge">ack</code> is
lost), the sender will never know when it can start sending data again. We end up in a
deadlock situation: the receiver is waiting for more data, and the sender
is waiting for a message saying it can start sending data again.</p>
<p>To solve this problem, when <code class="language-plaintext highlighter-rouge">TCP</code> receives a zero-window message it starts the
<em>persist timer</em>, which will periodically send a small packet to the receiver
(usually called a <code class="language-plaintext highlighter-rouge">WindowProbe</code>), so the receiver has a chance to advertise a nonzero window
size.</p>
<p><a href="/assets/images/tcp-flow-control/window-probe.png" target="_blank">
<img src="/assets/images/tcp-flow-control/window-probe.png" />
</a></p>
<p>When there’s some spare space in the receiver’s buffer again it can advertise a
non-zero window size and the transmission can continue.</p>
<h4 id="recap">Recap</h4>
<ul>
<li><code class="language-plaintext highlighter-rouge">TCP</code>’s flow control is a mechanism to ensure the sender is not overwhelming the
receiver with more data than it can handle;</li>
<li>With every <code class="language-plaintext highlighter-rouge">ack</code> message the receiver advertises its current receive window;</li>
<li>The receive window is the spare space in the receive buffer, that is,
<code class="language-plaintext highlighter-rouge">rwnd = ReceiveBuffer - (LastByteReceived - LastByteReadByApplication)</code>;</li>
<li><code class="language-plaintext highlighter-rouge">TCP</code> will use a sliding window protocol to make sure it never has more bytes
in flight than the window advertised by the receiver;</li>
<li>When the window size is 0, <code class="language-plaintext highlighter-rouge">TCP</code> will stop transmitting data and will start
the persist timer;</li>
<li>It will then periodically send a small <code class="language-plaintext highlighter-rouge">WindowProbe</code> message to the receiver
to check if it can start receiving data again;</li>
<li>When it receives a non-zero window size, it resumes the transmission.</li>
</ul>
<p>If you want to learn more about TCP (and <em>a lot</em> more), the book <a href="https://amzn.to/3rpYITr">Computer
Networking: A Top-Down Approach</a> is a great resource.</p>
<script src="https://gumroad.com/js/gumroad.js"></script>
<p><a class="gumroad-button" href="https://gum.co/tcp-flow-control" target="_blank">Get PDF</a></p>
A Primer on Database Replication (2017-05-23, www.brianstorti.com/replication)<script src="https://gumroad.com/js/gumroad.js"></script>
<p><a class="gumroad-button" href="https://gum.co/replication" data-gumroad-single-product="true" target="_blank">
Get as PDF or ePub
</a></p>
<p>Replicating a database can make our applications faster and increase our
tolerance to failures, but there are a lot of different options available and
each one comes with a price tag. It’s hard to make the right choice if we do not
understand how the tools we are using work and what guarantees they
provide (or, more importantly, do <em>not</em> provide). That’s what I want to
explore here.</p>
<h4 id="before-we-start-some-background">Before we start, some background</h4>
<p><a href="https://engineering.alphasights.com/">AlphaSights</a> has offices in North
America, Europe and Asia and is rapidly expanding. People working on these 3
continents rely heavily on the tools that we build to do their jobs, so any
performance issue has a big impact on their work. As the number of people using
our systems increased, our database started to feel the pressure. Initially we
could just keep increasing our database server’s capacity, getting a more powerful
machine, adding more RAM, and keep scaling vertically, but there is one problem
that we unfortunately cannot solve: the speed of light.</p>
<blockquote>
<p>Light travels at a speed of 299,792 km/s in a vacuum. Even if we assume that
our requests are traveling at this speed, and that they travel in a straight
line, it would still take 133ms for a round-the-world trip. In reality, our
requests will be slower than the speed of light and there will be a lot of
zigzagging from one hop to another until they reach their destination.</p>
</blockquote>
<p><img src="/assets/images/replication/ping-table.png" /></p>
<p>No matter how quickly we can <em>execute</em> a query, if the database is in North
America, the data still needs to travel all the way to Asia before people in
that office can use it. It was clear that we had to make that data available
somewhere closer to them, and so the quest began.</p>
<p>Researching all the available options and everything that is involved in a
database replication setup can be overwhelming; there are literally decades of
literature about the subject, and after you start digging it’s hard to see the
end.</p>
<p>I am by no means a replication expert, but during this process I learned a
thing or two, and that’s what I want to share here. This is not supposed to be
an extensive resource to learn everything there is to know about replication,
but hopefully it’s a good starting point that you can use in your own journey.
At the end of this article I will link to some great resources that can be
helpful if you decide to learn more.</p>
<p>Sounds good? Cool, grab a cup of coffee and let’s have fun.</p>
<h4 id="first-things-first-the-what-and-the-why">First things first, the What and the Why</h4>
<p>Just to make sure we are on the same page, let’s define what replication is and
describe the three main reasons why we might want it.</p>
<p>When we say we want to replicate something, it means we want to keep a copy of
the same data in multiple places. In the case of databases, that can mean a copy
of the entire database, which is the most common scenario, or just some parts of
it (e.g. a set of tables). These multiple locations where we will keep the data
are usually connected by a network, and that’s the origin of most of our
headaches, as you will see in a bit.</p>
<p>The reason for wanting that will be one or more of the following:</p>
<ul>
<li>
<p>You want to keep the data closer to your users so you can save the travel
time. Remember, no matter how fast your database is, the data still needs to
travel from the computer that started the request to the server where the
database is, and then back again. You can optimize the heck out of your
database, but you cannot optimize the laws of physics.</p>
</li>
<li>
<p>You want to scale the number of machines serving requests. At some point a
single server will not be able to handle the number of clients it needs to
serve. In that case, having several databases with the same data helps you serve
more clients. That’s what we call scaling <em>horizontally</em> (as opposed to
<em>vertically</em>, which means having a more powerful machine).</p>
</li>
<li>
<p>You want to be safe in case of failures (they will happen). Imagine you have
your data in a single database server and that server catches fire, then what
happens? I am sure you have some sort of backup (right?!), but your backup will
a) take some time to be restored and b) probably be <em>at least</em> a couple of hours
old. Not cool. Having a replica means you can just start sending your requests
to this server while you are solving the fire situation, and maybe no one will
even notice that something bad happened.</p>
</li>
</ul>
<h4 id="the-obligatory-cap-introduction">The obligatory CAP introduction</h4>
<p>The CAP theorem was introduced by Eric Brewer in the year 2000, so it’s not a
new idea. The acronym stands for <code class="language-plaintext highlighter-rouge">C</code>onsistency, <code class="language-plaintext highlighter-rouge">A</code>vailability and <code class="language-plaintext highlighter-rouge">P</code>artition
Tolerance, and it basically says that, given these 3 properties in a distributed
system, you need to choose 2 of them (i.e. you cannot have all 3). In practice,
it means you need to choose between consistency and availability when an
inevitable partition happens. If this sounds confusing, let me briefly define
what these 3 terms mean, and why I am even talking about this here.</p>
<p><img src="/assets/images/replication/cap.png" /></p>
<p><strong>Consistency</strong>: In the CAP definition, consistency means that all the nodes in
a cluster (e.g. all your database servers, leaders and replicas) see the same
data at any given point in time. In practice, it means that if you query any of
your database servers at the exact same time, you will get the same result back.</p>
<blockquote>
<p>Notice that this is completely unrelated to the ‘Consistency’ from the
<a href="https://en.wikipedia.org/wiki/Consistency_(database_systems)#As_an_ACID_guarantee">ACID</a>
properties.</p>
</blockquote>
<p><strong>Availability</strong>: It means that reads and writes will always succeed, even if we
cannot guarantee that they will return the most recent data. In practice, it means
that we will still be able to use one of our databases, even when it cannot talk
to the others, and therefore might not have received the latest updates.</p>
<p><strong>Partition Tolerance</strong>: This means that your system will continue working even
if there is a network partition. A network partition means that the nodes in
your cluster cannot talk to each other.</p>
<p><img src="/assets/images/replication/network-partition.png" /></p>
<p>And why am I talking about this? Well, because depending on the route you take
you will have different trade-offs, sometimes favoring consistency and sometimes
availability.</p>
<p>How valuable the CAP theorem is in the distributed systems discussions is
debatable, but I think it is useful to keep in mind that you are almost always
trading consistency for availability (and vice-versa) when dealing with network
partitions.</p>
<h4 id="a-word-about-latency">A word about latency</h4>
<p><em>Latency</em> is the time that a request is waiting to be handled (it’s
<em>latent</em>). Our goal is to have the lowest latency possible. Of course, even with
a low latency we can still have a high <em>response time</em> (if a query takes a long
time to run, for example), but that’s a different problem.</p>
<p>When we replicate our database we can decrease the latency by shortening the
distance this request needs to travel and/or by increasing our capacity, so the
request doesn’t need to wait on a busy server before it can be handled.</p>
<p>I’m just mentioning this here because I think it’s very important to be sure
that the reason why we are experiencing high response times is really because
the latency is high, otherwise we may be solving the wrong problem.</p>
<h4 id="asynchronous-replication">Asynchronous replication</h4>
<p>When we talk about replication, we are basically saying that when I write some
data in a given node <code class="language-plaintext highlighter-rouge">A</code>, this same data also needs to be written in node <code class="language-plaintext highlighter-rouge">B</code>
(and maybe <code class="language-plaintext highlighter-rouge">C</code> and <code class="language-plaintext highlighter-rouge">D</code> and <code class="language-plaintext highlighter-rouge">E</code> and…), but we need to decide <em>how</em> this
replication will happen, and what are the guarantees that we need. As always,
it’s all about trade-offs. Let’s explore our options.</p>
<p>The first option is to send a confirmation back to the client as soon
as the node that received the message has successfully written the data, and only
<em>then</em> send this message to the replicas (which may or may not be alive). It works
somewhat like this:</p>
<p><img src="/assets/images/replication/async.png" /></p>
<p>This looks great: we don’t notice any performance impact, as the replication
happens in the background, after we already got a response, and if a
replica is dead or slow we won’t even notice it, as the confirmation was already sent
back to the client. Life is good.</p>
<p>There are (at least) two main issues with asynchronous replication. The first is
that we are weakening our durability guarantees, and the other is that we are
exposed to replication lag. We will talk about replication lag later; let’s
focus on the durability issue first.</p>
<p>Our problem here is that if the node that received this write request fails
before it can replicate this change to the replicas, the data is lost, even
though we sent a confirmation to the client.</p>
<p><img src="/assets/images/replication/async_failure.png" /></p>
<p>You may be asking yourself</p>
<blockquote>
<p>“But what are the chances of a failure happening right at THAT moment?!”</p>
</blockquote>
<p>If that’s the case, I’ll suggest that you instead ask</p>
<blockquote>
<p>“What are the <em>consequences</em> if a failure happens at that moment?”</p>
</blockquote>
<p>Yes, it may be totally fine to take the risk, but in the classic example of
dealing with financial transactions, maybe it’s better to pay the price to have
stronger guarantees. But what is the price?</p>
<h4 id="synchronous-replication">Synchronous replication</h4>
<p>As you might expect, synchronous replication basically means that we will
<em>first</em> replicate the data, and then send a confirmation to the client.</p>
<p><img src="/assets/images/replication/sync.png" /></p>
<p>So when the client gets the confirmation we can be sure that the data is
replicated and safe (well, it’s never 100% safe, all of our data centers can, in
theory, explode at the same time, but it’s safe enough).</p>
<p>The price we need to pay is: Performance and availability.</p>
<p>The performance penalty comes from the fact that we need to <em>wait</em> for these
(potentially slow) replicas to do their thing and send us a confirmation before
we can tell the client that everything is going to be fine. As these replicas
are usually distributed geographically, and potentially very far from each
other, this takes more time than we would like to wait.</p>
<p>The second issue is availability. If one of the replicas (remember, we can have
many!) is down or we cannot reach it for some reason, we simply cannot write
any data. You should always plan for failures, and network partitions are more
common than we imagine, so depending on <em>all</em> replicas being reachable to
perform any write doesn’t seem like a great idea to me (but maybe it is for your
specific case).</p>
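<p>To make the trade-off concrete, here is a toy Go sketch contrasting the two approaches. The <code class="language-plaintext highlighter-rouge">Replica</code> type and its <code class="language-plaintext highlighter-rouge">Apply</code> method are invented for illustration; real systems ship WAL records or change streams, not strings:</p>

```go
package main

import (
	"fmt"
	"sync"
)

// Replica is a stand-in for a follower node; Apply returns an error if the
// write could not be replicated (e.g. the node is unreachable).
type Replica struct{ alive bool }

func (r *Replica) Apply(data string) error {
	if !r.alive {
		return fmt.Errorf("replica unreachable")
	}
	return nil
}

// writeSync confirms only after every replica has applied the write: one
// unreachable replica makes the whole write fail (the availability cost).
func writeSync(replicas []*Replica, data string) error {
	for _, r := range replicas {
		if err := r.Apply(data); err != nil {
			return err
		}
	}
	return nil // safe: all copies exist before the client hears "ok"
}

// writeAsync confirms immediately and replicates in the background: fast,
// but the data is lost if the leader dies before replication finishes.
func writeAsync(replicas []*Replica, data string) error {
	var wg sync.WaitGroup
	for _, r := range replicas {
		wg.Add(1)
		go func(r *Replica) { defer wg.Done(); r.Apply(data) }(r)
	}
	// Not waiting on wg before returning is exactly the durability gap.
	return nil
}

func main() {
	replicas := []*Replica{{alive: true}, {alive: false}}
	fmt.Println(writeSync(replicas, "x"))  // non-nil error: one replica is down
	fmt.Println(writeAsync(replicas, "x")) // nil: confirmed anyway
}
```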
<h4 id="not-8-not-80">Not 8, not 80</h4>
<p>There’s some middle ground. Some databases and replication tools allow us to
define a number of followers to replicate synchronously, and the others just use
the asynchronous approach. This is sometimes called <em>semi-synchronous
replication</em>.</p>
<p>As an example, in <code class="language-plaintext highlighter-rouge">Postgres</code> you can define a configuration called
<a href="https://www.postgresql.org/docs/9.6/static/runtime-config-replication.html"><code class="language-plaintext highlighter-rouge">synchronous_standby_names</code></a>
to specify which replicas will receive the updates synchronously, and the other
replicas will just receive them asynchronously.</p>
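<p>For illustration, such a setup in <code class="language-plaintext highlighter-rouge">postgresql.conf</code> might look like this (the standby names are hypothetical; <code class="language-plaintext highlighter-rouge">FIRST 1</code> means a commit waits for any one of the listed standbys to confirm, while the rest replicate asynchronously):</p>

```
synchronous_standby_names = 'FIRST 1 (replica_eu, replica_us)'
synchronous_commit = on
```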
<h4 id="single-leader-replication">Single leader replication</h4>
<p>The most common replication topology is to have a single leader, which then
replicates the changes to all the followers.</p>
<p>In this setup, the clients always send writes (in the case of databases,
<code class="language-plaintext highlighter-rouge">INSERT</code>, <code class="language-plaintext highlighter-rouge">UPDATE</code> and <code class="language-plaintext highlighter-rouge">DELETE</code> queries) to the leader, and never to a follower.
These followers can, however, answer read queries.</p>
<p><img src="/assets/images/replication/single-leader.png" /></p>
<p>The main benefit of having a single leader is that we avoid conflicts caused by
concurrent writes. All the clients are writing to the same server, so the
coordination is easier. If we instead allow clients to write to 2 different
servers at the same time, we need to somehow resolve the conflict that will
happen if they both try to change the same <em>object</em>, with different values (more
on that later).</p>
<p>So, what are the problems that we need to keep in mind if we decide to go with
the single leader approach? The first one is that we need to make sure that just
one node is able to handle all the writes. Although we can split the read work
across the entire cluster, all the writes are going to a single server, and if
your application is very write-intensive that might be a problem. Keep in mind
though, that most applications read a lot more data than they write, so you need
to analyze if that’s really a problem for you.</p>
<p>Another problem is that you will need to pay the latency price on writes.
Remember our colleagues in Asia? Well, when they want to update some data, that
query will still need to travel the globe before they get a response.</p>
<p>Lastly, although this is not really a problem just for single leader
replication, you need to think about what will happen when the leader node dies.
Is the entire system going to stop working? Will it be available just for reads
(from the replicas), but not for writes? Is there a process to <em>elect</em> a new
leader (i.e. promoting one of the replicas to a leader status)? Is this
election process automated or will it need someone to tell the system who is the
new king in town?</p>
<p>At first glance it seems like the best approach is to just have an automatic
failover strategy, that will elect a new leader and everything will keep working
wonderfully. That, unfortunately, is easier said than done.</p>
<h5 id="the-challenges-of-an-automatic-failover">The challenges of an automatic failover</h5>
<p>Let me list <em>some</em> of the challenges in implementing this automatic failover
strategy.</p>
<p>The first question we need to ask is: How can we be sure that the leader is
dead? And the answer is: We probably can’t.</p>
<p>There are a billion things that can go wrong, and, like in any distributed
system, it is impossible to distinguish a slow-to-answer node from a dead one.
Databases usually use a timeout to decide that (e.g. if I don’t hear from you in
20 seconds you are dead to me!). That is usually good enough, but certainly not
perfect. If you wait longer, it is less likely that you will identify a node as
dead by mistake, but it will also take more time to start your failover process,
and in the meantime your system is probably unusable. On the other hand, if you
don’t give it enough time you might start a failover process that was not
necessary. So that is challenge number one.</p>
<p>Challenge number two: You need to decide who is the new leader. You have all
these followers, living in an anarchy, and they need to somehow agree on who
should be the new leader. For example, one relatively simple (at least
conceptually) approach is to have a predefined successor node that will assume
the leader position when the original leader dies. Or you can choose the node
that has the most recent update (e.g. the one that is closer to the leader), to
minimize data loss. Any way you decide to choose the new leader, all the nodes
still need to <em>agree</em> on that decision, and that’s the hard part. This is known
as a <a href="https://en.wikipedia.org/wiki/Consensus_(computer_science)">consensus
problem</a>, and can be
quite tricky to get right.</p>
<p>Alright, you detected that the leader is really dead and selected a new leader,
now you need to somehow tell the clients to start sending writes to this new
leader, instead of the dead one. This is a <em>request routing</em> problem, and we can
also approach it from several different angles. For example, you can allow
clients to send writes to any node, and have these nodes redirect the request
to the leader. Or you can have a <em>routing layer</em> that receives these messages and
redirects them to the appropriate node.</p>
<p>If you are using asynchronous replication, the new leader might not have all the
data from the previous leader. In that case, if the old leader resurrects (maybe
it was just a network glitch or a server restart) and the new leader received
conflicting updates in the meantime, how do we handle these conflicts?<br />
One common approach is to just discard these conflicts (using a last-write-win
approach), but that can also be dangerous (take this <a href="https://github.com/blog/1261-github-availability-this-week">Github
issue</a> (from 2012)
as an example).</p>
<p>We can also have a funny (well, maybe it’s not that funny when it happens in
production) situation where the previous leader comes back up and thinks it is
still the leader. That is called a <em>split brain</em>, and can lead to a weird
situation.</p>
<p>If both leaders start accepting writes and we are not ready to handle conflicts,
it is possible to lose data.</p>
<p>Some systems have fencing mechanisms that will force one node to shut down if it
detects that there are multiple leaders. This approach is known by the great
name <code class="language-plaintext highlighter-rouge">STONITH</code>, Shoot The Other Node In The Head.</p>
<blockquote>
<p>This is also what happens when there’s a network partition and we end up with
what appears to be two isolated clusters, each one with its own leader, as each
part of this cluster cannot see the other, and therefore thinks they are all
dead.</p>
</blockquote>
<p><img src="/assets/images/replication/split-brain.png" /></p>
<p>As you can see, automatic failovers are not simple. There are a lot of things to
take into consideration, and for that reason sometimes it’s better to have a
human manually perform this procedure. Of course, if your leader database dies
at 7pm and there’s no one on-call, it might not be the best solution to wait
until tomorrow morning, so, as always, trade-offs.</p>
<h4 id="multi-leader-replication">Multi leader replication</h4>
<p>So, we talked a lot about single leader replication, now let’s discuss an
alternative, and also explore its own challenges and try to identify scenarios
where it might make sense to use it.</p>
<p>The main reason to consider a multi leader approach is that it solves some of
the problems that we face when we have just one leader node. Namely, we have
more than one node handling writes, and these writes can be performed by databases
that are closer to the clients.</p>
<p><img src="/assets/images/replication/multi-leader.png" /></p>
<p>If your application needs to handle a very high number of writes, it might make
sense to split that work across multiple leaders. Also, if the latency price
to write in a database that is very far is too high, you could have one leader
in each location (for example, one in North America, one in Europe and another
in Asia).</p>
<p>Another good use case is when you need to support offline clients, that might be
writing to their own (leader) database, and these writes need to be synchronized
with the rest of the databases once this client gets online again.</p>
<p>The main problem you will face with multiple leaders accepting writes is
that you need some way to solve conflicts. For example, let’s say you have a
database constraint to ensure that your users’ emails are unique. If two
clients write to two different leaders that are not yet in sync, both writes
will succeed in their respective leaders, but we will have problems when we try
to replicate that data. Let’s talk a bit more about these conflicts.</p>
<blockquote>
<p>Here we are assuming that these leaders replicate data asynchronously, that’s
why we can have conflicts. You could, in theory, have multi leader synchronous
replication, but that doesn’t really make a lot of sense, as you lose the main
benefit of having leaders accepting writes independently, and might as well use
single leader replication instead. There are some projects, though, like
<a href="https://wiki.postgresql.org/wiki/PgCluster">PgCluster</a>, that implement multi
master synchronous replication, but they are mostly abandoned, and I will not
talk about this type of replication here.</p>
</blockquote>
<h5 id="dealing-with-conflicts">Dealing with conflicts</h5>
<p>The easiest way to handle conflicts is to not have conflicts in the first place.
Not everyone is lucky enough to be able to do that, but let’s see how that could
be achieved.</p>
<p>Let’s use as an example an application to manage the projects in your company.
You can ensure that all the updates in the projects related to the American
office are sent to the leader in North America, and all the European projects
are written to the leader in Europe. This way you can avoid conflicts, as the
writes to the same projects will be sent to the same leader. Also, if we assume
that the clients updating these projects will probably be in their respective
offices (e.g. people in the New York office will update the American projects,
that will be sent to the leader in North America), we can ensure that they are
accessing a database geographically close to them.</p>
<p>Of course, this is a very biased example, and not every application can
“partition” its data in such an easy way, but it’s something to keep in mind.
If that’s not your case, we need another way to make sure we end up in a
consistent state.</p>
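<p>As a rough sketch of this “partitioning” idea (the leader map and project record below are made up for illustration, not taken from any real tool), routing each write to a home-office leader could look like this:</p>

```python
# Hypothetical sketch: avoid multi-leader conflicts by giving every record a
# single "home" leader that receives all of its writes.

LEADERS = {
    "america": "leader-na.example.com",
    "europe": "leader-eu.example.com",
}

def leader_for(project):
    """All writes for a project go to the leader of its home office, so two
    leaders never accept conflicting writes for the same row."""
    return LEADERS[project["office"]]

ny_project = {"id": 1, "office": "america"}
berlin_project = {"id": 2, "office": "europe"}

assert leader_for(ny_project) == "leader-na.example.com"
assert leader_for(berlin_project) == "leader-eu.example.com"
```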
<p>We cannot let each node just apply the writes in the order that they see them,
because a node <code class="language-plaintext highlighter-rouge">A</code> may first receive an update setting <code class="language-plaintext highlighter-rouge">foo=1</code> and then another
update setting <code class="language-plaintext highlighter-rouge">foo=2</code>, while node <code class="language-plaintext highlighter-rouge">B</code> receives these updates in the opposite
order (remember, these messages are going through the network and can arrive out
of order), and if we just blindly apply them we would end up with <code class="language-plaintext highlighter-rouge">foo=2</code> on
node <code class="language-plaintext highlighter-rouge">A</code> and <code class="language-plaintext highlighter-rouge">foo=1</code> on node <code class="language-plaintext highlighter-rouge">B</code>. Not good.</p>
<p><img src="/assets/images/replication/multi-leader-conflict.png" /></p>
<p>One common solution is to attach some sort of timestamp to each write, and then
just apply the write with the highest value. This is called LWW (last write
wins). As we discussed previously, with this approach we may lose data, but
that’s still very widely used.</p>
<blockquote>
<p>Just be aware that physical clocks <a href="http://books.cs.luc.edu/distributedsystems/clocks.html">are not
reliable</a>, and when
using timestamps you will probably need at least some sort of clock
synchronization, like
<a href="https://en.wikipedia.org/wiki/Network_Time_Protocol">NTP</a>.</p>
</blockquote>
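<p>A minimal LWW merge can be sketched like this (the per-key “(timestamp, value)” representation is an assumption made for the example):</p>

```python
# Last-write-wins (LWW) sketch: each write carries a timestamp, and every
# replica keeps, per key, only the write with the highest timestamp. Ties
# fall back to comparing values, just to stay deterministic everywhere.

def lww_merge(local, incoming):
    merged = dict(local)
    for key, stamped in incoming.items():  # stamped is (timestamp, value)
        if key not in merged or stamped > merged[key]:
            merged[key] = stamped
    return merged

# Nodes A and B saw the two writes to "foo" in opposite orders, but both
# converge to the same winner regardless of the merge direction.
node_a = {"foo": (2, "2")}   # applied ts=1, then ts=2
node_b = {"foo": (1, "1")}   # only saw ts=1 so far
assert lww_merge(node_b, node_a) == {"foo": (2, "2")}
assert lww_merge(node_a, node_b) == {"foo": (2, "2")}
```

Note that the losing write is silently discarded, which is exactly the data loss the paragraph above warns about.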
<p>Another solution is to record these conflicts, and then write application code
to allow the user to manually resolve them later. This may not be feasible in some
cases, like in our previous example with the unique constraint for the
email column. In other cases, though, it may be just a matter of showing two
values and letting the user decide which one should be kept and which should be
thrown away.</p>
<p>Lastly, some databases and replication tools allow us to write custom conflict
resolution code. This code can be executed on write or on read time.
For instance, when a conflict is detected a stored procedure can be called with
the conflicting values and it decides what to do with them. This is <em>on-write</em>
conflict resolution. <a href="https://bucardo.org/wiki/Bucardo">Bucardo</a> and
<a href="http://bdr-project.org/docs/stable/conflicts.html">BDR</a> are examples of tools
that use this approach.</p>
<p>Other tools use a different approach, storing all the conflicting writes, and
also returning all of them when a client tries to read that value. The client is
then responsible for deciding what to do with those values, and writing the result back to
the database. <a href="http://couchdb.apache.org/">CouchDB</a>, for example, does that.</p>
<p>There is also a relatively new family of data structures that provide automatic
conflict resolution. They are called <em>Conflict-free replicated data types</em>, or
<em>CRDTs</em>, and to steal the
<a href="https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type">wikipedia</a>
definition:</p>
<blockquote>
<p>CRDT is a data structure which can be replicated across multiple computers in
a network, where the replicas can be updated independently and concurrently
without coordination between the replicas, and where it is always mathematically
possible to resolve inconsistencies which might result.</p>
</blockquote>
<p>Unfortunately there are some limitations on where these data structures can
be used (otherwise our lives would be too easy, right?), and as far as I know
they are still not very widely used for conflict resolution in databases,
although some CRDTs were implemented in <a href="https://gist.github.com/russelldb/f92f44bdfb619e089a4d">Riak</a>.</p>
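<p>To make the idea more concrete, here is a toy G-Counter, one of the simplest CRDTs (a grow-only counter; this is a generic textbook sketch, not Riak’s implementation):</p>

```python
# G-Counter CRDT sketch: each replica increments only its own slot, and a
# merge takes the element-wise maximum, so merges can happen in any order,
# any number of times, and all replicas still converge to the same value.

class GCounter:
    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}

    def increment(self, amount=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other):
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())

a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)  # a now sees both increments
b.merge(a)  # merging again is harmless: it is idempotent and commutative
assert a.value() == b.value() == 5
```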
<h5 id="ddl-replication">DDL replication</h5>
<p>Handling DDLs (changes in the structure of the database, like adding/removing a
column) can also be tricky in a multi leader scenario. It’s, in some sense, also
a conflict issue: we cannot change the database structure while other nodes are
still writing to the old structure, so we usually need to get a global database
lock, wait until all the pending replications take place, and then execute this
DDL. In the meantime, all the writes will either be blocked or fail. Of course,
the specific details will depend on the database or replication tool used, and
some of them will <a href="https://bucardo.org/wiki/Bucardo/FAQ#Can_Bucardo_replicate_DDL.3F">not even
try</a> to
replicate DDLs, so you need to somehow do that manually, and other tools will
replicate <em>some</em> types of DDLs, but not others (for instance, DDLs that need to
rewrite the entire table are forbidden in
<a href="http://bdr-project.org/docs/1.0/ddl-replication-statements.html#DDL-REPLICATION-PROHIBITED-COMMANDS">BDR</a>).</p>
<p>The point is, there is a lot more coordination involved in replicating DDLs when
you have multiple leaders, so that’s also something to keep in mind when
considering this setup.</p>
<h5 id="the-topologies-of-a-multi-leader-setup">The topologies of a multi leader setup</h5>
<p>There are several different kinds of topologies that we can use with multiple
leaders. A topology defines the communication patterns between your nodes,
and different ways to arrange your communication paths have different
characteristics.</p>
<p>If you have only two leaders, there are not a lot of options: Node <code class="language-plaintext highlighter-rouge">A</code> sends
updates to node <code class="language-plaintext highlighter-rouge">B</code>, and node <code class="language-plaintext highlighter-rouge">B</code> sends updates to node <code class="language-plaintext highlighter-rouge">A</code>. Things start to get
more interesting when you have three or more leaders.</p>
<p>A common topology is to have each leader sending its updates to every other leader.</p>
<p><img src="/assets/images/replication/all-to-all.png" /></p>
<p>The main problem here is that messages can arrive out of order. For example, if
a node <code class="language-plaintext highlighter-rouge">A</code> inserts a row, and then node <code class="language-plaintext highlighter-rouge">B</code> updates this row, but node <code class="language-plaintext highlighter-rouge">C</code>
receives the update before the insert, we will have problems.</p>
<p>This is a <em>causality problem</em>: we need to make sure that all the nodes first
process the insert event before processing the update event. There are different
ways to solve this problem (for instance, using <a href="https://en.wikipedia.org/wiki/Logical_clock">logical
clocks</a>), but the point is: You
need to make sure your database or replication tool is actually handling this
issue, or, in case it’s not, be aware that this is a failure that can happen.</p>
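<p>One classic building block here is the Lamport logical clock, which produces timestamps consistent with causality. A minimal sketch (not tied to any particular database):</p>

```python
# Lamport clock sketch: every node keeps a counter, bumps it on each local
# event, attaches it to outgoing messages, and fast-forwards past any
# counter it receives. If event X causally precedes event Y, X gets the
# smaller timestamp, so replicas can sort events in a causally safe order.

class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def on_receive(self, remote_time):
        self.time = max(self.time, remote_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
insert_ts = a.local_event()          # node A inserts the row: ts=1
update_ts = b.on_receive(insert_ts)  # node B sees it, then updates: ts=2
assert insert_ts < update_ts         # the insert always sorts first
```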
<p>Another alternative is to use what some databases call the <em>star topology</em>.</p>
<p><img src="/assets/images/replication/star.png" /></p>
<p>In this case one node receives the updates and sends them to everyone else. With
this topology we can avoid the causality problem but, on the other hand,
introduce a single point of failure. If this central node dies, the replication
will stop. It’s a high price to pay, in some cases.</p>
<p>Of course, these are just two examples, but imagination is the limit for all
the different topologies you can have, and there’s no perfect answer; each one
will have its pros and cons.</p>
<h4 id="and-yes-leaderless-replication">And yes, leaderless replication</h4>
<p>Another idea that was popularized by Amazon’s
<a href="https://aws.amazon.com/dynamodb/">DynamoDB</a> (although it first appeared some
decades ago) is to simply have no leaders, every replica can accept writes
(maybe it should be called leaderful?).</p>
<p>It seems like this is going to be a mess, doesn’t it? If we had lots of
conflicts to handle with a few leaders, imagine what will happen when writes are
taking place everywhere. Chaos!</p>
<p>Well, it turns out these database folks are quite smart, and there are some
clever ways to deal with this chaos.</p>
<p>The basic idea is that clients will send writes not only to one replica, but to
several (or, in some cases, to all of them).</p>
<p><img src="/assets/images/replication/leaderless.png" /></p>
<p>The client sends this write request concurrently to several replicas, and as
soon as it gets a confirmation from some of them (we will talk about how many
are “some” in a bit) it can consider that write a success and move on.</p>
<p>One advantage we have here is that we can tolerate node failures more easily.
Think about what would happen in a scenario where we had to send a write to a
single leader and for some reason that leader didn’t respond. The write would
fail and we would need to start a failover process to elect a new leader that
could start receiving writes again. No leaders, no failover, and if you remember
what we’ve talked about failovers, you can probably see why this can be a big
deal.</p>
<p>But, again, there is no free lunch, so let’s take a look at the price tag here.</p>
<p>What happens if, say, your write succeeds in 2 replicas, but fails in 1 (maybe
that server was being rebooted when you sent the write request)?<br />
You now have 2 replicas with the new value and 1 with the old value. Remember,
these replicas are not talking to each other, there’s no leader handling any
kind of synchronization.</p>
<p><img src="/assets/images/replication/leaderless-stale.png" /></p>
<p>Now if you read from this replica, BOOM, you get stale data.</p>
<p>To deal with this problem, a client will not read data from just one replica,
but will send read requests to several replicas concurrently (like it did for
writes). The replicas then return their values, along with some kind of version
number, that the client can use to decide which value it should use, and which
it should discard.</p>
<p>We still have a problem, though. One of the replicas still has the old value,
and we need to somehow synchronize it with the rest of the replicas (after all,
replication is the process of keeping the <em>same</em> data in several places).</p>
<p>There are usually two ways to do that: We can make the client responsible for
this update, or we can have another process that is responsible just for finding
differences in the data and fixing them.</p>
<p>Making the client fix it is conceptually simple: when the client reads data from
several nodes and detects that one of them is stale, it sends a write request
with the correct value. This is usually called <em>read repair</em>.</p>
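<p>In code, read repair might look roughly like this (replicas are plain dicts holding version/value pairs here; a real client would issue network requests instead):</p>

```python
# Read-repair sketch: read a key from several replicas, keep the value with
# the highest version, and write that value back to any stale replica.

def read_with_repair(replicas, key):
    responses = [(replica.get(key, (0, None)), replica) for replica in replicas]
    latest = max(resp for resp, _ in responses)  # highest (version, value)
    for resp, replica in responses:
        if resp < latest:
            replica[key] = latest                # repair the stale copy
    return latest[1]

r1 = {"foo": (2, "new")}
r2 = {"foo": (2, "new")}
r3 = {"foo": (1, "old")}  # this replica missed the last write
assert read_with_repair([r1, r2, r3], "foo") == "new"
assert r3["foo"] == (2, "new")  # r3 was fixed as a side effect of the read
```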
<p>The other solution, having a background process fixing the data, really depends
on the database implementation, and there are several ways to do that, depending
on how the data is stored. For example, <code class="language-plaintext highlighter-rouge">DynamoDB</code> has an <em>anti-entropy</em> process
using Merkle trees.</p>
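<p>The core trick of a Merkle-based anti-entropy process is comparing hashes instead of data. A toy two-level version (the hashing and bucketing scheme is purely illustrative, not Dynamo’s actual one):</p>

```python
# Toy Merkle-style comparison: replicas first exchange a single root hash,
# and only when the roots differ do they compare per-bucket hashes to find
# which key ranges actually need to be synchronized.
import hashlib

def bucket_hashes(data, buckets=4):
    parts = [[] for _ in range(buckets)]
    for key in sorted(data):
        parts[hash(key) % buckets].append(f"{key}={data[key]}")
    return [hashlib.sha256("|".join(p).encode()).hexdigest() for p in parts]

def root_hash(data):
    return hashlib.sha256("".join(bucket_hashes(data)).encode()).hexdigest()

a = {"k1": "v1", "k2": "v2"}
b = {"k1": "v1", "k2": "stale"}
assert root_hash(a) != root_hash(b)  # one comparison detects the drift
diverged = [i for i, (ha, hb)
            in enumerate(zip(bucket_hashes(a), bucket_hashes(b))) if ha != hb]
assert len(diverged) == 1            # only one bucket needs syncing
```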
<h5 id="quorums">Quorums</h5>
<p>So we said we need to send the write/read requests to “some” replicas. There are
good ways to define how many are enough, and what we are compromising (and
gaining) if we decide to decrease this number.</p>
<p>Let’s first talk about the most obvious problematic scenario, when we require
just one successful response to consider a value written, and also read from
just one replica. From there we can expand the problem to more realistic
scenarios.</p>
<p><img src="/assets/images/replication/leaderless-write-one-replica.png" /></p>
<p>As there is no synchronization between these replicas, we will read stale values
every time we send a read request to a node other than the only one that
succeeded.</p>
<p>Now let’s imagine we have 5 nodes and require a successful write in 2 of them,
and also read from 2. Well, we will have the exact same problem. If we write to
nodes <code class="language-plaintext highlighter-rouge">A</code> and <code class="language-plaintext highlighter-rouge">B</code> and read from nodes <code class="language-plaintext highlighter-rouge">C</code> and <code class="language-plaintext highlighter-rouge">D</code>, we will always get stale
data.</p>
<p>What we need is some way to guarantee that at least one of the nodes that we are
reading from is a node that received the write, and that’s what quorums are.</p>
<p>For example, if we have 5 replicas and require that 3 of them accept the write,
and also read from 3 replicas, we can be sure that <em>at least</em> one of these
replicas that we are reading from accepted the write and therefore has the most
recent data. There’s always an overlap.</p>
<p>Most databases allow us to configure how many replicas need to accept a write (<code class="language-plaintext highlighter-rouge">w</code>)
and how many we want to read from (<code class="language-plaintext highlighter-rouge">r</code>). A good rule of thumb is to always have
<code class="language-plaintext highlighter-rouge">w + r > number of replicas</code>.</p>
<p>Now you can start playing with these numbers. For example, if your application
writes to the database very rarely, but reads very frequently, maybe you can set
<code class="language-plaintext highlighter-rouge">w = number of replicas</code> and <code class="language-plaintext highlighter-rouge">r = 1</code>. What that means is that writes need to be
confirmed by every replica, but you can then read from just one of them as you
are sure every replica has the latest value. Of course, you are then making your
writes slower and less available, as just a single replica failure will prevent
any write from happening, so you need to measure your specific needs and what is
the right balance.</p>
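<p>The quorum rule and the trade-offs above can be captured in a few lines (a generic sketch, not any specific database’s API):</p>

```python
# Quorum overlap check: with n replicas, writes confirmed by w of them and
# reads sent to r of them always share at least one up-to-date replica
# whenever w + r > n.

def has_overlap(n, w, r):
    return w + r > n

assert has_overlap(n=5, w=3, r=3)      # majority quorums: always overlap
assert not has_overlap(n=5, w=2, r=2)  # reads can miss every write
assert has_overlap(n=5, w=5, r=1)      # write-all / read-one configuration
```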
<h4 id="replication-lag">Replication lag</h4>
<p>In a leader-based replication, as we have seen, writes need to be sent to a
leader, but reads can be performed by any replica. When we have applications
that are mostly reading from the database, and writing a lot less often (which
is the most common case), it can be tempting to add many replicas to handle
all these read requests, creating what can be called a <em>read scaling
architecture</em>. Not only that, but we can have many replicas geographically close
to our clients to also improve latency.</p>
<p>The more replicas we have, though, the harder it is to use synchronous
replication, as the probability of one of these many nodes being down when we need
to replicate an update increases, and our availability decreases. The only
feasible solution in this case is to use asynchronous replication, that is, we
can still perform updates even if a node is not responding, and when this
replica is back up it should catch up with the leader.</p>
<p>We’ve already discussed the benefits and challenges in using synchronous and
asynchronous replication, so I’ll not talk about that again, but assuming we are
replicating updates asynchronously, we need to be aware of the problems we can
have with the replication lag, or, in other words, the delay between the time an
update is applied in the leader node and the time it’s applied in a given replica.</p>
<p><img src="/assets/images/replication/replication-lag.png" /></p>
<p>If a client reads from this replica during this period, it will receive outdated
information, because the latest update(s) were not applied yet. In other words,
if you send the same query to 2 different servers, you may get 2 different
answers. As you may remember when we talked about the CAP theorem, this breaks
the <em>consistency</em> guarantee. This is just temporary, though: eventually all the
replicas will get this update, and if you stop writing new data, they will
all end up being identical. This is what we call <em>eventual consistency</em>.</p>
<p>In theory there is no limit for how long it will take for a replica to be
consistent with its leader (the only guarantee we have is that <em>eventually</em> it
will be), but in practice we usually expect this to happen fairly quickly, maybe
in a couple of milliseconds.</p>
<p>Unfortunately, we cannot expect that to always be the case, and we need to plan
for the worst. Maybe the network is slow, or the server is operating near
capacity and is not replicating the updates as fast as we’d expect, and this
replication lag can increase. Maybe it will increase by a couple of seconds,
maybe by minutes. What happens then?</p>
<p>Well, the first step is to understand the guarantees we need to provide. For
example, is it really a problem that, when facing an issue that increases the
lag, it will take 30 seconds for your friend to be able to see that last cat
picture you just posted on Facebook? Probably not.</p>
<p>In a lot of cases this replication lag, and eventual consistency in general,
will not be a problem (after all, the physical world is eventually consistent),
so let’s focus on some cases where this <em>can</em> be an issue, and see some
alternatives to handle them.</p>
<h5 id="read-your-writes-consistency">Read-your-writes consistency</h5>
<p>The most common problem we can have with asynchronous replicas is when a client
sends a write to the leader, and shortly after tries to read that same value
from a replica. If this read happens before the leader had enough time to
replicate the update, it will look like the write didn’t actually work.</p>
<p><img src="/assets/images/replication/read-your-writes.png" /></p>
<p>So, although it might not be a big issue if a client doesn’t see other clients’
updates right away, it’s pretty bad if they don’t see their own writes. This is
what is called <em>read-your-writes consistency</em>: we want to make sure that a
client never reads the database in a state prior to its own writes.</p>
<p>Let’s talk about some techniques that can be used to achieve this type of
consistency.</p>
<p>A simple solution is to actually read from the leader when we are trying to read
something that the user might have changed. For example, if we are implementing
something like Twitter, we can read other people’s timeline from a replica (as
the user will not be able to write/change it), but when viewing their own
timeline, read from the leader, to ensure we don’t miss any update.</p>
<p>Now, if there are lots of things that can be changed by every user, that doesn’t
really work very well, as we would end up sending all the reads to the leader,
defeating the whole purpose of having replicas, so in this case we need a
different strategy.</p>
<p>Another technique that can be used is to track the timestamp of the last write
request and for the next, say, 10 seconds, send all the read requests to the
leader. Then you need to find the right balance here, because if there is a new
write every 9 seconds you will also end up sending all of your reads to the
leader. Also, you will probably want to monitor the replication lag to make sure
replicas that fall more than 10 seconds behind stop receiving requests until
they catch up.</p>
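<p>A sketch of this timestamp-based routing (the 10-second window and all names here are illustrative assumptions):</p>

```python
# Read-your-writes routing sketch: after a client writes, all of its reads
# go to the leader for a fixed window, giving the replicas time to catch up.
import time

STICKY_WINDOW_SECONDS = 10.0  # assumption: tune to your actual replication lag

class ReadRouter:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.last_write_at = float("-inf")

    def record_write(self):
        self.last_write_at = self.clock()

    def target(self):
        if self.clock() - self.last_write_at < STICKY_WINDOW_SECONDS:
            return "leader"   # a replica might still be missing our write
        return "replica"

fake_now = [100.0]
router = ReadRouter(clock=lambda: fake_now[0])
assert router.target() == "replica"  # no writes yet: any replica is fine
router.record_write()
assert router.target() == "leader"   # inside the window: stick to the leader
fake_now[0] += 11.0
assert router.target() == "replica"  # window expired: back to replicas
```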
<p>Then there are also more sophisticated ways to handle this, that require more
collaboration from your database. For example, <a href="http://www.oracle.com/technetwork/database/database-technologies/berkeleydb/overview/index.html">Berkeley DB</a>
will generate a <em>commit token</em> when you write something to the leader. The
client can then send this token to the replica it’s trying to read from, and
with this token the replica knows if it’s current enough (i.e. if it has already
applied that commit). If so, it can just serve that read without any problem,
otherwise it can either block until it receives that update, and then answer the
request, or it can just reject it, and the client can try another replica.</p>
<p>As always, there are no right answers here, and I am sure there are lots of other
techniques that can be used to work around this problem, but you need to know
how your system will behave when facing a large replication lag and if
<em>read-your-writes</em> is a guarantee you really need to provide, as there are
databases and replication tools that will simply ignore this issue.</p>
<h5 id="monotonic-reads-consistency">Monotonic Reads consistency</h5>
<p>This is a fancy name to say that we don’t want clients to see time moving
backwards: If I read from a replica that has already applied commits 1, 2 and 3,
I don’t want my next read to go to a replica that only has commits 1 and 2.</p>
<p>Imagine, for example, that I’m reading the comments of a blog post. When I
refresh the page to check if there’s any new comment, what actually happens is
that the last comment disappears, as if it was deleted. Then I refresh again,
and it’s back there. Very confusing.</p>
<p><img src="/assets/images/replication/monotonic-reads.png" /></p>
<p>Although you can still see stale data, what monotonic reads guarantee is that if
you make several reads of a given value, each successive read will be at
least as recent as the previous one. Time never moves backwards.</p>
<p>The simplest way to achieve monotonic reads is to make each client send their
read requests to the same replica. Different clients can still read from
different replicas, but having a given client always (or at least for the
duration of a session) connected to the same replica will ensure it never reads
data from the past.</p>
<p>Another alternative is to have something similar to the commit token that we
talked about in the <em>read-your-writes</em> discussion. Every time that a client
reads from a replica it receives its latest commit token, that is then sent in
the next read, that can go to another replica. This replica can then check this
commit token to know if it’s eligible to answer that query (i.e. if its own
commit token is “greater” than the one received). If that’s not the case, it can
wait until more data is replicated before responding, or it can return an error.</p>
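<p>The commit-token check can be sketched as follows (the replica and token representation is an assumption for the example):</p>

```python
# Monotonic-reads sketch: each replica advertises the highest commit it has
# applied, and the client rejects any replica that is behind the token it
# got from its previous read, so time never appears to move backwards.

def read(replica, client_token):
    """Return (value, new_token), or None if the replica is too far behind."""
    if replica["commit"] < client_token:
        return None  # answering would move the client back in time
    return replica["value"], replica["commit"]

fresh = {"commit": 3, "value": "comments v3"}
stale = {"commit": 2, "value": "comments v2"}

value, token = read(fresh, client_token=0)  # first read: any replica works
assert (value, token) == ("comments v3", 3)
assert read(stale, client_token=token) is None  # stale replica is rejected
assert read(fresh, client_token=token) == ("comments v3", 3)
```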
<h5 id="bounded-staleness-consistency">Bounded Staleness consistency</h5>
<p>This consistency guarantee means, as the name indicates, that there should be a
limit on how stale the data we are reading is. For example, we may want to
guarantee that clients will not read data that is more than 3 minutes old.
Alternatively, this staleness can be defined in terms of number of missing
updates, or anything that is meaningful to the application.</p>
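<p>In its simplest form, the check a replica would run is just a comparison against the agreed bound (the 3-minute limit below mirrors the example above):</p>

```python
# Bounded-staleness sketch: a replica serves a read only if its last applied
# update is no older than the agreed bound (here, 3 minutes, in seconds).

MAX_STALENESS_SECONDS = 3 * 60

def can_serve(last_applied_at, now):
    return now - last_applied_at <= MAX_STALENESS_SECONDS

assert can_serve(last_applied_at=1000, now=1100)      # 100s behind: fine
assert not can_serve(last_applied_at=1000, now=1300)  # 300s behind: reject
```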
<h4 id="delayed-replicas">Delayed replicas</h4>
<p>We talked about replication lags, some of the problems that we can have when
this lag increases too much, and how to deal with these problems, but sometimes
we may actually <em>want</em> this lag. In other words, we want a <em>delayed replica</em>.</p>
<p>We will not really read (or write) from this replica, it will just sit there,
lagging behind the leader, maybe by a couple of hours, while no one is using it.
So, why would anyone want that?</p>
<p>Well, imagine that you release a new version of your application, and a bug
introduced in this release starts deleting all the records from your <code class="language-plaintext highlighter-rouge">orders</code>
table. You notice the issue and rollback this release, but the deleted data is
gone. Your replicas are not very useful at this point, as all these deletions
were already replicated and you have the same messy database replicated. You
could start to restore a backup, but if you have a big database you probably
won’t have a backup running every couple of minutes, and the process to restore
a database can take a lot of time.</p>
<p>That’s where a delayed replica can save the day. Let’s say you have a replica
that is always 1 hour behind the leader. As long as you noticed the issue in
less than 1 hour (as you probably will when your orders evaporate) you can just
start using this replica and, although you will still probably lose some data,
the damage could be a lot worse.</p>
<p>A replica will almost never replace a proper backup, but in some cases having a
delayed replica can be extremely helpful (as the developer that shipped that bug
can confirm).</p>
<h4 id="replication-under-the-hood">Replication under the hood</h4>
<p>We talked about several different replication setups, consistency guarantees,
benefits and disadvantages of each approach. Now let’s go one level below, and
see how one node can actually send its data to another, after all, replication
is all about copying bytes from one place to another, right?</p>
<h5 id="statement-based-replication">Statement-based replication</h5>
<p>Statement-based replication basically means that one node will send the same
statements it received to its replicas. For example, if you send an <code class="language-plaintext highlighter-rouge">UPDATE foo
= bar</code> statement to the leader, it will execute this update and send the same
instruction to its replicas, that will also execute the update, hopefully
getting to the same result.</p>
<p>Although this is a very simple solution, there are some things to be considered
here. The main problem is that not every statement is deterministic, meaning
that each time you execute them, you can get a different result. Think about
functions like <code class="language-plaintext highlighter-rouge">CURRENT_TIME()</code> or <code class="language-plaintext highlighter-rouge">RANDOM()</code>, if you simply execute these
functions twice in a row, you will get different results, so just letting each
replica re-execute them would lead to inconsistent data.</p>
<p>Most databases and replication tools that use statement-based replication (e.g.
<code class="language-plaintext highlighter-rouge">MySQL</code> before 5.1) will try to replace these nondeterministic function calls
with fixed values to avoid these problems, but it’s hard to account for every
case. For example, a user-defined function can be used, or a trigger can be
called after an update, and it’s hard to guarantee determinism in these cases.
<a href="https://www.voltdb.com/"><code class="language-plaintext highlighter-rouge">VoltDB</code></a>, for instance, uses statement-based replication but
<a href="https://docs.voltdb.com/UsingVoltDB/DesignProc.php#DesignProcDeterminism">requires stored procedures to be
deterministic</a>.</p>
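<p>The fixed-value rewrite described above can be sketched with a toy regex pass (a real database does this inside its parser, not with regexes):</p>

```python
# Sketch of keeping statement-based replication deterministic: the leader
# evaluates calls like CURRENT_TIME() or RANDOM() once and ships a literal,
# so every replica applies exactly the same value.
import random
import re
import time

def make_deterministic(statement):
    statement = re.sub(r"CURRENT_TIME\(\)", repr(time.time()), statement)
    statement = re.sub(r"RANDOM\(\)", repr(random.random()), statement)
    return statement

shipped = make_deterministic("UPDATE t SET token = RANDOM()")
assert "RANDOM()" not in shipped  # the call became a fixed literal
assert shipped.startswith("UPDATE t SET token = ")
```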
<p>Another important requirement is that we need to make sure that all transactions
either commit or abort on every replica, so we don’t have a change being applied
in some replicas and not in others.</p>
<h5 id="log-shipping-replication">Log Shipping replication</h5>
<p>Most databases use a <a href="https://en.wikipedia.org/wiki/Write-ahead_logging">log</a>
(an append-only data structure) to provide durability and atomicity (from the
<em>ACID</em> properties). Every change is first written to this log before being
applied, so the database can recover in case of crashes during a write
operation.</p>
<p>The log describes changes to the database at a very low level, describing, for
example, which bytes were changed and where exactly on the disk. It’s not meant
to be read by humans, but machines can interpret it pretty efficiently.</p>
<p>The idea of log shipping replication is to transfer these log files to the
replicas, that can then apply them to get the exact same result.</p>
<p>The main limitation that we have when shipping these logs is that, as it
describes the changes at such a low level, we probably won’t be able to replicate
a log generated by a different version of the database, for example, as the way
the data is physically stored may have changed.</p>
<p>Another issue is that we cannot use multi-master replication with log shipping,
as there’s no way to unify multiple logs into one, and if data is changing in
multiple locations at the same time, that would be necessary.</p>
<p>This technique is used by <code class="language-plaintext highlighter-rouge">Postgres</code>’ <a href="https://www.postgresql.org/docs/current/static/warm-standby.html">streaming
replication</a>
and also to provide <a href="https://www.postgresql.org/docs/9.6/static/continuous-archiving.html">incremental backups and Point-in-Time
Recovery</a>.</p>
<p>This is also known as <em>physical replication</em>.</p>
<h5 id="row-based-replication">Row-based replication</h5>
<p>Row-based, or <em>logical</em> replication, is kind of a mix of these two techniques.
Instead of shipping the internal log (<code class="language-plaintext highlighter-rouge">WAL</code>), it uses a different log just for
replication. This log can then be decoupled from the storage engine and
therefore can be used, in most cases, to replicate data across different database
versions.</p>
<p>This row-based log will include enough information to uniquely identify a row,
and a set of changes that need to be performed.</p>
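<p>A replica applying such a log could be sketched like this (the entry format is invented for illustration; real tools use their own wire formats):</p>

```python
# Row-based (logical) replication sketch: each log entry identifies a row by
# primary key and lists the column changes, and the replica applies them
# mechanically, without re-running the original SQL.

def apply_log(table, entries):
    for entry in entries:
        if entry["op"] == "insert":
            table[entry["pk"]] = dict(entry["columns"])
        elif entry["op"] == "update":
            table[entry["pk"]].update(entry["columns"])
        elif entry["op"] == "delete":
            del table[entry["pk"]]

replica = {}
apply_log(replica, [
    {"op": "insert", "pk": 1, "columns": {"foo": "old"}},
    {"op": "update", "pk": 1, "columns": {"foo": "bar"}},
])
assert replica == {1: {"foo": "bar"}}
```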
<p>A benefit of using a row-based approach is that we can, for example, upgrade the
database version with zero downtime. We can take one node down to upgrade it,
and in the meantime the other replicas handle all the requests, and after it’s
back up, with the new version, we do the same thing for the other nodes.</p>
<p>The main disadvantage here when compared to a statement-based replication is
that sometimes we need to log a lot more data. For example, if we want to
replicate <code class="language-plaintext highlighter-rouge">UPDATE foo = bar</code>, and this update changes 100 rows, with the
statement-based replication we would log just this simple <code class="language-plaintext highlighter-rouge">SQL</code>, while we would
need to log all the 100 rows when using the row-based technique. In the same
way, if you use a user-defined function that generates a lot of data, all that
data needs to be logged, instead of just the function call.</p>
<p><code class="language-plaintext highlighter-rouge">MySQL</code> for example, allows us to define a <a href="https://dev.mysql.com/doc/refman/8.0/en/binary-log-mixed.html"><code class="language-plaintext highlighter-rouge">MIXED</code> logging
format</a>, that
will switch between statement-based and row-based replication, trying to use the best
technique for each case.</p>
<h4 id="diving-deeper">Diving deeper</h4>
<p>I hope this introduction to the different ideas and concepts behind database
replication made you curious to learn more. If that’s the case, here’s the list
of resources that I used (and am still using) in my own studies and can
recommend:</p>
<ul>
<li>
<p><strong><a href="https://amzn.to/3loKhuU">(Book) Designing Data-Intensive Applications</a></strong><br />
This is one of the best books I’ve ever read. It covers lots of different topics
related to distributed systems, and in the 5th Chapter the author focuses on
replication. Although it’s not specific to databases, most of the topics
covered are applicable.</p>
</li>
<li>
<p><strong><a href="http://book.mixu.net/distsys/single-page.html">(Book) Distributed Systems For Fun and Profit</a></strong><br />
This is another book that is not focused only on databases but on distributed
systems in general; chapters 4 and 5 are dedicated to replication.
It explains in a straightforward (and fun) way a lot of important topics that
were not covered in depth here.</p>
</li>
<li>
<p><strong><a href="https://amzn.to/3IdG07f">(Book) PostgreSQL Replication</a></strong><br />
This book is, of course, focused on <code class="language-plaintext highlighter-rouge">PostgreSQL</code>, but it also presents some
concepts that are applicable to other RDBMS. Also, it’s a good way to see how
these ideas can be applied in practice. It explains how to set up
synchronous/asynchronous replication and WAL shipping, and how to use tools built on a
variety of techniques (e.g. <code class="language-plaintext highlighter-rouge">Bucardo</code>, which uses triggers, and <code class="language-plaintext highlighter-rouge">BDR</code>, which uses the row-based
replication that we discussed here).</p>
</li>
<li>
<p><strong><a href="https://infoscience.epfl.ch/record/52326/files/IC_TECH_REPORT_199935.pdf">(Paper) Understanding Replication in Databases and Distributed
Systems</a></strong><br />
This paper compares the replication techniques discussed in the distributed
systems and databases literatures. It first describes an abstract model and then
examines how that applies to synchronous/asynchronous replication (which the
paper calls <em>lazy</em>/<em>eager</em> replication), and to a
single-leader/leaderless setup (which it calls <em>active/passive</em> for distributed
systems and <em>primary-copy/update-everywhere</em> for databases).</p>
</li>
<li>
<p><strong><a href="http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf">(Paper) Dynamo: Amazon’s Highly Available Key-value Store</a></strong><br />
Dynamo is the key value data storage created by Amazon that popularized the idea
of leaderless databases. This is the paper that explains (at least
superficially) how it works. It’s interesting to see how they handle conflicts
(letting the clients resolve them) and also use some techniques not explored in
this post, like sloppy quorums (as opposed to strict quorums) and hinted
handoff to achieve more availability. The paper is 10 years old already and I’m
sure a few things changed, but it’s an interesting read anyway.</p>
</li>
<li>
<p><strong><a href="https://arxiv.org/pdf/1509.05393.pdf">(Paper) A Critique of the CAP Theorem</a></strong><br />
This paper explains why the CAP theorem is not always the best tool to use when
reasoning about distributed systems. The author explains the problems he sees
with the CAP definitions (or lack of definitions, in some cases). For example,
<em>consistency</em> in CAP means a very specific kind of consistency
(<em>linearizability</em>), but there’s a whole spectrum of different consistency
guarantees that are ignored.</p>
</li>
<li>
<p><strong><a href="https://www.microsoft.com/en-us/research/publication/replicated-data-consistency-explained-through-baseball/">(Paper) Replicated Data Consistency Explained Through Baseball</a></strong><br />
This paper describes six consistency guarantees and tries to define how each
participant (scorekeeper, umpire, etc.) in a baseball game would require a
different guarantee. What I think is interesting in this paper is that it
shows that every consistency model can be useful depending on the situation:
sometimes it’s fine to live with eventual consistency, and sometimes you need
stronger guarantees (e.g. read-your-writes, described above).</p>
</li>
<li>
<p><strong><a href="https://www.voltdb.com/wp-content/uploads/2017/03/lv-technical-note-how-voltdb-does-transactions.pdf">(Paper) How VoltDB does Transactions</a></strong><br />
There is a section in this paper that explains specifically how <code class="language-plaintext highlighter-rouge">VoltDB</code> handles
replication. It’s interesting to see a relatively new database using a kind of
statement-based replication, and how they enforce determinism to avoid data
inconsistencies.</p>
</li>
<li>
<p><strong><a href="http://domino.research.ibm.com/library/cyberdig.nsf/papers/A776EC17FC2FCE73852579F100578964/$File/RJ2571.pdf">(Research report) Notes on Distributed Databases</a></strong><br />
This research report from 1979 (!) covers a lot of different topics, but the first
chapter is dedicated to replication. It’s amazing to see how little the problems
that we face have changed, as the network constraints are still the same.</p>
</li>
<li>
<p><strong><a href="http://www.allthingsdistributed.com/2008/12/eventually_consistent.html">(Article) Eventually Consistent</a></strong><br />
This article, by Amazon’s CTO Werner Vogels, is a very good introduction to
what it means to be eventually consistent. He talks about the trade-offs that
need to be made in order to achieve high availability in large scale systems
(like the ones Amazon operates).</p>
</li>
<li>
<p><strong><a href="http://books.cs.luc.edu/distributedsystems/clocks.html">(Article) Clocks and Synchronization</a></strong><br />
This article explains why physical clocks are not reliable. The summary is: When
building a distributed system, do not take it for granted that you can simply
trust the clock on the machine that is executing your code.</p>
</li>
<li>
<p><strong><a href="https://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed">(Article) CAP Twelve Years Later: How the “Rules” Have Changed</a></strong><br />
This is an article written by Eric Brewer, the guy that first presented the CAP
theorem in 2000, so it’s interesting to see what the author has to say 12 years
later. Although the theorem can be useful to make us think about the trade-offs
in a particular design decision, it can also be misleading in some cases.</p>
</li>
<li>
<p><strong><a href="https://docs.oracle.com/cd/E17276_01/html/gsg_db_rep/C/rywc.html">(Documentation) Berkeley DB Read-Your-Writes Consistency</a></strong><br />
This section of the documentation explains how <code class="language-plaintext highlighter-rouge">Berkeley</code> achieves
Read-Your-Writes consistency, using the commit token technique that we discussed
here.</p>
</li>
<li>
<p><strong><a href="https://docs.voltdb.com/UsingVoltDB/ChapReplication.php">(Documentation) VoltDB Database Replication</a></strong><br />
This section of the documentation is dedicated to explaining how VoltDB handles
replication. It explains the two options available: single- and multi-master
(which they call one-way and two-way replication, respectively).</p>
</li>
</ul>
<p><a class="gumroad-button" href="https://gum.co/replication" target="_blank">Get PDF</a></p>
Exponential Backoff in RabbitMQ2016-05-20T00:00:00+00:00www.brianstorti.com/rabbitmq-exponential-backoff<p><code class="language-plaintext highlighter-rouge">RabbitMQ</code> is a core piece of our event-driven architecture at <a href="http://engineering.alphasights.com">AlphaSights</a>. It makes our services decoupled from each other and extremely easy for a new application to
start consuming the events it needs.</p>
<p>Sometimes, though, things go wrong and consumers can’t process a message. Usually there are two reasons for that: Either we introduced a bug that is making our worker fail or
this worker depends on another service that is not available at the moment.</p>
<p><img src="/assets/images/reject.png" /></p>
<p>There are normally three ways to handle failures in <code class="language-plaintext highlighter-rouge">RabbitMQ</code>: Discarding the message, requeuing it, or sending it to a <a href="https://www.rabbitmq.com/dlx.html">dead-letter exchange</a>.<br />
Assuming all the messages we receive are important, we can’t just discard them, so we have two options left.</p>
<h4 id="the-problem-in-requeuing">The problem in requeuing</h4>
<p>This was our initial approach: every time a message fails, we just requeue it so we can try to process it again.<br />
Although this can be a valid solution for simple scenarios, in our case it caused more problems than it solved.</p>
<p><img src="/assets/images/requeue.png" /></p>
<p>In cases where our worker is broken, just trying to process the same message again won’t help; it will keep failing over and over again
(and creating a lot of noise in our monitoring tools). The worst problem, though, is when we <code class="language-plaintext highlighter-rouge">DDoS</code> another service. If this service is unavailable due to a period of
high load, sending it thousands of requests is not a very good idea.</p>
<h4 id="the-problem-in-dead-lettering">The problem in dead-lettering</h4>
<p>The second approach was to just send the failed messages to a dead-letter exchange, which would route them to a resting queue that we handle manually. After the problem
is identified, we can <code class="language-plaintext highlighter-rouge">shovel</code> the message back to the working queue to be processed, or just reject it if it doesn’t make sense to consume it anymore. That way we never
<code class="language-plaintext highlighter-rouge">DDoS</code> other services.</p>
<p><img src="/assets/images/dlx.png" /></p>
<p>The problem now is that we have a manual step. A lot of times the failure is caused just by an intermittent issue, like a timeout, that would be solved if the message was processed again in a few seconds (or minutes).</p>
<h4 id="enters-the-exponential-backoff-strategy">Enter the exponential backoff strategy</h4>
<p>Given these issues we were facing, having an exponential backoff strategy was the logical solution. We would not <code class="language-plaintext highlighter-rouge">DDoS</code> other services, and for intermittent failures, the messages would be automatically retried,
avoiding the need for a manual intervention when this is not necessary. Implementing this strategy, though, was not as straightforward as we thought.</p>
<p>We started by looking at what <a href="http://dev.venntro.com/2014/07/back-off-and-retry-with-rabbitmq/">other</a> <a href="https://felipeelias.github.io/rabbitmq/2016/02/22/rabbitmq-exponential-backoff.html">people</a> were doing.
The common approach seems to be to use a <code class="language-plaintext highlighter-rouge">retry</code> exchange with a <a href="https://www.rabbitmq.com/ttl.html">per-message ttl</a>. It works somewhat like this:</p>
<p><img src="/assets/images/ttl.png" /></p>
<p>Once you understand how <code class="language-plaintext highlighter-rouge">RabbitMQ</code> handles time to live (<code class="language-plaintext highlighter-rouge">TTL</code>) and dead-letter exchanges, the implementation is straightforward:</p>
<ul>
<li>
<p>We have two exchanges: The working and the retry exchange;</p>
</li>
<li>
<p>The working exchange is defined as the dlx for the retry exchange;</p>
</li>
<li>
<p>Based on how many times the message fails, we calculate the <code class="language-plaintext highlighter-rouge">TTL</code> for this message. For example, the first time the message fails we publish it with a <code class="language-plaintext highlighter-rouge">TTL</code> of 1000ms; if it fails again
we publish it with a <code class="language-plaintext highlighter-rouge">TTL</code> of 2000ms, and so on;</p>
</li>
<li>
<p>Given that the working exchange is the dlx for the retry exchange, when a message reaches its time to live and is automatically rejected, it goes to the working exchange and is consumed again.</p>
</li>
</ul>
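<p>The TTL progression above can be sketched as a small function. The doubling schedule and the 1000ms base come from the example; the cap is an assumption, and the actual handler may compute this differently:</p>

```python
# Sketch of the doubling TTL schedule described above: 1000ms on the first
# failure, 2000ms on the second, and so on. The max_ms cap is an assumption.
def backoff_ttl(failures, base_ms=1000, max_ms=600_000):
    """TTL in milliseconds for a message that has failed `failures` times."""
    return min(base_ms * 2 ** (failures - 1), max_ms)

print([backoff_ttl(n) for n in range(1, 5)])  # [1000, 2000, 4000, 8000]
```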
<h4 id="the-problem">The problem</h4>
<p>As you probably guessed, there’s also a problem with this common approach, and it has to do with the way <code class="language-plaintext highlighter-rouge">RabbitMQ</code> handles expired messages. From the documentation:</p>
<blockquote>
<p>While consumers never see expired messages, only when expired messages reach
the head of a queue will they actually be discarded (or dead-lettered). When
setting a per-queue TTL this is not a problem, since expired messages are
always at the head of the queue. When setting per-message TTL however,
expired messages can queue up behind non-expired ones until the latter are
consumed or expired.</p>
</blockquote>
<p>What this means is that a message will only be dead-lettered when it reaches the head of the queue, so if we have one message with a <code class="language-plaintext highlighter-rouge">TTL</code> of 5 minutes and another message with a <code class="language-plaintext highlighter-rouge">TTL</code> of 1 second, the first
message will block the rest of the queue, and the second message will only be dead-lettered (and consumed again) after the first message expires.</p>
<p><img src="/assets/images/blocked_messages.png" /></p>
<p>The reason is that <code class="language-plaintext highlighter-rouge">RabbitMQ</code> queues are always “first-in first-out”, so the time to live will just tell <code class="language-plaintext highlighter-rouge">RabbitMQ</code> if it should send this message to a consumer or if it can safely reject it. As
the retry queue doesn’t have any consumer, the message just hangs there until it can be safely rejected.</p>
<p>That makes this solution impractical, as messages with a high <code class="language-plaintext highlighter-rouge">TTL</code> would block messages that should be retried shortly after they failed. So the quest continues.</p>
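<p>This head-of-queue behavior is easy to see in a toy simulation (plain Python, not RabbitMQ code), where expiry is only ever checked at the head of the queue:</p>

```python
# Toy FIFO where, like RabbitMQ's per-message TTL, expiry is only applied
# to the message at the head of the queue. Timestamps are in seconds.
def drain_expired(queue, now):
    """Pop expired messages, but only while they sit at the head."""
    expired = []
    while queue and queue[0][1] <= now:  # entries are (name, expiry_time)
        expired.append(queue.pop(0)[0])
    return expired

queue = [("slow", 300), ("fast", 1)]  # 5-minute message ahead of a 1-second one
print(drain_expired(queue, now=2))    # [] -- "fast" is due, but blocked
print(drain_expired(queue, now=300))  # ['slow', 'fast'] -- both leave at once
```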
<h4 id="and-finally-our-solution">And, finally, our solution</h4>
<p>To solve this problem we came up with a solution that is very similar, but tackles this issue by dynamically creating new queues for each <code class="language-plaintext highlighter-rouge">TTL</code> value we have.</p>
<p><img src="/assets/images/final.png" /></p>
<p>The main difference here is the creation of new queues for each <code class="language-plaintext highlighter-rouge">TTL</code> we have. This solves the problem of blocked messages because now every message that goes to, say, <code class="language-plaintext highlighter-rouge">queue.5000</code> has a <code class="language-plaintext highlighter-rouge">TTL</code> of
5000ms, so the first message in the queue will always be the next one to expire. The rest works the same way: when a message in any of these queues expires, it is dead-lettered and consumed again.</p>
<p>To avoid keeping a bunch of empty queues after all messages are consumed, we also define these dynamically created queues with an <code class="language-plaintext highlighter-rouge">x-expires</code> argument, meaning that after the last message is removed from this queue, the queue
itself is also deleted.</p>
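<p>As a sketch, deriving one of these per-<code>TTL</code> queues could look like this. The <code>x-message-ttl</code>, <code>x-dead-letter-exchange</code> and <code>x-expires</code> keys are RabbitMQ’s standard queue arguments; the exchange name and the 2× expiry margin are assumptions for the example:</p>

```python
# Hypothetical helper building the name and declare-arguments for one of the
# dynamically created retry queues (e.g. "queue.5000"). The x-* keys are
# RabbitMQ's standard queue arguments; everything else here is assumed.
def retry_queue(base, ttl_ms, work_exchange="work"):
    return f"{base}.{ttl_ms}", {
        "x-message-ttl": ttl_ms,                  # every message here shares this TTL
        "x-dead-letter-exchange": work_exchange,  # expired messages go back to work
        "x-expires": ttl_ms * 2,                  # drop the queue once it sits unused
    }

name, args = retry_queue("queue", 5000)
print(name, args["x-message-ttl"])  # queue.5000 5000
```

<p>These arguments would be passed when declaring the queue in whatever client library is in use.</p>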
<h4 id="show-me-the-code">Show me the code</h4>
<p>If you are disappointed that all you saw until now is a bunch of diagrams, <a href="https://github.com/alphasights/sneakers_handlers/blob/5dd21c27b6643a581ad9fd9da69850c3290872cd/lib/sneakers_handlers/exponential_backoff_handler.rb">here</a>
is the code we are using. It’s a <a href="https://github.com/jondot/sneakers"><code class="language-plaintext highlighter-rouge">Sneakers</code></a> handler that does its magic every time a message fails.<br />
The implementation, though, is just a detail, once you understand the architecture it should be simple to apply this to any other language.</p>
<h4 id="conclusion">Conclusion</h4>
<p>We try to make our <a href="http://martinfowler.com/articles/microservices.html#SmartEndpointsAndDumbPipes">pipes as dumb as possible</a>, and at first we had our doubts if this was adding too much complexity.
In the end, though, it’s all about finding the right balance. This solution does make the messaging system a little bit less dumb, but <em>just enough</em> to make our systems more reliable (and our lives easier),
which is what really matters.</p>
Speaking Rabbit: A look into AMQP's frame structure2016-04-20T00:00:00+00:00www.brianstorti.com/speaking-rabbit-amqps-frame-structure<p><code class="language-plaintext highlighter-rouge">RabbitMQ</code> supports several different messaging protocols, but there is no doubt that <code class="language-plaintext highlighter-rouge">AMQP</code> (0-9-1) is the one most commonly used (and what <code class="language-plaintext highlighter-rouge">RabbitMQ</code>
was originally developed for).<br />
It’s <code class="language-plaintext highlighter-rouge">AMQP</code> that defines how exchanges, queues, bindings and most of the things that you, as an application developer, usually have to work with.</p>
<p><code class="language-plaintext highlighter-rouge">AMQP</code> is conceptually divided in two layers, the functional and the transport. Here I want to talk about an important part of the transport layer: Framing.</p>
<p>You will not normally need to deal directly with <code class="language-plaintext highlighter-rouge">RabbitMQ</code>’s frames, unless you are
building a client library, but that’s the foundation of every kind of communication that happens between your application and the broker, so getting a bit more familiar
with how things work under the hood doesn’t hurt. Also, next time you see an <code class="language-plaintext highlighter-rouge">unexpected_frame</code> error in your logs you will have a clue of what is going on.</p>
<h4 id="the-anatomy-of-a-frame">The anatomy of a frame</h4>
<p>A frame is <code class="language-plaintext highlighter-rouge">AMQP</code>’s basic unit. Frames are the chunks of data used to send information from <code class="language-plaintext highlighter-rouge">RabbitMQ</code> to your application and vice-versa. Let’s first take a look
at what a frame looks like and at the different types of frames that can be used.</p>
<p>Every frame will have the same basic structure:</p>
<p><img src="/assets/images/frame.png" /></p>
<p>These are the five parts of a frame, the first three being its <code class="language-plaintext highlighter-rouge">header</code>, followed by a payload and an end-byte marker that determines the end of the frame.</p>
<p>The <code class="language-plaintext highlighter-rouge">header</code> defines the type of frame (one of the 5 listed below), the channel this frame belongs to, and its size, in bytes.
The payload varies according to the frame type, so each type of frame will have a different payload format.</p>
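<p>The layout can be sketched in a few lines of Python. The 7-byte header (1-byte type, 2-byte channel, 4-byte size) and the <code>0xCE</code> frame-end octet come from the AMQP 0-9-1 specification; the heartbeat frame built here is fabricated for the example:</p>

```python
import struct

FRAME_END = 0xCE  # frame-end octet defined by the AMQP 0-9-1 spec

def parse_frame(data):
    """Split one frame into (type, channel, payload), checking the end marker."""
    ftype, channel, size = struct.unpack_from(">BHI", data, 0)  # 7-byte header
    payload = data[7:7 + size]
    if data[7 + size] != FRAME_END:
        raise ValueError("malformed frame")
    return ftype, channel, payload

# A fabricated heartbeat frame: type 8, channel 0, empty payload.
frame = struct.pack(">BHI", 8, 0, 0) + bytes([FRAME_END])
print(parse_frame(frame))  # (8, 0, b'')
```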
<h4 id="the-frame-types">The frame types</h4>
<p>There are 5 types of frames defined in the <code class="language-plaintext highlighter-rouge">AMQP</code> specification, they are:</p>
<ul>
<li>
<p><strong>Protocol header</strong>: This is the frame sent to establish a new connection between the broker (<code class="language-plaintext highlighter-rouge">RabbitMQ</code>) and a client. It will not be used
again after the connection is established.</p>
</li>
<li>
<p><strong>Method frame</strong>: Carries an RPC request or response. <code class="language-plaintext highlighter-rouge">AMQP</code> uses a remote procedure call (RPC) pattern for nearly all kinds of communication between
the broker and the client. For example, when we are publishing a message, our application calls <code class="language-plaintext highlighter-rouge">Basic.Publish</code>, and this message is carried in a method
frame that will tell <code class="language-plaintext highlighter-rouge">RabbitMQ</code> that a client is going to publish a message.</p>
</li>
<li>
<p><strong>Content header</strong>: Certain methods carry content (like <code class="language-plaintext highlighter-rouge">Basic.Publish</code>, for instance, which carries a message to be published), and the content
header frame is used to send the properties of this content. For example, this frame may have the content-type of a message that is going to be published and a timestamp.</p>
</li>
<li>
<p><strong>Body</strong>: This is the frame with the actual content of your message, and can be split into multiple different frames if the message is too big (131KB is the default frame size limit).</p>
</li>
<li>
<p><strong>Heartbeat</strong>: Used to confirm that a given client is still alive. If <code class="language-plaintext highlighter-rouge">RabbitMQ</code> sends a heartbeat to a client and it does not respond in a timely fashion, the client is
disconnected, as it’s considered dead.</p>
</li>
</ul>
<p>And that’s pretty much everything there is to know about <code class="language-plaintext highlighter-rouge">AMQP</code>’s frames: 5 possible frame types, each frame divided into 5 parts, which allow your application and
<code class="language-plaintext highlighter-rouge">RabbitMQ</code> to talk about everything they need to know from each other. It’s also interesting to note that <code class="language-plaintext highlighter-rouge">AMQP</code> is a bidirectional protocol, unlike <code class="language-plaintext highlighter-rouge">HTTP</code>, meaning both
<code class="language-plaintext highlighter-rouge">RabbitMQ</code> and your application can send remote procedure calls.</p>
<p>Now that we know what’s happening behind the curtains of our client libraries, let’s recap what happens when we publish or consume messages.</p>
<h4 id="publishing-and-consuming-messages">Publishing and consuming messages</h4>
<p>When publishing a message, the client application needs to send at least 3 frames: The method (<code class="language-plaintext highlighter-rouge">Basic.Publish</code>), the content header, and one or more body
frames, depending on the size of the message:</p>
<p><img src="/assets/images/sequence.png" /></p>
<p>And consuming messages is pretty much the same thing, but it’s the broker, <code class="language-plaintext highlighter-rouge">RabbitMQ</code>, that sends the frames to our client application:</p>
<p><img src="/assets/images/sequence-deliver.png" /></p>
<h4 id="diving-deeper">Diving deeper</h4>
<p>This was a very short overview of the way <code class="language-plaintext highlighter-rouge">RabbitMQ</code> sends data over the wire. To get more information about how <code class="language-plaintext highlighter-rouge">AMQP</code> works, the
<a href="https://www.rabbitmq.com/resources/specs/amqp0-9-1.pdf">specification</a> is quite readable and not that long. For a more <code class="language-plaintext highlighter-rouge">RabbitMQ</code>-specific approach,
<a href="https://www.manning.com/books/rabbitmq-in-depth">RabbitMQ in Depth</a> is also a great resource.</p>
Process registry in Elixir: a practical example2016-02-29T00:00:00+00:00www.brianstorti.com/process-registry-in-elixir<p>Processes in <code class="language-plaintext highlighter-rouge">Elixir</code> (and <code class="language-plaintext highlighter-rouge">Erlang</code>, for that matter) are identified with a unique process id, the <code class="language-plaintext highlighter-rouge">pid</code>.<br />
That’s what we use to interact with them. We send a message to a <code class="language-plaintext highlighter-rouge">pid</code> and the VM takes care of delivering it to the
correct process. Sometimes, though, relying on the <code class="language-plaintext highlighter-rouge">pid</code> of a process can be problematic.</p>
<p>Let’s create a simple application to see what issues we can have and what are some ways to solve them.</p>
<h4 id="starting-with-no-registry-at-all">Starting with no registry at all</h4>
<p>For this example we will create a simple chat application. Let’s go ahead and create a new <code class="language-plaintext highlighter-rouge">mix</code> project for it:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>mix new chat
</code></pre></div></div>
<p>And we can create a pretty standard <code class="language-plaintext highlighter-rouge">GenServer</code> that will be used throughout these examples:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># in lib/chat/server.ex</span>
<span class="k">defmodule</span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span> <span class="k">do</span>
<span class="kn">use</span> <span class="no">GenServer</span>
<span class="c1"># API</span>
<span class="k">def</span> <span class="n">start_link</span> <span class="k">do</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="bp">__MODULE__</span><span class="p">,</span> <span class="p">[])</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">add_message</span><span class="p">(</span><span class="n">pid</span><span class="p">,</span> <span class="n">message</span><span class="p">)</span> <span class="k">do</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">cast</span><span class="p">(</span><span class="n">pid</span><span class="p">,</span> <span class="p">{</span><span class="ss">:add_message</span><span class="p">,</span> <span class="n">message</span><span class="p">})</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">get_messages</span><span class="p">(</span><span class="n">pid</span><span class="p">)</span> <span class="k">do</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="n">pid</span><span class="p">,</span> <span class="ss">:get_messages</span><span class="p">)</span>
<span class="k">end</span>
<span class="c1"># SERVER</span>
<span class="k">def</span> <span class="n">init</span><span class="p">(</span><span class="n">messages</span><span class="p">)</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">messages</span><span class="p">}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">handle_cast</span><span class="p">({</span><span class="ss">:add_message</span><span class="p">,</span> <span class="n">new_message</span><span class="p">},</span> <span class="n">messages</span><span class="p">)</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:noreply</span><span class="p">,</span> <span class="p">[</span><span class="n">new_message</span> <span class="o">|</span> <span class="n">messages</span><span class="p">]}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">handle_call</span><span class="p">(</span><span class="ss">:get_messages</span><span class="p">,</span> <span class="n">_from</span><span class="p">,</span> <span class="n">messages</span><span class="p">)</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:reply</span><span class="p">,</span> <span class="n">messages</span><span class="p">,</span> <span class="n">messages</span><span class="p">}</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<blockquote>
<p>If this still does not look familiar to you, <code class="language-plaintext highlighter-rouge">Elixir</code>’s getting started
guides has a <a href="http://elixir-lang.org/getting-started/mix-otp/genserver.html">great introduction</a> to <code class="language-plaintext highlighter-rouge">OTP</code>
that is worth checking.</p>
</blockquote>
<p>And we can now start an <code class="language-plaintext highlighter-rouge">iex</code> session to test this server:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">$</span> <span class="n">iex</span> <span class="o">-</span><span class="no">S</span> <span class="n">mix</span>
<span class="n">iex</span><span class="o">></span> <span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">pid</span><span class="p">}</span> <span class="o">=</span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">start_link</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="c1">#PID<0.107.0>}</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">add_message</span><span class="p">(</span><span class="n">pid</span><span class="p">,</span> <span class="s2">"foo"</span><span class="p">)</span>
<span class="ss">:ok</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">add_message</span><span class="p">(</span><span class="n">pid</span><span class="p">,</span> <span class="s2">"bar"</span><span class="p">)</span>
<span class="ss">:ok</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">get_messages</span><span class="p">(</span><span class="n">pid</span><span class="p">)</span>
<span class="p">[</span><span class="s2">"bar"</span><span class="p">,</span> <span class="s2">"foo"</span><span class="p">]</span>
</code></pre></div></div>
<p>So far so good. We get a <code class="language-plaintext highlighter-rouge">pid</code> when we start this process, and then for every message we want to
send (<code class="language-plaintext highlighter-rouge">add_message/2</code> and <code class="language-plaintext highlighter-rouge">get_messages/1</code>) we pass this <code class="language-plaintext highlighter-rouge">pid</code> and everything works as expected.</p>
<p>Things start to get more interesting when we introduce a <code class="language-plaintext highlighter-rouge">Supervisor</code>.</p>
<h4 id="introducing-a-supervisor">Introducing a Supervisor</h4>
<p>If for some reason our <code class="language-plaintext highlighter-rouge">Chat.Server</code> process dies, we are left there, sad and alone in our <code class="language-plaintext highlighter-rouge">iex</code> session, without any choice other
than manually starting a new process and sending messages to this new <code class="language-plaintext highlighter-rouge">pid</code>. Let’s create a <code class="language-plaintext highlighter-rouge">Supervisor</code> so we don’t need to worry about that.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># in lib/chat/supervisor.ex</span>
<span class="k">defmodule</span> <span class="no">Chat</span><span class="o">.</span><span class="no">Supervisor</span> <span class="k">do</span>
<span class="kn">use</span> <span class="no">Supervisor</span>
<span class="k">def</span> <span class="n">start_link</span> <span class="k">do</span>
<span class="no">Supervisor</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="bp">__MODULE__</span><span class="p">,</span> <span class="p">[])</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">init</span><span class="p">(</span><span class="n">_</span><span class="p">)</span> <span class="k">do</span>
<span class="n">children</span> <span class="o">=</span> <span class="p">[</span>
<span class="n">worker</span><span class="p">(</span><span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="p">,</span> <span class="p">[])</span>
<span class="p">]</span>
<span class="n">supervise</span><span class="p">(</span><span class="n">children</span><span class="p">,</span> <span class="ss">strategy:</span> <span class="ss">:one_for_one</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Creating a <code class="language-plaintext highlighter-rouge">Supervisor</code> is simple enough, but now we have a problem if we try to follow the same approach as before. We are not
starting the <code class="language-plaintext highlighter-rouge">Chat.Server</code> process ourselves; the <code class="language-plaintext highlighter-rouge">Supervisor</code> is taking care of that, and we simply don’t have access to the <code class="language-plaintext highlighter-rouge">pid</code> of
the processes it creates.</p>
<p>This is inherent to the <code class="language-plaintext highlighter-rouge">Supervisor</code> pattern. We can’t rely on a child’s <code class="language-plaintext highlighter-rouge">pid</code> because the <code class="language-plaintext highlighter-rouge">Supervisor</code> will,
when necessary, restart its children (which actually means it will kill a process and start a new one, with a different <code class="language-plaintext highlighter-rouge">pid</code>).</p>
<h4 id="registering-a-process-name">Registering a process name</h4>
<p>To access our <code class="language-plaintext highlighter-rouge">Chat.Server</code> process we need some way to reference it using something other than the <code class="language-plaintext highlighter-rouge">pid</code>, a reference that
will be the same even if the process is restarted by the <code class="language-plaintext highlighter-rouge">Supervisor</code>. We need to give it a name.</p>
<p>So let’s change <code class="language-plaintext highlighter-rouge">Chat.Server</code>:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># lib/chat/server.ex</span>
<span class="k">defmodule</span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span> <span class="k">do</span>
<span class="kn">use</span> <span class="no">GenServer</span>
<span class="k">def</span> <span class="n">start_link</span> <span class="k">do</span>
<span class="c1"># We now start the GenServer with a `name` option.</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="bp">__MODULE__</span><span class="p">,</span> <span class="p">[],</span> <span class="ss">name:</span> <span class="ss">:chat_room</span><span class="p">)</span>
<span class="k">end</span>
<span class="c1"># And our function don't need to receive the pid anymore,</span>
<span class="c1"># as we can reference the process with its unique name.</span>
<span class="k">def</span> <span class="n">add_message</span><span class="p">(</span><span class="n">message</span><span class="p">)</span> <span class="k">do</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">cast</span><span class="p">(</span><span class="ss">:chat_room</span><span class="p">,</span> <span class="p">{</span><span class="ss">:add_message</span><span class="p">,</span> <span class="n">message</span><span class="p">})</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">get_messages</span> <span class="k">do</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="ss">:chat_room</span><span class="p">,</span> <span class="ss">:get_messages</span><span class="p">)</span>
<span class="k">end</span>
<span class="c1"># ...</span>
<span class="k">end</span>
</code></pre></div></div>
<p>And it should continue working the same way, except we don’t need to pass the <code class="language-plaintext highlighter-rouge">pid</code> around anymore:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">$</span> <span class="n">iex</span> <span class="o">-</span><span class="no">S</span> <span class="n">mix</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Supervisor</span><span class="o">.</span><span class="n">start_link</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="c1">#PID<0.94.0>}</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">add_message</span><span class="p">(</span><span class="s2">"foo"</span><span class="p">)</span>
<span class="ss">:ok</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">add_message</span><span class="p">(</span><span class="s2">"bar"</span><span class="p">)</span>
<span class="ss">:ok</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">get_messages</span>
<span class="p">[</span><span class="s2">"bar"</span><span class="p">,</span> <span class="s2">"foo"</span><span class="p">]</span>
</code></pre></div></div>
<p>And if the process is restarted we should be able to access it in the same way:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">iex</span><span class="o">></span> <span class="no">Process</span><span class="o">.</span><span class="n">whereis</span><span class="p">(</span><span class="ss">:chat_room</span><span class="p">)</span>
<span class="c1">#PID<0.111.0></span>
<span class="n">iex</span><span class="o">></span> <span class="no">Process</span><span class="o">.</span><span class="n">whereis</span><span class="p">(</span><span class="ss">:chat_room</span><span class="p">)</span> <span class="o">|></span> <span class="no">Process</span><span class="o">.</span><span class="k">exit</span><span class="p">(</span><span class="ss">:kill</span><span class="p">)</span>
<span class="no">true</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Process</span><span class="o">.</span><span class="n">whereis</span><span class="p">(</span><span class="ss">:chat_room</span><span class="p">)</span>
<span class="c1">#PID<0.114.0></span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">add_message</span> <span class="s2">"foo"</span>
<span class="ss">:ok</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">get_messages</span>
<span class="p">[</span><span class="s2">"foo"</span><span class="p">]</span>
</code></pre></div></div>
<p>That will do the job for our current scenario, but let’s try to make things a bit more complex (and real).</p>
<h4 id="dynamic-process-creation">Dynamic process creation</h4>
<p>Imagine we want to support multiple chat rooms. A client will start a new room with a name, and should be able
to send messages to any room she wants. The interface would be something like this:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Supervisor</span><span class="o">.</span><span class="n">start_room</span><span class="p">(</span><span class="s2">"first room"</span><span class="p">)</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Supervisor</span><span class="o">.</span><span class="n">start_room</span><span class="p">(</span><span class="s2">"second room"</span><span class="p">)</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">add_message</span><span class="p">(</span><span class="s2">"first room"</span><span class="p">,</span> <span class="s2">"foo"</span><span class="p">)</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">add_message</span><span class="p">(</span><span class="s2">"second room"</span><span class="p">,</span> <span class="s2">"bar"</span><span class="p">)</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">get_messages</span><span class="p">(</span><span class="s2">"first room"</span><span class="p">)</span>
<span class="p">[</span><span class="s2">"foo"</span><span class="p">]</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">get_messages</span><span class="p">(</span><span class="s2">"second room"</span><span class="p">)</span>
<span class="p">[</span><span class="s2">"bar"</span><span class="p">]</span>
</code></pre></div></div>
<p>Let’s start by changing the <code class="language-plaintext highlighter-rouge">Supervisor</code> to support that:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># lib/chat/supervisor.ex</span>
<span class="k">defmodule</span> <span class="no">Chat</span><span class="o">.</span><span class="no">Supervisor</span> <span class="k">do</span>
<span class="kn">use</span> <span class="no">Supervisor</span>
<span class="k">def</span> <span class="n">start_link</span> <span class="k">do</span>
<span class="c1"># We are now registering our supervisor process with a name</span>
<span class="c1"># so we can reference it in the `start_room/1` function</span>
<span class="no">Supervisor</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="bp">__MODULE__</span><span class="p">,</span> <span class="p">[],</span> <span class="ss">name:</span> <span class="ss">:chat_supervisor</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">start_room</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="k">do</span>
<span class="no">Supervisor</span><span class="o">.</span><span class="n">start_child</span><span class="p">(</span><span class="ss">:chat_supervisor</span><span class="p">,</span> <span class="p">[</span><span class="n">name</span><span class="p">])</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">init</span><span class="p">(</span><span class="n">_</span><span class="p">)</span> <span class="k">do</span>
<span class="n">children</span> <span class="o">=</span> <span class="p">[</span>
<span class="n">worker</span><span class="p">(</span><span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="p">,</span> <span class="p">[])</span>
<span class="p">]</span>
<span class="c1"># We also changed the `strategy` to `simple_one_for_one`.</span>
<span class="c1"># With this strategy, we define just a "template" for a child,</span>
<span class="c1"># no process is started during the Supervisor initialization, just</span>
<span class="c1"># when we call `start_child/2`</span>
<span class="n">supervise</span><span class="p">(</span><span class="n">children</span><span class="p">,</span> <span class="ss">strategy:</span> <span class="ss">:simple_one_for_one</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
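<p>A quick aside: in more recent Elixir versions, <code class="language-plaintext highlighter-rouge">worker/2</code> and <code class="language-plaintext highlighter-rouge">supervise/2</code> were deprecated, and the <code class="language-plaintext highlighter-rouge">:simple_one_for_one</code> strategy was eventually replaced by <code class="language-plaintext highlighter-rouge">DynamicSupervisor</code>. A roughly equivalent sketch for Elixir 1.6 or later (not the code we’ll keep using in this post) would be:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: the same supervisor written with DynamicSupervisor (Elixir 1.6+)
defmodule Chat.Supervisor do
  use DynamicSupervisor

  def start_link do
    DynamicSupervisor.start_link(__MODULE__, [], name: :chat_supervisor)
  end

  def start_room(name) do
    # Children are started on demand from the child spec we pass here;
    # {Chat.Server, name} means Chat.Server.start_link(name) will be called
    DynamicSupervisor.start_child(:chat_supervisor, {Chat.Server, name})
  end

  def init(_) do
    # No child "template" upfront; each start_child/2 call brings its own spec
    DynamicSupervisor.init(strategy: :one_for_one)
  end
end
</code></pre></div></div>
<p>The idea is the same: no child is started during initialization, and each call to <code class="language-plaintext highlighter-rouge">start_room/1</code> spawns a new, supervised <code class="language-plaintext highlighter-rouge">Chat.Server</code>.</p>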
<p>And let’s make the <code class="language-plaintext highlighter-rouge">Chat.Server</code> accept a name in the <code class="language-plaintext highlighter-rouge">start_link</code> function:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># lib/chat/server.ex</span>
<span class="k">defmodule</span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span> <span class="k">do</span>
<span class="kn">use</span> <span class="no">GenServer</span>
<span class="c1"># Just accept a `name` parameter here for now</span>
<span class="k">def</span> <span class="n">start_link</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="k">do</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="bp">__MODULE__</span><span class="p">,</span> <span class="p">[],</span> <span class="ss">name:</span> <span class="ss">:chat_room</span><span class="p">)</span>
<span class="k">end</span>
<span class="c1">#...</span>
<span class="k">end</span>
</code></pre></div></div>
<p>The problem now is that, as we can have a bunch of <code class="language-plaintext highlighter-rouge">Chat.Server</code> processes, we can’t call all of them <code class="language-plaintext highlighter-rouge">:chat_room</code>.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">$</span> <span class="n">iex</span> <span class="o">-</span><span class="no">S</span> <span class="n">mix</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Supervisor</span><span class="o">.</span><span class="n">start_link</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="c1">#PID<0.107.0>}</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Supervisor</span><span class="o">.</span><span class="n">start_room</span> <span class="s2">"foo"</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="c1">#PID<0.109.0>}</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Supervisor</span><span class="o">.</span><span class="n">start_room</span> <span class="s2">"bar"</span>
<span class="p">{</span><span class="ss">:error</span><span class="p">,</span> <span class="p">{</span><span class="ss">:already_started</span><span class="p">,</span> <span class="c1">#PID<0.109.0>}}</span>
</code></pre></div></div>
<p>Fair enough. When we try to start the second process it fails because a process named <code class="language-plaintext highlighter-rouge">:chat_room</code> is already started.
We need to register these processes in another way.</p>
<p>The <code class="language-plaintext highlighter-rouge">name</code> option is quite restrictive in what it will accept, though: we can’t have a name like <code class="language-plaintext highlighter-rouge">{:chat_room, "room name"}</code>.
To quote the <a href="http://elixir-lang.org/docs/stable/elixir/GenServer.html">documentation</a>:</p>
<blockquote>
<p>The supported values are:</p>
</blockquote>
<blockquote>
<p><strong>an atom</strong> - the GenServer is registered locally with the given name using Process.register/2.</p>
<p><strong>{:global, term}</strong> - the GenServer is registered globally with the given term using the functions in the :global module.</p>
<p><strong>{:via, module, term}</strong> - the GenServer is registered with the given mechanism and name.</p>
</blockquote>
<p>The first option, an <code class="language-plaintext highlighter-rouge">atom</code>, is what we have been using so far and we know it’s not enough for our needs now.</p>
<p>The second option is used to register a process globally, across multiple nodes: each node keeps a local copy of the registry (backed by an <code class="language-plaintext highlighter-rouge">ETS</code> table), and
registrations are synchronized across the entire cluster, which introduces overhead we don’t need unless we actually want this cluster-wide behavior.</p>
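<p>Just to make the comparison concrete, global registration is a small change: we pass a <code class="language-plaintext highlighter-rouge">{:global, term}</code> tuple as the name. A sketch, reusing our existing <code class="language-plaintext highlighter-rouge">Chat.Server</code> callbacks:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: registering a room in the :global registry
defmodule Chat.Server do
  use GenServer

  def start_link(room_name) do
    # The process is registered cluster-wide and is visible
    # from every connected node
    GenServer.start_link(__MODULE__, [], name: {:global, {:chat_room, room_name}})
  end

  def add_message(room_name, message) do
    # The same {:global, term} tuple works wherever a pid would
    GenServer.cast({:global, {:chat_room, room_name}}, {:add_message, message})
  end

  # ...
end
</code></pre></div></div>
<p>Note that <code class="language-plaintext highlighter-rouge">:global</code> accepts any term as a name, so it would even handle a composite name like <code class="language-plaintext highlighter-rouge">{:chat_room, room_name}</code>; we’re passing on it here only because of the cluster-wide synchronization cost.</p>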
<p>The third and last option is using what is called a <code class="language-plaintext highlighter-rouge">via tuple</code>, and that’s exactly what we need to solve our problem. That’s what the documentation says about it:</p>
<blockquote>
<p>The :via option expects a module that exports <code class="language-plaintext highlighter-rouge">register_name/2</code>, <code class="language-plaintext highlighter-rouge">unregister_name/1</code>, <code class="language-plaintext highlighter-rouge">whereis_name/1</code> and <code class="language-plaintext highlighter-rouge">send/2</code>.</p>
</blockquote>
<p>It’s hard to understand what this means without an example, so let’s see this in action.</p>
<h4 id="using-via-tuple">Using <code class="language-plaintext highlighter-rouge">via tuple</code></h4>
<p><code class="language-plaintext highlighter-rouge">via tuple</code> is basically a way to tell <code class="language-plaintext highlighter-rouge">Elixir</code> that we will use a custom module to register our processes. It expects this
module to know how to do a few things:</p>
<ul>
<li>How to register a name, that can be any <code class="language-plaintext highlighter-rouge">Elixir</code> term, using the function <code class="language-plaintext highlighter-rouge">register_name/2</code>;</li>
<li>How to unregister a name, using the function <code class="language-plaintext highlighter-rouge">unregister_name/1</code>;</li>
<li>How to find the <code class="language-plaintext highlighter-rouge">pid</code> of a process with a given name, using <code class="language-plaintext highlighter-rouge">whereis_name/1</code>;</li>
<li>And, finally, how to send a message to a given process, with <code class="language-plaintext highlighter-rouge">send/2</code>.</li>
</ul>
<p>For this to work, these functions are expected to return a response in a specific format, the same way <code class="language-plaintext highlighter-rouge">OTP</code> expects
our <code class="language-plaintext highlighter-rouge">handle_call/3</code>, <code class="language-plaintext highlighter-rouge">handle_cast/2</code> and friends to follow some rules.</p>
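<p>Concretely, here’s the shape of that contract (a sketch; the canonical reference is the <code class="language-plaintext highlighter-rouge">:global</code> module, which implements these same callbacks):</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Return values OTP expects from a :via registry module:
#
#   register_name(name, pid)  -> :yes | :no
#   unregister_name(name)     -> (return value is ignored)
#   whereis_name(name)        -> pid | :undefined
#   send(name, message)       -> pid, after delivering the message
#                                (:global exits with {:badarg, {name, message}}
#                                 when the name is not registered)
</code></pre></div></div>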
<p>So let’s implement a module that knows how to do that:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># in lib/chat/registry.ex</span>
<span class="k">defmodule</span> <span class="no">Chat</span><span class="o">.</span><span class="no">Registry</span> <span class="k">do</span>
<span class="kn">use</span> <span class="no">GenServer</span>
<span class="c1"># API</span>
<span class="k">def</span> <span class="n">start_link</span> <span class="k">do</span>
<span class="c1"># We register our registry (yeah, I know), with a simple name,</span>
<span class="c1"># just so we can reference it in the other functions.</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="bp">__MODULE__</span><span class="p">,</span> <span class="no">nil</span><span class="p">,</span> <span class="ss">name:</span> <span class="ss">:registry</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">whereis_name</span><span class="p">(</span><span class="n">room_name</span><span class="p">)</span> <span class="k">do</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="ss">:registry</span><span class="p">,</span> <span class="p">{</span><span class="ss">:whereis_name</span><span class="p">,</span> <span class="n">room_name</span><span class="p">})</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">register_name</span><span class="p">(</span><span class="n">room_name</span><span class="p">,</span> <span class="n">pid</span><span class="p">)</span> <span class="k">do</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="ss">:registry</span><span class="p">,</span> <span class="p">{</span><span class="ss">:register_name</span><span class="p">,</span> <span class="n">room_name</span><span class="p">,</span> <span class="n">pid</span><span class="p">})</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">unregister_name</span><span class="p">(</span><span class="n">room_name</span><span class="p">)</span> <span class="k">do</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">cast</span><span class="p">(</span><span class="ss">:registry</span><span class="p">,</span> <span class="p">{</span><span class="ss">:unregister_name</span><span class="p">,</span> <span class="n">room_name</span><span class="p">})</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">send</span><span class="p">(</span><span class="n">room_name</span><span class="p">,</span> <span class="n">message</span><span class="p">)</span> <span class="k">do</span>
<span class="c1"># If we try to send a message to a process</span>
<span class="c1"># that is not registered, we return a tuple in the format</span>
<span class="c1"># {:badarg, {process_name, error_message}}.</span>
<span class="c1"># Otherwise, we just forward the message to the pid of this room.</span>
<span class="k">case</span> <span class="n">whereis_name</span><span class="p">(</span><span class="n">room_name</span><span class="p">)</span> <span class="k">do</span>
<span class="ss">:undefined</span> <span class="o">-></span>
<span class="p">{</span><span class="ss">:badarg</span><span class="p">,</span> <span class="p">{</span><span class="n">room_name</span><span class="p">,</span> <span class="n">message</span><span class="p">}}</span>
<span class="n">pid</span> <span class="o">-></span>
<span class="no">Kernel</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="n">pid</span><span class="p">,</span> <span class="n">message</span><span class="p">)</span>
<span class="n">pid</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="c1"># SERVER</span>
<span class="k">def</span> <span class="n">init</span><span class="p">(</span><span class="n">_</span><span class="p">)</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="no">Map</span><span class="o">.</span><span class="n">new</span><span class="p">}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">handle_call</span><span class="p">({</span><span class="ss">:whereis_name</span><span class="p">,</span> <span class="n">room_name</span><span class="p">},</span> <span class="n">_from</span><span class="p">,</span> <span class="n">state</span><span class="p">)</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:reply</span><span class="p">,</span> <span class="no">Map</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">state</span><span class="p">,</span> <span class="n">room_name</span><span class="p">,</span> <span class="ss">:undefined</span><span class="p">),</span> <span class="n">state</span><span class="p">}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">handle_call</span><span class="p">({</span><span class="ss">:register_name</span><span class="p">,</span> <span class="n">room_name</span><span class="p">,</span> <span class="n">pid</span><span class="p">},</span> <span class="n">_from</span><span class="p">,</span> <span class="n">state</span><span class="p">)</span> <span class="k">do</span>
<span class="c1"># Registering a name is just a matter of putting it in our Map.</span>
<span class="c1"># Our response tuple include a `:no` or `:yes` indicating if</span>
<span class="c1"># the process was included or if it was already present.</span>
<span class="k">case</span> <span class="no">Map</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">state</span><span class="p">,</span> <span class="n">room_name</span><span class="p">)</span> <span class="k">do</span>
<span class="no">nil</span> <span class="o">-></span>
<span class="p">{</span><span class="ss">:reply</span><span class="p">,</span> <span class="ss">:yes</span><span class="p">,</span> <span class="no">Map</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="n">state</span><span class="p">,</span> <span class="n">room_name</span><span class="p">,</span> <span class="n">pid</span><span class="p">)}</span>
<span class="n">_</span> <span class="o">-></span>
<span class="p">{</span><span class="ss">:reply</span><span class="p">,</span> <span class="ss">:no</span><span class="p">,</span> <span class="n">state</span><span class="p">}</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">handle_cast</span><span class="p">({</span><span class="ss">:unregister_name</span><span class="p">,</span> <span class="n">room_name</span><span class="p">},</span> <span class="n">state</span><span class="p">)</span> <span class="k">do</span>
<span class="c1"># And unregistering is as simple as deleting an entry from our Map</span>
<span class="p">{</span><span class="ss">:noreply</span><span class="p">,</span> <span class="no">Map</span><span class="o">.</span><span class="n">delete</span><span class="p">(</span><span class="n">state</span><span class="p">,</span> <span class="n">room_name</span><span class="p">)}</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Again, it’s up to us to decide how this registry is going to work. Here we are using a simple <code class="language-plaintext highlighter-rouge">Map</code> to relate the room name with its pid.<br />
The implementation is straightforward if you are familiar with how a <code class="language-plaintext highlighter-rouge">GenServer</code> works (except for the not so usual return values).</p>
<p>Let’s try that on <code class="language-plaintext highlighter-rouge">iex</code>:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">$</span> <span class="n">iex</span> <span class="o">-</span><span class="no">S</span> <span class="n">mix</span>
<span class="n">iex</span><span class="o">></span> <span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="n">pid</span><span class="p">}</span> <span class="o">=</span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="s2">"room1"</span><span class="p">)</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="c1">#PID<0.107.0>}</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Registry</span><span class="o">.</span><span class="n">start_link</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="c1">#PID<0.109.0>}</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Registry</span><span class="o">.</span><span class="n">whereis_name</span><span class="p">(</span><span class="s2">"room1"</span><span class="p">)</span>
<span class="ss">:undefined</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Registry</span><span class="o">.</span><span class="n">register_name</span><span class="p">(</span><span class="s2">"room1"</span><span class="p">,</span> <span class="n">pid</span><span class="p">)</span>
<span class="ss">:yes</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Registry</span><span class="o">.</span><span class="n">register_name</span><span class="p">(</span><span class="s2">"room1"</span><span class="p">,</span> <span class="n">pid</span><span class="p">)</span>
<span class="ss">:no</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Registry</span><span class="o">.</span><span class="n">whereis_name</span><span class="p">(</span><span class="s2">"room1"</span><span class="p">)</span>
<span class="c1">#PID<0.107.0></span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Registry</span><span class="o">.</span><span class="n">unregister_name</span><span class="p">(</span><span class="s2">"room1"</span><span class="p">)</span>
<span class="ss">:ok</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Registry</span><span class="o">.</span><span class="n">whereis_name</span><span class="p">(</span><span class="s2">"room1"</span><span class="p">)</span>
<span class="ss">:undefined</span>
</code></pre></div></div>
<p>The registry is working fine. We can register, unregister and find processes using their names, so let’s start using it.</p>
<p>Our original problem was that we can now have multiple <code class="language-plaintext highlighter-rouge">Chat.Server</code> processes that are initialized by a <code class="language-plaintext highlighter-rouge">Supervisor</code>.
In order to send a message to a specific room, we want to use <code class="language-plaintext highlighter-rouge">Chat.Server.add_message("room1", "my message")</code>, so we need
to register our rooms with names like <code class="language-plaintext highlighter-rouge">{:chat_room, "room1"}</code> and <code class="language-plaintext highlighter-rouge">{:chat_room, "room2"}</code>. Here’s how our <code class="language-plaintext highlighter-rouge">via tuple</code> implementation
makes it possible:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># in lib/chat/server.ex</span>
<span class="k">defmodule</span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span> <span class="k">do</span>
<span class="kn">use</span> <span class="no">GenServer</span>
<span class="c1"># API</span>
<span class="k">def</span> <span class="n">start_link</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="k">do</span>
<span class="c1"># Instead of passing an atom to the `name` option, we send </span>
<span class="c1"># a tuple. Here we extract this tuple to a private method</span>
<span class="c1"># called `via_tuple` that can be reused for every function</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">start_link</span><span class="p">(</span><span class="bp">__MODULE__</span><span class="p">,</span> <span class="p">[],</span> <span class="ss">name:</span> <span class="n">via_tuple</span><span class="p">(</span><span class="n">name</span><span class="p">))</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">add_message</span><span class="p">(</span><span class="n">room_name</span><span class="p">,</span> <span class="n">message</span><span class="p">)</span> <span class="k">do</span>
<span class="c1"># And the `GenServer` callbacks will accept this tuple the same way it</span>
<span class="c1"># accepts a `pid` or an atom.</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">cast</span><span class="p">(</span><span class="n">via_tuple</span><span class="p">(</span><span class="n">room_name</span><span class="p">),</span> <span class="p">{</span><span class="ss">:add_message</span><span class="p">,</span> <span class="n">message</span><span class="p">})</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">get_messages</span><span class="p">(</span><span class="n">room_name</span><span class="p">)</span> <span class="k">do</span>
<span class="no">GenServer</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="n">via_tuple</span><span class="p">(</span><span class="n">room_name</span><span class="p">),</span> <span class="ss">:get_messages</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">defp</span> <span class="n">via_tuple</span><span class="p">(</span><span class="n">room_name</span><span class="p">)</span> <span class="k">do</span>
<span class="c1"># And the tuple always follows the same format:</span>
<span class="c1"># {:via, module_name, term}</span>
<span class="p">{</span><span class="ss">:via</span><span class="p">,</span> <span class="no">Chat</span><span class="o">.</span><span class="no">Registry</span><span class="p">,</span> <span class="p">{</span><span class="ss">:chat_room</span><span class="p">,</span> <span class="n">room_name</span><span class="p">}}</span>
<span class="k">end</span>
<span class="c1"># SERVER (no changes required here)</span>
<span class="c1"># ...</span>
<span class="k">end</span>
</code></pre></div></div>
<p>What happens here is that every time we send a message to <code class="language-plaintext highlighter-rouge">Chat.Server</code> passing a room name,
it will find the <code class="language-plaintext highlighter-rouge">pid</code> of the process we want <strong>via</strong> the module we are providing (in this case, <code class="language-plaintext highlighter-rouge">Chat.Registry</code>).<br />
And this solves our problem: we can have as many <code class="language-plaintext highlighter-rouge">Chat.Server</code> processes as we want, and we never need to know their <code class="language-plaintext highlighter-rouge">pid</code>s.</p>
<p>There is still a big problem with this solution. Our registry never knows about processes that crashed and had to be restarted
by the <code class="language-plaintext highlighter-rouge">Supervisor</code>, and that means that when this happens the registry will hold a <code class="language-plaintext highlighter-rouge">pid</code> that is not valid anymore.<br />
Solving this issue should not be too hard, though: we will make our registry monitor all the processes it is taking care of,
and when one of them crashes, we can safely remove it from our <code class="language-plaintext highlighter-rouge">Map</code>.</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># in lib/chat/registry.ex</span>
<span class="k">defmodule</span> <span class="no">Chat</span><span class="o">.</span><span class="no">Registry</span> <span class="k">do</span>
<span class="c1"># ...</span>
<span class="k">def</span> <span class="n">handle_call</span><span class="p">({</span><span class="ss">:register_name</span><span class="p">,</span> <span class="n">room_name</span><span class="p">,</span> <span class="n">pid</span><span class="p">},</span> <span class="n">_from</span><span class="p">,</span> <span class="n">state</span><span class="p">)</span> <span class="k">do</span>
<span class="k">case</span> <span class="no">Map</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">state</span><span class="p">,</span> <span class="n">room_name</span><span class="p">)</span> <span class="k">do</span>
<span class="no">nil</span> <span class="o">-></span>
<span class="c1"># When a new process is registered, we start monitoring it</span>
<span class="no">Process</span><span class="o">.</span><span class="n">monitor</span><span class="p">(</span><span class="n">pid</span><span class="p">)</span>
<span class="p">{</span><span class="ss">:reply</span><span class="p">,</span> <span class="ss">:yes</span><span class="p">,</span> <span class="no">Map</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="n">state</span><span class="p">,</span> <span class="n">room_name</span><span class="p">,</span> <span class="n">pid</span><span class="p">)}</span>
<span class="n">_</span> <span class="o">-></span>
<span class="p">{</span><span class="ss">:reply</span><span class="p">,</span> <span class="ss">:no</span><span class="p">,</span> <span class="n">state</span><span class="p">}</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">handle_info</span><span class="p">({</span><span class="ss">:DOWN</span><span class="p">,</span> <span class="n">_</span><span class="p">,</span> <span class="ss">:process</span><span class="p">,</span> <span class="n">pid</span><span class="p">,</span> <span class="n">_</span><span class="p">},</span> <span class="n">state</span><span class="p">)</span> <span class="k">do</span>
<span class="c1"># When a monitored process dies, we will receive a `:DOWN` message</span>
<span class="c1"># that we can use to remove the dead pid from our registry</span>
<span class="p">{</span><span class="ss">:noreply</span><span class="p">,</span> <span class="n">remove_pid</span><span class="p">(</span><span class="n">state</span><span class="p">,</span> <span class="n">pid</span><span class="p">)}</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">remove_pid</span><span class="p">(</span><span class="n">state</span><span class="p">,</span> <span class="n">pid_to_remove</span><span class="p">)</span> <span class="k">do</span>
<span class="c1"># And here we just filter out the dead pid</span>
<span class="n">remove</span> <span class="o">=</span> <span class="k">fn</span> <span class="p">{</span><span class="n">_key</span><span class="p">,</span> <span class="n">pid</span><span class="p">}</span> <span class="o">-></span> <span class="n">pid</span> <span class="o">!=</span> <span class="n">pid_to_remove</span> <span class="k">end</span>
<span class="no">Enum</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">state</span><span class="p">,</span> <span class="n">remove</span><span class="p">)</span> <span class="o">|></span> <span class="no">Enum</span><span class="o">.</span><span class="n">into</span><span class="p">(%{})</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>And let’s make sure it works:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">$</span> <span class="n">iex</span> <span class="o">-</span><span class="no">S</span> <span class="n">mix</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Registry</span><span class="o">.</span><span class="n">start_link</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="c1">#PID<0.107.0>}</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Supervisor</span><span class="o">.</span><span class="n">start_link</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="c1">#PID<0.109.0>}</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Supervisor</span><span class="o">.</span><span class="n">start_room</span><span class="p">(</span><span class="s2">"room1"</span><span class="p">)</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="c1">#PID<0.111.0>}</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">add_message</span><span class="p">(</span><span class="s2">"room1"</span><span class="p">,</span> <span class="s2">"message"</span><span class="p">)</span>
<span class="ss">:ok</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">get_messages</span><span class="p">(</span><span class="s2">"room1"</span><span class="p">)</span>
<span class="p">[</span><span class="s2">"message"</span><span class="p">]</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Registry</span><span class="o">.</span><span class="n">whereis_name</span><span class="p">({</span><span class="ss">:chat_room</span><span class="p">,</span> <span class="s2">"room1"</span><span class="p">})</span> <span class="o">|></span> <span class="no">Process</span><span class="o">.</span><span class="k">exit</span><span class="p">(</span><span class="ss">:kill</span><span class="p">)</span>
<span class="no">true</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">add_message</span><span class="p">(</span><span class="s2">"room1"</span><span class="p">,</span> <span class="s2">"message"</span><span class="p">)</span>
<span class="ss">:ok</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">get_messages</span><span class="p">(</span><span class="s2">"room1"</span><span class="p">)</span>
<span class="p">[</span><span class="s2">"message"</span><span class="p">]</span>
</code></pre></div></div>
<p>And now it doesn’t matter how many times the <code class="language-plaintext highlighter-rouge">Supervisor</code> restarts a <code class="language-plaintext highlighter-rouge">Chat.Server</code> process: when we send
a message to a room it will always find the correct <code class="language-plaintext highlighter-rouge">pid</code>.</p>
<h4 id="simplifying-with-gproc">Simplifying with gproc</h4>
<p>This is as far as we will go with our example, but I just want to show a tool that helps us simplify
our <code class="language-plaintext highlighter-rouge">via tuple</code> registry: <a href="https://github.com/uwiger/gproc"><code class="language-plaintext highlighter-rouge">gproc</code></a>, an <code class="language-plaintext highlighter-rouge">Erlang</code> library.</p>
<p>Instead of telling <code class="language-plaintext highlighter-rouge">Elixir</code> to find the <code class="language-plaintext highlighter-rouge">Chat.Server</code> process via our <code class="language-plaintext highlighter-rouge">Chat.Registry</code> module, we will tell it
to find this process via <code class="language-plaintext highlighter-rouge">gproc</code>, and then we should be able to get rid of <code class="language-plaintext highlighter-rouge">Chat.Registry</code>.</p>
<p>Let’s start by adding this dependency on <code class="language-plaintext highlighter-rouge">mix.exs</code>:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># in mix.exs</span>
<span class="k">defmodule</span> <span class="no">Chat</span><span class="o">.</span><span class="no">Mixfile</span> <span class="k">do</span>
<span class="c1"># ...</span>
<span class="k">def</span> <span class="n">application</span> <span class="k">do</span>
<span class="p">[</span><span class="ss">applications:</span> <span class="p">[</span><span class="ss">:logger</span><span class="p">,</span> <span class="ss">:gproc</span><span class="p">]]</span>
<span class="k">end</span>
<span class="k">defp</span> <span class="n">deps</span> <span class="k">do</span>
<span class="p">[{</span><span class="ss">:gproc</span><span class="p">,</span> <span class="s2">"0.3.1"</span><span class="p">}]</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>And then running <code class="language-plaintext highlighter-rouge">mix deps.get</code> to fetch the new <code class="language-plaintext highlighter-rouge">gproc</code> dependency.</p>
<p>With that in place, we should be able to change our <code class="language-plaintext highlighter-rouge">via tuple</code> definition to make it use <code class="language-plaintext highlighter-rouge">gproc</code> instead of <code class="language-plaintext highlighter-rouge">Chat.Registry</code>:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># in lib/chat/server.ex</span>
<span class="k">defmodule</span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span> <span class="k">do</span>
<span class="c1"># ...</span>
<span class="c1"># The only thing we need to change is the `via_tuple/1` function,</span>
<span class="c1"># to make it use `gproc` instead of `Chat.Registry`</span>
<span class="k">defp</span> <span class="n">via_tuple</span><span class="p">(</span><span class="n">room_name</span><span class="p">)</span> <span class="k">do</span>
<span class="p">{</span><span class="ss">:via</span><span class="p">,</span> <span class="ss">:gproc</span><span class="p">,</span> <span class="p">{</span><span class="ss">:n</span><span class="p">,</span> <span class="ss">:l</span><span class="p">,</span> <span class="p">{</span><span class="ss">:chat_room</span><span class="p">,</span> <span class="n">room_name</span><span class="p">}}}</span>
<span class="k">end</span>
<span class="c1"># ...</span>
<span class="k">end</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">gproc</code> requires that keys are <code class="language-plaintext highlighter-rouge">tuples</code> with three values, in the form <code class="language-plaintext highlighter-rouge">{type, scope, key}</code>.<br />
Here we are using <code class="language-plaintext highlighter-rouge">:n</code> (for <em>name</em>, meaning that we can’t have more than one process registered under a given key) as the type
and <code class="language-plaintext highlighter-rouge">:l</code> (for <em>local</em>, meaning that the process is not registered across the entire cluster of nodes) as the scope. The key can be any <code class="language-plaintext highlighter-rouge">Elixir</code> term we want (e.g. <code class="language-plaintext highlighter-rouge">{:chat_room, "room1"}</code>).
I won’t get into the details of all the possible <code class="language-plaintext highlighter-rouge">gproc</code> values, but you can check that in the <a href="https://github.com/esl/gproc/blob/master/doc/gproc.md">documentation</a>.</p>
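<p>For reference, what happens behind the scenes when we use this <code class="language-plaintext highlighter-rouge">via tuple</code> is roughly equivalent to calling <code class="language-plaintext highlighter-rouge">gproc</code> directly. A small sketch (not code from our app):</p>

```elixir
# Inside the process that should own the name (e.g. during `init/1`):
:gproc.reg({:n, :l, {:chat_room, "room1"}})
# returns true, or raises if the name is already registered

# From any other process, to look the name up:
:gproc.where({:n, :l, {:chat_room, "room1"}})
# returns the pid registered under that key, or :undefined if there is none
```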
<p>With this change, we can now remove the <code class="language-plaintext highlighter-rouge">Chat.Registry</code> module completely, and check that things are still working in the same way:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">$</span> <span class="n">iex</span> <span class="o">-</span><span class="no">S</span> <span class="n">mix</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Supervisor</span><span class="o">.</span><span class="n">start_link</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="c1">#PID<0.190.0>}</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Supervisor</span><span class="o">.</span><span class="n">start_room</span><span class="p">(</span><span class="s2">"room1"</span><span class="p">)</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="c1">#PID<0.192.0>}</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Supervisor</span><span class="o">.</span><span class="n">start_room</span><span class="p">(</span><span class="s2">"room2"</span><span class="p">)</span>
<span class="p">{</span><span class="ss">:ok</span><span class="p">,</span> <span class="c1">#PID<0.194.0>}</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">add_message</span><span class="p">(</span><span class="s2">"room1"</span><span class="p">,</span> <span class="s2">"first message"</span><span class="p">)</span>
<span class="ss">:ok</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">add_message</span><span class="p">(</span><span class="s2">"room2"</span><span class="p">,</span> <span class="s2">"second message"</span><span class="p">)</span>
<span class="ss">:ok</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">get_messages</span><span class="p">(</span><span class="s2">"room1"</span><span class="p">)</span>
<span class="p">[</span><span class="s2">"first message"</span><span class="p">]</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">get_messages</span><span class="p">(</span><span class="s2">"room2"</span><span class="p">)</span>
<span class="p">[</span><span class="s2">"second message"</span><span class="p">]</span>
<span class="n">iex</span><span class="o">></span> <span class="ss">:gproc</span><span class="o">.</span><span class="n">where</span><span class="p">({</span><span class="ss">:n</span><span class="p">,</span> <span class="ss">:l</span><span class="p">,</span> <span class="p">{</span><span class="ss">:chat_room</span><span class="p">,</span> <span class="s2">"room1"</span><span class="p">}})</span> <span class="o">|></span> <span class="no">Process</span><span class="o">.</span><span class="k">exit</span><span class="p">(</span><span class="ss">:kill</span><span class="p">)</span>
<span class="no">true</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">add_message</span><span class="p">(</span><span class="s2">"room1"</span><span class="p">,</span> <span class="s2">"first message"</span><span class="p">)</span>
<span class="ss">:ok</span>
<span class="n">iex</span><span class="o">></span> <span class="no">Chat</span><span class="o">.</span><span class="no">Server</span><span class="o">.</span><span class="n">get_messages</span><span class="p">(</span><span class="s2">"room1"</span><span class="p">)</span>
<span class="p">[</span><span class="s2">"first message"</span><span class="p">]</span>
</code></pre></div></div>
<h4 id="where-to-go-from-here">Where to go from here</h4>
<p>We covered a lot of ground. The main takeaways are:</p>
<ul>
<li>Be careful when dealing with <code class="language-plaintext highlighter-rouge">pid</code>s directly, as they will change when a process is restarted and you may be holding a dead one;</li>
<li>When you need to reference a single process (like when we had just one chat room), registering this process with an <code class="language-plaintext highlighter-rouge">atom</code> name is usually enough;</li>
<li>When you need to create processes dynamically (e.g. for multiple chat rooms) and have a way to reference them, using a <code class="language-plaintext highlighter-rouge">via tuple</code> is a viable solution;</li>
<li>There are tools out there (like <code class="language-plaintext highlighter-rouge">gproc</code>, which we used in our example) that will help you with that, so you don’t need to roll your own registry module.</li>
</ul>
<p>That’s not all, though. If you need global registration across all the nodes in a cluster, some other things should be considered as well.
<code class="language-plaintext highlighter-rouge">Erlang</code> has a <a href="http://erlang.org/doc/man/global.html"><code class="language-plaintext highlighter-rouge">global</code></a> module for global registration, <a href="http://erlang.org/doc/man/pg2.html"><code class="language-plaintext highlighter-rouge">pg2</code></a> for process groups,
and even <a href="https://github.com/uwiger/gproc"><code class="language-plaintext highlighter-rouge">gproc</code></a>, that we used in our examples, can help with that.</p>
<p>If this post piqued your interest, you should definitely check out <a href="https://www.manning.com/books/elixir-in-action">Elixir in Action</a>, by <a href="https://github.com/sasa1977">Saša Jurić</a>.
In this <a href="https://github.com/brianstorti/elixir-registry-example-chat-app">repository</a> you can find all the code we wrote in this example.</p>
Understanding Shell Script's idiom: 2>&12015-11-10T00:00:00+00:00www.brianstorti.com/understanding-shell-script-idiom-redirect<p>When we are working with a programming or scripting language, we are constantly
using some idioms, some things that are done in <em>this certain way</em>, the common
solution to a problem. With Shell Script this is no different, and a quite
common, but not so well understood, idiom is <code class="language-plaintext highlighter-rouge">2>&1</code>, as in<br />
<code class="language-plaintext highlighter-rouge">ls foo > /dev/null 2>&1</code>.<br />
Let me explain what is going on here and why this works the way it does.</p>
<h4 id="a-quick-introduction-to-io-redirection">A quick introduction to I/O redirection</h4>
<p>Simply put, redirection is the mechanism used to send the output of a command to
another place. For instance, if we just <code class="language-plaintext highlighter-rouge">cat</code> a file, its output will be printed
on the screen, by default:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cat </span>foo.txt
foo
bar
baz
</code></pre></div></div>
<p>But we can redirect this output to another place. Here, for example, we are
redirecting it to a file called <code class="language-plaintext highlighter-rouge">output.txt</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cat </span>foo.txt <span class="o">></span> output.txt
<span class="nv">$ </span><span class="nb">cat </span>output.txt
foo
bar
baz
</code></pre></div></div>
<p>Note that in the first <code class="language-plaintext highlighter-rouge">cat</code>, we don’t see any output on the screen. We changed
the <strong>standard output</strong> (<code class="language-plaintext highlighter-rouge">stdout</code>) location to a file, so it doesn’t use the
screen anymore.</p>
<p>It’s also important to know that there is another place, called <strong>standard
error</strong> (<code class="language-plaintext highlighter-rouge">stderr</code>), to which programs can send their error messages. So if we
try to <code class="language-plaintext highlighter-rouge">cat</code> a file that doesn’t exist, like this:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cat </span>nop.txt <span class="o">></span> output.txt
<span class="nb">cat</span>: nop.txt: No such file or directory
</code></pre></div></div>
<p>Even if we redirect the <code class="language-plaintext highlighter-rouge">stdout</code> to a file, we still see the error output on the
screen, because we are redirecting just the standard output, not the standard
error.</p>
<h4 id="and-a-quick-introduction-to-file-descriptors">And a quick introduction to file descriptors</h4>
<p>A file descriptor is nothing more than a non-negative integer that represents an
open file. If you have 100 open files, you will have 100 file descriptors for
them.</p>
<p>The only caveat is that, in Unix systems, <a href="https://en.wikipedia.org/wiki/Everything_is_a_file"><em>everything is a
file</em></a>. But that’s not
really important now; we just need to know that there are file descriptors for
the Standard Output (<code class="language-plaintext highlighter-rouge">stdout</code>) and Standard Error (<code class="language-plaintext highlighter-rouge">stderr</code>).</p>
<p>In plain English, it means that there are “ids” that identify these two
locations, and it will always be <code class="language-plaintext highlighter-rouge">1</code> for <code class="language-plaintext highlighter-rouge">stdout</code> and <code class="language-plaintext highlighter-rouge">2</code> for <code class="language-plaintext highlighter-rouge">stderr</code>.</p>
<h4 id="putting-the-pieces-together">Putting the pieces together</h4>
<p>Going back to our first example, when we redirected the output of <code class="language-plaintext highlighter-rouge">cat foo.txt</code>
to <code class="language-plaintext highlighter-rouge">output.txt</code>, we could rewrite the command like this:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cat </span>foo.txt 1> output.txt
</code></pre></div></div>
<p>This <code class="language-plaintext highlighter-rouge">1</code> is just the file descriptor for <code class="language-plaintext highlighter-rouge">stdout</code>. The syntax for redirecting is
<code class="language-plaintext highlighter-rouge">[FILE_DESCRIPTOR]></code>, leaving the file descriptor out is just a shortcut to
<code class="language-plaintext highlighter-rouge">1></code>.</p>
<p>So, to redirect <code class="language-plaintext highlighter-rouge">stderr</code>, it should be just a matter of adding the right file
descriptor in place:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Using stderr file descriptor (2) to redirect the errors to a file</span>
<span class="nv">$ </span><span class="nb">cat </span>nop.txt 2> error.txt
<span class="nv">$ </span><span class="nb">cat </span>error.txt
<span class="nb">cat</span>: nop.txt: No such file or directory
</code></pre></div></div>
<p>At this point you probably already know what the <code class="language-plaintext highlighter-rouge">2>&1</code> idiom is doing, but
let’s make it official.</p>
<p>You use <code class="language-plaintext highlighter-rouge">&1</code> to reference the value of file descriptor 1 (<code class="language-plaintext highlighter-rouge">stdout</code>). So when
you use <code class="language-plaintext highlighter-rouge">2>&1</code> you are basically saying “Redirect <code class="language-plaintext highlighter-rouge">stderr</code> to the same place
we are redirecting <code class="language-plaintext highlighter-rouge">stdout</code>”. And that’s why we can do something like this
to redirect both <code class="language-plaintext highlighter-rouge">stdout</code> and <code class="language-plaintext highlighter-rouge">stderr</code> to the same place:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cat </span>foo.txt <span class="o">></span> output.txt 2>&1
<span class="nv">$ </span><span class="nb">cat </span>output.txt
foo
bar
baz
<span class="nv">$ </span><span class="nb">cat </span>nop.txt <span class="o">></span> output.txt 2>&1
<span class="nv">$ </span><span class="nb">cat </span>output.txt
<span class="nb">cat</span>: nop.txt: No such file or directory
</code></pre></div></div>
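<p>One subtlety worth noting: the shell processes redirections from left to right, so the position of <code class="language-plaintext highlighter-rouge">2>&1</code> matters. A quick sketch, again assuming <code class="language-plaintext highlighter-rouge">nop.txt</code> does not exist:</p>

```shell
# Redirections are applied left to right.

# stdout is pointed at out.txt first, then 2>&1 points stderr at the
# same place, so the error message lands in out.txt:
cat nop.txt > out.txt 2>&1 || true   # || true just ignores cat's exit code

# Here 2>&1 runs first, while stdout still points at the screen, so
# the error goes to the screen and out2.txt ends up empty:
cat nop.txt 2>&1 > out2.txt || true
```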
<h4 id="recap">Recap</h4>
<ul>
<li>There are two places programs send output to: Standard output (<code class="language-plaintext highlighter-rouge">stdout</code>) and Standard Error (<code class="language-plaintext highlighter-rouge">stderr</code>);</li>
<li>You can redirect these outputs to a different place (like a file);</li>
<li>File descriptors are used to identify <code class="language-plaintext highlighter-rouge">stdout</code> (1) and <code class="language-plaintext highlighter-rouge">stderr</code> (2);</li>
<li><code class="language-plaintext highlighter-rouge">command > output</code> is just a shortcut for <code class="language-plaintext highlighter-rouge">command 1> output</code>;</li>
<li>You can use <code class="language-plaintext highlighter-rouge">&[FILE_DESCRIPTOR]</code> to reference a file descriptor value;</li>
<li>Using <code class="language-plaintext highlighter-rouge">2>&1</code> will redirect <code class="language-plaintext highlighter-rouge">stderr</code> to whatever value is set to <code class="language-plaintext highlighter-rouge">stdout</code> (and <code class="language-plaintext highlighter-rouge">1>&2</code> will do the opposite).</li>
</ul>
<p>And if you want to learn more about Shell Script, I highly recommend the <a href="https://amzn.to/3oeMniV">Classic
Shell Scripting</a> book.</p>
Getting started with Plug in Elixir2015-10-25T00:00:00+00:00www.brianstorti.com/getting-started-with-plug-elixir<p>In the <code class="language-plaintext highlighter-rouge">Elixir</code> world, <a href="https://github.com/elixir-lang/plug"><code class="language-plaintext highlighter-rouge">Plug</code></a> is the specification that enables different frameworks to talk to different web servers in the <code class="language-plaintext highlighter-rouge">Erlang</code> VM.
If you are familiar with <code class="language-plaintext highlighter-rouge">Ruby</code>, <code class="language-plaintext highlighter-rouge">Plug</code> tries to solve the same problem that <code class="language-plaintext highlighter-rouge">Rack</code> does, just with a different approach.<br />
Understanding the basics of how <code class="language-plaintext highlighter-rouge">Plug</code> works will make it easier to get up to speed with <code class="language-plaintext highlighter-rouge">Phoenix</code>, and probably any other web framework
that is created for <code class="language-plaintext highlighter-rouge">Elixir</code>.</p>
<h3 id="the-role-of-a-plug">The role of a plug</h3>
<p>You can think of a <code class="language-plaintext highlighter-rouge">Plug</code> as a <em>piece of code</em> that receives a data structure, does some sort of transformation, and returns
this same data structure, slightly modified. This data structure that a <code class="language-plaintext highlighter-rouge">Plug</code> receives and returns is usually called <code class="language-plaintext highlighter-rouge">connection</code>,
and represents everything that there is to know about a request.</p>
<p>As plugs always receive and return a <code class="language-plaintext highlighter-rouge">connection</code>, they can be easily composed, forming what is called a <em>Plug pipeline</em>. Actually,
that is what usually happens: we receive a request, then each <code class="language-plaintext highlighter-rouge">plug</code> transforms this request a little bit and passes the result to the next plug, until we
get a response.</p>
<p>This <code class="language-plaintext highlighter-rouge">connection</code> that our plugs will be dealing with all the time is a simple <a href="http://elixir-lang.org/getting-started/structs.html"><code class="language-plaintext highlighter-rouge">Elixir</code> struct</a>, called <code class="language-plaintext highlighter-rouge">%Plug.Conn{}</code>, which is <a href="http://hexdocs.pm/plug/Plug.Conn.html">very well documented</a>.</p>
<p><img src="/assets/images/plug.png" /></p>
<h3 id="the-two-types-of-plugs">The two types of Plugs</h3>
<p>There are two types of <code class="language-plaintext highlighter-rouge">Plug</code>s we can have: Function plugs and module plugs.</p>
<p>A <strong>function plug</strong> is any function that receives a <code class="language-plaintext highlighter-rouge">connection</code> (that is a <code class="language-plaintext highlighter-rouge">%Plug.Conn{}</code>) and a set of options, and returns a <code class="language-plaintext highlighter-rouge">connection</code>. Here is a simple example of a valid <code class="language-plaintext highlighter-rouge">Plug</code>:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="n">my_plug</span><span class="p">(</span><span class="n">conn</span><span class="p">,</span> <span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="n">conn</span>
<span class="k">end</span>
</code></pre></div></div>
<p>A <strong>module plug</strong> is any module that implements two functions: <code class="language-plaintext highlighter-rouge">init/1</code> and <code class="language-plaintext highlighter-rouge">call/2</code>, like this:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">MyPlug</span> <span class="k">do</span>
<span class="k">def</span> <span class="n">init</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="n">opts</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">call</span><span class="p">(</span><span class="n">conn</span><span class="p">,</span> <span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="n">conn</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>One interesting characteristic of module plugs is that <code class="language-plaintext highlighter-rouge">init/1</code> is executed at compile time, while <code class="language-plaintext highlighter-rouge">call/2</code> happens at run time.</p>
<p>The value returned by <code class="language-plaintext highlighter-rouge">init/1</code> will be passed to <code class="language-plaintext highlighter-rouge">call/2</code>, making <code class="language-plaintext highlighter-rouge">init/1</code> the perfect place to do any heavy lifting and let
<code class="language-plaintext highlighter-rouge">call/2</code> run as fast as possible at run time.</p>
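<p>The same init/call split is easy to mimic in any language. Here is a hedged sketch in Python (the <code class="language-plaintext highlighter-rouge">GreetPlug</code> name and its options are invented for illustration): the expensive work happens once in <code class="language-plaintext highlighter-rouge">init</code>, and <code class="language-plaintext highlighter-rouge">call</code> stays cheap because it runs for every request.</p>

```python
class GreetPlug:
    """Module-plug shape: init runs once at setup time (do the heavy
    lifting here), call runs for every request and should stay fast."""

    @staticmethod
    def init(opts):
        opts = dict(opts)
        # Pretend this is expensive work we only want to do once.
        opts.setdefault("greeting", "Hello")
        opts["greeting"] = opts["greeting"].upper()
        return opts

    @staticmethod
    def call(conn, opts):
        # Only cheap work per request: read the precomputed options.
        conn["resp_body"] = f"{opts['greeting']}, World!"
        return conn

opts = GreetPlug.init({})       # done once
conn = GreetPlug.call({}, opts) # done per request
```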
<h3 id="a-simple-example">A simple example</h3>
<p>To try to make things more concrete, let’s create a simple application that uses a plug to handle an http request.</p>
<p>First, create a project with <code class="language-plaintext highlighter-rouge">mix</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mix new learning_plug
</code></pre></div></div>
<p>Then <code class="language-plaintext highlighter-rouge">cd</code> into the project’s directory and edit <code class="language-plaintext highlighter-rouge">mix.exs</code> adding <code class="language-plaintext highlighter-rouge">Plug</code> and <code class="language-plaintext highlighter-rouge">Cowboy</code> (the web server) as dependencies:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defp</span> <span class="n">deps</span> <span class="k">do</span>
<span class="p">[{</span><span class="ss">:plug</span><span class="p">,</span> <span class="s2">"~> 1.0"</span><span class="p">},</span>
<span class="p">{</span><span class="ss">:cowboy</span><span class="p">,</span> <span class="s2">"~> 1.0"</span><span class="p">}]</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Now run <code class="language-plaintext highlighter-rouge">mix deps.get</code> to install these dependencies and we should be good to start.</p>
<p>Our first plug will simply return a “Hello, World!” text:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">LearningPlug</span> <span class="k">do</span>
<span class="c1"># The Plug.Conn module gives us the main functions</span>
<span class="c1"># we will use to work with our connection, which is</span>
<span class="c1"># a %Plug.Conn{} struct, also defined in this module.</span>
<span class="kn">import</span> <span class="no">Plug</span><span class="o">.</span><span class="no">Conn</span>
<span class="k">def</span> <span class="n">init</span><span class="p">(</span><span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="c1"># Here we just add a new entry in the opts map, that we can use</span>
<span class="c1"># in the call/2 function</span>
<span class="no">Map</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="n">opts</span><span class="p">,</span> <span class="ss">:my_option</span><span class="p">,</span> <span class="s2">"Hello"</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">call</span><span class="p">(</span><span class="n">conn</span><span class="p">,</span> <span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="c1"># And we send a response back, with a status code and a body</span>
<span class="n">send_resp</span><span class="p">(</span><span class="n">conn</span><span class="p">,</span> <span class="mi">200</span><span class="p">,</span> <span class="s2">"</span><span class="si">#{</span><span class="n">opts</span><span class="p">[</span><span class="ss">:my_option</span><span class="p">]</span><span class="si">}</span><span class="s2">, World!"</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>To use this plug, open <code class="language-plaintext highlighter-rouge">iex -S mix</code> and run:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Plug.Adapters.Cowboy.http(LearningPlug, %{})
{:ok, #PID<0.150.0>}
</code></pre></div></div>
<p>Here we use the <code class="language-plaintext highlighter-rouge">Cowboy</code> adapter, and tell it to use our plug. We also need to pass an <code class="language-plaintext highlighter-rouge">options</code> value that will
be used by <code class="language-plaintext highlighter-rouge">init/1</code>.<br />
This should have started a <code class="language-plaintext highlighter-rouge">Cowboy</code> web server on port 4000, so if you open <code class="language-plaintext highlighter-rouge">http://localhost:4000</code> you should see the “Hello, World!” message.</p>
<p>This was simple enough. Let’s make this <code class="language-plaintext highlighter-rouge">plug</code> a bit smarter and return a response based on the URL we hit,
so if we access <code class="language-plaintext highlighter-rouge">http://localhost:4000/Name</code>, we should see “Hello, Name”.</p>
<p>I said that a <code class="language-plaintext highlighter-rouge">connection</code> represents everything there is to know about a request, and that includes the request path. We can just pattern match
on this request path to create the response we want. Let’s change the <code class="language-plaintext highlighter-rouge">call/2</code> function to be like this:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="n">call</span><span class="p">(%</span><span class="no">Plug</span><span class="o">.</span><span class="no">Conn</span><span class="p">{</span><span class="ss">request_path:</span> <span class="s2">"/"</span> <span class="o"><></span> <span class="n">name</span><span class="p">}</span> <span class="o">=</span> <span class="n">conn</span><span class="p">,</span> <span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="n">send_resp</span><span class="p">(</span><span class="n">conn</span><span class="p">,</span> <span class="mi">200</span><span class="p">,</span> <span class="s2">"Hello, </span><span class="si">#{</span><span class="n">name</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>
<p>And that’s it. We pattern match the <code class="language-plaintext highlighter-rouge">connection</code> to extract just the information we want (the name), and then send the response back through
the web server.</p>
<h3 id="pipelines-because-one-ant-is-no-ant">Pipelines, because one ant is no ant</h3>
<p><code class="language-plaintext highlighter-rouge">Plug</code> gets more interesting when you start composing multiple plugs together, each one doing a small task and handing a modified <code class="language-plaintext highlighter-rouge">connection</code> to the next.<br />
<code class="language-plaintext highlighter-rouge">Phoenix</code>, the web framework, uses these pipelines in a clever way. By default, if we are handling a normal browser request, we have a pipeline like this:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pipeline</span> <span class="ss">:browser</span> <span class="k">do</span>
<span class="n">plug</span> <span class="ss">:accepts</span><span class="p">,</span> <span class="p">[</span><span class="s2">"html"</span><span class="p">]</span>
<span class="n">plug</span> <span class="ss">:fetch_session</span>
<span class="n">plug</span> <span class="ss">:fetch_flash</span>
<span class="n">plug</span> <span class="ss">:protect_from_forgery</span>
<span class="n">plug</span> <span class="ss">:put_secure_browser_headers</span>
<span class="k">end</span>
</code></pre></div></div>
<p>In case we are handling an api request, we don’t need most of these things, so we can have a simpler pipeline just for our api:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pipeline</span> <span class="ss">:api</span> <span class="k">do</span>
<span class="n">plug</span> <span class="ss">:accepts</span><span class="p">,</span> <span class="p">[</span><span class="s2">"json"</span><span class="p">]</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Now, this <code class="language-plaintext highlighter-rouge">pipeline</code> is a <code class="language-plaintext highlighter-rouge">Phoenix</code> abstraction, but <code class="language-plaintext highlighter-rouge">Plug</code> gives us an easy way to build our own pipelines: <code class="language-plaintext highlighter-rouge">Plug.Builder</code>.</p>
<p>Here’s an example of how it works:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">defmodule</span> <span class="no">MyPipeline</span> <span class="k">do</span>
<span class="c1"># We use Plug.Builder to have access to the plug/2 macro.</span>
<span class="c1"># This macro can receive a function or a module plug and an</span>
<span class="c1"># optional parameter that will be passed unchanged to the </span>
<span class="c1"># given plug.</span>
<span class="kn">use</span> <span class="no">Plug</span><span class="o">.</span><span class="no">Builder</span>
<span class="n">plug</span> <span class="no">Plug</span><span class="o">.</span><span class="no">Logger</span>
<span class="n">plug</span> <span class="ss">:extract_name</span>
<span class="n">plug</span> <span class="ss">:greet</span><span class="p">,</span> <span class="p">%{</span><span class="ss">my_option:</span> <span class="s2">"Hello"</span><span class="p">}</span>
<span class="k">def</span> <span class="n">extract_name</span><span class="p">(%</span><span class="no">Plug</span><span class="o">.</span><span class="no">Conn</span><span class="p">{</span><span class="ss">request_path:</span> <span class="s2">"/"</span> <span class="o"><></span> <span class="n">name</span><span class="p">}</span> <span class="o">=</span> <span class="n">conn</span><span class="p">,</span> <span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="n">assign</span><span class="p">(</span><span class="n">conn</span><span class="p">,</span> <span class="ss">:name</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="n">greet</span><span class="p">(</span><span class="n">conn</span><span class="p">,</span> <span class="n">opts</span><span class="p">)</span> <span class="k">do</span>
<span class="n">conn</span>
<span class="o">|></span> <span class="n">send_resp</span><span class="p">(</span><span class="mi">200</span><span class="p">,</span> <span class="s2">"</span><span class="si">#{</span><span class="n">opts</span><span class="p">[</span><span class="ss">:my_option</span><span class="p">]</span><span class="si">}</span><span class="s2">, </span><span class="si">#{</span><span class="n">conn</span><span class="o">.</span><span class="n">assigns</span><span class="o">.</span><span class="n">name</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Here we combined three plugs, <code class="language-plaintext highlighter-rouge">Plug.Logger</code>, <code class="language-plaintext highlighter-rouge">extract_name</code> and <code class="language-plaintext highlighter-rouge">greet</code>.<br />
The <code class="language-plaintext highlighter-rouge">extract_name</code> uses <code class="language-plaintext highlighter-rouge">assign/3</code> to assign a value to a key in this connection. <code class="language-plaintext highlighter-rouge">assign/3</code> returns a modified <code class="language-plaintext highlighter-rouge">connection</code>, that
is then handed to the <code class="language-plaintext highlighter-rouge">greet</code> plug, that basically reads this assigned value to create the response we want.</p>
<p><code class="language-plaintext highlighter-rouge">Plug.Logger</code> is shipped with <code class="language-plaintext highlighter-rouge">Plug</code> and, as you probably guessed, is used to log our http requests. A bunch of useful plugs like this
are available out of the box; you can find the list and descriptions in the <a href="https://hexdocs.pm/plug/readme.html">docs</a> (“Available Plugs” section).</p>
<p>Using this pipeline is as simple as using a single plug:</p>
<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Plug</span><span class="o">.</span><span class="no">Adapters</span><span class="o">.</span><span class="no">Cowboy</span><span class="o">.</span><span class="n">http</span> <span class="no">MyPipeline</span><span class="p">,</span> <span class="p">%{}</span>
</code></pre></div></div>
<p>One important thing to keep in mind is that the plugs will always be executed in the order they are defined in the pipeline.</p>
<p>Another interesting thing is that these pipelines created with <code class="language-plaintext highlighter-rouge">Plug.Builder</code> are also plugs, so we can have pipelines that are
composed of other pipelines.</p>
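<p>The shape behind this composition is small enough to sketch in a few lines. Here is an illustration in Python (all names invented for the example): each plug is a function from a connection to a connection, a pipeline is a fold over a list of plugs, and the result has exactly the same shape as a plug, which is why pipelines can be nested inside other pipelines.</p>

```python
from functools import reduce

# A "conn" is just a dict here; each plug takes a conn and returns a
# (possibly modified) conn -- the same contract Plug defines.
def extract_name(conn):
    conn["name"] = conn["request_path"].lstrip("/")
    return conn

def greet(conn):
    conn["resp_body"] = f"Hello, {conn['name']}"
    return conn

def pipeline(plugs):
    # The returned function is itself conn -> conn, so a pipeline
    # can be used anywhere a plug can.
    return lambda conn: reduce(lambda c, plug: plug(c), plugs, conn)

handler = pipeline([extract_name, greet])
conn = handler({"request_path": "/World"})
```

<p>Because <code class="language-plaintext highlighter-rouge">handler</code> has the plug shape, <code class="language-plaintext highlighter-rouge">pipeline([handler])</code> works just as well, which is the nesting property described above.</p>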
<h3 id="and-to-sum-up">And to sum up</h3>
<p>The main idea is that we have our request represented as a <code class="language-plaintext highlighter-rouge">%Plug.Conn{}</code>, and this struct is passed from function to function, being
slightly modified in each step, until we have a response that can be sent back. <code class="language-plaintext highlighter-rouge">Plug</code> is a specification that defines how this should work
and creates an abstraction so multiple frameworks can talk to multiple web servers, as long as they respect the specification.</p>
<p>It also ships with convenience modules that make it easier to do a lot of things that are common to most applications, like creating pipelines and
simple routers, dealing with cookies, headers, etc.</p>
<p>At the end of the day, it’s just that simple functional programming idea of passing data through functions until we get the result we want,
and in this case the data happens to be an http request.</p>
The actor model in 10 minutes2015-07-09T00:00:00+00:00www.brianstorti.com/the-actor-model<p>Our CPUs are not getting any faster. What’s happening is that we now have
multiple cores on them. If we want to take advantage of all this hardware we
have available now, we need a way to run our code concurrently. Decades of
untraceable bugs and developers’ depression have shown that <code class="language-plaintext highlighter-rouge">threads</code> are not
the way to go. But fear not, there are great alternatives out there and today I
want to show you one of them: The actor model.</p>
<h3 id="the-model">The model</h3>
<p>The actor model is a conceptual model to deal with concurrent computation. It
defines some general rules for how the system’s components should behave and
interact with each other. The most famous language that uses this model is
probably <code class="language-plaintext highlighter-rouge">Erlang</code>. I’ll try to focus more on the model itself and not on how
it’s implemented in different languages or libraries.</p>
<h3 id="actors">Actors</h3>
<p>An actor is the primitive unit of computation. It’s the <em>thing</em> that receives a
message and does some kind of computation based on it.</p>
<p>The idea is very similar to what we have in object-oriented languages: An object
receives a message (a method call) and does something depending on which message
it receives (which method we are calling).</p>
<p>The main difference is that actors are completely isolated from each other and
they will never share memory. It’s also worth noting that an actor can maintain
a private state that can never be changed directly by another actor.</p>
<h5 id="one-ant-is-no-ant">One ant is no ant</h5>
<p>And one actor is no actor. They come in systems. In the actor model everything
is an actor and they need to have addresses so one actor can send a message to
another.</p>
<h5 id="actors-have-mailboxes">Actors have mailboxes</h5>
<p>It’s important to understand that, although multiple actors can run at the same
time, a given actor processes its messages sequentially.
This means that if you send 3 messages to the <strong>same</strong> actor, it will just
execute one at a time. To have these 3 messages processed concurrently, you
need to create 3 actors and send one message to each.</p>
<p>Messages are sent asynchronously to an actor, which needs to store them somewhere
while it’s processing another message. The mailbox is the place where these
messages are stored.</p>
<p><img src="/assets/images/actors.png" /></p>
<div class="image-description">
Actors communicate with each other by sending asynchronous messages. Those messages are stored in other actors' mailboxes until they're processed.
</div>
<hr />
<h5 id="what-actors-do">What actors do</h5>
<p>When an actor receives a message, it can do one of these 3 things:</p>
<ul>
<li>Create more actors</li>
<li>Send messages to other actors</li>
<li>Designate what to do with the next message</li>
</ul>
<p>The first two bullet points are pretty straightforward, but the last one is interesting.<br />
I said before that an actor can maintain a private state. “Designating what to
do with the next message” basically means defining what this state will look like
for the next message it receives. Or, more clearly, it’s how actors mutate
state.</p>
<p>Let’s imagine we have an actor that behaves like a calculator and that its
initial state is simply the number <code class="language-plaintext highlighter-rouge">0</code>. When this actor receives the <code class="language-plaintext highlighter-rouge">add(1)</code>
message, instead of mutating its original state, it designates that for the next
message it receives, the state will be <code class="language-plaintext highlighter-rouge">1</code>.</p>
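<p>This loop-with-a-mailbox idea can be sketched in a few lines of Python (a toy for illustration, not how Erlang implements processes; the <code class="language-plaintext highlighter-rouge">CalculatorActor</code> name and message shapes are made up): the queue is the mailbox, the single thread guarantees one-message-at-a-time processing, and the state variable is rebound for the next loop iteration instead of being mutated from outside.</p>

```python
import queue
import threading

class CalculatorActor:
    """A minimal actor: a mailbox (queue) drained by a single thread,
    so messages are processed one at a time, and state that is only
    ever touched from inside that thread."""

    def __init__(self):
        self.mailbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def send(self, message):
        self.mailbox.put(message)   # asynchronous: returns immediately

    def _run(self):
        state = 0                   # private state, starts at 0
        while True:
            message, *args = self.mailbox.get()
            if message == "add":
                # "designate what to do with the next message": the
                # next iteration of the loop sees the new state
                state = state + args[0]
            elif message == "get":
                args[0].put(state)  # reply on a channel the sender gave us

calc = CalculatorActor()
calc.send(("add", 1))
calc.send(("add", 2))
reply = queue.Queue()
calc.send(("get", reply))
```

<p>Note that nothing outside the actor ever reads or writes <code class="language-plaintext highlighter-rouge">state</code> directly; the only way to interact with it is by sending messages.</p>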
<h3 id="fault-tolerance">Fault tolerance</h3>
<p><code class="language-plaintext highlighter-rouge">Erlang</code> introduced the “let it crash” philosophy. The idea is that you
shouldn’t need to program defensively, trying to anticipate all the possible
problems that could happen and find a way to handle them, simply because there
is no way to think about every single failure point.</p>
<p>What <code class="language-plaintext highlighter-rouge">Erlang</code> does is simply let it crash, but have this critical code
supervised by someone whose only responsibility is to know what to do when a
crash happens (like resetting this unit of code to a stable state), and what
makes it all possible is the actor model.</p>
<p>All code runs inside a <code class="language-plaintext highlighter-rouge">process</code> (which is basically what <code class="language-plaintext highlighter-rouge">Erlang</code> calls its
actors). This <code class="language-plaintext highlighter-rouge">process</code> is completely isolated, meaning its state is not going
to influence any other <code class="language-plaintext highlighter-rouge">process</code>. We have a supervisor, which is basically
another <code class="language-plaintext highlighter-rouge">process</code> (everything is an actor, remember?), that will be notified
when the supervised <code class="language-plaintext highlighter-rouge">process</code> crashes and can then do something about it.</p>
<p>This makes it possible to create systems that “self-heal”, meaning that if an
actor gets to an exceptional state and crashes, for whatever reason, a supervisor
can do something about it to try to put it in a consistent state again (and
there are multiple strategies to do that, the most common being just to restart
the actor with its initial state).</p>
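<p>Stripped of everything Erlang-specific, the restart strategy itself is tiny. A hedged Python sketch (<code class="language-plaintext highlighter-rouge">supervise</code> and <code class="language-plaintext highlighter-rouge">flaky_worker</code> are invented names; a real supervisor would itself be an actor watching a separate process, not a loop in the same thread):</p>

```python
def supervise(start_worker, max_restarts=5):
    """Restart-on-crash supervision: run the worker, and if it dies
    with an exception, start a fresh one from its initial state (the
    simplest strategy). Returns how many restarts were needed."""
    restarts = 0
    while True:
        try:
            start_worker()
            return restarts  # worker finished normally
        except Exception:
            if restarts == max_restarts:
                raise        # give up and escalate the crash
            restarts += 1    # "let it crash", then restart

attempts = []

def flaky_worker():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("crashed")  # the first two runs die
    # the third run reaches a consistent state and finishes

restarts = supervise(flaky_worker)
```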
<h3 id="distribution">Distribution</h3>
<p>Another interesting aspect of the actor model is that it doesn’t matter if the
actor that I’m sending a message to is running locally or in another node.</p>
<p>Think about it: if an actor is just this unit of code with a mailbox and an
internal state, and it just responds to messages, who cares which machine it’s
actually running on? As long as we can make the <strong>message</strong> get there, we are fine.<br />
This allows us to create systems that leverage multiple computers and helps us
recover if one of them fails.</p>
<h3 id="next-steps-and-other-resources">Next steps and other resources</h3>
<p>This was a quick overview of the conceptual model that is the base of great
languages like <code class="language-plaintext highlighter-rouge">Erlang</code> and <code class="language-plaintext highlighter-rouge">Elixir</code> and libraries like <code class="language-plaintext highlighter-rouge">Akka</code> (for the <code class="language-plaintext highlighter-rouge">JVM</code>)
and <code class="language-plaintext highlighter-rouge">Celluloid</code> (for <code class="language-plaintext highlighter-rouge">Ruby</code>).</p>
<p>If I succeeded in making you curious about how this model is implemented
and used in the real world, here is the list of books that I have read or am reading
about this topic and can recommend:</p>
<ul>
<li><a href="https://amzn.to/3rznII1">Seven Concurrency Models in Seven Weeks: When Threads Unravel</a></li>
<li><a href="https://amzn.to/3xMZcUL">Programming Elixir</a></li>
<li><a href="https://amzn.to/3EhOpV2">Elixir in Action</a></li>
</ul>
<p>And if you are interested in more details about the conceptual idea itself, I
can’t recommend this video enough:</p>
<iframe width="100%" height="500" src="https://www.youtube.com/embed/7erJ1DV_Tlo" frameborder="0" allowfullscreen=""></iframe>
Rethinking your shebang2015-04-22T00:00:00+00:00www.brianstorti.com/rethinking-your-shebang<p>Are you using <code class="language-plaintext highlighter-rouge">#!/bin/{bash,zsh,sh}</code> in your shebang? Most scripts that I have to deal with are, and it sucks. Let me show you why using <code class="language-plaintext highlighter-rouge">#!/usr/bin/env {bash,zsh,sh}</code> is better, most of the time. If you are already using <code class="language-plaintext highlighter-rouge">#!/usr/bin/env</code>,
learn why you shouldn’t just use it blindly.</p>
<h4 id="why-is-binbash-bad">Why is <code class="language-plaintext highlighter-rouge">/bin/bash</code> bad?</h4>
<p>First, it assumes that <code class="language-plaintext highlighter-rouge">bash</code> (or whatever you are using) is installed in that specific location, in every system it’s going to run. Although this is the case in most systems, there are exceptions. In OpenBSD, for example, <code class="language-plaintext highlighter-rouge">bash</code> is
an optional package and is located at <code class="language-plaintext highlighter-rouge">/usr/local/bin/bash</code>.</p>
<p>But even if I’m using a system where <code class="language-plaintext highlighter-rouge">bash</code> is installed at <code class="language-plaintext highlighter-rouge">/bin/bash</code>, I might be using a different <code class="language-plaintext highlighter-rouge">bash</code> version, installed in a different location.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>bash <span class="nt">--version</span>
version 4.3.33
<span class="nv">$ </span>/bin/bash <span class="nt">--version</span>
version 3.2.51
</code></pre></div></div>
<p>What’s happening here is that I used <code class="language-plaintext highlighter-rouge">homebrew</code> to install a newer version of <code class="language-plaintext highlighter-rouge">bash</code>, and all these <code class="language-plaintext highlighter-rouge">homebrew</code> packages are installed at <code class="language-plaintext highlighter-rouge">/usr/local/bin/</code>. If this script uses a <code class="language-plaintext highlighter-rouge">bash</code> 4 feature, it will fail just because it’s pointing directly to <code class="language-plaintext highlighter-rouge">bash</code> 3.</p>
<h4 id="using-the-path-instead">Using the <code class="language-plaintext highlighter-rouge">$PATH</code> instead</h4>
<p>So, how does <code class="language-plaintext highlighter-rouge">/usr/bin/env</code> solve this problem? Well, instead of hard-coding a location, you are telling your script to look for <code class="language-plaintext highlighter-rouge">bash</code> in the system’s <code class="language-plaintext highlighter-rouge">$PATH</code>. That means it doesn’t matter where <code class="language-plaintext highlighter-rouge">bash</code> is, or how many different versions you have installed,
as long as it’s in your <code class="language-plaintext highlighter-rouge">$PATH</code>, it will be found. This will solve both problems that I showed before.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">export </span><span class="nv">PATH</span><span class="o">=</span>/bin/:<span class="nv">$PATH</span>
<span class="nv">$ </span>/usr/bin/env bash <span class="nt">--version</span>
version 3.2.51
<span class="nv">$ </span><span class="nb">export </span><span class="nv">PATH</span><span class="o">=</span>/usr/local/bin:<span class="nv">$PATH</span>
<span class="nv">$ </span>/usr/bin/env bash <span class="nt">--version</span>
version 4.3.33
</code></pre></div></div>
<h4 id="security-concerns">Security concerns</h4>
<p>As we are now using the <code class="language-plaintext highlighter-rouge">$PATH</code> to find what we want to execute, there are some (minor, I’d say) security concerns that need to be taken into consideration when we are dealing with a multi-user environment.<br />
Let’s say I create a malicious script at <code class="language-plaintext highlighter-rouge">/home/brianstorti/evil/bash</code> and somehow trick you into adding this directory to your path (e.g. <code class="language-plaintext highlighter-rouge">export PATH=/home/brianstorti/evil/:$PATH</code>). Now every time <code class="language-plaintext highlighter-rouge">env</code> looks for
<code class="language-plaintext highlighter-rouge">bash</code> in your <code class="language-plaintext highlighter-rouge">$PATH</code>, it will actually find my evil script.</p>
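<p>We can demonstrate the lookup with Python’s <code class="language-plaintext highlighter-rouge">shutil.which</code>, which walks <code class="language-plaintext highlighter-rouge">$PATH</code> in the same order <code class="language-plaintext highlighter-rouge">env</code> does (the “evil” directory here is a temporary one created just for the demo):</p>

```python
import os
import shutil
import stat
import tempfile

# env finds "bash" the same way shutil.which does: by walking $PATH in
# order. Whoever controls an earlier PATH entry controls what gets run.
evil_dir = tempfile.mkdtemp()
evil_bash = os.path.join(evil_dir, "bash")
with open(evil_bash, "w") as f:
    f.write("#!/bin/sh\necho evil\n")
os.chmod(evil_bash, os.stat(evil_bash).st_mode | stat.S_IEXEC)

# Prepend the attacker-controlled directory, then resolve "bash":
path = evil_dir + os.pathsep + os.environ.get("PATH", "")
found = shutil.which("bash", path=path)  # what `env bash` would execute
```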
<h4 id="and-a-last-portability-concern">And a last portability concern</h4>
<p>Also, ironically, there is a portability issue that you have to keep in mind when using <code class="language-plaintext highlighter-rouge">/usr/bin/env</code>.<br />
In some systems the shebang line processing will accept just one interpreter and one argument. This means that if, for instance, you want to run a <code class="language-plaintext highlighter-rouge">Ruby</code> script in warning mode, using <code class="language-plaintext highlighter-rouge">#!/usr/bin/env ruby -w</code> might fail.<br />
<code class="language-plaintext highlighter-rouge">env</code> is the interpreter and everything after it is passed as a single argument, so it tries to find a program
literally named <code class="language-plaintext highlighter-rouge">ruby -w</code>.</p>
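<p>You can reproduce that failure by invoking <code class="language-plaintext highlighter-rouge">env</code> with <code class="language-plaintext highlighter-rouge">ruby -w</code> as a single argument, which is what such a kernel would do. This assumes <code class="language-plaintext highlighter-rouge">/usr/bin/env</code> exists, and doesn’t need Ruby installed at all:</p>

```python
import subprocess

# A kernel that accepts only one shebang argument hands env the string
# "ruby -w" as a single argument, so env looks for a program literally
# named "ruby -w", which doesn't exist.
result = subprocess.run(
    ["/usr/bin/env", "ruby -w"],  # one argument, not two
    capture_output=True,
    text=True,
)
# env exits with status 127 ("command not found")
```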
<h4 id="summarizing">Summarizing</h4>
<ul>
<li>In most cases, using <code class="language-plaintext highlighter-rouge">/usr/bin/env bash</code> will be better than <code class="language-plaintext highlighter-rouge">/bin/bash</code>;</li>
<li>If you are running in a multi-user environment and security is a big concern, forget about <code class="language-plaintext highlighter-rouge">/usr/bin/env</code> (or anything that uses the <code class="language-plaintext highlighter-rouge">$PATH</code>, actually);</li>
<li>If you need an extra argument to your interpreter and you care about portability, <code class="language-plaintext highlighter-rouge">/usr/bin/env</code> may also give you some headaches.</li>
</ul>
Stop using tail -f (mostly)2015-03-12T00:00:00+00:00www.brianstorti.com/stop-using-tail<p>I still see a lot of people using <code class="language-plaintext highlighter-rouge">tail -f</code> to monitor files that are changing, mostly log files. If you are one of them, let me show you a better alternative: <code class="language-plaintext highlighter-rouge">less +F</code></p>
<p>The <code class="language-plaintext highlighter-rouge">less</code> documentation explains well what this <code class="language-plaintext highlighter-rouge">+F</code> is all about:</p>
<blockquote>
<p>Scroll forward, and keep trying to read when the end of file is reached. Normally this command would be used when already at the end of the file. It is a way to monitor the tail of a file which is
growing while it is being viewed. (The behavior is similar to the “tail -f” command.)</p>
</blockquote>
<p>So it says that it’s similar to <code class="language-plaintext highlighter-rouge">tail -f</code>, but why do I think it’s better?</p>
<p>Simply put, it allows you to switch between navigation and watching mode. We’ve all been there: you are watching a file with <code class="language-plaintext highlighter-rouge">tail -f</code>, and then you need to search for something in this file, or just navigate up and down.
Now you need to exit <code class="language-plaintext highlighter-rouge">tail</code> (or open a new shell), and <code class="language-plaintext highlighter-rouge">ack</code> this file or open it with <code class="language-plaintext highlighter-rouge">vim</code> to find what you are looking for. After that, you run <code class="language-plaintext highlighter-rouge">tail</code> again to continue watching the file. There’s no need to do that when
you are using <code class="language-plaintext highlighter-rouge">less</code>.</p>
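<p>For the curious, the “watching” behaviour both tools implement is just a small read loop. A Python sketch of the idea (<code class="language-plaintext highlighter-rouge">follow</code> is an invented helper, with a time limit added so the demo terminates):</p>

```python
import os
import tempfile
import threading
import time

def follow(path, duration=1.0):
    """Start at the end of the file and keep trying to read, sleeping
    briefly whenever no new data has arrived -- what `tail -f` and
    `less +F` do, minus the terminal handling."""
    lines = []
    deadline = time.time() + duration
    with open(path) as f:
        f.seek(0, os.SEEK_END)      # skip content that already exists
        while time.time() < deadline:
            line = f.readline()
            if line:
                lines.append(line.rstrip("\n"))
            else:
                time.sleep(0.05)    # at end of file: wait and retry
    return lines

log = os.path.join(tempfile.mkdtemp(), "production.log")
with open(log, "w") as f:
    f.write("old line\n")

def writer():                       # simulate another process logging
    time.sleep(0.2)
    with open(log, "a") as f:
        f.write("new line\n")

threading.Thread(target=writer).start()
captured = follow(log)              # only sees what arrived while watching
```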
<p>Let’s say you want to watch the file <code class="language-plaintext highlighter-rouge">production.log</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>less +F production.log
Important
log
information
here
Waiting <span class="k">for </span>data... <span class="o">(</span>interrupt to abort<span class="o">)</span>
</code></pre></div></div>
<p>Here you have pretty much the same behaviour you’d get with <code class="language-plaintext highlighter-rouge">tail</code>.</p>
<p>Now let’s say something interesting appears, and you want to search all the occurrences of “foo”. You can just hit <code class="language-plaintext highlighter-rouge">Ctrl-c</code> to go to “normal” <code class="language-plaintext highlighter-rouge">less</code>
mode (as if you had opened the file without the <code class="language-plaintext highlighter-rouge">+F</code> flag), and then you have all the normal <code class="language-plaintext highlighter-rouge">less</code> features you’d expect, including the search with <code class="language-plaintext highlighter-rouge">/foo</code>. You can go to the next or previous occurrence with <code class="language-plaintext highlighter-rouge">n</code> or <code class="language-plaintext highlighter-rouge">N</code>,
up and down with <code class="language-plaintext highlighter-rouge">j</code> and <code class="language-plaintext highlighter-rouge">k</code>, create marks with <code class="language-plaintext highlighter-rouge">m</code> and do all sort of things that <code class="language-plaintext highlighter-rouge">less(1)</code> says you can do.</p>
<p>Once you are done, just hit <code class="language-plaintext highlighter-rouge">F</code> to go back to watching mode again. It’s that easy.</p>
<h1 id="when-not-to-use-less">When not to use less</h1>
<p>When you need to watch multiple files at the same time, <code class="language-plaintext highlighter-rouge">tail -f</code> can actually give you a better output. It will show you something like this:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">tail</span> <span class="nt">-f</span> <span class="k">*</span>.txt
<span class="o">==></span> file1.txt <<span class="o">==</span>
content <span class="k">for </span>first file
<span class="o">==></span> file2.txt <<span class="o">==</span>
content <span class="k">for </span>second file
<span class="o">==></span> file3.txt <<span class="o">==</span>
content <span class="k">for </span>third file
</code></pre></div></div>
<p>When a change happens, it prints the file name and the new content, which is quite handy.</p>
<p>With <code class="language-plaintext highlighter-rouge">less</code>, it would be like this:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>less +F <span class="k">*</span>.txt
content <span class="k">for </span>first file
</code></pre></div></div>
<p>It shows the content of just one file at a time. If you want to see what’s happening in the second file, you need to first <code class="language-plaintext highlighter-rouge">Ctrl-c</code> to go to normal mode, then type <code class="language-plaintext highlighter-rouge">:n</code> to go to the next buffer, and then <code class="language-plaintext highlighter-rouge">F</code> again to go back to the watching mode.</p>
<p>Depending on your needs, it might still be worth using <code class="language-plaintext highlighter-rouge">less</code> for multiple files, but most of the time I just go with <code class="language-plaintext highlighter-rouge">tail</code> for these cases. The important thing is to know the tools we have available and use the right one
for the job at hand.</p>
<blockquote>
<blockquote>
<p>Article on softdroid.net: <a href="http://softdroid.net/perestante-ispolzovat-f-chasto">A blog about files and data: Stop using -f (often)</a></p>
</blockquote>
</blockquote>
Creating a RubyGems plugin2015-03-10T00:00:00+00:00www.brianstorti.com/creating-a-rubygems-plugin<p>Do you know when you install a <code class="language-plaintext highlighter-rouge">gem</code> and it adds a custom command to <code class="language-plaintext highlighter-rouge">RubyGems</code>? Then you can just run <code class="language-plaintext highlighter-rouge">gem <custom-command> <params></code> and it does something cool?
Well, that is just a <code class="language-plaintext highlighter-rouge">RubyGems</code> plugin, and although it’s not very well documented, it’s not that hard to create one.</p>
<h1 id="our-goal">Our goal</h1>
<p>Our goal here is just to understand which pieces we need to put together to create one of these plugins. We are not going to create something amazingly useful.
Here’s what it will do: it will add a <code class="language-plaintext highlighter-rouge">repo</code> command that just opens a GitHub repository in your browser.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>gem repo ruby/ruby
<span class="c"># should open http://github.com/ruby/ruby in your browser</span>
</code></pre></div></div>
<p>So let’s get our hands dirty.</p>
<h1 id="its-just-a-gem">It’s just a gem</h1>
<p>A <code class="language-plaintext highlighter-rouge">RubyGems</code> plugin is just a normal <code class="language-plaintext highlighter-rouge">gem</code>, with some specific characteristics. To create a <code class="language-plaintext highlighter-rouge">gem</code> you can use any template or generator you like. I’ll use <code class="language-plaintext highlighter-rouge">bundle gem repo</code> to create the skeleton for our plugin.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>bundle gem repo
├── Gemfile
├── LICENSE.txt
├── README.md
├── Rakefile
├── bin
│ ├── console
│ └── setup
├── lib
│ ├── repo
│ │ └── version.rb
│ └── repo.rb
└── repo.gemspec
</code></pre></div></div>
<p>And the first step is to update the <code class="language-plaintext highlighter-rouge">repo.gemspec</code> file with your plugin details. It could look something like this:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># coding: utf-8</span>
<span class="n">lib</span> <span class="o">=</span> <span class="no">File</span><span class="p">.</span><span class="nf">expand_path</span><span class="p">(</span><span class="s1">'../lib'</span><span class="p">,</span> <span class="kp">__FILE__</span><span class="p">)</span>
<span class="vg">$LOAD_PATH</span><span class="p">.</span><span class="nf">unshift</span><span class="p">(</span><span class="n">lib</span><span class="p">)</span> <span class="k">unless</span> <span class="vg">$LOAD_PATH</span><span class="p">.</span><span class="nf">include?</span><span class="p">(</span><span class="n">lib</span><span class="p">)</span>
<span class="nb">require</span> <span class="s1">'repo/version'</span>
<span class="no">Gem</span><span class="o">::</span><span class="no">Specification</span><span class="p">.</span><span class="nf">new</span> <span class="k">do</span> <span class="o">|</span><span class="n">spec</span><span class="o">|</span>
<span class="n">spec</span><span class="p">.</span><span class="nf">name</span> <span class="o">=</span> <span class="s2">"repo"</span>
<span class="n">spec</span><span class="p">.</span><span class="nf">version</span> <span class="o">=</span> <span class="no">Repo</span><span class="o">::</span><span class="no">VERSION</span>
<span class="n">spec</span><span class="p">.</span><span class="nf">authors</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"Your name"</span><span class="p">]</span>
<span class="n">spec</span><span class="p">.</span><span class="nf">email</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"your@email.com"</span><span class="p">]</span>
<span class="n">spec</span><span class="p">.</span><span class="nf">summary</span> <span class="o">=</span> <span class="sx">%q{Opens github repo}</span>
<span class="n">spec</span><span class="p">.</span><span class="nf">description</span> <span class="o">=</span> <span class="sx">%q{Opens github repo}</span>
<span class="n">spec</span><span class="p">.</span><span class="nf">license</span> <span class="o">=</span> <span class="s2">"MIT"</span>
<span class="n">spec</span><span class="p">.</span><span class="nf">files</span> <span class="o">=</span> <span class="sb">`git ls-files -z`</span><span class="p">.</span><span class="nf">split</span><span class="p">(</span><span class="s2">"</span><span class="se">\x0</span><span class="s2">"</span><span class="p">).</span><span class="nf">reject</span> <span class="p">{</span> <span class="o">|</span><span class="n">f</span><span class="o">|</span> <span class="n">f</span><span class="p">.</span><span class="nf">match</span><span class="p">(</span><span class="sr">%r{^(test|spec|features)/}</span><span class="p">)</span> <span class="p">}</span>
<span class="n">spec</span><span class="p">.</span><span class="nf">bindir</span> <span class="o">=</span> <span class="s2">"exe"</span>
<span class="n">spec</span><span class="p">.</span><span class="nf">executables</span> <span class="o">=</span> <span class="n">spec</span><span class="p">.</span><span class="nf">files</span><span class="p">.</span><span class="nf">grep</span><span class="p">(</span><span class="sr">%r{^exe/}</span><span class="p">)</span> <span class="p">{</span> <span class="o">|</span><span class="n">f</span><span class="o">|</span> <span class="no">File</span><span class="p">.</span><span class="nf">basename</span><span class="p">(</span><span class="n">f</span><span class="p">)</span> <span class="p">}</span>
<span class="n">spec</span><span class="p">.</span><span class="nf">require_paths</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"lib"</span><span class="p">]</span>
<span class="n">spec</span><span class="p">.</span><span class="nf">add_development_dependency</span> <span class="s2">"bundler"</span><span class="p">,</span> <span class="s2">"~> 1.8"</span>
<span class="n">spec</span><span class="p">.</span><span class="nf">add_development_dependency</span> <span class="s2">"rake"</span><span class="p">,</span> <span class="s2">"~> 10.0"</span>
<span class="k">end</span>
</code></pre></div></div>
<p>After you run <code class="language-plaintext highlighter-rouge">git add .</code>, you should be able to build your gem with <code class="language-plaintext highlighter-rouge">gem build repo.gemspec</code> and check that a file called <code class="language-plaintext highlighter-rouge">repo-0.1.0.gem</code> was created. We are good to go!</p>
<h1 id="the-rubygems-requirements">The RubyGems requirements</h1>
<p><code class="language-plaintext highlighter-rouge">RubyGems</code> will look for a file called <code class="language-plaintext highlighter-rouge">rubygems_plugin.rb</code> in the root of the <code class="language-plaintext highlighter-rouge">require_path</code> that was defined in the gemspec. In our case, it’s in the <code class="language-plaintext highlighter-rouge">lib</code> directory, so we will create this file there:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># lib/rubygems_plugin.rb</span>
<span class="nb">require</span> <span class="s2">"rubygems/command_manager"</span>
<span class="no">Gem</span><span class="o">::</span><span class="no">CommandManager</span><span class="p">.</span><span class="nf">instance</span><span class="p">.</span><span class="nf">register_command</span><span class="p">(</span><span class="ss">:repo</span><span class="p">)</span>
</code></pre></div></div>
<p>Here we are just registering a new command, so <code class="language-plaintext highlighter-rouge">RubyGems</code> will be able to find it when someone tries to execute our <code class="language-plaintext highlighter-rouge">gem repo</code>.<br />
That’s the same way the builtin commands are registered, as you can see <a href="https://github.com/rubygems/rubygems/blob/master/lib/rubygems/command_manager.rb#L99">here</a>.</p>
<p>After our custom command is registered, we need to create the class that will be executed when someone calls this command. <code class="language-plaintext highlighter-rouge">RubyGems</code> will look for a class in <code class="language-plaintext highlighter-rouge">rubygems/commands</code> that
matches our command name. In our case, <code class="language-plaintext highlighter-rouge">repo_command.rb</code>.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># lib/rubygems/commands/repo_command.rb</span>
<span class="k">class</span> <span class="nc">Gem::Commands::RepoCommand</span> <span class="o"><</span> <span class="no">Gem</span><span class="o">::</span><span class="no">Command</span>
<span class="k">def</span> <span class="nf">initialize</span>
<span class="k">super</span><span class="p">(</span><span class="s2">"repo"</span><span class="p">,</span> <span class="s2">"Open github repository"</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">execute</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>We create our <code class="language-plaintext highlighter-rouge">Gem::Commands::RepoCommand</code>, which inherits from <code class="language-plaintext highlighter-rouge">Gem::Command</code>. The <code class="language-plaintext highlighter-rouge">execute</code> method is the one that will be called when we run the command.<br />
Again, that’s exactly how the builtin commands work. If you check <a href="https://github.com/rubygems/rubygems/tree/master/lib/rubygems/commands">this directory</a>, you will see all these commands.
It’s also a great place to find inspiration and see how the commands that you use every day work.</p>
<h1 id="implementing-the-functionality">Implementing the functionality</h1>
<p>Implementing our functionality is just a matter of calling a command in this <code class="language-plaintext highlighter-rouge">execute</code> method. I’ll just use the <code class="language-plaintext highlighter-rouge">open</code> command here, which works only on OS X; feel free to implement it the way you like.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">execute</span>
<span class="n">repo</span> <span class="o">=</span> <span class="n">options</span><span class="p">[</span><span class="ss">:args</span><span class="p">].</span><span class="nf">first</span>
<span class="nb">system</span> <span class="s2">"open http://github.com/</span><span class="si">#{</span><span class="n">repo</span><span class="si">}</span><span class="s2">"</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Notice that we have this <code class="language-plaintext highlighter-rouge">options</code> hash with some useful information, like the list of arguments we received. In this case, we just need the first one, which is the repository name.</p>
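<p>Since <code class="language-plaintext highlighter-rouge">open</code> exists only on OS X, here is a hedged sketch of a more portable version. The <code class="language-plaintext highlighter-rouge">launch_command</code> and <code class="language-plaintext highlighter-rouge">github_url</code> helper names are my own, not part of the original plugin:</p>

```ruby
require "rbconfig"

# Pick the OS-specific "open a URL" command. These helper names are
# hypothetical; the post itself just shells out to `open` on OS X.
def launch_command
  case RbConfig::CONFIG["host_os"]
  when /darwin/      then "open"     # OS X
  when /mswin|mingw/ then "start"    # Windows
  else                    "xdg-open" # most Linux desktops
  end
end

def github_url(repo)
  "http://github.com/#{repo}"
end

# Inside `execute` you would then call something like:
#   system("#{launch_command} #{github_url(options[:args].first)}")
```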
<p>And that should be it! Here’s the final structure that we should have:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>├── Gemfile
├── LICENSE.txt
├── README.md
├── Rakefile
├── lib
│ ├── repo
│ │ └── version.rb
│ ├── rubygems
│ │ └── commands
│ │ └── repo_command.rb
│ └── rubygems_plugin.rb
├── repo-0.1.0.gem
└── repo.gemspec
</code></pre></div></div>
<h1 id="installing-the-plugin">Installing the plugin</h1>
<p>Let’s install this plugin to make sure it works.<br />
First, remove the old <code class="language-plaintext highlighter-rouge">repo-0.1.0.gem</code> that we created before:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">rm </span>repo-0.1.0.gem
</code></pre></div></div>
<p>Then make sure all your files are tracked:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>git add <span class="nb">.</span>
</code></pre></div></div>
<p>Rebuild your gem:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>gem build repo.gemspec
</code></pre></div></div>
<p>And install the plugin:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>gem <span class="nb">install </span>repo-0.1.0.gem
<span class="c"># Successfully installed repo-0.1.0</span>
<span class="c"># Parsing documentation for repo-0.1.0</span>
<span class="c"># Done installing documentation for repo after 0 seconds</span>
<span class="c"># 1 gem installed</span>
</code></pre></div></div>
<p>Now the <code class="language-plaintext highlighter-rouge">repo</code> command should already be available:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>gem repo ruby/ruby
<span class="c"># should open http://github.com/ruby/ruby in your browser</span>
</code></pre></div></div>
<h1 id="extra">Extra</h1>
<p>There are a few methods that you can override in your class to better explain how the command works. I couldn’t find them documented anywhere, but you can just check the <a href="https://github.com/rubygems/rubygems/blob/master/lib/rubygems/command.rb">base command class</a>.
The methods that you can override have a comment explaining their purpose.<br />
One example is the <code class="language-plaintext highlighter-rouge">usage</code> method, which I probably don’t need to explain. This information is shown when someone runs <code class="language-plaintext highlighter-rouge">gem help <command></code>. You can check <code class="language-plaintext highlighter-rouge">gem help install</code> for an example of a very well documented command.</p>
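<p>Because <code class="language-plaintext highlighter-rouge">Gem::Command</code> ships with RubyGems itself, we can sketch what overriding these hooks might look like. The strings below are illustrative, not taken from the original plugin:</p>

```ruby
require "rubygems/command"

# A sketch of the optional documentation hooks on Gem::Command.
class Gem::Commands::RepoCommand < Gem::Command
  def initialize
    super("repo", "Open github repository")
  end

  # Shown by `gem help repo`
  def usage
    "#{program_name} USER/REPOSITORY"
  end

  def arguments
    "USER/REPOSITORY   the repository to open, e.g. ruby/ruby"
  end

  def execute
    system "open http://github.com/#{options[:args].first}"
  end
end
```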
<p>In the <a href="http://guides.rubygems.org/plugins/">RubyGems website</a> you can find a list of plugins. There are certainly hundreds more out there, but this is a good list to start with and see how things are done.</p>
<h1 id="update">Update</h1>
<p>Since people seem to be more interested in building <code class="language-plaintext highlighter-rouge">RubyGems</code> plugins than I thought, I decided to create a plugin generator. You can find it on <a href="https://github.com/brianstorti/rubygems_plugin_generator">my github</a>.
It’s basically an automation for the things I covered here.</p>
Understanding Bundler's setup process2015-02-22T00:00:00+00:00www.brianstorti.com/understanding-bundler-setup-process<p>If you work with <code class="language-plaintext highlighter-rouge">Ruby</code>, chances are that you are using <a href="http://bundler.io"><code class="language-plaintext highlighter-rouge">Bundler</code></a> quite a lot. It’s the <em>de facto</em> solution for
dependency management, and it’s hard to find a project without a <code class="language-plaintext highlighter-rouge">Gemfile</code>. What is not part of the common knowledge,
though, is how it works. More specifically, how does it make your code see just the dependencies that it should see and nothing else?
Let’s look into <code class="language-plaintext highlighter-rouge">Bundler</code>’s code to find out.</p>
<h3 id="the-example-project">The example project</h3>
<p>To make it easier to understand what <code class="language-plaintext highlighter-rouge">Bundler</code> is doing, I’ll create a simple <code class="language-plaintext highlighter-rouge">sinatra</code> project.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># app.rb</span>
<span class="nb">require</span> <span class="s1">'sinatra'</span>
<span class="n">get</span> <span class="s1">'/test'</span> <span class="k">do</span>
<span class="s1">'test'</span>
<span class="k">end</span>
</code></pre></div></div>
<p>So far so good. As long as I have <code class="language-plaintext highlighter-rouge">sinatra</code> installed, it should work just fine.<br />
The problem is that we don’t like the idea that everyone who is going to run this code needs to <strong>know</strong> what the dependencies are (<code class="language-plaintext highlighter-rouge">sinatra</code> in version <code class="language-plaintext highlighter-rouge">1.4.5</code>),
so we create a <code class="language-plaintext highlighter-rouge">Gemfile</code> to let <code class="language-plaintext highlighter-rouge">Bundler</code> do that for us:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Gemfile</span>
<span class="n">source</span> <span class="s2">"https://rubygems.org"</span>
<span class="n">gem</span> <span class="s2">"sinatra"</span>
</code></pre></div></div>
<p>Now anyone that gets this code can just run <code class="language-plaintext highlighter-rouge">bundle install</code> and all the dependencies should be there, right? Well, not so fast.</p>
<h5 id="the-hidden-dependency">The hidden dependency</h5>
<p>Someday I decide that this <code class="language-plaintext highlighter-rouge">/test</code> route is too boring, and it should now actually return Metallica’s “The Unforgiven” lyrics. So I just go there and run <code class="language-plaintext highlighter-rouge">gem install vagalume</code>
to get a <code class="language-plaintext highlighter-rouge">gem</code> that does that, and change my code:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'sinatra'</span>
<span class="nb">require</span> <span class="s1">'vagalume'</span>
<span class="n">get</span> <span class="s1">'/test'</span> <span class="k">do</span>
<span class="n">result</span> <span class="o">=</span> <span class="no">Vagalume</span><span class="p">.</span><span class="nf">find</span><span class="p">(</span><span class="s2">"Metallica"</span><span class="p">,</span> <span class="s2">"The Unforgiven"</span><span class="p">)</span>
<span class="n">result</span><span class="p">.</span><span class="nf">song</span><span class="p">.</span><span class="nf">lyric</span>
<span class="k">end</span>
</code></pre></div></div>
<p>I run the app and everything seems to be working fine. I commit my code.</p>
<p>As soon as someone else tries to run the app, it breaks badly, saying that it <code class="language-plaintext highlighter-rouge">cannot load such file -- vagalume</code>.</p>
<h5 id="what-just-happened-here">What just happened here?</h5>
<p>The problem is that, although you have a <code class="language-plaintext highlighter-rouge">Gemfile</code> where you list your dependencies, you didn’t tell <code class="language-plaintext highlighter-rouge">Bundler</code> that your app should see <strong>just</strong> those <code class="language-plaintext highlighter-rouge">gems</code>.<br />
This <code class="language-plaintext highlighter-rouge">require 'vagalume'</code> is actually checking all the <code class="language-plaintext highlighter-rouge">gems</code> that you have installed in your system, not just the ones listed in the <code class="language-plaintext highlighter-rouge">Gemfile</code>, and that is not good.</p>
<h3 id="enters-bundlersetup">Enter <code class="language-plaintext highlighter-rouge">bundler/setup</code></h3>
<p>Let’s start to fix this. If we go there and add this line at the top of the file:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'bundler/setup'</span>
</code></pre></div></div>
<p>You should see that the app starts to break with that same error (<code class="language-plaintext highlighter-rouge">cannot load such file -- vagalume (LoadError)</code>), even if you have <code class="language-plaintext highlighter-rouge">vagalume</code> installed. That’s good,
<code class="language-plaintext highlighter-rouge">Bundler</code> is now making sure that our code sees just what it should see, that is, the <code class="language-plaintext highlighter-rouge">gems</code> listed in the <code class="language-plaintext highlighter-rouge">Gemfile</code>.</p>
<h5 id="understanding-what-is-happening">Understanding what is happening</h5>
<p>To put it shortly, what <code class="language-plaintext highlighter-rouge">Bundler</code> is doing is removing from the <code class="language-plaintext highlighter-rouge">$LOAD_PATH</code> everything that is not defined in the <code class="language-plaintext highlighter-rouge">Gemfile</code>. The <code class="language-plaintext highlighter-rouge">$LOAD_PATH</code> (or just <code class="language-plaintext highlighter-rouge">$:</code>) is
the global variable that tells <code class="language-plaintext highlighter-rouge">Ruby</code> where it should look for things that are <code class="language-plaintext highlighter-rouge">require</code>d, so if a dependency is not in the <code class="language-plaintext highlighter-rouge">Gemfile</code>, it’s not going to be in the <code class="language-plaintext highlighter-rouge">$LOAD_PATH</code>,
and then <code class="language-plaintext highlighter-rouge">Ruby</code> has no way to find it.</p>
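<p>We can watch this mechanism in action without Bundler at all. A minimal sketch (the <code class="language-plaintext highlighter-rouge">my_lib</code> file is made up for the demonstration):</p>

```ruby
require "tmpdir"

dir = Dir.mktmpdir
File.write(File.join(dir, "my_lib.rb"), "MY_LIB_LOADED = true")

# The temp directory is not on the $LOAD_PATH yet, so require fails
# with the same error we saw before.
begin
  require "my_lib"
rescue LoadError => e
  puts e.message # something like "cannot load such file -- my_lib"
end

# "Activating" a gem is essentially this: put its lib directory on the
# load path, and require can now find it.
$LOAD_PATH.unshift(dir)
require "my_lib"
puts MY_LIB_LOADED # prints "true"
```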
<h5 id="show-me-the-code">Show me the code</h5>
<p><a href="https://github.com/bundler/bundler/blob/master/lib/bundler/setup.rb">This</a> is the file that is loaded when we <code class="language-plaintext highlighter-rouge">require 'bundler/setup'</code>, and the important thing here is the
<a href="https://github.com/bundler/bundler/blob/master/lib/bundler/setup.rb#L8"><code class="language-plaintext highlighter-rouge">Bundler.setup</code></a> call. This setup first <a href="https://github.com/bundler/bundler/blob/master/lib/bundler/runtime.rb#L11">cleans the load path</a>,
and then <a href="https://github.com/bundler/bundler/blob/master/lib/bundler/runtime.rb#L18">activates</a> just the <code class="language-plaintext highlighter-rouge">gems</code> that are defined in the <code class="language-plaintext highlighter-rouge">Gemfile</code>, which basically means
<a href="https://github.com/bundler/bundler/blob/master/lib/bundler/runtime.rb#L39">adding them to the <code class="language-plaintext highlighter-rouge">$LOAD_PATH</code> variable</a>.</p>
<h5 id="and-that-is-also-what-happens-with-bundle-exec">And that is also what happens with <code class="language-plaintext highlighter-rouge">bundle exec</code></h5>
<p>This is a good moment to understand what happens when we use <code class="language-plaintext highlighter-rouge">bundle exec</code> to run a command.<br />
<code class="language-plaintext highlighter-rouge">Bundler</code> will simply add the value <code class="language-plaintext highlighter-rouge">-rbundler/setup</code> to the environment variable <code class="language-plaintext highlighter-rouge">$RUBYOPT</code>. <a href="https://github.com/bundler/bundler/blob/master/lib/bundler/shared_helpers.rb#L81">Here is where it’s done</a>.<br />
This will tell <code class="language-plaintext highlighter-rouge">ruby</code> to require <code class="language-plaintext highlighter-rouge">bundler/setup</code> before running any command, and that will let <code class="language-plaintext highlighter-rouge">Bundler</code> do its magic to the <code class="language-plaintext highlighter-rouge">$LOAD_PATH</code>, as we just checked.</p>
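<p>We can see the same mechanism with any standard library. In this sketch the child process never requires <code class="language-plaintext highlighter-rouge">json</code> itself; <code class="language-plaintext highlighter-rouge">$RUBYOPT</code> does it:</p>

```ruby
require "rbconfig"

# RUBYOPT is read by every ruby process at startup, so "-rjson" below
# makes the child require json before running its -e script, just like
# "-rbundler/setup" makes every command require bundler/setup under
# `bundle exec`.
env = { "RUBYOPT" => "-rjson" }
script = "puts defined?(JSON)"
output = IO.popen(env, [RbConfig.ruby, "-e", script], &:read)
puts output # "constant": JSON was already loaded when the script ran
```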
<h3 id="bundler-on-rails">Bundler on Rails</h3>
<p>As you probably guessed, when you are working with <code class="language-plaintext highlighter-rouge">Rails</code> you don’t really need to worry about this. There’s no magic, <code class="language-plaintext highlighter-rouge">Rails</code> is just calling the same <code class="language-plaintext highlighter-rouge">bundler/setup</code> for you.<br />
You can check <code class="language-plaintext highlighter-rouge">config/boot.rb</code>, that is where this is done. Also, in <code class="language-plaintext highlighter-rouge">config/application.rb</code>, <code class="language-plaintext highlighter-rouge">Rails</code> will call <code class="language-plaintext highlighter-rouge">Bundler.require</code> for you, a convenience that automatically requires
all the gems listed in the <code class="language-plaintext highlighter-rouge">Gemfile</code> so you don’t need to.<br />
You could do the same thing in that simple <code class="language-plaintext highlighter-rouge">sinatra</code> app, and then remove all those <code class="language-plaintext highlighter-rouge">requires</code>:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># app.rb</span>
<span class="nb">require</span> <span class="s1">'bundler/setup'</span>
<span class="no">Bundler</span><span class="p">.</span><span class="nf">require</span>
<span class="c1"># there is no need to manually require the dependencies</span>
<span class="c1"># anymore, as we just called Bundler.require</span>
<span class="c1"># require 'sinatra'</span>
<span class="c1"># require 'vagalume'</span>
<span class="n">get</span> <span class="s1">'/test'</span> <span class="k">do</span>
<span class="n">result</span> <span class="o">=</span> <span class="no">Vagalume</span><span class="p">.</span><span class="nf">find</span><span class="p">(</span><span class="s2">"Metallica"</span><span class="p">,</span> <span class="s2">"The Unforgiven"</span><span class="p">)</span>
<span class="n">result</span><span class="p">.</span><span class="nf">song</span><span class="p">.</span><span class="nf">lyric</span>
<span class="k">end</span>
</code></pre></div></div>
<h3 id="wrapping-up">Wrapping up</h3>
<p>As we can see, the mechanism that makes <code class="language-plaintext highlighter-rouge">Bundler</code> work the way it does is not that complex. It’s just changing the <code class="language-plaintext highlighter-rouge">$LOAD_PATH</code> (that is not to say that <code class="language-plaintext highlighter-rouge">Bundler</code> itself is not complex; it actually
does a lot more than what I showed here). Not understanding how it works, though, could make debugging a problem much more painful.<br />
It is worth taking some time to understand at least the basics that make the tools you deal with every day work. It will almost certainly save you some precious time in the future.</p>
Vim as the poor man's sed2015-02-17T00:00:00+00:00www.brianstorti.com/vim-as-the-poor-mans-sed<p>Not long ago I <a href="/enough-sed-to-be-useful/">wrote</a> about <code class="language-plaintext highlighter-rouge">sed</code>, a powerful non-interactive editor that can be used to edit multiple files in a fairly easy way.
Today I want to show how we could use <code class="language-plaintext highlighter-rouge">vim</code>’s not so well known <code class="language-plaintext highlighter-rouge">ex</code> mode to do some of these same tasks, and what the benefits and shortcomings are.</p>
<h3 id="the-ex-mode">The <code class="language-plaintext highlighter-rouge">ex</code> mode</h3>
<p>The <code class="language-plaintext highlighter-rouge">ex</code> mode is very similar to the <code class="language-plaintext highlighter-rouge">command</code> mode: It allows you to enter <a href="http://en.wikipedia.org/wiki/Ex_(text_editor)">ex</a> commands. The main difference
is that you won’t be back to <code class="language-plaintext highlighter-rouge">normal</code> mode after the command is executed. You can enter the <code class="language-plaintext highlighter-rouge">ex</code> mode with <code class="language-plaintext highlighter-rouge">Q</code>, and go back to <code class="language-plaintext highlighter-rouge">normal</code> mode
with <code class="language-plaintext highlighter-rouge">:visual</code>.</p>
<p>It’s a mode designed for batch processing, and we can start <code class="language-plaintext highlighter-rouge">vim</code> with <code class="language-plaintext highlighter-rouge">-e</code>, if that’s all we need.</p>
<h3 id="the-usage">The usage</h3>
<p>The usage is very similar to what we did with <code class="language-plaintext highlighter-rouge">sed</code>. We just need to give it a file path and a set of commands to be executed:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">echo</span> <span class="s2">"foo bar baz"</span> <span class="o">></span> testing-ex.txt
<span class="nv">$ </span>vim <span class="nt">-e</span> testing-ex.txt <span class="o"><<-</span><span class="no">SCRIPT</span><span class="sh">
%s/foo/new-value
w
</span><span class="no">SCRIPT
</span></code></pre></div></div>
<p>The same way, you could just pipe the result of an <code class="language-plaintext highlighter-rouge">echo</code> to vim:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Note that, in vim, "|" is used to execute multiple commands at once:</span>
<span class="nv">$ </span><span class="nb">echo</span> <span class="s2">"%s/foo/new-value/ | w"</span> | vim <span class="nt">-e</span> testing-ex.txt
</code></pre></div></div>
<p>Or we could just move this script to its own file, and then execute the command as:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>vim <span class="nt">-e</span> testing-ex.txt < command.vim
</code></pre></div></div>
<p>This script is just a bunch of <code class="language-plaintext highlighter-rouge">vim</code> commands, the same commands you would execute if you were editing the file by hand. The only caveat is to remember that you need to
save the file (<code class="language-plaintext highlighter-rouge">w</code>) at the end.</p>
<p>For comparison, the equivalent <code class="language-plaintext highlighter-rouge">sed</code> command would be something like this:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">sed</span> <span class="nt">-i</span><span class="s1">''</span> <span class="nt">-f</span> command.sed testing-ex.txt
</code></pre></div></div>
<h3 id="the-benefit">The benefit</h3>
<p>The benefit is that it’s just <code class="language-plaintext highlighter-rouge">vim</code>, and you probably already know the commands to edit a file. If you have a mapping that does some kind of editing, you are all set: just execute these commands
in <code class="language-plaintext highlighter-rouge">ex</code> mode. What if you want to join all the lines? Just execute <code class="language-plaintext highlighter-rouge">%join</code>, as you would if you were editing a single file.</p>
<blockquote>
<p>Check :help ex-cmd-index for the list of all the ex commands available</p>
</blockquote>
<h3 id="the-shortcomings">The shortcomings</h3>
<p>I know you thought about that as soon as you read the title of this post: Performance.<br />
And yes, you are right, <code class="language-plaintext highlighter-rouge">Vim</code> won’t be that fast if you need to edit a lot of files. Let’s measure that by running the same script that substitutes a string in 1.000 files:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Creates 1.000 files to test</span>
<span class="nv">$ </span><span class="k">for </span>i <span class="k">in</span> <span class="o">{</span>1..1000<span class="o">}</span><span class="p">;</span> <span class="k">do </span><span class="nb">echo</span> <span class="s2">"test"</span> <span class="o">></span> <span class="nv">$i</span>.txt<span class="p">;</span> <span class="k">done</span>
<span class="nv">$ </span><span class="nb">time </span><span class="k">for </span>i <span class="k">in</span> <span class="o">{</span>1..1000<span class="o">}</span><span class="p">;</span> <span class="k">do </span><span class="nb">echo</span> <span class="s2">"%s/test/new-value/g | w"</span> | vim <span class="nt">-e</span> <span class="nv">$i</span>.txt<span class="p">;</span> <span class="k">done</span>
<span class="c"># Executes in 22.13s</span>
<span class="nv">$ </span><span class="nb">time </span><span class="k">for </span>i <span class="k">in</span> <span class="o">{</span>1..1000<span class="o">}</span><span class="p">;</span> <span class="k">do </span><span class="nb">sed</span> <span class="nt">-i</span><span class="s1">''</span> <span class="nt">-e</span> <span class="s1">'s/test/new-value/g'</span> <span class="nv">$i</span>.txt<span class="p">;</span> <span class="k">done</span>
<span class="c"># Executes in 3.12s</span>
</code></pre></div></div>
<p>In this simple test, <code class="language-plaintext highlighter-rouge">sed</code> is about 7x faster.</p>
<h3 id="conclusion">Conclusion</h3>
<p>There is some overlap in what you can do with <code class="language-plaintext highlighter-rouge">sed</code> and with <code class="language-plaintext highlighter-rouge">vim</code> in <code class="language-plaintext highlighter-rouge">ex</code> mode. There is no right or wrong; they are just options, and, as always, it’s important to know
the trade-offs and when it’s worth using one option over the other.</p>
Vim registers: The basics and beyond2015-02-09T00:00:00+00:00www.brianstorti.com/vim-registers<p>Vim’s registers are the kind of thing you don’t think you need until you learn
about them. Then it’s hard to imagine life without them, and they become essential to
your workflow. It’s still common for people to use <code class="language-plaintext highlighter-rouge">vim</code> for years without
knowing how to work with registers properly, which I think is a shame: just
a basic understanding of what they are and how they work can make you a lot more
productive (and avoid a couple of annoyances).</p>
<h3 id="if-you-have-no-idea-what-im-talking-about">If you have no idea what I’m talking about</h3>
<p>You can think of registers as a bunch of spaces in memory that <code class="language-plaintext highlighter-rouge">vim</code> uses to
store some text. Each of these spaces has an identifier, so it can be accessed
later. It’s no different from copying some text to your clipboard, except that
you usually have just one clipboard to copy to, while <code class="language-plaintext highlighter-rouge">vim</code> allows you to have
multiple places to store different texts.</p>
<h3 id="the-basic-usage">The basic usage</h3>
<p>Every register is accessed using a double quote before its name. For example, we
can access the content that is in the register <code class="language-plaintext highlighter-rouge">r</code> with <code class="language-plaintext highlighter-rouge">"r</code>.</p>
<p>You could add the selected text to the register <code class="language-plaintext highlighter-rouge">r</code> by doing <code class="language-plaintext highlighter-rouge">"ry</code>. You are
copying (<code class="language-plaintext highlighter-rouge">y</code>anking) the selected text, and then adding it to the register <code class="language-plaintext highlighter-rouge">"r</code>.
To paste the content of this register, the logic is the same: <code class="language-plaintext highlighter-rouge">"rp</code>. You are
<code class="language-plaintext highlighter-rouge">p</code>asting the data that is in this register.</p>
<p>You can also access the registers in insert/command mode with <code class="language-plaintext highlighter-rouge">Ctrl-r</code> +
register name, like in <code class="language-plaintext highlighter-rouge">Ctrl-r r</code>. It will just paste the text in your current
buffer. You can use the <code class="language-plaintext highlighter-rouge">:reg</code> command to see all the registers and their
content, or filter just the ones you are interested in with <code class="language-plaintext highlighter-rouge">:reg a b c</code>.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>:reg a b c
--- Registers ---
"a register a content
"b register b content
"c register c content
</code></pre></div></div>
<h3 id="the-unnamed-register">The unnamed register</h3>
<p><code class="language-plaintext highlighter-rouge">vim</code> has an unnamed (or default) register that can be accessed with <code class="language-plaintext highlighter-rouge">""</code>. Any
text that you delete (with <code class="language-plaintext highlighter-rouge">d</code>, <code class="language-plaintext highlighter-rouge">c</code>, <code class="language-plaintext highlighter-rouge">s</code> or <code class="language-plaintext highlighter-rouge">x</code>) or yank (with <code class="language-plaintext highlighter-rouge">y</code>) will be
placed there, and that’s what <code class="language-plaintext highlighter-rouge">vim</code> uses to <code class="language-plaintext highlighter-rouge">p</code>aste, when no explicit register
is given. A simple <code class="language-plaintext highlighter-rouge">p</code> is the same thing as doing <code class="language-plaintext highlighter-rouge">""p</code>.</p>
<h5 id="never-lose-a-yanked-text-again">Never lose a yanked text again</h5>
<p>It has happened to all of us: we yank some text, then delete some other, and
when we try to paste the yanked text, it’s not there anymore; <code class="language-plaintext highlighter-rouge">vim</code> replaced it
with the text that we deleted, and now we need to go back and yank that text
again.<br />
Well, as I said, <code class="language-plaintext highlighter-rouge">vim</code> will always replace the unnamed register, but of course
we didn’t lose the yanked text; <code class="language-plaintext highlighter-rouge">vim</code> would not have survived this long if it
was that dumb, right?</p>
<p><code class="language-plaintext highlighter-rouge">vim</code> automatically populates what is called the <strong>numbered registers</strong> for us.
As expected, these are registers from <code class="language-plaintext highlighter-rouge">"0</code> to <code class="language-plaintext highlighter-rouge">"9</code>.<br />
<code class="language-plaintext highlighter-rouge">"0</code> will always have the content of the latest yank, and the others will have the
last 9 deleted texts, <code class="language-plaintext highlighter-rouge">"1</code> being the newest and <code class="language-plaintext highlighter-rouge">"9</code> the oldest. So if you
yanked some text, you can always refer to it using <code class="language-plaintext highlighter-rouge">"0p</code>.</p>
<h3 id="the-read-only-registers">The read only registers</h3>
<p>There are 4 read only registers: <code class="language-plaintext highlighter-rouge">".</code>, <code class="language-plaintext highlighter-rouge">"%</code>, <code class="language-plaintext highlighter-rouge">":</code> and <code class="language-plaintext highlighter-rouge">"#</code><br />
The last inserted text is stored in <code class="language-plaintext highlighter-rouge">".</code>, which is quite handy if you need to
write the same text twice, in different places, without needing to yank and paste.</p>
<p><code class="language-plaintext highlighter-rouge">"%</code> has the current file path, starting from the directory where <code class="language-plaintext highlighter-rouge">vim</code> was
first opened. What I usually use it for is to copy the current file path to the
clipboard, so I can use it externally (running a script in another terminal, for
instance). You could execute <code class="language-plaintext highlighter-rouge">:let @+=@%</code> to do that. <code class="language-plaintext highlighter-rouge">let</code> is used to write to
a register, and <code class="language-plaintext highlighter-rouge">"+</code> is the clipboard register, so we are copying the current
file path to the clipboard.</p>
<p><code class="language-plaintext highlighter-rouge">":</code> is the most recently executed command. If you save the current buffer with
<code class="language-plaintext highlighter-rouge">:w</code>, “w” will be in this register. A good way to use it is with <code class="language-plaintext highlighter-rouge">@:</code>, to
execute this command again. For example, if you execute a substitute command in
one line, like in <code class="language-plaintext highlighter-rouge">:s/foo/bar</code>, you can just go to another line and execute <code class="language-plaintext highlighter-rouge">@:</code>
to run this substitution again.</p>
<p><code class="language-plaintext highlighter-rouge">"#</code> is the name of the alternate file, which you can think of as the last
edited file (it’s a bit more complex than that; see <code class="language-plaintext highlighter-rouge">:h alternate-file</code> if you
want to understand it better). It’s what <code class="language-plaintext highlighter-rouge">vim</code> uses to switch between files
when you use <code class="language-plaintext highlighter-rouge">Ctrl-^</code>, and you could do the same thing with <code class="language-plaintext highlighter-rouge">:e Ctrl-r #</code>. I
rarely use this, but hopefully you are more creative than I am.</p>
<h3 id="the-expression-and-the-search-registers">The expression and the search registers</h3>
<p>The expression register (<code class="language-plaintext highlighter-rouge">"=</code>) is used to deal with results of expressions. This
is easier to understand with an example. If, in insert mode, you type <code class="language-plaintext highlighter-rouge">Ctrl-r
=</code>, you will see a “=” sign in the command line. Then if you type <code class="language-plaintext highlighter-rouge">2+2 <enter></code>,
<code class="language-plaintext highlighter-rouge">4</code> will be printed. This can be used to execute all sort of expressions, even
calling external commands. To give another example, if you type <code class="language-plaintext highlighter-rouge">Ctrl-r =</code> and
then, in the command line, <code class="language-plaintext highlighter-rouge">system('ls') <enter></code>, the output of the <code class="language-plaintext highlighter-rouge">ls</code>
command will be pasted in your buffer.</p>
<p>The search register, as you may have imagined, is where the latest text that you
searched with <code class="language-plaintext highlighter-rouge">/</code>, <code class="language-plaintext highlighter-rouge">?</code>, <code class="language-plaintext highlighter-rouge">*</code> or <code class="language-plaintext highlighter-rouge">#</code> is. If, for example, you just searched for
<code class="language-plaintext highlighter-rouge">/Nietzsche</code>, and now you want to replace it with something else, there is no
way you are going to type “Nietzsche” again, just do <code class="language-plaintext highlighter-rouge">:%s/<Ctrl-r />/mustache/g</code>
and you are good to go.</p>
<h3 id="macros">Macros</h3>
<p>You may already be familiar with <code class="language-plaintext highlighter-rouge">vim</code>’s macros. It’s a way to record a set of
actions that can be executed multiple times (<code class="language-plaintext highlighter-rouge">:h recording</code> if you need more
information). What you probably didn’t know is that <code class="language-plaintext highlighter-rouge">vim</code> uses a register to
store these actions, so if you use <code class="language-plaintext highlighter-rouge">qw</code> to record a macro, the register <code class="language-plaintext highlighter-rouge">"w</code>
will hold everything that you did; it’s all just plain text.</p>
<p>The cool thing about this is that, as it is just a normal register, you can
manipulate it as you want. How many times have you forgotten that step in the
middle of a macro recording and had to do it all over again? Well, fixing that
is as simple as editing a register.</p>
<p>For example, if you forgot to add a semicolon at the end of that <code class="language-plaintext highlighter-rouge">w</code> macro, just
do something like <code class="language-plaintext highlighter-rouge">:let @W='i;'</code>. Noticed the uppercase <code class="language-plaintext highlighter-rouge">W</code>? That’s how we
append a value to a register: using its uppercase name. Here we are just
appending the command <code class="language-plaintext highlighter-rouge">i;</code> to the register, to enter insert mode (<code class="language-plaintext highlighter-rouge">i</code>) and add a
semicolon. If you need to edit something in the middle of the register, just do
<code class="language-plaintext highlighter-rouge">:let @w='<Ctrl-r w></code>, change what you want, and close the quote at the end.
Done, no more recording a macro 10 times before you get it right.</p>
<p>Another cool thing is that, as it’s just plain text in a register,
you can easily move macros around, applying them in another <code class="language-plaintext highlighter-rouge">vim</code> instance, or
sharing them with someone else. Think about it: if you have that register in your
clipboard, you can just execute it with <code class="language-plaintext highlighter-rouge">@+</code> (<code class="language-plaintext highlighter-rouge">"+</code> is the clipboard register).
Try it, just write “ivim is awesome” anywhere, then copy it to your clipboard,
and execute <code class="language-plaintext highlighter-rouge">@+</code> in a <code class="language-plaintext highlighter-rouge">vim</code> buffer. How cool is that?</p>
<h3 id="wrapping-up">Wrapping up</h3>
<p>Understanding how registers work is quite simple, and although you are not going
to use them every 5 minutes, it will certainly avoid some annoyances, like
losing a yanked text or having to record a macro again.
I covered the things that I use the most, but there is more. If you are curious
about what a small delete or a black hole register is, you should definitely
read the short and easy to follow documentation in <code class="language-plaintext highlighter-rouge">:h registers</code>. And if you
want to learn more about <code class="language-plaintext highlighter-rouge">vim</code> in general, the book <a href="https://amzn.to/3xLlaau">Practical
Vim</a> is a great resource.</p>
Implementing a Priority Queue in Ruby2015-01-31T00:00:00+00:00www.brianstorti.com/implementing-a-priority-queue-in-ruby<p>The other day I had to use a priority queue to solve a problem. It was a <code class="language-plaintext highlighter-rouge">Java</code> project, so I already had the <code class="language-plaintext highlighter-rouge">PriorityQueue</code> class ready to be used.
After the code was done, I started to wonder what a solution in <code class="language-plaintext highlighter-rouge">Ruby</code> would look like. And then I discovered that <code class="language-plaintext highlighter-rouge">Ruby</code> does not have a
priority queue implementation in its standard library. How hard could it be to implement my own?</p>
<h3 id="first-some-definitions">First, some definitions</h3>
<p>I just want to define what a queue and a priority queue are before we go to the implementation. If you are already comfortable with these definitions,
feel free to jump to the next section.</p>
<p>A queue is a data structure in which the items added first will be the first to be removed, also known as first-in first-out. Ruby has a queue implementation
in its standard library, and the usage is quite simple:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">q</span> <span class="o">=</span> <span class="no">Queue</span><span class="p">.</span><span class="nf">new</span>
<span class="n">q</span> <span class="o"><<</span> <span class="mi">1</span>
<span class="n">q</span> <span class="o"><<</span> <span class="mi">2</span>
<span class="n">q</span> <span class="o"><<</span> <span class="mi">3</span>
<span class="n">q</span><span class="p">.</span><span class="nf">pop</span> <span class="c1"># => 1</span>
</code></pre></div></div>
<p>A priority queue is like a queue, where you remove items from the front of the list. The difference is that each element has a priority, and the order of the items
inside the queue is determined by this priority, so the first item to be removed will be the one with the highest priority.</p>
<h3 id="introducing-our-test-element">Introducing our test element</h3>
<p>A priority queue should work with any type of element, not just numbers. As long as there is a way to determine their priority, we should be able to use them.<br />
I created this <code class="language-plaintext highlighter-rouge">Element</code> class, which has a <code class="language-plaintext highlighter-rouge">name</code> and a <code class="language-plaintext highlighter-rouge">priority</code>, just so we can use in our tests:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Element</span>
<span class="kp">include</span> <span class="no">Comparable</span>
<span class="nb">attr_accessor</span> <span class="ss">:name</span><span class="p">,</span> <span class="ss">:priority</span>
<span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="nb">name</span><span class="p">,</span> <span class="n">priority</span><span class="p">)</span>
<span class="vi">@name</span><span class="p">,</span> <span class="vi">@priority</span> <span class="o">=</span> <span class="nb">name</span><span class="p">,</span> <span class="n">priority</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf"><</span><span class="o">=></span><span class="p">(</span><span class="n">other</span><span class="p">)</span>
<span class="vi">@priority</span> <span class="o"><=></span> <span class="n">other</span><span class="p">.</span><span class="nf">priority</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>It includes the <code class="language-plaintext highlighter-rouge">Comparable</code> module and implements <code class="language-plaintext highlighter-rouge"><=></code>, so we can compare two <code class="language-plaintext highlighter-rouge">Element</code>s.</p>
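<p>Since <code class="language-plaintext highlighter-rouge">Element</code> includes <code class="language-plaintext highlighter-rouge">Comparable</code> and defines <code class="language-plaintext highlighter-rouge"><=></code>, all the comparison operators come for free. A quick sanity check (the names and priorities here are arbitrary, and the class is restated so the snippet runs on its own):</p>

```ruby
# Restated from the post so this snippet is self-contained.
class Element
  include Comparable
  attr_accessor :name, :priority

  def initialize(name, priority)
    @name, @priority = name, priority
  end

  def <=>(other)
    @priority <=> other.priority
  end
end

low  = Element.new("bar", 1)
high = Element.new("foo", 3)

# Comparable derives <, >, ==, between?, etc. from <=>
puts low < high                    # => true
puts high == Element.new("x", 3)   # => true (equal priority means equal)
puts [high, low].max.name          # => "foo"
```

<p>Note that equality here is defined purely by priority, which is exactly what a priority queue cares about.</p>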
<h3 id="the-naive-implementation">The naive implementation</h3>
<p>Let’s start with a very simple (and naive) implementation of a priority queue. The idea is that, every time that we need to remove an item, we will sort
the entire list of elements by their priority, in ascending order, and then we can just return the last element, which will be the one with the highest priority:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">NaivePriorityQueue</span>
<span class="k">def</span> <span class="nf">initialize</span>
<span class="vi">@elements</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf"><<</span><span class="p">(</span><span class="n">element</span><span class="p">)</span>
<span class="vi">@elements</span> <span class="o"><<</span> <span class="n">element</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">pop</span>
<span class="n">last_element_index</span> <span class="o">=</span> <span class="vi">@elements</span><span class="p">.</span><span class="nf">size</span> <span class="o">-</span> <span class="mi">1</span>
<span class="vi">@elements</span><span class="p">.</span><span class="nf">sort!</span>
<span class="vi">@elements</span><span class="p">.</span><span class="nf">delete_at</span><span class="p">(</span><span class="n">last_element_index</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>And we can check that it works:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">q</span> <span class="o">=</span> <span class="no">NaivePriorityQueue</span><span class="p">.</span><span class="nf">new</span>
<span class="n">q</span> <span class="o"><<</span> <span class="no">Element</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s2">"bar"</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">q</span> <span class="o"><<</span> <span class="no">Element</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s2">"foo"</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="n">q</span> <span class="o"><<</span> <span class="no">Element</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s2">"baz"</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
<span class="nb">p</span> <span class="n">q</span><span class="p">.</span><span class="nf">pop</span><span class="p">.</span><span class="nf">name</span> <span class="c1"># => "foo"</span>
</code></pre></div></div>
<p>The problem with this approach is the performance, as you might have imagined. Although we can insert in constant time (<code class="language-plaintext highlighter-rouge">O(1)</code>), the <code class="language-plaintext highlighter-rouge">pop</code> operation has to sort the entire list, which costs <code class="language-plaintext highlighter-rouge">O(n log n)</code>, meaning that
the time to remove an item grows faster than the size of the <code class="language-plaintext highlighter-rouge">elements</code> list. As the list doubles, each removal
takes more than twice as long.<br />
We can do better.</p>
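<p>Before optimizing, it is worth checking that the naive version at least behaves correctly at scale. This sketch restates the class from above (using plain <code class="language-plaintext highlighter-rouge">Integer</code>s as elements for brevity, since they are already comparable) and verifies that a shuffled batch comes out in descending priority order:</p>

```ruby
# NaivePriorityQueue restated from the post, so the snippet runs standalone.
class NaivePriorityQueue
  def initialize
    @elements = []
  end

  def <<(element)
    @elements << element
  end

  def pop
    last_element_index = @elements.size - 1
    # Sorting on every pop is the expensive part: O(n log n) per removal.
    @elements.sort!
    @elements.delete_at(last_element_index)
  end
end

q = NaivePriorityQueue.new
(1..100).to_a.shuffle.each { |n| q << n }

popped = Array.new(100) { q.pop }
puts popped.first                     # => 100
puts popped == (1..100).to_a.reverse  # => true
```

<p>The answers are right; it is only the cost per <code class="language-plaintext highlighter-rouge">pop</code> that we want to bring down.</p>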
<h3 id="the-binary-heap">The binary heap</h3>
<p>The most common data structure used to implement a priority queue is the binary heap, which is basically a binary tree with some additional properties.
The binary heap is a <strong>complete binary tree</strong>, meaning that it’s fully balanced, or, in other words, that all the levels of the tree are filled with elements, except possibly for the
last level of the tree.</p>
<p>The other thing that distinguishes a binary heap is that it complies with the <strong>heap property</strong>, meaning that all the nodes are greater (or equal) than their children.
<img src="/assets/images/heap.svg" /></p>
<div class="image-description">
Example of a binary heap. Notice that it's a fully balanced binary tree, where all the nodes are greater than their children
</div>
<p>One thing that is very interesting about binary heaps is that they can be represented as a simple array. There is no need for links or any complex data structure, just a
simple array. If you think about it, it makes a lot of sense. The children of an element at a given index <code class="language-plaintext highlighter-rouge">i</code> will always be in <code class="language-plaintext highlighter-rouge">2i</code> and <code class="language-plaintext highlighter-rouge">2i + 1</code>. The same way, the parent
of this node will be at the index <code class="language-plaintext highlighter-rouge">i/2</code>.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># 0 1 2 3 4 5 6 7 8 9</span>
<span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">100</span><span class="p">,</span> <span class="mi">19</span><span class="p">,</span> <span class="mi">36</span><span class="p">,</span> <span class="mi">17</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">25</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">7</span><span class="p">]</span>
</code></pre></div></div>
<p>This array represents the tree in the previous image. For instance, if you get the element at the index 4 (<code class="language-plaintext highlighter-rouge">17</code>), you can check that its parent is at the index 2 (<code class="language-plaintext highlighter-rouge">19</code>), and
that its children are at 8 (<code class="language-plaintext highlighter-rouge">2</code>) and 9 (<code class="language-plaintext highlighter-rouge">7</code>). The only caveat here is that we add a <code class="language-plaintext highlighter-rouge">0</code> in the first position of this array, which will never be used, but makes our
calculations a bit easier.<br />
You can find the relationship between nodes by doing simple arithmetic on their indexes. How cool is that?</p>
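<p>That index arithmetic can be checked directly against the array above (the helper names here are just for this sketch):</p>

```ruby
# The heap from the image, as a 1-indexed array (index 0 is the unused placeholder).
heap = [0, 100, 19, 36, 17, 3, 25, 1, 2, 7]

def parent_index(i)
  i / 2
end

def child_indexes(i)
  [2 * i, 2 * i + 1]
end

puts heap[4]                            # => 17
puts heap[parent_index(4)]              # => 19
p child_indexes(4).map { |i| heap[i] }  # => [2, 7]

# The heap property holds: every node is >= its children.
valid = (1..heap.size / 2).all? do |i|
  child_indexes(i).all? { |c| heap[c].nil? || heap[i] >= heap[c] }
end
puts valid  # => true
```

<p>No pointers, no node objects: the tree shape lives entirely in the indexes.</p>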
<h3 id="implementing-a-real-priority-queue">Implementing a real priority queue</h3>
<p>After we understand how a binary heap works, it’s easy to see how it can be used to implement a priority queue. The element with the highest priority will always be at the root
of our tree. When we add elements to this queue, we just need to make sure each one is placed in the right spot to comply with the heap property.</p>
<h5 id="adding-items-to-the-queue">Adding items to the queue</h5>
<p>First we will just append the item to our array:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">PriorityQueue</span>
<span class="k">def</span> <span class="nf">initialize</span>
<span class="vi">@elements</span> <span class="o">=</span> <span class="p">[</span><span class="kp">nil</span><span class="p">]</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf"><<</span><span class="p">(</span><span class="n">element</span><span class="p">)</span>
<span class="vi">@elements</span> <span class="o"><<</span> <span class="n">element</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Just by doing this we already have a complete binary tree. The problem is that it violates the heap property. We need to make sure the node is in the right place of the tree,
meaning that it is greater than its children, and smaller than its parent. This operation of putting a node in its place has many names, the most common being <code class="language-plaintext highlighter-rouge">bubble up</code> or
<code class="language-plaintext highlighter-rouge">heapify up</code>. So let’s implement it:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">bubble_up</span><span class="p">(</span><span class="n">index</span><span class="p">)</span>
<span class="n">parent_index</span> <span class="o">=</span> <span class="p">(</span><span class="n">index</span> <span class="o">/</span> <span class="mi">2</span><span class="p">)</span>
<span class="c1"># return if we reach the root element</span>
<span class="k">return</span> <span class="k">if</span> <span class="n">index</span> <span class="o"><=</span> <span class="mi">1</span>
<span class="c1"># or if the parent is already greater than the child</span>
<span class="k">return</span> <span class="k">if</span> <span class="vi">@elements</span><span class="p">[</span><span class="n">parent_index</span><span class="p">]</span> <span class="o">>=</span> <span class="vi">@elements</span><span class="p">[</span><span class="n">index</span><span class="p">]</span>
<span class="c1"># otherwise we exchange the child with the parent</span>
<span class="n">exchange</span><span class="p">(</span><span class="n">index</span><span class="p">,</span> <span class="n">parent_index</span><span class="p">)</span>
<span class="c1"># and keep bubbling up</span>
<span class="n">bubble_up</span><span class="p">(</span><span class="n">parent_index</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">exchange</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">target</span><span class="p">)</span>
<span class="vi">@elements</span><span class="p">[</span><span class="n">source</span><span class="p">],</span> <span class="vi">@elements</span><span class="p">[</span><span class="n">target</span><span class="p">]</span> <span class="o">=</span> <span class="vi">@elements</span><span class="p">[</span><span class="n">target</span><span class="p">],</span> <span class="vi">@elements</span><span class="p">[</span><span class="n">source</span><span class="p">]</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Now we just need to call it after we add a new element:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf"><<</span><span class="p">(</span><span class="n">element</span><span class="p">)</span>
<span class="vi">@elements</span> <span class="o"><<</span> <span class="n">element</span>
<span class="c1"># bubble up the element that we just added</span>
<span class="n">bubble_up</span><span class="p">(</span><span class="vi">@elements</span><span class="p">.</span><span class="nf">size</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>
<p>By doing this we already have a complete binary tree that complies with the heap property. Let’s confirm that:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">q</span> <span class="o">=</span> <span class="no">PriorityQueue</span><span class="p">.</span><span class="nf">new</span>
<span class="n">q</span> <span class="o"><<</span> <span class="mi">2</span>
<span class="n">q</span> <span class="o"><<</span> <span class="mi">3</span>
<span class="n">q</span> <span class="o"><<</span> <span class="mi">1</span>
<span class="nb">p</span> <span class="n">q</span><span class="p">.</span><span class="nf">elements</span> <span class="c1"># => [nil, 3, 2, 1]</span>
</code></pre></div></div>
<h5 id="removing-items-from-the-queue">Removing items from the queue</h5>
<p>The only thing missing to have a fully working priority queue is the ability to dequeue items.<br />
As we are using a binary heap, the root element is guaranteed to be the one with the highest priority.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">pop</span>
<span class="c1"># the first element will always be the max, because of the heap constraint</span>
<span class="vi">@elements</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Now we can already retrieve the element with the highest priority. The only problem is that we are not actually removing it from
the queue.<br />
What we are going to do is something similar to what we did when we were adding items to the queue. We will exchange the root element
with the last element of the queue, and then perform a process called <code class="language-plaintext highlighter-rouge">bubble down</code> (or <code class="language-plaintext highlighter-rouge">heapify-down</code>) in the new root element, to put
it in the correct place of the tree.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">pop</span>
<span class="c1"># exchange the root with the last element</span>
<span class="n">exchange</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="vi">@elements</span><span class="p">.</span><span class="nf">size</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
<span class="c1"># remove the last element of the list</span>
<span class="n">max</span> <span class="o">=</span> <span class="vi">@elements</span><span class="p">.</span><span class="nf">pop</span>
<span class="c1"># and make sure the tree is ordered again</span>
<span class="n">bubble_down</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="n">max</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">bubble_down</span><span class="p">(</span><span class="n">index</span><span class="p">)</span>
<span class="n">child_index</span> <span class="o">=</span> <span class="p">(</span><span class="n">index</span> <span class="o">*</span> <span class="mi">2</span><span class="p">)</span>
<span class="c1"># stop if we reach the bottom of the tree</span>
<span class="k">return</span> <span class="k">if</span> <span class="n">child_index</span> <span class="o">></span> <span class="vi">@elements</span><span class="p">.</span><span class="nf">size</span> <span class="o">-</span> <span class="mi">1</span>
<span class="c1"># make sure we get the largest child</span>
<span class="n">not_the_last_element</span> <span class="o">=</span> <span class="n">child_index</span> <span class="o"><</span> <span class="vi">@elements</span><span class="p">.</span><span class="nf">size</span> <span class="o">-</span> <span class="mi">1</span>
<span class="n">left_element</span> <span class="o">=</span> <span class="vi">@elements</span><span class="p">[</span><span class="n">child_index</span><span class="p">]</span>
<span class="n">right_element</span> <span class="o">=</span> <span class="vi">@elements</span><span class="p">[</span><span class="n">child_index</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span>
<span class="n">child_index</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">if</span> <span class="n">not_the_last_element</span> <span class="o">&&</span> <span class="n">right_element</span> <span class="o">></span> <span class="n">left_element</span>
<span class="c1"># there is no need to continue if the parent element is already bigger</span>
<span class="c1"># than its children</span>
<span class="k">return</span> <span class="k">if</span> <span class="vi">@elements</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="o">>=</span> <span class="vi">@elements</span><span class="p">[</span><span class="n">child_index</span><span class="p">]</span>
<span class="n">exchange</span><span class="p">(</span><span class="n">index</span><span class="p">,</span> <span class="n">child_index</span><span class="p">)</span>
<span class="c1"># repeat the process until we reach a point where the parent</span>
<span class="c1"># is larger than its children</span>
<span class="n">bubble_down</span><span class="p">(</span><span class="n">child_index</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>
<p>The only caveat here is the extra logic to make sure we are always comparing the parent against its largest child.</p>
<p>And that’s all we need to have a working priority queue!</p>
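<p>Putting the pieces together, the whole class can be sketched as follows (a minimal assembly of the snippets above, using plain integers as elements for brevity):</p>

```ruby
class PriorityQueue
  attr_reader :elements

  def initialize
    # index 0 is unused so the root lives at index 1
    @elements = [nil]
  end

  def <<(element)
    @elements << element
    bubble_up(@elements.size - 1)
  end

  def pop
    # move the root to the end, remove it, then restore the heap
    exchange(1, @elements.size - 1)
    max = @elements.pop
    bubble_down(1)
    max
  end

  private

  def bubble_up(index)
    parent_index = index / 2
    return if index <= 1
    return if @elements[parent_index] >= @elements[index]
    exchange(index, parent_index)
    bubble_up(parent_index)
  end

  def bubble_down(index)
    child_index = index * 2
    return if child_index > @elements.size - 1
    # pick the largest child when there are two
    not_the_last_element = child_index < @elements.size - 1
    left = @elements[child_index]
    right = @elements[child_index + 1]
    child_index += 1 if not_the_last_element && right > left
    return if @elements[index] >= @elements[child_index]
    exchange(index, child_index)
    bubble_down(child_index)
  end

  def exchange(source, target)
    @elements[source], @elements[target] = @elements[target], @elements[source]
  end
end

q = PriorityQueue.new
q << 4
q << 7
q << 2
p [q.pop, q.pop, q.pop] # => [7, 4, 2]
```

<p>Items always come out in priority order, no matter the order in which they were inserted.</p>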
<h3 id="comparing-the-two-implementations">Comparing the two implementations</h3>
<p>Just out of curiosity, let’s run a simple benchmark to compare the performance of our real implementation, using the binary heap, with
the naive implementation, that just sorts the array for every <code class="language-plaintext highlighter-rouge">pop</code> operation.</p>
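<p>The naive implementation isn’t listed here, but based on that description it can be sketched like this (an assumption about its shape, using plain integers instead of <code class="language-plaintext highlighter-rouge">Element</code> objects for brevity):</p>

```ruby
# a naive priority queue that re-sorts the whole array on every pop
class NaivePriorityQueue
  def initialize
    @elements = []
  end

  def <<(element)
    # insertion is cheap: just append to the end
    @elements << element
  end

  def pop
    # O(n log n) sort on every single dequeue
    @elements.sort!
    @elements.pop
  end
end

q = NaivePriorityQueue.new
q << 2
q << 9
q << 5
p q.pop # => 9
```

<p>Insertion is cheap, but every <code class="language-plaintext highlighter-rouge">pop</code> pays the full cost of sorting the array, which is exactly what the benchmark punishes.</p>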
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'benchmark/ips'</span>
<span class="nb">require_relative</span> <span class="s1">'element'</span>
<span class="nb">require_relative</span> <span class="s1">'naive_priority_queue'</span>
<span class="nb">require_relative</span> <span class="s1">'priority_queue'</span>
<span class="n">naive</span> <span class="o">=</span> <span class="no">NaivePriorityQueue</span><span class="p">.</span><span class="nf">new</span>
<span class="n">real</span> <span class="o">=</span> <span class="no">PriorityQueue</span><span class="p">.</span><span class="nf">new</span>
<span class="mi">100_000</span><span class="p">.</span><span class="nf">times</span> <span class="k">do</span> <span class="o">|</span><span class="n">i</span><span class="o">|</span>
<span class="n">naive</span> <span class="o"><<</span> <span class="no">Element</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s2">"Foo </span><span class="si">#{</span><span class="n">i</span><span class="si">}</span><span class="s2">"</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span>
<span class="n">real</span> <span class="o"><<</span> <span class="no">Element</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s2">"Foo </span><span class="si">#{</span><span class="n">i</span><span class="si">}</span><span class="s2">"</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span>
<span class="k">end</span>
<span class="no">Benchmark</span><span class="p">.</span><span class="nf">ips</span> <span class="k">do</span> <span class="o">|</span><span class="n">x</span><span class="o">|</span>
<span class="n">x</span><span class="p">.</span><span class="nf">report</span><span class="p">(</span><span class="s2">"naive"</span><span class="p">)</span> <span class="p">{</span> <span class="n">naive</span><span class="p">.</span><span class="nf">pop</span> <span class="p">}</span>
<span class="n">x</span><span class="p">.</span><span class="nf">report</span><span class="p">(</span><span class="s2">"real"</span><span class="p">)</span> <span class="p">{</span> <span class="n">real</span><span class="p">.</span><span class="nf">pop</span> <span class="p">}</span>
<span class="n">x</span><span class="p">.</span><span class="nf">compare!</span>
<span class="k">end</span>
</code></pre></div></div>
<p>And the results:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Calculating -------------------------------------
               naive     4.000  i/100ms
                real    17.516k i/100ms
-------------------------------------------------
               naive     46.521  (± 2.1%) i/s -    236.000
                real      1.756M (± 8.6%) i/s -      8.723M

Comparison:
                real:  1755960.4 i/s
               naive:       46.5 i/s - 37745.64x slower
</code></pre></div></div>
<p>On my machine, the naive implementation is about <strong>37,745 times slower</strong> than the binary heap one. That’s a pretty big difference.</p>
<h3 id="wrapping-up">Wrapping up</h3>
<p>The priority queue is a very useful data structure that can be used to solve a wide range of problems, from <a href="http://en.wikipedia.org/wiki/Scheduling_%28computing%29">thread scheduling</a>
to <a href="http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm">graph</a> <a href="http://en.wikipedia.org/wiki/Prim%27s_algorithm">searching</a> algorithms.<br />
We can have a fully working (and quite fast) priority queue in less than 50 lines of Ruby. And even if the language you are
working with has a priority queue implementation in its standard library, implementing your own is always a good
exercise to understand how things work under the hood.</p>
<p>You can see the entire solution in <a href="https://gist.github.com/brianstorti/e20300eb2e7d62b87849">this gist</a>.</p>
Enough sed to be useful2015-01-20T00:00:00+00:00www.brianstorti.com/enough-sed-to-be-useful<p><code class="language-plaintext highlighter-rouge">sed</code> is a text editor that is probably already installed on your machine and can help you be more productive. It can make the boring and time-consuming task
of editing multiple files a breeze, and it shouldn’t take more than a few minutes to learn the basics.</p>
<h3 id="the-stream-editor">The stream editor</h3>
<p><code class="language-plaintext highlighter-rouge">sed</code> is a non-interactive editor. It means that, unlike most of the text editors that you’re probably used to, like <code class="language-plaintext highlighter-rouge">vim</code> or <code class="language-plaintext highlighter-rouge">sublime</code>, it just reads
a set of commands from a script and executes these commands on a given file.<br />
This is the most commonly used command syntax:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">sed</span> <span class="nt">-e</span> <span class="s1">'script commands here'</span> file.txt
</code></pre></div></div>
<p>This way we can pass an inline command. We could also replace the <code class="language-plaintext highlighter-rouge">-e</code> flag with <code class="language-plaintext highlighter-rouge">-f</code> and give it the name of a file where the script is defined:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">sed</span> <span class="nt">-f</span> scriptfile file.txt
</code></pre></div></div>
<blockquote>
<p>Just read sed’s manpage if you want to learn more about the other arguments that it accepts.</p>
</blockquote>
<h3 id="the-structure-of-a-sed-command">The structure of a sed command</h3>
<p>A <code class="language-plaintext highlighter-rouge">sed</code> command consists of an <strong>address</strong> and an <strong>editing instruction</strong>.<br />
The address tells <code class="language-plaintext highlighter-rouge">sed</code> in which lines the command should be executed, and the editing instruction tells it what to do with these lines.</p>
<h5 id="the-address">The address</h5>
<p>An address can be a line number, a regular expression or a special symbol, like <code class="language-plaintext highlighter-rouge">$</code>, which refers to the last line.</p>
<p>This address is optional, and if no address is provided, <code class="language-plaintext highlighter-rouge">sed</code> will just execute the command for every line. You can also provide one or two addresses,
one meaning that the command should be executed just for lines that match that given address, and two meaning that the command should be executed for the
lines between the first and the second address (inclusive). These addresses are separated by a comma.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">sed</span> <span class="nt">-e</span> <span class="s1">'1,3 command'</span> file.txt
<span class="c"># will execute the command for the first, second and third line.</span>
<span class="nv">$ </span><span class="nb">sed</span> <span class="nt">-e</span> <span class="s1">'/PATTERN/ command'</span> file.txt
<span class="c"># will execute the command just for lines that match the pattern</span>
<span class="nv">$ </span><span class="nb">sed</span> <span class="nt">-e</span> <span class="s1">'/BEGIN/,/END/ command'</span> file.txt
<span class="c"># will execute the command starting in the line that matches BEGIN, until the lines that matches END</span>
</code></pre></div></div>
<h5 id="the-editing-instruction">The editing instruction</h5>
<p>These instructions are single characters that tell <code class="language-plaintext highlighter-rouge">sed</code> what to do with the current line. The most used (and, maybe, the most useful)
editing instruction is <code class="language-plaintext highlighter-rouge">s</code>, which substitutes a pattern:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">echo</span> <span class="s1">'foo bar foo baz'</span> | <span class="nb">sed</span> <span class="nt">-e</span> <span class="s1">'s/foo/FOO/'</span>
<span class="c"># FOO bar foo baz</span>
</code></pre></div></div>
<p>These commands can also receive an argument (or flag) that tells them how to operate. For instance, we could tell <code class="language-plaintext highlighter-rouge">s</code> to substitute all occurrences in that line, instead
of just the first one, passing the <code class="language-plaintext highlighter-rouge">g</code> flag:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">echo</span> <span class="s1">'foo bar foo baz'</span> | <span class="nb">sed</span> <span class="nt">-e</span> <span class="s1">'s/foo/FOO/g'</span>
<span class="c"># FOO bar FOO baz</span>
</code></pre></div></div>
<p>If you want to execute more than one command for a matched address, just wrap them all in curly brackets (<code class="language-plaintext highlighter-rouge">{}</code>):</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">sed</span> <span class="nt">-e</span> <span class="s1">'/bar/{
s/a/A/g
a\
this line was appended
}'</span> file.txt
</code></pre></div></div>
<p>This command will read <code class="language-plaintext highlighter-rouge">file.txt</code>, and for every line that matches the pattern <code class="language-plaintext highlighter-rouge">/bar/</code>, it will <code class="language-plaintext highlighter-rouge">s</code>ubstitute “a” for “A”, and then <code class="language-plaintext highlighter-rouge">a</code>ppend a new line that says “this line was appended”.
You can also do different things to different addresses, in the same script:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">sed</span> <span class="nt">-e</span> <span class="s1">'/bar/{
s/a/A/g
a\
this line was appended
}
/foo/d'</span> file.txt
</code></pre></div></div>
<p>So this script will do the same things as the one before, but it will also <code class="language-plaintext highlighter-rouge">d</code>elete all the lines that match <code class="language-plaintext highlighter-rouge">/foo/</code>.</p>
<p>And that’s pretty much it. You know how to provide addresses and tell <code class="language-plaintext highlighter-rouge">sed</code> what to do with these lines, that’s the basic structure of how you will interact with <code class="language-plaintext highlighter-rouge">sed</code> most of the time.
Of course there are a lot more editing instructions, and a quick look at the manpage can show you more about how powerful <code class="language-plaintext highlighter-rouge">sed</code> can be.</p>
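<p>For a quick taste of two other common instructions, <code class="language-plaintext highlighter-rouge">p</code> prints the matched lines (usually combined with the <code class="language-plaintext highlighter-rouge">-n</code> flag, which suppresses the default output), and <code class="language-plaintext highlighter-rouge">d</code>, as we saw, deletes them:</p>

```shell
# print only the second line of the input
printf 'one\ntwo\nthree\n' | sed -n '2p'
# two

# delete every line that matches the pattern
printf 'foo\nbar\nfoo\n' | sed -e '/foo/d'
# bar
```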
<h3 id="some-caveats">Some caveats</h3>
<p>If you are following the examples, you probably have noticed that <code class="language-plaintext highlighter-rouge">sed</code> never actually changes the original file, it just shows you the result of the script execution.<br />
<code class="language-plaintext highlighter-rouge">sed</code> stands for “stream editor” for a reason. Like most UNIX programs, it receives an input and directs the results of the script execution to the standard output. What if you
really want to change the file with your script?</p>
<p>We can pass the <code class="language-plaintext highlighter-rouge">-i</code> argument to <code class="language-plaintext highlighter-rouge">sed</code>, which is pretty useful: it tells <code class="language-plaintext highlighter-rouge">sed</code> to edit the file in place, creating a backup with the given extension that keeps the content of the file before the script was applied:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">sed</span> <span class="nt">-i</span> <span class="s1">'.bkp'</span> <span class="nt">-e</span> <span class="s1">'/foo/d'</span> file.txt
</code></pre></div></div>
<p>This command will execute the script (removing all the lines that match <code class="language-plaintext highlighter-rouge">/foo/</code>), and will create a <code class="language-plaintext highlighter-rouge">file.txt.bkp</code>, with the original content.<br />
If you are confident your script works fine, just pass an empty string, and <code class="language-plaintext highlighter-rouge">sed</code> will replace the original file:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">sed</span> <span class="nt">-i</span><span class="s1">''</span> <span class="nt">-e</span> <span class="s1">'/foo/d'</span> file.txt
<span class="c"># file.txt is changed, and no backup is created.</span>
</code></pre></div></div>
<h5 id="sed-works-better-with-friends"><code class="language-plaintext highlighter-rouge">sed</code> works better with friends</h5>
<p>And its best friend is usually <code class="language-plaintext highlighter-rouge">find</code>, a utility to, well, find things in your filesystem. We’ll use it to locate the files that we want to change.</p>
<p>Just pretend you made a terrible mistake and logged some users’ passwords. You are rotating your logs, so you may have dozens of log files, and now
you need to find and remove all the lines with the word “password”, from all your log files. Now that you know how <code class="language-plaintext highlighter-rouge">sed</code> works, this should be easy:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>find <span class="nb">.</span> <span class="nt">-name</span> <span class="s2">"*.log"</span> <span class="nt">-exec</span> <span class="nb">sed</span> <span class="nt">-i</span><span class="s1">''</span> <span class="nt">-e</span> <span class="s1">'/password/d'</span> <span class="o">{}</span> <span class="se">\;</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">find</code> will locate every log file and then execute a <code class="language-plaintext highlighter-rouge">sed</code> command for each one, and what that command does is probably already familiar to you by now. <code class="language-plaintext highlighter-rouge">{}</code> just means “add the file that you found here”,
so there is nothing different from the examples that we have seen so far.</p>
<h3 id="conclusion">Conclusion</h3>
<p><code class="language-plaintext highlighter-rouge">sed</code> is a very powerful tool that can save us precious time with just a few keystrokes. We always try to use the best tool for the job, and <code class="language-plaintext highlighter-rouge">sed</code> is a good candidate for a lot
of the editing jobs that we face quite frequently.</p>
<p>I covered here just the basics of how to use it to perform some simple tasks, but your
imagination is the limit. If you want to get inspiration, or just want to see some cool things that can be done, <a href="http://sed.sourceforge.net/sed1line.txt">this file</a> has a bunch of
interesting one-liners.</p>
Understanding Ruby's idiom: array.map(&:method)2015-01-10T00:00:00+00:00www.brianstorti.com/understanding-ruby-idiom-map-with-symbol<p>Ruby has some idioms that are used pretty commonly, but not very often understood. <code class="language-plaintext highlighter-rouge">array.map(&:method_name)</code> is one of them.
We can see it being used everywhere to call a method on every <code class="language-plaintext highlighter-rouge">array</code> element, but why does this work? What’s really happening under the hood?</p>
<h2 id="in-case-you-dont-know-rubys-map">In case you don’t know Ruby’s <code class="language-plaintext highlighter-rouge">map</code></h2>
<p><code class="language-plaintext highlighter-rouge">map</code> is used to execute a block of code for each element of a given <code class="language-plaintext highlighter-rouge">Enumerable</code> object, like an <code class="language-plaintext highlighter-rouge">Array</code>. Here’s an example:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Foo</span>
<span class="k">def</span> <span class="nf">method_name</span>
<span class="nb">puts</span> <span class="s2">"method called for </span><span class="si">#{</span><span class="nb">object_id</span><span class="si">}</span><span class="s2">"</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="p">[</span><span class="no">Foo</span><span class="p">.</span><span class="nf">new</span><span class="p">,</span> <span class="no">Foo</span><span class="p">.</span><span class="nf">new</span><span class="p">].</span><span class="nf">map</span> <span class="k">do</span> <span class="o">|</span><span class="n">element</span><span class="o">|</span>
<span class="n">element</span><span class="p">.</span><span class="nf">method_name</span>
<span class="k">end</span>
<span class="c1"># => method called for 70339841711300</span>
<span class="c1"># => method called for 70339841711280</span>
</code></pre></div></div>
<p>As we are just calling <code class="language-plaintext highlighter-rouge">method_name</code> for each element of the list, Ruby allows us to use this idiom:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="no">Foo</span><span class="p">.</span><span class="nf">new</span><span class="p">,</span> <span class="no">Foo</span><span class="p">.</span><span class="nf">new</span><span class="p">].</span><span class="nf">map</span><span class="p">(</span><span class="o">&</span><span class="ss">:method_name</span><span class="p">)</span>
</code></pre></div></div>
<h2 id="what-ruby-does-when-it-sees-">What Ruby does when it sees <code class="language-plaintext highlighter-rouge">&</code></h2>
<p>The first thing that happens is that, whenever Ruby sees a <code class="language-plaintext highlighter-rouge">&</code> for a parameter, it wants this parameter to be a <code class="language-plaintext highlighter-rouge">Proc</code>. If this is not the case already, Ruby calls <code class="language-plaintext highlighter-rouge">#to_proc</code> on this
object to convert it. Let’s confirm this is true:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">MyClass</span>
<span class="k">def</span> <span class="nf">to_proc</span>
<span class="nb">puts</span> <span class="s2">"trying to convert to a proc"</span>
<span class="no">Proc</span><span class="p">.</span><span class="nf">new</span> <span class="p">{}</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="p">[].</span><span class="nf">map</span><span class="p">(</span><span class="o">&</span><span class="no">MyClass</span><span class="p">.</span><span class="nf">new</span><span class="p">)</span>
<span class="c1"># => trying to convert to a proc</span>
</code></pre></div></div>
<blockquote>
<p>If you don’t know what a <code class="language-plaintext highlighter-rouge">Proc</code> is, you can consider it to be just like a <code class="language-plaintext highlighter-rouge">lambda</code> or a <code class="language-plaintext highlighter-rouge">closure</code>.
It’s a piece of code that can be moved around and executed (by calling <code class="language-plaintext highlighter-rouge">call()</code> on it, for instance).</p>
</blockquote>
<p>As we passed a <code class="language-plaintext highlighter-rouge">MyClass</code> instance with <code class="language-plaintext highlighter-rouge">&</code> to <code class="language-plaintext highlighter-rouge">map</code>, it tried to call <code class="language-plaintext highlighter-rouge">to_proc</code> on it. This holds true for any method call, not just <code class="language-plaintext highlighter-rouge">map</code>.</p>
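<p>To see that this conversion really isn’t special to <code class="language-plaintext highlighter-rouge">map</code>, here’s a small illustrative method (the names <code class="language-plaintext highlighter-rouge">twice</code> and <code class="language-plaintext highlighter-rouge">greeting</code> are made up for this example):</p>

```ruby
# any method that takes a block triggers the to_proc conversion
def twice(&blk)
  blk.call + blk.call
end

greeting = Object.new
def greeting.to_proc
  Proc.new { "hi" }
end

# Ruby sees the &, calls greeting.to_proc, and passes the result as the block
p twice(&greeting) # => "hihi"
```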
<p>Back to the previous example, we are calling <code class="language-plaintext highlighter-rouge">map</code> with <code class="language-plaintext highlighter-rouge">&:method_name</code>. So we know that Ruby will see that <code class="language-plaintext highlighter-rouge">&</code> and try to call <code class="language-plaintext highlighter-rouge">:method_name.to_proc</code>. The next step
is to understand what <code class="language-plaintext highlighter-rouge">Symbol#to_proc</code> does.</p>
<h2 id="symbols-smart-to_proc-implementation">Symbol’s smart <code class="language-plaintext highlighter-rouge">to_proc</code> implementation</h2>
<p>What <code class="language-plaintext highlighter-rouge">Symbol#to_proc</code> does is quite clever. It tries to call a method with the same name (in our example, <code class="language-plaintext highlighter-rouge">method_name</code>) on the given object.</p>
<p>Maybe an example will make more sense:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="ss">:upcase</span><span class="p">.</span><span class="nf">to_proc</span><span class="p">.</span><span class="nf">call</span><span class="p">(</span><span class="s2">"string"</span><span class="p">)</span>
<span class="c1"># => STRING</span>
</code></pre></div></div>
<p>When we call <code class="language-plaintext highlighter-rouge">to_proc</code> on the <code class="language-plaintext highlighter-rouge">:upcase</code> symbol, it returns a <code class="language-plaintext highlighter-rouge">Proc</code> object that just calls the <code class="language-plaintext highlighter-rouge">upcase</code> method on the given parameter (“string”).</p>
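<p>A detail that is easy to miss: the <code class="language-plaintext highlighter-rouge">Proc</code> returned by <code class="language-plaintext highlighter-rouge">Symbol#to_proc</code> also forwards any extra arguments to the method, with the first argument acting as the receiver:</p>

```ruby
# the first argument becomes the receiver, the rest become method arguments
p :+.to_proc.call(1, 2)               # => 3, same as 1.+(2)
p :gsub.to_proc.call("foo", "o", "0") # => "f00", same as "foo".gsub("o", "0")
```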
<h2 id="implementing-our-own-version">Implementing our own version</h2>
<p>One of the approaches that I like to take to understand how something works is to create my own dumb implementation of it. After we understand all the building blocks
that make this idiom work, this should not be that hard.</p>
<p>First, let’s implement our own <code class="language-plaintext highlighter-rouge">map</code> method:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">my_map</span><span class="p">(</span><span class="n">enumerable</span><span class="p">,</span> <span class="o">&</span><span class="n">block</span><span class="p">)</span>
<span class="n">result</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">enumerable</span><span class="p">.</span><span class="nf">each</span> <span class="p">{</span> <span class="o">|</span><span class="n">element</span><span class="o">|</span> <span class="n">result</span> <span class="o"><<</span> <span class="n">block</span><span class="p">.</span><span class="nf">call</span><span class="p">(</span><span class="n">element</span><span class="p">)</span> <span class="p">}</span>
<span class="n">result</span>
<span class="k">end</span>
</code></pre></div></div>
<p>We iterate over the <code class="language-plaintext highlighter-rouge">Enumerable</code> object and execute that given block. We know that <code class="language-plaintext highlighter-rouge">block</code> is going to be a <code class="language-plaintext highlighter-rouge">Proc</code>, because Ruby called <code class="language-plaintext highlighter-rouge">to_proc</code> on it, so we can just <code class="language-plaintext highlighter-rouge">call</code> it.<br />
And this works.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">my_map</span><span class="p">([</span><span class="s2">"foo"</span><span class="p">,</span> <span class="s2">"bar"</span><span class="p">],</span> <span class="o">&</span><span class="ss">:upcase</span><span class="p">)</span>
<span class="c1"># => ["FOO", "BAR"]</span>
</code></pre></div></div>
<p>Now let’s implement our own <code class="language-plaintext highlighter-rouge">Symbol</code> functionality:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">MySymbol</span>
<span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">method_name</span><span class="p">)</span>
<span class="vi">@method_name</span> <span class="o">=</span> <span class="n">method_name</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">to_proc</span>
<span class="no">Proc</span><span class="p">.</span><span class="nf">new</span> <span class="k">do</span> <span class="o">|</span><span class="n">element</span><span class="o">|</span>
<span class="n">element</span><span class="p">.</span><span class="nf">send</span><span class="p">(</span><span class="vi">@method_name</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>We know that we just need to implement the <code class="language-plaintext highlighter-rouge">to_proc</code> method that Ruby is going to call and make it return a <code class="language-plaintext highlighter-rouge">Proc</code> object.<br />
As this is not really a <code class="language-plaintext highlighter-rouge">Symbol</code>, we will define the method to be called in the constructor. The method name is dynamic, so we
need to use Ruby’s <code class="language-plaintext highlighter-rouge">send</code> to call it.<br />
And this works.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">my_map</span><span class="p">([</span><span class="s2">"foo"</span><span class="p">,</span> <span class="s2">"bar"</span><span class="p">],</span> <span class="o">&</span><span class="no">MySymbol</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s2">"upcase"</span><span class="p">))</span>
<span class="c1"># => ["FOO", "BAR"]</span>
</code></pre></div></div>
<h2 id="summarizing">Summarizing</h2>
<ul>
<li>We instantiate a <code class="language-plaintext highlighter-rouge">MySymbol</code> object;</li>
<li>Ruby checks that there is a <code class="language-plaintext highlighter-rouge">&</code> and calls <code class="language-plaintext highlighter-rouge">to_proc</code> on this object;</li>
<li><code class="language-plaintext highlighter-rouge">MySymbol#to_proc</code> returns a <code class="language-plaintext highlighter-rouge">Proc</code> object, that expects a parameter (<code class="language-plaintext highlighter-rouge">element</code>) and calls a method on it (<code class="language-plaintext highlighter-rouge">upcase</code>);</li>
<li><code class="language-plaintext highlighter-rouge">my_map</code> iterates over the received list (<code class="language-plaintext highlighter-rouge">['foo', 'bar']</code>) and calls the received <code class="language-plaintext highlighter-rouge">Proc</code> on each element, passing it as a parameter (<code class="language-plaintext highlighter-rouge">block.call(element)</code>);</li>
<li>The <code class="language-plaintext highlighter-rouge">Proc</code> then executes <code class="language-plaintext highlighter-rouge">element.send("upcase")</code>, that is basically the same as <code class="language-plaintext highlighter-rouge">"foo".upcase</code>, and will return the expected result.</li>
</ul>
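<p>Putting the pieces together, the whole experiment fits in one self-contained file (the same code developed above, just collected so it can be run directly):</p>

```ruby
# my_map: a simplified Enumerable#map that takes an explicit block
def my_map(enumerable, &block)
  result = []
  enumerable.each { |element| result << block.call(element) }
  result
end

# MySymbol: our own Symbol-like object that responds to to_proc
class MySymbol
  def initialize(method_name)
    @method_name = method_name
  end

  def to_proc
    # the Proc receives each element and sends it the stored method name
    Proc.new { |element| element.send(@method_name) }
  end
end

puts my_map(["foo", "bar"], &MySymbol.new("upcase")).inspect
# → ["FOO", "BAR"]
```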
The role of a reverse proxy to protect your application against slow clients2015-01-06T00:00:00+00:00www.brianstorti.com/the-role-of-a-reverse-proxy-to-protect-your-application-against-slow-clients<p>When you are running an application server that uses a forking model, slow clients can make your application simply stop handling new requests.
Slow clients can be just users with a slow connection sending a large request, or an attacker being slow on purpose. I’ll try to explain what
these slow clients are and how a reverse proxy can be used to protect your application server against them.</p>
<p>But first we need to understand what a forking model is.</p>
<h2 id="the-forking-model">The forking model</h2>
<p>Saying that an application server uses a forking model, simply put, means that it will spawn processes to handle new requests. As long as its
concurrency strategy is based on using new processes to handle more requests, we can consider it to be using a forking model.</p>
<p>A well known application server that follows this strategy is <a href="http://unicorn.bogomips.org/">Unicorn</a>.</p>
<h2 id="the-problem-with-slow-clients">The problem with slow clients</h2>
<p>Let’s try to imagine this scenario: There is a user trying to access your application. This user has a really terrible connection,
maybe they are using a mobile network (or a 56k modem, who knows?), and they are trying to send you a large request, a 5MB picture, for instance.</p>
<p>When your application server receives this request, it spawns a new process to handle it, and starts receiving that large request. The process will be
bottlenecked by the speed of the client connection, and it will be blocked until the slow client finishes sending that large picture. Being blocked means that
this worker process cannot handle any other request in the meantime; it’s just there, idle, waiting to receive the entire request so it can actually start processing it.<br />
The same problem happens when it needs to send back a response. If the client is slow to receive it, the process will stay blocked, unable to handle other requests.</p>
<p>When all of your worker processes are busy (maybe just because they are blocked by slow clients), your application stops receiving any new requests. That’s not good.</p>
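<p>To make the blocking concrete, here is a toy simulation (not how Unicorn actually works internally): a forked “worker” reads from a pipe that stands in for the client’s socket, while the parent plays the slow client, dribbling the request out one byte at a time.</p>

```ruby
# Toy simulation of a slow client blocking a forked worker.
reader, writer = IO.pipe

worker = fork do
  writer.close
  # the worker is blocked here until the whole "request" has arrived
  request = reader.read
  exit(request.bytesize) # report how many bytes were received
end

reader.close
# the "slow client" sends the request one byte at a time
"12345".each_char do |byte|
  writer.write(byte)
  sleep 0.01
end
writer.close

_, status = Process.wait2(worker)
bytes_received = status.exitstatus
puts bytes_received # → 5
```

During the entire `sleep`-punctuated transfer, the worker process does nothing but wait; in a real forking server, that is a worker slot no other client can use.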
<h2 id="buffering-reverse-proxies-for-the-rescue">Buffering reverse proxies for the rescue</h2>
<p>A reverse proxy (like <a href="http://nginx.com/">Nginx</a>) sits in front of your application server (say, Unicorn), and can offer a sort of buffering system.</p>
<p>This buffering reverse proxy can handle an “unlimited” number of requests, and is not affected by slow clients.<br />
Nginx, for instance, uses a non-blocking, evented I/O model (rather than the blocking I/O forking model that Unicorn uses), which means that
when it receives a new request, it will perform a read call (an I/O operation) and will not be blocked waiting for a response, being immediately
available to handle new requests. When the read operation finishes, the operating system will send an event notification, and the appropriate event handler can be called
(passing the request to the application server, for instance).</p>
<p>The scenario above, with a buffering reverse proxy, would be something like this: The slow client makes a large request. The buffering reverse proxy will wait
until it gets the entire request, then it will pass this request to the application server, which will just process it and deliver the response back to the
reverse proxy, being free to receive new requests. The reverse proxy then sends this response back to the slow client.<br />
It doesn’t matter much that it takes a long time to receive and deliver requests/responses to these clients, as the reverse proxy will not be
blocked by these I/O operations (due to the nature of its concurrency model).</p>
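<p>As an illustrative sketch, the nginx side of such a setup could look roughly like this (the upstream name and socket path are made up for the example):</p>

```nginx
# Inside the http block: buffer slow clients before talking to Unicorn.
upstream unicorn {
    server unix:/tmp/unicorn.sock; # hypothetical Unicorn socket
}

server {
    listen 80;

    location / {
        # nginx reads the whole request body from the client first
        client_body_buffer_size 1m;
        # and buffers the upstream response before sending it back
        proxy_buffering on;
        proxy_pass http://unicorn;
    }
}
```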
<p>Now the application server processes are responsible just for processing the request, not being blocked by these slow clients anymore.</p>
<p>The conclusion here is that, if you are using an application server that is blocked by I/O operations, it’s a pretty good idea to put a reverse proxy in front
of it, one that can handle this kind of situation (and, possibly, do a lot more).</p>
Working with HTTP cache2014-09-27T00:00:00+00:00www.brianstorti.com/working-with-http-cache<p>The fastest network request is a request not performed. That’s the job of an HTTP cache: avoiding unnecessary work. By understanding
how it works, we can create web applications and APIs that are more responsive, reducing both latency and bandwidth usage.</p>
<p>There are two main types of cache: The <em>private</em> and the <em>shared</em>.</p>
<p>A private cache is what the web browser (or any other HTTP agent) stores locally, in each client’s computer.<br />
A shared cache is something that sits between the client and the origin server, and can serve multiple clients. It acts as a proxy
that intercepts requests and decides if the origin server needs to be called.</p>
<p>There are two aspects that are analysed before asking the origin server for a new version of a representation: <strong>freshness</strong> and <strong>validity</strong>.</p>
<h2 id="freshness">Freshness</h2>
<p>When a representation stored in the cache is considered fresh, there is no need to even perform a request to the origin server; it can be
served right away.<br />
There are two <a href="http://tools.ietf.org/html/rfc2616#section-4.2">HTTP headers</a> used to indicate if a representation is fresh or not: <code class="language-plaintext highlighter-rouge">Expires</code> and <code class="language-plaintext highlighter-rouge">Cache-Control</code>.</p>
<h4 id="expires">Expires</h4>
<p>The <code class="language-plaintext highlighter-rouge">Expires</code> header indicates when that representation should be considered stale (not fresh). It expects a specific HTTP date. Here’s an example:</p>
<div class="language-http highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="m">200</span> <span class="ne">OK</span>
<span class="na">Content-Length</span><span class="p">:</span> <span class="s">31225</span>
<span class="na">Content-type</span><span class="p">:</span> <span class="s">text/html</span>
<span class="na">Expires</span><span class="p">:</span> <span class="s">Mon, 29 Sep 2014 10:00:00 GMT</span>
[RESPONSE BODY]
</code></pre></div></div>
<p>Notice that if the date format is not correct, the representation will be considered stale. Also, you need to make sure that your web server’s clock and the
cache’s clock are synchronized.</p>
<h4 id="cache-control">Cache-Control</h4>
<p>In HTTP 1.1, the <code class="language-plaintext highlighter-rouge">Cache-Control</code> header is an alternative to <code class="language-plaintext highlighter-rouge">Expires</code>. If both the <code class="language-plaintext highlighter-rouge">Expires</code> and <code class="language-plaintext highlighter-rouge">Cache-Control</code> headers are found, <code class="language-plaintext highlighter-rouge">Expires</code>
will be ignored.</p>
<p><code class="language-plaintext highlighter-rouge">Cache-Control</code> works with a bunch of directives to specify how it should behave. We will talk about three of them: <code class="language-plaintext highlighter-rouge">max-age</code>, <code class="language-plaintext highlighter-rouge">private</code> and <code class="language-plaintext highlighter-rouge">no-cache</code>.
You can see the entire list <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9">here</a>.</p>
<p><strong>max-age</strong>: This directive specifies for how many seconds (from the request time) the representation should be considered fresh. It works like the <code class="language-plaintext highlighter-rouge">Expires</code> header,
but without the date issues.</p>
<p><strong>private</strong>: Allows just a private cache to store it, but never a shared cache. This directive is used when the response is intended for a single user,
so it makes no sense to store it in a shared cache.</p>
<p><strong>no-cache</strong>: As the name says, it makes the request always be sent to the origin server.</p>
<p>Here’s an example:</p>
<div class="language-http highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="m">200</span> <span class="ne">OK</span>
<span class="na">Content-Length</span><span class="p">:</span> <span class="s">31225</span>
<span class="na">Content-Type</span><span class="p">:</span> <span class="s">text/html</span>
<span class="na">Cache-Control</span><span class="p">:</span> <span class="s">max-age=3600; private</span>
[RESPONSE BODY]
</code></pre></div></div>
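<p>Given a response like the one above, the freshness check is just arithmetic on <code class="language-plaintext highlighter-rouge">max-age</code>. A simplified Ruby sketch (a real cache also accounts for things like the <code class="language-plaintext highlighter-rouge">Age</code> header; the <code class="language-plaintext highlighter-rouge">fresh?</code> helper is made up for illustration):</p>

```ruby
require "time"

# A cached response is fresh while the seconds elapsed since it was
# stored are below the max-age directive.
def fresh?(stored_at, max_age, now)
  (now - stored_at) < max_age
end

stored_at = Time.httpdate("Sun, 28 Sep 2014 20:00:00 GMT")

puts fresh?(stored_at, 3600, stored_at + 60)   # → true  (1 minute later)
puts fresh?(stored_at, 3600, stored_at + 7200) # → false (2 hours later)
```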
<h2 id="validation">Validation</h2>
<p>When a representation is considered stale (e.g. the <code class="language-plaintext highlighter-rouge">max-age</code> was exceeded), a request must be sent to the origin server. Although we need to pay the price of
a network request, if we can identify that the representation is still the same, we can save some bandwidth by not sending this representation again.
That’s the job of the validation process, and this is done with what is called a <strong>conditional request</strong>.</p>
<p>There are two headers that can be used to support conditional requests, <code class="language-plaintext highlighter-rouge">Last-Modified</code> and <code class="language-plaintext highlighter-rouge">Etag</code>.</p>
<h4 id="last-modified">Last-Modified</h4>
<p>The <code class="language-plaintext highlighter-rouge">Last-Modified</code> header contains a date that tells the client when this representation last changed.</p>
<div class="language-http highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="m">200</span> <span class="ne">OK</span>
<span class="na">Content-Length</span><span class="p">:</span> <span class="s">44181</span>
<span class="na">Content-type</span><span class="p">:</span> <span class="s">text/html</span>
<span class="na">Last-Modified</span><span class="p">:</span> <span class="s">Sun, 28 Set 2014 10:00:00 GMT</span>
[RESPONSE BODY]
</code></pre></div></div>
<p>When a client receives a response that includes a <code class="language-plaintext highlighter-rouge">Last-Modified</code> header, it takes note of that, and, when it needs to perform the same request again,
it includes an <code class="language-plaintext highlighter-rouge">If-Modified-Since</code> header in the request, with the date that it received before:</p>
<div class="language-http highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">GET</span> <span class="nn">/</span> <span class="k">HTTP</span><span class="o">/</span><span class="m">1.1</span>
<span class="na">If-Modified-Since</span><span class="p">:</span> <span class="s">Sun, 28 Set 2014 10:00:00 GMT</span>
[REQUEST BODY]
</code></pre></div></div>
<p>The origin server then checks if the representation was changed after the date received in the <code class="language-plaintext highlighter-rouge">If-Modified-Since</code> header, and, if it was not changed, it
just sends a <code class="language-plaintext highlighter-rouge">304 Not Modified</code> response:</p>
<div class="language-http highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="m">304</span> <span class="ne">Not Modified</span>
<span class="na">Content-Length</span><span class="p">:</span> <span class="s">0</span>
<span class="na">Last-Modified</span><span class="p">:</span> <span class="s">Sun, 28 Set 2014 10:00:00 GMT</span>
</code></pre></div></div>
<p>Even though we still had to perform a network request, we avoided sending the same representation in the body, saving some bandwidth.</p>
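<p>On the server side, this conditional check is a date comparison. A minimal sketch, using Ruby’s <code class="language-plaintext highlighter-rouge">Time.httpdate</code> to parse HTTP dates (the <code class="language-plaintext highlighter-rouge">status_for</code> helper is made up for illustration):</p>

```ruby
require "time"

# 304 if the representation did not change after the client's
# If-Modified-Since date, 200 (with a full body) otherwise.
def status_for(last_modified, if_modified_since)
  last_modified > if_modified_since ? 200 : 304
end

last_modified     = Time.httpdate("Sun, 28 Sep 2014 10:00:00 GMT")
if_modified_since = Time.httpdate("Sun, 28 Sep 2014 10:00:00 GMT")

puts status_for(last_modified, if_modified_since) # → 304
```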
<h4 id="etag">Etag</h4>
<p>This is an “entity tag” that contains a string that changes whenever the representation changes. Usually an MD5 hash is used, but it can be whatever you want.<br />
It works in the same way <code class="language-plaintext highlighter-rouge">Last-Modified</code> does. The benefit is that you don’t need to keep track of the modification date of a representation: as long as you
always use the same algorithm to generate the <code class="language-plaintext highlighter-rouge">Etag</code> value (and you should be), it can be regenerated whenever you need it.</p>
<div class="language-http highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="m">200</span> <span class="ne">OK</span>
<span class="na">Content-Length</span><span class="p">:</span> <span class="s">44181</span>
<span class="na">Content-type</span><span class="p">:</span> <span class="s">text/html</span>
<span class="na">Etag</span><span class="p">:</span> <span class="s">"78q9y7-b37r-0o9a3bc"</span>
[RESPONSE BODY]
</code></pre></div></div>
<p>The client will save this <code class="language-plaintext highlighter-rouge">Etag</code> value and send it back in an <code class="language-plaintext highlighter-rouge">If-None-Match</code> header for the next requests.</p>
<div class="language-http highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">GET</span> <span class="nn">/</span> <span class="k">HTTP</span><span class="o">/</span><span class="m">1.1</span>
<span class="na">If-None-Match</span><span class="p">:</span> <span class="s">"78q9y7-b37r-0o9a3bc"</span>
[REQUEST BODY]
</code></pre></div></div>
<p>Then, if the origin server determines that the received value is still the same for the generated representation,
it can just send a <code class="language-plaintext highlighter-rouge">304 Not Modified</code>, saving some bandwidth.</p>
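<p>The server-side <code class="language-plaintext highlighter-rouge">Etag</code> check is just a string comparison. A minimal sketch (the <code class="language-plaintext highlighter-rouge">status_for_etag</code> helper is made up; <code class="language-plaintext highlighter-rouge">Digest::MD5</code> matches what we’ll use later with sinatra):</p>

```ruby
require "digest/md5"

# Generate an Etag from the representation and compare it with the
# client's If-None-Match header.
representation = "the resource representation"
etag = %("#{Digest::MD5.hexdigest(representation)}")

def status_for_etag(etag, if_none_match)
  if_none_match == etag ? 304 : 200
end

puts status_for_etag(etag, etag)           # → 304 (tags match, no body sent)
puts status_for_etag(etag, '"stale-tag"')  # → 200 (full response again)
```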
<h2 id="http-cache-at-work-step-by-step">HTTP cache at work, step by step</h2>
<p>Putting the pieces together, we can have this scenario:</p>
<p>1) A request is performed to <code class="language-plaintext highlighter-rouge">/</code>;</p>
<p>2) The HTTP agent checks if there is a <code class="language-plaintext highlighter-rouge">fresh</code> copy of the requested representation. It does so by looking at the <code class="language-plaintext highlighter-rouge">Cache-Control</code> or <code class="language-plaintext highlighter-rouge">Expires</code> headers.
If it finds a fresh copy, it just serves it to the client, and the origin server won’t even know this request existed.</p>
<p>3) If a <code class="language-plaintext highlighter-rouge">fresh</code> copy is not found, the origin server will be asked to revalidate the representation, through a <strong>conditional request</strong>. This is done with the
<code class="language-plaintext highlighter-rouge">If-Modified-Since</code> and/or <code class="language-plaintext highlighter-rouge">If-None-Match</code> headers.</p>
<p>4) If the origin server can validate the request, it will just return a <code class="language-plaintext highlighter-rouge">304 Not Modified</code> response, and the client will keep using the representation
it already has stored.</p>
<h2 id="http-cache-at-work-a-practical-example">HTTP cache at work, a practical example</h2>
<p>To better understand this scenario, we will create a simple API and incrementally add some cache capability to it.<br />
I am going to use <a href="http://www.sinatrarb.com/">sinatra</a> to create this API, and <a href="http://rtomayko.github.io/rack-cache/">rack-cache</a> as a reverse proxy
cache. The same concepts could be applied with any other stack; I chose these two tools because they are pretty simple and won’t get in the way of understanding
how the cache works, which is our goal here.</p>
<p>First, install <code class="language-plaintext highlighter-rouge">sinatra</code> and <code class="language-plaintext highlighter-rouge">rack-cache</code>, in case you don’t have them installed already:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gem <span class="nb">install </span>sinatra rack-cache
</code></pre></div></div>
<p>Then, we will create a simple <code class="language-plaintext highlighter-rouge">sinatra</code> app, without any caching capability:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># server.rb</span>
<span class="nb">require</span> <span class="s1">'sinatra'</span>
<span class="n">set</span> <span class="ss">:port</span><span class="p">,</span> <span class="mi">1234</span>
<span class="n">get</span> <span class="s1">'/'</span> <span class="k">do</span>
<span class="c1"># some interesting code would be executed here</span>
<span class="c1"># for now, we are just sleeping for 5 seconds</span>
<span class="nb">sleep</span> <span class="mi">5</span>
<span class="s2">"the resource representation"</span>
<span class="k">end</span>
</code></pre></div></div>
<p>To run this server, just run <code class="language-plaintext highlighter-rouge">ruby server.rb</code>. It should be accessible at <code class="language-plaintext highlighter-rouge">http://localhost:1234</code>. Notice that you’ll need to kill and start the server
again after each change.</p>
<p>When we send a request to this endpoint, we will notice that it takes 5 seconds until we get a response back. To make this request, I’m going to
use <code class="language-plaintext highlighter-rouge">curl(1)</code> (with the <code class="language-plaintext highlighter-rouge">-i</code> parameter, so we can see the headers).</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>curl <span class="nt">-i</span> http://localhost:1234
HTTP/1.1 200 OK
Content-Type: text/html<span class="p">;</span><span class="nv">charset</span><span class="o">=</span>utf-8
Content-Length: 27
X-Xss-Protection: 1<span class="p">;</span> <span class="nv">mode</span><span class="o">=</span>block
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
Server: WEBrick/1.3.1 <span class="o">(</span>Ruby/2.1.2/2014-05-08<span class="o">)</span>
Date: Sun, 28 Sep 2014 19:54:16 GMT
Connection: Keep-Alive
the resource representation
</code></pre></div></div>
<p>We can see that there’s no cache-related header in this response. Every time we send this request, it’ll hit the origin server, and we’ll have to wait at least
5 seconds to get the response. Also, we are always receiving the response body, even if it didn’t change, causing unnecessary use of bandwidth.</p>
<p>So let’s start to fix this.</p>
<p>First, we are going to add <code class="language-plaintext highlighter-rouge">rack-cache</code> as our reverse proxy cache. It should be pretty simple, as it’s just a rack middleware:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># server.rb</span>
<span class="nb">require</span> <span class="s1">'sinatra'</span>
<span class="c1"># we require rack-cache</span>
<span class="nb">require</span> <span class="s1">'rack-cache'</span>
<span class="n">set</span> <span class="ss">:port</span><span class="p">,</span> <span class="mi">1234</span>
<span class="c1"># and start using it</span>
<span class="n">use</span> <span class="no">Rack</span><span class="o">::</span><span class="no">Cache</span>
<span class="n">get</span> <span class="s1">'/'</span> <span class="k">do</span>
<span class="nb">sleep</span> <span class="mi">5</span>
<span class="s2">"the resource representation"</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Now that we have <code class="language-plaintext highlighter-rouge">rack-cache</code> in place, we can start to take advantage of it. First we’ll add a <code class="language-plaintext highlighter-rouge">Cache-Control</code> header,
that is going to tell the client that this representation should be considered fresh for 10 seconds:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># server.rb</span>
<span class="nb">require</span> <span class="s1">'sinatra'</span>
<span class="nb">require</span> <span class="s1">'rack-cache'</span>
<span class="n">set</span> <span class="ss">:port</span><span class="p">,</span> <span class="mi">1234</span>
<span class="n">use</span> <span class="no">Rack</span><span class="o">::</span><span class="no">Cache</span>
<span class="n">get</span> <span class="s1">'/'</span> <span class="k">do</span>
<span class="nb">sleep</span> <span class="mi">5</span>
<span class="c1"># add a Cache-Controller header, setting the max-age to 10 seconds</span>
<span class="n">cache_control</span> <span class="ss">:public</span><span class="p">,</span> <span class="ss">max_age: </span><span class="mi">10</span>
<span class="s2">"the resource representation"</span>
<span class="k">end</span>
</code></pre></div></div>
<p>And that’s it. If you try to hit this endpoint again, here’s what you get:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>curl <span class="nt">-i</span> http://localhost:1234
HTTP/1.1 200 OK
Content-Type: text/html<span class="p">;</span><span class="nv">charset</span><span class="o">=</span>utf-8
Cache-Control: public, max-age<span class="o">=</span>10
Content-Length: 27
Date: Sun, 28 Sep 2014 20:17:05 GMT
X-Content-Digest: 904c355ca45f6806b252aa62329fa8ac149011ac
Age: 0
X-Rack-Cache: stale, invalid, store
X-Xss-Protection: 1<span class="p">;</span> <span class="nv">mode</span><span class="o">=</span>block
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
Server: WEBrick/1.3.1 <span class="o">(</span>Ruby/2.1.2/2014-05-08<span class="o">)</span>
Connection: Keep-Alive
the resource representation
</code></pre></div></div>
<p>Now that we have the header <code class="language-plaintext highlighter-rouge">Cache-Control</code> in place, the next request should return instantaneously, as it’s not hitting
the origin server. That’s all it takes to have the <code class="language-plaintext highlighter-rouge">freshness</code> process working. The next step is the <code class="language-plaintext highlighter-rouge">validation</code> process,
and it is almost as easy.</p>
<p>So we are already saving some network traffic by avoiding unnecessary requests while the representation is still fresh, but once it gets
stale, we are still retrieving the entire representation in the response body, even if it didn’t change at all. Let’s fix that.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># server.rb</span>
<span class="nb">require</span> <span class="s1">'sinatra'</span>
<span class="nb">require</span> <span class="s1">'rack-cache'</span>
<span class="n">set</span> <span class="ss">:port</span><span class="p">,</span> <span class="mi">1234</span>
<span class="n">use</span> <span class="no">Rack</span><span class="o">::</span><span class="no">Cache</span>
<span class="n">get</span> <span class="s1">'/'</span> <span class="k">do</span>
<span class="nb">sleep</span> <span class="mi">5</span>
<span class="n">representation</span> <span class="o">=</span> <span class="s2">"the resource representation"</span>
<span class="n">cache_control</span> <span class="ss">:public</span><span class="p">,</span> <span class="ss">max_age: </span><span class="mi">10</span>
<span class="c1"># we add the Etag header with a MD5 hash of</span>
<span class="c1"># the representation</span>
<span class="n">etag</span> <span class="no">Digest</span><span class="o">::</span><span class="no">MD5</span><span class="p">.</span><span class="nf">hexdigest</span><span class="p">(</span><span class="n">representation</span><span class="p">)</span>
<span class="n">representation</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Now, performing the same request to this endpoint, when the representation is stale (10 seconds after the first request), this is what we get:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>curl <span class="nt">-i</span> http://localhost:1234
HTTP/1.1 200 OK
Content-Type: text/html<span class="p">;</span><span class="nv">charset</span><span class="o">=</span>utf-8
Cache-Control: public, max-age<span class="o">=</span>10
Etag: <span class="s2">"f8d36c97fa01826fe14c1989e373d6e4"</span>
Content-Length: 27
Date: Sun, 28 Sep 2014 20:29:48 GMT
X-Content-Digest: 904c355ca45f6806b252aa62329fa8ac149011ac
Age: 0
X-Rack-Cache: miss, store
X-Xss-Protection: 1<span class="p">;</span> <span class="nv">mode</span><span class="o">=</span>block
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
Server: WEBrick/1.3.1 <span class="o">(</span>Ruby/2.1.2/2014-05-08<span class="o">)</span>
Connection: Keep-Alive
the resource representation
</code></pre></div></div>
<p>We can see the <code class="language-plaintext highlighter-rouge">Etag</code> header there. All we need to do is to save that value, and send it in the <code class="language-plaintext highlighter-rouge">If-None-Match</code> header for the next request:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-i</span> http://localhost:1234 <span class="nt">--header</span> <span class="s1">'If-None-Match: "f8d36c97fa01826fe14c1989e373d6e4"'</span>
HTTP/1.1 304 Not Modified
Cache-Control: public, max-age<span class="o">=</span>10
Etag: <span class="s2">"f8d36c97fa01826fe14c1989e373d6e4"</span>
X-Content-Digest: 904c355ca45f6806b252aa62329fa8ac149011ac
Date: Sun, 28 Sep 2014 20:31:02 GMT
Age: 0
X-Rack-Cache: stale, valid, store
X-Content-Type-Options: nosniff
Server: WEBrick/1.3.1 <span class="o">(</span>Ruby/2.1.2/2014-05-08<span class="o">)</span>
Connection: Keep-Alive
</code></pre></div></div>
<p>Now, instead of getting a <code class="language-plaintext highlighter-rouge">200 OK</code> response with the entire representation in the body, we get a <code class="language-plaintext highlighter-rouge">304 Not Modified</code>, which does not include
a body. That saves us some bandwidth, as we don’t need to send that entire representation, which can be pretty big, in the response.</p>
<h2 id="conclusion">Conclusion</h2>
<p>In a time where performance is a feature, making good use of HTTP caching is one of the simplest ways to create applications and APIs
that are more responsive. With the tools that we have available today, it’s becoming easier and easier to use these well-established
HTTP capabilities, but understanding how they work is the first step, as none of these tools will be able to understand your specific
requirements.</p>
An introduction to UNIX processes2014-09-17T00:00:00+00:00www.brianstorti.com/an_introduction_to_unix_processes<p>Processes are a very important piece in the UNIX world. Basically, almost every program that you execute is running
in a process.<br />
Although you may not need to interact directly with them all the time, you are certainly depending on
them to get anything done in a UNIX system.</p>
<h2 id="first-things-first-what-is-a-process">First things first: What is a process?</h2>
<p>This is not the formal definition of a process, but I like to imagine a process as a container. Inside this container
there is a program running (<code class="language-plaintext highlighter-rouge">vim</code>, for instance), along with a bunch of metadata properties that describe the program (who’s running it,
what its id is and so on), and this container can receive and send messages to other containers.<br />
A more formal definition is that a process is, quoting Wikipedia, “an instance of a computer program that is being executed”.</p>
<h2 id="processes-properties">Processes properties</h2>
<p>The basic command to see the list of running processes is <code class="language-plaintext highlighter-rouge">ps</code> (process status).<br />
Run it in your terminal and you should see something like this:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>ps
PID TTY TIME CMD
28838 ttys000 0:00.16 <span class="nt">-zsh</span>
13833 ttys002 0:00.90 <span class="nt">-zsh</span>
27267 ttys002 0:06.82 vim
</code></pre></div></div>
<p>We can pass a <code class="language-plaintext highlighter-rouge">-o</code> parameter to format the output with the information that we want. Let’s run it again asking
for all the metadata we want to talk about:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>ps <span class="nt">-o</span> pid,ppid,tty,uid,args
PID PPID TTY UID ARGS
28838 28836 ttys000 1935087709 <span class="nt">-zsh</span>
13833 13832 ttys002 1935087709 <span class="nt">-zsh</span>
27267 13833 ttys002 1935087709 vim
</code></pre></div></div>
<p><strong>PID</strong>: Every process has an id associated with it. It’s a unique identifier, and that’s how we can reference a specific process.</p>
<p><strong>PPID</strong>: That’s the parent’s PID. Every process (well, almost every one) has a parent process, the process that was responsible for its creation.</p>
<p><strong>TTY</strong>: This is an identifier of the terminal session that triggered this process. That’s called the <code class="language-plaintext highlighter-rouge">controlling terminal</code>.
Almost every process will be attached to a terminal (except for daemons, which we’ll talk about later). In my example you can see that
I have two terminal sessions running (<code class="language-plaintext highlighter-rouge">ttys000</code> and <code class="language-plaintext highlighter-rouge">ttys002</code>). You can check your current tty with the, surprise, <code class="language-plaintext highlighter-rouge">tty</code> command:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">tty</span>
/dev/ttys000
</code></pre></div></div>
<p><strong>UID</strong>: This is the user id, the identifier of the user that owns this process, and it’s what defines the permissions
this process will have.<br />
You can check your user id with the <code class="language-plaintext highlighter-rouge">id</code> command:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">id</span> <span class="nt">-u</span> brianstorti
1935087709
</code></pre></div></div>
<p><strong>ARGS</strong>: The command (followed by its arguments) that’s running in this process.</p>
<p>There are many more properties related to a process, like the CPU / memory usage percentage, the start time and so on. You can check the entire
list in the <code class="language-plaintext highlighter-rouge">ps</code> manpage:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>man ps
</code></pre></div></div>
<p>Just search for <code class="language-plaintext highlighter-rouge">KEYWORDS</code> and you’ll see a huge list of properties. The ones I just described are the most commonly used, though.</p>
<h2 id="how-processes-are-born">How processes are born</h2>
<p>Process creation is achieved in two steps in a UNIX system: the <code class="language-plaintext highlighter-rouge">fork</code> and the <code class="language-plaintext highlighter-rouge">exec</code>.</p>
<p>Every process is created using the <code class="language-plaintext highlighter-rouge">fork</code> system call. We won’t cover system calls in this post, but you can imagine them as a way
for a program to send a message to the kernel (in this case, asking for the creation of a new process).</p>
<p>What <code class="language-plaintext highlighter-rouge">fork</code> does is create a copy of the calling process. The newly created process is called the child, and the caller is the parent. This
child process inherits everything that the parent has in memory; it’s an almost exact copy (<code class="language-plaintext highlighter-rouge">pid</code> and <code class="language-plaintext highlighter-rouge">ppid</code> are different, for instance).<br />
One thing to be aware of is that if a process is using 200MB of memory, when it forks a child, the newly created process will use another 200MB.
This can easily turn into an accidental “fork bomb” that consumes all the available resources of the machine.</p>
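<p>These system calls map almost directly to Ruby, which we can use to see <code class="language-plaintext highlighter-rouge">fork</code> in action. A minimal sketch:</p>

```ruby
# fork creates a copy of the calling process. Only the child runs the
# block; it gets a new pid, and its ppid is the parent's pid.
parent_pid = Process.pid

child_pid = fork do
  # Exit with 0 if our parent is who we expect, 1 otherwise.
  exit(Process.ppid == parent_pid ? 0 : 1)
end

_, status = Process.wait2(child_pid) # reap the child and read its status
puts "parent #{parent_pid} forked child #{child_pid}"
```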
<p>The second step is the <code class="language-plaintext highlighter-rouge">exec</code>. What <code class="language-plaintext highlighter-rouge">exec</code> does is <strong>replace</strong> the current process with a new one. The calling process is gone forever, and the new
process takes its place. If you run this command in a terminal session:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">exec </span>vim
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">vim</code> will open normally, as if it were a direct call to it, but as soon as you close it, you will see that the terminal session is gone as well. So
here’s what happened:<br />
You had a shell process running (<code class="language-plaintext highlighter-rouge">bash</code>, <code class="language-plaintext highlighter-rouge">zsh</code> or similar). The moment you called <code class="language-plaintext highlighter-rouge">exec</code>, passing <code class="language-plaintext highlighter-rouge">vim</code> as its argument, it <strong>replaced</strong>
the <code class="language-plaintext highlighter-rouge">bash</code> process with a <code class="language-plaintext highlighter-rouge">vim</code> process, so when you closed vim, there was no shell there anymore.</p>
<p>You will see this fork + exec pattern all over the place in a UNIX system. If you are running a bash process and you call, say, <code class="language-plaintext highlighter-rouge">ls</code> to list your files,
that is exactly what happens: the <code class="language-plaintext highlighter-rouge">bash</code> process calls <code class="language-plaintext highlighter-rouge">fork</code> to create an almost exact copy of itself, then calls <code class="language-plaintext highlighter-rouge">exec</code> to replace this copy with the
<code class="language-plaintext highlighter-rouge">ls</code> process. When the <code class="language-plaintext highlighter-rouge">ls</code> process exits, you are back in the parent process, which is <code class="language-plaintext highlighter-rouge">bash</code>. And talking about a process exiting…</p>
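<p>This fork + exec dance can be sketched in a few lines of Ruby, using <code class="language-plaintext highlighter-rouge">true</code> (a standard utility that just exits successfully) in place of <code class="language-plaintext highlighter-rouge">ls</code>:</p>

```ruby
# What a shell does for every command: fork a copy of itself, replace
# the copy with the program via exec, and wait for it to finish.
pid = fork do
  exec "true" # the child is replaced by `true` and never returns here
end

Process.wait(pid) # the parent blocks until the child exits, like a shell does
puts "child exited with status #{$?.exitstatus}"
```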
<h2 id="processes-always-exit-with-an-exit-code">Processes always exit with an exit code</h2>
<p>Every process exits with an exit code between 0 and 255. There are <a href="http://tldp.org/LDP/abs/html/exitcodes.html">well accepted meanings</a> for some
of them, but they are really just numeric values that you can handle as you want (although it’s a good idea to follow the conventions).<br />
What is important to know is that <code class="language-plaintext highlighter-rouge">0</code> is considered a successful exit code, while all the others indicate different types of errors.</p>
<p>We can try that with the <code class="language-plaintext highlighter-rouge">cd</code> command (or any other you want, actually). Notice that <code class="language-plaintext highlighter-rouge">$?</code> holds the exit code of the
last process that was executed:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cd</span>
<span class="nv">$ </span><span class="nb">echo</span> <span class="nv">$?</span>
0
<span class="nv">$ </span><span class="nb">cd </span>nop
<span class="nb">cd</span>:cd:13: no such file or directory: nop
<span class="nv">$ </span><span class="nb">echo</span> <span class="nv">$?</span>
1
</code></pre></div></div>
<p>The status <code class="language-plaintext highlighter-rouge">0</code> shows that the first command executed successfully, while <code class="language-plaintext highlighter-rouge">1</code> represents the failure we got when we tried to cd into a directory that does not exist.<br />
The parent process can then read this code through the <code class="language-plaintext highlighter-rouge">wait</code> system call.<br />
If the child process exits and, for some reason, the parent fails to call <code class="language-plaintext highlighter-rouge">wait</code>,
we have what is called a <strong>zombie process</strong>.</p>
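<p>The same thing can be done programmatically. A small sketch in Ruby, where the parent forks two children and reads their exit codes with <code class="language-plaintext highlighter-rouge">wait</code>:</p>

```ruby
# Each child exits with a different code; the parent reads both via wait.
ok_pid  = fork { exit 0 } # a "successful" child
bad_pid = fork { exit 1 } # a "failed" child

_, ok_status  = Process.wait2(ok_pid)
_, bad_status = Process.wait2(bad_pid)

puts "child #{ok_pid} exited with #{ok_status.exitstatus}"  # 0
puts "child #{bad_pid} exited with #{bad_status.exitstatus}" # 1
```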
<h2 id="zombie-and-orphan-processes">Zombie and orphan processes</h2>
<p>Zombie and orphan processes are sometimes conflated, but they are two different things.</p>
<p>A process becomes a zombie when it exits but its parent doesn’t call <code class="language-plaintext highlighter-rouge">wait</code>. The process doesn’t really exist anymore, but it still appears in the
process table (like the one you see when you run the <code class="language-plaintext highlighter-rouge">ps</code> command). The table will show a status <code class="language-plaintext highlighter-rouge">Z</code> for zombies.<br />
This state exists because the kernel cannot fully dispose of a process when it exits, otherwise no one would be able to read its exit code, so
it keeps the entry around until the parent performs a call to <code class="language-plaintext highlighter-rouge">wait</code>, and only then can the process be fully removed.<br />
Every process stays in a zombie state, at least for a short period of time, between the moment it exits and the moment its parent reads its exit code.</p>
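<p>We can watch this window directly. In this Ruby sketch the parent deliberately delays calling <code class="language-plaintext highlighter-rouge">wait</code>, so for a moment the dead child shows up in <code class="language-plaintext highlighter-rouge">ps</code> with the <code class="language-plaintext highlighter-rouge">Z</code> state:</p>

```ruby
# The child exits immediately, but we don't call wait right away, so the
# kernel keeps the child's process table entry around: a zombie.
pid = fork { exit }
sleep 0.5                             # the child is dead; we haven't waited
state = `ps -o stat= -p #{pid}`.strip # shows "Z" (possibly with extra flags)
puts "child #{pid} state before wait: #{state}"
Process.wait(pid)                     # reap it; the zombie entry disappears
```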
<p>A process becomes an orphan when it’s still running but its parent exits. What happens is that the child process is “adopted” by the initial process,
the first process that is executed in the system, usually called <code class="language-plaintext highlighter-rouge">init</code> (<code class="language-plaintext highlighter-rouge">launchd</code> if you are on macOS). The PPID (parent id) of an orphan process
will be <code class="language-plaintext highlighter-rouge">1</code>.<br />
A process can be orphaned unintentionally, when the parent process crashes, for instance, but it can also be orphaned intentionally, usually when you want
a long-running process to be detached from a user session, as is the case for <code class="language-plaintext highlighter-rouge">daemon</code> processes.</p>
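<p>Orphaning can be demonstrated with a double fork, the same trick daemons use. A sketch in Ruby (on most systems the orphan’s new PPID will be <code class="language-plaintext highlighter-rouge">1</code>, though a so-called subreaper process may adopt it instead):</p>

```ruby
# The middle process exits while its own child is still alive, so that
# child is orphaned and adopted by init (or a subreaper).
r, w = IO.pipe

middle = fork do
  fork do
    sleep 0.5           # outlive the middle process
    w.puts Process.ppid # report who our new parent is
  end
end

Process.wait(middle) # the middle process is gone; its child is now an orphan
w.close
new_ppid = r.gets.to_i
puts "orphan was adopted by pid #{new_ppid}"
```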
<h2 id="daemon-processes">Daemon processes</h2>
<p>A <code class="language-plaintext highlighter-rouge">daemon</code> process is, simply speaking, a process that runs in the background and is not attached to a controlling terminal. Database and web servers are
good examples of daemons. There are also a bunch of daemons responsible for keeping your system working as it should.</p>
<p>There is one especially important <code class="language-plaintext highlighter-rouge">daemon</code>, the first process created on the system: the <code class="language-plaintext highlighter-rouge">init</code> process. It’s the ancestor of all the other processes.<br />
The <code class="language-plaintext highlighter-rouge">init</code> process can spawn new processes that will be <code class="language-plaintext highlighter-rouge">daemons</code>, or a process can become a <code class="language-plaintext highlighter-rouge">daemon</code> by being intentionally made an orphan, as we saw
before (forking a child and immediately exiting).<br />
You’ll also notice that, by convention, <code class="language-plaintext highlighter-rouge">daemon</code> names usually end with a “d”: <code class="language-plaintext highlighter-rouge">syslogd</code>, <code class="language-plaintext highlighter-rouge">sshd</code>, <code class="language-plaintext highlighter-rouge">httpd</code>, and so on.</p>
<p>But if these <code class="language-plaintext highlighter-rouge">daemons</code> are not attached to a controlling terminal, how can someone actually terminate these processes? Well, one way is by sending them a signal.</p>
<h2 id="signals">Signals</h2>
<p>Remember when I said that I liked to imagine processes as containers, and that these containers could send messages to each other? Well, that’s exactly what
signals are, messages that are sent from one process to another.</p>
<p>The system call used to send a signal to a process is <code class="language-plaintext highlighter-rouge">kill</code>. This communication mechanism was originally created to terminate processes, which is why it’s named
that way, but it actually just sends a message (that may or may not be meant to terminate the receiving process).</p>
<p>When a process receives a signal, it can carry out the default action for that signal, execute a signal handler function, or, with a few exceptions, just ignore it.</p>
<p>You can run <code class="language-plaintext highlighter-rouge">kill -l</code> to see the list of available signals that can be sent. Each one of these signals also has an equivalent numeric value. For instance,
one of the most used signals, <code class="language-plaintext highlighter-rouge">KILL</code>, can be represented by the number <code class="language-plaintext highlighter-rouge">9</code>.</p>
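<p>Ruby, for instance, exposes this same name-to-number mapping through <code class="language-plaintext highlighter-rouge">Signal.list</code>:</p>

```ruby
# The kernel's signal name → number mapping, as seen from Ruby.
puts Signal.list["KILL"] # => 9
puts Signal.list["TERM"] # => 15
puts Signal.list["INT"]  # => 2
```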
<p>You can send a signal to a process with the <code class="language-plaintext highlighter-rouge">kill</code> command. For instance, if you want to kill that <code class="language-plaintext highlighter-rouge">vim</code> process that is running (with a PID 27267),
you can run any of these commands:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">kill</span> <span class="nt">-KILL</span> 27267
<span class="nv">$ </span><span class="nb">kill</span> <span class="nt">-SIGKILL</span> 27267
<span class="nv">$ </span><span class="nb">kill</span> <span class="nt">-9</span> 27267
</code></pre></div></div>
<p>That’s also what happens when you hit <code class="language-plaintext highlighter-rouge">Ctrl-c</code> to terminate a program in your terminal: a <code class="language-plaintext highlighter-rouge">SIGINT</code> (INTERRUPT) signal is sent to the running process and,
as a consequence, it should terminate.</p>
<p>With the exception of <code class="language-plaintext highlighter-rouge">SIGKILL</code> and <code class="language-plaintext highlighter-rouge">SIGSTOP</code>, processes can <code class="language-plaintext highlighter-rouge">trap</code> a signal to perform some custom action (or just ignore it). That’s why sometimes <code class="language-plaintext highlighter-rouge">Ctrl-c</code>
doesn’t seem to work: the target process probably trapped it to do something (e.g. remove temporary files or close connections) before actually exiting.</p>
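<p>Here is a small Ruby sketch of trapping a signal: the process sends <code class="language-plaintext highlighter-rouge">SIGINT</code> to itself, and instead of dying it runs a custom handler:</p>

```ruby
# Install a handler for SIGINT (the signal Ctrl-c sends). SIGKILL and
# SIGSTOP could not be trapped this way.
cleaned_up = false
Signal.trap("INT") { cleaned_up = true }

Process.kill("INT", Process.pid) # send ourselves a SIGINT
sleep 0.1                        # give the handler a chance to run
puts "handler ran: #{cleaned_up}"
```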
<p>One interesting exception is the <code class="language-plaintext highlighter-rouge">init</code> process, that can ignore even a <code class="language-plaintext highlighter-rouge">SIGKILL</code> or a <code class="language-plaintext highlighter-rouge">SIGSTOP</code> signal. The reason is that the kernel forces a system
crash if the <code class="language-plaintext highlighter-rouge">init</code> process terminates, so it will not deliver any fatal signal to this process.</p>
<h2 id="summarizing">Summarizing</h2>
<ul>
<li>A process is an instance of a running program;</li>
<li>Processes have some properties associated with them (pid, ppid, tty, etc.);</li>
<li>Processes are created in two steps: <code class="language-plaintext highlighter-rouge">fork</code> and <code class="language-plaintext highlighter-rouge">exec</code>;</li>
<li>Processes always exit with an exit code;</li>
<li>A process is a zombie if it is already dead but its parent still didn’t read its exit code with <code class="language-plaintext highlighter-rouge">wait</code>;</li>
<li>A process is an orphan if it is still alive but its parent isn’t. The <code class="language-plaintext highlighter-rouge">init</code> process becomes the new parent;</li>
<li>A daemon is a process that runs in the background, and is not attached to a controlling terminal;</li>
<li>Signals are messages sent from one process to another.</li>
</ul>
Designing good APIs - Avoiding the type marshalling trap2014-07-20T00:00:00+00:00www.brianstorti.com/avoiding-the-type-marshalling-trap<p>“Type marshalling” means automatically serializing an internal object (<code class="language-plaintext highlighter-rouge">order</code>, <code class="language-plaintext highlighter-rouge">product</code>, <code class="language-plaintext highlighter-rouge">address</code>) in a data format (<code class="language-plaintext highlighter-rouge">json</code>, <code class="language-plaintext highlighter-rouge">xml</code>, <code class="language-plaintext highlighter-rouge">html</code>)
that is returned to the consuming client.</p>
<p>It’s tempting to use this technique, as most of the popular web frameworks give this functionality out of the box.</p>
<p>If you use Rails, even the scaffold generated code does this for you:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">show</span>
<span class="vi">@product</span> <span class="o">=</span> <span class="no">Product</span><span class="p">.</span><span class="nf">find</span><span class="p">(</span><span class="n">params</span><span class="p">[</span><span class="ss">:id</span><span class="p">])</span>
<span class="n">respond_to</span> <span class="k">do</span> <span class="o">|</span><span class="nb">format</span><span class="o">|</span>
<span class="nb">format</span><span class="p">.</span><span class="nf">json</span> <span class="p">{</span> <span class="n">render</span> <span class="ss">json: </span><span class="vi">@product</span> <span class="p">}</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Or, if you are using Spring MVC, just by adding a <strong>@ResponseBody</strong> annotation to your controller method you will have a serialized product object:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@RequestMapping</span><span class="o">(</span><span class="n">value</span> <span class="o">=</span> <span class="s">"/product/{id}"</span><span class="o">)</span>
<span class="nd">@ResponseBody</span>
<span class="kd">public</span> <span class="nc">Product</span> <span class="nf">show</span><span class="o">(</span><span class="nd">@PathVariable</span><span class="o">(</span><span class="s">"id"</span><span class="o">)</span> <span class="nc">String</span> <span class="n">id</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="nc">Product</span><span class="o">.</span><span class="na">find</span><span class="o">(</span><span class="n">id</span><span class="o">);</span>
<span class="o">}</span>
</code></pre></div></div>
<p>This would generate a response with a json like that:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="nl">"product"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Marled wool cardigan"</span><span class="p">,</span><span class="w">
</span><span class="nl">"size"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"code"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0002"</span><span class="p">,</span><span class="w">
</span><span class="nl">"variantName"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Regular"</span><span class="p">,</span><span class="w">
</span><span class="nl">"variantId"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w">
</span><span class="nl">"dimensionOne"</span><span class="p">:</span><span class="w"> </span><span class="s2">"M"</span><span class="p">,</span><span class="w">
</span><span class="nl">"regular"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w">
</span><span class="nl">"dimensionTwo"</span><span class="p">:</span><span class="w"> </span><span class="kc">null</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"price"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="mi">6</span><span class="p">,</span><span class="w">
</span><span class="nl">"current"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"amount"</span><span class="p">:</span><span class="w"> </span><span class="mf">27.97</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"regular"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"amount"</span><span class="p">:</span><span class="w"> </span><span class="mf">29.00</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"deprecated"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"amount"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"deprecatedType"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"businessId"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2285370120002"</span><span class="p">,</span><span class="w">
</span><span class="nl">"taxCode"</span><span class="p">:</span><span class="w"> </span><span class="s2">"C1"</span><span class="p">,</span><span class="w">
</span><span class="nl">"priceType"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w">
</span><span class="nl">"images"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"imagePath"</span><span class="p">:</span><span class="w"> </span><span class="s2">"webcontent/0005/537/723/cn5537723.jpg"</span><span class="p">,</span><span class="w">
</span><span class="nl">"thumbnailPath"</span><span class="p">:</span><span class="w"> </span><span class="s2">"webcontent/0005/537/721/cn5537721.jpg"</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"returnCode"</span><span class="p">:</span><span class="w"> </span><span class="kc">null</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>This was pretty easy: with a few lines of code we already have an API that can be consumed by our clients. So what’s so bad about it?</p>
<p>Well, there are a few problems with this approach, but I want to talk specifically about the one that, in my opinion, is the most critical.</p>
<h2 id="coupling-your-server-to-your-clients">Coupling your server to your clients</h2>
<p>This approach is very server-centric: A small change in the server might break its clients. That’s definitely not what we want for our APIs.</p>
<p>Refactoring an internal behaviour shouldn’t require changes in the clients. Imagine having to publish a new version of your API every time
you want to rename a field. You don’t want your clients to have this knowledge of your internal structure, it kills your ability to change.</p>
<p>Besides that, it just makes it harder for the client to use this response. Why should it care about a “businessId”?
And if it wants to show the product price, it needs to know that a product has a “price” field, which has a “current” field, which has an “amount” field. That’s way too much.</p>
<h2 id="the-solution-build-you-own-responses">The solution: Build your own responses</h2>
<p>If we don’t want to expose our internal data structures to the outside world, one approach we can take is simply building our own response object or, in other words,
defining this resource’s representation.</p>
<p>A representation is nothing more than a description of the current state of our resource, and that’s exactly what we will build.
There are probably dozens of ways to implement this, and I’ll just show one of them, the simplest implementation I can think of.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">ProductRepresentation</span>
<span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">product</span><span class="p">)</span>
<span class="vi">@price</span> <span class="o">=</span> <span class="n">product</span><span class="p">.</span><span class="nf">current_price</span>
<span class="vi">@name</span> <span class="o">=</span> <span class="n">product</span><span class="p">.</span><span class="nf">description</span>
<span class="vi">@size</span> <span class="o">=</span> <span class="n">product</span><span class="p">.</span><span class="nf">size_name</span>
<span class="vi">@image_path</span> <span class="o">=</span> <span class="n">product</span><span class="p">.</span><span class="nf">image_path</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">class</span> <span class="nc">ProductsController</span> <span class="o"><</span> <span class="no">ApplicationController</span>
<span class="k">def</span> <span class="nf">show</span>
<span class="vi">@product</span> <span class="o">=</span> <span class="no">Product</span><span class="p">.</span><span class="nf">find</span><span class="p">(</span><span class="n">params</span><span class="p">[</span><span class="ss">:id</span><span class="p">])</span>
<span class="n">render</span> <span class="ss">json: </span><span class="no">ProductRepresentation</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="vi">@product</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>And with this code, we have a much smaller response, that does not expose our internal structure:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Marled wool cardigan"</span><span class="p">,</span><span class="w">
</span><span class="nl">"price"</span><span class="p">:</span><span class="w"> </span><span class="mf">27.97</span><span class="p">,</span><span class="w">
</span><span class="nl">"size"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Regular"</span><span class="p">,</span><span class="w">
</span><span class="nl">"image_path"</span><span class="p">:</span><span class="w"> </span><span class="s2">"webcontent/0005/537/723/cn5537723.jpg"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>If anything needs to be changed in the server, we just need to make sure
that the <code class="language-plaintext highlighter-rouge">ProductRepresentation</code> is still getting the correct data in the correct fields and all of our clients will continue to work.</p>
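<p>One detail worth noting: <code class="language-plaintext highlighter-rouge">render json:</code> serializes the object by calling <code class="language-plaintext highlighter-rouge">to_json</code> on it, so the representation needs to define how it turns into JSON. A self-contained sketch in plain Ruby, with a hypothetical <code class="language-plaintext highlighter-rouge">Product</code> stand-in just to make it runnable:</p>

```ruby
require "json"

# A stand-in for the real Product model, only for this sketch.
Product = Struct.new(:current_price, :description, :size_name, :image_path)

class ProductRepresentation
  def initialize(product)
    @name       = product.description
    @price      = product.current_price
    @size       = product.size_name
    @image_path = product.image_path
  end

  # Explicitly list the fields we expose; nothing internal leaks out.
  def as_json(*)
    { name: @name, price: @price, size: @size, image_path: @image_path }
  end

  def to_json(*args)
    as_json.to_json(*args)
  end
end

product = Product.new(27.97, "Marled wool cardigan", "Regular",
                      "webcontent/0005/537/723/cn5537723.jpg")
puts ProductRepresentation.new(product).to_json
```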
<h2 id="conclusion">Conclusion</h2>
<p>As always, this is not the best approach for all cases. If you have a very small API dealing with simple data structures, or just one client in a very controlled environment, it might not
be worth the extra work of building your own responses.</p>
<p>Even in these cases, though, I think it’s a nice exercise to think about how coupled your server is to its clients, and what would be the
impact of minimizing this coupling, allowing both server and client to grow and evolve as independently as possible.</p>