Check out my Kubernetes book!

Implementing a Priority Queue in Ruby

The other day I had to use a priority queue to solve a problem. It was a Java project, so I already had the PriorityQueue class ready to be used. After the code was done, I started to wonder what a solution in Ruby would look like. And then I discovered that Ruby does not have a priority queue implementation in its standard library. How hard could it be to implement my own?

First, some definitions

I just want to define what a queue and a priority queue are before we go to the implementation. If you are already comfortable with these definitions, feel free to jump to the next section.

A queue is a data structure in which the items added first, will be the first to be removed, also known as first-in first-out. Ruby has a queue implementation in its standard library, and the usage is quite simple:

q =
q << 1
q << 2
q << 3

q.pop # => 1

A priority queue is like a queue, where you remove items from the front of the list. The difference is that each element has a priority, and the order of the items inside the queue is determined by this priority, so the first item to be removed will be the one with the highest priority.

Introducing our test element

A priority queue should work with any type of element, not just numbers. As long as there is a way to determine their priority, we should be able use them.
I created this Element class, which has a name and a priority, just so we can use in our tests:

class Element
  include Comparable

  attr_accessor :name, :priority

  def initialize(name, priority)
    @name, @priority = name, priority

  def <=>(other)
    @priority <=> other.priority

It includes the Comparable module and implements <=>, so we can compare two Elements.

The naive implementation

Let’s start with a very simple (and naive) implementation of a priority queue. The idea is that, every time that we need to remove an item, we will sort the entire list of elements by their priority, in ascending order, and then we can just return the last element, that will be the one with the highest priority:

class NaivePriorityQueue
  def initialize
    @elements = []

  def <<(element)
    @elements << element

  def pop
    last_element_index = @elements.size - 1

And we can check that it works:

q =
q <<"bar", 1)
q <<"foo", 3)
q <<"baz", 2)

p # => "foo"

The problem with this approach is the performance, as you might have imagined. Although we can insert in constant time (O(1)), the pop operation is linear (O(n)) in the best case, meaning that the operation time will grow linearly and in direct proportion to the size of the elements list. As the size of the list doubles, the time to perform the operation also is expected to double.
We can do better.

The binary heap

The most common data structure used to implement a priority queue is the binary heap, that is basically a binary tree with some additional properties. The binary heap is a complete binary tree, meaning that it’s fully balanced, or, in other words, that all the levels of the tree are filled with elements, except possibly for the last level of the tree.

The other thing that distinguishes a binary heap is that it complies with the heap property, meaning that all the nodes are greater (or equal) than their children.

Example of a binary heap. Notice that it's a fully balanced binary tree, where all the nodes are greater than their children

One thing that is very interesting about binary heaps is that they can be represented as a simple array. There is no need for links or any complex data structure, just a simple array. If you think about it, it makes a lot of sense. The children of an element at a given index i will always be in 2i and 2i + 1. The same way, the parent of this node will be at the index i/2.

# 0  1    2   3   4   5  6   7  8  9
 [0, 100, 19, 36, 17, 3, 25, 1, 2, 7]

This array represents the tree in the previous image. For instance, if you get the element at the index 4 (17), you can check that its parent is at the index 2 (19), and that its children are at 8 (2) and 9 (7). The only caveat here is that we add a 0 in the first position of this array, that will never be used, but make our calculations a bit easier.
You can find the nodes relation by doing simple arithmetic on their indexes. How cool is that?

Implementing a real priority queue

After we understand how a binary heap works, it’s easy to see how it can be used to implement a priority queue. The element with highest priority will always be in the root of our tree. When we add elements to this queue, we just need to make sure it is placed in the right place to comply with the heap property.

Adding items to the queue

First we will just append the item to our array:

class PriorityQueue
  def initialize
    @elements = [nil]

  def <<(element)
    @elements << element

Just by doing this we already have a complete binary tree. The problem is that it violates the heap property. We need to make sure the node is in the right place of the tree, meaning that it is greater than its children, and smaller than its parent. This operation of putting a node in its place has many names, the most common being bubble up or heapify up. So let’s implement it:

def bubble_up(index)
  parent_index = (index / 2)

  # return if we reach the root element
  return if index <= 1

  # or if the parent is already greater than the child
  return if @elements[parent_index] >= @elements[index]

  # otherwise we exchange the child with the parent
  exchange(index, parent_index)

  # and keep bubbling up

def exchange(source, target)
  @elements[source], @elements[target] = @elements[target], @elements[source]

Now we just need to call it after we add a new element:

def <<(element)
  @elements << element
  # bubble up the element that we just added
  bubble_up(@elements.size - 1)

By doing this we already have a complete binary tree that complies with the heap property. Let’s confirm that:

q =
q << 2
q << 3
q << 1

p q.elements # => [nil, 3, 2, 1]
Removing items from the queue

The only thing missing to have a fully working priority queue is the ability to dequeue items.
As we are using a binary heap, we can assume that the root element will always be the one with the highest priority.

def pop
  # the first element will always be the max, because of the heap constraint

Now we can already retrieve the element with the highest priority. The only problem is that we are not actually removing it from the queue.
What we are going to do is something similar to what we did when we were adding items to the queue. We will exchange the root element with the last element of the queue, and then perform a process called bubble down (or heapify-down) in the new root element, to put it in the correct place of the tree.

def pop
  # exchange the root with the last element
  exchange(1, @elements.size - 1)

  # remove the last element of the list
  max = @elements.pop

  # and make sure the tree is ordered again

def bubble_down(index)
  child_index = (index * 2)

  # stop if we reach the bottom of the tree
  return if child_index > @elements.size - 1

  # make sure we get the largest child
  not_the_last_element = child_index < @elements.size - 1
  left_element = @elements[child_index]
  right_element = @elements[child_index + 1]
  child_index += 1 if not_the_last_element && right_element > left_element

  # there is no need to continue if the parent element is already bigger
  # then its children
  return if @elements[index] >= @elements[child_index]

  exchange(index, child_index)

  # repeat the process until we reach a point where the parent
  # is larger than its children

The only caveat here is that logic to make sure we are always comparing against the largest child.

And that’s all we need to have a working priority queue!

Comparing the two implementations

Just out of curiosity, let’s run a simple benchmark to compare the performance of our real implementation, using the binary heap, with the naive implementation, that just sorts the array for every pop operation.

require 'benchmark/ips'

require_relative 'element'
require_relative 'naive_priority_queue'
require_relative 'priority_queue'

naive =
real =

100_000.times do |i|
  naive <<"Foo #{i}", i)
  real  <<"Foo #{i}", i)

Benchmark.ips do |x|"naive") { naive.pop }"real")  { real.pop  }!

And the results:

Calculating -------------------------------------
               naive     4.000  i/100ms
                real    17.516k i/100ms
               naive     46.521  (± 2.1%) i/s -    236.000
                real      1.756M (± 8.6%) i/s -      8.723M

                real:  1755960.4 i/s
               naive:       46.5 i/s - 37745.64x slower

In my machine, the naive implementation is about 37,745 times slower than the binary heap one. That’s a pretty big difference.

Wrapping up

The priority queue is a very useful data structure, that can be used to solve a bunch of problems, from thread scheduling to graph searching algorithms.
We can have a fully working (and with a quite good performance) priority queue with less than 50 lines of Ruby. And even if you are working with another language, that has a priority queue implementation in its standard library, implementing your own is always a good exercise to understand how things work under the hood.

You can see the entire solution in this gist.

Interested in learning Kubernetes?

I just published a new book called Kubernetes in Practice, you can use the discount code blog to get 10% off.

Get fresh articles in your inbox

If you liked this article, you might want to subscribe. If you don't like what you get, unsubscribe with one click.