Manipulating Bitstrings in Elixir

14 May 2023 -

Per the official documentation, bitstrings are a fundamental data type in Elixir representing a contiguous sequence of bits in memory. Useful for UTF-8 encodings and other fun things like secret obfuscation, bitstrings are a great tool to have in your back pocket. Instead of parroting existing documentation, I wanted to share a little nuance that I had trouble grasping while I was learning about this data type.

When you concatenate, prepend, or append bitstring literals in a bitstring special form(i.e. «value::size»), the output format can look very different from the input format. If your output produces a valid UTF-8 encoded character, you will get that character. If the result of the concatentation would overflow 8 bits (the maximum), then the result will be whatever integer is represented by that 8-bit value, and then a bitstring literal comprised of the remainder. For example:

value = <<0b110::3, 0b001::3>>
new_value = <<0b011::3, value::bitstring, 0b000::3>>
# => <<120, 8::size(4)>>

In this example we are both prepending and appending bitstring literals to the original bitstring value, which you can visualize like so:

<<0b011::3, 0b110::3, 0b001::3, 0b000::3>>

You’ll notice that there are more than 8 bits in this intermediate representation. Since we obviously cannot have more than 8 bits in a bitstring, we need to do something with the extra bits. Elixir will take the first 8 bits provided and return them as an integer(i.e. 120). The reason for this is that a bitstring comprised of 8-bytes is what is called a binary, and is treated differently than other bitstrings. Put in the official terms, a binary is a bitstring that is divisible by 8. All binaries are bitstrings, but not all bitstrings are binaries. The remaining bits from out intermediate representation will be returned as a bitstring literal, using the verbose syntax (i.e. 8::size(4)). Taking it further, what happens if we have many more bits?

<<0b011::3, 0b110::3, 0b001::3, 0b000::3, 0b001::3, 0b010::3>>
# => <<120, 130, 2::size(2)>>

So you see, bitstrings concatenated/appended/prepended in this way will return integers from the leading 8-bit fragments, and return the remaining bits as bitstring literals in the verbose syntax.

This was very confusing for me when I first learned it, so I hope this helps someone. This is by no means a complete explanation of bitstrings, so for more on bitstrings, check out the official docs.