Elixir : Basics of polymorphism

Polymorphism simply means many forms and in the context of programming, polymorphism is when your code that has a single identity, behaves…

Arunmuthuram M

~14 min read · December 9, 2023 (Updated: January 4, 2024) · Free: Yes

Polymorphism simply means many forms, and in the context of programming, polymorphism is when your code that has a single identity, behaves differently or executes different versions of code based on the input data type that it is operating on. The main form of polymorphism in elixir is when functions with the same name and arity execute different versions of code based on the data type of input arguments. One such example is the Enum module's functions such as Enum.each/2, that iterates the elements of the input data structure and performs some processing on each of the elements. This function with a single signature, can operate on different data types such as lists, ranges, maps, MapSets, keyword lists etc and it behaves differently based on the data type of the provided argument, thus achieving polymorphism. This article explains the different tools and features available in elixir to achieve polymorphism.

Polymorphism via Function clauses

The easiest way to achieve polymorphism in elixir is through multiple function clauses that utilise pattern matching and guard clauses to execute different versions of code based on type assertions. Let us now try writing a simple implementation of the Enum.each/2 functionality and achieve polymorphism through multiple function clauses.

defmodule Enum2 do
  #lists and keyword lists
  def each([hd | tl], fun) do
    fun.(hd)
    each(tl, fun)
  end
  def each([], _), do: :ok

  # range  
  def each(%Range{first: first, step: step} = range, fun) do
    each_range(first, step, Range.size(range), fun)
  end

  #MapSets
  def each(%MapSet{} = map_set, fun), do: each(MapSet.to_list(map_set), fun)

  # maps
  def each(map, fun) when is_map(map), do: each(Map.to_list(map), fun) 

  def each(_, _), do: raise "Unsupported input type"

  defp each_range(val, step, size, fun) when size > 0 do
    fun.(val)
    each_range(val + step, step, size - 1, fun)
  end
  defp each_range(_, _, _, _), do: :ok
end

---------------------------------------------------------------------------
Enum2.each([1, 2, 3], &IO.inspect/1)
1
2
3
:ok

Enum2.each([one: 1, two: 2, three: 3], &IO.inspect/1)
{:one, 1}
{:two, 2}
{:three, 3}
:ok

Enum2.each(%{1 => 1, 2 => 2, 3 => 3}, &IO.inspect/1)
{1, 1}
{2, 2}
{3, 3}
:ok

Enum2.each(MapSet.new([1, 2, 3], &IO.inspect/1)
1
2
3
:ok

Enum2.each(1..3, &IO.inspect/1)
1
2
3
:ok

Enum2.each({1, 2, 3}, &IO.inspect/1)
** (RuntimeError) Unsupported input type

As you can see in the code above, a function of a single signature Enum2.each/2 can be used with different data types, and for each data type, a different version of code is being executed, thus achieving polymorphism.

Polymorphism via Protocols

The major disadvantage of achieving polymorphism through multiple function clauses is that it is not easily extensible to new data types. Using the same example above, whenever Enum2.each/2 is required to work on new data types, then the Enum2 module must be modified every time by adding new code to support the new data types. Moreover, if the module is part of a library that is being consumed by others, code cannot be easily added to the shipped libraries by the consumers. This introduces difficulty in extending functionality to user-defined struct data types such as Ranges and MapSets and this is what protocols solve. Protocols allow easy extensibility of a particular functionality to a custom data type even if the function is present in a built-in module or a module from a library. They also do not require all of the implementations of the function for different data types to be grouped together in a single module. Protocols also enable dependency inversion by removing tight coupling between high level modules and low level modules, by acting as a layer of abstraction between them.

Protocols have two major parts. One of them is the protocol itself, with the common public api structure or interface and the other is the concrete implementation of the protocol pertaining to a particular data type. Whenever a common functionality needs to be performed differently on different data types, and if this functionality would require extension to other data types in the future, then a protocol can be defined, which can later be extended to any number of data types by providing concrete protocol implementations for those data types.

Protocol definition

A protocol can be created using defprotocol followed by the name of the protocol and a do block that contains function headers for the common public api interface. Protocol definitions are similar to module definitions and can be created in their own separate files. The function headers inside the protocol definitions have the syntax def function_name(arg1). Since these function headers are just the common public api interface, they should be generic and must not contain function bodies, guard clauses or literal values as parameters. Since these protocols will be implemented on different data types, the function headers must have at least one parameter that takes in a data type as an argument, based on which the correct implementation will be chosen and called in run time.

Since the implementations of these protocols can involve multiple data types, the return value of these implementations may not be simple and may require multiple versions of them for handling different scenarios of the functionality. Hence the protocol definer can dictate what return type of what format is expected from these implementations when these api interfaces are implemented. Even though typespecs may aid in letting the implementers know the data type that is being expected to be returned, proper documentation is required for clearly explaining how the protocol's api interface must be implemented, the signature of the public api functions, the data type and it's exact format that the implementations must return etc. Let us now create a protocol for the same example scenario shown above with Enum.each/2.

defprotocol Enumerable2 do
  @doc "Should return a tuple with the next element and rest of the elements. 
  {:cont, next_el, rest} or {:halt, nil, nil} if there are no elements"
  def next(value)
end

In the above example, we have defined a protocol called Enumerable2 that has one public api function called next/1 along with a short documentation on details of its implementation. When we are defining public api interfaces for a protocol, we should always create low level generic functions that can be composed and reused easily to solve multiple problems. In this example, the main functionality is enumeration of the elements and so instead of a high level each/2 function that cannot be reused for other future scenarios such as a map or a reduce function, the next/1 function provides low level functionality that can be reused in multiple scenarios.

Protocol implementation

Now that we have seen how to define a protocol, let's look into its second part, the implementation. A protocol implementation can be present in any of the modules. By convention they are usually present in the same file as the protocol definition or in the respective modules related to the particular data type that is implementing the protocol. The protocol implementation syntax starts with defimpl followed by the protocol name that is being implemented, a :for option that takes in the data type that this protocol is implemented for and finally a do block that has concrete implementations for all the public api interfaces defined in the protocol definition. These concrete implementations are defined just like regular named functions.

The :for option takes in any of the following values, each denoting a particular data type. Tuple, Atom, List, BitString, Integer, Float, Function, PID, Map, Port, Reference, Any and user-defined structs denoted by its name. The :for option can also take in a list of the above types if the same implementation works for more than one data type. When protocol implementations are defined for user-defined structs and if the implementations are present in the same module defining the struct, then the :for option can be removed and it will implicitly set the current module's name as the value. Let us now implement the Enumerable2 protocol for different data types and see them in action.

defprotocol Enumerable2 do
  def next(value)
end

defimpl Enumerable2, for: List do
  def next([hd | tl]), do: {:cont, hd, tl}
  def next([]), do: {:halt, nil, nil}
end

defimpl Enumerable2, for: Map do
  def next(map), do: Map.to_list(map) |> Enumerable2.next
end

defimpl Enumerable2, for: MapSet do
  def next(map_set), do: MapSet.to_list(map_set) |> Enumerable2.next
end

defimpl Enumerable2, for: Range do
  def next(%Range{first: first, step: step} = range) do
    cond do
      in_range?(range) -> {:cont, first, %Range{range | first: first + step}}
      true -> {:halt, nil, nil}
    end
  end
  defp in_range?(%Range{first: first, last: last, step: step}) do
    (step > 0 and first <= last) ||  (step < 0 and first >= last)
  end
end

As you can see above, we have created the implementations of the Enumerable2 protocol for the data types List, Map, MapSet and Range. The protocol definition requires the implementers to define a single function next/1 that returns a tuple with information of the next element and the rest of the elements. For data types such as Map and MapSet which do not allow directly traversing its elements like lists, we convert them into a list and then delegate them to the next/1 implementation of the list data type. For the range data type, we have created a helper function in_range/1 which is allowed inside protocol implementations.

The protocol implementation offers two module attributes such as @for and @protocol, which carry the values of the current data type for which the implementation is being defined and the current protocol that the implementation is part of, respectively. One application of the @for module attribute is that, when an implementation is defined for multiple data types, code can be written based on the current data type, which can be accessed using the @for module attribute. In our above implementation, we can see that for both Map and MapSet, we are having separate implementations, each calling the to_list method and delegating the execution to the next/1 method of List data type. We can instead provide a single implementation for both data types using the two module attributes as follows.

defimpl Enumerable2, for: [Map, MapSet] do
  def next(value), do: @for.to_list(value) |> @protocol.next
end

If a protocol implementation fails to implement a required public function of the same signature, then a warning will be provided when the code is compiled and an UndefinedFunctionError will be thrown at runtime when the protocol's public function is called on that particular data type. Similarly when a protocol's public function is called with a data type that does not even have a protocol implementation, then a Protocol.UndefinedError will be thrown at run time.

defimpl Enumerable2, for: Integer do
end
warning: function next/1 required by protocol Enumerable2 is not implemented
(in module Enumerable2.Integer) # during compilation

----------------------------------------------------------------------------
Enumerable2.next(5)
** (UndefinedFunctionError) function Enumerable2.Integer.next/1 is undefined
or private, but the behaviour Enumerable2 expects it to be present

----------------------------------------------------------------------------
Enumerable2.next(:atom)
** (Protocol.UndefinedError) protocol Enumerable2 not implemented for
atom of type Atom

Now that we have defined a protocol and its implementations, let's use it to create the Enum.each/2 function.

defmodule Enum2 do
  def each(enumerable2, fun) do
    case Enumerable2.next(enumerable2) do
      {:cont, val, rest} -> fun.(val)
                            each(rest, fun)
      {:halt, _, _} -> :ok
    end
  end
end

---------------------------------------------------------------------------

Enum2.each([1, 2, 3], &IO.inspect/1)
1
2
3
:ok

Enum2.each([one: 1, two: 2, three: 3], &IO.inspect/1)
{:one, 1}
{:two, 2}
{:three, 3}
:ok

Enum2.each(%{1 => 1, 2 => 2, 3 => 3}, &IO.inspect/1)
{1, 1}
{2, 2}
{3, 3}
:ok

Enum2.each(MapSet.new([1, 2, 3], &IO.inspect/1)
1
2
3
:ok

Enum2.each(1..3, &IO.inspect/1)
1
2
3
:ok

As you can see in the example above, the Enumerable2.next/1 is called on the first argument of the Enum.each/2 function. This will internally execute the right implementation based on the data type of the argument and return the tuple in the expected format. Since we made the public interface function next/1 to be low level, it can be easily reused later for additional enum functions such as map.

defmodule Enum2 do
  def each(enumerable2, fun) do
    case Enumerable2.next(enumerable2) do
      {:cont, val, rest} -> fun.(val)
                            each(rest, fun)
      {:halt, _, _} -> :ok
    end
  end

  def map(enumerable2, fun, acc \\ []) do
    case Enumerable2.next(enumerable2) do
      {:cont, val, rest} -> map(rest, fun, [fun.(val) | acc])
      {:halt, _, _} -> Enum.reverse(acc)
    end
  end
end
---------------------------------------------------------------------------
Enum2.map([1, 2, 3], &(&1 * 2))
[2, 4, 6]

Enum2.map(1..3, &(&1 * 2))
[2, 4, 6]

Default implementation

Protocols allow definition of default implementations that could be invoked instead of throwing the Protocol.UndefinedError at run time, when a protocol's public function is called on a data type without a protocol implementation. If you look at the allowed data type values that can be specified for the :for option in a protocol implementation, you will find a value called Any. When a protocol implementation is created for this Any data type, it will be used as the default implementation for all data types. But in order for the default implementation to be invoked, you have to use either the @fallback_to_any option or the @derive option. The @fallback_to_any option is used with a value true inside the protocol definition, which sets the default implementation to be called for all the public api functions for any data type that doesn't have a concrete implementation.

defprotocol Test do
  @fallback_to_any true
  def test(value)
  def test1(value)
end

defimpl Test, for: Integer do
  def test(_), do: IO.puts("Specific implementation for test/1")
  def test1(_), do: IO.puts("Specific implementation for test1/1")
end

defimpl Test, for: Any do
  def test(_), do: IO.puts("Default implementation for test/1")
  def test1(_), do: IO.puts("Default implementation for test1/1")
end

---------------------------------------------------------------------------

Test.test(1)
"Specific implementation for test/1"
:ok

Test.test1(1)
"Specific implementation for test1/1"
:ok

Test.test(:atom)
"Default implementation for test/1"
:ok

Test.test1([1,2])
"Default implementation for test1/1"
:ok

Instead of setting the default implementation for all the data types, you can use the @derive option for user-defined structs, inside their respective modules. The @derive option takes in a list of all the protocols for this particular data type. Only the default implementations of the listed protocols will be used for the data type. For the rest of the protocols not listed in the @derive option, the Protocol.UndefinedError will be thrown at run time, if it does not have the protocol implemented.

defprotocol Test do
  def test(value)
end

defimpl Test, for: Any do
  def test(_), do: IO.puts("Default implementation for test/1")
end

---------------------------------------------------------------------------

defmodule User do
  @derive [Test]
  defstruct [:age, :name]
end

defmodule User2 do
  defstruct [:age, :name]
end

---------------------------------------------------------------------------

Test.test(%User2{})
** (Protocol.UndefinedError) protocol Test not implemented for
 %User2{age: nil, name: nil} of type User2 (a struct)

Test.test(%User{})
Default implementation for test/1
:ok

Some built-in protocols like the Inspect protocol provide the ability to add options to the @derive attribute which can be used to create customised default implementations based on the options. The Inspect protocol provides options such as :only, :except and :optional, that takes in a list of fields that should be printed when the struct is inspected.

defmodule User do
  @derive [{Inspect, only: [:name, :age]}]
  defstruct [:age, :name, :address]
end

---------------------------------------------------------------------------

IO.inspect(%User{age: 20, name: "Trent", address: "12, E street"})
#User<age: 20, name: "Trent", ...>

If a user-defined protocol needs to take in options for @derive like above and provide customised Any implementations based on the options, then the __deriving__ macro must be implemented as required.

Internal structure

Internally whenever a protocol is defined, after compilation, a module will be created with the name of the protocol, and all of the specific implementations of that protocol will be created as nested modules within the protocol module. Every protocol module consists of implementations for all the public api functions and they also contain these functions. Every implementation module will also contain the protocol's public api implementations and the __impl__/1 function that provides the same functionality as the @for and @protocol attributes. For our example protocol definition, Enumerable2, the internal structure after compilation will be as follows.

Enumerable2.
  __protocol__/1
  impl_for!/1
  impl_for/1
  next/1

  List.
    __impl__/1
    next/1

  Map.
    __impl__/1
    next/1

  MapSet.
    __impl__/1
    next/1

  Range.
    __impl__/1
    next/1

As you can see above, both the protocol module and all of the implementation modules have a function implementation for the public function next/1. The public function implementation found in the protocol modules will just have the dispatching logic which will identify the correct implementation module and call the same method next/1 on the implementation module. The concrete implementations for the public function defined under defimpl, will be present under the respective implementation module. In our case, the Enumerable2.next/1 will just have the dispatching logic, that will either call one of the four concrete implementations, Enumerable2.List.next/1, Enumerable2.Map.next/1, Enumerable2.MapSet.next/1 or Enumerable2.Range.next/1 or throw Protocol.UndefinedError if there is no concrete implementation for the data type and no fallback implementation.

The Enumerable2.next/1 will internally call the Enumerable2.impl_for!/1 function that will perform the dispatching logic to return the correct concrete implementation module for the argument data type.

Enumerable2.impl_for!(1..5)
Enumerable2.Range

Enumerabl2.impl_for!([1, 2, 3])
Enumerable2.List

Enumerable2.impl_for(<<1>>)
** (Protocol.UndefinedError) protocol Enumerable2 not implemented for <<1>>
 of type BitString

Once the correct concrete implementation module is returned, then the next/1 function is called on that returned module to execute the correct concrete implementation. Internally, the impl_for!/1 function works differently for struct data types and built-in data types. For struct data types, the struct name can be easily obtained using the __struct__ field and can be concatenated with the protocol name to obtain the correct implementation module. But for built-in non struct types, a series of type checks have to be made one by one to identify the correct implementation module. Once the correct implementation module has been found, a check is made to make sure that the implementation module's code is compiled, loaded and available, after which the implementation module is returned.

The above dispatching logic happens every time a public api function call is made on a data type and this is not performant. Hence, elixir uses a process called protocol consolidation to speed up the dispatching logic at runtime. The protocol consolidation is performed after all of the modules are compiled. Once all the modules are compiled, you know exactly how many concrete implementations are present for a protocol. This information can be utilised and the dispatch logic present in the impl_for!/1 can be modified to directly have type checks only for the data types identified to have the concrete implementations and return the implementation module as soon as a match is found. In our case, after protocol consolidation, the impl_for! contains code that only checks if the argument is a struct of type MapSet or Range, else it checks if the data type is either a map or a list and then the respective implementation module is returned. If none of the above checks match, then the Protocol.UndefinedError is thrown. Protocol consolidation is turned on by default for all mix projects. Elixir also provides a module called Protocol that provides functions to check if a protocol has been consolidated and to manually consolidate protocols etc.

Built-in protocols

Elixir provides some built-in protocols that are used widely and are commonly extended to user defined structs. They are Collectable protocol which is used for collecting or accumulating values into a particular data type; Enumerable protocol used for enumerating and reading values from data structures; Inspect, Inspect.Algebra and Inspect.Opts protocols used for converting data structures into a readable format, used in logs; List.Chars used for converting a data structure into a charlist and the String.Chars protocol used for converting data structures into a binary or textual format. Let us create a user defined struct that implements the data structure Stack and create protocol implementations for all of the applicable built-in protocols. Again, the public api interfaces, their function signatures and return type information for any protocol implementation can be obtained from the respective protocol's documentation.

defmodule Stack do
  defstruct [elements: [], size: 0]

  def peek(%Stack{elements: elements, size: size}) do
    if size == 0, do: nil, else: hd(elements)
  end
  
  def pop(%Stack{size: size, elements: elements} = stack) do
    cond do
      size == 0 -> {nil, stack}
      true -> 
       {hd(elements), %Stack{stack | size: size-1, elements: tl(elements)}}
    end
  end
  
  def push(%Stack{size: size, elements: elements} = stack, el) do
    %Stack{stack | size: size + 1, elements: [el | elements]}
  end

  def size(%Stack{size: size}), do: size

  def empty?(%Stack{size: size}), do: size == 0
end

---------------------------------------------------------------------------

stack = %Stack{}
%Stack{elements: [], size: 0}

stack = Stack.push(stack, 1)
%Stack{elements: [1], size: 1}

Stack.peek(stack)
1

{popped, stack} = Stack.pop(stack)
{1, %Stack{elements: [], size: 0}}

Stack.size(stack)
0

Stack.empty?(stack)
true
defimpl Collectable, for: Stack do
  def into(stack) do
    collector_fun = fn
      acc, {:cont, elem} -> Stack.push(acc, elem)
      acc, :done -> %Stack{acc | elements: Enum.reverse(acc.elements)}
      _, :halt -> :ok
    end
    {stack, collector_fun}
  end
end
---------------------------------------------------------------------------
Enum.into([1, 2, 3], %Stack{})
%Stack{elements: [1, 2, 3], size: 3}

for x <- [1, 2, 3], into: %Stack{}, do: x
%Stack{elements: [1, 2, 3], size: 3}
defimpl Enumerable, for: Stack do
  def count(stack), do: {:ok, Stack.size(stack)}

  def member?(_), do: {:error, __MODULE__}

  def slice(stack), do: {:ok, stack.size, &(&1.elements)}

  def reduce(stack, {:cont, acc}, fun) do 
    cond do
      stack.size == 0 -> {:done, acc}
      true -> {popped, stack} = Stack.pop(stack)
              reduce(stack, fun.(popped, acc), fun)
    end 
  end
  def reduce(_, {:halt, acc}, _), do: {:halted, acc}
  def reduce(stack, {:suspend, acc}, fun) do
    {:suspended, acc, &reduce(stack, &1, fun)}
  end
end

---------------------------------------------------------------------------
stack = Enum.into([1, 2, 3], %Stack{})

Enum.map(stack, &(&1 * 2))
[2, 4, 6, 8]

Enum.count(stack)
4

Enum.sum(stack)
10

Enum.zip(stack, [:one, :two, :three, :four])
[{1, :one}, {2, :two}, {3, :three}, {4, :four}]

Enum.take_while(stack, &(&1 < 3))
[1, 2]

Enum.reduce(stack, 1, fn x, acc -> x * acc end)
24

Stream.take_while(stack, &(&1 < 3)) |> Enum.to_list
[1, 2]

defimpl Inspect, for: Stack do
  import Inspect.Algebra

  def inspect(stack, opts) do
    concat(["Stack(", Inspect.inspect(stack.elements, opts), ")"])
  end
end

---------------------------------------------------------------------------
stack = Enum.into([1, 2, 3, 4], %Stack{})

IO.inspect(stack)
Stack([1, 2, 3, 4])
defimpl String.Chars, for: Stack do
  def to_string(%Stack{size: size} = stack) do 
    cond do
      size < 6 -> "[#{Enum.join(stack, ", ")}]" 
      true -> "[#{Enum.take(stack, 5) |> Enum.join(", ")}...]"
    end
  end
end

---------------------------------------------------------------------------
stack = Enum.into([1, 2, 3, 4], %Stack{})

IO.puts(stack)
[1, 2, 3, 4]
:ok

stack1 = Enum.into([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], %Stack{})
IO.puts(stack)
[1, 2, 3, 4, 5...]
:ok

"stacks - #{stack}, #{stack1}"
"stacks - [1, 2, 3, 4], [1, 2, 3, 4, 5...]"

#elixir #protocol #polymorphism #basics #programing