As part of an upcoming book I am writing (Parts 2–3 of Conscious Artificial Intelligence), I've slowly been recreating small artificial models of certain neural networks and cognitive abilities. The current plan is to eventually assemble a simple consciousness analog out of these parts; check these previous posts if you need some background:

Recreating Biological Short Term Memory
The early consciousness of sound
Engrams
The Problems With Artificial General Intelligence
Sparse Hierarchical Distributed Invariant Representations
The many things AI needs…
Neural Synchronization in the brain
Hearing for robots and AI
Conscious Artificial Intelligence C.A.I. Foundations.

In this post we will go over Working Memory, which can be considered at the limits of current commercial AIs and future ones. In other words, I've found that the hard problems of AI and AGI are easy to see (but not solve) once you understand what working memory is doing and how we are emulating it. More on that later; let's start with a quick overview:

What is it?

Working Memory (WM) is considered by some (myself included) to be distinct from Short Term Memory (STM), although there is plenty of overlap with other areas/functions. It consists of the processes and systems needed to manipulate or use the contents of short term memory. So if short term memory is something you can remember (or recall) in the seconds-to-minutes range, working memory is what you can do with that information during that time. I like specific examples: if STM is remembering a series of numbers, working memory (WM) is the ability to do some operation with those numbers, like adding them up…

On the top row you are presented with numbers on cards and are tasked with remembering them in the order they are presented, while on the second row you are asked to add them up as they are presented. This seems trivial until you are tasked with recreating it, since the result of the first addition (made from the contents of short term memory) needs to then become short term memory for the next addition (more on the logic when we get to the code example).
The caveat here is that we are also missing the concept of addition or summation, but we can easily bypass this for now if you imagine that instead of numbers we are given things and we then have to consciously (another big process) group them by kind or similarity. More on these important caveats later.
You can also see how WM and STM can be easily confused due to their overlap: you need STM to hold things in memory before (or while) doing operations on them via WM, and information can bounce back and forth between them.
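The STM/WM distinction above can be made concrete with a tiny sketch (all the names here are my own, purely for illustration): STM is just a small buffer that holds the presented digits, while WM is an operation applied to that buffer.

```python
from collections import deque

stm = deque(maxlen=7)  # short term memory: a small, limited-capacity buffer


def present(digit):
    """A new stimulus enters short term memory."""
    stm.append(digit)


def recall():
    """STM task: repeat the digits in the order presented."""
    return list(stm)


def running_sum():
    """WM task: perform an operation on the contents of STM (add them up)."""
    return sum(stm)


for d in [3, 1, 4]:
    present(d)

print(recall())       # [3, 1, 4]  -> plain short term memory
print(running_sum())  # 8          -> working memory: an operation on STM
```

Note that `running_sum` leans on STM for its inputs but adds the operation on top, which is the overlap described above.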

Modalities and quirks

Before moving on to a simple model, I should mention that working memory operates across a few modalities. We are perhaps most familiar with sound and vision, but other modalities like touch and olfaction seem possible, while the most common one (motor working memory*) is usually taken for granted. Then there is the issue of how narrowly (or broadly) one defines working memory: from the previous example, how do we know that those stimuli represent numbers and that those numbers represent quantity? The heuristic for adding up two numbers would also be missing…

* If, for instance, you are trying to learn how to type on a keyboard or do some mechanically complex task, there's a complex interaction between the motor cortex, cerebellum and other structures to rehearse and replay movements; see also implicit memory for overlap.

All this goes to say that the subject is not fully understood and has a lot of placeholder terms (like many brain-related things) and depth, so we'll just stick with simple examples and analogs for now.

More Examples of working memory:
Imagining or recalling an object and rotating it in the mind's eye, following visual directions, maze landmark navigation, image/visual-related logic (see also the visuospatial sketchpad).
Many language-related abilities like reversing words, counting letters or words, translating from one language to another, pig latin, the inner voice (see also the phonological loop).
Smell/Olfaction-related: Not well documented, but combining perceived smells or using them as place/situation markers seems common.
Touch/Haptics: Even less understood, but sight-impaired individuals seem to have no problem combining braille symbols and doing other memory-related tasks based on touch; more common examples are combining pre-rehearsed or presented movements/poses into new ones.
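Two of the verbal tasks listed above (reversing words, counting letters) happen to be one-liners in code, which helps make "operations on held content" concrete; the sentence here is just a made-up stand-in for content held in the phonological loop:

```python
# Content "held in mind" (the STM part)...
sentence = "working memory at work"

# ...and two WM-style operations performed on it:
print(" ".join(reversed(sentence.split())))  # reversing the words
print(sentence.count("o"))                   # counting a letter
```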
⚠️ These examples, while helpful, don't tell the full story. You could very well think these are all intelligence-related tasks and not memory ones, so in order to make the distinction clear we need some way to differentiate working memory from other cognitive tasks and processes. A model, even a flawed/incomplete one, can help...

Models of Working Memory

The preeminent model in biology is Baddeley's model of working memory:

Simplifying, there are 4 main components: one conscious, and the rest you can think of as happening without much of your conscious involvement, but subserving a central executive that consciously* orchestrates the whole thing:

To simplify things, the bottom 2 rows are the contents of short term memory and the central executive is in charge of doing operations with said content; notice the flow of information goes both ways across elements.
The phonological loop deals with audio data (e.g., a song or sound repeating, aka an earworm, plus more routine auditory STM like language perception), while the visuospatial sketchpad deals with visual data (e.g., rotating an object, analyzing/holding an image for recognition).
The episodic buffer deals with multimodal information across time, combining either or both of the previous 2 in an orderly way (e.g., you saw a red car honking and later it crashed).
The Central Executive in this model is in charge of attending to and coordinating the information flow from the short term memory stores, but in a greater sense also provides context and attention (presumably via higher decision-making systems).
* I use awareness as a working/simplified/naive definition of consciousness.
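To make the components and the two-way information flow easier to picture, here is a toy sketch of the model; to be clear, the class and function names are my own labels for illustration, not an implementation of the biological model:

```python
class Store:
    """A slave system: holds a few recent items of one modality."""

    def __init__(self, capacity=4):
        self.items = []
        self.capacity = capacity

    def hold(self, item):
        self.items.append(item)
        self.items = self.items[-self.capacity:]  # older items decay


phonological_loop = Store()       # auditory/verbal content
visuospatial_sketchpad = Store()  # visual/spatial content
episodic_buffer = Store()         # multimodal, time-ordered episodes


def central_executive(goal):
    """Attends to the stores and performs an operation on their contents."""
    if goal == "sum":
        return sum(phonological_loop.items)


# Stimuli land in the appropriate stores...
phonological_loop.hold(2)
phonological_loop.hold(5)
visuospatial_sketchpad.hold("red car")
episodic_buffer.hold(("red car", "honking"))

# ...and the executive operates on them.
print(central_executive("sum"))  # 7
```

The bidirectional arrows of the diagram would correspond to the executive also writing results back into the stores, which is exactly the tricky part we'll hit in the code example later.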

While this model has been successful in describing WM, it falls short in the how department; but as we (AI-minded folks) are not tasked with recreating biological WM but rather emulating it, this model should be enough for the next sections…

Working Memory and AIs

Current AIs do work with some form of both short term memory and working memory, but they quickly run out of abilities due to the lack of a central executive and everything behind it…

Perhaps the most familiar thing similar to working memory is the simple UNDO function in programs: the commands you issue get stored in STM, and when you click undo they get undone (or redone), which is akin to the operational part of working memory.
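The UNDO analogy fits in a few lines; a minimal sketch (illustrative names, not from any real editor) where the history list plays the role of STM and undo/redo are the operations on it:

```python
history, redo_stack = [], []  # "STM" of commands, plus an undone pile


def do(cmd):
    """A new command enters the history; new actions invalidate redo."""
    history.append(cmd)
    redo_stack.clear()


def undo():
    """Operate on the stored commands: pop the most recent one."""
    if history:
        redo_stack.append(history.pop())


def redo():
    """Re-apply the most recently undone command."""
    if redo_stack:
        history.append(redo_stack.pop())


do("type 'a'")
do("type 'b'")
undo()
print(history)  # ["type 'a'"]
redo()
print(history)  # ["type 'a'", "type 'b'"]
```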
Another common type of working memory can be found in a digital assistant: you can simply ask it to do an operation with numbers and it will at some point hold them in memory and do some operation with them, much like the code example we'll see in a second.
Unfortunately, if you ask for more complex, time-related operations you don't get too far (like asking the AI to recall complex scenery or do operations with higher concepts); current working-memory AIs or subroutines also can't generalize and for the most part are not multi-domain...

So WM in AIs might become much more useful once we figure out the rest, for now the best we can do is a fairly crude facsimile…

Like my previous prototypes, I'll be using off-the-shelf components (my webcam/laptop) and Python as the programming language; they are not optimized or even cutting edge, but hopefully novel and useful as examples.

Working Memory Analog (we'll be recreating the illustration from above). The following script detects numbers via template matching from a live video feed, stores them in a short term memory buffer and then adds up and displays the numbers. We'll dissect the script in a second, but first here it is in action:

Excuse the false positives with the 3/5; the number detector is very basic/crunchy but it does eventually work. Compare it with the illustrated example up top (or follow along by running the script).

And the code repo:

About the code:
I left it verbose and non-DRY so as to make it slightly easier to follow...
The number detector is very basic and not optimized, as it's not the focus here. Instead of attention, which is another big subject (how does the AI know when and why to add numbers?), it uses a time buffer, meaning that if a number is detected for a continuous period of time it gets added to the STM store.
Working Memory is trivially reduced to a single line that adds up the contents of the STM store:
_VARS['WM'] = int(sum(_VARS['STM']))
But there is a tricky part: the new contents of working memory (what we just added) need to in turn become part of the STM store (remember how information flows between components), something we do without much thought (another process we don't fully understand) but which can simply be emulated.

A model for AIs

While both STM and WM seem relatively easy to emulate, as mentioned at the start of this post it quickly becomes evident that there's more to this if we want to achieve some sort of generalization or AGI; the reason is simply that we would need a lot of extra operations (which would need to be self-directed, interconnected and more) to achieve some semblance of AGI.

Lest we drown in all the things we don't know, we can note the things we do know via a small WM model or pattern for AIs:

Working Memory Model for AIs: achievable/known elements in black, less known ones in grey.
Try matching the example we've been using if you get lost: detected numbers go into the STM store as inputs, and become working memory through operations before becoming outputs or going back into the STM store.
This picture/model is not so bad, since we have a good chunk of functionality we can emulate; the missing bits do need some extra discussion:
Attention: While attention is considered a separate process (or processes), it is still a crucial part of working memory, since multiple inputs might be present. In our example we are limiting the script to recognizing numbers, but this is hard-coded behavior; figuring out which inputs are relevant for a certain goal (a goal that might also change) is a whole other ball game, as is discriminating between competing inputs.
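To see how hard coded this really is, here's the kind of filter our script effectively applies (the function and scene below are illustrative, not from the repo): of everything present in the input, only one type of item ever reaches the STM store, and the goal is frozen at write time.

```python
def attend(inputs, goal="add numbers"):
    """Hard-coded 'attention': keep only inputs relevant to the one goal."""
    if goal == "add numbers":
        return [i for i in inputs if isinstance(i, int)]
    return []  # any other goal: the system is simply blind


# A scene with competing inputs: only the numbers survive the filter.
scene = [3, "red car", 5, "honk"]
print(attend(scene))  # [3, 5]
```

A real central executive would have to decide that filter (and revise it) on its own, which is exactly the open problem described above.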

Complex Operations: By complex I mean multimodal and using richer inputs; think of a mathematician solving equations in their head or an artist coming up with new designs, both drawing on years of memories, knowledge and reasoning.

Central Executive: In our running example we, the writers of the AI, serve as the central executive, since we are deciding on the goal (add the last number detected), what elements are relevant (detected numbers) and when to add them to the STM store (after x amount of time). But this is not how things work in biology or advanced AIs: while I believe we humans and biological beings for the most part have hard-coded goals (survive, reproduce), we can also generate our own goals, and in pursuing them we display a varied and flexible repertoire of abilities. All this goes to say that recreating a central executive is a challenge we are just starting to understand.

The future and missing bits

So what would a multimodal AI with working memory and complex operations look like? Good question; this is purely futurism…

The most straightforward implementation might be in existing AIs: deep learning as commonly practiced right now predicts based on large datasets but is limited to a few domains and problem spaces. Tying models to more complex, varied and interconnected spaces might increase their utility, and working memory could be the bridge; within ANNs you could also architect WM into the network to get better results, much like Long Short-Term Memory (LSTM) already does.
What might be more interesting are AIs that use working memory to generate new behavior; these might not be very general in the domain space, but they might give creatives of all kinds a run for their money.
Working memory also seems critical for self-reflection: the "voice" in your head is considered part of the phonological loop (though it still needs access to longer forms of memory and other areas). What an AI would do with this ability is still unknown, but I believe we will have far more advanced digital assistants and AI helpers for creatives; just a guess.

Takeaways

Working memory is an important part of us: as a species we advance via experimentation, and working memory is an integral part of coming up with the new and complex behavior, reasoning and many other abilities we consider intelligent.

In practice and in code, it turns out that implementing working memory can be trivial, and architecting AIs with it is achievable with the help of models and prototypes; generalizing and making AIs multimodal is still a challenge, but these are early days.

If anything, I'd be surprised if, in whatever form AGI ends up taking, working memory (along with LTM/STM) didn't play a crucial role.

And lastly, I hope this post helped you understand WM, its relationship to current AIs, and the challenges and implementation opportunities.

Thanks for reading.

Bibliography/Sources :