Memory is essential for learning. In fact, memory is essential for life. Without memory, we would exist in the perpetual present, a void where we are unable to recall a past and incapable of anticipating a future. We wouldn’t recognise our loved ones or be able to plan our futures. And we wouldn’t be able to learn.
Despite the importance of memory, we still understand surprisingly little about how it works. Even with thousands of laboratory studies and numerous case studies, we are only just beginning to understand how we store and recall information, and why recall is often so inaccurate.
What we can do is build theoretical models from the data we have collected, and in doing so shed at least a dim light on how our experiences are stored and recalled. The problem arises when we examine different types of information and try to locate them within our cognitive architecture. Knowing that the capital of France is Paris is not the same as recalling the birth of our first child, and neither of these is the same as knowing how to ride a bike. Knowledge and skills are both stored in memory, but we have yet to reach a consensus on whether they represent the same type of memory or, indeed, the same type of learning.
Models of Memory (1): The Multi-store Model
The first serious attempt to produce a coherent model of memory dates back to 1968. Psychology had begun to move away from behaviourist approaches that saw humans as stimulus-response machines and to examine the role of cognitive, or thought, structures. These internal components had been off-limits to the behaviourists, who believed that only that which could be objectively observed was worth studying.
With the advent of the microprocessor, coupled with the frustration of many researchers at the limitations of the behaviourist paradigm, this attitude began to change. Some psychologists began to theorise that the human mind was analogous to a computer, complete with input, output and storage devices. It was a useful, if not entirely accurate, metaphor, and it led to a greater interest in the function of memory.
Two early pioneers of cognitive psychology, Richard Atkinson and Richard Shiffrin, proposed that our memory systems could be explained using this computer analogy, producing the first model of memory. This became known as the multi-store model (sometimes referred to as the modal model).

It was called multi-store because it was the first model to propose that memory is composed of separate stores (sensory, short-term and long-term). Evidence for these stores came from laboratory studies which found, for example, that information is lost when rehearsal is prevented, and that when people are asked to learn a list of items, they recall the first and last items well but find it more difficult to recall those in the middle (the so-called serial position curve; see the sketch below).
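As a rough illustration of what such studies measure, here is a minimal sketch in Python (my own toy example, not taken from the original research) of how a serial position curve is typically computed: for each list position, the proportion of trials on which the item presented at that position was later recalled.

```python
# Compute a serial position curve from toy free-recall data:
# the proportion of trials on which the item at each position was recalled.

def serial_position_curve(presented_lists, recalled_sets):
    """presented_lists: lists of items in presentation order (equal length).
    recalled_sets: the set of items each participant freely recalled."""
    positions = len(presented_lists[0])
    counts = [0] * positions
    for presented, recalled in zip(presented_lists, recalled_sets):
        for position, item in enumerate(presented):
            if item in recalled:
                counts[position] += 1
    return [count / len(presented_lists) for count in counts]

# Two invented trials; with real data the curve is typically U-shaped,
# with good recall for early (primacy) and late (recency) positions.
trials = [["dog", "cup", "tree", "lamp", "boat"],
          ["pen", "rock", "fish", "door", "cake"]]
recalls = [{"dog", "cup", "boat"}, {"pen", "door", "cake"}]
print(serial_position_curve(trials, recalls))  # [1.0, 0.5, 0.0, 0.5, 1.0]
```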

While highly influential at the time and the catalyst for many decades of research, this early model had some serious flaws.
For example, rehearsal isn’t as important as the model seemed to suggest. Learning a list of words to be recalled at a later date does require rehearsal, but recalling, say, what you had for breakfast this morning does not.
Similarly, while participants were often unable to recall items presented to them (especially when given a secondary distraction task or after a significant delay), rates of recall increased when participants were given a list of items and asked to indicate those that had appeared on the original list.
This early research did inform us of some important facts, however. Jacobs had already shown that short-term memory has a limited capacity back in 1887, but it was George Miller who, in 1956, famously estimated that capacity at seven plus or minus two pieces of information (or chunks). Curiously, Miller never intended the figure to be taken as a precise measurement, yet it took on a life of its own.
More recently, Cowan has put this estimate at closer to 4, indicating that short-term memory is much more limited than we once believed.
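To make the idea of a chunk concrete, here is a minimal, hypothetical sketch: held as individual digits, a twelve-digit string exceeds both Miller’s and Cowan’s estimates, but regrouped into three familiar chunks it fits comfortably within either limit.

```python
# Chunking demo: the same twelve digits held as 12 items vs 3 chunks.

def chunk(sequence, size):
    """Split a sequence into consecutive groups of the given size."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]

digits = "149217761969"   # 12 separate items -- beyond 7 +/- 2, let alone 4
years = chunk(digits, 4)  # ['1492', '1776', '1969'] -- just 3 meaningful chunks
print(len(digits), "digits vs", len(years), "chunks:", years)
```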
There also appear to be differences in what is referred to as encoding – the manner in which information is transformed into a form that can be stored. Alan Baddeley, for example, concluded that information in short-term memory is stored mainly acoustically (in terms of sound), while long-term memories are stored primarily in terms of meaning (semantically).
Short-term memory has a duration of seconds, but a number of factors determine exactly how many. If participants are given a list of words to memorise and asked to repeat them immediately, they recall far more than if we wait, say, ten seconds before asking. Similarly, if we give them a different task to complete in the interval (such as counting backwards for ten seconds), they recall fewer still.

Models of Memory (2): The Working Memory Model
These and other findings prompted Baddeley (along with Graham Hitch) to propose a new model of short-term memory they called working memory.
Rather than short-term memory (STM) being a static store, used for nothing more than temporarily holding information before passing it on to long-term memory (LTM) or allowing it to decay or be displaced by new incoming information, they suggested that STM is a dynamic store that manipulates new information, and information already held in LTM, for specific purposes.
Try to count the windows in your house without referring to any external cues. Just imagine your house and count the windows. This is, in fact, quite a complex task. You are taking the image of your house from LTM and dragging it into STM, where you use it to mentally travel through the rooms, identify the windows and tot them up. This is taking place in your working memory.
We also use working memory when we carry out mental arithmetic or construct a history essay from information we have previously learned. The problem is that working memory has limited capacity in all of its systems, including the so-called central executive that coordinates resources between the phonological (sound-based) and visuospatial (image-based) components. It’s easy to overload these systems: try reading silently to yourself while repeating “la la la” out loud and you’ll soon realise that you can’t recall very much of what you’ve read.

Cognitive Load Theory (CLT) grew from this earlier research and the view that we need some way to circumvent this capacity issue. It’s not new in principle – it simply gives a name to the limitations highlighted by earlier working memory researchers.
CLT was proposed by the Australian educational psychologist John Sweller, who suggests that, because of these cognitive limitations, we need to engage with learning in a rather specific way. Most controversially, perhaps, Sweller proposes the use of direct instruction, especially when the to-be-learned information is new.
Complex novel elements will overload working memory, so the complexity of a task needs to be pitched at the right level. As new information is incorporated into LTM and new cognitive schemas develop, the load on cognitive resources falls and the level of complexity can be increased.
CLT is supported by a wealth of evidence, from laboratory studies to randomised controlled trials (RCTs), and I find it difficult to fault the theory on this count. That said, there does appear to be a reluctance to attempt to falsify its claims.
In addition, objective measures of cognitive load are difficult to obtain; we have very little idea of how much is too much, and we are often forced to rely on subjective measures, such as asking participants themselves to report it. The working memory model asserts that all of its components have limited capacity, without telling us what that capacity is or how to measure it.
Models of Memory (3): Levels of Processing
Despite having perhaps more practical application for teaching and learning, levels of processing hasn’t captured teachers’ attention in the way that working memory and cognitive load have. Levels of processing theory grew from studies conducted by Fergus Craik and Robert Lockhart in the early 1970s and concerns itself with the depth at which information is processed rather than proposing a structured model.
The theory assumes that memory is a by-product of processing: the deeper the processing, the more resilient the memory. It therefore avoids a stark distinction between long-term and short-term memory, dealing with processes rather than stores.
According to the model, information is processed in three ways, each corresponding to a level, or depth, of processing.
Structural Processing
The shallowest form of processing. Information is encoded based on its physical appearance. If the information is a word, let’s say HOUSE, we might process it in terms of its being written in capital letters.
Phonetic Processing
An intermediate form of processing. Information is encoded based on how it sounds.
Semantic Processing
The deepest form of processing. Information is encoded based on its meaning and its relationship to other items with similar meanings.
Shallow processing includes methods such as rote repetition (maintenance rehearsal) and leads only to short-term retention.
Deep processing, however, involves elaborative rehearsal, that is, making connections and associations as well as combining words with images. We might also relate the item to something personal about ourselves (see the self-reference effect).
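Experiments in this tradition typically manipulate depth by varying the question asked about each word at study; deeper questions reliably produce better later recall. The sketch below is a hypothetical illustration of that design (the questions and labels are mine, not taken from the original studies).

```python
# A hypothetical depth-of-processing study design: the same word is studied
# under orienting questions demanding increasingly deep processing.

orienting_questions = {
    "structural": "Is the word written in capital letters?",
    "phonetic":   "Does the word rhyme with MOUSE?",
    "semantic":   "Would the word fit the sentence 'The ___ has a red front door'?",
}

word = "HOUSE"
for depth, question in orienting_questions.items():
    print(f"{depth:>10}: {question}  (study word: {word})")
```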

No cognitive model is infallible or necessarily accurate, but some are useful. While early attempts have been superseded by more recent ones, they all have a part to play in the attempt to understand the process of human information storage.
Such models are useful in myriad ways, from teaching and learning to understanding and treating memory impairment caused by physical trauma or viral infection. There are, of course, many other models available and, no doubt, many more to come.
All models are wrong, but some are useful
George Box, British statistician (1919 – 2013)