9 Attention and memory

[I]n an information-rich world, the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.

Herbert Simon (1971) Designing Organizations for an Information-Rich World

Most economic analysis contains the implicit assumption that we make decisions using all information that is freely available to us. But think about the last purchase you made. What did you pay attention to? The price? Its quality? Any other features of the product? How this purchase could inform future choices? Your future income? The interest rate or potential gain from investing the money?

We have limited attention. This is likely to reflect the cognitive costs of applying greater attention, cognitive constraints on our ability to process the information, or in some cases a strategy for making better decisions (more on that below).

As our attention is limited, the task is often to attract it. Simon noted that many designers build systems as though the problem were information scarcity rather than attention scarcity. Instead, we need systems that excel at filtering information and providing the most important information at the right time.

Related to our limited attention, we also have limited memory.

Short-term memory is the capacity for holding information in mind in a readily available state. If someone gave you a phone number that you were to dial immediately, you would rely on short-term memory. Short-term memory is constrained. It is often measured through memory span tests, such as asking someone to recall a sequence of digits they have just heard. By that measure, short-term memory can typically hold around 4 ± 1 digits or “chunks”.

Related to (and often considered part of) short-term memory is working memory. Working memory involves the manipulation of stored information. Like short-term memory, it is constrained.

Long-term memory involves the indefinite storage of knowledge. Our long-term memory is incomplete, is highly selective, and fades with time. Further, it changes over time, and can be changed through the act of recall.

Constrained long-term memory and recall are a foundation of the availability heuristic. People tend to weight their judgements toward more recent terms or concepts that are readily available in memory. When people judge the probability or frequency of an event, they assess events that come to mind more easily as more probable. For example, when asked about the relative frequency of words starting with the letter K compared to those with K as the third letter, people judge the former to be relatively more common because words starting with K are easier to recall.

Our limited attention and memory are a factor behind the success of techniques such as reminders in changing people’s decisions or behaviour. Simple strategies such as text messages have been found to improve outcomes, including increasing attendance at appointments, reducing missed credit card payments and reducing re-offending.

9.1 Less is more

While limited attention and memory are typically thought of as a constraint and a source of error, in some instances they might support better decision making. Often, “less is more”, in that there is a beneficial degree of ignorance, or there are benefits to excluding information from consideration. For example, incomplete memory might lead to better learning of language (Elman 1993).

Similarly, to avoid overfitting, many machine learning techniques try to limit the variables to which the algorithm pays attention. Overfitting is an over-sensitivity to the observed data in developing a model. The inclusion of every detail helps the model match the observed data, but prevents generalisation to new situations. Complex strategies can explain too much in hindsight. In an uncertain world where only part of the information is useful for the future, a simple rule that focuses on only the best or a limited subset of information has a good chance of hitting that useful information and less chance of incorporating irrelevant information.
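As a rough illustration of overfitting, the sketch below simulates an outcome that depends on a single relevant predictor and compares two least-squares models: a simple one that uses only that predictor, and a complex one that also attends to 20 irrelevant predictors. Everything in it is an assumption made for illustration rather than anything taken from the studies cited here: the function names `fit_and_score` and `simulate`, the sample sizes, the true coefficient of 2.0 and the noise level.

```python
import numpy as np


def fit_and_score(X_train, y_train, X_test, y_test):
    """Fit ordinary least squares with an intercept; return (train MSE, test MSE)."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    train_mse = np.mean((y_train - A_train @ coef) ** 2)
    test_mse = np.mean((y_test - A_test @ coef) ** 2)
    return train_mse, test_mse


def simulate(n_train=30, n_test=1000, n_irrelevant=20, n_reps=200, seed=0):
    """Average train and test error of a simple and a complex model over many samples."""
    rng = np.random.default_rng(seed)
    results = {"simple": [], "complex": []}
    for _ in range(n_reps):
        n = n_train + n_test
        # Column 0 is the only relevant predictor; the rest are pure noise.
        X = rng.normal(size=(n, 1 + n_irrelevant))
        y = 2.0 * X[:, 0] + rng.normal(size=n)
        X_tr, X_te = X[:n_train], X[n_train:]
        y_tr, y_te = y[:n_train], y[n_train:]
        # Simple model: attends to the relevant predictor only.
        results["simple"].append(fit_and_score(X_tr[:, :1], y_tr, X_te[:, :1], y_te))
        # Complex model: attends to every available predictor.
        results["complex"].append(fit_and_score(X_tr, y_tr, X_te, y_te))
    # Each entry becomes an array: [mean train MSE, mean test MSE].
    return {name: np.mean(scores, axis=0) for name, scores in results.items()}


if __name__ == "__main__":
    for name, (train_mse, test_mse) in simulate().items():
        print(f"{name:8s} train MSE = {train_mse:.2f}   test MSE = {test_mse:.2f}")
```

Under these assumptions, the complex model should match the training data more closely but predict new observations worse, which is the pattern the paragraph above describes. The size of the gap depends on the chosen sample size and number of irrelevant predictors.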

The most common explanation for less-is-more effects is the bias-variance trade-off. Bias is the error that arises from erroneous assumptions in your model. The classic case of bias is when you have failed to include a relevant predictor. If you exclude relevant predictors, you introduce bias because your predictive model will not capture relevant relations between the predictors and the target output you are trying to predict. However, including too many predictors can increase what is called variance, which is the error that arises from the sensitivity of the model to fluctuations in the data you use to develop it. Variance ultimately involves giving too much weight to irrelevant or marginally relevant information.
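The trade-off can be stated more formally using the standard decomposition of expected squared prediction error. The notation here is an assumption introduced for illustration: the outcome is $y = f(x) + \varepsilon$, where the noise $\varepsilon$ has mean zero and variance $\sigma^2$, $\hat{f}$ is the model fitted to a particular sample, and expectations are taken over samples and noise.

```latex
\mathbb{E}\left[\big(y - \hat{f}(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Dropping a relevant predictor tends to raise the first term, while adding predictors tends to raise the second. A simpler model can therefore predict better whenever the reduction in variance outweighs the added bias.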

9.2 Selective attention test

You may have done this test from Simons and Chabris (1999) before. If not, give it a go.

Not everyone succeeds at this task. What do you consider to be the costs and benefits of the phenomenon you just observed?