On 8/3/2007, the journal Science will publish a research study entitled "Defusing the Childhood Vocabulary Explosion". This study examines a core process in language acquisition, the so-called vocabulary explosion (or word spurt). During the word spurt (which typically happens during the middle of the second year), children seem to transform from slow word learners (learning perhaps 1-2 words per week) into extremely efficient ones. "Defusing the Vocabulary Explosion" suggests that this may be the result of fundamental mathematical principles, not specialized learning mechanisms or a radical transformation on the part of the child.

This paper has received a significant amount of media attention (I was pleasantly surprised!). Thus, I've put up this website to provide a little background on the model/proof and to provide links to some of the stories. I'd love to hear from you if you have questions, comments, or ideas; email me at


Coverage has by far exceeded anything I expected, and the media is to be applauded for picking up what I thought was a boring mathy study and describing it well! The embargo on this story ended at 2:00 (EDT, 1:00 Iowa Time) on Thursday, 8/2. As links accumulate, I'll put some here. The article will be released in Science on 8/3.
  • A number of science magazines have picked it up, including Ars Technica, Live, the New Scientist, and Science News
  • The Why Files wrote what might be the most entertaining (and accurate) report on this I've seen. Don't miss it.
  • The Associated Press has also done a story that was picked up by papers all over the world: the Iowa City Press-Citizen, The Washington Post, the LA Times, Fox News, the Raleigh News & Observer (and more!).
  • I've done radio interviews for The CBS Hourly News, KCBS (San Francisco), KNX (Los Angeles), the Voice of America, CBC Radio (Toronto), and Associated Press Radio.
  • There's sort of a media explosion going on (I suppose I should model that) with many more articles and postings coming. I'll keep posting links as they become available.

The Study

Typically, children begin learning words at a very slow rate, perhaps only 1-2 per week. However, every parent can attest to the fact that at some point, most children's rate of learning accelerates significantly. This has been called the word spurt or vocabulary explosion. To the right is a set of data taken from the norms of the MacArthur-Bates Communicative Development Inventory (MCDI). It shows the percentage of words (on the inventory) known by a large sample of children at each point in time. Development seems to accelerate at around 13-14 months.
The canonical view in the field is that at this acceleration point, something must happen in the brain of the child. There have been many mechanisms put forward (for a partial list, see the colored boxes on the figure). Many of them seek to leverage the early words of the child to learn later ones.
As an example of such a mechanism, consider fast-mapping by mutual exclusivity (first described by Sue Carey in 1978). Consider a child at the kitchen table who knows the words “fork” and “plate” but not “spoon”. When the child is asked for the spoon, they may infer that since they know the name for the pointy thing, and they know the name for the big round thing, the other thing must be a spoon. Critically, their knowledge of fork and plate is what helps them learn this new word. If they only knew one of the two words, they might not have been able to solve this problem. Thus, as the child accumulates more words, he or she will be in a better position to learn new words in this way.
Although recent work conducted by my colleagues Larissa Samuelson and Jessica Horst is challenging how much children actually learn from selecting the correct object, fast-mapping is a good example of the kind of specialized mechanism that could in principle account for the vocabulary explosion.

But are such mechanisms even necessary to explain this phenomenon?

To answer this question, I conducted a series of very simple computational simulations that were designed to reduce word learning to the minimal computational problem. I then used this model to understand its limits. When should the vocabulary explosion be visible? What sorts of factors affect it? Do the assumptions of the model hold (i.e., how valid is it)?

The model makes two simple assumptions.
  1. Words are learned in parallel. That is, children can build partial representations for many words at the same time. They don't have to finish learning "mommy", for example, in order to start learning "daddy".

  2. Words vary in difficulty. That is, some words are more difficult (take more time) to learn than others. Crucially, the number of words in any difficulty level must be distributed such that there are relatively few easy words, and a greater number of moderate or difficult ones.

With these in hand, the model is very simple. First I initialize a set of words with varying degrees of difficulty. Then, on every time-step, each word gets a point. When a word crosses its threshold, it is considered learned. This simple model, with no specialized mechanisms, shows a characteristic pattern of acceleration. You can see this in the figure to the right, which plots the number of words that have crossed threshold (i.e., been learned) as a function of time (in the model). This model went through a very slow phase of acquisition, where it learned only a few words until a little after 2000 time-steps, at which point it took off.
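A minimal sketch of this base model is below. The word count, mean, and spread are illustrative choices of mine, not the paper's parameters:

```python
import bisect
import random

def simulate(n_words=1000, mean=2000.0, sd=400.0, steps=4000, seed=1):
    """Base model: every word accrues one point per time-step, so a word
    is learned exactly when the step count reaches its difficulty
    threshold (drawn here from a Gaussian, clipped to stay positive)."""
    random.seed(seed)
    thresholds = sorted(max(1.0, random.gauss(mean, sd))
                        for _ in range(n_words))
    # vocabulary size at step t = number of thresholds already crossed
    return [bisect.bisect_right(thresholds, t) for t in range(1, steps + 1)]

vocab = simulate()
```

Plotting `vocab` against the step index reproduces the characteristic slow start followed by a take-off near the mean difficulty.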

We can use this simple model to understand what the effect of specialized mechanisms like fast-mapping may be. To simulate this, on each time-step we compute the number of new words the model has acquired. Then we add a point to all of the unlearned words (or two points if two words were learned on that trial). This simulates the idea that each newly learned word "helps" learn new ones. As you can see in the red curve on the left, this model also shows the vocabulary explosion. However, it also raises the possibility of the converse--that is, what happens if each word incurs a cost to unlearned words? To simulate this, as each new word was learned, we deducted a point from all the unlearned words (there actually is some evidence for this; see work by Stager & Werker, 1997; Swingley & Aslin, 2006; and ongoing work in the MACLab). This is shown in the blue curve, where the vocabulary explosion is still evident.
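Both variants can be sketched with one extra rule on top of the base model. The parameters below are again illustrative choices, not the paper's:

```python
import random

def simulate_feedback(n_words=500, mean=1000.0, sd=200.0, steps=3000,
                      feedback=1, seed=1):
    """Variant with interacting words: each word still gains one point per
    step, but every word learned on a step adds (feedback=+1, the
    fast-mapping-like bonus) or deducts (feedback=-1, the interference
    cost) a point from every word that is still unlearned."""
    random.seed(seed)
    thresholds = [max(1.0, random.gauss(mean, sd)) for _ in range(n_words)]
    points = [0.0] * n_words
    learned = [False] * n_words
    curve = []
    for _ in range(steps):
        newly = 0
        for i in range(n_words):
            if not learned[i]:
                points[i] += 1
                if points[i] >= thresholds[i]:
                    learned[i] = True
                    newly += 1
        if newly:
            # each word learned this step feeds back to the unlearned ones
            bonus = feedback * newly
            for i in range(n_words):
                if not learned[i]:
                    points[i] += bonus
        curve.append(sum(learned))
    return curve
```

Running it with `feedback=1` and `feedback=-1` shifts when the take-off happens, but both curves still accelerate.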
In a sense then, these specialized mechanisms can alter the form of the vocabulary explosion, but not the fact that it occurs. By focusing on them, we may be missing the big picture in favor of the details.
The critical feature of the model that causes the acceleration is the number of words at any given level of difficulty. In the first simulations, I assumed that word difficulty followed a Gaussian (or bell-curve) distribution. That is, there are lots of words with moderate difficulty, and only a few that are really easy or really hard. This can be seen on the right.

If this is the case, then the number of words learned at any given point in time is just the area under this curve, or the integral. By around 2600 time-steps, this model will have learned 1481 words (the sum of every point along that Gaussian from 0 to 2600). However, between 2600 and 3200 time-steps the model will acquire almost 4000. Notice the difference--the model learned many more words in the second block of time (a considerably shorter time than the first). This is the acceleration that everyone has noticed.
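This area-under-the-curve argument can be written down directly with the normal CDF. The totals below use illustrative parameters I chose, not the paper's exact numbers, but they show the same effect: identical windows of time yield far more words once time nears the mean difficulty.

```python
import math

def gaussian_vocab(t, n_words, mean, sd):
    """Words learned by time t when difficulty thresholds are Gaussian:
    n_words times the normal CDF evaluated at t."""
    return n_words * 0.5 * (1.0 + math.erf((t - mean) / (sd * math.sqrt(2))))

# two identical 600-step windows, one before and one around the mean
early = gaussian_vocab(2600, 10000, 3000, 400) - gaussian_vocab(2000, 10000, 3000, 400)
late = gaussian_vocab(3200, 10000, 3000, 400) - gaussian_vocab(2600, 10000, 3000, 400)
```

With these parameters `late` is several times larger than `early`, which is exactly the acceleration everyone has noticed.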

But why assume a Gaussian?

It seems obvious that what makes a word difficult to learn is influenced by many factors: its sound pattern, whether its meaning is abstract or concrete, how often it occurs, and its part of speech, to name just a few. What is perhaps not obvious is that when many independent distributions of difficulty (or anything, for that matter) are summed together, they will usually approximate a Gaussian. This has been known in the statistics world for a long time as the Central Limit Theorem, and it is one of the foundations of modern statistics.
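This Central Limit Theorem effect is easy to see numerically: sum a handful of independent difficulty factors per word (uniform random factors here, purely for illustration) and the totals pile up in a bell shape.

```python
import random
import statistics

def summed_difficulty(n_factors=8, n_words=20000, seed=1):
    """Each word's difficulty is the sum of several independent factors
    (standing in for frequency, concreteness, phonology, ...); by the
    Central Limit Theorem the sums are approximately Gaussian."""
    random.seed(seed)
    return [sum(random.random() for _ in range(n_factors))
            for _ in range(n_words)]

difficulties = summed_difficulty()
center = statistics.mean(difficulties)
spread = statistics.pstdev(difficulties)
```

About 68% of the summed difficulties land within one standard deviation of the mean, just as a Gaussian predicts.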

But does it even have to be a Gaussian?

If you go back to the analysis of the area under the curve, the answer seems to be "no". In fact, any distribution in which there are few easy words and a greater number of difficult words should have this property. That is, if the distribution of difficulty is monotonically increasing, its integral--the vocabulary curve--will accelerate.
With this in mind, the last few simulations looked at real language. Since the difficulty of any given word is hard to estimate directly, the simplest assumption was that words that occur frequently are easy, and words that occur rarely are hard. I took the frequency of occurrence for the top 2000 words of English and scaled it to a reasonable difficulty metric. The model was then trained using this as the difficulty distribution. The model was trained twice: once on frequency statistics collected from speech between adults (the blue curve), and once on frequency statistics collected from speech to children (the red curve). Both models showed the vocabulary spurt, but interestingly, the child-directed speech model seemed to take off faster (and the adult-directed one did better later).
While it's a bit premature to know for sure if child-directed speech really does encourage faster word learning, it does suggest that even in this simplistic analysis there may be something in the statistical distribution of words that is helpful.
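To see why a frequency-derived distribution can accelerate, here is a back-of-the-envelope version. It assumes Zipfian frequencies (rank r occurs about 1/r as often as rank 1) and lets difficulty grow with the log of a word's rarity; this scaling is my illustrative choice, not the scaling the paper used on real English frequencies.

```python
import math

def vocab_at(t, n_words=2000, scale=400.0):
    """Words learned by time t when the word of rank r has difficulty
    scale * ln(r + 1): the count grows roughly like exp(t / scale),
    i.e., it accelerates until the vocabulary is exhausted."""
    return sum(1 for r in range(1, n_words + 1)
               if scale * math.log(r + 1) <= t)
```

Because most words sit in the rare (difficult) tail, each successive block of time crosses more thresholds than the last.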

But what about the fact that the model acquires all the words at the same time?

If we assume that each time-step represents a week or even a day, this doesn't seem to be a problem -- children are likely to be exposed to thousands of words every day. Of course, this is typically how we measure the speed at which children learn words, since it is impossible to measure whether or not they know a word every minute.
However, what if we soften this assumption a little? What if we assume that the model/child can keep track of many words in parallel but can only earn a point for one word at a time (i.e., as it hears them)? Well, in this case, you would need a way to pick which word is heard at any given time. The easiest way to do this is to assume that the likelihood of a word being chosen is a function of its frequency--more difficult words appear relatively infrequently, easier words more frequently. When we do this, we still see the vocabulary spurt--even if all the words have a constant level of difficulty.
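A sketch of this one-word-per-step variant, under the assumption (mine, for illustration) that word frequencies are roughly log-normal -- a few very common words and many rare ones -- while every word shares the same constant difficulty threshold:

```python
import math
import random
from itertools import accumulate

def sampled_model(n_words=500, threshold=10, steps=12000, seed=2):
    """One word is 'heard' per time-step, chosen in proportion to its
    frequency; a word is learned after being heard `threshold` times.
    Frequencies are drawn log-normally (an illustrative assumption)."""
    random.seed(seed)
    freqs = [math.exp(random.gauss(0.0, 1.0)) for _ in range(n_words)]
    cum = list(accumulate(freqs))  # cumulative weights for fast sampling
    points = [0] * n_words
    learned = 0
    curve = []
    for _ in range(steps):
        i = random.choices(range(n_words), cum_weights=cum)[0]
        points[i] += 1
        if points[i] == threshold:
            learned += 1
        curve.append(learned)
    return curve
```

Even though difficulty is constant, rare words take far longer to accumulate their hearings, so the vocabulary curve still starts slowly and then accelerates.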
So what are the implications of this work?
Most importantly, this model suggests that specialized word learning mechanisms are not needed to explain the vocabulary explosion. It doesn't disprove their existence -- in fact, there is a lot of good empirical work in their favor. What it does suggest is that these mechanisms may not be the cause of this big phenomenon we see in development. While they can change the form of it, they cannot change the fact of it.

So what does explain the vocabulary explosion?
Two conditions appear necessary to see it. First, words must be learned in parallel -- the system must be able to build representations of many words at the same time. Second, words must vary in difficulty; specifically, there must be more difficult words than easy words. Both of these seem to be simple, easy-to-accept hypotheses.

Where does the distribution of word difficulty come from?
As I discussed, the distribution of word difficulty arises from many sources: frequency of occurrence for sure, but also phonology, morphology, syntax, semantics, and context. In a sense, the structure of the language provides the structure for this critical factor of the child's environment. However, these things will be shaped by the proclivities and abilities of the child--children differ in their ability to learn different aspects of language, and children's own interests and personality will shape the language that's used around them.

In a sense, the vocabulary explosion is not to be found in the genes or in the environment. If anything, it arises out of the mathematics of learning, and out of a more unified idea: the organism/environment complex.
Page maintained by Bob McMurray
Last updated on 8/1/07