Google Ngrams: What Words are Most Often Found in Books?

- by Michael Stillman

Ngrams graph popularity of terms "flapper" and "hippie."

Leave it to the folks at Google to come up with another amazing new tool for us to use. I'm not yet sure of the practical uses for it, but it is something that will fascinate lovers of books and history for hours. It's called "Ngrams," and its existence relies upon the massive book-scanning project on which Google embarked in 2004.

 

At this point, Google Books' database contains scanned copies of some 15 million books. From these, Google has selected 5.2 million books, containing 500 billion words, for its Ngram word search. However, Ngrams does not simply match words. It measures how often those words appear in the books published in each year. It does not just provide a total, but plots the matches on a chronological graph, so you can see how frequently a word was used at various times. You can track the rise or obsolescence of words and phrases by seeing how frequently they appear in books.
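To make the idea of a chronological frequency graph concrete, here is a minimal sketch in Python. It assumes the plotted value is a word's share of all the words printed in a given year, which is how the tool is generally described; the tiny "library" and the function name are invented for illustration and have nothing to do with Google's actual data or code.

```python
from collections import Counter

# Hypothetical toy corpus: (publication year, text) pairs standing in for scanned books.
TOY_BOOKS = [
    (1925, "the flapper danced and the flapper laughed"),
    (1925, "a quiet year in the village"),
    (1970, "the hippie sang while the old flapper watched"),
]

def yearly_frequency(books, word):
    """Return {year: share of all words in that year's books equal to `word`}."""
    hits = Counter()    # occurrences of `word` per year
    totals = Counter()  # total words per year
    for year, text in books:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(word.lower())
    # Dividing by the yearly total keeps years with many books from dominating.
    return {year: hits[year] / totals[year] for year in sorted(totals)}

for term in ("flapper", "hippie"):
    print(term, yearly_frequency(TOY_BOOKS, term))
```

Plotting those yearly shares against the year would give a line of the kind the graphs show. Dividing by each year's total matters because far more books were printed in recent years than in the 1700s.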

 

You can plot these graphs for single words or for phrases of up to five words. You can plot just one word or phrase, or several of them on the same graph to show a comparison. For example, the graph on this page is a comparison of the popularity of the words "flapper" and "hippie." "Flapper" reaches its peak in popularity in the 1920s, then tumbles, along with everything else, in the years of the Great Depression. Its use has been fairly constant over the past 70 years.
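The name "Ngram" refers to a run of n consecutive words, and the five-word limit mentioned above corresponds to n running from 1 to 5. As a rough sketch (the function name and the simple whitespace splitting are assumptions for illustration, not Google's method), a sentence can be broken into such runs like this:

```python
def ngrams(text, max_n=5):
    """Return {n: list of every run of n consecutive words in `text`}, for n = 1..max_n."""
    tokens = text.lower().split()
    return {
        n: [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        for n in range(1, max_n + 1)
    }

runs = ngrams("the Sandwich Islands")
print(runs[2])  # two-word runs
print(runs[3])  # three-word runs
```

A phrase such as "Sandwich Islands" is therefore counted as a single two-word unit, not as two separate words.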

 

"Hippie," on the other hand, is a nonexistent word until the early 1960s. By the middle of that decade, it becomes more common in books than "flapper," a position it never relinquishes. It peaks in use around 1970, before settling down to regular, but less frequent usage.

 

Then there are the name changes. Compare "Hawaii" with the "Sandwich Islands." In the early days, the British gave the island chain the name "Sandwich Islands" in honor of the Earl of Peanut Butter and Jelly. That name starts showing up in the late 18th century, but "Hawaii" does not appear in the graph until the 1820s. It then slowly closes the gap, finally surpassing "Sandwich Islands" around 1890. Since then, the Sandwiches have slowly disappeared, while "Hawaii" has become overwhelmingly more common.

 

This only applies to books in English, perhaps unsurprisingly, because others may not have recognized the claims of the Earl's homeland. Books in French and German (you can search these and several other languages separately) never showed much use of the "Sandwich" name. In them, "Hawaii" was already more common as early as the 1820s.

 

Another such example can be seen in Turkey's largest city. For centuries it was known as "Constantinople." In 1930, the Turks changed its name to "Istanbul." There is no "Istanbul" in the books prior to this date, but within a couple of years, it surpasses centuries-old "Constantinople."

 

Sometimes names get reused. "Engelbert Humperdinck" first appears near the turn of the 20th century as the music of the German composer gained popularity. His name peaked in the 1920s and then began to decline. However, in the 1960s, the name starts bouncing back up after the English crooner adopted the old composer's funny-sounding name as his own.

 

A similar pattern can be found for "Benjamin Harrison." Harrison was the name of a signer of the Declaration of Independence, while his great-grandson of the same name was elected President in 1888. Harrison has two peaks on the graph, a century apart.