CityReads│When Words Meet Numbers: What It Reveals about Writing

Ben Blatt 城读 2020-09-12

213




The written word and the world of numbers should not be kept apart.

Ben Blatt, 2018. Nabokov's Favorite Word Is Mauve: What the Numbers Reveal About the Classics, Bestsellers, and Our Own Writing. Simon & Schuster.

Source: https://www.simonandschuster.com/books/Nabokovs-Favorite-Word-Is-Mauve/Ben-Blatt/9781501105395

https://www.npr.org/2017/03/31/521836700/nabokovs-favorite-word-is-mauve-crunches-the-literary-numbers

https://www.forbes.com/sites/kevinknudson/2017/04/30/book-review-nabokovs-favorite-word-is-mauve/#f29295ee0593

 

Famous novelists write in their own particular style. Ernest Hemingway used short sentences; Henry James used much longer ones; Virginia Woolf let them flow free-form from her mind. But say you're handed a particular piece of text. Could you figure out the author just by examining the words that appear?

 

That question, along with similar investigations, is the subject of Ben Blatt's Nabokov's Favorite Word Is Mauve.

 

In Nabokov’s Favorite Word Is Mauve, statistician and journalist Ben Blatt brings big data to the literary canon, exploring the wealth of fun findings that remain hidden in the works of the world’s greatest writers. He assembles a database of thousands of books and hundreds of millions of words, and starts asking the questions that have intrigued curious word nerds and book lovers for generations. Do our favorite authors follow conventional writing advice about clichés, adverbs and exclamation points? (They mostly do.) Can an algorithm identify a writer from his or her prose style? Do men and women write differently? What are our favorite authors’ favorite words? Which bestselling writer uses the most clichés? What makes a great opening sentence? Which authors use the shortest first sentences, and which the longest? How can we judge a book by its cover?

 

You are what you write


The book begins with a historical conundrum: who really wrote the Federalist Papers? A handful of them were in doubt because both Madison and Hamilton had claimed authorship. For nearly two centuries, historians argued about who the real author was, teasing out evidence from the political slant of each essay. But no one seemed to find a convincing answer.

 

The question was finally settled in 1963 by Mosteller and Wallace, a pair of statisticians. They approached the problem systematically: (a) count the frequency of common words in works known to be written by each man; (b) count the frequency of those same words in the disputed essays; (c) compare. In the end, it mostly came down to the use of whilst versus while: Madison used the former and Hamilton the latter. That, along with an array of other common words, confirmed Madison as the author of the disputed essays.
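The three steps above can be sketched in a few lines of Python. This is only a toy illustration, not the statistical model Mosteller and Wallace actually used: the marker list (in particular "upon"), the function names, and the simple absolute-difference distance are all assumptions made for the example.

```python
import re
from collections import Counter

# "while" and "whilst" come from the text above; "upon" is an
# illustrative assumption, as is the whole distance measure below.
MARKERS = ["while", "whilst", "upon"]

def rates_per_thousand(text, markers=MARKERS):
    """Return each marker word's frequency per 1,000 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return {m: 1000 * counts[m] / total for m in markers}

def closer_author(disputed, known_a, known_b):
    """Return 'A' or 'B', whichever known sample has the nearer marker profile."""
    d = rates_per_thousand(disputed)   # step (b): rates in the disputed text
    a = rates_per_thousand(known_a)    # step (a): rates for author A
    b = rates_per_thousand(known_b)    # step (a): rates for author B
    # step (c): compare using the sum of absolute rate differences
    dist_a = sum(abs(d[m] - a[m]) for m in MARKERS)
    dist_b = sum(abs(d[m] - b[m]) for m in MARKERS)
    return "A" if dist_a <= dist_b else "B"
```

With real essays, each known sample would be the combined text of the papers securely attributed to one man, and far more than three marker words would be needed for a trustworthy answer.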

 

The answer was hidden in the words themselves, but to find it, scholars needed not a close reading but a close counting. They needed to look only at the numbers.

 

He writes, she writes

 

It turns out that women authors write about men and women equally, but men write overwhelmingly about men.

 

For every appearance of the word "she" in classics by male authors, Blatt found three uses of the word "he." In classics by women, the ratio was pretty much one-to-one.
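A pronoun ratio of this kind is simple to compute. The sketch below is a guess at the general idea rather than Blatt's actual procedure; the function name and the tokenizing rule are assumptions.

```python
import re

def he_she_ratio(text):
    """Return count('he') / count('she'), or None if 'she' never appears."""
    # Lowercase and split on anything that is not a letter, so "He" and
    # "he," both count as "he".
    words = re.findall(r"[a-z]+", text.lower())
    he = words.count("he")
    she = words.count("she")
    return he / she if she else None
```

A ratio near 3 would match the figure reported for classics by male authors, and a ratio near 1 the figure for classics by women.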

 

Male authors of classic literature are three times as likely to write that a female character "interrupted" as to write that a male character did. In contemporary popular and literary fiction, the ratio is smaller, but it's still there.

 

 

  

Literary fingerprint

 

One of the core points of the book is “to test whether something like a literary fingerprint exists for famous writers.” It does, Blatt finds: across their oeuvres, “authors do end up writing in a way that is both unique and consistent, just like an actual fingerprint is distinct and unchanging.”

 

Blatt looked for the specific words that authors use much more frequently than the rate at which those words generally occur in the rest of written English (i.e., compared to a huge sample of literary works, some 385 million words in total, written in English between 1810 and 2009, assembled by linguists at Brigham Young University).

 

His criteria: A favorite word

 

(1) Must occur in at least half of the author's books;

(2) Must be used at a rate of at least once per 100,000 words;

(3) Must not be so obscure that it's used less than once per million in the BYU sample of written English;

(4) Must not be a proper noun.
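The four criteria can be expressed as a filter, followed by a ranking of the surviving words by how much more often the author uses them than the reference sample does. This is a minimal sketch under stated assumptions: the function names, the `baseline_rate` dictionary (occurrences per million words in a reference corpus), and the caller-supplied `proper_nouns` set are all hypothetical, and the ranking step is inferred from the description above rather than taken from the book.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def favorite_words(books, baseline_rate, proper_nouns=frozenset(), top=3):
    """
    books: list of strings, one per book by the author.
    baseline_rate: dict of word -> uses per million words in a reference corpus.
    proper_nouns: lowercase words to exclude (hypothetical, caller-supplied).
    """
    per_book = [Counter(tokenize(b)) for b in books]
    total = sum(per_book, Counter())
    n_words = sum(total.values())
    scored = []
    for word, count in total.items():
        in_books = sum(1 for c in per_book if word in c)
        author_rate = count / n_words * 1_000_000  # uses per million words
        base = baseline_rate.get(word, 0.0)
        if in_books * 2 < len(per_book):   # (1) in at least half the books
            continue
        if author_rate < 10:               # (2) once per 100,000 = 10 per million
            continue
        if base < 1:                       # (3) not too obscure in the baseline
            continue
        if word in proper_nouns:           # (4) not a proper noun
            continue
        scored.append((author_rate / base, word))
    scored.sort(reverse=True)              # highest overuse ratio first
    return [word for _, word in scored[:top]]
```

On real data, `books` would hold an author's complete novels and `baseline_rate` would be derived from a large reference corpus such as the BYU sample mentioned above.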

 

Here are some of his findings: three favorite words for each author.

 

Jane Austen: civility, fancying, imprudence

 

Dan Brown: grail, masonic, pyramid

 

Truman Capote: clutter, zoo, geranium

 

John Cheever: infirmary, venereal, erotic

 

Agatha Christie: inquest, alibi, frightful

 

F. Scott Fitzgerald: facetious, muddled, sanitarium

 

Ian Fleming: lavatory, trouser, spangled

 

Ernest Hemingway: concierge, astern, cognac

 

Toni Morrison: messed, navel, slop

 

Vladimir Nabokov: mauve, banal, pun (As Blatt points out, Nabokov had synesthesia, a condition that caused him to associate various colors with the sound and shape of letters and words. "Mauve" was his favorite: He used the word at a rate that's 44 times higher than the rate at which it occurs in the BYU sample of written English.)

 

Jodi Picoult: courtroom, diaper, diner

 

Ayn Rand: transcontinental, comrade, proletarian

 

J.K. Rowling: wand, wizard, potion

 

Amy Tan: gourd, peanut, noodles

 

Mark Twain: hearted, shucks, satan

Edith Wharton: nearness, daresay, compunction

 

Virginia Woolf: flushing, blotting, mantelpiece

 

Adverbs are not your friends

 

In literary lore, one of the best stories of all time is a mere six words. “For sale: baby shoes, never worn.” It’s the ultimate example of less is more, and you’ll often find it attributed to Ernest Hemingway. It’s unclear whether it was in fact Hemingway who penned these words. But it’s natural that writers and readers would want to attribute the story to the Nobel winner. He’s known for his economical prose, and the shortest-of-short stories is, at the very least, emblematic of his style.

 

Stephen King also famously said, “The road to hell is paved with adverbs.”

 

Blatt wanted to find out whether writers like Hemingway, Stephen King and others lived up to the hype. And if not, who does use the fewest adverbs? Which author uses them the most? Moreover, when we look at the big picture, can we find out whether great writing does indeed hew to those efficient “laws of prose writing”? Do the best books use fewer adverbs?

 

Here adverbs include only “the ones that usually end in -ly.” Blatt takes a corpus of novels written by a broad cross-section of famous writers (details of the data set are in the book, but it is expansive and comprehensive) and does frequency counts on the -ly adverbs.

 

It turns out that the adverb advice holds up. Blatt combined several lists of the "Great Books" of the 20th century and came up with 37 titles that were generally considered great.

 

Compared with the same authors' other novels, the "Great Books" used significantly fewer adverbs. Of these authors' books that kept to a strict adverb rate (fewer than 50 per 10,000 words), 67% were considered "Great," whereas only 16% of their adverb-loaded books (containing more than 150 per 10,000 words) were ever considered "Great."
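The adverb rate behind these figures is a count per 10,000 words. The sketch below approximates it by counting words that end in -ly. The exclusion list, the function names, and the "middle" label are assumptions for illustration, since the article does not say how Blatt separated adverbs from other -ly words.

```python
import re

# Hand-picked -ly words that are not adverbs; an illustrative assumption,
# far from complete.
NOT_ADVERBS = {"family", "fly", "july", "italy", "belly", "holy",
               "ugly", "reply", "supply", "apply", "bully", "jelly"}

def ly_adverb_rate(text):
    """Approximate number of -ly adverbs per 10,000 words."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    n = sum(1 for w in words if w.endswith("ly") and w not in NOT_ADVERBS)
    return 10000 * n / len(words)

def classify(rate):
    """Bucket a rate using the two thresholds quoted in the text."""
    if rate < 50:
        return "strict"
    if rate > 150:
        return "adverb-loaded"
    return "middle"
```

A part-of-speech tagger would give a more accurate count than the suffix rule, at the cost of an extra dependency.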

 

 


No surprise that Hemingway is the most efficient writer by this metric. But this is just the beginning of the fun in addressing this question. Is each author uniformly efficient, or does it vary from book to book? Is Hemingway really the most efficient or just the most efficient on average? It turns out that William Faulkner wrote three books with a lower adverb rate (As I Lay Dying (31), The Sound and the Fury (42), The Unvanquished (46)) than Hemingway's lowest count (To Have and Have Not (52)).

 

In the epilogue, Blatt writes: “Successful writers pen hundreds of thousands of words in their lifetime. In any other field with hundreds of thousands of data points it would be quite clear that the information could be mined to examine human behavior and psychology. I believe the same is true for examining words… The written word and the world of numbers should not be kept apart.”

 

This remark reminds me of a conversation between physicist Richard Feynman and his artist friend about art and science. His artist friend held up a flower and said, “I, as an artist, can see how beautiful a flower is. But you, as a scientist, take it all apart and it becomes dull.”

 

Feynman didn’t agree. He replied, “First of all, the beauty that you see is available to other people—and to me, too. Although I might not be quite as refined aesthetically as you are, I can appreciate the beauty of a flower. But at the same time, I see much more in the flower than you see. I can imagine the cells inside, which also have a beauty. There’s beauty not just at the dimension of one centimeter; there’s also beauty at a smaller dimension. (Science) only adds to the excitement and mystery and awe of a flower. It only adds. I don’t understand how it subtracts.”

 

To borrow from Feynman, numbers only add beauty to words. They do not subtract.

 


Related CityReads

06.CityReads│Life in the City Is Essentially One Giant Math Problem

16.CityReads│Writing Lessons from Stephen King

21.CityReads│A Red Capitalist in Shanghai in 1966

23.CityReads│How to Tell Lies with Maps?

35.CityReads│The Joy of Stats

38.CityReads│Sontag: What Makes Me Feel Strong?

44.CityReads│How Could Humanity Escape Poverty?

53.CityReads│What If Shakespeare Had A Sister?

81.CityReads│Ildefons Cerdà: Father of Urbanization and Modern Barcelona

87.CityReads│A Russian in Lijiang, 1941-1949

89.CityReads | Here is New York by E.B.White

96.CityReads│Alexander von Humboldt: the man who invents the nature

98.CityReads│What Jane Jacobs Got Right and Wrong about Cities?

105.CityReads│Winners and Losers of Globalization

113.CityReads│Haruki Murakami: A Running Novelist & Translator

117.CityReads│Remembering Edutainer Hans Rosling,Who Made Data Dance and Taught us Fact-based Worldview

127.CityReads│Everybody Lies: How the Internet Reveals Who We Are

135.CityReads│Borges in Conversation

144.CityReads│Everyone Can Excel at Math & Science, If You Learn How to Learn

148.CityReads│A New Way of Learning Economics to Understand Real World

164.CityReads│May You Find Your Own Abbot Vallet

165.CityReads│Scale: Simple Law of organisms, Cities and Companies

167.CityReads│Poems for City and Urban Life

173.CityReads│Mary Wollstonecraft’s A Vindication of the Rights of Woman

174.CityReads│Such, Such Was George Orwell

175.CityReads│What Is the Best Way to Learn Statistics?

200.CityReads│The unforgettable Shanghai gentleman

203.CityReads│Becoming Leonardo da Vinci

211.CityReads│Learning Statistical Thinking for the 21st Century

212.CityReads│Industrial City Life under the Brush of L.S. Lowry

