parsing sentences for style
“Her hand and wrist were so finely formed that she could wear sleeves not less bare of style than those in which the Blessed Virgin appeared to Italian painters; and her profile as well as her stature and bearing seemed to gain the more dignity from her plain garments, which by the side of provincial fashion gave her the impressiveness of a fine quotation from the Bible,–or from one of our elder poets,–in a paragraph of to-day’s newspaper.”
– George Eliot, Middlemarch
I’m just gonna think about sentences for a while. I think it’s important for poetry.
I will mourn the loss of thousands of precious lives, but I will not rejoice in the death of one, not even an enemy. “Returning hate for hate multiplies hate, adding deeper darkness to a night already devoid of stars. Darkness cannot drive out darkness; only light can do that. Hate cannot drive out hate, only love can do that.”
– MLK, with an English teacher
So apparently the first sentence in the quote above was written by an English teacher, who then quoted MLK. But somehow the quote marks got lost, and a lot of people thought the first sentence was also by MLK. In the article where I first read about this, the writer says of the first sentence: “It didn’t sound right.” But… really? It sounds to me like it could be MLK. How does it sound wrong? More to the point: how could a computer make sentences that sound “right”? And how can we use that effect for poetry?
I believe most researchers who look at authorship attribution using stylometrics use lexical and bigram features; see e.g. Matthew L. Jockers and Daniela M. Witten, “A comparative study of machine learning methods for authorship attribution,” Literary and Linguistic Computing 25(2):215-223, 2010 (http://llc.oxfordjournals.org/cgi/content/abstract/25/2/215).
But how would this scale down to the sentence level? What if you’re trying to figure out the authorship of a suspect fragment, and it’s mostly made up of words that are common across all the documents in your corpus? What if you’re looking at very short forms like poetry?
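To make “lexical and bigram features” concrete, here is a minimal sketch. It is not the method from the Jockers and Witten paper; it just assumes a tiny hand-picked function-word list (`FUNCTION_WORDS`, my placeholder), relative frequencies of those words plus word bigrams, and cosine similarity as the comparison. All names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny function-word list for illustration only;
# real stylometric studies use much larger lists.
FUNCTION_WORDS = ["the", "a", "an", "and", "but", "of", "in", "to", "that",
                  "not", "only", "can", "will", "i", "it", "is"]


def tokenize(text):
    """Lowercase word tokens; apostrophes and hyphens stay inside words."""
    return re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower())


def features(text):
    """Relative frequencies of function words plus word bigrams."""
    tokens = tokenize(text)
    n = max(len(tokens), 1)
    counts = Counter(tokens)
    feats = {"w:" + w: counts[w] / n for w in FUNCTION_WORDS}
    bigrams = Counter(zip(tokens, tokens[1:]))
    nb = max(len(tokens) - 1, 1)
    for (a, b), c in bigrams.items():
        feats["b:" + a + " " + b] = c / nb
    return feats


def cosine(f, g):
    """Cosine similarity between two sparse feature dicts."""
    dot = sum(v * g.get(k, 0.0) for k, v in f.items())
    norm_f = math.sqrt(sum(v * v for v in f.values()))
    norm_g = math.sqrt(sum(v * v for v in g.values()))
    if norm_f == 0 or norm_g == 0:
        return 0.0
    return dot / (norm_f * norm_g)


fragment = ("I will mourn the loss of thousands of precious lives, but I will "
            "not rejoice in the death of one, not even an enemy.")
mlk = ("Darkness cannot drive out darkness; only light can do that. "
       "Hate cannot drive out hate, only love can do that.")
print(cosine(features(fragment), features(mlk)))
```

On these two sentences the similarity comes out as 0.0: they share no listed function word and no bigram, so at this scale the features say nothing either way.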
I dunno. But I’m currently guessing that a stylometrics of poetry at the sentence level will need more than lexical and n-gram features. And anyways, this is a good excuse to parse sentences, which is just darn fun (if you’re not doing it for work!).
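One kind of feature that goes beyond words is syntactic: counts of the rewrite rules in a sentence’s parse tree, plus the tree’s depth. A minimal sketch, assuming Penn-Treebank-style bracketed parses (the format the Stanford Parser can print). The tree below is one I wrote by hand for illustration, not real parser output, and the function names are hypothetical.

```python
from collections import Counter

# Hand-written toy parse, for illustration only (not parser output).
TOY_PARSE = """
(ROOT
  (S
    (NP
      (NP (DT The) (NN reporter))
      (SBAR (WHNP (WP who))
            (S (VP (VBD attacked) (NP (DT the) (NN senator))))))
    (VP (VBD admitted) (NP (DT the) (NN error)))
    (. .)))
"""


def parse_tree(s):
    """Parse a bracketed string into nested (label, children) tuples.
    Leaves (words) are plain strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()


def productions(tree, counts=None):
    """Count non-lexical rewrite rules such as 'S -> NP VP .'."""
    if counts is None:
        counts = Counter()
    label, children = tree
    subtrees = [c for c in children if isinstance(c, tuple)]
    if subtrees:  # preterminals like (NN error) are skipped
        counts[label + " -> " + " ".join(c[0] for c in subtrees)] += 1
        for c in subtrees:
            productions(c, counts)
    return counts


def depth(tree):
    """Height of the tree, counting each labeled node as one level."""
    _, children = tree
    subtrees = [c for c in children if isinstance(c, tuple)]
    return 1 + max((depth(c) for c in subtrees), default=0)


tree = parse_tree(TOY_PARSE)
print(depth(tree))
print(productions(tree))
```

Rule counts like these could be normalized and compared the same way as lexical features; depth is only a crude stand-in for how much nesting a reader has to hold.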
I wrote a couple of articles recently, which made me realize how bad my first-draft writing is. I wanted to parse my sentences to see exactly why they were bad, but I didn’t have time (deadlines!), so I forgot all about it. Then I saw a movie review that was bad in the same way my first drafts are, so I parsed it instead. (Or actually, the Stanford Parser did.)
“‘The Beaver,’ a dysfunctional-family melodrama about depression, and the ties that bind and throttle, and the tears that trickle but dry in the warm glow of said family, has a card up (or actually on) its sleeve, and it isn’t entirely Mel Gibson.”
– Manohla Dargis, NYT
OK, look, I shouldn’t say it’s bad… that would be like those annoying grammar dorks who make fun of people who don’t “speak right.” Every sentence has personality, and if the personality isn’t appropriate to the immediate task, well, that’s an effect that we should be aware of and call upon in some other context.
- “the ties that bind and throttle” – the parser gets this wrong by suggesting that it’s on the same level as “a dysfunctional-family melodrama”. But from syntax alone, it’s ambiguous: you need semantics to figure it out.
- “the tears that trickle but dry in the warm glow of said family” – similar problem. I think the right parse would have 3 NPs joined by CCs (“depression”, “the ties…”, “the tears…”) where the NP-NN “depression” currently sits. The last 2 NPs don’t actually add much; the phrases mix metaphors, are clichéd, and are redundant with “dysfunctional”. Think about it: these two phrases use 20 words to be redundant with half an adjective.
- “or actually on” – a parenthetical statement in a sentence of this complexity just adds to processing load. (I’m not sure what the “right parse” for this is…)
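A quick check of the “20 words” figure in the second bullet, and of how much of the whole sentence those two NPs take up. The counting conventions are my assumption: hyphenated compounds and contractions count as one word, punctuation is ignored, and straight quotes stand in for the curly ones.

```python
import re

# Straight quotes stand in for the curly quotes in the original review.
review = ('"The Beaver," a dysfunctional-family melodrama about depression, '
          "and the ties that bind and throttle, and the tears that trickle but "
          "dry in the warm glow of said family, has a card up (or actually on) "
          "its sleeve, and it isn't entirely Mel Gibson.")
phrase = ("the ties that bind and throttle, and the tears that trickle but "
          "dry in the warm glow of said family")


def words(text):
    """Word tokens; hyphenated compounds and contractions count as one word."""
    return re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)


n_phrase, n_review = len(words(phrase)), len(words(review))
print(f"{n_phrase} of {n_review} words ({n_phrase / n_review:.0%})")
# prints: 20 of 43 words (47%)
```

So under these conventions the two redundant NPs are close to half of the sentence.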
These are clearly going to have some kind of effect on the reader’s processing of the sentence, and on the personality and affect the reader imparts to the text. (To what in the text? Character? Author? Topic/theme? Dunno. Probably it just provides an affordance whose interpretation is specific to each reader.) For me, it makes the author seem a little hyperbolic, with more run-on thoughts than she can express efficiently. In eRoGK7’s “Same” poems, it might have a different effect. Probably there’s some interaction between syntax, semantics, and situated meaning (text topic, reading context, author, maybe even what the lit theorists call “discourse”).
What really kills me is that the Eliot sentence at the beginning of this post is longer, but parses better. Look at the first part of it:
I love the phrase “not less bare of style”. I get it, but somewhere in my brain some neural logic-processor is still trying to figure it out.
The parser seems to fail on “or from one of our elder poets”, which doesn’t actually add much. But for me, the sentence as a whole flows better than the movie review.
I think I need to read more about human language processing, especially models of errors and difficulties. I came across this video that seems like a good intro:
Skipping through it (I hate talks…), I noticed that it cites the “well-known result” that one of these two sentences is more difficult to read:
“The reporter who attacked the senator admitted the error.”
“The reporter who the senator attacked admitted the error.”
Which one? Well, it turns out that human reading time is shorter for the former, which contains a subject relative clause instead of an object relative clause. If I’ve ever heard those terms before, I promptly forgot them, btw.
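One standard explanation of this result is Gibson’s dependency locality theory: in the object relative, “attacked” has to be linked back to “who” and “the reporter” across “the senator,” and longer-distance links cost more working memory. A crude proxy, which is my simplification rather than Gibson’s actual cost metric (that theory counts intervening new discourse referents, not raw word distance), is to sum the lengths of all head-dependent arcs. The heads below are my own hand annotation, loosely following Universal Dependencies, not parser output.

```python
# Hand-annotated dependency heads (my annotation, loosely following Universal
# Dependencies; not parser output). heads[i] is the 1-based position of the
# head of word i+1; 0 marks the root verb.
SUBJECT_RC = ("The reporter who attacked the senator admitted the error",
              [2, 7, 4, 2, 6, 4, 0, 9, 7])
OBJECT_RC = ("The reporter who the senator attacked admitted the error",
             [2, 7, 6, 5, 6, 2, 0, 9, 7])


def total_dependency_length(heads):
    """Sum of linear distances between each word and its head (root excluded)."""
    return sum(abs(i - h) for i, h in enumerate(heads, start=1) if h != 0)


for name, (sentence, heads) in [("subject relative", SUBJECT_RC),
                                ("object relative", OBJECT_RC)]:
    assert len(sentence.split()) == len(heads)  # one head per word
    print(f"{name}: {total_dependency_length(heads)}")
# prints:
# subject relative: 15
# object relative: 18
```

The object relative comes out longer mainly because the who→attacked and attacked→reporter arcs both stretch across “the senator.”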
Anyways, I’m not saying we need to become psycholinguists. We all develop an intuitive understanding of this. But personally, I’d like to someday track down a computational model of how syntax shapes style, and of its aesthetic effect on a reader.