
Other tools: Dissociated Press in Emacs

November 4, 2010

Comed to usurparty sat sill
Please, remarking, I real to come be patted on this, there was
grow me to
introduch nonsense!

See: the next momently unting late. So she went on,
You are creatures wouldn’t seem to
The Cat’s head she sits purring.
A rabbit reted.
I cake.

Hush! Hersentence first sentence in “purpose”!
If you think I can listance.
Its tongueent in search of half the tone!

Then once considering about the gamine them raw.
It’s all the Particular;
the cook tulip-rowne of them different per way.

I could tell you are ver ask again.
Of that she
had somehind him.

 
 

November 3, selected lines, generated from character-based n-grams. Source text: Alice in Wonderland. Generator: Dissociated Press in Emacs

Ah, Emacs. There’s a reason I keep it as my default text editor: it’s amazingly powerful. One day a couple of years ago I decided to do everything in Emacs. I read email, browsed the web, coded Java, even played MP3s in Emacs. I mean, why shouldn’t your text editor be able to do all those things? And be open-source and GPLed, so I could rewrite it if, you know, I had the time and the inclination? Sure, sometimes I find myself using Notepad or OpenOffice if I’m in a hurry because, you know, Emacs has its own way of doing things that can take a while to figure out. But given anything computational you may want to do, Emacs probably has something related to it.

For example, n-gram generators. Let’s say you wanna do character-based n-gram generation on a text. First, open the source text in Emacs:

Now to run Dissociated Press you enter “M-x dissociated-press” at the bottom, see it there? Well, you don’t actually press “M”, then “-”, then “x”, etc… ’cause, you know… you just don’t! The “M” stands for Meta, which on most keyboards is the Alt key. So what you do is press “Alt” and “x” at the same time, which activates the minibuffer at the bottom and puts the M-x there for you automatically. Then you type “dissociated-press” and hit enter.

Awesome! Keep pressing “y” until you’ve got what you need.
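If you’re curious what character-based n-gram generation looks like outside Emacs, here’s a minimal Python sketch. To be clear, this is not the code Emacs runs: it’s a plain lookup-table version of the idea, and the names `build_char_model`, `generate_chars`, `order`, `length` and `seed` are all made up for illustration.

```python
import random
from collections import defaultdict


def build_char_model(text, order=2):
    """Map every run of `order` characters to the characters seen right after it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model


def generate_chars(text, order=2, length=200, seed=None):
    """Generate `length` characters; each new one is picked from what followed
    the previous `order` characters somewhere in the source text."""
    if len(text) <= order:
        raise ValueError("text must be longer than the order")
    rng = random.Random(seed)
    model = build_char_model(text, order)

    # Start from a random spot in the source.
    start = rng.randrange(len(text) - order)
    out = text[start:start + order]

    while len(out) < length:
        followers = model.get(out[-order:])
        if not followers:
            # Dead end (this run was never followed by anything we recorded):
            # restart from a fresh random spot.
            start = rng.randrange(len(text) - order)
            out += text[start:start + order]
            continue
        out += rng.choice(followers)
    return out[:length]
```

Feed it the full text of Alice in Wonderland as one string with `order=2` and you should get the same flavor of almost-English as the lines at the top of this post. Bump `order` up and the output starts quoting the source more and more faithfully.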

But hold up, what if you want to do word-based (rather than character-based) n-gram generation? Well, you gotta pass an argument to the program, and since this is Emacs, well…
OK, listen, just do the following: 1) press “ESC” (escape), 2) press “-” and “1” and enter. Nothing will happen, but that’s cool, what we’re doing is just setting up an argument (Emacs calls it a prefix argument) for the next command. 3) Then do Alt-x and type “dissociated-press” again. Before you press enter, you’ll see the “-1” before the “M-x dissociated-press” at the bottom:

Killer. Just press enter now, and:

We are teh leet!

So here’s the deal: the default dissociated-press generates from a character-based bigram. You can change the number of “characters of continuity” by entering a POSITIVE number in step 2 above; and the “words of continuity” by entering a NEGATIVE number in step 2 above. (Though it seems to floor out at 2; you’ll notice the text above doesn’t look like it was generated from a unigram word model.)
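Here’s a rough Python sketch of that positive/negative convention. As far as I can tell from the Emacs manual’s description, Dissociated Press copies a run of text, then jumps to another spot where the last few characters (or words) also occur and carries on from there, so that’s what this does. It’s my approximation under that assumption, not a port of the Emacs code, and the function name `dissociate`, its parameters, and the run length of 1 to 10 units are all invented for the example.

```python
import random
from collections import defaultdict


def dissociate(text, arg=2, length=60, seed=None):
    """arg > 0: overlap of `arg` characters; arg < 0: overlap of `-arg` words."""
    if arg == 0:
        raise ValueError("arg must be non-zero")
    units = list(text) if arg > 0 else text.split()
    n = abs(arg)
    if len(units) <= n:
        raise ValueError("text is too short for that much overlap")
    rng = random.Random(seed)

    # For every run of n units, remember each position just past it
    # (only positions that still have something left to copy).
    positions = defaultdict(list)
    for i in range(n, len(units)):
        positions[tuple(units[i - n:i])].append(i)

    pos = rng.randrange(n, len(units))
    out = list(units[pos - n:pos])
    while len(out) < length:
        # Copy a short run straight from the source (run length is arbitrary).
        chunk = units[pos:pos + rng.randint(1, 10)]
        out.extend(chunk)
        pos += len(chunk)
        # Jump to some place where the last n units also occur;
        # if there is none, fall back to a random position.
        candidates = positions.get(tuple(out[-n:]))
        pos = rng.choice(candidates) if candidates else rng.randrange(n, len(units))

    joiner = "" if arg > 0 else " "
    return joiner.join(out[:length])
```

So `dissociate(source, arg=3)` wants three characters of overlap at every jump, and `dissociate(source, arg=-1)` wants one word, where `source` is whatever text you’ve loaded. Note that `length` counts characters in the first case and words in the second.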

You can also modify the program (search for “dissociate.el”), if, you know, you’re down with Emacs Lisp, the Lisp variant that Emacs uses:

I ran across Dissociated Press in a Wikipedia page generated from an export of the Jargon File. Allegedly, Dissociated Press dates from 1972, when some MIT AI Lab people developed it, apparently without realizing that A.A. Markov did n-gram modeling around 1913 and Shannon did n-gram generation around 1951. (I think the MIT AI people back then were busy convincing themselves that they should use logical/symbolic approaches rather than neural approaches, which is why they weren’t really thinking of statistical approaches much.)

So there you go. You went all around the world looking for an n-gram generator, but it was there in your back yard all along. You get Emacs from gnu.org, ese! Is it straightforward to install on Windows? Well…
