NLP For Poets
Pygmal
Pygmal is a Perl script I wrote for Dragomir Radev's Natural Language Processing class.
Pygmal is a program that generates variations on a sentence and ranks them according to their poetic meter. To do this, it uses the Charniak parser, the CMU Pronouncing Dictionary, and WordNet.
pygmal-0.0.2 is available under the Perl License.
pygmal is invoked as follows:
pygmal -m [-h] file [metric] [-sc synonym count] [-sv synonym variation] [-cc compression count] [-cv compression variation]
where
- file is the name of the file to process,
- metric is the meter to which pygmal will shape the file (e.g., "lss", the default, is a pattern of long-short-short),
- -sc synonym count is the number of sentences generated by replacing words with synonyms from WordNet (default 2),
- -sv synonym variation is the proportion of word senses to use from WordNet (default 0.2),
- -cc compression count is the number of sentences generated by deleting syntactic components (default 2),
- -cv compression variation is the probability of deleting any component with an appropriate tag (default 0.2); the tags currently deleted are subordinate clauses, adjectival phrases, adverbial phrases, prepositional phrases, adjectives, and adverbs,
- and -h tells pygmal to show you what it's doing.
Pygmal returns a set of sentences (variations on the original), each with a score from 0 to 1 that measures how well the sentence fits the meter.
Pygmal computes this score by aligning the metric pattern against the stress pattern of the line at each possible starting position and keeping the position with the greatest proportion of matching syllables. For example, 0.5 means that, at the best starting position of the metric, half the syllables had the stress they should have. (There is one starting position for each syllable in the given pattern.)
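The scoring described above can be sketched in Python. This is a minimal illustration and not Pygmal's actual Perl code: the function names meter_fit and stress_string are hypothetical, the pattern is assumed to repeat cyclically along the line, and only primary stress (the digit 1 on a vowel, in the style of the CMU Pronouncing Dictionary) is assumed to count as long.

```python
def stress_string(phones):
    """Turn a phoneme list with stress digits on the vowels into a string
    of 'l' (long) and 's' (short), one letter per syllable.

    Assumption: only primary stress (digit 1) counts as long."""
    out = []
    for phone in phones:
        if phone[-1].isdigit():  # only vowels carry a stress digit
            out.append("l" if phone[-1] == "1" else "s")
    return "".join(out)


def meter_fit(line_stress, pattern="lss"):
    """Best fraction of syllables that match the repeating pattern,
    trying one starting position per syllable of the pattern."""
    if not line_stress or not pattern:
        return 0.0
    best = 0.0
    for offset in range(len(pattern)):
        hits = sum(
            1
            for i, stress in enumerate(line_stress)
            if stress == pattern[(i + offset) % len(pattern)]
        )
        best = max(best, hits / len(line_stress))
    return best


if __name__ == "__main__":
    # A line of four long syllables against a long-short pattern:
    # every starting position matches exactly half the syllables.
    print(meter_fit("llll", "ls"))  # 0.5
```

A line whose stresses follow the pattern from any starting position, such as "sslssl" against "lss", scores 1.0.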
Pygmal first parses the file with the Charniak parser. It then queries WordNet for synonyms of the nouns, verbs, adjectives, and adverbs, taking the first N senses (WordNet orders senses by frequency). Next it creates further variations by removing some of the syntactic components. Finally it strips the parse annotations from the sentences and ranks them according to how well they fit the specified meter.
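The compression step can be illustrated with a short Python sketch. It is not Pygmal's implementation. It assumes the parse is available as nested tuples of the form (label, child, ...) with plain strings as words, and that the deletable components correspond to the Penn Treebank labels SBAR, ADJP, ADVP, PP, JJ, and RB. The names compress and words are hypothetical.

```python
import random

# Assumed Penn Treebank labels for the deletable components:
# subordinate clause, adjectival phrase, adverbial phrase,
# prepositional phrase, adjective, adverb.
DELETABLE = {"SBAR", "ADJP", "ADVP", "PP", "JJ", "RB"}


def compress(tree, cv=0.2, rng=random):
    """Return a copy of the tree in which each deletable subtree
    has been dropped with probability cv."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    kept = []
    for child in children:
        deletable = not isinstance(child, str) and child[0] in DELETABLE
        if deletable and rng.random() < cv:
            continue  # drop this component and everything under it
        kept.append(compress(child, cv, rng))
    return (label, *kept)


def words(tree):
    """Strip the parse annotations, leaving the words in order."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree[1:]:
        out.extend(words(child))
    return out


if __name__ == "__main__":
    tree = ("S",
            ("NP", ("DT", "the"), ("JJ", "old"), ("NN", "poet")),
            ("VP", ("VBD", "sang"), ("ADVP", ("RB", "softly"))))
    # Two random variations, as with a compression count of 2.
    for _ in range(2):
        print(" ".join(words(compress(tree, cv=0.2))))
```

With cv set to 0 the sentence comes back unchanged, and with cv set to 1 every deletable component is removed, which for the example tree leaves "the poet sang".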
BUGS:
- The semantic content of the sentences pygmal generates leaves much to be desired.
- Words sometimes get removed that shouldn't, e.g., in "Many of the X who ...", 'many' gets tagged as an adjective.
HISTORY:
(12/22/2004) 0.0.2
- Rewrote the synonym subroutine to use WordNet instead of Lexical Freenet. It now checks that each replacement has the correct part of speech, and it gives better synonyms. This also speeds up the program, since WordNet is stored locally.
- Redid the -sv argument so it now reflects quality instead of quantity.
- Pygmal now parses first, then looks for synonyms. This speeds things up, since the input is now parsed only once.
TODO:
- Add options for rhyme, alliteration, and assonance.
- Put the proper morphology on the synonyms so that more tags can be replaced with synonyms.
- Build things from the ground up, rather than choosing from random selections; make individual choices more intelligently (for example, check the meter of each possible synonym individually?).
- Test and improve the measurement of meter.
OLD VERSIONS:
pygmal-0.0.1