print(dialogue)’s technical details

There are three aspects of print(dialogue): the computer program I wrote, the script the program produces, and the production itself. I wrote and theorized the computer program during a series of independent studies with NYU professor Allison Parrish.

I worked on my personal MacBook Pro (Retina, 13-inch, Mid 2014) with a 2.8 GHz Intel Core i5 processor, 8 GB of 1600 MHz DDR3 memory, and Intel Iris graphics (1536 MB). I wrote the program in Python. To organize my programming files and libraries, I used Jupyter Notebook, launched through Anaconda Navigator.

The Python program relies primarily on a freely available semantic network called ConceptNet. ConceptNet is “designed to help computers understand the meanings of words that people use.” It grew out of the Open Mind Common Sense project, which began at the MIT Media Lab in 1999, and has since grown to include knowledge from other projects.

ConceptNet is structured as a graph of nodes connected by labeled relations. In this system, words are defined not by dictionary definitions but by their associations with other words and phrases. ConceptNet is considered primarily hand-authored: it is a database built from human contributions, so its associations are not what would be considered empirical data.

In their article, “Representing General Relational Knowledge in ConceptNet 5,” Robyn Speer and Catherine Havasi explain that “ConceptNet is a knowledge representation project, providing a large semantic graph that describes general human knowledge and how it is expressed in natural language.” ConceptNet’s “knowledge about ‘jazz’ includes not just the properties that define it, such as IsA(jazz, genre of music); it also includes incidental facts such as AtLocation(jazz, new orleans), UsedFor(saxophone, jazz), and plays percussion in(jazz drummer, jazz).” A diagram in Speer and Havasi’s article visualizes this knowledge representation.
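To make the idea of defining a word by its associations concrete, the assertions quoted from Speer and Havasi can be written as (relation, start, end) triples and indexed by concept. This is only an illustrative sketch: the names ASSERTIONS and build_index are hypothetical, and ConceptNet's real storage format is richer than a Python list.

```python
from collections import defaultdict

# Each assertion is a (relation, start, end) triple, taken from the
# examples Speer and Havasi give for "jazz".
ASSERTIONS = [
    ("IsA", "jazz", "genre of music"),
    ("AtLocation", "jazz", "new orleans"),
    ("UsedFor", "saxophone", "jazz"),
]


def build_index(assertions):
    """Index every assertion under each concept it mentions."""
    index = defaultdict(list)
    for relation, start, end in assertions:
        index[start].append((relation, start, end))
        index[end].append((relation, start, end))
    return index


index = build_index(ASSERTIONS)

# "jazz" has no definition here, only the assertions that touch it.
for relation, start, end in index["jazz"]:
    print(f"{relation}({start}, {end})")
```

Looking up a concept returns every relation it takes part in, which is the sense in which a node is described by its neighbors rather than by a definition.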

The program I created navigates ConceptNet to create ‘sensible’ dialogue, or dialogue that makes ‘sense.’ It does this by querying ConceptNet’s API so that, from a given starting point, it builds the longest possible list of connected concepts. One example of a generated list: “buy a hamburger, go to the store, buy food, cook a meal, follow the recipe, cooking dinner, follow a recipe, watch a tv, sitting down, you watch TV, relaxing, staying in bed, copulating, procreation, meeting girls, flirting, having fun, playing chess, recreation,” etc. The program traverses ConceptNet to simulate dialogue in which each response is an association from the previous line. My reason for using ConceptNet to write dialogue lies in the fact that it goes beyond describing words and phrases through their lexical definitions: it describes the relationships between them.
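One possible reading of “the longest list possible” is a search for the longest path that never revisits a concept. The sketch below shows that idea on a small in-memory graph instead of the live ConceptNet API. TOY_GRAPH and longest_chain are hypothetical names, the edges are invented for illustration (most phrases are borrowed from the generated list above), and this is not necessarily how my program performs the traversal.

```python
def longest_chain(graph, start):
    """Return the longest path from `start` that never repeats a concept.

    `graph` maps each concept to a list of related concepts. The search is
    exhaustive, so it is only practical on small graphs.
    """
    best = [start]
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if len(path) > len(best):
            best = path
        for neighbor in graph.get(node, []):
            if neighbor not in path:  # skip concepts already in the chain
                stack.append((neighbor, path + [neighbor]))
    return best


# Invented edges standing in for the results of ConceptNet queries.
TOY_GRAPH = {
    "buy a hamburger": ["go to the store", "eat"],
    "go to the store": ["buy food"],
    "buy food": ["cook a meal", "buy a hamburger"],
    "cook a meal": ["follow the recipe"],
    "eat": [],
}

chain = longest_chain(TOY_GRAPH, "buy a hamburger")
print(", ".join(chain))
```

In a version that talks to ConceptNet, the dictionary lookup would be replaced by a request for the concepts related to the current node, and a depth or request limit would be needed because the real graph is far too large to search exhaustively.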

After my program builds this long chain of related concepts from ConceptNet and stores it in a list, I tag the first word of each concept with its part of speech. Tagging a part of speech means that a statistical model makes “predictions that generalize across the language – for example, a word following “the” in English is most likely a noun.” In my program, if the phrase is ‘ran through the park,’ then ‘ran’ is tagged as a past-tense verb. To tag parts of speech, I used spaCy, an open-source Python library for natural language processing.
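The tagging step might look roughly like the sketch below. The helper names tag_first_word and load_spacy_pipeline are hypothetical, and the model name en_core_web_sm is an assumption; the text above does not say which spaCy model was used. The spaCy import is kept inside the loader so the tagging helper can be read and tried without spaCy installed.

```python
def tag_first_word(phrase, nlp):
    """Return (first word, fine-grained part-of-speech tag) for a phrase.

    `nlp` is any callable that turns text into a sequence of tokens with
    `.text` and `.tag_` attributes, such as a loaded spaCy pipeline.
    """
    doc = nlp(phrase)
    first = doc[0]
    return first.text, first.tag_


def load_spacy_pipeline(model_name="en_core_web_sm"):
    """Load a spaCy pipeline. The default model name is an assumption."""
    import spacy  # imported lazily; requires spaCy and the model to be installed

    return spacy.load(model_name)
```

With spaCy and an English model installed, tag_first_word("ran through the park", load_spacy_pipeline()) would be expected to tag ‘ran’ as VBD, the Penn Treebank label for a past-tense verb, although a statistical tagger can mislabel short fragments that lack sentence context.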

The next step in the program is to build a sentence around each concept or phrase found in ConceptNet. Now that the phrases are tagged, they can be placed into Tracery grammars. Tracery, by Kate Compton, is “a super-simple tool and language to generate text.” It allows me to create branching sentence structures for each concept. In a sense the sentence structures are random, but they also follow a logic. For example, if the ConceptNet phrase is ‘climbing a mountain,’ then I must decide what kind of sentence can contain a phrase that begins with an -ing verb. The sentence could start with ‘Why,’ ‘When,’ ‘Where,’ or ‘How.’ The next choice is who is climbing the mountain, ‘you’ or ‘I.’ This way the sentence could be constructed as ‘Why are you climbing a mountain?’ or ‘Where am I climbing a mountain?’ and so on. A real worked example follows; the ConceptNet phrases in it are ‘buy a hamburger’ and ‘go to the store’:

Question: If you want to buy a hamburger, what's stopping you? 

Answer: I need to go to the store first.
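The branching described above can be sketched with a tiny stand-in for Tracery. This is not the Tracery library itself, only a few lines that mimic its #symbol# substitution so the logic is visible. The function names expand and rules_for are hypothetical, and so is the mapping from part-of-speech tags to templates: the VBG template follows the ‘climbing a mountain’ example, and pairing the VB tag with the hamburger question is an assumption.

```python
import random
import re

SYMBOL = re.compile(r"#(\w+)#")


def expand(rules, text, rng):
    """Recursively replace each #symbol# with a randomly chosen expansion."""

    def substitute(match):
        choice = rng.choice(rules[match.group(1)])
        return expand(rules, choice, rng)

    return SYMBOL.sub(substitute, text)


def rules_for(phrase, tag):
    """Build a grammar whose templates depend on the first word's tag."""
    templates = {
        # -ing verb, e.g. "climbing a mountain"
        "VBG": ["#wh# #subject# #phrase#?"],
        # base-form verb, e.g. "buy a hamburger" (assumed pairing)
        "VB": ["If you want to #phrase#, what's stopping you?"],
    }
    return {
        "origin": templates[tag],
        "wh": ["Why", "When", "Where", "How"],
        "subject": ["are you", "am I"],
        "phrase": [phrase],
    }


rng = random.Random(0)  # seeded so a run can be repeated
print(expand(rules_for("climbing a mountain", "VBG"), "#origin#", rng))
print(expand(rules_for("buy a hamburger", "VB"), "#origin#", rng))
```

Each rule lists its alternatives, so adding a new sentence shape means adding one more template string rather than writing new code.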

After the dialogue is generated, the program creates a printable output for the actors to read.  
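A minimal sketch of that last step, assuming the script is laid out as alternating Question and Answer lines like the worked example above. The function name format_script and the exact layout are hypothetical, not a description of the actual output format.

```python
def format_script(exchanges):
    """Turn (question, answer) pairs into plain text for printing."""
    lines = []
    for question, answer in exchanges:
        lines.append(f"Question: {question}")
        lines.append("")
        lines.append(f"Answer: {answer}")
        lines.append("")
    # Drop the trailing blank line and end with a single newline.
    return "\n".join(lines).rstrip() + "\n"


script = format_script(
    [
        (
            "If you want to buy a hamburger, what's stopping you?",
            "I need to go to the store first.",
        )
    ]
)
print(script)
```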

Bibliography:

Compton K., Kybartas B., Mateas M. (2015) Tracery: An Author-Focused Generative Text Tool. In: Schoenau-Fog H., Bruni L., Louchart S., Baceviciute S. (eds) Interactive Storytelling. ICIDS 2015. Lecture Notes in Computer Science, vol 9445. Springer, Cham

Honnibal, Matthew, and Ines Montani. “SpaCy 2: Natural Language Understanding with Bloom Embeddings, Convolutional Neural Networks and Incremental Parsing.” To Appear, 2017.

Speer, Robyn, Joshua Chin, and Catherine Havasi. “ConceptNet 5.5: An Open Multilingual Graph of General Knowledge.” In Proceedings of AAAI 31, 2017.