Natalie Schrimpf gives Homecoming Lecture at Dartmouth
Ph.D. candidate Natalie Schrimpf gave this year’s Homecoming Lecture for the Program in Linguistics at her undergraduate alma mater, Dartmouth College. The Lecture is given annually during the College’s Homecoming celebrations by an alumnus or alumna who majored in linguistics and went on to pursue a career in the field.
For the Lecture, Natalie introduced her research on automatic text summarization, which she is conducting at Yale. Automatic text summarization is a natural language processing (NLP) task whose goal is to develop computer algorithms that can read documents and produce summaries of their content. In addition to being an interesting problem in machine learning, automatic text summarization has many important software applications. For example, search engines often use automatic text summarization to provide information that might help users decide which results are the most relevant.
A wide variety of techniques have been applied to automatic text summarization. One very simple technique, for example, is to use the first and last sentences of a document as its summary. More advanced techniques build sophisticated statistical models to determine which parts of a document contribute most to its content.
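To make the simple baseline concrete, here is a minimal Python sketch of a first-and-last-sentence summarizer. The function names and the naive punctuation-based sentence splitter are assumptions made for illustration; they are not taken from the talk.

```python
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def first_last_summary(text):
    """Use the first and last sentences of a document as its summary."""
    sentences = split_sentences(text)
    if len(sentences) <= 2:
        # Nothing to drop: the document is already as short as the summary.
        return " ".join(sentences)
    return sentences[0] + " " + sentences[-1]


if __name__ == "__main__":
    print(first_last_summary("A cat sat. It slept. Then it left."))
```

A baseline like this ignores everything in the middle of the document, which is the kind of gap that more advanced techniques try to close.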
Natalie’s talk investigated whether the structure of a document can be exploited to produce summaries that cover its content more comprehensively. The technique she proposes uses rhetorical information to divide the document into several portions, each representing a different topic. Using standard methods, it then summarizes each portion and combines the results into a summary of the entire document. Because the combined summary includes a summary of every topic, it is unlikely to leave out any topic the document discusses. After testing her algorithm, Natalie found that summaries produced this way were of higher quality overall than those produced by summarizing a document as a whole without dividing it into topics.
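The divide-then-combine idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Natalie's implementation: the topic segments are assumed to be supplied already (the rhetorical segmentation step is not shown), and a simple word-frequency scorer stands in for the "standard methods" used to summarize each portion. All function names are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def top_sentence(segment):
    """Stand-in summarizer: return the sentence with the highest average word frequency."""
    sentences = split_sentences(segment)
    if not sentences:
        return ""
    freq = Counter(re.findall(r"[a-z']+", segment.lower()))

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    return max(sentences, key=score)


def summarize_by_topic(segments):
    """Summarize each topic segment separately, then join the per-topic summaries."""
    picked = (top_sentence(segment) for segment in segments)
    return " ".join(sentence for sentence in picked if sentence)


if __name__ == "__main__":
    # The segments are given by hand here; a real system would derive them from the document.
    segments = ["Cats purr. Cats purr when cats are happy.", "Dogs bark."]
    print(summarize_by_topic(segments))
```

Because each segment contributes at least one sentence, every topic is represented in the combined output, which mirrors the coverage argument made above.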
The Homecoming Lecture was given on October 4 at Dartmouth College in Hanover, New Hampshire. The abstract for the talk is available on the Dartmouth College website.