Up for Debate

IBM’s Watson question-answering system stunned the world in 2011 when it bested human champions of the TV trivia game show Jeopardy! Although the Watson brand has since fallen on hard times, the company’s language-processing prowess continues to advance.

What’s new: Noam Slonim led a team at IBM to develop Project Debater, which is designed to compete in formal debates.

Key insight: A debate opens with a four-minute opening statement from each side, followed by rounds of rebuttals and, finally, closing statements. To perform well, a debater must quickly prepare arguments supported by evidence, address competing arguments, and organize statements logically: a set of tasks too diverse for a single end-to-end system. Instead, the team built a pipeline of independent components, each a complex system in its own right.

How it works: Project Debater receives a motion to argue for or against. Then it’s off to the races, finding facts, arguments, and counterarguments and stitching them together into speeches.
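
To make that flow concrete, here’s a minimal sketch of a pipeline of independent components feeding a speech assembler. The component names, interfaces, and stub logic are illustrative assumptions, not Project Debater’s actual code.

```python
# Hypothetical sketch of a debate pipeline built from independent components.
# Names and stub implementations are illustrative, not IBM's actual system.
from dataclasses import dataclass


@dataclass
class Argument:
    claim: str
    evidence: list[str]
    stance: str  # "pro" or "con" relative to the motion


def mine_arguments(motion: str, corpus: list[str]) -> list[Argument]:
    """Retrieve candidate claims and supporting evidence for the motion (stub)."""
    return [Argument(claim=sent, evidence=[sent], stance="pro") for sent in corpus]


def detect_counterarguments(arguments: list[Argument]) -> list[Argument]:
    """Anticipate likely opposing claims so rebuttals can be prepared (stub)."""
    return [Argument(claim=f"Critics counter that: {a.claim}", evidence=[], stance="con")
            for a in arguments]


def assemble_speech(motion: str, pro: list[Argument], con: list[Argument]) -> str:
    """Order claims and rebuttals into a coherent opening statement (stub)."""
    lines = [f"We support the motion: {motion}."]
    lines += [f"First, {a.claim}" for a in pro]
    lines += [f"Opponents may say: {a.claim} We disagree." for a in con]
    return " ".join(lines)


def debate_opening(motion: str, corpus: list[str]) -> str:
    """Run the full pipeline: mine arguments, anticipate rebuttals, compose a speech."""
    pro = mine_arguments(motion, corpus)
    con = detect_counterarguments(pro)
    return assemble_speech(motion, pro, con)


if __name__ == "__main__":
    corpus = ["Preschool improves later academic outcomes.",
              "Preschool programs are costly to fund."]
    print(debate_opening("We should subsidize preschool", corpus))
```

Because each stage is a separate component, each can be developed, evaluated, and improved independently, which is the design choice the key insight describes.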

Results: Project Debater is the first system of its kind, and no established benchmark exists to evaluate it. The researchers compared the quality (judged by humans on a scale of one to five) of the system’s opening statement with that of a speech on the same topic generated by GPT-2, pretrained on a large text corpus and fine-tuned on speeches. Project Debater achieved an average score of 4.1, far outperforming the fine-tuned GPT-2’s score of 3.2.
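
As a rough illustration of that evaluation protocol, the snippet below averages per-judge quality ratings on a one-to-five scale. The individual ratings are placeholders, not the study’s data; only the scale and the reported averages come from the article.

```python
# Illustrative scoring helper; the per-judge ratings are hypothetical placeholders.
from statistics import mean


def average_quality(ratings: list[int]) -> float:
    """Average human quality ratings given on a 1-to-5 scale."""
    assert all(1 <= r <= 5 for r in ratings), "ratings must lie in [1, 5]"
    return round(mean(ratings), 1)


debater_ratings = [4, 5, 4, 4]  # hypothetical judge scores
gpt2_ratings = [3, 3, 4, 3]     # hypothetical judge scores
print(average_quality(debater_ratings), average_quality(gpt2_ratings))
```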

Yes, but: Project Debater lost a 2019 debate against champion debater Harish Natarajan, albeit narrowly.

Why it matters: Building a system that can beat humans at competitive debate isn’t a multi-decade, multi-team project like winning at chess or Go, but it’s a substantial endeavor. So far, Project Debater has generated over 50 papers and spawned the subfields of claim detection and evidence detection.

We’re thinking: The AI community is embroiled in its own debates, including an annual event in Montreal. Maybe this system can participate next time around?