Up for Debate

IBM’s Watson question-answering system stunned the world in 2011 when it bested human champions of the TV trivia game show Jeopardy! Although the Watson brand has since fallen on hard times, the company’s language-processing prowess continues to advance.

What’s new: Noam Slonim led a team at IBM to develop Project Debater, which is designed to compete in formal debates.

Key insight: A debate opens with a four-minute opening statement from each side, followed by rounds of rebuttals and, finally, closing statements. To perform well, a debater must quickly prepare arguments supported by evidence, address competing arguments, and organize statements logically: a set of tasks too diverse for a single end-to-end system. Instead, the team built a pipeline of independent components, each a complex system in its own right.

How it works: Project Debater receives a motion to argue for or against. Then it’s off to the races, finding facts, arguments, and counterarguments and stitching them together into speeches.
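
To make that flow concrete, here’s a minimal sketch of a pipeline of independent components feeding a speech assembler. The component names, interfaces, and stub logic are illustrative assumptions, not Project Debater’s actual code.

```python
# Hypothetical sketch of a debate pipeline built from independent components.
# Names and stub implementations are illustrative, not IBM's actual system.
from dataclasses import dataclass


@dataclass
class Argument:
    claim: str
    evidence: list[str]
    stance: str  # "pro" or "con" relative to the motion


def mine_arguments(motion: str, corpus: list[str]) -> list[Argument]:
    """Retrieve candidate claims and supporting evidence for the motion (stub)."""
    return [Argument(claim=sent, evidence=[sent], stance="pro") for sent in corpus]


def detect_counterarguments(arguments: list[Argument]) -> list[Argument]:
    """Anticipate likely opposing claims so rebuttals can be prepared (stub)."""
    return [Argument(claim=f"Critics counter that: {a.claim}", evidence=[], stance="con")
            for a in arguments]


def assemble_speech(motion: str, pro: list[Argument], con: list[Argument]) -> str:
    """Order claims and rebuttals into a coherent opening statement (stub)."""
    lines = [f"We support the motion: {motion}."]
    lines += [f"First, {a.claim}" for a in pro]
    lines += [f"Opponents may say: {a.claim} We disagree." for a in con]
    return " ".join(lines)


def debate_opening(motion: str, corpus: list[str]) -> str:
    """Run the full pipeline: mine arguments, anticipate rebuttals, compose a speech."""
    pro = mine_arguments(motion, corpus)
    con = detect_counterarguments(pro)
    return assemble_speech(motion, pro, con)


if __name__ == "__main__":
    corpus = ["Preschool improves later academic outcomes.",
              "Preschool programs are costly to fund."]
    print(debate_opening("We should subsidize preschool", corpus))
```

Because each stage is a separate component, each can be developed, evaluated, and improved independently, which is the design choice the key insight describes.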

Results: Project Debater is the first system of its kind, and no established benchmark exists to evaluate it. The researchers compared the quality (judged by humans on a scale of one to five) of the system’s opening statement with that of a speech on the same topic generated by GPT-2, pretrained on a large text corpus and fine-tuned on speeches. Project Debater achieved an average score of 4.1, far outperforming the fine-tuned GPT-2’s score of 3.2.
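
As a rough illustration of that evaluation protocol, the snippet below averages per-judge quality ratings on a one-to-five scale. The individual ratings are placeholders, not the study’s data; only the scale and the reported averages come from the article.

```python
# Illustrative scoring helper; the per-judge ratings are hypothetical placeholders.
from statistics import mean


def average_quality(ratings: list[int]) -> float:
    """Average human quality ratings given on a 1-to-5 scale."""
    assert all(1 <= r <= 5 for r in ratings), "ratings must lie in [1, 5]"
    return round(mean(ratings), 1)


debater_ratings = [4, 5, 4, 4]  # hypothetical judge scores
gpt2_ratings = [3, 3, 4, 3]     # hypothetical judge scores
print(average_quality(debater_ratings), average_quality(gpt2_ratings))
```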

Yes, but: Project Debater lost a 2019 debate against champion debater Harish Natarajan, albeit narrowly.

Why it matters: Building a system that can beat humans at competitive debate isn’t a multi-decade, multi-team project like winning at chess or Go, but it’s a substantial endeavor. So far, Project Debater has generated over 50 papers and spawned the subfields of claim detection and evidence detection.

We’re thinking: The AI community is embroiled in its own debates, including an annual event in Montreal. Maybe this system can participate next time around?