Augmenting Lateral Thinking in Language Models with Humor and Riddle Data for the BRAINTEASER Task
#Language Models #Lateral Thinking #BRAINTEASER Task #SemEval 2024 #DeBERTaV3 #Humor and Riddle Data #Multiple-Choice Architecture
📌 Key Takeaways
- Researchers augmented language models with humor and riddle data for improved lateral thinking
- The system achieved 92.5% accuracy on Sentence Puzzle and 80.2% on Word Puzzle subtasks
- Multiple-choice formulation improved accuracy by 10 points compared to sequence classification
- Data augmentation proved more effective for sentence-level than word-level reasoning
📖 Full Retelling
Researchers Mina Ghashami and Soumya Smruti Mishra have developed an approach to enhance language models' lateral thinking by incorporating humor and riddle data for the SemEval 2024 BRAINTEASER task. Their paper, submitted on May 16, 2024 and last revised on February 23, 2026, addresses creative, non-linear reasoning, a capability that remains underexplored in natural language processing.

The BRAINTEASER task, part of the SemEval 2024 evaluation workshop, challenges language models to perform lateral thinking: a cognitive process that defies conventional associations and requires creative problem-solving. It consists of two subtasks, Sentence Puzzle and Word Puzzle, both designed to push beyond typical commonsense reasoning.

The researchers fine-tuned the DeBERTaV3 model using HuggingFace's AutoModelForMultipleChoice architecture, augmenting the provided training data with two additional sources: a humor-style question-answering dataset generated via GPT-4 prompting, and the RiddleSense dataset. This augmentation strategy was motivated by the observation that humor and riddles share the same lateral reasoning structure the BRAINTEASER task requires.

The system achieved 92.5% overall accuracy on the Sentence Puzzle subtask and 80.2% on the Word Puzzle subtask, ranking 6th out of 31 teams and 10th out of 23 teams, respectively. Notably, the researchers found that framing the problem as multiple-choice rather than sequence classification yielded a 10-point accuracy improvement with the same base model. Further analysis revealed that while data augmentation with humor and riddle data proved particularly effective for sentence-level lateral reasoning, word-level puzzles remain a harder challenge, suggesting directions for future research.
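The multiple-choice formulation that the authors credit with the 10-point gain can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the `score_pair` stub (a crude word-overlap heuristic) stands in for DeBERTaV3's per-choice logit. In the real system, each (question, choice) pair is tokenized together and passed through AutoModelForMultipleChoice, which applies a shared scoring head and normalizes across the choices, so the candidates compete with each other rather than being classified in isolation.

```python
# Sketch of the multiple-choice formulation.
# score_pair is a hypothetical stand-in for the model's logit on a
# (question, choice) pair; the real system uses fine-tuned DeBERTaV3.
from math import exp

def score_pair(question: str, choice: str) -> float:
    # Crude word-overlap score, purely for illustration.
    q_words = set(question.lower().split())
    c_words = set(choice.lower().split())
    return float(len(q_words & c_words))

def predict(question: str, choices: list[str]) -> int:
    # Score every choice jointly with the question, then softmax
    # across choices. This cross-choice competition is what a
    # sequence-classification framing (scoring each pair in
    # isolation) lacks.
    logits = [score_pair(question, c) for c in choices]
    z = sum(exp(l) for l in logits)
    probs = [exp(l) / z for l in logits]
    return max(range(len(choices)), key=lambda i: probs[i])

question = "What has keys but cannot open a door?"
choices = ["a piano with keys", "a locked door", "a window", "a shoe"]
print(predict(question, choices))  # → 0 (index of the best-scoring choice)
```

With a trained model in place of the stub, training minimizes cross-entropy over the per-question softmax, which directly optimizes for picking the right option among distractors.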
🏷️ Themes
Artificial Intelligence, Natural Language Processing, Cognitive Computing
Original Source
Computer Science > Computation and Language — arXiv:2405.10385 [Submitted on 16 May 2024 (v1), last revised 23 Feb 2026 (this version, v3)]
Title: Augmenting Lateral Thinking in Language Models with Humor and Riddle Data for the BRAINTEASER Task
Authors: Mina Ghashami, Soumya Smruti Mishra
Abstract: The SemEval 2024 BRAINTEASER task challenges language models to perform lateral thinking -- a form of creative, non-linear reasoning that remains underexplored in NLP. The task comprises two subtasks, Sentence Puzzle and Word Puzzle, requiring models to defy conventional commonsense associations. We present a system that fine-tunes DeBERTaV3 using HuggingFace's AutoModelForMultipleChoice architecture. We augment the provided training data with two additional sources: (1) a humor-style question-answering dataset generated via GPT-4 prompting, and (2) the RiddleSense dataset. This data augmentation strategy is motivated by the observation that humor and riddles share the lateral reasoning structure required by the task. Our best system achieves 92.5% overall accuracy on the Sentence Puzzle subtask and 80.2% on the Word Puzzle subtask, ranking 6th out of 31 teams and 10th out of 23 teams, respectively. We further show that the choice of task formulation matters: framing the problem as multiple-choice rather than sequence classification yields a 10-point accuracy improvement with the same base model. Our analysis reveals that data augmentation with humor and riddle data is particularly effective for sentence-level lateral reasoning, while word-level puzzles remain a harder challenge.
Comments: Accepted at SemEval 2024 (colocated with NAACL 2024)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as: arX...