How Neural Networks Detect and Interpret Wordplay: New Insights from HSE Researchers

An international team including researchers from the HSE Faculty of Computer Science has presented KoWit-24, an annotated dataset of 2,700 Russian-language Kommersant news headlines containing wordplay. The dataset makes it possible to assess how artificial intelligence detects and interprets wordplay. Experiments with five large language models show that even advanced systems still make mistakes, and that interpreting wordplay is more challenging for them than detecting it. The results were presented at the RANLP conference; the paper is available on arXiv.org, and the dataset and the code for reproducing the experiments are available on GitHub.
Wordplay refers to deliberate use of language that violates linguistic norms in order to attract attention, entertain, or amuse the reader. It is common in Russian news headlines and can take various forms. For example, the headline ‘Osobo bumazhnye persony’ plays on the phrase ‘Osobo vazhnye persony’ (Russian for ‘very important persons’). The word vazhnye (‘important’) is replaced with bumazhnye (‘paper-related’), which rhymes with the original and shifts the meaning toward the topic of paper production. Another example is ‘Kod naklikal,’ the headline of an article about open-source code. It closely resembles ‘kot naplakal,’ an idiom meaning ‘very little,’ thereby creating a humorous ambiguity.
For human readers, such wordplay in headlines is immediately apparent and requires no explanation. However, large language models such as ChatGPT or GigaChat Max are often at a loss, struggling to detect the wordplay and, even more so, to explain the joke. One reason is that the humour datasets on which LLMs are trained are limited. In most cases, humour in these datasets is represented by canned internet jokes explicitly labelled as ‘jokes,’ which is insufficient for the models to learn why something is funny. In addition, such datasets contain almost no annotation: there are no machine- or human-readable layers of description indicating whether wordplay is present, what type of technique is used, what the headline refers to, and so on.
Researchers from the HSE Faculty of Computer Science, in collaboration with colleagues from IT:U—Interdisciplinary Transformation University Austria—and independent researchers, have created KoWit-24, a dataset dedicated to wordplay. It comprises 2,700 headlines from the Russian business daily Kommersant published between January 2021 and December 2023, along with contextual information: each headline is accompanied by a short description of the news story (the lead) and a summary. For each instance of wordplay, the authors manually annotated the type of technique, identified the anchors—the words that trigger the wordplay—and, where possible, linked the original expressions to relevant Wikipedia articles.
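To make the annotation layers concrete, one record could be modelled roughly as follows. This is only an illustrative sketch: the class names, field names, and the technique label are assumptions made for this example and do not reproduce the actual KoWit-24 schema, and the lead and summary strings are placeholders rather than real data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WordplayAnnotation:
    technique: str                       # type of wordplay (hypothetical label)
    anchors: list[str]                   # words that trigger the wordplay
    original: Optional[str] = None       # source expression, if there is one
    wikipedia_url: Optional[str] = None  # link to a relevant article, where possible


@dataclass
class HeadlineRecord:
    headline: str
    lead: str      # short description of the news story
    summary: str
    annotations: list[WordplayAnnotation] = field(default_factory=list)

    @property
    def has_wordplay(self) -> bool:
        # A headline counts as wordplay if at least one instance is annotated.
        return bool(self.annotations)


# Example built from the headline discussed above; lead and summary
# are placeholders, and the technique label is invented for illustration.
record = HeadlineRecord(
    headline="Osobo bumazhnye persony",
    lead="<lead text>",
    summary="<summary text>",
    annotations=[
        WordplayAnnotation(
            technique="modified_phrase",
            anchors=["bumazhnye"],
            original="Osobo vazhnye persony",
        )
    ],
)
```

Keeping the annotations in a list allows a single headline to carry more than one instance of wordplay, which matches the per-instance annotation described above.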
The authors adopted linguist Alan Scott Partington’s definition of wordplay, according to which wordplay occurs when the same expression can be interpreted in at least two ways and this effect is intentional. Wordplay can arise in several ways. One case involves ambiguity inherent in a word or its sound. For example, in the headline ‘Volgu ne mogut zastavit’ tech’ bystree,’ the word Volgu (Volga) refers both to the river and to a federal highway with the same name. Another case involves a slight modification of a well-known phrase or title, in which the author alters the wording while relying on the reader to recognise the original and complete the joke. For instance, ‘Missiya sokratima’ alludes to ‘Missiya nevypolnima,’ the Russian title of the film Mission: Impossible, while the headline itself suggests that a diplomatic mission can be downsized.
The researchers also distinguished ‘nonce words’ (words coined for a single occasion) and oxymorons, which combine two contradictory meanings. This approach allowed them not only to collect and describe examples but also to compare the performance of different language models.
After annotation, the authors used the dataset to evaluate five LLMs: GPT-4o, YandexGPT-4, GigaChat Lite, GigaChat Max, and Mistral NeMo. Each model was provided with a headline and the corresponding news lead and asked to perform two tasks: first, to determine whether the headline contained wordplay, and second, to interpret it by identifying the original phrase or reference. The researchers compared the effects of two types of prompts: a simple prompt asking whether the headline contained wordplay, and an extended prompt providing a definition along with examples of different wordplay types. The extended prompt improved performance on the detection task for three of the five models, while GPT-4o demonstrated the strongest performance in both detection and interpretation. For all models, interpreting the source of the joke proved significantly more difficult than simply detecting the presence of wordplay.
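The detection setup described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the prompt wording is invented for this example and is not the authors' prompt, no real model API is called, and precision, recall, and F1 are assumed as metrics because the article does not say which ones were reported.

```python
def build_prompt(headline: str, lead: str, extended: bool = False) -> str:
    """Build a detection prompt (hypothetical wording, not the authors')."""
    parts = []
    if extended:
        # The extended variant adds a definition and an example.
        parts.append(
            "Wordplay occurs when the same expression can be interpreted "
            "in at least two ways and this effect is intentional."
        )
        parts.append(
            "Example: 'Missiya sokratima' alludes to 'Missiya nevypolnima'."
        )
    parts.append(f"Headline: {headline}")
    parts.append(f"Lead: {lead}")
    parts.append("Does the headline contain wordplay? Answer yes or no.")
    return "\n".join(parts)


def detection_scores(gold: list[bool], predicted: list[bool]) -> dict[str, float]:
    """Compute precision, recall, and F1 for binary wordplay detection."""
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only; these are not results from the paper.
gold = [True, True, False, False]
predicted = [True, False, True, False]
scores = detection_scores(gold, predicted)
```

Interpretation is harder to score automatically, since a model's answer has to be compared with the annotated original phrase or reference rather than with a yes/no label.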
‘KoWit-24 addresses two key limitations of earlier datasets: it provides context for each headline and includes multi-level annotation. This transforms a collection of examples into a full-fledged “testbed” for AI. It now allows for an objective comparison of models—whether a model can detect wordplay, identify the anchor, and correctly recall the original phrase or reference. Such verifiable metrics not only allow for a more accurate evaluation of current systems but also support their intentional improvement through selection of prompts, training examples, and fact-checking strategies. In the future, we plan to investigate whether this dataset can be used to enhance humour generation,’ says Pavel Braslavski, Associate Professor at the HSE Faculty of Computer Science and co-author of the paper.
In addition, the dataset establishes a common and transparent standard for evaluation, as researchers use the same data and experimental scripts. This reduces variability in the results and helps develop models that better understand natural language, rather than merely following the logical structure of the text.


