Can LLMs Generate Humor?

Humor is a culturally constructed feature of human language that is difficult both to comprehend and to create. Like other reasoning processes, humor involves sustained thinking and iteration rather than an instant spark of inspiration. The creativity and associative thinking required to produce humor are challenging even for humans.

The paper by Wang et al. (2024) highlights the obstacles that Large Language Models (LLMs) face in producing humor, given the creativity and associative thinking it requires. Earlier paradigms such as the Creative Leap of Thought (CLoT) are criticized for failing to generate humor reliably, and the Creative Leap of Structured Thought (CLoST) is proposed in response.

CLoST relies on continual reflection and revision across two phases: (a) Automatic Associative Instruction Evolution (AAIE) and (b) Guided Exploratory Self-Improvement Tuning (GESIT), which together strengthen LLMs' judgment mechanisms and their ability to generate humor. Because the framework boosts reasoning, creative thinking, and humor detection, evaluations show that CLoST surpasses alternative models on both Chinese and English humor datasets.

Methodology

The methodology develops a framework for humor generation in Large Language Models (LLMs) that draws on knowledge graphs and causal relationships among three groups: question-related entities, answer-related entities, and confounding entities, the region where the first two overlap.
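The entity view above can be pictured with plain sets: the overlap of the question-related and answer-related entity sets plays the role of the confounding entities. The concrete entity names here are illustrative examples, not drawn from the paper.

```python
# Illustrative sketch of the entity partition described above.
# The entity names are hypothetical; only the set relationship matters.
question_entities = {"cat", "piano", "night"}   # entities tied to the question
answer_entities = {"piano", "keys", "cat"}      # entities tied to the answer

# Confounding entities: the region where the two groups overlap.
confounders = question_entities & answer_entities

print(confounders)  # {'cat', 'piano'}
```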

In the first step, Automatic Associative Instruction Evolution (AAIE) is used to enhance the model's ability to produce creative responses. Three roles, a rewriter, an imaginator, and an analyst, collaborate in this phase to craft increasingly complex instructions by continuously revising them and developing new associations.
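The three-role cycle can be sketched as a simple loop in which each role transforms the current instruction in turn. This is a minimal sketch under assumptions: the paper does not publish this code, and `call_llm` is a hypothetical stand-in for a real model call (here it only tags the text so the flow is visible).

```python
# Hedged sketch of an AAIE-style instruction-evolution loop.
# call_llm is a placeholder for a real LLM call; it just tags the text here.
def call_llm(role: str, text: str) -> str:
    return f"[{role}] {text}"

def evolve_instruction(instruction: str, rounds: int = 2) -> str:
    """One or more rewrite -> imagine -> analyze cycles over an instruction."""
    for _ in range(rounds):
        instruction = call_llm("rewriter", instruction)    # rephrase, add complexity
        instruction = call_llm("imaginator", instruction)  # add creative associations
        instruction = call_llm("analyst", instruction)     # check coherence, refine
    return instruction

evolved = evolve_instruction("Write a pun about cats.", rounds=1)
print(evolved)  # [analyst] [imaginator] [rewriter] Write a pun about cats.
```

In a real system each role would carry its own prompt and the loop would stop when the analyst judges the instruction sufficiently refined.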

Next, Guided Exploratory Self-Improvement Tuning (GESIT) is employed to strengthen the framework's humor-judgment ability. This is achieved by applying Direct Preference Optimization (DPO) to align responses with human preferences under expert supervision, helping the model grasp the causal relationships among entities. Across these steps, the LLM is trained both to produce humorous output and to handle increasingly varied and complex tasks.
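The DPO objective used in this stage can be illustrated numerically. This is a toy sketch: the scalar log-probabilities below are made up, whereas in practice they come from the policy being tuned and a frozen reference model for a preferred ("win") and a rejected ("lose") reply.

```python
# Minimal numeric sketch of the standard DPO loss:
# -log sigmoid(beta * (policy margin - reference margin)).
import math

def dpo_loss(policy_logp_win: float, policy_logp_lose: float,
             ref_logp_win: float, ref_logp_lose: float,
             beta: float = 0.1) -> float:
    margin = (policy_logp_win - ref_logp_win) - (policy_logp_lose - ref_logp_lose)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the preferred (humorous) reply gains probability relative to the
# reference, the loss drops below log(2), its value at zero margin.
low = dpo_loss(-1.0, -3.0, -2.0, -2.0)   # positive margin
high = dpo_loss(-3.0, -1.0, -2.0, -2.0)  # negative margin
```

Minimizing this loss pushes the policy to rank the human-preferred (funnier) reply above the rejected one, which is how GESIT-style tuning would sharpen humor judgment.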

Experiments

The researchers then tested CLoST by building datasets in two languages (English and Chinese) and comparing it against other LLMs. Several humor-related datasets, namely Oogiri-GO and SemEval 2020/2021, together with internal data, were used for training. To assess the models' humor-judgment capabilities, multiple-choice questions were prepared. In both languages, CLoST surpassed models such as Llama 3 and GPT-4o in accuracy, and it showed superior associative generalization when evaluated for creativity. Moreover, ablation studies confirm the contribution of its individual components, demonstrating that CLoST elevates reasoning and creative thinking.
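The multiple-choice evaluation reduces to a simple accuracy score: the model picks one option per question, and accuracy is the fraction of picks that match the gold answers. The predictions and answers below are illustrative, not the paper's data.

```python
# Hedged sketch of the multiple-choice humor-judgment scoring.
def accuracy(predictions, answers):
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

preds = ["B", "A", "C", "A"]  # hypothetical model choices
gold  = ["B", "A", "D", "A"]  # hypothetical gold answers
acc = accuracy(preds, gold)
print(acc)  # 0.75
```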

Conclusion

In this paper, the researchers presented the Creative Leap of Structured Thought (CLoST) framework for improving humorous content generation in large language models (LLMs). CLoST uses Guided Exploratory Self-Improvement Tuning to strengthen judgment and creative thinking, and it outperforms other LLMs in logical reasoning and humor creation. The study shows that CLoST enhances performance on tasks such as humor detection and humorous response generation.

Reference

Wang, H., Zhao, Y., Li, D., Wang, X., Liu, G., Lan, X., & Wang, H. (2024). Innovative Thinking, Infinite Humor: Humor Research of Large Language Models through Structured Thought Leaps. arXiv preprint arXiv:2410.10370.