ICCV 2023 - Paris, France

WECIA: 1st Workshop on Emotionally and Culturally Intelligent AI

"Look Dave, I can see you're really upset about this."

HAL 9000 computer, 2001: A Space Odyssey

A Workshop on Emotionally and Culturally Intelligent AI, hosted by #ICCV2023.

Zoom Live Stream Link


Workshop Program

All invited talks, oral presentations and the panel discussion will take place in Room E06 of the Paris Convention Center.
ICCV virtual conference link.
The workshop will take place on October 3rd, 2023, from 8:30 AM to 1:00 PM (Paris time).


Opening 08:30 AM - 08:40 AM

Abstract. Despite the remarkable performance of NLP systems today, they often ignore the social part of language, e.g., who says it, with what goals, and with what social implications, all of which severely limits the functionality of these applications and the growth of the field. This talk will discuss some of our recent efforts towards socially aware NLP via two studies. The first part looks at how large language models work in the context of social understanding, and how human-AI collaboration can reduce costs and improve the efficiency of social science research. The second part looks at linguistic prejudice with a participatory design approach to develop dialect-inclusive language tools and adaptation techniques for low-resource languages and dialects. I conclude by discussing the challenges and hidden risks of building socially aware NLP systems in the age of LLMs.
Bio. Diyi Yang is an assistant professor in the Computer Science Department at Stanford, affiliated with the Stanford NLP Group, Stanford HCI Group, Stanford AI Lab (SAIL), and Stanford Human-Centered Artificial Intelligence (HAI). She is interested in Computational Social Science, and Natural Language Processing. Yang's research goal is to better understand human communication in social context and build socially aware language technologies to support human-human and human-computer interaction.


Keynote 09:15 AM - 09:50 AM

Abstract. Self-supervised learning is becoming an inevitable machine learning approach to learn rich representations from unlabeled data. However, in the era of large language models, it is unclear why we should pursue a paradigm that purely learns from visual observations without text. In this talk, I would like to discuss a motivation in relation to emotionally and culturally intelligent AI, and navigate through two key methods as effective pretext tasks for visual representation learning.
Bio. Xinlei Chen is a Research Scientist in FAIR labs at Meta AI, Menlo Park, CA. His current research interest is on pre-training, especially pre-training of visual representations with self-supervision and/or multi-modality. Chen was a PhD student educated at the Language Technology Institute, Carnegie Mellon University, while working at the Robotics Institute. He graduated with a bachelor's degree in computer science from Zhejiang University, China.
Abstract. We will discuss the gradual shift in terminology and focus when it comes to factuality: from "fake news" (focus on factuality), to "disinformation" (factuality + malicious intent), and finally to "infodemic" (focus on harm). Subsequently, there has been a gradual shift in research attention from factuality to trying to understand intent. One direction we proposed in this respect was to detect the use of specific propaganda techniques in text, e.g., appeal to emotions, fear, prejudices, logical fallacies, etc. Another direction has been to understand framing, e.g., COVID-19 can be discussed from a health, an economic, a political, or a legal perspective, among others. Yet another direction has been to better understand the type of text, e.g., objective news reporting vs. opinion piece vs. satire. All these have to do with emotions and were featured in our recent SemEval-2023 Task 3, which further promotes multilinguality, covering English, French, Georgian, German, Greek, Italian, Polish, Russian, and Spanish. Yet another important aspect is multimodality, as Internet memes are much more influential than simple text. We will discuss our work on analyzing memes in terms of propaganda (SemEval-2021 Task 6), harmfulness, harm's target identification, role labeling in terms of who is portrayed as a hero/villain/victim (CLEF'2024 shared task), and generating natural text explanations for the latter. Finally, we will discuss how emotions play a crucial role when training and aligning large language models (LLMs). In particular, we will share our experience from implementing guardrails for Jais, the best language model for Arabic, which is also on par with LLaMA 2 for English.
Bio. Preslav Nakov is Professor at Mohamed bin Zayed University of Artificial Intelligence. Previously, he was Principal Scientist at the Qatar Computing Research Institute, HBKU, where he led the Tanbih mega-project, developed in collaboration with MIT, which aims to limit the impact of "fake news", propaganda, and media bias by making users aware of what they are reading, thus promoting media literacy and critical thinking. He received his PhD degree in Computer Science from the University of California at Berkeley, supported by a Fulbright grant. He is Chair-Elect of the Association for Computational Linguistics (ACL), Secretary of ACL SIGSLAV, and Secretary of the Truth and Trust Online board of trustees. Formerly, he was PC chair of ACL 2022 and President of ACL SIGLEX. He is also a member of the editorial board of several journals, including Computational Linguistics, TACL, ACM TOIS, IEEE TASL, IEEE TAC, CS&L, NLE, AI Communications, and Frontiers in AI. He authored a Morgan & Claypool book on Semantic Relations between Nominals, two books on computer algorithms, and 250+ research papers. He received a Best Paper Award at ACM WebSci'2022, a Best Long Paper Award at CIKM'2020, a Best Demo Paper Award (Honorable Mention) at ACL'2020, a Best Task Paper Award (Honorable Mention) at SemEval'2020, a Best Poster Award at SocInfo'2019, and the Young Researcher Award at RANLP'2011. He was also the first to receive the Bulgarian President's John Atanasoff award, named after the inventor of the first automatic electronic digital computer. His research was featured by over 100 news outlets, including Forbes, Boston Globe, Aljazeera, DefenseOne, Business Insider, MIT Technology Review, Science Daily, Popular Science, Fast Company, The Register, WIRED, and Engadget, among others.


Keynote 10:20 AM - 10:50 AM

Abstract. In the first part, to set the stage, we cover irresponsible AI: (1) discrimination (e.g., facial recognition); (2) phrenology (e.g., predicting emotions); (3) bad uses of generative AI (e.g., mental health); and (4) limitations (e.g., human incompetence). These examples do have a personal bias but set the context for the second part, where we address three challenges: (1) principles & governance, (2) regulation, and (3) our cognitive biases. We finish by discussing our responsible AI initiatives and the near future.
Bio. Ricardo Baeza-Yates is Director of Research at the Institute for Experiential AI of Northeastern University. Previously, he was VP of Research at Yahoo Labs, based in Barcelona, Spain, and later in Sunnyvale, California, from 2006 to 2016. He is co-author of the best-selling textbook Modern Information Retrieval, published by Addison-Wesley in 1999 and 2011 (2nd ed.), which won the ASIST 2012 Book of the Year award. From 2002 to 2004 he was elected to the Board of Governors of the IEEE Computer Society, and between 2012 and 2016 he was elected to the ACM Council. In 2009 he was named ACM Fellow, and in 2011 IEEE Fellow, among other awards and distinctions. He obtained a Ph.D. in CS from the University of Waterloo, Canada, in 1989, and his areas of expertise are web search and data mining, information retrieval, bias in AI, data science, and algorithms in general.
Abstract. Guided by semantic space theory, large-scale computational studies have advanced our understanding of the structure and function of emotional behavior. I will integrate findings from large-scale, multicultural experimental studies of facial expression (N=19,656), vocal bursts (N=12,616), speech prosody (N=30,109), multimodal reactions (N=8,056), and more. These studies combine methods from psychology and computer science to yield new insights into what expressive behaviors signal, how they are perceived, and how they interact with language to shape social interaction. Using machine learning to extract cross-cultural dimensions of behavior while minimizing biases due to demographics and context, we arrive at objective measures of the structural dimensions that make up human expression. Expressions are consistently found to be high-dimensional and blended, with their meaning across cultures being efficiently conceptualized in terms of a wide range of specific emotion concepts. In more applied studies, joint embeddings of expressions and language predict real-life outcomes more accurately than language alone. Altogether, these findings support and extend previous findings from the semantic space theory approach, generating a new atlas of expressive behavior. This new taxonomy departs from models such as the basic six and affective circumplex, suggesting a new way forward for training AI models that understand and learn from human expression.
Bio. Alan Cowen is an emotion scientist leading a lab and technology company called Hume AI, and a former researcher at U.C. Berkeley and Google. His research uses computational methods to address how emotional behaviors can be evoked, conceptualized, parameterized, predicted, annotated, and translated, and how they influence our social interactions and bring meaning to our lives. He has investigated responses to millions of emotional stimuli by tens of thousands of participants around the world, analyzed brain representations of emotion using fMRI and intracranial electrophysiology, investigated ancient sculptures, and used deep learning to measure facial expressions in millions of naturalistic videos from around the world.


Panel Discussion 11:50 AM - 12:25 PM


Announcement 12:25 PM - 12:30 PM



Welcome to the Workshop on Emotionally and Culturally Intelligent AI (WECIA)!

Affective computing focuses on incorporating emotional and subjective elements into machine learning models, allowing them to better understand and respond to human emotions. This is an essential aspect of human intelligence, as emotions play a significant role in how we perceive the world around us. As deep learning models have improved with the introduction of large datasets, there has been a growing emphasis on going beyond factual information and including emotional experience. However, there are still significant challenges in developing machine learning models that accurately capture the complexity of human emotions and experiences.

This workshop will provide an opportunity to learn about the latest developments in affective computing, including the role of datasets in improving machine learning models and the milestones in the development of emotionally aware models. We will also explore the ethical and cultural concerns related to affective computing and the need for diversity and inclusiveness in the datasets used to train these models. Through panel discussions and expert presentations, we will delve into the challenges of creating emotionally and culturally aware AI that is also safe. This is a critical issue as we move towards more widespread use of machine learning models in various applications.


Invited Speakers

Ricardo Baeza-Yates

Director of Research, Institute for Experiential AI, Northeastern University

Preslav Nakov

Professor, MBZUAI

Alan S. Cowen

CEO & Chief Scientist, Hume AI

Diyi Yang

Assistant Professor, Stanford University

Xinlei Chen

Research Scientist, FAIR, Meta AI


πŸ† ArtELingo Challenge

Our proposed challenge aims to improve emotion understanding and its connection to vision and language through two tasks: Emotion Recognition and Affective Image Captioning. Emotion Recognition requires models capable of predicting emotions from textual and visual input, while Affective Image Captioning involves generating captions dependent on both an image and an emotion. We use a multi-lingual, cross-cultural dataset, ArtELingo, to encourage the development of solutions that model emotions and cultural differences. Our goal is to inspire the creation of emotion-culture-aware AI systems, a significant advancement for the field.


πŸ“¦ Dataset

ArtELingo is a large-scale multi-lingual dataset of affective experiences triggered by visual artworks and explained in captions.
It contains 1.4 million annotations in Arabic, English, and Chinese. Emotional experiences are inherently subjective, and existing datasets collected in English from Western cultures fail to represent cultural diversity. ArtELingo captures both similarities and differences between cultures, providing a unique opportunity to explore cross-cultural emotional understanding. While many similarities exist, there are also intriguing cultural differences, and our challenge aims to inspire the development of models that can better capture them.

πŸ“¨ Submission and metrics

We use the eval.ai platform to host the challenge.
As mentioned above, we propose two tasks: Emotion Recognition and Affective Image Captioning.

  • For Emotion Recognition, we use the F1 metric on a private test set that is not available to the contestants.
  • We provide train and validation splits from the ArtELingo dataset for participants to develop their models.
  • For the Affective Image Captioning task, we use standard captioning metrics such as BLEU, METEOR, ROUGE, and CIDEr. In addition, we use an emotion alignment metric that reflects whether or not the generated caption expresses a particular emotion.
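As a rough, unofficial illustration of the Emotion Recognition metric, a macro-averaged F1 over the emotion labels could be computed as follows (the exact F1 variant and label set used in the official evaluation are assumptions here):

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over emotion labels.

    Illustrative sketch only: the official challenge may use a different
    F1 averaging scheme (e.g. micro or weighted) and a fixed label set.
    """
    labels = sorted(set(y_true) | set(y_pred))
    per_label_f1 = []
    for label in labels:
        # Count true positives, false positives, and false negatives per label.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        per_label_f1.append(f1)
    # Macro average: each label contributes equally, regardless of frequency.
    return sum(per_label_f1) / len(per_label_f1)
```

For example, `macro_f1(["joy", "joy", "fear"], ["joy", "fear", "fear"])` yields 2/3, since both labels get an F1 of 2/3.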

For both tasks, we evaluate cross-cultural understanding via a held-out set for each language. Moreover, we evaluate an extreme cross-cultural scenario: we provide a few shots of annotations collected in Spanish and evaluate the submitted models on a private Spanish test set.
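The emotion alignment metric for captioning can be thought of as simple agreement between the target emotion and the emotion predicted from the generated caption by a text emotion classifier. The sketch below is one plausible reading, not the official implementation; `classify_emotion` is a hypothetical stand-in for whatever classifier the evaluation uses:

```python
def emotion_alignment(captions, target_emotions, classify_emotion):
    """Fraction of generated captions whose predicted emotion matches the target.

    `classify_emotion` is a hypothetical callable mapping a caption string to an
    emotion label; the official challenge metric may be implemented differently.
    """
    assert len(captions) == len(target_emotions)
    hits = sum(
        1
        for caption, target in zip(captions, target_emotions)
        if classify_emotion(caption) == target
    )
    return hits / len(captions)
```

In practice, `classify_emotion` would be a trained text emotion classifier; any toy keyword rule works for testing the bookkeeping.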

Eval.ai challenge links:


🎯 Accepted papers

We are pleased to present the list of accepted papers for the WECIA workshop at ICCV 2023. These papers showcase the latest research and notable progress made in the field of emotionally-aware AI.

We extend our sincere appreciation to all the participants for their valuable contributions to the workshop. The WECIA workshop owes its success to the passion and expertise of these researchers. We cordially invite everyone to explore this curated collection of accepted papers.


Workshop Organizers

Youssef S. Mohamed

Ph.D. Student, KAUST

Habib Slim

Ph.D. Student, KAUST

Faizan Farooq Khan

Ph.D. Student, KAUST

Kristen J. Ekkens

Executive Coach & Entrepreneur, Stanford, KAUST

Philip Torr

Professor, University of Oxford

Kenneth Ward Church

Professor, Northeastern University

Mohamed Elhoseiny

Assistant Professor, KAUST


πŸ“… Timeline

Archival track:
  • Paper submission deadline: August 1, 2023
  • Notification to authors: August 7, 2023
  • Camera-ready deadline: August 21, 2023

Non-archival track:
  • Paper submission deadline: August 20, 2023
  • Notification to authors: August 30, 2023
  • Camera-ready deadline: September 14, 2023

Challenge:
  • Release of training/validation data: May 1, 2023
  • Challenge online: July 12, 2023
  • Submission deadline: September 15, 2023
  • Winners announcement: October 2, 2023

For any questions or support, please reach out to @Youssef.M.