Merged
Changes from 5 commits
8 changes: 6 additions & 2 deletions bin/ingest_orcids.py
@@ -69,11 +69,11 @@ def parse_paper_yaml(paper_path: str) -> List[Dict[str, str]]:
 @click.argument(
     'full_volume_id',
     type=str,
-    required=True,
+    required=False,
 )
 def main(
     paper_yaml: str,
-    full_volume_id: str,
+    full_volume_id: str = None,
 ):
     anthology_datadir = Path(sys.argv[0]).parent / ".." / "data"
     # anthology = Anthology(
@@ -86,6 +86,10 @@ def main(
     # people = AnthologyIndex(srcdir=anthology_datadir)
     # people.bibkeys = load_bibkeys(anthology_datadir)

+    if full_volume_id is None:
+        full_volume_id = Path(paper_yaml).name.replace(".yaml", "")
+        print(f"Taking full volume ID from file name: {full_volume_id}", file=sys.stderr)
+
     # Load the papers.yaml file, skipping non-archival papers
     papers = [p for p in parse_paper_yaml(paper_yaml) if p["archival"]]
     # print(f"Found {len(papers)} archival papers", file=sys.stderr)
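In plain terms, the hunk above makes the positional full_volume_id argument optional and, when it is omitted, derives the volume ID from the name of the papers YAML file. Below is a minimal sketch of that fallback logic, lifted from the diff; the example file path 2024.c3nlp.yaml is hypothetical and only illustrates the derivation.

```python
from pathlib import Path

def infer_volume_id(paper_yaml: str) -> str:
    # Same derivation as the added fallback: take the file name
    # and strip its ".yaml" suffix to obtain the full volume ID.
    return Path(paper_yaml).name.replace(".yaml", "")

# Hypothetical example: a papers file named after its volume.
print(infer_volume_id("ingestion/2024.c3nlp.yaml"))  # -> 2024.c3nlp
```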
110 changes: 55 additions & 55 deletions data/xml/2024.acl.xml

Large diffs are not rendered by default.

46 changes: 23 additions & 23 deletions data/xml/2024.alvr.xml

Large diffs are not rendered by default.

272 changes: 136 additions & 136 deletions data/xml/2024.arabicnlp.xml

Large diffs are not rendered by default.

38 changes: 19 additions & 19 deletions data/xml/2024.argmining.xml

Large diffs are not rendered by default.

36 changes: 18 additions & 18 deletions data/xml/2024.c3nlp.xml
@@ -40,9 +40,9 @@
</paper>
<paper id="2">
<title>Conformity, Confabulation, and Impersonation: Persona Inconstancy in Multi-Agent <fixed-case>LLM</fixed-case> Collaboration</title>
-<author><first>Razan</first><last>Baltaji</last></author>
-<author><first>Babak</first><last>Hemmatian</last></author>
-<author><first>Lav</first><last>Varshney</last><affiliation>University of Illinois at Urbana-Champaign</affiliation></author>
+<author orcid="0000-0003-0818-8717"><first>Razan</first><last>Baltaji</last></author>
+<author orcid="0000-0001-6138-5782"><first>Babak</first><last>Hemmatian</last></author>
+<author orcid="0000-0003-2798-5308"><first>Lav</first><last>Varshney</last><affiliation>University of Illinois at Urbana-Champaign</affiliation></author>
<pages>17-31</pages>
<abstract>This study explores the sources of instability in maintaining cultural personas and opinions within multi-agent LLM systems. Drawing on simulations of inter-cultural collaboration and debate, we analyze agents’ pre- and post-discussion private responses alongside chat transcripts to assess the stability of cultural personas and the impact of opinion diversity on group outcomes. Our findings suggest that multi-agent discussions can encourage collective decisions that reflect diverse perspectives, yet this benefit is tempered by the agents’ susceptibility to conformity due to perceived peer pressure and challenges in maintaining consistent personas and opinions. Counterintuitively, instructions that encourage debate in support of one’s opinions increase the rate of instability. Without addressing the factors we identify, the full potential of multi-agent frameworks for producing more culturally diverse AI outputs will remain untapped.</abstract>
<url hash="8f8f9da2">2024.c3nlp-1.2</url>
@@ -51,10 +51,10 @@
</paper>
<paper id="3">
<title>Synchronizing Approach in Designing Annotation Guidelines for Multilingual Datasets: A <fixed-case>COVID</fixed-case>-19 Case Study Using <fixed-case>E</fixed-case>nglish and <fixed-case>J</fixed-case>apanese Tweets</title>
-<author><first>Kiki</first><last>Ferawati</last></author>
+<author orcid="0000-0003-0717-0769"><first>Kiki</first><last>Ferawati</last></author>
<author><first>Wan Jou</first><last>She</last><affiliation>Kyoto Institute of Technology</affiliation></author>
-<author><first>Shoko</first><last>Wakamiya</last><affiliation>Nara Institute of Science and Technology</affiliation></author>
-<author><first>Eiji</first><last>Aramaki</last><affiliation>Nara Institute of Science and Technology, Japan</affiliation></author>
+<author orcid="0000-0002-9371-1340"><first>Shoko</first><last>Wakamiya</last><affiliation>Nara Institute of Science and Technology</affiliation></author>
+<author orcid="0000-0003-0201-3609"><first>Eiji</first><last>Aramaki</last><affiliation>Nara Institute of Science and Technology, Japan</affiliation></author>
<pages>32-41</pages>
<abstract>The difference in culture between the U.S. and Japan is a popular subject for Western vs. Eastern cultural comparison for researchers. One particular challenge is to obtain and annotate multilingual datasets. In this study, we utilized COVID-19 tweets from the two countries as a case study, focusing particularly on discussions concerning masks. The annotation task was designed to gain insights into societal attitudes toward the mask policies implemented in both countries. The aim of this study is to provide a practical approach for the annotation task by thoroughly documenting how we aligned the multilingual annotation guidelines to obtain a comparable dataset. We proceeded to document the effective practices during our annotation process to synchronize our multilingual guidelines. Furthermore, we discussed difficulties caused by differences in expression style and culture, and potential strategies that helped improve our agreement scores and reduce discrepancies between the annotation results in both languages. These findings offer an alternative method for synchronizing multilingual annotation guidelines and achieving feasible agreement scores for cross-cultural annotation tasks. This study resulted in a multilingual guideline in English and Japanese to annotate topics related to public discourses about COVID-19 masks in the U.S. and Japan.</abstract>
<url hash="a97cafc5">2024.c3nlp-1.3</url>
@@ -63,11 +63,11 @@
</paper>
<paper id="4">
<title><fixed-case>CRAFT</fixed-case>: Extracting and Tuning Cultural Instructions from the Wild</title>
-<author><first>Bin</first><last>Wang</last><affiliation>I2R, A*STAR</affiliation></author>
+<author orcid="0000-0001-9760-8343"><first>Bin</first><last>Wang</last><affiliation>I2R, A*STAR</affiliation></author>
<author><first>Geyu</first><last>Lin</last><affiliation>Institute of Infocomm Research, A*STAR</affiliation></author>
<author><first>Zhengyuan</first><last>Liu</last><affiliation>I2R</affiliation></author>
<author><first>Chengwei</first><last>Wei</last></author>
-<author><first>Nancy</first><last>Chen</last></author>
+<author orcid="0000-0003-0872-5877"><first>Nancy</first><last>Chen</last></author>
<pages>42-47</pages>
<abstract>Large language models (LLMs) have rapidly evolved as the foundation of various natural language processing (NLP) applications. Despite their wide use cases, their understanding of culturally-related concepts and reasoning remains limited. Meantime, there is a significant need to enhance these models’ cultural reasoning capabilities, especially concerning underrepresented regions. This paper introduces a novel pipeline for extracting high-quality, culturally-related instruction tuning datasets from vast unstructured corpora. We utilize a self-instruction generation pipeline to identify cultural concepts and trigger instruction. By integrating with a general-purpose instruction tuning dataset, our model demonstrates enhanced capabilities in recognizing and understanding regional cultural nuances, thereby enhancing its reasoning capabilities. We conduct experiments across three regions: Singapore, the Philippines, and the United States, achieving performance improvement of up to 6%. Our research opens new avenues for extracting cultural instruction tuning sets directly from unstructured data, setting a precedent for future innovations in the field.</abstract>
<url hash="19d05c55">2024.c3nlp-1.4</url>
@@ -86,15 +86,15 @@
<paper id="6">
<title>Do Multilingual Large Language Models Mitigate Stereotype Bias?</title>
<author><first>Shangrui</first><last>Nie</last></author>
-<author><first>Michael</first><last>Fromm</last><affiliation>Fraunhofer Institute IAIS, Fraunhofer IAIS</affiliation></author>
-<author><first>Charles</first><last>Welch</last><affiliation>McMaster University</affiliation></author>
+<author orcid="0000-0002-7244-4191"><first>Michael</first><last>Fromm</last><affiliation>Fraunhofer Institute IAIS, Fraunhofer IAIS</affiliation></author>
+<author orcid="0000-0002-3489-2882"><first>Charles</first><last>Welch</last><affiliation>McMaster University</affiliation></author>
<author><first>Rebekka</first><last>Görge</last><affiliation>Fraunhofer Institute IAIS, Fraunhofer IAIS</affiliation></author>
-<author><first>Akbar</first><last>Karimi</last><affiliation>Rheinische Friedrich-Wilhelms Universität Bonn</affiliation></author>
+<author orcid="0000-0002-5132-2435"><first>Akbar</first><last>Karimi</last><affiliation>Rheinische Friedrich-Wilhelms Universität Bonn</affiliation></author>
<author><first>Joan</first><last>Plepi</last><affiliation>Rheinische Friedrich-Wilhelms Universität Bonn</affiliation></author>
<author><first>Nazia</first><last>Mowmita</last><affiliation>Fraunhofer Institute IAIS, Fraunhofer IAIS and Rheinische Friedrich-Wilhelms-Universität Bonn</affiliation></author>
<author><first>Nicolas</first><last>Flores-Herr</last><affiliation>Max-Planck Institute and Fraunhofer Institute IAIS, Fraunhofer IAIS</affiliation></author>
<author><first>Mehdi</first><last>Ali</last><affiliation>Fraunhofer Institute IAIS, Fraunhofer IAIS</affiliation></author>
-<author><first>Lucie</first><last>Flek</last><affiliation>Rheinische Friedrich-Wilhelms Universität Bonn</affiliation></author>
+<author orcid="0000-0002-5995-8454"><first>Lucie</first><last>Flek</last><affiliation>Rheinische Friedrich-Wilhelms Universität Bonn</affiliation></author>
<pages>65-83</pages>
<abstract>While preliminary findings indicate that multilingual LLMs exhibit reduced bias compared to monolingual ones, a comprehensive understanding of the effect of multilingual training on bias mitigation, is lacking. This study addresses this gap by systematically training six LLMs of identical size (2.6B parameters) and architecture: five monolingual models (English, German, French, Italian, and Spanish) and one multilingual model trained on an equal distribution of data across these languages, all using publicly available data. To ensure robust evaluation, standard bias benchmarks were automatically translated into the five target languages and verified for both translation quality and bias preservation by human annotators. Our results consistently demonstrate that multilingual training effectively mitigates bias. Moreover, we observe that multilingual models achieve not only lower bias but also superior prediction accuracy when compared to monolingual models with the same amount of training data, model architecture, and size.</abstract>
<url hash="d20198a4">2024.c3nlp-1.6</url>
@@ -103,7 +103,7 @@
</paper>
<paper id="7">
<title>Sociocultural Considerations in Monitoring Anti-<fixed-case>LGBTQ</fixed-case>+ Content on Social Media</title>
-<author><first>Sidney</first><last>Wong</last><affiliation>University of Canterbury</affiliation></author>
+<author orcid="0000-0002-8483-0540"><first>Sidney</first><last>Wong</last><affiliation>University of Canterbury</affiliation></author>
<pages>84-97</pages>
<abstract>The purpose of this paper is to ascertain the influence of sociocultural factors (i.e., social, cultural, and political) in the development of hate speech detection systems. We set out to investigate the suitability of using open-source training data to monitor levels of anti-LGBTQ+ content on social media across different national-varieties of English. Our findings suggests the social and cultural alignment of open-source hate speech data sets influences the predicted outputs. Furthermore, the keyword-search approach of anti-LGBTQ+ slurs in the development of open-source training data encourages detection models to overfit on slurs; therefore, anti-LGBTQ+ content may go undetected. We recommend combining empirical outputs with qualitative insights to ensure these systems are fit for purpose.</abstract>
<url hash="d1d609cf">2024.c3nlp-1.7</url>
@@ -112,10 +112,10 @@
</paper>
<paper id="8">
<title>Are Generative Language Models Multicultural? A Study on <fixed-case>H</fixed-case>ausa Culture and Emotions using <fixed-case>C</fixed-case>hat<fixed-case>GPT</fixed-case></title>
-<author><first>Ibrahim</first><last>Ahmad</last><affiliation>Northeastern University</affiliation></author>
-<author><first>Shiran</first><last>Dudy</last><affiliation>Northeastern University</affiliation></author>
-<author><first>Resmi</first><last>Ramachandranpillai</last><affiliation>Institute for Experiential AI and Linköping University</affiliation></author>
-<author><first>Kenneth</first><last>Church</last><affiliation>Northeastern University</affiliation></author>
+<author orcid="0000-0001-9514-1807"><first>Ibrahim</first><last>Ahmad</last><affiliation>Northeastern University</affiliation></author>
+<author orcid="0000-0002-7569-5922"><first>Shiran</first><last>Dudy</last><affiliation>Northeastern University</affiliation></author>
+<author orcid="0000-0002-4302-9327"><first>Resmi</first><last>Ramachandranpillai</last><affiliation>Institute for Experiential AI and Linköping University</affiliation></author>
+<author orcid="0000-0001-8378-6069"><first>Kenneth</first><last>Church</last><affiliation>Northeastern University</affiliation></author>
<pages>98-106</pages>
<abstract>Large Language Models (LLMs), such as ChatGPT, are widely used to generate content for various purposes and audiences. However, these models may not reflect the cultural and emotional diversity of their users, especially for low-resource languages. In this paper, we investigate how ChatGPT represents Hausa’s culture and emotions. We compare responses generated by ChatGPT with those provided by native Hausa speakers on 37 culturally relevant questions. We conducted experiments using emotion analysis. We also used two similarity metrics to measure the alignment between human and ChatGPT responses. We also collect human participants ratings and feedback on ChatGPT responses. Our results show that ChatGPT has some level of similarity to human responses, but also exhibits some gaps and biases in its knowledge and awareness of Hausa culture and emotions. We discuss the implications and limitations of our methodology and analysis and suggest ways to improve the performance and evaluation of LLMs for low-resource languages.</abstract>
<url hash="7e10b534">2024.c3nlp-1.8</url>
@@ -125,7 +125,7 @@
<paper id="9">
<title>Computational Language Documentation: Designing a Modular Annotation and Data Management Tool for Cross-cultural Applicability</title>
<author><first>Alexandra</first><last>O’Neil</last><affiliation>Indiana University at Bloomington</affiliation></author>
-<author><first>Daniel</first><last>Swanson</last><affiliation>Indiana University</affiliation></author>
+<author orcid="0000-0002-9847-8111"><first>Daniel</first><last>Swanson</last><affiliation>Indiana University</affiliation></author>
<author><first>Shobhana</first><last>Chelliah</last><affiliation>Indiana University at Bloomington</affiliation></author>
<pages>107-116</pages>
<abstract>While developing computational language documentation tools, researchers must center the role of language communities in the process by carefully reflecting on and designing tools to support the varying needs and priorities of different language communities. This paper provides an example of how cross-cultural considerations discussed in literature about language documentation, data sovereignty, and community-led documentation projects can motivate the design of a computational language documentation tool by reflecting on our design process as we work towards developing an annotation and data management tool. We identify three recurring themes for cross-cultural consideration in the literature - Linguistic Sovereignty, Cultural Specificity, and Reciprocity - and present eight essential features for an annotation and data management tool that reflect these themes.</abstract>
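All of the XML hunks above follow one pattern: an orcid attribute is added to an existing <author> element and nothing else in the entry changes. As a purely illustrative sketch (not the actual logic of bin/ingest_orcids.py, which this diff does not show), such an update could be applied with lxml roughly as below; the orcid_map holds one real pair copied from the diff, and the file path and in-place write are assumptions.

```python
from lxml import etree

# Hypothetical mapping from (first, last) name to ORCID iD;
# the single entry below is copied from the diff above.
orcid_map = {("Sidney", "Wong"): "0000-0002-8483-0540"}

tree = etree.parse("data/xml/2024.c3nlp.xml")
for author in tree.getroot().iter("author"):
    key = (author.findtext("first"), author.findtext("last"))
    if key in orcid_map and "orcid" not in author.attrib:
        author.set("orcid", orcid_map[key])

# Write the updated XML back in place (illustrative only).
tree.write("data/xml/2024.c3nlp.xml", encoding="UTF-8", xml_declaration=True)
```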