Financial Misinformation Detection
Introduction
In the financial sector, the accuracy of information is crucial for the integrity of decisions, market operation, risk management, compliance, and trust establishment. However, the proliferation of digital media has escalated the spread of financial misinformation. Such misinformation, including deceptive investment propositions and biased news articles, can manipulate market prices and influence economic sentiment, presenting substantial risks. The advent of LLMs in finance has introduced transformative potential for analysis [1], prediction [2], and decision-making [3]. In this challenge, participants are expected to engineer LLMs capable of not only identifying fraudulent financial content but also generating clear, concise explanations that elucidate the reasoning behind the classification by leveraging claims and contextual information. This requirement for explanation generation is crucial, as it adds a layer of transparency and trust to the AI's decision-making process, enhancing the model's utility for investors, regulatory bodies, and the broader financial community.
The goal of this challenge is to create a specialized LLM that excels in pinpointing financial misinformation and articulating its findings. By incorporating explanations, the model not only identifies misinformation but also educates users about the nature of the misinformation, leveraging a wide array of financial domain features such as income, finance, economics, and budget. This approach aims to fortify the model's effectiveness and contribute to a more transparent, accountable, and stable financial environment. The ability to detect and provide explanations for fake financial news is vital for safeguarding investors and diminishing adverse effects on financial markets, thereby promoting a well-informed and resilient financial ecosystem.
Task
This task tests the ability of LLMs to verify financial misinformation while generating plausible explanations. Participants need to develop or adapt LLMs to classify financial claims ('True'/'False'/'Not Enough Information') and to explain their decisions according to the related information, following the designed prompt template of the query.
The following template is an example of how to construct the instruction-tuning data that supports the training and evaluation of LLMs [4]. Participants may also adjust the template to make full use of all available information.
Task: [task prompt]. Claim: [claim]. Context: [context]. Prediction: [output1]. Explanation: [output2]
[task prompt] denotes the instruction for the task (e.g. Please determine whether the claim is True, False, or Not Enough Information based on contextual information, and provide an appropriate explanation.). [claim] and [context] are the claim text and the contextualization content from the raw data, respectively. [output1] and [output2] are the outputs of the LLM: the predicted label and the explanation.
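The template can be filled in programmatically. The following is a minimal Python sketch, not the organizers' implementation: the helper names (`build_query`, `build_training_example`, `parse_response`) are hypothetical, and the way the response is split on "Explanation:" is an assumption about how a model will format its answer.

```python
# Minimal sketch of the prompt template described above.
# All function names here are hypothetical helpers, not part of any baseline.

TASK_PROMPT = (
    "Please determine whether the claim is True, False, or Not Enough "
    "Information based on contextual information, and provide an "
    "appropriate explanation."
)

LABELS = ("Not Enough Information", "True", "False")


def _field(text: str) -> str:
    """Trim whitespace and a trailing period so the template adds exactly one."""
    return text.strip().rstrip(".")


def build_query(claim: str, context: str, task_prompt: str = TASK_PROMPT) -> str:
    """Fill the input part of the template; the model completes the rest."""
    return (
        f"Task: {_field(task_prompt)}. Claim: {_field(claim)}. "
        f"Context: {_field(context)}. Prediction:"
    )


def build_training_example(claim: str, context: str, label: str, explanation: str) -> str:
    """Full instruction-tuning string: query plus the two target outputs."""
    return f"{build_query(claim, context)} {label}. Explanation: {explanation}"


def parse_response(text: str):
    """Split a completion such as ' False. Explanation: ...' into (label, explanation).

    Assumes the model echoes the 'Explanation:' marker; returns None as the
    label when no known label is found.
    """
    prediction, _, explanation = text.partition("Explanation:")
    prediction = prediction.strip().rstrip(".").strip().lower()
    label = next((l for l in LABELS if prediction.startswith(l.lower())), None)
    return label, explanation.strip()
```

Keeping query construction and response parsing in one place makes it easier to adjust the template later without breaking the label extraction.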
Dataset
The task leverages the FIN-FACT [5] dataset, a comprehensive collection of financial claims categorized into areas like Income, Finance, Economy, Budget, Taxes, and Debt. The claim label categorizes claims as 'True', 'False', and 'NEI (Not Enough Information)'. The dataset contains the following information.
'claim': the core assertion.
'posted date': temporal information.
'sci-digest': claim summaries.
'justification': insights into the accuracy of claims, which further contextualize them.
'image link': visual information.
'Issues': highlights complexities within claims.
'evidence': supporting information, which serves as the ground truth for explanations.
NOTE: Participants need to predict labels and generate explanations (evidence) simultaneously with a single model, using the other information in the data. The blind test set will not include the label and evidence columns. Closed-source LLMs (e.g. ChatGPT, GPT-4) can also be used. However, participants must provide the hyperparameters (especially the seed) and the required scripts to ensure reproducibility.
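As a rough illustration of the note above, the sketch below separates the model inputs from the two held-out targets and joins selected fields into a context string. It assumes each record is a Python dict keyed by the field names listed above and that the label column is named 'label'; both are assumptions, and the actual data format may differ. The sample record is invented for illustration only.

```python
# Sketch: keep 'label' and 'evidence' out of the model input.
# The column name "label" and the dict layout are assumptions.

HELD_OUT = ("label", "evidence")  # targets absent from the blind test set

# Invented record, for illustration only (not taken from the dataset).
EXAMPLE_RECORD = {
    "claim": "Hypothetical claim about a tax increase.",
    "posted date": "2024-01-01",
    "sci-digest": ["Summary one.", "Summary two."],
    "justification": "",
    "image link": "",
    "Issues": [],
    "label": "False",
    "evidence": "Hypothetical supporting evidence.",
}


def split_record(record: dict):
    """Return (inputs, targets); targets may be empty for blind test records."""
    inputs = {k: v for k, v in record.items() if k not in HELD_OUT}
    targets = {k: record[k] for k in HELD_OUT if k in record}
    return inputs, targets


def build_context(inputs: dict,
                  fields=("posted date", "sci-digest", "justification", "Issues")) -> str:
    """Join the non-empty context fields into one string, in the given order."""
    parts = []
    for name in fields:
        value = inputs.get(name)
        if not value:  # skip missing or empty fields
            continue
        if isinstance(value, (list, tuple)):
            value = " ".join(str(v) for v in value)
        parts.append(f"{name}: {value}")
    return " ".join(parts)
```

Because the same `split_record` call works whether or not the targets are present, the inference script can be run unchanged on the training data and on the blind test set.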
Data Examples:
Evaluation
The task uses Accuracy, Precision, Recall, and Micro-F1 to evaluate misinformation detection, and ROUGE (1, 2, and L) [6] and BERTScore [7] to evaluate explanations. ROUGE and BERTScore are commonly used to measure the quality of automatically generated text and its similarity to a reference text.
We use the average of the F1 and ROUGE-1 scores as the final ranking metric.
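The ranking score can be approximated offline with a few lines of Python. The sketch below is illustrative only: it assumes the F1 in the ranking is Micro-F1, and it uses a simplified ROUGE-1 F-measure (lowercased whitespace tokens, no stemming) rather than the official ROUGE package, so its numbers may differ from the organizers' scoring script. The function names are hypothetical.

```python
from collections import Counter

# Simplified, dependency-free approximation of the ranking score.
# Assumptions: Micro-F1 for the label part, whitespace-token ROUGE-1 F-measure.


def micro_f1(gold, pred, labels):
    """Micro-averaged F1 over the given labels (unparsed predictions count as misses)."""
    tp = fp = fn = 0
    for label in labels:
        for g, p in zip(gold, pred):
            if p == label and g == label:
                tp += 1
            elif p == label:
                fp += 1
            elif g == label:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rouge1_f(reference: str, candidate: str) -> float:
    """Unigram-overlap F-measure between a reference and a generated explanation."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def final_score(gold_labels, pred_labels, gold_expl, pred_expl, labels):
    """Average of Micro-F1 and mean ROUGE-1 F-measure."""
    f1 = micro_f1(gold_labels, pred_labels, labels)
    rouge = sum(rouge1_f(r, c) for r, c in zip(gold_expl, pred_expl)) / len(gold_expl)
    return (f1 + rouge) / 2
```

For single-label classification where every prediction is a valid label, Micro-F1 equals accuracy; the two diverge only when some predictions cannot be parsed into a label.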
Model Cheating Detection
To measure the risk that test-set data leaked into a model's training (Model Cheating), participants need to upload their final model to Hugging Face, together with the scripts necessary for Model Cheating Detection.
[1] When FLUE Meets FLANG: Benchmarks and Large Pretrained Language Model for Financial Domain. (https://aclanthology.org/2022.emnlp-main.148)
[2] BloombergGPT: A Large Language Model for Finance. (https://arxiv.org/abs/2303.17564)
[3] PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance. (https://arxiv.org/abs/2306.05443)
[4] FMDLlama: Financial Misinformation Detection based on Large Language Models. (https://www.arxiv.org/abs/2409.16452)
[5] Fin-Fact: A Benchmark Dataset for Multimodal Financial Fact Checking and Explanation Generation. (https://arxiv.org/abs/2309.08793)
[6] ROUGE: A Package for Automatic Evaluation of Summaries. (https://aclanthology.org/W04-1013)
[7] BERTScore: Evaluating Text Generation with BERT. (https://openreview.net/pdf?id=SkeHuCVFDr)
Registration
Please choose a unique team name and ensure that all team members provide their full names, emails, institutions, and the team name. Every team member should register using the same team name. We encourage you to use your institution email to register.
Schedule
Registration Open: 15 August 2024
Practice set release: 2 September 2024
Practice data: Hugging Face link
Baseline: GitHub link
Training set release: 15 September 2024
Training data: Hugging Face link
Blind test set release: 30 October 2024
Systems submission: 7 November 2024
Release of results: 12 November 2024
Paper Submission Deadline (both regular and shared-task papers): 25 November 2024
Notifications of Acceptance: 5 December 2024
Camera-ready Paper Deadline: 13 December 2024
Workshop Date: 19-20 January 2025
Submission
The ACL Template MUST be used for your submission(s). The main text is limited to 4 pages. The appendix has no page limit and is placed after the references.
The paper title format is fixed: "[Model or Team Name] at the Financial Misinformation Detection Challenge Task: [Title]".
The reviewing process will be single-blind. The proceedings of accepted papers will be published in the ACL Anthology.
Shared task participants will be asked to review other teams' papers during the review period.
Submissions must be in electronic form using the paper submission software linked above.
At least one author of each accepted paper should register and present their work in person at FinNLP-FNP-LLMFinLegal-2025. Papers with “No Show” may be redacted. Authors will be required to agree to this requirement at the time of submission. This rule applies to all COLING-2025 workshops.
Shared Task Organizers
Zhiwei Liu - University of Manchester, UK
Keyi Wang - Columbia University, Northwestern University, USA
Zhuo Bao - Internet Domain Name System Beijing Engineering Research Center Co, China
Xin Zhang - University of Manchester, UK
Jiping Dong - University of Chinese Academy of Sciences, China
Boyang Gu - Imperial College London, UK
Kailai Yang - University of Manchester, UK
Dong Li - FinAI, Singapore
Qianqian Xie - FinAI, Singapore
Sophia Ananiadou - University of Manchester, UK; Archimedes RC, Greece
Contact
Contestants can ask questions on Discord in the #coling2025-finllm-workshop channel.
Contact email: fmd.finnlp@gmail.com