NEWS


 2025/11/15
 [2025.12.05] Global Online Forum on Policy Science Data Analysis, 2025 Lecture Series: Extractive versus Generative Language Models for Political Conflict Text Classification


Patrick Brandt
(Professor, University of Texas at Dallas)

Date: Friday, December 5, 2025, 10:00 AM to 1:00 PM (Taiwan Time, GMT+8)
Topic: Extractive versus Generative Language Models for Political Conflict Text Classification
Registration: https://reurl.cc/5bGAeR (advance registration is welcome; please enter "NCHU-your name" when signing up)
Webinar: (to be announced)


We review our recent ConfliBERT language model (Hu et al. 2022) for processing political and violence-related texts. When fine-tuned, ConfliBERT outperforms other large language models (LLMs) such as Google’s Gemma 2 (9B), Meta’s Llama 3.1 (7B), and Alibaba’s Qwen 2.5 (14B) in accuracy, precision, and recall within its relevant domains. It is also hundreds of times faster than these more generalist LLMs. These results are illustrated using texts from the BBC, re3d, and the Global Terrorism Database (GTD). We demonstrate that open, fine-tuned models can outperform more general models in accuracy, precision, and recall, at a fraction of the cost.
 

Patrick T. Brandt, Ph.D.

Dr. Brandt is Professor of Public Policy, Political Economy, and Political Science in the School of Economic, Political, and Policy Sciences at the University of Texas at Dallas. His research employs time-series analysis and machine learning across a variety of areas. His time-series work covers Bayesian statistics, multiple-equation or vector autoregression models, methods for producing forecasts and evaluating their quality, the derivation of new models for time series of counts, and the modeling of structural change and endogenous shifts. His machine learning work, carried out in concert with computer scientists, has modernized how empirical research is done in political science and international relations. In recent years, his focus has shifted to event data technology for coding conflict and cooperation events involving civil and international actors.
