Latest News


 2024/11/21
 [2024.12.06] Global Online Forum on Policy Science and Data Analysis, 2024 Series, Lecture IV: Fast-ER: GPU-Accelerated Record Linkage in Python


Dr. R. Michael Alvarez and Jacob Morrier
(California Institute of Technology)

Date: Friday, December 6, 10:00 AM to 1:00 PM (Taiwan Time, GMT+8)
Topic: Fast-ER: GPU-Accelerated Record Linkage in Python
Registration: https://reurl.cc/36Dz00 (advance registration is welcome; please sign up as "NCHU-Name")
Webinar: (to be announced)


Abstract:

Record linkage, also called "entity resolution," consists of matching observations from two datasets that represent the same unit, even when consistent common identifiers are absent. This process typically involves computing string similarity metrics, such as the Jaro-Winkler metric, for all pairs of values between the datasets. The Fast-ER package accelerates these computations with graphics processing units (GPUs). It estimates the parameters of the Fellegi-Sunter model, a widely used probabilistic record linkage model, and performs the necessary data processing on CUDA-enabled GPUs. Our experiments demonstrate that this approach can increase processing speed by over 60 times, reducing processing time from hours to minutes, compared to the previous leading software implementation. This significantly improves the scalability of probabilistic record linkage and deduplication for large datasets.
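
To make the all-pairs comparison step in the abstract concrete, here is a minimal CPU-only sketch of the Jaro-Winkler metric in plain Python. It is not Fast-ER's API or implementation: the function names (`jaro`, `jaro_winkler`, `pairwise_similarity`) are hypothetical, and the prefix scale of 0.1 with a four-character prefix cap is the conventional default, assumed here rather than taken from the talk.

```python
def jaro(s: str, t: str) -> float:
    """Jaro similarity between two strings, in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    # Two equal characters count as a match only within this distance.
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_matched = [False] * len(s)
    t_matched = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        lo = max(0, i - window)
        hi = min(len(t), i + window + 1)
        for j in range(lo, hi):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that line up in a different order are
    # half-transpositions; two of them make one transposition.
    s_seq = [c for c, m in zip(s, s_matched) if m]
    t_seq = [c for c, m in zip(t, t_matched) if m]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) / 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s: str, t: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (assumed defaults)."""
    sim = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:max_prefix], t[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


def pairwise_similarity(left: list[str], right: list[str]) -> list[list[float]]:
    """Similarity for every (left, right) pair: len(left) * len(right) calls."""
    return [[jaro_winkler(a, b) for b in right] for a in left]


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    names_a = ["MARTHA", "DIXON"]
    names_b = ["MARHTA", "DICKSONX"]
    for row in pairwise_similarity(names_a, names_b):
        print([round(x, 4) for x in row])
```

The number of comparisons grows as the product of the two dataset sizes, and each pair is scored independently of the others. That independence is what makes the workload a natural fit for the GPU parallelism the abstract describes.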