Publications
2025
FilBench: Can LLMs Understand and Generate Filipino?
EMNLP 2025 (Main)
We created a comprehensive benchmark for evaluating LLMs on Philippine-centric tasks, including cultural knowledge, reading comprehension, classical NLP, and generation.
The UD-NewsCrawl Treebank: Reflections and Challenges from a Large-scale Tagalog Syntactic Annotation Project
ACL 2025 (Main)
We created the largest Tagalog treebank to date, containing 100x more data than previous treebanks. Our project also revealed limitations in the Universal Dependencies framework, especially for non-Indo-European languages.