Publications

2025

FilBench: Can LLMs Understand and Generate Filipino?

Lester James V. Miranda*, Elyanah Aco*, Conner Manuel*, Jan Christian Blaise Cruz, Joseph Marvin Imperial

EMNLP 2025 (Main)

PDF Code Leaderboard Poster Presentation

We created a comprehensive benchmark for evaluating LLMs on Philippine-centric tasks, including cultural knowledge, reading comprehension, classical NLP, and generation.

The UD-NewsCrawl Treebank: Reflections and Challenges from a Large-scale Tagalog Syntactic Annotation Project

Angelina A. Aquino*, Lester James V. Miranda*, Elsie Marie T. Or*

ACL 2025 (Main)

PDF Dataset Poster Presentation

We created the largest Tagalog treebank to date, containing 100x more data than previous treebanks. Our project also revealed limitations in the Universal Dependencies framework, especially for non-Indo-European languages.

2023

calamanCy: A Tagalog Natural Language Processing Toolkit

Lester James V. Miranda

NLP OSS Workshop (NLP-OSS)

PDF Code Website