ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists

Margo Schlanger, Vivek S. Sankaran
2025