WAP PAPER

Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models

Pick your path

This hub links to four standalone pages: two depth levels (High School and Grad), each available in English and Chinese (中文). Each page includes the full set of sections and its own narrative.

HS · EN | Grad · EN | 高中 · 中文 (High School · Chinese) | 研究生 · 中文 (Grad · Chinese)

Paper facts

Authors: Jiaqi Leng, Xiang Hu, Junxiong Wang, Jianguo Li, Wei Wu, Yucheng Lu
Date: 20 Oct 2025 (arXiv preprint)
Venue: arXiv (cs.CL)
arXiv: 2510.17196
DOI: 10.48550/arXiv.2510.17196

Versions