Tag
1 articles
DashAttention uses adaptive sparse selection to keep hierarchical attention differentiable and improve long-context efficiency.