Tag
vision-language models
3 articles

Research/Jun 8
MemDreamer tackles long-video overload
MemDreamer separates perception from reasoning so that hours-long video understanding fits within a tiny context window.

Research/Jun 3
IPT helps VLMs reason about hidden space
Imaginative Perception Tokens improve multimodal models’ ability to reason about unseen spatial structure.

Research/Jun 2
ProtoAda tackles multimodal continual tuning drift
ProtoAda adds format-aware prototypes and geometry-aware updates to reduce interference in multimodal continual instruction tuning.