Tag
1 article
A new vision representation lets text steer ViT features toward specific objects without giving up generic visual utility.