Selected publications


Have Large Vision-Language Models Mastered Art History?


BackFlip: The Impact of Local and Global Data Augmentations on Artistic Image Aesthetic Assessment


Aligning Object Detector Bounding Boxes with Human Preference


Color Equivariant Convolutional Networks


Benchmarking Data Efficiency and Computational Efficiency of Temporal Action Localization Models


Are Current Long-Term Video Understanding Datasets Long-Term?


Video BagNet: Short Temporal Receptive Fields Increase Robustness in Long-Term Action Recognition


Humans Disagree with the IoU for Measuring Object Detector Localization Error


Long-Term Behaviour Recognition in Videos with Actor-Focused Region Attention