I’m an artificial intelligence researcher at DFKI and RPTU Kaiserslautern-Landau. My research interests include efficient deep learning, transformer models, multimodal learning, and computer vision. My PhD project focuses on developing efficient transformer models for vision, language, and multimodal tasks.
M.Sc. in Mathematics, 2022
Leibniz University Hannover
B.Sc. in Computer Science, 2022
Leibniz University Hannover
B.Sc. in Mathematics, 2019
Leibniz University Hannover
A comprehensive benchmark and analysis of more than 45 transformer models for image classification, evaluating their efficiency across a variety of performance metrics. We identify which architectures are most efficient to use and find that scaling up the model is more efficient than increasing the input image resolution.
This paper introduces TaylorShift, a novel reformulation of the attention mechanism based on the Taylor softmax that enables computing full token-to-token interactions in linear time. We determine, analytically and empirically, the crossover points at which TaylorShift becomes more efficient than conventional attention. TaylorShift outperforms the standard transformer architecture on 4 out of 5 tasks.
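The core trick is that replacing the exponential inside softmax with its second-order Taylor polynomial, exp(x) ≈ 1 + x + x²/2, lets the attention sums be reorganized so that the N×N score matrix never has to be formed. The sketch below illustrates this linearization in plain NumPy; it is a simplified illustration of the idea, not the paper's actual implementation (the function names and toy shapes are mine, and TaylorShift itself adds further normalization and efficiency tricks).

```python
import numpy as np

def taylor_attention_direct(Q, K, V):
    """O(N^2) reference: swap exp in softmax for its 2nd-order
    Taylor polynomial, 1 + x + x^2/2."""
    scores = Q @ K.T                      # (N, N) token-to-token scores
    w = 1.0 + scores + 0.5 * scores**2    # Taylor-softmax numerator (always > 0)
    return (w @ V) / w.sum(axis=1, keepdims=True)

def taylor_attention_linear(Q, K, V):
    """Linear in sequence length N: expand the polynomial and precompute
    key/value summaries once, so no N x N matrix is ever built.
    (This naive version costs O(d^3) in the feature dimension d.)"""
    N, d = Q.shape
    v_sum = V.sum(axis=0)                        # 0th-order term: (d,)
    kv = K.T @ V                                 # 1st-order term: (d, d)
    k_sum = K.sum(axis=0)                        # (d,)
    # 2nd order uses (q.k)^2 = <q (x) q, k (x) k>:
    k2v = np.einsum('nd,ne,nf->def', K, K, V)    # (d, d, d)
    k2_sum = np.einsum('nd,ne->de', K, K)        # (d, d)

    num = (v_sum
           + Q @ kv
           + 0.5 * np.einsum('nd,ne,def->nf', Q, Q, k2v))
    den = (N
           + Q @ k_sum
           + 0.5 * np.einsum('nd,ne,de->n', Q, Q, k2_sum))
    return num / den[:, None]

# Both variants compute the same result up to floating-point error:
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((128, 8)) * 0.1 for _ in range(3))
assert np.allclose(taylor_attention_direct(Q, K, V),
                   taylor_attention_linear(Q, K, V))
```

Because the expansion is exact for the polynomial kernel, the linear-time version matches the quadratic one exactly; the trade-off is a cost that grows with the feature dimension, which is what produces the efficiency crossover points analyzed in the paper.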
If you have any questions, want to collaborate, or just want to chat, feel free to reach out to me.