Research Group:

Multimodal Learning Technologies

Head of the Research Group:

Prof. Dr. Daniele Di Mitri

Multimodal Learning Technologies is a research area focused on developing AI systems that process and integrate multiple types of data input, such as text, images, video, audio, and sensor data, to build more versatile models. The field explores advances in deep learning architectures, notably transformers, for multimodal tasks such as image captioning, speech-to-text translation, and video understanding. Applications span industries including healthcare (e.g., medical image analysis combined with patient records), education (e.g., interactive learning tools), and entertainment (e.g., immersive virtual reality experiences). The ultimate goal is intelligent systems that can understand complex real-world scenarios by drawing on several modalities at once.
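The core idea of integrating several input types can be illustrated with a minimal "late fusion" sketch: each modality is encoded into a feature vector separately, and the vectors are then combined into a joint representation for a downstream model. The toy encoders and function names below are illustrative assumptions, not the group's actual methods or any specific library API.

```python
def encode_text(text):
    # Toy text encoder: a fixed-size vector of simple character statistics
    # (a real system would use a pretrained language model instead).
    n = max(len(text), 1)
    return [len(text) / 100.0,
            text.count(" ") / n,
            sum(map(ord, text)) / (n * 128.0)]

def encode_image(pixels):
    # Toy image encoder: mean brightness and contrast of a flat pixel list
    # (a real system would use a convolutional or vision-transformer encoder).
    n = max(len(pixels), 1)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    return [mean / 255.0, (var ** 0.5) / 255.0]

def fuse(text, pixels):
    # Late fusion by concatenation: a downstream classifier or regressor
    # operates on this joint text+image representation.
    return encode_text(text) + encode_image(pixels)

joint = fuse("a dog on a beach", [120, 130, 125, 140])
```

In practice the per-modality encoders are learned networks and the fusion step may be a cross-attention layer rather than plain concatenation, but the pipeline shape (encode each modality, then combine) is the same.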