MLLMs Construction Company - Investigating Multimodal LLMs’ Communicative Skills In a Collaborative Building Task
Marika Sarzotti, Giovanni Duca, Chris Madge, and 2 more authors
In Proceedings of the 11th Italian Conference on Computational Linguistics (CLiC-it 2025).
Code and data, Sep 2025
How effective are the communication choices of Multimodal Large Language Models (MLLMs) when pursuing a common goal? Can they make use of common human dialogical patterns? We address these questions by engaging two agents based on the Mistral model in a collaborative building task, in which one agent has to instruct the other on how to build a specific target structure. The aim of this work is to investigate whether different prompting techniques with varying degrees of multimodality can influence the performance of MLLM-based agents in the proposed task.