This paper introduces GAVIP, an interactive and immersive platform in which audio-visual virtual objects are controlled in real time by physical gestures, with a high degree of intermodal coherency. The focus is on two scenarios exploring the interaction between a user and the audio, visual, and spatial synthesis of a virtual world. The platform can be seen as an extended virtual musical instrument offering interaction across three modalities: audio, visual, and spatial. Intermodal coherency is therefore of particular importance in this context. The possibilities and limitations of the two developed scenarios are discussed, and future work is presented.