Audio-Visual Generation Call for Papers
https://link.springer.com/journal/11263/updates/27726868
We cordially invite you and your colleagues to submit related papers to this Special Issue.
** Special Issue "Audio-Visual Generation" **
The ability to simulate and reason about the physical world is central
to human intelligence. We perceive our surroundings and construct
mental models that allow us to internally simulate possible outcomes,
enabling reasoning, planning, and action—what we might call
“world simulators”. Similarly, developing a world simulator is
crucial for building human-like AI systems that can interact
effectively with dynamic and complex environments. Recent research has
shown that high-fidelity video generation models are a promising path
toward building such comprehensive and efficient world simulators.
However, the physical world is inherently multimodal. Human perception relies not only on visual stimuli but also on sound. Sound
often conveys critical information complementing what we can see,
providing a richer and more nuanced understanding of the
environment. To create world simulators capable of mimicking
human-like perception and reasoning, it is crucial to develop coherent
audio-visual generative models. Despite this, most modern approaches focus on vision-only or vision-language modalities, paying less attention to understanding and generating integrated audio-visual signals.
This special issue aims to spotlight the exciting yet underexplored
field of audio-visual generation as a key stepping stone towards
achieving multi-modal world simulators. Our goal is to prioritize
innovative approaches that explore this multimodal integration,
advancing both the generation and analysis of audio-visual content. In addition to these approaches, we aim to explore the broader impacts of this research. Moreover, in line with the classical concept of analysis-by-synthesis, advances in audio-visual generation can
foster improvements in analysis and understanding methods, reinforcing
the symbiotic relationship between these two areas. This research is
not merely about content creation; it holds the potential to form a
fundamental building block for more advanced, human-like AI systems.
Topics of interest: This special issue invites research articles
tackling the challenges and proposing novel creative ideas in
audio-visual generation. Potential topics of interest include, but are
not limited to:
* Audio and image/video generation
* Audio-conditional X generation
* Speech and talking avatar generation
* Advanced audio-visual adapters or interfaces
* Benchmarks and datasets
* Ethical considerations and social impact
* Generic topics and applications related to audio-visual generation
** Submission Guidelines **
Please submit via the IJCV Editorial Manager: www.editorialmanager.com/visi
Choose "SI: Audio-Visual Generation" from the dropdown.
Submitted papers should present original, unpublished work relevant
to one of the topics of the Special Issue. All submitted papers will
be evaluated on the basis of relevance, significance of contribution,
technical quality, scholarship, and quality of presentation, by at
least two independent reviewers. It is the policy of the journal that
no submission, or substantially overlapping submission, be published
or be under review at another journal or conference at any time during
the review process. Manuscripts will be subject to a peer reviewing
process and must conform to the author guidelines available on the
IJCV website at: https://www.springer.com/11263.
** Important Dates **
* Manuscript submission deadline: 15 March 2025
* First review notification: 25 May 2025
* Revised manuscript submission: 10 July 2025
* Final review notification: 10 August 2025
* Final manuscript submission: 20 September 2025
* Publication: Fall 2025
Organizers:
Tae-Hyun Oh, KAIST, South Korea
Shiqi Yang, SB Intuitions, SoftBank, Japan
Zhixiang Wang, CyberAgent AI Lab, Japan
Sergey Tulyakov, Snap Inc., USA
Stavros Petridis, Imperial College London, UK; Meta, UK
Vicky Kalogeiton, Ecole Polytechnique, IP Paris, France
Ming-Hsuan Yang, University of California, Merced, CA, USA; Yonsei University, South Korea