EchoFoley

Event-Centric Hierarchical Control for Video-Grounded Creative Sound Generation

Bingxuan Li1,2,4 · Yiming Cui1 · Yicheng He1 · Yiwei Wang3 · Shu Zhang1 · Longyin Wen1 · Yulei Niu1
1ByteDance Intelligent Creation    2University of Illinois Urbana-Champaign    3University of California, Merced    4University of California, Los Angeles

Motivation of EchoFoley: Given a silent video, we generate creative, story-aligned soundtracks with fine-grained, event-level control over how each sound is crafted and transformed over time.

Demos


We show example videos generated by EchoFoley, compared against three baseline models across a range of creative, event-centric instructions. An illustrative sketch of how such hierarchical instructions could be written down as data follows the examples.

Instruction 1 (Instance-Level Control)

Add a sound of match scratching to the ignition sound at 00:05.

Results: HunyuanVideo-Foley · MMAudio · ThinkSound · EchoFoley (ours)

Instruction 2 (Instance-Level Control)

Insert a metallic pulse explosion sound right after the finger touches the interface, when the circuit lines appear.

Results: HunyuanVideo-Foley · MMAudio · ThinkSound · EchoFoley (ours)

Instruction 3 (Instance-Level Control)

Make the golf ball sound like a rocket when it flies out.

Results: HunyuanVideo-Foley · MMAudio · ThinkSound · EchoFoley (ours)

Instruction 4 (Group-Level Control)

Make the cat first meow, then hiss, and hiss again while punching, illustrating an escalation of emotion.

Results: HunyuanVideo-Foley · MMAudio · ThinkSound · EchoFoley (ours)

Instruction 5 (Video-Level Control)

Render the entire video with a futuristic, sci-fi aesthetic.

Results: HunyuanVideo-Foley · MMAudio · ThinkSound · EchoFoley (ours)
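
To make the three control levels concrete, the short Python sketch below shows one way the instructions above could be represented as structured data, with each edit tagged by its level and an optional timestamp anchor. The ControlLevel and SoundEventEdit names, their fields, and the whole schema are illustrative assumptions for this page, not the EchoFoley interface or implementation.

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ControlLevel(Enum):
    # Hypothetical labels for the three levels of the control hierarchy shown above.
    INSTANCE = "instance"  # edit a single sound event
    GROUP = "group"        # coordinate a sequence of related events
    VIDEO = "video"        # apply a global style to the whole soundtrack


@dataclass
class SoundEventEdit:
    # One event-centric instruction over a silent-video timeline (illustrative schema).
    level: ControlLevel
    instruction: str                 # free-form natural-language edit
    start_s: Optional[float] = None  # timestamp the edit is anchored to, if any
    end_s: Optional[float] = None    # optional end of the affected span


# Three of the demo instructions above, expressed in this hypothetical schema.
demo_instructions = [
    SoundEventEdit(ControlLevel.INSTANCE,
                   "Add a sound of match scratching to the ignition sound.",
                   start_s=5.0),
    SoundEventEdit(ControlLevel.GROUP,
                   "Cat meows, then hisses, then hisses again while punching."),
    SoundEventEdit(ControlLevel.VIDEO,
                   "Render the entire video with a futuristic, sci-fi aesthetic."),
]

# Print each edit with its level and timing so the hierarchy can be inspected.
for edit in demo_instructions:
    anchor = f"@{edit.start_s:.1f}s" if edit.start_s is not None else "(untimed)"
    print(f"[{edit.level.value:>8}] {anchor} {edit.instruction}")

In a full system such edits would presumably be grounded to detected video events rather than raw timestamps; the loop at the end only prints the level, timing, and instruction text of each entry.
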

BibTeX


@article{li2025echofoley,
title={EchoFoley: Event-Centric Hierarchical Control for Video-Grounded Creative Sound Generation},
author={Li, Bingxuan and Cui, Yiming and He, Yicheng and Wang, Yiwei and Zhang, Shu and Wen, Longyin and Niu, Yulei},
journal={arXiv preprint arXiv:2512.24731},
year={2025}
}