InstructAV2AV: Instruction-Guided Audio-Video Joint Editing

Representative Application Scenarios

Instruction: Keep the person’s identity and change the spoken words to "This is more than just art, it’s a statement."

Input

Output

Instruction: Change the man into a young woman with brown hair, wearing a gray blazer over a light pink top and a necklace with a heart-shaped pendant, and saying, "I really think we should give it another chance."

Input

Output

Instruction: Add a dark vintage sedan driving from the right to the left.

Input

Output

Instruction: Remove the chipmunk standing on the stone surface among the peanuts.

Input

Output

Additional Application Scenarios

Instruction: Keep the person’s appearance, change the timbre to a man, and change the spoken words to "I understand, but I think we need to consider."

Input

Output

Instruction: Keep the spoken content, and change the man to a woman dressed in a dark blazer over a red sweater.

Input

Output

Instruction: Keep the timbre, change the woman to a red-haired woman wearing a white shirt, and change the spoken words to "I came here to tell you that you should to go."

Input

Output

Comparison with State-of-the-Art Methods

Instruction: Change the man into a woman with long dark hair, wearing a red and black checkered shirt, and saying, "Wait, what? That can't be right... Oh no, did I just miss the deadline now?"

Input

AvED

CoherentAVEdit

AVI-Edit

InstructAV2AV

Instruction: Make the horse dark brown with a white saddle.

Input

AvED

CoherentAVEdit

AVI-Edit

InstructAV2AV

Citation

@article{InstructAV2AV,
      title={InstructAV2AV: Instruction-Guided Audio-Video Joint Editing},
      author={Zheng, Haojie and Yang, Yixin and Yang, Siqi and Weng, Shuchen and Shi, Boxin},
      journal={arXiv preprint arXiv:2605.18467},
      year={2026}
}
