https://www.robots.ox.ac.uk/~vgg/software/lipsync/

在视频中消除音频和视觉流之间的时间延迟,以及

确定在视频中的多个面孔中谁在说话。

Dependencies

git clone https://github.com/joonson/syncnet_python.git
conda create -n syncnet_python python==3.10
conda activate syncnet_python
pip install -r requirements.txt

requirements.txt

numpy==1.26.4
opencv-contrib-python==4.9.0.80
opencv-python==4.9.0.80
torch==2.2.2
torchvision==0.17.2
scipy==1.15.2
scenedetect
python_speech_features

Download model

./download_model.sh

Run