🎄Use AI voice on your tutorial videos🌟

Dec 17 2023


If for whatever reason you don't like your voice, or you want to make videos programmatically without recording yourself, you can use an AI voice to generate the voiceover for your videos.

Voice APIs 🎤

There are multiple voice APIs available.

Most of them have similar pricing, and some have free tiers. Their APIs and UX are also similar: you send text and get an audio file back.

They compete on quality (how natural the voice sounds), supported languages, and speed.


Let's take a look at an example of how to do it with the Azure API.

In Python, Azure provides an SDK client:

import azure.cognitiveservices.speech as speechsdk

The simplest way to use it is to create a synthesizer that writes to a file and call its speak_text_async method:

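# Replace the placeholders with your Azure Speech key and region.
speech_config = speechsdk.SpeechConfig(subscription="your-subscription-key", region="your-region")
# Write the audio to a file instead of playing it on the default speaker.
audio_config = speechsdk.audio.AudioOutputConfig(filename="file.wav")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
# speak_text_async returns a future; .get() blocks until synthesis has finished.
synthesizer.speak_text_async("Hello world").get()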

Aligning video and audio

The problem with the above approach is that you get a single audio file, but you need to align it with your video.

There are two ways to solve it:

  1. Split your script into multiple pieces and get multiple audio files back.

Then place each audio file at the correct time.

  2. Use bookmarks. Add bookmark markers to your script, and the text-to-speech service will give you timecodes for them.

Then you can place videos or other files at the correct timecodes.
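
The bookmark approach could look roughly like the sketch below. It assumes the Azure Speech SDK, where bookmarks are written as SSML `<bookmark mark="..."/>` elements and the synthesizer's `bookmark_reached` event reports an audio offset in 100-nanosecond ticks. The helper names (`build_ssml`, `ticks_to_seconds`, `synthesize_with_bookmarks`), the mark names, and the voice `en-US-JennyNeural` are illustrative choices, not part of the original example.

```python
from xml.sax.saxutils import escape, quoteattr

# Assumption: Azure reports bookmark offsets in 100-nanosecond ticks.
TICKS_PER_SECOND = 10_000_000


def build_ssml(segments, voice="en-US-JennyNeural", lang="en-US"):
    """Build SSML with a <bookmark/> placed before each (mark, text) segment."""
    parts = []
    for mark, text in segments:
        # escape() protects &, < and > in the spoken text.
        parts.append(f"<bookmark mark={quoteattr(mark)}/>{escape(text)}")
    body = " ".join(parts)
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}"><voice name="{voice}">{body}</voice></speak>'
    )


def ticks_to_seconds(ticks):
    """Convert an audio offset in ticks to seconds."""
    return ticks / TICKS_PER_SECOND


def synthesize_with_bookmarks(ssml, key, region, filename="file.wav"):
    """Synthesize SSML to a file and return {mark: seconds}.

    Needs the Azure SDK and a valid key, so it is not called in this sketch.
    """
    # Imported lazily so the pure helpers above work without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioOutputConfig(filename=filename)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config
    )

    timecodes = {}

    def on_bookmark(evt):
        # evt.text is the mark name, evt.audio_offset is the position in ticks.
        timecodes[evt.text] = ticks_to_seconds(evt.audio_offset)

    synthesizer.bookmark_reached.connect(on_bookmark)
    synthesizer.speak_ssml_async(ssml).get()  # wait until synthesis finishes
    return timecodes


ssml = build_ssml([("intro", "Hello world."), ("demo", "Now let's open the editor.")])
print(ssml)
```

Calling `synthesize_with_bookmarks(ssml, "your-subscription-key", "your-region")` should then return a dictionary of mark names and their offsets in seconds, which you can use as cut points for the video.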

How to stitch audio and video together

There are multiple ways to do it, but the easiest way is to use ffmpeg if you want to do it programmatically.

ffmpeg -i video.mp4 -i audio.wav -c:v copy -c:a aac -map 0:v:0 -map 1:a:0 output.mp4

Or, on a Mac, you can use iMovie if you prefer drag and drop.
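
The ffmpeg command above attaches one audio file that starts at the beginning of the video. If you have several audio clips that must start at specific times, one option is to delay each clip with ffmpeg's `adelay` filter and mix the results with `amix`. The sketch below only builds the command; the function name `build_ffmpeg_command`, the file names, and the start times are made up for illustration, and the `all=1` and `normalize=0` options assume a reasonably recent ffmpeg build.

```python
import shlex


def build_ffmpeg_command(video, clips, output="output.mp4"):
    """Build an ffmpeg command that places audio clips at given start times.

    clips is a list of (audio_path, start_seconds) pairs.
    """
    if not clips:
        raise ValueError("need at least one audio clip")

    cmd = ["ffmpeg", "-i", video]
    filters = []
    labels = []
    for index, (path, start) in enumerate(clips):
        cmd += ["-i", path]
        delay_ms = round(start * 1000)  # adelay takes milliseconds
        label = f"[a{index}]"
        # Input 0 is the video, so audio clip N is ffmpeg input N + 1.
        filters.append(f"[{index + 1}:a]adelay={delay_ms}:all=1{label}")
        labels.append(label)

    # Mix all delayed clips into one track without lowering their volume.
    filters.append(f"{''.join(labels)}amix=inputs={len(clips)}:normalize=0[aout]")

    cmd += [
        "-filter_complex", ";".join(filters),
        "-map", "0:v:0", "-map", "[aout]",
        "-c:v", "copy", "-c:a", "aac",
        output,
    ]
    return cmd


command = build_ffmpeg_command("video.mp4", [("intro.wav", 0), ("demo.wav", 2.5)])
print(shlex.join(command))
```

To actually run it, pass the list to `subprocess.run(command, check=True)`. The start times can come from your own scene timings or from bookmark timecodes.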