How to Convert Speech to Text in Python - Python Code (2023)

Abdou Rockikz · 7 min read · Updated Aug 2022 · Machine Learning · Application Programming Interfaces

Disclosure: This post may contain affiliate links, meaning when you click the links and make a purchase, we receive a commission.

Speech recognition is the ability of computer software to identify words and phrases in spoken language and convert them to human-readable text. In this tutorial, you will learn how you can convert speech to text in Python using the SpeechRecognition library.

With this library, we don't need to build any machine learning model from scratch: it provides convenient wrappers for various well-known public speech recognition APIs (such as Google Cloud Speech API, IBM Speech To Text, etc.).

Note that if you do not want to use APIs and would rather perform inference directly on machine learning models, then definitely check this tutorial, in which I'll show you how to use the current state-of-the-art machine learning model to perform speech recognition in Python.


Alright, let's get started by installing the library (along with pydub, which we'll use later) using pip:

pip3 install SpeechRecognition pydub

Okay, open up a new Python file and import it:

import speech_recognition as sr

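The nice thing about this library is that it supports several recognition engines, including CMU Sphinx (which works offline), Google Speech Recognition, Google Cloud Speech API, Wit.ai, Houndify, and IBM Speech to Text.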

We're going to use Google Speech Recognition here, as it's straightforward and doesn't require an API key.

Reading from a File

Make sure you have an audio file in the current directory that contains English speech (if you want to follow along with me, get the audio file here):

filename = "16-122828-0002.wav"

This file was taken from the LibriSpeech dataset, but you can use any WAV audio file you want; just change the file name. Now let's initialize our speech recognizer:

# initialize the recognizer
r = sr.Recognizer()

The code below loads the audio file and converts the speech into text using Google Speech Recognition:

# open the file
with sr.AudioFile(filename) as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    # recognize (convert from speech to text)
    text = r.recognize_google(audio_data)
    print(text)

This will take a few seconds to finish, as it uploads the file to Google and retrieves the output. Here is my result:

I believe you're just talking nonsense

The above code works well for small and medium-sized audio files. In the next section, we're going to write code for large files.


Reading Large Audio Files

If you want to perform speech recognition on a long audio file, the function below handles that quite well:

# importing libraries
import speech_recognition as sr
import os
from pydub import AudioSegment
from pydub.silence import split_on_silence

# create a speech recognition object
r = sr.Recognizer()

# a function that splits the audio file into chunks
# and applies speech recognition
def get_large_audio_transcription(path):
    """
    Splitting the large audio file into chunks
    and apply speech recognition on each of these chunks
    """
    # open the audio file using pydub
    sound = AudioSegment.from_wav(path)
    # split audio sound where silence is 500 milliseconds or more and get chunks
    chunks = split_on_silence(sound,
        # experiment with this value for your target audio file
        min_silence_len = 500,
        # adjust this per requirement
        silence_thresh = sound.dBFS-14,
        # keep 500 ms of silence around each chunk, adjustable as well
        keep_silence=500,
    )
    folder_name = "audio-chunks"
    # create a directory to store the audio chunks
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)
    whole_text = ""
    # process each chunk
    for i, audio_chunk in enumerate(chunks, start=1):
        # export audio chunk and save it in
        # the `folder_name` directory.
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        # recognize the chunk
        with sr.AudioFile(chunk_filename) as source:
            audio_listened = r.record(source)
            # try converting it to text
            try:
                text = r.recognize_google(audio_listened)
            except sr.UnknownValueError as e:
                print("Error:", str(e))
            else:
                text = f"{text.capitalize()}. "
                print(chunk_filename, ":", text)
                whole_text += text
    # return the text for all chunks detected
    return whole_text

Note: You need to install Pydub using pip for the above code to work.

The above function uses the split_on_silence() function from the pydub.silence module to split the audio data into chunks wherever there is silence. The min_silence_len parameter is the minimum length of silence, in milliseconds, to be used for a split.

silence_thresh is the threshold (in dBFS) below which anything quieter is considered silence; I have set it to the average dBFS of the audio minus 14. The keep_silence argument is the amount of silence, in milliseconds, to leave at the beginning and end of each detected chunk.

These parameters won't be perfect for all sound files, so experiment with them to suit your own large audio files.
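To make these three parameters more concrete, here is a simplified, pure-Python sketch of silence-based splitting. It is not pydub's implementation: the names dbfs and split_on_silence_sketch are hypothetical. The sketch assumes 16-bit mono samples given as a list of integers and classifies the audio in 1 ms windows.

```python
import math
from typing import List


def dbfs(samples: List[int], max_amplitude: int = 32768) -> float:
    """Loudness of a block of 16-bit samples in dBFS (0 dBFS = full scale)."""
    if not samples:
        return -math.inf
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return -math.inf  # digital silence
    return 20 * math.log10(rms / max_amplitude)


def split_on_silence_sketch(samples: List[int], sample_rate: int,
                            min_silence_len: int, silence_thresh: float,
                            keep_silence: int) -> List[List[int]]:
    """Hypothetical, simplified splitter (times in ms, threshold in dBFS)."""
    step = max(1, sample_rate // 1000)  # samples per 1 ms window
    n_ms = len(samples) // step         # a trailing partial window is ignored
    # 1) mark each 1 ms window as silent if it is quieter than the threshold
    silent = [dbfs(samples[i * step:(i + 1) * step]) < silence_thresh
              for i in range(n_ms)]
    # 2) collect runs of silence lasting at least min_silence_len ms
    runs, start = [], None
    for ms, is_silent in enumerate(silent + [False]):  # sentinel closes a trailing run
        if is_silent and start is None:
            start = ms
        elif not is_silent and start is not None:
            if ms - start >= min_silence_len:
                runs.append((start, ms))
            start = None
    # 3) audio between long silences becomes a chunk, padded with up to
    #    keep_silence ms of the surrounding silence on each side
    chunks, prev_end = [], 0
    for run_start, run_end in runs + [(n_ms, n_ms)]:
        if run_start > prev_end:
            lo = max(0, prev_end - keep_silence)
            hi = min(n_ms, run_start + keep_silence)
            chunks.append(samples[lo * step:hi * step])
        prev_end = run_end
    return chunks
```

As in the tutorial, the threshold can be derived from the clip's own loudness, for example silence_thresh = dbfs(samples) - 14. Silences shorter than min_silence_len stay inside a chunk instead of splitting it.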

After that, we iterate over all the chunks, convert the speech in each one into text, and concatenate the results. Here is an example run:

path = "7601-291468-0006.wav"
print("\nFull text:", get_large_audio_transcription(path))

Note: You can get 7601-291468-0006.wav file here.


audio-chunks\chunk1.wav : His abode which you had fixed in a bowery or country seat.
audio-chunks\chunk2.wav : At a short distance from the city.
audio-chunks\chunk3.wav : Just at what is now called dutch street.
audio-chunks\chunk4.wav : Sooner bounded with proofs of his ingenuity.
audio-chunks\chunk5.wav : Patent smokejacks.
audio-chunks\chunk6.wav : It required a horse to work some.
audio-chunks\chunk7.wav : Dutch oven roasted meat without fire.
audio-chunks\chunk8.wav : Carts that went before the horses.
audio-chunks\chunk9.wav : Weather cox that turned against the wind and other wrongheaded contrivances.
audio-chunks\chunk10.wav : So just understand can found it all beholders.

Full text: His abode which you had fixed in a bowery or country seat. At a short distance from the city. Just at what is now called dutch street. Sooner bounded with proofs of his ingenuity. Patent smokejacks. It required a horse to work some. Dutch oven roasted meat without fire. Carts that went before the horses. Weather cox that turned against the wind and other wrongheaded contrivances. So just understand can found it all beholders.

So, this function automatically creates a folder, saves the chunks of the original audio file there, and then runs speech recognition on each of them.
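The loop inside get_large_audio_transcription() can also be written so that the recognizer is passed in as a function. This makes the chunk-joining logic easy to try out offline. The following is a hypothetical refactoring, not part of SpeechRecognition: join_transcripts and UnintelligibleChunk are illustrative names. In the real function, the recognizer would be r.recognize_google and the exception would be sr.UnknownValueError.

```python
from typing import Callable, Iterable, TypeVar

Chunk = TypeVar("Chunk")


class UnintelligibleChunk(Exception):
    """Stand-in for sr.UnknownValueError in this sketch."""


def join_transcripts(chunks: Iterable[Chunk],
                     recognize: Callable[[Chunk], str]) -> str:
    """Recognize each chunk and join the results into sentence-like text."""
    whole_text = ""
    for i, chunk in enumerate(chunks, start=1):
        try:
            text = recognize(chunk)
        except UnintelligibleChunk:
            # same policy as the tutorial: report the chunk and move on
            print(f"chunk{i}: could not understand audio")
            continue
        # capitalize the first letter and end each chunk with a period
        whole_text += f"{text.capitalize()}. "
    return whole_text
```

With a stub recognizer that returns fixed strings, you can check the joining behavior (capitalization, trailing ". ", and skipping failed chunks) without any network access.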

Reading from the Microphone

This requires PyAudio to be installed on your machine. Here is the installation process for each operating system:


On Windows, you can just pip install it:

pip3 install pyaudio


On Linux (Debian/Ubuntu), you first need to install the dependencies:

sudo apt-get install python-pyaudio python3-pyaudio
pip3 install pyaudio


On macOS, you first need to install PortAudio; then you can just pip install it:

brew install portaudio
pip3 install pyaudio

Now let's use our microphone to convert our speech:

with sr.Microphone() as source:
    # read the audio data from the default microphone
    audio_data = r.record(source, duration=5)
    print("Recognizing...")
    # convert speech to text
    text = r.recognize_google(audio_data)
    print(text)

This will listen to your microphone for 5 seconds and then try to convert that speech into text!


It is pretty similar to the previous code, but here we use the Microphone() object to read the audio from the default microphone. We pass the duration parameter to the record() function so that it stops reading after 5 seconds, and then we upload the audio data to Google to get the output text.

You can also use the offset parameter in the record() function to start recording after offset seconds.
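Conceptually, offset skips the first seconds of audio and duration caps how much is read. The sketch below shows the same idea on a plain WAV file using Python's built-in wave module. A time in seconds becomes a frame index when you multiply it by the sample rate. read_segment is a hypothetical helper for illustration only; it does not show how SpeechRecognition implements record().

```python
import wave


def read_segment(wav_file, offset: float = 0.0, duration: float = None) -> bytes:
    """Return raw PCM frames starting `offset` s in, lasting `duration` s.

    `wav_file` may be a path or a binary file object. If duration is None,
    everything after the offset is returned.
    """
    with wave.open(wav_file, "rb") as wf:
        rate = wf.getframerate()
        # seconds -> frame index, clamped to the end of the file
        wf.setpos(min(int(offset * rate), wf.getnframes()))
        if duration is None:
            n_frames = wf.getnframes() - wf.tell()
        else:
            n_frames = int(duration * rate)
        # readframes() returns fewer frames if the file ends first
        return wf.readframes(n_frames)
```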

You can also recognize different languages by passing the language parameter to the recognize_google() function. For instance, to recognize Spanish speech, you would use:

text = r.recognize_google(audio_data, language="es-ES")

Check out supported languages in this StackOverflow answer.


As you can see, this library makes it easy to convert speech to text. It is widely used in the wild; check out the official documentation.

If you want to convert text to speech in Python as well, check this tutorial.

Finally, if you're a beginner and want to learn Python, I suggest you take the Python For Everybody Coursera course, in which you'll learn a lot about Python. You can also check our resources and courses page to see the Python resources I recommend on various topics!


Happy Coding ♥



How do I convert audio to text in Python?

Convert an audio file into text

Import the SpeechRecognition library, initialize the Recognizer class, and pass the audio to a recognition engine; here we use Google Speech Recognition. Audio file formats supported by SpeechRecognition: WAV, AIFF, AIFF-C, and FLAC.

How do I convert spoken words to text?

Dictating text
  1. Open Speech Recognition by clicking the Start button. ...
  2. Say "start listening" or click the Microphone button to start the listening mode.
  3. Open the program you want to use or select the text box you want to dictate text into.
  4. Say the text that you want to dictate.

How do you code speech recognition in Python?

Code:

# import library
import speech_recognition as sr
# Initialize recognizer class (for recognizing the speech)
r = sr.Recognizer()
# Reading Audio file as source
# listening the audio file and store in audio_text variable
# (`filename` is the path of the WAV file to transcribe)
with sr.AudioFile(filename) as source:
    audio_text = r.listen(source)

What is PyAudio in Python?

PyAudio provides Python bindings for PortAudio v19, the cross-platform audio I/O library. With PyAudio, you can easily use Python to play and record audio on a variety of platforms, such as GNU/Linux, Microsoft Windows, and Apple macOS. PyAudio is distributed under the MIT License.

How can I convert audio to text online for free?

How to Transcribe MP3 to Text:
  1. Upload an MP3 file. Upload your MP3 file to VEED. ...
  2. Convert to text. Under Subtitles, click on 'Auto Transcribe', select your preferred language, and you're done! ...
  3. Download your text file.

Is there a free program that converts audio to text?

Dictation is a free and simple tool that offers fast conversion of audio to text.

Is there an app that turns speech into text?

Dictation - Speech to text allows you to dictate, record, translate and transcribe text instead of typing. It uses the latest speech-to-text voice recognition technology, and its main purpose is speech to text and translation for text messaging. Never type any text; just dictate and translate using your speech!

Is Python good for speech recognition?

Speech recognition allows computers to understand human language: it is a machine's ability to listen to spoken words and identify them. You can use speech recognition in Python to convert spoken words into text, make a query, or give a reply. You can even program some devices to respond to these spoken words.

Does Python speech recognition need Internet?

Speech-to-text conversion with Google Speech Recognition requires an active internet connection. However, there are offline recognition systems such as PocketSphinx, although their installation process is much more demanding and requires several dependencies.
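One way to combine both options is to try the online recognizer first and fall back to an offline one when the network is unavailable. In SpeechRecognition, the two calls would be r.recognize_google() and r.recognize_sphinx() (the latter needs PocketSphinx installed), and a failed request surfaces as sr.RequestError. The sketch below injects both recognizers as plain functions so that the fallback logic runs without either service. recognize_with_fallback and NetworkError are hypothetical names.

```python
from typing import Callable, Tuple


class NetworkError(Exception):
    """Stand-in for sr.RequestError (the online API could not be reached)."""


def recognize_with_fallback(audio,
                            online: Callable[[object], str],
                            offline: Callable[[object], str]) -> Tuple[str, str]:
    """Return (engine_used, text), preferring the online recognizer."""
    try:
        return "online", online(audio)
    except NetworkError:
        # no connection: fall back to the offline engine
        return "offline", offline(audio)
```

Returning which engine was used lets the caller log or flag transcripts that came from the offline path.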

Which algorithm is used in speech recognition?

In one of the works [10], a speech pre-processing method based on the voice activity detection (VAD) algorithm was considered, and it was shown to improve speech recognition performance.

What is pyttsx3 in Python?

pyttsx3 is a text-to-speech conversion library in Python. Unlike alternative libraries, it works offline and is compatible with both Python 2 and 3. An application invokes the pyttsx3.init() factory function to get a reference to a pyttsx3.Engine instance.

How do you program voice recognition?

Use voice recognition in Windows
  1. Select (Start) > Settings > Time & language > Speech.
  2. Under Microphone, select the Get started button.
  3. The Speech wizard window opens, and the setup starts automatically. If the wizard detects issues with your microphone, they will be listed in the wizard dialog box.

How do I get the mic input in Python?

Take voice input from the user in Python using PyAudio – speech_recognizer
  1. Take input from the mic.
  2. Convert the voice or speech to text.
  3. Store the text in a variable, or use it directly as user input.

Can Python play sounds?

Playing sound in Python is easy. There are several modules that can play a sound file (.wav), and these solutions are cross-platform (Windows, Mac, Linux).

How do I install pip?

Step 1: Download the ( file and store it in the same directory as python is installed. Step 2: Change the current path of the directory in the command line to the path of the directory where the above file exists. Step 4: Now wait through the installation process. Voila!

How do I stream audio in Python?

Audiostream is a Python extension that provides an easy-to-use API for streaming bytes to the speaker or reading an audio input stream. It uses SDL + SDL_Mixer for streaming the audio out and uses platform-api for reading audio input. This extension works on Android and iOS as well.
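Audiostream targets playback and capture on mobile platforms. If you only need to process a long WAV file piece by piece instead of loading it all into memory, a generator over Python's built-in wave module is a simple alternative. This is a different technique from Audiostream, and iter_wav_blocks is a hypothetical name.

```python
import wave
from typing import Iterator


def iter_wav_blocks(wav_file, block_seconds: float = 1.0) -> Iterator[bytes]:
    """Yield consecutive blocks of raw PCM frames, block_seconds long each.

    `wav_file` may be a path or a binary file object; the last block may be shorter.
    """
    with wave.open(wav_file, "rb") as wf:
        frames_per_block = max(1, int(block_seconds * wf.getframerate()))
        while True:
            block = wf.readframes(frames_per_block)
            if not block:  # empty bytes means end of file
                break
            yield block
```

Because only one block is held at a time, memory use stays roughly constant regardless of the file's length.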

How do I transcribe an audio file?

How to Transcribe Audio to Text
  1. Upload Your Audio File. ...
  2. Choose Custom Transcription Options. ...
  3. Receive & Download Your Text File. ...
  4. Set Up. ...
  5. Find Your Shorthand. ...
  6. Write What You Hear. ...
  7. Edit Your Text File. ...
  8. Export the Correct File.

How can I translate audio to English?

Translate with a microphone
  1. Give your browser permission to use your microphone and check your microphone settings on your browser. ...
  2. On your computer, go to Google Translate.
  3. Choose the languages to translate to and from. ...
  4. At the bottom, click the Microphone .
  5. Speak the word or phrase you want to translate.

How long does it take to transcribe 1 hour of audio?

For a professional transcriptionist, the average time to transcribe one hour of audio ranges from 2 to 3 hours. Some of the most qualified transcriptionists can transcribe up to 30 minutes of audio in an hour.

How can I transcribe faster?

Let's have a look at these tips to get a faster transcription.
  1. Make use of an Autocorrect Tool.
  2. Practice Typing to perfection.
  3. Make use of a high-quality, noise-cancelling headset.
  4. A comfortable and quiet environment.
  5. Type Smartly.
  6. Get your hands on a good transcribing software.
  7. Take Breaks.
  8. The final word.

Is the Live Transcribe app free?

Live Transcribe is easy to use, all you need is a Wi-Fi or network connection. It's free of charge to download on over 1.8 billion Android devices operating on 5.0 Lollipop and above.

Who invented voice-to-text?

In 1990, the company Dragon released Dragon Dictate, which was the world's first voice recognition system for consumers. In 1997, they improved it and developed Dragon NaturallySpeaking. With this solution, users could speak 100 words per minute. In 1996, the first voice-activated portal (VAL) was made by BellSouth.

Does Google have speech to text?

Start voice typing in a document

Open a document in Google Docs with a Chrome browser. Click Tools > Voice typing. A microphone box appears. When you're ready to speak, click the microphone.

Is Dragon Anywhere free?

Dragon Anywhere, available on Android and iOS

$15/mo subscription begins at end of trial.

How do I convert voice to text on my computer?

How to start voice typing
  1. Press Windows logo key + H on a hardware keyboard.
  2. Press the microphone key next to the Spacebar on the touch keyboard.

Is Dragon dictation free?

Download your one-week FREE TRIAL now! Trial converts to a monthly ($14.99) or annual ($149.99) subscription. Paperwork doesn't end when you're away from your desk. Dragon Anywhere is the only mobile dictation app that enables continuous dictation of documents, with no length or time limits.

How do I convert an audio file to WAV in Python?

MP3 to WAV conversion

from pydub import AudioSegment

# files
src = "transcript.mp3"
dst = "test.wav"

# convert mp3 to wav
sound = AudioSegment.from_mp3(src)
sound.export(dst, format="wav")

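What is SR in Python?

In this tutorial, sr is simply the alias under which the speech_recognition module is imported (import speech_recognition as sr). The library is cited as: Speech Recognition (Version 2.1) [Software]. Available from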

How do I use Google speech API in Python?

Using the Speech-to-Text API with Python
  1. Overview.
  2. Setup and requirements.
  3. Enable the API.
  4. Authenticate API requests.
  5. Install the client library.
  6. Start Interactive Python.
  7. Transcribe audio files.
  8. Get word timestamps.

What does speech recognition do?

Speech recognition software can translate spoken words into text using closed captions to enable a person with hearing loss to understand what others are saying. Speech recognition can also enable those with limited use of their hands to work with computers, using voice commands instead of typing.

How do I read an audio file in Python?

wave.open(): this function from Python's built-in wave module opens a file to read or write audio data. It takes two parameters: first the file name, and second the mode. The mode can be 'wb' for writing audio data or 'rb' for reading.
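Here is a minimal round trip with the wave module: write 16-bit mono samples in 'wb' mode, then read them back in 'rb' mode. An in-memory io.BytesIO buffer stands in for a file name so the example leaves nothing on disk. write_and_read_back is a hypothetical helper.

```python
import io
import struct
import wave
from typing import List


def write_and_read_back(samples: List[int], sample_rate: int = 16000) -> List[int]:
    """Write 16-bit mono samples to an in-memory WAV ('wb'), then read them back ('rb')."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 2 bytes = 16-bit samples
        wf.setframerate(sample_rate)
        # WAV stores PCM samples little-endian, hence "<h"
        wf.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    buf.seek(0)  # rewind before reading
    with wave.open(buf, "rb") as wf:
        raw = wf.readframes(wf.getnframes())
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))
```

With a real file, you would pass a path such as "test.wav" to wave.open() instead of the buffer.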

How do I import an audio file into Python?

PyDub is a helpful library for making sure all of your audio files are in the right shape for transcription. The topics to learn are:
  1. Introduction to PyDub. ...
  2. Import an audio file with PyDub.
  3. Play an audio file with PyDub.
  4. Audio parameters with PyDub.
  5. Adjusting audio parameters.
  6. Manipulating audio files with PyDub.

How do I find the sample rate of a WAV file in Python?

We will get the sample rate and the data in the form of an array as the output. OUTPUT: 43100,([[-1, -2], [ 1, 1], [-4, -3], ..., [ 4, -2], [-4, 2], [ 4, -1]],) The 1st value is the sample rate followed by the data of the provided wave file.
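If you only need the sample rate and not the samples, Python's built-in wave module can read it from the file header without extra dependencies. wav_info is a hypothetical helper; the duration is the number of frames divided by the sample rate.

```python
import wave


def wav_info(wav_file) -> dict:
    """Read basic header fields from a PCM WAV file (path or file object)."""
    with wave.open(wav_file, "rb") as wf:
        rate = wf.getframerate()
        frames = wf.getnframes()
        return {
            "sample_rate": rate,                     # frames per second
            "channels": wf.getnchannels(),           # 1 = mono, 2 = stereo
            "sample_width_bytes": wf.getsampwidth(),
            "duration_s": frames / rate,             # frames / (frames per second)
        }
```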

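What is Recognizer() in Python?

Recognizer is the class in the SpeechRecognition library that does the recognition work: you create an instance with r = sr.Recognizer() and call methods such as record() and recognize_google() on it. More generally, speech recognition is a machine's ability to listen to spoken words and identify them. You can then use speech recognition in Python to convert the spoken words into text, make a query or give a reply. You can even program some devices to respond to these spoken words.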

Which algorithm is best for speech recognition?

Two popular sets of features, often used in the analysis of the speech signal are the Mel frequency cepstral coefficients (MFCC) and the linear prediction cepstral coefficients (LPCC). The most popular recognition models are vector quantization (VQ), dynamic time warping (DTW), and artificial neural network (ANN) [3].
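To give a feel for one of these models, here is a textbook dynamic time warping (DTW) distance in pure Python. DTW compares two sequences that may be spoken at different speeds by finding the cheapest alignment between them. For simplicity, each frame here is a single number. In a real recognizer, each frame would be a feature vector such as MFCCs, and the absolute difference would become a vector distance.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Minimum total cost of aligning sequence a with sequence b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = cheapest alignment of a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the best of: stretch a, stretch b, or advance both
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Because a frame may be matched to several frames of the other sequence, a slowed-down copy of a word (for example [1, 2, 2, 3] versus [1, 2, 3]) still has distance 0.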

Is Google Cloud speech API free?

Google Cloud Speech API is a platform-as-a-service that enables developers to create applications that can process and recognize natural language. It is not entirely free: Google offers a limited free monthly usage tier, and usage beyond it is billed.

How do I use Google API speech to text?

Before you begin
  1. Enable Speech-to-Text on a GCP project. Make sure billing is enabled for Speech-to-Text. Create and/or assign one or more service accounts to Speech-to-Text. ...
  2. Set your authentication environment variable.
  3. (Optional) Create a new Google Cloud Storage bucket to store your audio data.

Is Siri an AI?

Siri is Apple's personal assistant for iOS, macOS, tvOS and watchOS devices that uses voice recognition and is powered by artificial intelligence (AI).

Who invented speech recognition?

In 1952, the first voice recognition device was created by Bell Laboratories and they called it (her) 'Audrey'. 'Audrey' was ground-breaking technology as she could recognize digits spoken by a single voice; a massive step forward in the digital world.

Who uses voice recognition?

It can be used by people with disabilities, for in-car systems, in the military, and also by businesses for dictation, or to convert audio and video files into text. Voice recognition software can also be used in customer service to process routine phone requests, or in healthcare and legal for documentation processes.


Article information

Author: Pres. Lawanda Wiegand

Last Updated: 31/03/2023
