Use Python to capture spoken input, transcribe it to text, and hold a live dialogue with ChatGPT.
Dependencies
Install the required packages before execution (PyAudio is what sr.Microphone uses to access the microphone):
pip install SpeechRecognition openai pocketsphinx PyAudio
Verify successful installation by importing them in a Python session:
import speech_recognition as sr
import openai
Workflow Overview
The implementation captures audio from a microphone, converts speech to text via offline recognition, sends the transcript to the OpenAI chat completion endpoint, and outputs the model's reply.
Core Implementation
main_voice_chat.py:
import speech_recognition as sr
from openai import OpenAI

recognizer = sr.Recognizer()
gpt_client = OpenAI(api_key="YOUR_API_KEY")

while True:
    try:
        # Record one utterance from the default microphone.
        with sr.Microphone() as mic:
            print("Speak now...")
            captured_audio = recognizer.listen(mic)
        try:
            # Transcribe offline with CMU Sphinx, then send the text to the model.
            spoken_text = recognizer.recognize_sphinx(captured_audio)
            print(f"Recognized phrase: {spoken_text}")
            gpt_response = gpt_client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": spoken_text}]
            )
            print(gpt_response.choices[0].message.content)
        except sr.UnknownValueError:
            print("Unable to decode speech.")
        except sr.RequestError as err:
            print(f"Recognition service error: {err}")
    except KeyboardInterrupt:
        # Ctrl+C ends the conversation loop.
        print("Exiting.")
        break
Mechanism Details
Audio Capture and Transcription
recognizer = sr.Recognizer()
with sr.Microphone() as mic:
    print("Speak now...")
    captured_audio = recognizer.listen(mic)
sr.Microphone() opens the default microphone. recognizer.listen() then records from it until it detects a pause in speech and returns the recording as an AudioData object.
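By default, listen() waits indefinitely for speech to start. A minimal sketch of bounding that wait is shown below; it relies on the timeout and phrase_time_limit parameters of listen() and on sr.WaitTimeoutError. The helper names listen_once and demo, and the 5 and 15 second values, are illustrative assumptions rather than part of the original script.

```python
def listen_once(recognizer, source, timeout=5.0, phrase_time_limit=15.0):
    """Record one utterance, giving up if nobody speaks within `timeout` seconds.

    `recognizer` is expected to be an sr.Recognizer and `source` an open
    sr.Microphone; `phrase_time_limit` caps how long one utterance may run.
    """
    return recognizer.listen(
        source, timeout=timeout, phrase_time_limit=phrase_time_limit
    )


def demo():
    """Hypothetical entry point; needs SpeechRecognition, PyAudio, and a microphone."""
    import speech_recognition as sr  # imported lazily so listen_once has no hard dependency

    recognizer = sr.Recognizer()
    with sr.Microphone() as mic:
        print("Speak now...")
        try:
            audio = listen_once(recognizer, mic)
        except sr.WaitTimeoutError:
            # Raised by listen() when no speech starts before the timeout.
            print("No speech detected, try again.")
        else:
            print(f"Captured {len(audio.get_raw_data())} bytes of audio.")
```

Calling demo() on a machine with a microphone attached exercises the helper once; in the main loop, listen_once(recognizer, mic) could stand in for the bare recognizer.listen(mic) call.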
Chat Completion Request
spoken_text = recognizer.recognize_sphinx(captured_audio)
gpt_response = gpt_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": spoken_text}]
)
print(gpt_response.choices[0].message.content)
Speech is transcribed using CMU Sphinx (offline). The resulting string becomes the user message payload for the GPT model, which returns a conversational reply.
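Because each request above carries only the latest transcript, the model has no memory of earlier turns. One possible way to keep a running dialogue is to resend recent exchanges with every request, as sketched below. The helper names build_messages and record_turn, the system prompt text, and the max_turns cap are all assumptions made for illustration.

```python
def build_messages(history, user_text,
                   system_prompt="You are a concise voice assistant."):
    """Return the message list for one request.

    Order: system prompt, earlier turns from `history`, then the new user turn.
    """
    return (
        [{"role": "system", "content": system_prompt}]
        + list(history)
        + [{"role": "user", "content": user_text}]
    )


def record_turn(history, user_text, reply_text, max_turns=10):
    """Append one user/assistant exchange and keep only the newest `max_turns`."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": reply_text})
    # Each exchange is two messages, so trim anything beyond 2 * max_turns.
    excess = len(history) - 2 * max_turns
    if excess > 0:
        del history[:excess]
    return history
```

In the loop, build_messages(history, spoken_text) would be passed as the messages argument, and record_turn(history, spoken_text, reply) called once the reply text has been read from gpt_response. Capping the history keeps the request size bounded during long sessions.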
Execution Notes
Save the script and run it in an environment with internet access. Replace YOUR_API_KEY with a valid OpenAI secret key. For production use, avoid hardcoding credentials; load them securely from environment variables or configuration files.
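One way to follow the advice about not hardcoding credentials is to read the key from an environment variable. The sketch below assumes the variable is named OPENAI_API_KEY, which is also the name the official openai client looks for when no api_key argument is passed. The helper name load_api_key is illustrative.

```python
import os


def load_api_key(env=None, var_name="OPENAI_API_KEY"):
    """Return the API key from the environment, or fail with a clear message.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running the script."
        )
    return key
```

The client could then be created with OpenAI(api_key=load_api_key()), so the key never appears in the source file.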
Accuracy Considerations
- Perform voice capture in low-noise environments to improve transcription reliability.
- Adjust microphone gain and distance if recognition quality degrades.
- Handle API authentication securely to prevent leakage of sensitive tokens.
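When a quiet room is not available, the SpeechRecognition library can sample background noise and adjust its energy threshold before recording. The following is a minimal sketch using adjust_for_ambient_noise() and the energy_threshold attribute; the helper name calibrate and the min_threshold floor are assumptions chosen for illustration.

```python
def calibrate(recognizer, source, seconds=1.0, min_threshold=300):
    """Sample background noise and return the resulting energy threshold.

    Call inside the `with sr.Microphone() as mic:` block, before listen().
    `min_threshold` is an illustrative floor, not a tuned value.
    """
    recognizer.adjust_for_ambient_noise(source, duration=seconds)
    # Keep the threshold from dropping so low that faint noise counts as speech.
    recognizer.energy_threshold = max(recognizer.energy_threshold, min_threshold)
    return recognizer.energy_threshold
```

Calling calibrate(recognizer, mic) once at startup is usually enough; repeating it on every turn adds about a second of delay before each recording.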