Real-Time Voice-to-ChatGPT Interaction Using Python for Conversational Interfaces

Leverage Python to capture spoken input, transcribe it into text, and engage in a live dialogue with ChatGPT.

Dependencies

Install the required packages before execution. PyAudio is included because sr.Microphone depends on it for microphone access:

pip install SpeechRecognition openai pocketsphinx PyAudio

Verify successful installation by importing them in a Python session:

import speech_recognition as sr
import openai
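
If either import fails, a package is missing from the active environment. As a minimal sketch, the same check can be scripted with the standard library. The helper name `missing_modules` is illustrative and not part of any package. The list uses import names, which differ from some pip package names (`speech_recognition` for SpeechRecognition, `pyaudio` for PyAudio, which `sr.Microphone` relies on).

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names, not pip package names.
    required = ["speech_recognition", "openai", "pocketsphinx", "pyaudio"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```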

Workflow Overview

The implementation captures audio from a microphone, converts speech to text via offline recognition, sends the transcript to the OpenAI chat completion endpoint, and outputs the model's reply.
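
The four stages can be sketched as one function that takes each step as a callable. This is only an illustration of the flow: `run_turn` and its parameter names are hypothetical, and the stand-in lambdas replace the microphone, Sphinx, and the OpenAI client so the sketch runs without hardware or an API key.

```python
def run_turn(listen, transcribe, ask, show=print):
    """Run one capture -> transcribe -> ask -> output cycle.

    Each argument is a callable supplied by the caller, so the real
    microphone, recognizer, and chat client can be swapped for stand-ins.
    """
    audio = listen()            # 1. capture audio
    text = transcribe(audio)    # 2. speech to text
    if not text:
        return None             # nothing recognized, skip the request
    reply = ask(text)           # 3. send the transcript to the model
    show(reply)                 # 4. output the reply
    return reply


# Stand-in callables so the flow can be tried without a microphone or API key.
reply = run_turn(
    listen=lambda: b"fake-audio",
    transcribe=lambda audio: "hello there",
    ask=lambda text: f"(model reply to: {text})",
)
# prints: (model reply to: hello there)
```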

Core Implementation

main_voice_chat.py:

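import speech_recognition as sr
from openai import OpenAI, OpenAIError

recognizer = sr.Recognizer()
# Placeholder key; see Execution Notes for safer ways to supply it.
gpt_client = OpenAI(api_key="YOUR_API_KEY")

while True:
    try:
        # Record one utterance from the default microphone.
        with sr.Microphone() as mic:
            print("Speak now... (Ctrl+C to quit)")
            captured_audio = recognizer.listen(mic)

        # Transcribe locally with CMU Sphinx (offline).
        spoken_text = recognizer.recognize_sphinx(captured_audio)
        print(f"Recognized phrase: {spoken_text}")

        # Send the transcript as a single user message and print the reply.
        gpt_response = gpt_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": spoken_text}]
        )
        print(gpt_response.choices[0].message.content)

    except sr.UnknownValueError:
        # Sphinx could not make sense of the audio; listen again.
        print("Unable to decode speech.")
    except sr.RequestError as err:
        # Raised by recognize_sphinx when PocketSphinx or its model data is unavailable.
        print(f"Recognition service error: {err}")
    except OpenAIError as err:
        # Network, authentication, or API errors should not end the loop.
        print(f"OpenAI request failed: {err}")
    except KeyboardInterrupt:
        print("Exiting.")
        break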

Mechanism Details

Audio Capture

recognizer = sr.Recognizer()
with sr.Microphone() as mic:
    print("Speak now...")
    captured_audio = recognizer.listen(mic)

sr.Microphone() opens the system's default microphone as an audio source. recognizer.listen() waits for speech to begin, records until it detects a pause, and returns the recording as an AudioData object.
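
By default, `listen` waits indefinitely for speech and uses a fixed starting energy threshold. A minimal sketch of a more defensive capture step follows, assuming the `recognizer` and `mic` objects from the script. The helper name `capture_phrase` and its default durations are illustrative choices, while `adjust_for_ambient_noise` and the `timeout` and `phrase_time_limit` arguments are part of the SpeechRecognition library.

```python
def capture_phrase(recognizer, source, calibrate_seconds=1.0,
                   wait_seconds=5, max_phrase_seconds=15):
    """Calibrate for background noise, then record a single phrase.

    recognizer is expected to be an sr.Recognizer and source an open
    sr.Microphone; the durations are illustrative defaults.
    """
    # Sample the background noise so the energy threshold fits the room.
    recognizer.adjust_for_ambient_noise(source, duration=calibrate_seconds)
    # timeout: how long to wait for speech to start.
    # phrase_time_limit: maximum length of the recorded phrase.
    return recognizer.listen(
        source,
        timeout=wait_seconds,
        phrase_time_limit=max_phrase_seconds,
    )
```

Inside the `with sr.Microphone() as mic:` block it would be called as `captured_audio = capture_phrase(recognizer, mic)`. If no speech starts within the timeout, `listen` raises `sr.WaitTimeoutError`, which the calling loop would need to catch.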

Transcription and Chat Completion Request

spoken_text = recognizer.recognize_sphinx(captured_audio)
gpt_response = gpt_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": spoken_text}]
)
print(gpt_response.choices[0].message.content)

Speech is transcribed using CMU Sphinx (offline). The resulting string becomes the user message payload for the GPT model, which returns a conversational reply.
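
The request above sends only the latest utterance, so the model sees each phrase in isolation. The chat completions endpoint keeps no state between calls; for a running dialogue, earlier turns have to be resent in `messages`. A minimal sketch of that idea follows. The helper `ask_with_history` and the `max_messages` cap are assumptions added for illustration, and `client` is assumed to be the `OpenAI` client created in the script.

```python
def ask_with_history(client, history, user_text,
                     model="gpt-3.5-turbo", max_messages=20):
    """Send user_text along with earlier turns and record the reply.

    history is a list of {"role": ..., "content": ...} dicts that is
    updated in place, so the same list is passed on every call.
    """
    history.append({"role": "user", "content": user_text})
    # Drop the oldest messages so the request stays bounded in size.
    del history[:-max_messages]
    response = client.chat.completions.create(model=model, messages=history)
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply
```

In the main loop, a `history = []` list created before `while True:` would be passed on each turn, for example `print(ask_with_history(gpt_client, history, spoken_text))`.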

Execution Notes

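Save the script as main_voice_chat.py and run it on a machine with a working microphone and internet access. Transcription with Sphinx happens offline, but the request to OpenAI needs a network connection. Replace YOUR_API_KEY with a valid OpenAI secret key. For production use, avoid hardcoding credentials; load them from environment variables or from a configuration file kept out of version control.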
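
As a minimal sketch of the environment-variable approach, the key can be read at startup and the script stopped early if it is absent. The helper name `load_api_key` is illustrative. `OPENAI_API_KEY` is the variable the `openai` client also looks for on its own when `api_key` is omitted.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in env_var, or raise if it is not set."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the script."
        )
    return key
```

The client would then be created with `gpt_client = OpenAI(api_key=load_api_key())` instead of a literal string in the source file.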

Accuracy Considerations

  • Perform voice capture in low-noise environments to improve transcription reliability.
  • Adjust microphone gain and distance if recognition quality degrades.
  • Handle API authentication securely to prevent leakage of sensitive tokens.
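
Besides the physical setup, the recognizer's own sensitivity can be adjusted in software. The sketch below sets three attributes that `sr.Recognizer` exposes: `energy_threshold`, `pause_threshold`, and `dynamic_energy_threshold`. The helper `tune_recognizer` and the example values are assumptions for illustration, and suitable numbers depend on the microphone and the room.

```python
def tune_recognizer(recognizer, energy_threshold=None,
                    pause_seconds=None, dynamic=True):
    """Adjust how an sr.Recognizer decides what counts as speech.

    energy_threshold: loudness level above which audio is treated as speech.
    pause_seconds: length of silence that ends a phrase.
    dynamic: let the library adapt the threshold to background noise.
    """
    if energy_threshold is not None:
        recognizer.energy_threshold = energy_threshold
    if pause_seconds is not None:
        recognizer.pause_threshold = pause_seconds
    recognizer.dynamic_energy_threshold = dynamic
    return recognizer
```

A call such as `tune_recognizer(recognizer, energy_threshold=400, pause_seconds=1.0)` before the loop raises the speech threshold slightly and tolerates longer pauses within a phrase.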

Tags: python Speech Recognition chatgpt Voice Interface OpenAI API

Posted on Wed, 13 May 2026 22:02:40 +0000 by Technex