Automatic Speech Recognition — Capstone Project

1 minute read

An end-to-end Automatic Speech Recognition (ASR) system that accurately transcribes spoken language into written text across diverse domains and accents. The system is served via a FastAPI backend, presented through a Streamlit interface, and deployed on Azure Kubernetes Service (AKS).

App screenshot


Key Concepts

  • Automatic Speech Recognition (ASR) — converting spoken audio into written text with high accuracy across varied accents and subject domains
  • Kaldi — industry-standard open source toolkit for building speech recognition pipelines
  • Vosk API — lightweight offline speech recognition library that wraps Kaldi models for easy integration
  • FastAPI — high-performance Python web framework used to expose the ASR model as a REST API
  • Streamlit — Python-based UI framework for building and serving the interactive front end
  • Azure AKS — managed Kubernetes service used to deploy and scale the containerised application

Architecture

Audio Input (Streamlit UI)
        ↓
  FastAPI Backend
        ↓
  Vosk / Kaldi ASR Model
        ↓
  Transcribed Text Output
        ↓
  Deployed on Azure AKS

Prerequisites

Install the required dependencies:

pip install -r requirements.txt

Running the App

Start the FastAPI backend:

uvicorn main:app --reload

Launch the Streamlit UI:

streamlit run app.py

Repository

github.com/uday160386/asr-capstone-project

Presentation