Audio transcription service with Whisper.cpp: recording, streaming, chunked, and file-based transcription

Latest commit: 458ef76279 by SaschaFuksa, 2026-04-28 14:48:39 +00:00: Merge pull request 'fix 13' (#140) from claude/fix-13 into master

README.md

whisper-transcribe

Audio Transcription Service with Whisper.cpp – local-first, chunked recording support.

Quick Start

1. Clone the repo

git clone https://git.fuksa.de/Tasch/whisper-transcribe.git
cd whisper-transcribe

2. Start the server

uv run uvicorn src.main:app --reload

3. Or use Docker

docker compose up --build

Web UI

A simple recording UI is available at ui/index.html.

# Serve the UI (or open directly in browser)
# Just open ui/index.html in your browser

Features:

One-click recording
Visual timer
Automatically sends chunks to the server
Shows the transcription when done

API Endpoints

Method	Endpoint	Description
GET	`/api/v1/health`	Health check
POST	`/api/v1/record/start`	Start recording session
POST	`/api/v1/record/chunk/{session_id}`	Add audio chunk
POST	`/api/v1/record/stop/{session_id}`	Stop + transcribe
POST	`/api/v1/transcribe`	Direct file upload
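
The recording endpoints in the table are meant to be called in sequence: start a session, add chunks, then stop and transcribe. The sketch below only builds the request plan for that sequence from the paths listed above. The helper names `api_url` and `recording_flow` are hypothetical, and the request and response bodies are not documented here, so the code assumes the `session_id` has already been obtained from the start call.

```python
from urllib.parse import quote

API_PREFIX = "/api/v1"  # prefix shared by all endpoints in the table


def api_url(base_url: str, *segments: str) -> str:
    """Join base URL, API prefix, and percent-encoded path segments."""
    path = "/".join(quote(segment, safe="") for segment in segments)
    return f"{base_url.rstrip('/')}{API_PREFIX}/{path}"


def recording_flow(base_url: str, session_id: str, chunk_count: int) -> list[tuple[str, str]]:
    """Return the (method, url) calls for one recording session, in order.

    Assumption: session_id comes from the start call; how the server
    returns it is not specified in this README.
    """
    calls = [("POST", api_url(base_url, "record", "start"))]
    calls += [("POST", api_url(base_url, "record", "chunk", session_id))] * chunk_count
    calls.append(("POST", api_url(base_url, "record", "stop", session_id)))
    return calls


if __name__ == "__main__":
    for method, url in recording_flow("http://localhost:8000", "example-session", 2):
        print(method, url)
```

Keeping the URL construction separate from the HTTP client makes it easy to plug in whichever client you prefer once the payload format is known.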

Configuration

Edit settings.yaml:

whisper:
  model: base              # tiny, base, small, medium, large
  language: null           # auto-detect, or: "de", "en", etc.
  model_path: models/ggml-base.bin

recording:
  chunk_duration_sec: 30   # Chunk size for recording sessions

routing:
  to_tasch:
    enabled: true          # Send transcripts to Tasch
  to_repo:
    enabled: false
    repo_url: "https://git.fuksa.de/..."

# Git Notes Export — send structured transcripts to a GitHub repo
git_notes:
  enabled: false
  # GitHub repository in format "owner/repo"
  repo: "YourUsername/your-repo"
  # Branch to commit to
  branch: "main"
  # Environment variable name containing the GitHub PAT
  token_env: "GIT_NOTES_TOKEN"
  # Path template: {date} = recording date (YYYY-MM-DD), {heading} = first H1/H2/H3 from MD
  path_template: "{date}/{heading}.md"
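
A small sanity check for the `whisper` section can catch typos before the server starts. The sketch below works on an already parsed dictionary (loading the YAML itself would need a YAML library, which is not shown). The function name `check_whisper_settings` is hypothetical, and the rule that `model_path` should contain `ggml-<model>` is an assumption based on the example value above, not a documented requirement.

```python
KNOWN_MODELS = ("tiny", "base", "small", "medium", "large")  # from the settings.yaml comment


def check_whisper_settings(whisper: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    model = whisper.get("model")
    if model not in KNOWN_MODELS:
        problems.append(f"unknown model {model!r}; expected one of {', '.join(KNOWN_MODELS)}")

    # null means auto-detect; otherwise a non-empty language code is expected.
    language = whisper.get("language")
    if language is not None and not (isinstance(language, str) and language):
        problems.append("language must be null (auto-detect) or a code such as 'de'")

    # Assumption: the model file name follows the ggml-<model> pattern.
    model_path = str(whisper.get("model_path", ""))
    if model in KNOWN_MODELS and f"ggml-{model}" not in model_path:
        problems.append(f"model_path {model_path!r} does not look like a {model} model file")

    return problems


if __name__ == "__main__":
    example = {"model": "base", "language": None, "model_path": "models/ggml-base.bin"}
    print(check_whisper_settings(example) or "settings look consistent")
```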

Structured Transcript Export

After transcribing and structuring recordings, you can:

💾 Download — download the .md file to your machine
📋 Copy — copy Markdown to clipboard
📤 Send to Repo — commit directly to a GitHub repository

Setup: Send to Repo

1. Create a GitHub Personal Access Token (PAT)

Go to https://github.com/settings/tokens → Generate new token (classic)
Name: e.g. whisper-transcribe-export
Scope: ✅ repo (full control of private repositories)
Expiration: choose as desired
→ Generate token → copy it now (you won't see it again)

2. Configure the container

Create a .env file in the project root (never commit this!):

# .env — DO NOT COMMIT THIS FILE!
GIT_NOTES_TOKEN=ghp_your_token_here

.env is already in .gitignore.

3. Enable in `settings.yaml`

git_notes:
  enabled: true
  repo: "YourUsername/your-private-repo"
  branch: "main"
  token_env: "GIT_NOTES_TOKEN"
  path_template: "{date}/{heading}.md"

4. Enable the Docker environment variable

In docker-compose.yml, uncomment the line:

environment:
  - MODEL_NAME=base
  - GIT_NOTES_TOKEN=${GIT_NOTES_TOKEN}  # ← uncomment this

5. Build and start

docker compose up --build
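
The setup steps above rely on one level of indirection: `settings.yaml` stores only the name of an environment variable (`token_env`), and the token itself lives in the container environment. A minimal sketch of that lookup, assuming the `git_notes` section has already been parsed into a dictionary, could look like this. The function name `resolve_git_notes_token` and the error behavior are illustrative, not taken from the project source.

```python
import os
from typing import Mapping, Optional


def resolve_git_notes_token(
    git_notes: Mapping,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the PAT named by token_env, or None if the export is disabled."""
    if not git_notes.get("enabled", False):
        return None

    # settings.yaml holds the variable *name*, never the secret itself.
    var_name = git_notes.get("token_env", "GIT_NOTES_TOKEN")
    token = environ.get(var_name, "").strip()
    if not token:
        # Assumption: failing loudly is preferable to a silent no-op.
        raise RuntimeError(f"git_notes is enabled but ${var_name} is empty or unset")
    return token


if __name__ == "__main__":
    settings = {"enabled": True, "token_env": "GIT_NOTES_TOKEN"}
    fake_env = {"GIT_NOTES_TOKEN": "ghp_your_token_here"}
    print(resolve_git_notes_token(settings, fake_env))
```

Passing the environment as a parameter keeps the lookup testable without touching real secrets.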

Path Template

The target path of each exported file is built from path_template, which supports these variables:

Variable	Description	Example
`{date}`	Recording date	`2026-04-19`
`{heading}`	First H1/H2/H3 from the MD	`Kapitel 3: Klassifikation`
`{folder_name}`	Raw folder name	`20260419_143022`

Examples for path_template: "{date}/{heading}.md":

2026-04-19/Kapitel 3 Klassifikation.md
2026-04-20/Maschinelles Lernen.md

Invalid filename characters (< > : " / \ | ? *) are automatically removed from the heading.
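
The path rules above can be sketched in a few lines of Python. This is an illustration of the described behavior, not the project's actual implementation: the names `first_heading`, `sanitize`, and `render_path` are hypothetical, and the `"untitled"` fallback for documents without a heading is an assumption.

```python
import re

# Characters listed above as invalid in filenames: < > : " / \ | ? *
INVALID_CHARS = re.compile(r'[<>:"/\\|?*]')
# First-, second-, or third-level Markdown heading ("# ", "## ", "### ").
HEADING = re.compile(r"^#{1,3}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def first_heading(markdown: str, fallback: str = "untitled") -> str:
    """Return the text of the first H1/H2/H3, or the fallback if none exists."""
    match = HEADING.search(markdown)
    return match.group(1) if match else fallback


def sanitize(heading: str) -> str:
    """Remove invalid filename characters from a heading."""
    return INVALID_CHARS.sub("", heading).strip()


def render_path(template: str, date: str, heading: str, folder_name: str) -> str:
    """Fill {date}, {heading}, and {folder_name} into the path template."""
    return template.format(date=date, heading=sanitize(heading), folder_name=folder_name)


if __name__ == "__main__":
    md = "# Kapitel 3: Klassifikation\n\nSome transcript text."
    print(render_path("{date}/{heading}.md", "2026-04-19", first_heading(md), "20260419_143022"))
    # prints: 2026-04-19/Kapitel 3 Klassifikation.md
```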

Home Server Deployment

The service runs as a Docker container on your home server.

Prerequisites

Docker & Docker Compose installed on the server
SSH access to the server
A domain/subdomain pointing to your server (optional but recommended)

Setup

# 1. SSH into your server
ssh user@your-server

# 2. Clone the repo
git clone https://git.fuksa.de/Tasch/whisper-transcribe.git
cd whisper-transcribe

# 3. Create .env with your secrets
nano .env
# Add:
#   GIT_NOTES_TOKEN=ghp_your_github_token

# 4. Edit settings.yaml — enable git_notes and configure as needed
nano settings.yaml

# 5. Edit docker-compose.yml — uncomment the GIT_NOTES_TOKEN line

# 6. Build and start
docker compose up --build -d

# 7. Check logs
docker compose logs -f

Updating

cd whisper-transcribe
git pull
docker compose up --build -d

Accessing the UI

The Web UI is served by the FastAPI backend:

Service	URL
API + Web UI	`http://your-server:8000/api/v1/`
Web UI directly	`http://your-server:8000/`

For external access, configure a reverse proxy (e.g. nginx, Caddy) with HTTPS.
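
After starting the container, one way to confirm that the service is reachable is to poll the health endpoint listed under API Endpoints. The sketch below assumes only that `GET /api/v1/health` answers with a 2xx status when the service is up; the response body is not inspected. The names `http_probe` and `wait_until_healthy` and the retry defaults are illustrative.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET request to url yields a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers connection errors, timeouts, and non-2xx HTTP errors.
        return False


def wait_until_healthy(
    base_url: str,
    probe: Callable[[str], bool] = http_probe,
    attempts: int = 10,
    delay_sec: float = 1.0,
) -> bool:
    """Poll the health endpoint until it succeeds or attempts run out."""
    url = base_url.rstrip("/") + "/api/v1/health"
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_sec)
    return False


if __name__ == "__main__":
    ok = wait_until_healthy("http://your-server:8000", attempts=3)
    print("healthy" if ok else "not reachable")
```

The probe is injectable so the retry logic can be exercised without a running server.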

GPU Acceleration (optional)

If your server has an NVIDIA GPU, uncomment the deploy section in docker-compose.yml for Whisper inference acceleration.

Development

# Install dependencies
uv sync

# Run tests
uv run pytest -m "not integration"

# Lint
uv run ruff check src/ tests/

Architecture

src/
├── audio/           # Recording sessions, file handling
├── transcription/   # Whisper.cpp wrapper
├── routing/         # Output dispatcher
├── api/             # FastAPI routes
└── main.py          # App entry point
