Fix #36: New Chunk Transcription Mode #37
2 participants
Reference
Tasch/whisper-transcribe!37
Fixes #36
Implements Chunk Transcription Mode with configurable overlap_time. Each chunk is transcribed immediately after recording, with overlap from the previous chunk to preserve context. Part transcripts are merged into transcript.txt after all chunks are done. full_audio.wav is skipped in this mode.
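The overlap and merge steps described above can be sketched roughly as follows. This is only an illustration under assumptions, not the code in this PR: the function names, the `part_*.txt` naming scheme, and the representation of audio as a plain list of samples are all hypothetical. Only `overlap_time` and `transcript.txt` come from the description.

```python
from pathlib import Path


def with_overlap(previous: list[float], current: list[float],
                 sample_rate: int, overlap_time: float) -> list[float]:
    """Prepend the last `overlap_time` seconds of `previous` to `current`.

    Hypothetical helper: audio is modelled as a flat list of samples.
    """
    n = int(sample_rate * overlap_time)
    # Guard against n == 0, because previous[-0:] would return the whole list.
    tail = previous[-n:] if n > 0 else []
    return tail + current


def merge_part_transcripts(session_dir: Path) -> Path:
    """Concatenate part_*.txt files (sorted by name) into transcript.txt.

    Assumes zero-padded part names so that lexical order is chunk order.
    """
    parts = sorted(session_dir.glob("part_*.txt"))
    text = "\n".join(p.read_text(encoding="utf-8").strip() for p in parts)
    out = session_dir / "transcript.txt"
    out.write_text(text + "\n", encoding="utf-8")
    return out
```

Note that the sketch does not remove words that appear twice because of the overlap. Whether and how the merge step deduplicates them is not stated in the description.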

0e8bc710745c05a98063
@@ -134,2 +145,4 @@
raise HTTPException(status_code=400, detail="Cannot add chunk to inactive session")
# In chunk_transcription_mode, trigger immediate transcription of this chunk
if chunk_mode and result.chunk_count > 0:

VERY BAD! Too long, bad code quality! Refactor it and make it more granular: USE CLASSES AND FUNCTIONS! DON'T BE A BAD/LAZY DEV!

@@ -173,2 +271,2 @@
transcript_path.write_text(transcription_text)
logger.debug(f"Saved transcript to {transcript_path}")
if chunk_mode:

MAYBE USE CLASSES!? EVER HEARD OF CLASSES!? if chunk_mode: do ChunkTranscription ... or do AudioFileTranscription ...
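
One way to read the suggestion above is a small strategy hierarchy that replaces the `if chunk_mode:` branches. The sketch below is an assumption about what that could look like, not the reviewer's or the author's design. `ChunkTranscription` and `AudioFileTranscription` are the names from the comment; the base class, the `Transcriber` callable, `select_strategy`, and the `chunk_*.wav` naming are hypothetical.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Callable

# Hypothetical: any callable that turns one audio file into text.
Transcriber = Callable[[Path], str]


class TranscriptionStrategy(ABC):
    """Common interface so callers never branch on chunk_mode themselves."""

    def __init__(self, transcriber: Transcriber) -> None:
        self.transcriber = transcriber

    @abstractmethod
    def run(self, session_dir: Path) -> str:
        """Return the full transcript text for one session folder."""


class AudioFileTranscription(TranscriptionStrategy):
    def run(self, session_dir: Path) -> str:
        # Transcribe the single merged recording.
        return self.transcriber(session_dir / "full_audio.wav")


class ChunkTranscription(TranscriptionStrategy):
    def run(self, session_dir: Path) -> str:
        # Transcribe each chunk in name order and join the parts.
        chunks = sorted(session_dir.glob("chunk_*.wav"))
        return "\n".join(self.transcriber(chunk) for chunk in chunks)


def select_strategy(chunk_mode: bool, transcriber: Transcriber) -> TranscriptionStrategy:
    """The only place that still looks at chunk_mode."""
    cls = ChunkTranscription if chunk_mode else AudioFileTranscription
    return cls(transcriber)
```

Because the transcriber is injected, each class can be unit-tested with a fake callable and no Whisper model.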

@@ -413,0 +435,4 @@
return None
def _transcribe_chunk_to_part_file(self,

Ever heard of clean code!? I guess not! Good functions are fine-grained and easy to test!

@@ -56,2 +55,4 @@
# Request/Response models
# =============================================================================
class StartRecordingResponse(BaseModel):

How about adding a models.py and declaring the data classes there?

@@ -243,4 +84,1 @@
# --- Chunks management endpoints ---
class ChunkFoldersResponse(BaseModel):

How about adding a models.py and declaring the data classes there?

@@ -281,33 +119,289 @@ class SendToRepoResponse(BaseModel):
error: Optional[str] = None
class OutputConfig(BaseModel):

How about adding a models.py and declaring the data classes there?

@@ -308,0 +207,4 @@
)
class ChunkFolderTranscriptionMode:

Shouldn't this be the same as AudioFileTranscriptionMode, since the audio file (wav) already exists? I see only two modes, so please explain why we need this one.

@@ -308,0 +243,4 @@
# Transcription helpers
# =============================================================================
def run_transcription(audio_path: Path, mode: TranscriptionMode) -> Optional[TranscriptionResult]:

Please add a transcript_helper.py and move the helper functions there.

@@ -515,0 +614,4 @@
return safe[:100].strip() or "structured"
def _extract_headings_for_filename(md_content: str) -> tuple[str | None, str | None, str | None]:

Should be part of file_handler?

@@ -515,0 +644,4 @@
return h1, h2, h3
def _safe_filename_part(text: str) -> str:

Should be part of file_handler?

@@ -515,0 +655,4 @@
h1: str | None,
h2: str | None,
h3: str | None,
) -> str:

Should be part of file_handler?

@@ -515,0 +667,4 @@
return "-".join(parts)
def _build_structured_filename(folder_name: str, md_content: str) -> str:

Should be part of file_handler?
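
If the four filename helpers flagged above were moved into file_handler, the module could look roughly like the sketch below. The function bodies are guesses, since the diff only shows the signatures and a few return lines. The sanitising regex, the `.md` extension, the folder-name fallback, and the name `join_heading_parts` (the third helper's name is not visible in the diff) are assumptions. The `or "structured"` fallback mirrors a return line visible in the diff context.

```python
import re
from typing import Optional


def safe_filename_part(text: str) -> str:
    """Reduce arbitrary text to a filesystem-friendly fragment."""
    # Assumption: anything outside [A-Za-z0-9_-] collapses to a single dash.
    safe = re.sub(r"[^A-Za-z0-9_-]+", "-", text).strip("-")
    return safe[:100].strip() or "structured"


def extract_headings_for_filename(
    md_content: str,
) -> tuple[Optional[str], Optional[str], Optional[str]]:
    """Return the first H1, H2 and H3 heading texts found, or None for each."""
    found: dict[int, Optional[str]] = {1: None, 2: None, 3: None}
    for line in md_content.splitlines():
        # Simplification: does not skip '#' lines inside fenced code blocks.
        match = re.match(r"^(#{1,3})\s+(.+?)\s*$", line)
        if match and found[len(match.group(1))] is None:
            found[len(match.group(1))] = match.group(2)
    h1, h2, h3 = found[1], found[2], found[3]
    return h1, h2, h3


def join_heading_parts(h1: Optional[str], h2: Optional[str], h3: Optional[str]) -> str:
    """Join the sanitised headings that are present with dashes."""
    parts = [safe_filename_part(h) for h in (h1, h2, h3) if h]
    return "-".join(parts)


def build_structured_filename(folder_name: str, md_content: str) -> str:
    """Prefer a heading-based name; fall back to the folder name."""
    stem = join_heading_parts(*extract_headings_for_filename(md_content))
    return f"{stem or safe_filename_part(folder_name)}.md"
```

As plain module-level functions without a leading underscore, each helper can be imported and tested separately, which also addresses the granularity concern raised earlier in the review.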
Ready for re-review