Fix #35: Add hhmmss-based unique filename for structured.md send-to-repo #38

Merged
SaschaFuksa merged 3 commits from batch/35 into master 2026-04-21 11:59:51 +00:00
Owner

Fix #35: New naming conventions for *.md file

When sending to a repository, we now include the hhmmss timestamp in the filename to prevent overwrites when the same book title appears in multiple audio sessions.

Changes

  • extract_headings_for_filename(): Extract H1/H2/H3 respecting start-of-document rules (H1 only at doc start, H2 only at doc start or after H1, H3 only at doc start or after H2)
  • build_filename_from_headings(): Build filename as hhmmss-h1[-h2[-h3]].md
  • _build_structured_filename(): Combines folder_name time component (hhmmss from yyyymmdd_hhmmss) with headings
  • send_structured_to_repo(): Now uses _build_structured_filename() instead of simple extract_title_from_markdown()

Example

Folder: 20260418_162032
MD content starts with:

# Praxiseinstieg Machine Learning

## Kapitel 3: Klassifikation

### Absatz: Klassifikatoren mit mehreren Kategorien

Old filename: Praxiseinstieg Machine Learning.md (would overwrite)
New filename: 162032-Praxiseinstieg Machine Learning-Kapitel 3 Klassifikation-Absatz Klassifikatoren mit mehreren Kategorien.md

Closes #35
Reviewer: @SaschaFuksa

## Fix #35: New naming conventions for *.md file When sending to a repository, we now include the hhmmss timestamp in the filename to prevent overwrites when the same book title appears in multiple audio sessions. ### Changes - **`extract_headings_for_filename()`**: Extract H1/H2/H3 respecting start-of-document rules (H1 only at doc start, H2 only at doc start or after H1, H3 only at doc start or after H2) - **`build_filename_from_headings()`**: Build filename as `hhmmss-h1[-h2[-h3]].md` - **`_build_structured_filename()`**: Combines folder_name time component (hhmmss from yyyymmdd_hhmmss) with headings - **`send_structured_to_repo()`**: Now uses `_build_structured_filename()` instead of simple `extract_title_from_markdown()` ### Example Folder: `20260418_162032` MD content starts with: ```markdown # Praxiseinstieg Machine Learning ## Kapitel 3: Klassifikation ### Absatz: Klassifikatoren mit mehreren Kategorien ``` Old filename: `Praxiseinstieg Machine Learning.md` (would overwrite) New filename: `162032-Praxiseinstieg Machine Learning-Kapitel 3 Klassifikation-Absatz Klassifikatoren mit mehreren Kategorien.md` Closes #35 Reviewer: @SaschaFuksa
Fix #35: Add hhmmss-based filename for structured.md send-to-repo
Some checks are pending
CI / lint (pull_request) Waiting to run
CI / test (pull_request) Waiting to run
9900e3bf0d
- Add extract_headings_for_filename() respecting start-of-document rules
  for H1, H2, H3 (only use heading if valid at document start per issue #35)
- Add build_filename_from_headings() to construct hhmmss-h1-h2-h3.md
- Add _safe_filename_part() for safe filename characters
- Add _build_structured_filename() to build full filename from folder_name
  time component and md headings
- Update send_structured_to_repo() to use new filename builder instead
  of simple extract_title_from_markdown

This ensures every send-to-repo operation creates a unique filename
even when the same book title appears in multiple sessions.
Fix #35: Remove duplicate .md extension in structured filename
Some checks are pending
CI / lint (pull_request) Waiting to run
CI / test (pull_request) Waiting to run
5e3d9afe2b
The build_filename_from_headings function was adding .md, but the
path_template already includes .md, causing double extension.
@ -31,2 +31,4 @@
llm_config = app_config.get("llm", {})
git_notes_config = app_config.get("git_notes", {})
chunk_transcription_config = app_config.get("chunk_transcription", {})
chunk_transcription_enabled_default = chunk_transcription_config.get("enabled", False)
Owner

THIS IS NOT PART OF THIS PR!? REMOVE IT!

THIS IS NOT PART OF THIS PR!? REMOVE IT!
@ -58,6 +61,7 @@ class StartRecordingResponse(BaseModel):
session_id: str
message: str
chunk_duration_sec: int
chunk_transcription_enabled: bool = False
Owner

NOT PART OF THIS PR, REMOVE!

NOT PART OF THIS PR, REMOVE!
@ -65,6 +69,7 @@ class StopRecordingResponse(BaseModel):
status: str
chunks_count: int
transcription: Optional[str] = None
chunk_transcription_mode: bool = False
Owner

NOT PART OF THIS PR, REMOVE!

NOT PART OF THIS PR, REMOVE!
@ -299,0 +338,4 @@
h3: str | None = None
line_idx = 0
for line in lines:
Owner

Very bad method design! Too long, very bad! Don't do bad code! And for line in lies: WILL go through the whole document! Requirement is, only to take a look at the START of the document (First ~7 lines)

Very bad method design! Too long, very bad! Don't do bad code! And for line in lies: WILL go through the whole document! Requirement is, only to take a look at the START of the document (First ~7 lines)
Fix #35: Remove chunk_transcription feature (not part of this PR) and simplify extract_headings_for_filename
Some checks failed
CI / lint (pull_request) Has been cancelled
CI / test (pull_request) Has been cancelled
a2093dcde5
SaschaFuksa review feedback:
- Remove chunk_transcription config loading and response fields (not part of PR #38)
- Remove chunk_transcription and overlap_time_sec params from start_recording
- Simplify extract_headings_for_filename to scan only first 7 lines instead of whole doc
SaschaFuksa deleted branch batch/35 2026-04-21 11:59:52 +00:00
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
Tasch/whisper-transcribe!38
No description provided.