# Architecture

System design and component structure.

## Overview
Ozen-web is a fully client-side web application built with SvelteKit, designed to run entirely in the browser without any backend server. The architecture emphasizes:
- Reactive state management via Svelte stores
- WASM-powered analysis for Praat-accurate acoustic computation
- Canvas-based rendering for high-performance visualization
- Progressive enhancement for long audio files
- Modular design with clear separation of concerns
## Application Structure

### Routes

Ozen-web uses SvelteKit's file-based routing with two main routes:

```
src/routes/
├── +page.svelte      # Main desktop application
├── +layout.svelte    # App shell (shared layout)
├── +layout.ts        # Prerender config
└── viewer/
    ├── +page.svelte  # Mobile-optimized viewer
    └── +layout.ts    # Viewer prerender config
```
**Main Application (`/`):**
- Full-featured desktop interface
- Editable annotations and data points
- File drop zone, settings panels, toolbar
- Keyboard shortcuts for efficient workflow

**Mobile Viewer (`/viewer`):**
- Touch-optimized, read-only interface
- URL-based audio loading (`?audio=...`)
- Compact values display
- Gesture support (tap, drag, pinch, pan)
### Build System

**Static Site Generation:**
- All routes are prerendered at build time (not SPA mode)
- Uses relative paths for portable deployment
- A post-build script (`scripts/fix-relative-paths.js`) ensures base path detection works in subdirectories
- No server required; deploy to any static host

**Build output:**

```bash
npm run build   # → build/ directory
```

**Deployment flexibility:**

```
# Works at root
https://example.com/

# Works in subdirectory
https://example.com/subfolder/

# Works on GitHub Pages
https://username.github.io/ozen-web/
```
## Component Hierarchy

```mermaid
graph TD
    A[+layout.svelte] --> B[+page.svelte Main App]
    A --> C[viewer/+page.svelte Mobile Viewer]
    B --> D[FileDropZone]
    B --> E[Waveform]
    B --> F[Spectrogram]
    B --> G[AnnotationEditor]
    B --> H[ValuesPanel]
    B --> I[TimeAxis]
    G --> J[Tier x N]
    F -.overlay.-> K[Pitch overlay]
    F -.overlay.-> L[Formant overlay]
    F -.overlay.-> M[Data points]
    C --> N[Compact ValuesPanel]
    C --> O[Spectrogram read-only]
    C --> P[Touch gesture layer]
```
### Component Responsibilities

**Layout Components:**

| Component | File | Purpose |
|---|---|---|
| App shell | `+layout.svelte` | HTML structure, global styles, WASM initialization |
| Main app | `+page.svelte` | Desktop UI orchestration, toolbar, panels |
| Mobile viewer | `viewer/+page.svelte` | Touch-optimized view-only interface |
**Core Visualization:**

| Component | File | Responsibility |
|---|---|---|
| Waveform | `Waveform.svelte` | Amplitude display, downsampling, synchronized cursor |
| Spectrogram | `Spectrogram.svelte` | Time-frequency visualization, overlay rendering, interaction |
| TimeAxis | `TimeAxis.svelte` | Time ruler with tick marks and labels |
**Annotation & Data:**

| Component | File | Responsibility |
|---|---|---|
| AnnotationEditor | `AnnotationEditor.svelte` | Tier container, add/remove tiers, toolbar |
| Tier | `Tier.svelte` | Individual tier display, boundary editing, text input |
| ValuesPanel | `ValuesPanel.svelte` | Real-time acoustic measurements at cursor |
**Utilities:**

| Component | File | Responsibility |
|---|---|---|
| FileDropZone | `FileDropZone.svelte` | Drag-drop and file picker for audio files |
| Modal | `Modal.svelte` | Reusable modal dialog container |
## State Management
Ozen-web uses Svelte stores for all shared state. Stores are the single source of truth; components subscribe and react to changes.
### Store Architecture

```mermaid
graph LR
    A[User Action] --> B[Component]
    B --> C[Store Update]
    C --> D[Store]
    D --> E[Reactive $binding]
    E --> F[Component Re-render]
    D -.derives.-> G[Derived Store]
    G --> E
```
### Core Stores

All stores live in `src/lib/stores/`:

| Store | File | State |
|---|---|---|
| Audio | `audio.ts` | `audioBuffer`, `sampleRate`, `fileName`, `duration` |
| View | `view.ts` | `timeRange`, `cursorPosition`, `selection`, `hoverPosition` |
| Analysis | `analysis.ts` | `analysisResults`, `isAnalyzing`, `analysisParams` |
| Annotations | `annotations.ts` | `tiers`, `activeTier`, annotation functions |
| Data Points | `dataPoints.ts` | `dataPoints` array with measurements |
| Undo/Redo | `undoManager.ts` | Unified history stack for all edits |
| Config | `config.ts` | Colors, formant presets, UI preferences |
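Every one of these stores follows Svelte's store contract: `subscribe` delivers the current value immediately and again after every change. A minimal sketch of that contract (illustrative only, not the actual `svelte/store` implementation):

```typescript
type Subscriber<T> = (value: T) => void;
type Unsubscriber = () => void;

// Minimal writable store following Svelte's store contract.
function writable<T>(initial: T) {
  let value = initial;
  const subscribers = new Set<Subscriber<T>>();
  const set = (next: T) => {
    value = next;
    subscribers.forEach((fn) => fn(value));
  };
  return {
    // A subscriber is called immediately with the current value,
    // then again after every set/update.
    subscribe(fn: Subscriber<T>): Unsubscriber {
      subscribers.add(fn);
      fn(value);
      return () => subscribers.delete(fn);
    },
    set,
    update: (fn: (current: T) => T) => set(fn(value)),
  };
}

// e.g. a simplified stand-in for the cursorPosition store in view.ts
const cursorPosition = writable(0);
```

Components never mutate shared state directly; they call `set`/`update` on the store, and every subscriber (the `$store` bindings in templates) re-renders from the new value.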
### Data Flow Example

**Loading an audio file:**

```mermaid
sequenceDiagram
    participant U as User
    participant D as FileDropZone
    participant A as audio.ts
    participant V as view.ts
    participant An as analysis.ts
    participant W as Waveform
    participant S as Spectrogram
    U->>D: Drop audio file
    D->>D: Decode with Web Audio API
    D->>A: Set audioBuffer, sampleRate, fileName
    A->>V: Reset timeRange to [0, duration]
    A->>An: Trigger runAnalysis()
    An->>An: Compute pitch, formants, etc.
    An->>An: Set analysisResults
    A-->>W: Reactive update (audioBuffer changed)
    W->>W: Redraw waveform
    An-->>S: Reactive update (analysisResults changed)
    S->>S: Render spectrogram + overlays
```
**Editing an annotation:**

```mermaid
sequenceDiagram
    participant U as User
    participant T as Tier
    participant UM as undoManager.ts
    participant An as annotations.ts
    participant T2 as Tier (all)
    U->>T: Double-click to add boundary
    T->>UM: saveUndo()
    UM->>UM: Capture current state snapshot
    T->>An: addBoundary(tierIndex, time)
    An->>An: Update tiers array
    An-->>T2: Reactive update
    T2->>T2: Re-render all tiers
```
### Derived Stores

Computed state is implemented with derived stores:

```typescript
// src/lib/stores/view.ts
import { derived } from 'svelte/store';

export const visibleDuration = derived(
  timeRange,
  ($timeRange) => $timeRange.end - $timeRange.start
);
```

Components can subscribe to `$visibleDuration`, which updates automatically whenever `timeRange` changes.
## WASM Integration

### Backend Abstraction Layer

**File:** `src/lib/wasm/acoustic.ts`

The abstraction layer provides a unified API across multiple WASM backends:

| Backend | Source | License |
|---|---|---|
| `praatfan-local` | `static/wasm/praatfan/` | MIT/Apache-2.0 |
| `praatfan` | CDN (GitHub Pages) | MIT/Apache-2.0 |
| `praatfan-gpl` | CDN (GitHub Pages) | GPL |

**Why abstraction matters:**
- Different backends may have slightly different APIs
- Wrapper functions normalize the interface
- New backends can be added without touching components
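As a sketch of what such normalization can look like: the wrapper below converts a raw backend pitch object (using the `ts()` / `selected_array()` method names that appear in the call flow, and the `{ times, values }` result shape the components consume) into the app-wide shape. The `RawPitch` interface is a hypothetical backend shape for illustration; real backends may differ, and absorbing those differences is exactly the wrapper's job.

```typescript
// Hypothetical shape one backend might expose; real backends may differ.
interface RawPitch {
  ts(): Float64Array;              // frame times
  selected_array(): Float64Array;  // selected F0 value per frame
  free(): void;                    // WASM-side memory release
}

// The normalized shape components and stores work with.
interface PitchTrack {
  times: Float64Array;
  values: Float64Array;
}

// Normalize a backend-specific object into the app-wide shape,
// freeing the WASM object as soon as its data has been copied out.
function normalizePitch(raw: RawPitch): PitchTrack {
  try {
    return { times: raw.ts(), values: raw.selected_array() };
  } finally {
    raw.free();
  }
}
```

Because only the wrapper knows the backend's method names, swapping in a backend with a different API means touching one function, not every component.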
### WASM Call Flow

```mermaid
sequenceDiagram
    participant C as Component/Store
    participant A as acoustic.ts (abstraction)
    participant W as WASM Module
    participant M as Memory
    C->>A: computePitch(sound, params)
    A->>W: sound.to_pitch(...)
    W->>M: Allocate arrays
    W->>M: Run autocorrelation
    W->>M: Store results
    W-->>A: Return Pitch object
    A->>W: pitch.ts() → Float64Array
    A->>W: pitch.selected_array() → Float64Array
    W-->>A: Return arrays
    A->>W: pitch.free()
    W->>M: Deallocate
    A-->>C: Return { times, values }
```

**Critical pattern:** Always `.free()` WASM objects to prevent memory leaks.
### Initialization

On app load (`+layout.svelte`):

1. Check `config.yaml` or a URL parameter for the backend choice
2. Call `initWasm(backend)` from `acoustic.ts`
3. Set the `wasmReady` store to `true`
4. Components react to `$wasmReady` and enable analysis features

**Lazy loading for CDN backends:**

```typescript
// praatfan backend downloads ~5MB on first init
await initWasm('praatfan'); // 1-3 second delay
```

**Instant for local backend:**

```typescript
// praatfan-local uses bundled WASM
await initWasm('praatfan-local'); // ~200ms
```

## Rendering Pipeline
### Canvas Strategy

Ozen-web uses HTML5 Canvas for all visualizations (not SVG/DOM) to handle:
- Large datasets (thousands of time points and frequency bins)
- Real-time updates (cursor tracking, playback)
- Smooth zoom/pan interactions
### Spectrogram Rendering

**Multi-stage pipeline:**

```mermaid
graph TD
    A[Load audio] --> B[Compute full spectrogram via WASM]
    B --> C[Apply grayscale colormap]
    C --> D[Create ImageData]
    D --> E[Draw to off-screen canvas cache]
    E --> F[On zoom/pan: draw visible region]
    F --> G{Zoom > 2x?}
    G -->|Yes| H[Debounce 300ms]
    G -->|No| I[Draw from cache]
    H --> J[Recompute high-res spectrogram for visible window]
    J --> K[Update cache for this region]
    K --> I
    I --> L[Draw overlays on top]
    L --> M[Pitch track]
    L --> N[Formants]
    L --> O[Data points]
    L --> P[Cursor/selection]
```

**Why this approach:**
- Full spectrogram cache: fast redraw when panning at low zoom
- Dynamic resolution: high-quality detail when zoomed in
- Debouncing: prevents excessive recomputation during smooth zoom
- Overlay separation: overlays are drawn each frame without recomputing the spectrogram
**Code location:** `src/lib/components/Spectrogram.svelte`
### Waveform Rendering

**Downsampling strategy:**

```typescript
// Simplified from Waveform.svelte; drawVerticalLine is the canvas helper.
function renderWaveform(
  audioBuffer: Float64Array,
  visibleStart: number,
  visibleSamples: number,
  canvasWidth: number
) {
  const samplesPerPixel = Math.ceil(visibleSamples / canvasWidth);
  for (let x = 0; x < canvasWidth; x++) {
    const startSample = visibleStart + x * samplesPerPixel;
    const endSample = Math.min(startSample + samplesPerPixel, audioBuffer.length);

    // Find min/max in this pixel column
    let min = Infinity, max = -Infinity;
    for (let i = startSample; i < endSample; i++) {
      const sample = audioBuffer[i];
      if (sample < min) min = sample;
      if (sample > max) max = sample;
    }

    // Draw a vertical line from min to max
    drawVerticalLine(x, min, max);
  }
}
```

**Result:** One vertical line per pixel showing the amplitude range, efficient at any zoom level.
### Overlay Rendering

Layers drawn on the spectrogram canvas, bottom to top:

1. Base spectrogram (ImageData from cache)
2. Selection (semi-transparent blue rectangle)
3. Pitch track (blue line with dots)
4. Formant tracks (red dots for F1-F4)
5. Intensity (green line)
6. HNR/CoG/Spectral Tilt (additional colored tracks)
7. Data points (yellow dashed vertical lines + circles)
8. Cursor (red vertical line, drawn last)

**Drawing order matters:** the cursor must be on top to remain visible during playback.
## Touch Handling

**File:** `src/lib/touch/gestures.ts`

The mobile viewer (`/viewer` route) uses custom touch gesture recognition.

### Gesture Types
| Gesture | Fingers | Action | Effect |
|---|---|---|---|
| Tap | 1 | Quick touch | Set cursor position |
| Drag | 1 | Touch + move | Create selection |
| Pan | 2 | Two-finger drag | Scroll time axis |
| Pinch | 2 | Spread/pinch | Zoom in/out |
### Touch Event Flow

```mermaid
sequenceDiagram
    participant U as User
    participant G as gestures.ts
    participant V as view.ts
    participant S as Spectrogram
    U->>G: touchstart (2 fingers)
    G->>G: Detect pinch gesture
    U->>G: touchmove
    G->>G: Calculate pinch distance
    G->>G: Compute zoom delta
    G->>V: Update timeRange (zoom)
    V-->>S: Reactive update
    S->>S: Redraw at new zoom level
    U->>G: touchend
    G->>G: Reset gesture state
```

**Conflict resolution:**
- 1 finger: wait 150ms to distinguish tap vs drag
- 2 fingers: immediately enter pan/pinch mode
- More than 2 fingers: ignore (prevents accidental gestures)
**Code location:** `viewer/+page.svelte` integrates `gestures.ts`
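The conflict-resolution rules can be expressed as a small pure decision function. This is a sketch of the logic only; the function name, the `released` flag, and the 5 px movement slop are assumptions, not the actual `gestures.ts` API:

```typescript
type Gesture = 'pending' | 'tap' | 'drag' | 'pan-or-pinch' | 'ignored';

const TAP_THRESHOLD_MS = 150; // one-finger grace period from the rules above
const MOVE_SLOP_PX = 5;       // assumed movement tolerance before "drag"

// Classify the current touch state into one of the supported gestures.
function classifyGesture(
  fingers: number,
  movedPx: number,
  elapsedMs: number,
  released: boolean
): Gesture {
  if (fingers > 2) return 'ignored';          // prevent accidental gestures
  if (fingers === 2) return 'pan-or-pinch';   // two fingers: commit immediately
  if (movedPx > MOVE_SLOP_PX) return 'drag';  // movement: selection drag
  if (released && elapsedMs < TAP_THRESHOLD_MS) return 'tap'; // quick touch: set cursor
  return 'pending';                           // still waiting to disambiguate
}
```

Keeping the decision pure (no timers or DOM access inside) makes the tricky tap-vs-drag boundary easy to unit-test; the event handlers only feed it finger count, movement, and elapsed time.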
## File I/O

### Loading Files

**Audio files:**

```mermaid
graph LR
    A[File source] --> B{Input method}
    B -->|Drag & drop| C[FileDropZone]
    B -->|File picker| C
    B -->|Microphone| D[MediaRecorder API]
    B -->|URL parameter| E[Fetch with CORS]
    B -->|Data URL| F[Base64 decode]
    C --> G[Decode with Web Audio API]
    D --> G
    E --> G
    F --> G
    G --> H[Float64Array samples]
    H --> I[audio.ts store]
```
**TextGrid files:**

```mermaid
graph LR
    A[File picker] --> B[FileReader.readAsText]
    B --> C[textgrid/parser.ts]
    C --> D{Format?}
    D -->|Short| E[Parse short format]
    D -->|Long| F[Parse long format]
    E --> G[Tier objects]
    F --> G
    G --> H[annotations.ts store]
```
### Saving Files

Two approaches, chosen by browser support:

**Modern browsers (File System Access API):**

```typescript
const handle = await window.showSaveFilePicker({
  suggestedName: 'annotations.TextGrid',
  types: [{ description: 'TextGrid', accept: { 'text/plain': ['.TextGrid'] } }]
});
const writable = await handle.createWritable();
await writable.write(textGridContent);
await writable.close();
```

**Fallback (download link):**

```typescript
const blob = new Blob([textGridContent], { type: 'text/plain' });
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = 'annotations.TextGrid';
a.click();
URL.revokeObjectURL(url);
```

**File types supported:**
| Format | Load | Save | Store | Parser |
|---|---|---|---|---|
| WAV audio | ✅ | ✅ | `audio.ts` | Web Audio API |
| MP3 audio | ✅ | ❌ | `audio.ts` | Web Audio API |
| OGG audio | ✅ | ❌ | `audio.ts` | Web Audio API |
| TextGrid | ✅ | ✅ | `annotations.ts` | `textgrid/parser.ts` |
| TSV data | ✅ | ✅ | `dataPoints.ts` | Built-in |
## Long Audio Handling

**Problem:** Analyzing files longer than 60 seconds upfront freezes the UI.

**Solution:** Progressive, on-demand analysis.
### Analysis Strategy

```mermaid
graph TD
    A[Load audio file] --> B{Duration > 60s?}
    B -->|No| C[Run full analysis immediately]
    B -->|Yes| D[Skip analysis, show waveform only]
    C --> E[Display all features]
    D --> F[User zooms in]
    F --> G{Visible window ≤ 60s?}
    G -->|No| H[Wait for more zoom]
    G -->|Yes| I[Debounce 300ms]
    H --> F
    I --> J[Compute analysis for visible range only]
    J --> K[Cache results for this region]
    K --> E
```
**Implementation:**

```typescript
// src/lib/stores/analysis.ts
import { get } from 'svelte/store';

export const MAX_ANALYSIS_DURATION = 60;

export async function runAnalysis(): Promise<void> {
  const audioDuration = get(duration);
  if (audioDuration > MAX_ANALYSIS_DURATION) {
    console.log('Audio too long, skipping upfront analysis');
    analysisResults.set(null);
    return;
  }
  // Run full analysis
  await computeAllFeatures();
}

// Called when the user zooms (in Spectrogram.svelte)
export async function runAnalysisForRange(start: number, end: number): Promise<void> {
  if (end - start > MAX_ANALYSIS_DURATION) {
    return; // Still too wide
  }
  // Compute just for this window
  await computeFeaturesForRange(start, end);
}
```

**User experience:**
- File loads instantly (no hanging)
- Waveform always visible
- Spectrogram shows a "Zoom in to see spectrogram" message
- Overlays appear once zoomed to an analyzable window

**Code location:** `src/lib/stores/analysis.ts`, `src/lib/components/Spectrogram.svelte`
## Unified Undo System

**File:** `src/lib/stores/undoManager.ts`

### Architecture

**State-snapshot approach:**
- Before each change, capture the full state (all tiers plus all data points)
- Use a JSON deep copy for isolation
- A single history stack preserves chronological order
```mermaid
graph LR
    A[User edits annotation] --> B[saveUndo]
    B --> C[Capture current state]
    C --> D[Push to history stack]
    D --> E[Perform mutation]
    F[User clicks Undo] --> G[Pop from history]
    G --> H[Restore previous state]
    H --> I[Update stores]
```
### Undoable Operations

**Annotations:**
- Add boundary
- Remove boundary
- Move boundary
- Edit interval text

**Data Points:**
- Add data point
- Remove data point
- Move data point

**Not undoable (by design):**
- Add/remove/rename tier (structural changes)
- Load audio/TextGrid (file operations)
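The snapshot stack itself reduces to a few lines. This is a sketch of the idea, not the actual `undoManager.ts` internals, and the `Snapshot` shape is a stand-in for the real tiers and data-point types:

```typescript
// Hypothetical combined state shape; undoManager.ts snapshots
// all tiers plus all data points together.
interface Snapshot {
  tiers: unknown[];
  dataPoints: unknown[];
}

const undoStack: Snapshot[] = [];
const redoStack: Snapshot[] = [];

// JSON round-trip gives an isolated deep copy of the state.
const deepCopy = (s: Snapshot): Snapshot => JSON.parse(JSON.stringify(s));

function saveUndo(current: Snapshot): void {
  undoStack.push(deepCopy(current));
  redoStack.length = 0; // a new edit invalidates the redo history
}

function undo(current: Snapshot): Snapshot | undefined {
  const previous = undoStack.pop();
  if (previous) redoStack.push(deepCopy(current));
  return previous;
}

function redo(current: Snapshot): Snapshot | undefined {
  const next = redoStack.pop();
  if (next) undoStack.push(deepCopy(current));
  return next;
}
```

The single stack is what makes annotation edits and data-point edits interleave in strict chronological order: there is no per-feature history to get out of sync.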
### Usage Pattern

```typescript
import { saveUndo, undo, redo } from '$lib/stores/undoManager';

export function addBoundary(tierIndex: number, time: number): void {
  saveUndo(); // MUST be called before the mutation
  tiers.update(t => {
    // Modify tiers...
    return t;
  });
}

// In a component
function handleUndo(event: KeyboardEvent) {
  if ((event.ctrlKey || event.metaKey) && event.key === 'z') {
    undo();
  }
}
```

**Important:** Always call `saveUndo()` before the mutation, not after.
## Audio Playback

**File:** `src/lib/audio/player.ts`

Uses the Web Audio API for precise playback:

```mermaid
graph LR
    A[audioBuffer store] --> B[AudioBufferSourceNode]
    B --> C[GainNode volume control]
    C --> D[AudioContext destination]
    D --> E[System audio output]
    F[cursorPosition store] --> G[requestAnimationFrame loop]
    G --> H[Update cursor during playback]
    H --> G
```
### Playback Modes

**Selection playback:**

```typescript
// Play only the selected region
playSelection(selection.start, selection.end);
```

**Visible window playback:**

```typescript
// Play what's currently visible on screen
playVisibleWindow(timeRange.start, timeRange.end);
```

**Full file playback:**

```typescript
// Play from the cursor to the end
playFromCursor(cursorPosition, audioDuration);
```

### Cursor Synchronization
During playback:

1. The `AudioBufferSourceNode` starts at the given offset
2. A `requestAnimationFrame` loop checks `AudioContext.currentTime`
3. The playback position is calculated as `offset + (currentTime - startTime)`
4. The `cursorPosition` store is updated (60 FPS)
5. All components bound to `$cursorPosition` re-render

**Result:** Smooth cursor tracking across waveform, spectrogram, and annotations.
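The position arithmetic from step 3 is worth isolating behind an injectable clock so the loop logic stays testable. A sketch; the function names are assumptions, not the actual `player.ts` API:

```typescript
// Step 3 above: playback position from the AudioContext clock.
function playbackPosition(offset: number, startTime: number, currentTime: number): number {
  return offset + (currentTime - startTime);
}

// Sketch of the tracker polled by the requestAnimationFrame loop;
// `now` would be () => audioContext.currentTime in the real player.
function createCursorTracker(offset: number, now: () => number) {
  const startTime = now();
  return () => playbackPosition(offset, startTime, now());
}
```

Using `AudioContext.currentTime` instead of `Date.now()` keeps the cursor locked to the audio hardware clock, so it cannot drift from what the listener actually hears.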
## Deployment Architecture

### Static Build

No server required; the entire app is static HTML/CSS/JS:

```
build/
├── index.html          # Main app HTML
├── viewer.html         # Mobile viewer HTML
├── _app/
│   ├── immutable/
│   │   ├── chunks/     # Code-split JS bundles
│   │   ├── entry/      # Entry points
│   │   └── nodes/      # Page components
│   └── version.json    # Build version
└── ...                 # Static assets
```
**Base path handling:**

The post-build script (`scripts/fix-relative-paths.js`) injects runtime base path detection:

```html
<!-- In generated HTML -->
<script>
  // Detect actual deployment path
  window.__base = document.currentScript.src.replace(/\/[^\/]*$/, '').replace(/\/_app.*/, '');
</script>
```

This allows deploying to:
- Root: `https://example.com/`
- Subdirectory: `https://example.com/tools/ozen/`
- GitHub Pages: `https://user.github.io/ozen-web/`

No rebuild needed; the same build works everywhere.
## Performance Considerations

### Memory Management

**WASM objects:**
- Always `.free()` after use (Pitch, Formant, Spectrogram objects)
- Use the abstraction layer to enforce cleanup

**Canvas caching:**
- Off-screen canvas for the spectrogram (reused without recomputing)
- Debounced zoom to avoid excessive regeneration

**Audio buffer:**
- Stored as a `Float64Array` (full resolution, never downsampled)
- Mono conversion done once on load (stereo → mono mix)
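The one-time stereo → mono mix-down mentioned above is a per-sample average across channels. A sketch, assuming equal-weight channel mixing (the usual choice; the actual loader may weight channels differently):

```typescript
// Mix N channels (one Float32Array per channel, as decoded by the
// Web Audio API) down to a single full-resolution Float64Array.
function mixToMono(channels: Float32Array[]): Float64Array {
  const length = channels[0].length;
  const mono = new Float64Array(length);
  for (let i = 0; i < length; i++) {
    let sum = 0;
    for (const channel of channels) sum += channel[i];
    mono[i] = sum / channels.length;
  }
  return mono;
}
```

Doing this once at load time means every downstream consumer (waveform, WASM analysis, playback cursor math) works on one array and never pays a per-frame channel merge.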
### Rendering Optimization

**Waveform:**
- Downsample to one vertical line per pixel
- O(canvas width) complexity, not O(sample count)

**Spectrogram:**
- Cache the full spectrogram as `ImageData`
- Draw only the visible region (viewport clipping)
- Regenerate high-res only when zoomed >2x

**Overlays:**
- Skip rendering points outside the visible time range
- Use `requestAnimationFrame` for smooth cursor updates

### Lazy Analysis

For audio longer than 60s:
- Skip upfront computation (prevents hanging)
- Compute only the visible window when zoomed
- Cache computed regions (avoids re-analysis on pan)
## Testing Strategy

**Currently:** manual testing workflow (see Setup).

**Future:** automated tests could include:
- Unit tests for stores (Vitest)
- WASM integration tests (mocking `acoustic.ts`)
- Component tests (Svelte Testing Library)
- E2E tests (Playwright, already used for screenshots)
## See Also
- Setup - Development environment
- Stores - Detailed store documentation
- WASM Integration - Backend development
- Contributing - How to contribute