File Formats
Supported file formats and specifications
Overview
Ozen-web supports multiple file formats for audio input, annotation import/export, and data export. This page documents format specifications and compatibility.
Audio Formats
WAV (Waveform Audio File Format)
Extension: .wav
MIME type: audio/wav, audio/x-wav
Support: ✅ Full (recommended)
Specifications:
- Encoding: PCM (uncompressed)
- Bit depth: 8-bit, 16-bit, 24-bit, 32-bit
- Sample rate: Any (8 kHz - 96 kHz typical)
- Channels: Mono or stereo (stereo is mixed to mono)
Advantages:
- Uncompressed (no quality loss)
- Universally compatible
- Fast to decode
Disadvantages:
- Large file size
- Not suitable for web streaming
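The note that stereo is mixed to mono can be illustrated with a small sketch. This page does not say how Ozen-web performs the mixdown, so the averaging of left and right samples below is an assumption, and `mix_to_mono` is a hypothetical helper name, not part of Ozen-web.

```python
from array import array


def mix_to_mono(interleaved: array) -> array:
    """Average interleaved 16-bit stereo samples (L, R, L, R, ...) into mono."""
    if len(interleaved) % 2:
        raise ValueError("expected an even number of interleaved samples")
    mono = array("h")
    for i in range(0, len(interleaved), 2):
        # The integer average of two 16-bit values stays in the 16-bit range,
        # so no clipping step is needed.
        mono.append((interleaved[i] + interleaved[i + 1]) // 2)
    return mono


stereo = array("h", [1000, 3000, -200, 200, 32767, 32767])
print(mix_to_mono(stereo).tolist())  # [2000, 0, 32767]
```

Averaging halves the level of a signal that is present in only one channel, which is one reason to record in mono when the analysis does not need two channels.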
Example:
# Convert audio to WAV
ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav
MP3 (MPEG Audio Layer III)
Extension: .mp3
MIME type: audio/mpeg
Support: ✅ Full
Specifications:
- Encoding: Lossy compression
- Bit rate: 64-320 kbps
- Sample rate: 8-48 kHz
- Channels: Mono or stereo
Advantages:
- Small file size
- Good for web distribution
- Widely supported
Disadvantages:
- Lossy compression artifacts
- Not recommended for precise acoustic analysis
- Compression can affect formant measurements
Example:
# Convert to MP3 (192 kbps)
ffmpeg -i input.wav -b:a 192k output.mp3
OGG Vorbis
Extension: .ogg
MIME type: audio/ogg
Support: ✅ Full
Specifications:
- Encoding: Lossy compression (Vorbis codec)
- Bit rate: Variable (typically 64-320 kbps)
- Sample rate: 8-192 kHz
- Channels: Mono or stereo
Advantages:
- Open format (no licensing fees)
- Better quality than MP3 at same bitrate
- Smaller than WAV
Disadvantages:
- Lossy compression
- Less universal than WAV/MP3
Format Recommendations
| Use Case | Recommended Format |
|---|---|
| Research analysis | WAV (16-bit, 16-44.1 kHz) |
| Long recordings | MP3 (192+ kbps) |
| Web embedding | MP3 or OGG |
| Archival | WAV (24-bit, 48 kHz) |
| Field recordings | WAV (16-bit, 16 kHz) |
Annotation Formats
TextGrid (Praat Format)
Extension: .TextGrid
Support: ✅ Import and export
Description:
TextGrid is Praat’s native annotation format, widely used in phonetics research. Ozen-web supports both short and long TextGrid formats.
Format variants:
Long Format
File type = "ooTextFile"
Object class = "TextGrid"
xmin = 0
xmax = 2.5
tiers? <exists>
size = 1
item []:
item [1]:
class = "IntervalTier"
name = "words"
xmin = 0
xmax = 2.5
intervals: size = 3
intervals [1]:
xmin = 0
xmax = 0.8
text = "the"
intervals [2]:
xmin = 0.8
xmax = 1.5
text = "cat"
intervals [3]:
xmin = 1.5
xmax = 2.5
text = ""
Short Format
File type = "ooTextFile"
Object class = "TextGrid"
0
2.5
<exists>
1
"IntervalTier"
"words"
0
2.5
3
0
0.8
"the"
0.8
1.5
"cat"
1.5
2.5
""
Ozen-web support:
| Feature | Import | Export |
|---|---|---|
| Interval tiers | ✅ Yes | ✅ Yes |
| Point tiers | ❌ No | ❌ No |
| Multiple tiers | ✅ Yes | ✅ Yes |
| UTF-8 text | ✅ Yes | ✅ Yes |
| IPA characters | ✅ Yes | ✅ Yes |
| Short format | ✅ Yes | ✅ Yes (default) |
| Long format | ✅ Yes | ❌ No |
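For quick checks outside Ozen-web, the keyed TextGrid variant (the one whose lines read `xmin = ...`, `xmax = ...`, `text = ...`) can be scanned with a regular expression. The following is a minimal sketch, not the parser Ozen-web uses: `read_intervals` is a hypothetical helper, it flattens all interval tiers into one list, and it ignores point tiers.

```python
import re

# An interval is an xmin/xmax/text triple in that order. File-level and
# tier-level xmin/xmax pairs are skipped because no "text" line follows them.
INTERVAL_RE = re.compile(
    r'xmin\s*=\s*([-\d.eE+]+)\s*'
    r'xmax\s*=\s*([-\d.eE+]+)\s*'
    r'text\s*=\s*"((?:[^"]|"")*)"'
)


def read_intervals(textgrid_text):
    """Return (xmin, xmax, label) tuples for every interval in the text."""
    return [
        # Praat doubles any quote character inside a label; undo that here.
        (float(lo), float(hi), label.replace('""', '"'))
        for lo, hi, label in INTERVAL_RE.findall(textgrid_text)
    ]


sample = '''File type = "ooTextFile"
Object class = "TextGrid"
xmin = 0
xmax = 2.5
tiers? <exists>
size = 1
item []:
item [1]:
class = "IntervalTier"
name = "words"
xmin = 0
xmax = 2.5
intervals: size = 2
intervals [1]:
xmin = 0
xmax = 0.8
text = "the"
intervals [2]:
xmin = 0.8
xmax = 2.5
text = ""
'''
print(read_intervals(sample))  # [(0.0, 0.8, 'the'), (0.8, 2.5, '')]
```

For anything beyond a sanity check, a dedicated TextGrid library is the safer choice, since it also handles the values-only variant and tier boundaries.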
Compatibility:
- ✅ Praat (all versions)
- ✅ Montreal Forced Aligner
- ✅ Elan (via conversion)
- ✅ WebMAUS
Example (programmatic creation):
# Python (using the textgrid library)
import textgrid
tg = textgrid.TextGrid()
tier = textgrid.IntervalTier(name='words', minTime=0, maxTime=2.5)
tier.add(0, 0.8, 'the')
tier.add(0.8, 1.5, 'cat')
tier.add(1.5, 2.5, '')
tg.append(tier)
tg.write('output.TextGrid')
Limitations:
Ozen-web currently does not support point tiers. Convert point tiers to interval tiers in Praat before importing:
# In Praat
textgrid = selected("TextGrid")
Create TextGrid... 0 2.5 "words" ""
# Manually convert points to intervals
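If the point times and labels are already available outside Praat, the conversion can also be scripted. The sketch below assumes one particular convention that this page does not prescribe: each point starts an interval that runs to the next point (or to the end of the recording), and any stretch before the first point becomes an unlabeled interval. `points_to_intervals` is a hypothetical helper.

```python
def points_to_intervals(points, xmin, xmax):
    """Turn (time, label) points into contiguous (start, end, label) intervals.

    Assumed convention: a point opens an interval that lasts until the next
    point or xmax. Points outside [xmin, xmax) are dropped.
    """
    points = sorted(p for p in points if xmin <= p[0] < xmax)
    intervals = []
    if not points or points[0][0] > xmin:
        # Cover the stretch before the first point with an empty interval.
        first = points[0][0] if points else xmax
        intervals.append((xmin, first, ""))
    for i, (time, label) in enumerate(points):
        end = points[i + 1][0] if i + 1 < len(points) else xmax
        if end > time:  # skip zero-length intervals from duplicate times
            intervals.append((time, end, label))
    return intervals


print(points_to_intervals([(0.8, "cat"), (1.5, "sat")], 0, 2.5))
# [(0, 0.8, ''), (0.8, 1.5, 'cat'), (1.5, 2.5, 'sat')]
```

Other conventions (for example, centering a fixed-width interval on each point) may suit some annotation schemes better; the resulting tuples can be written to an interval tier with any TextGrid library.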
Data Export Formats
TSV (Tab-Separated Values)
Extension: .tsv
Support: ✅ Export only
Description:
Data points are exported as tab-separated values for analysis in R, Python, Excel, or Praat.
Format:
time freq pitch intensity f1 f2 f3 f4 b1 b2 b3 b4 hnr cog spectral_tilt a1_p0 label_words label_phones
1.234 720 245 68 720 1240 2650 3500 80 110 150 200 15.3 5420 -2.1 -5.4 "cat" "æ"
1.567 850 250 70 850 1180 2580 3450 85 105 145 195 16.1 5380 -1.9 -4.8 "sat" "æ"
Column descriptions:
| Column | Unit | Description |
|---|---|---|
| time | seconds | Time position of data point |
| freq | Hz | Frequency at cursor position |
| pitch | Hz | Fundamental frequency (F0) |
| intensity | dB SPL | Sound pressure level |
| f1, f2, f3, f4 | Hz | Formant frequencies |
| b1, b2, b3, b4 | Hz | Formant bandwidths |
| hnr | dB | Harmonics-to-noise ratio |
| cog | Hz | Spectral center of gravity |
| spectral_tilt | dB/Hz | Spectral slope |
| a1_p0 | dB | Nasal measure |
| label_* | text | Annotation labels (one column per tier) |
Missing values:
- Empty cells indicate that a measurement was not available
- This is common for pitch in unvoiced regions
- Label columns are empty when no annotation covers that time
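Empty cells need explicit handling when the file is read without a data-frame library. The sketch below uses only Python's standard `csv` module and maps empty cells to `None`; `read_data_points` is a hypothetical helper, and the shortened column set in the sample is for illustration only.

```python
import csv
import io


def read_data_points(handle):
    """Yield one dict per row: empty cells become None, numeric cells floats."""
    for row in csv.DictReader(handle, delimiter="\t"):
        parsed = {}
        for column, cell in row.items():
            if cell is None or cell == "":
                parsed[column] = None        # measurement or label missing
            elif column.startswith("label_"):
                parsed[column] = cell        # keep annotation text as-is
            else:
                parsed[column] = float(cell)
        yield parsed


sample = "time\tpitch\tf1\tlabel_words\n1.234\t245\t720\tcat\n1.300\t\t650\t\n"
rows = list(read_data_points(io.StringIO(sample)))
print(rows[1])
# {'time': 1.3, 'pitch': None, 'f1': 650.0, 'label_words': None}
```

When reading from disk, open the file with `encoding="utf-8"` and `newline=""` so that IPA labels and line endings are handled as the `csv` module expects.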
Importing into tools:
R:
data <- read.table("data-points.tsv", header=TRUE, sep="\t", quote="\"")
Python/Pandas:
import pandas as pd
df = pd.read_csv("data-points.tsv", sep='\t')
Praat:
table = Read Table from tab-separated file: "data-points.tsv"
Excel:
- File → Open → Select TSV file
- Choose “Tab” as delimiter
Audio Save Format
WAV Export
Support: ✅ Save audio
Ozen-web can save the loaded audio as a WAV file:
- Format: 16-bit PCM WAV
- Sample rate: Original (preserved from input)
- Channels: Mono (if stereo input, mixed down)
- Encoding: Uncompressed
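A saved file can be checked against these settings with Python's standard `wave` module, which reads PCM WAV headers. This is a sketch for verification only; `describe_wav` is a hypothetical helper, and the in-memory file stands in for a file saved from Ozen-web.

```python
import io
import wave


def describe_wav(source):
    """Return the header fields that correspond to the export settings."""
    with wave.open(source, "rb") as wav:
        return {
            "bit_depth": wav.getsampwidth() * 8,
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


# Build one second of 16-bit mono silence in memory to exercise the function.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)       # 2 bytes per sample = 16-bit
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)
buffer.seek(0)
print(describe_wav(buffer))
# {'bit_depth': 16, 'sample_rate': 16000, 'channels': 1, 'duration_s': 1.0}
```

`describe_wav` also accepts a file path, so `describe_wav("output.wav")` works for a file on disk.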
Use cases:
- Save microphone recording
- Export processed audio
- Convert MP3/OGG to WAV
Limitations:
- Cannot edit audio (read-only)
- Cannot apply effects
- Save as-is only
File Size Guidelines
Audio Files
| Duration | WAV (16-bit, 16 kHz) | MP3 (192 kbps) | OGG (Q5) |
|---|---|---|---|
| 10 seconds | ~320 KB | ~240 KB | ~160 KB |
| 1 minute | ~1.9 MB | ~1.4 MB | ~960 KB |
| 5 minutes | ~9.5 MB | ~7.2 MB | ~4.8 MB |
| 30 minutes | ~57 MB | ~43 MB | ~29 MB |
| 1 hour | ~115 MB | ~86 MB | ~58 MB |
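The WAV and MP3 columns follow from simple arithmetic: uncompressed PCM takes sample rate × bytes per sample × channels for every second, and a constant-bit-rate stream takes bit rate ÷ 8. The sketch below reproduces those two columns with hypothetical helper names; the OGG column is not computed because quality-based Vorbis encoding is variable bit rate.

```python
def wav_bytes(duration_s, sample_rate=16000, bit_depth=16, channels=1):
    """Size of the raw PCM payload; the WAV header adds a few dozen bytes."""
    return duration_s * sample_rate * (bit_depth // 8) * channels


def cbr_bytes(duration_s, kbps=192):
    """Size of a constant-bit-rate stream such as 192 kbps MP3."""
    return duration_s * kbps * 1000 // 8


for label, seconds in [("10 seconds", 10), ("1 minute", 60), ("1 hour", 3600)]:
    print(label, wav_bytes(seconds), cbr_bytes(seconds))
# 10 seconds 320000 240000
# 1 minute 1920000 1440000
# 1 hour 115200000 86400000
```

Passing other parameters gives estimates for the remaining recommendations, for example `wav_bytes(60, sample_rate=48000, bit_depth=24)` for one minute of 24-bit, 48 kHz mono archival audio.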
TextGrid Files
TextGrid files are plain text:
- Small: <1 KB (single tier, few intervals)
- Medium: 1-10 KB (multiple tiers, detailed annotation)
- Large: 10-100 KB (many tiers, long recordings)
TSV Files
Data point exports:
- ~100 bytes per data point
- 100 data points ≈ 10 KB
- 1000 data points ≈ 100 KB
Character Encoding
Text Files
All text files use UTF-8 encoding:
- TextGrid files
- TSV files
- Configuration files
This ensures support for:
- IPA characters (ɑ, ə, ʃ, etc.)
- Unicode symbols
- Non-Latin scripts (中文, العربية, etc.)
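Before loading a text file, it can help to check whether it really is UTF-8. The sketch below tries UTF-8 first and falls back to ISO-8859-1; the extra branch for UTF-16 with a byte-order mark is an added assumption, not something this page states, and `decode_text_file` is a hypothetical helper.

```python
import codecs


def decode_text_file(raw: bytes) -> str:
    """Decode bytes as BOM-marked UTF-16 or UTF-8, falling back to Latin-1."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")      # the BOM selects the byte order
    try:
        return raw.decode("utf-8-sig")   # also strips a UTF-8 BOM if present
    except UnicodeDecodeError:
        # Assumption: anything else is ISO-8859-1 (Latin-1), which never
        # fails to decode but may produce wrong characters for other encodings.
        return raw.decode("latin-1")


print(decode_text_file("ʃæt".encode("utf-8")))     # ʃæt
print(decode_text_file("ʃæt".encode("utf-16")))    # ʃæt
print(decode_text_file("café".encode("latin-1")))  # café
```

Writing the returned string back with `open(path, "w", encoding="utf-8")` produces a file in the encoding Ozen-web expects.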
Encoding issues:
If you see garbled characters, the file may be in a different encoding. Convert to UTF-8:
# Linux/Mac
iconv -f ISO-8859-1 -t UTF-8 input.TextGrid > output.TextGrid
# Or re-save the file from Praat with its text writing encoding set to UTF-8
Browser Compatibility
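All modern browsers support:
- File picker (input type="file")
- Drag and drop
- WAV, FLAC, MP3 audio formats
Support varies by browser for:
- File System Access API (save files): available in Chromium-based browsers, but not in Firefox or Safari at the time of writing
- OGG audio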
Conversion Tools
ffmpeg (Command Line)
Convert to WAV:
# From MP3
ffmpeg -i input.mp3 output.wav
# Specify sample rate and channels
ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav
# From video (extract audio)
ffmpeg -i video.mp4 -vn -acodec pcm_s16le output.wav
Convert to MP3:
# High quality
ffmpeg -i input.wav -b:a 192k output.mp3
# Smaller file
ffmpeg -i input.wav -b:a 128k output.mp3
SoX (Sound eXchange)
# Convert format
sox input.mp3 output.wav
# Resample
sox input.wav -r 16000 output.wav
# Mono conversion
sox input.wav -c 1 output.wav
Praat
# Read audio
sound = Read from file: "input.mp3"
# Save as WAV
Save as WAV file: "output.wav"
Online Tools
Future Format Support
Planned additions:
- FLAC - Lossless compression audio
- WebM - Web-native audio/video
- ELAN (.eaf) - Alternative annotation format
- JSON - Machine-readable annotation export
- CSV - Alternative to TSV
See Also
- Tutorial: Loading Audio - Supported loading methods
- Tutorial: Exporting - Export workflows
- Annotations - TextGrid features
- Data Points - TSV export details