File Formats

Supported file formats and specifications

Overview

Ozen-web supports multiple file formats for audio input, annotation import/export, and data export. This page documents format specifications and compatibility.

Audio Formats

WAV (Waveform Audio File Format)

Extension: .wav
MIME type: audio/wav, audio/x-wav
Support: ✅ Full (recommended)

Specifications:

  • Encoding: PCM (uncompressed)
  • Bit depth: 8-bit, 16-bit, 24-bit, 32-bit
  • Sample rate: Any (8-96 kHz typical)
  • Channels: Mono or stereo (stereo is mixed to mono)

Advantages:

  • Uncompressed (no quality loss)
  • Universally compatible
  • Fast to decode

Disadvantages:

  • Large file size
  • Not suitable for web streaming

Example:

# Convert audio to WAV (16 kHz sample rate, mono)
ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav
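
To check that a file matches the specifications above before loading it, the properties can be read from the WAV header. The following is a minimal sketch using Python's standard `wave` module; the function name `describe_wav` is illustrative and not part of Ozen-web. Note that `wave` only reads PCM data, so it raises `wave.Error` on other encodings such as 32-bit float.

```python
import wave

def describe_wav(path):
    """Return basic PCM properties of a WAV file using only the standard library."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,  # bytes per sample -> bits
            "sample_rate": rate,
            "duration_s": frames / rate,
        }
```

For a one-second, 16-bit, 16 kHz mono file this returns 1 channel, a bit depth of 16, a sample rate of 16000, and a duration of 1.0 seconds.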

MP3 (MPEG Audio Layer III)

Extension: .mp3
MIME type: audio/mpeg
Support: ✅ Full

Specifications:

  • Encoding: Lossy compression
  • Bit rate: 64-320 kbps
  • Sample rate: 8-48 kHz
  • Channels: Mono or stereo

Advantages:

  • Small file size
  • Good for web distribution
  • Widely supported

Disadvantages:

  • Lossy compression artifacts
  • Not recommended for precise acoustic analysis
  • Compression can affect formant measurements

Example:

# Convert to MP3 (192 kbps)
ffmpeg -i input.wav -b:a 192k output.mp3

OGG Vorbis

Extension: .ogg
MIME type: audio/ogg
Support: ✅ Full

Specifications:

  • Encoding: Lossy compression (Vorbis codec)
  • Bit rate: Variable (typically 64-320 kbps)
  • Sample rate: 8-192 kHz
  • Channels: Mono or stereo

Advantages:

  • Open format (no licensing fees)
  • Better quality than MP3 at the same bit rate
  • Smaller than WAV

Disadvantages:

  • Lossy compression
  • Less universal than WAV/MP3

Format Recommendations

| Use Case | Recommended Format |
|---|---|
| Research analysis | WAV (16-bit, 16-44.1 kHz) |
| Long recordings | MP3 (192+ kbps) |
| Web embedding | MP3 or OGG |
| Archival | WAV (24-bit, 48 kHz) |
| Field recordings | WAV (16-bit, 16 kHz) |

Annotation Formats

TextGrid (Praat Format)

Extension: .TextGrid
Support: ✅ Import and export

Description:

TextGrid is Praat’s native annotation format, widely used in phonetics research. Ozen-web supports both short and long TextGrid formats.

Format variants:

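Long Format

This is the layout Praat writes with "Save as text file": every value is labeled with its key.

File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 2.5
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 2.5
        intervals: size = 3
        intervals [1]:
            xmin = 0
            xmax = 0.8
            text = "the"
        intervals [2]:
            xmin = 0.8
            xmax = 1.5
            text = "cat"
        intervals [3]:
            xmin = 1.5
            xmax = 2.5
            text = ""

Short Format

This is the layout Praat writes with "Save as short text file": the same values in the same order, without keys.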

File type = "ooTextFile"
Object class = "TextGrid"

0
2.5
<exists>
1
"IntervalTier"
"words"
0
2.5
3
0
0.8
"the"
0.8
1.5
"cat"
1.5
2.5
""

Ozen-web support:

| Feature | Import | Export |
|---|---|---|
| Interval tiers | ✅ Yes | ✅ Yes |
| Point tiers | ❌ No | ❌ No |
| Multiple tiers | ✅ Yes | ✅ Yes |
| UTF-8 text | ✅ Yes | ✅ Yes |
| IPA characters | ✅ Yes | ✅ Yes |
| Short format | ✅ Yes | ✅ Yes (default) |
| Long format | ✅ Yes | ❌ No |
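
The two variants shown above carry the same values in the same order; one labels each value with `key = value`, the other lists bare values. A reader can therefore normalize both into a single stream of values and parse that. The following is a minimal sketch of this idea in plain Python. The function names are illustrative, it is not Ozen-web's own importer, and it assumes UTF-8 text with each label on a single line.

```python
import re

def _values(text):
    """Yield the bare values of a TextGrid, in file order, for either variant."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for ln in lines[2:]:                       # skip the two header lines
        if ln.startswith('"') or re.match(r"[-\d.<]", ln):
            yield ln                           # bare-value variant
        elif "=" in ln:
            yield ln.split("=", 1)[1].strip()  # "key = value" variant
        elif ln.startswith("tiers?"):
            yield ln.split(None, 1)[1]         # "tiers? <exists>"
        # structural lines such as "item [1]:" carry no value

def _unquote(s):
    """Strip the outer quotes and undo Praat's doubled-quote escaping."""
    return s[1:-1].replace('""', '"')

def parse_textgrid(text):
    """Return (xmin, xmax, tiers); each tier is (class, name, items)."""
    it = _values(text)
    xmin, xmax = float(next(it)), float(next(it))
    tiers = []
    if next(it) != "<exists>":
        return xmin, xmax, tiers
    for _ in range(int(next(it))):
        cls, name = _unquote(next(it)), _unquote(next(it))
        next(it); next(it)                     # tier xmin / xmax (unused here)
        count = int(next(it))
        if cls == "IntervalTier":
            items = [(float(next(it)), float(next(it)), _unquote(next(it)))
                     for _ in range(count)]
        else:                                  # TextTier (point tier)
            items = [(float(next(it)), _unquote(next(it)))
                     for _ in range(count)]
        tiers.append((cls, name, items))
    return xmin, xmax, tiers
```

Because the tier count in the file drives the loop, a file whose `size` value does not match the number of tiers actually present will fail to parse, which is a useful consistency check on hand-written files.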

Compatibility:

  • ✅ Praat (all versions)
  • ✅ Montreal Forced Aligner
  • ✅ ELAN (via conversion)
  • ✅ WebMAUS

Example (programmatic creation):

# Python, using the textgrid library (pip install textgrid)
import textgrid

tg = textgrid.TextGrid()
tier = textgrid.IntervalTier(name='words', minTime=0, maxTime=2.5)
tier.add(0, 0.8, 'the')
tier.add(0.8, 1.5, 'cat')
tier.add(1.5, 2.5, '')
tg.append(tier)
tg.write('output.TextGrid')

Limitations:

Warning: Point Tiers Not Supported

Ozen-web currently does not support point tiers. Convert point tiers to interval tiers in Praat before importing:

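# In Praat, with the original TextGrid selected
textgrid = selected ("TextGrid")
# Create a new TextGrid with one interval tier and no point tiers
Create TextGrid: 0, 2.5, "words", ""
# Then add a boundary at each point time and copy the labels manually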
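
The manual conversion can also be scripted outside Praat. The sketch below shows one possible convention, which is an assumption rather than a Praat or Ozen-web rule: each point becomes a boundary, and the interval starting at a point carries that point's label. The function name `points_to_intervals` is illustrative.

```python
def points_to_intervals(points, xmin, xmax):
    """Turn (time, label) points into contiguous (start, end, label) intervals.

    Assumed convention: each point opens an interval that runs to the next
    point (or to xmax) and carries the point's label. Any stretch before the
    first point is left unlabeled.
    """
    points = sorted(p for p in points if xmin <= p[0] < xmax)
    intervals = []
    cursor, label = xmin, ""
    for time, mark in points:
        if time > cursor:
            intervals.append((cursor, time, label))
        cursor, label = time, mark
    intervals.append((cursor, xmax, label))
    return intervals
```

The resulting intervals are contiguous and cover the full time range, so they can be written to an interval tier directly.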

Data Export Formats

TSV (Tab-Separated Values)

Extension: .tsv
Support: ✅ Export only

Description:

Data points are exported as tab-separated values for analysis in R, Python, Excel, or Praat.

Format:

time    freq    pitch   intensity   f1  f2  f3  f4  b1  b2  b3  b4  hnr cog spectral_tilt   a1_p0   label_words label_phones
1.234   720 245 68  720 1240    2650    3500    80  110 150 200 15.3    5420    -2.1    -5.4    "cat"   "æ"
1.567   850 250 70  850 1180    2580    3450    85  105 145 195 16.1    5380    -1.9    -4.8    "sat"   "æ"

Column descriptions:

| Column | Unit | Description |
|---|---|---|
| time | seconds | Time position of data point |
| freq | Hz | Frequency at cursor position |
| pitch | Hz | Fundamental frequency (F0) |
| intensity | dB SPL | Sound pressure level |
| f1, f2, f3, f4 | Hz | Formant frequencies |
| b1, b2, b3, b4 | Hz | Formant bandwidths |
| hnr | dB | Harmonics-to-noise ratio |
| cog | Hz | Spectral center of gravity |
| spectral_tilt | dB/Hz | Spectral slope |
| a1_p0 | dB | Nasality measure (A1-P0: amplitude of the harmonic nearest F1 minus amplitude of the low-frequency nasal peak) |
| label_* | text | Annotation labels (one column per tier) |

Missing values:

  • Empty cells indicate measurement not available
  • Common for pitch in unvoiced regions
  • Common for labels when no annotation at that time
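
When reading the export without a data-frame library, empty cells need explicit handling so they are not confused with zero. The following is a minimal sketch using Python's standard `csv` module; the function name `read_data_points` is illustrative, and it assumes that every column except `label_*` is numeric, as in the column table above.

```python
import csv
import io

def read_data_points(tsv_text):
    """Parse exported data points; empty cells become None, numbers become float."""
    rows = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for raw in reader:
        row = {}
        for key, value in raw.items():
            if not value:
                row[key] = None              # measurement not available
            elif key.startswith("label_"):
                row[key] = value             # annotation text stays a string
            else:
                row[key] = float(value)
        rows.append(row)
    return rows
```

The `csv` reader removes the double quotes around label values, so `"cat"` in the file is returned as `cat`.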

Importing into tools:

R:

data <- read.table("data-points.tsv", header=TRUE, sep="\t", quote="\"")

Python/Pandas:

import pandas as pd
df = pd.read_csv("data-points.tsv", sep='\t')

Praat:

table = Read Table from tab-separated file: "data-points.tsv"

Excel:

  • File → Open → Select TSV file
  • Choose “Tab” as delimiter

Audio Save Format

WAV Export

Support: ✅ Save audio

Ozen-web can save the loaded audio as a WAV file:

  • Format: 16-bit PCM WAV
  • Sample rate: Original (preserved from input)
  • Channels: Mono (if stereo input, mixed down)
  • Encoding: Uncompressed

Use cases:

  • Save microphone recording
  • Export processed audio
  • Convert MP3/OGG to WAV

Limitations:

  • Cannot edit audio (read-only)
  • Cannot apply effects
  • Save as-is only
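
The export settings above (16-bit PCM, mono, original sample rate) can be illustrated with a short sketch. It assumes an equal-weight average of the two channels for the mixdown, which may differ from what Ozen-web does internally; the function name `save_mono_wav` is illustrative.

```python
import array
import sys
import wave

def save_mono_wav(path, left, right, sample_rate):
    """Average two channels of float samples in [-1, 1] and write 16-bit PCM mono."""
    mono = array.array("h")                      # signed 16-bit integers
    for l, r in zip(left, right):
        mixed = (l + r) / 2.0                    # equal-weight downmix (assumption)
        mixed = max(-1.0, min(1.0, mixed))       # keep within the valid range
        mono.append(int(round(mixed * 32767)))   # scale to signed 16-bit
    if sys.byteorder == "big":
        mono.byteswap()                          # WAV stores little-endian samples
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)                      # 2 bytes = 16 bits
        wav.setframerate(sample_rate)            # preserve the input rate
        wav.writeframes(mono.tobytes())
```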

File Size Guidelines

Audio Files

| Duration | WAV (16-bit, 16 kHz, mono) | MP3 (192 kbps) | OGG (~128 kbps) |
|---|---|---|---|
| 10 seconds | ~320 KB | ~240 KB | ~160 KB |
| 1 minute | ~1.9 MB | ~1.4 MB | ~960 KB |
| 5 minutes | ~9.6 MB | ~7.2 MB | ~4.8 MB |
| 30 minutes | ~57 MB | ~43 MB | ~29 MB |
| 1 hour | ~115 MB | ~86 MB | ~58 MB |
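
The WAV and MP3 figures follow directly from the bit rate: uncompressed PCM takes sample rate × bytes per sample × channels per second, and constant-bit-rate audio takes kbps × 1000 / 8 bytes per second. A small sketch for estimating other durations or settings (function names are illustrative, and the WAV header is ignored):

```python
def pcm_wav_bytes(duration_s, sample_rate=16000, bit_depth=16, channels=1):
    """Approximate size of uncompressed PCM audio (ignores the small WAV header)."""
    return int(duration_s * sample_rate * (bit_depth // 8) * channels)

def cbr_bytes(duration_s, kbps):
    """Approximate size of constant-bit-rate compressed audio."""
    return int(duration_s * kbps * 1000 / 8)

print(pcm_wav_bytes(60))    # 1920000 bytes, about 1.9 MB
print(cbr_bytes(60, 192))   # 1440000 bytes, about 1.4 MB
```

Variable-bit-rate encodings only approximate these numbers, since the actual rate depends on the signal.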

TextGrid Files

TextGrid files are plain text:

  • Small: <1 KB (single tier, few intervals)
  • Medium: 1-10 KB (multiple tiers, detailed annotation)
  • Large: 10-100 KB (many tiers, long recordings)

TSV Files

Data point exports:

  • ~100 bytes per data point
  • 100 data points ≈ 10 KB
  • 1000 data points ≈ 100 KB

Character Encoding

Text Files

All text files use UTF-8 encoding:

  • TextGrid files
  • TSV files
  • Configuration files

This ensures support for:

  • IPA characters (ɑ, ə, ʃ, etc.)
  • Unicode symbols
  • Non-Latin scripts (中文, العربية, etc.)

Encoding issues:

If you see garbled characters, the file may be in a different encoding. Convert to UTF-8:

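# Linux/Mac: convert from Latin-1
iconv -f ISO-8859-1 -t UTF-8 input.TextGrid > output.TextGrid

# Praat may write UTF-16 when labels contain non-ASCII characters
iconv -f UTF-16 -t UTF-8 input.TextGrid > output.TextGrid

# Or set Praat's text writing encoding to UTF-8 and re-save the file from Praat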
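
Where `iconv` is not available, the same conversion can be sketched in Python. The example below picks the encoding from the byte-order mark when one is present, then tries UTF-8, and finally falls back to ISO-8859-1; that fallback is an assumption about the likely legacy encoding, and the function names are illustrative.

```python
import codecs

def read_text_any_encoding(path):
    """Decode a text file by BOM, else UTF-8, else fall back to ISO-8859-1."""
    with open(path, "rb") as f:
        raw = f.read()
    if raw.startswith(codecs.BOM_UTF8):
        return raw.decode("utf-8-sig")           # drops the UTF-8 BOM
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")              # BOM selects the byte order
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")          # assumption: legacy Latin-1 file

def convert_to_utf8(src, dst):
    """Rewrite src as UTF-8 (without BOM) at dst, keeping line endings as-is."""
    text = read_text_any_encoding(src)
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```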

Browser Compatibility

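All modern browsers support:

  • File picker (input type="file")
  • Drag and drop
  • WAV, FLAC, MP3 audio formats

Support varies by browser:

  • File System Access API (save files): available in Chromium-based browsers; at the time of writing, Firefox and Safari do not offer the save-file picker
  • OGG audio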

Conversion Tools

ffmpeg (Command Line)

Convert to WAV:

# From MP3
ffmpeg -i input.mp3 output.wav

# Specify sample rate and channels
ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav

# From video (extract audio)
ffmpeg -i video.mp4 -vn -acodec pcm_s16le output.wav
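
For converting many files at once, the commands above can be wrapped in a short script. This sketch builds the argument list from the same flags used in the examples (plus `-y`, which overwrites existing output files) and runs it for every matching file in a folder. It assumes `ffmpeg` is on the PATH; the function names are illustrative.

```python
import subprocess
from pathlib import Path

def ffmpeg_to_wav_cmd(src, dst, sample_rate=16000, channels=1):
    """Build the ffmpeg argument list for a 16-bit PCM WAV conversion."""
    return [
        "ffmpeg", "-y",              # overwrite the output file if it exists
        "-i", str(src),
        "-vn",                       # ignore any video stream
        "-acodec", "pcm_s16le",      # 16-bit little-endian PCM
        "-ar", str(sample_rate),
        "-ac", str(channels),
        str(dst),
    ]

def convert_folder(folder, pattern="*.mp3"):
    """Convert every matching file in folder to a WAV file next to it."""
    for src in sorted(Path(folder).glob(pattern)):
        subprocess.run(ffmpeg_to_wav_cmd(src, src.with_suffix(".wav")), check=True)
```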

Convert to MP3:

# High quality
ffmpeg -i input.wav -b:a 192k output.mp3

# Smaller file
ffmpeg -i input.wav -b:a 128k output.mp3

SoX (Sound eXchange)

# Convert format
sox input.mp3 output.wav

# Resample
sox input.wav -r 16000 output.wav

# Mono conversion
sox input.wav -c 1 output.wav

Praat

# Read audio
sound = Read from file: "input.mp3"

# Save as WAV
Save as WAV file: "output.wav"

Future Format Support

Planned additions:

  • FLAC - Lossless compression audio
  • WebM - Web-native audio/video
  • ELAN (.eaf) - Alternative annotation format
  • JSON - Machine-readable annotation export
  • CSV - Alternative to TSV
