Audio & Output

Rendering

Rendering

The render stage maps numeric spectrogram and feature data onto pixels, applies the chosen palette, composes panels into a grid, and encodes the result to JPEG or PNG.

#Output path

songsee track.mp3                       # writes track.jpg next to the input
songsee track.mp3 -o out.png            # explicit path; format inferred from extension
songsee track.mp3 -o spectro            # no extension; appends  by default
songsee - -o -                          # stdin in, encoded image to stdout

If --format is set explicitly, it overrides extension-based inference. If -o already ends in .png, .jpg, or .jpeg, the encoder follows the extension regardless of --format.

When the input is - (stdin) and no -o is given, the output filename is songsee.jpg (or .png) in the current directory.

#Format

songsee track.mp3 --format png          # PNG, lossless
songsee track.mp3 --format jpg          # JPEG, quality 95 (default)

JPEG quality is fixed at 95 — high enough that compression artifacts disappear at typical viewing sizes while keeping file sizes reasonable. PNG is the right choice for archival, transparency, or downstream processing that doesn't tolerate JPEG quantization.

#Dimensions

songsee track.mp3 --width 2560 --height 1440
songsee track.mp3 --width 3840 --height 2160       # 4K
songsee track.mp3 --width 800 --height 200         # banner strip

Defaults are 1920×1080. Both must be positive. With multiple visualizations, songsee divides the canvas into a grid; very small canvases with many panels can leave cells too small to render and produce an error.

#Grid layout

When --viz selects more than one mode, panels are tiled into a ceil(sqrt(n))-column grid with an 8 px gap, sized to fit --width × --height exactly:

panelsgrid
11×1
22×1
3, 42×2
5–63×2
7–93×3

Cells are equal width and height; the canvas is filled top-down, left-right, in the order panels appear in --viz.

#Frequency range

--min-freq and --max-freq clamp the visible band (Hz) for spectrogram, mel, and MFCC panels. The default upper bound is the Nyquist frequency (sampleRate / 2).

# Vocal range, 80 Hz – 4 kHz.
songsee track.mp3 --min-freq 80 --max-freq 4000

# Sub-bass focus.
songsee track.mp3 --min-freq 20 --max-freq 200

--max-freq must be greater than --min-freq; songsee rejects the run with exit code 2 otherwise.

#Auto-contrast

Every panel runs an independent percentile clamp on its values before palette mapping (typically 0.05 / 0.98 — ~5% black floor, ~2% white ceiling). This keeps a quiet ambient track and a loud rock track equally readable in the same grid; it also means absolute brightness is not comparable across panels.

The base spectrogram converts magnitudes to decibels (20·log10(mag + 1e-9)) before normalizing.

#Stdout streaming

Pass -o - to write the encoded image bytes to stdout. Combine with --quiet to silence the trailing path echo:

songsee track.mp3 -o - --quiet > spectro.jpg
songsee track.mp3 -o - --format png --quiet | imgcat
ssh host "songsee /audio/x.flac -o -" > x.jpg

Verbose decode info still goes to stderr under --verbose, so it doesn't corrupt the binary stream on stdout.

#Batch usage

There's no built-in songsee batch; lean on the shell.

# All MP3s in a directory.
for f in *.mp3; do songsee "$f" --style magma; done

# Parallel via xargs.
ls *.mp3 | xargs -P 8 -I{} songsee {} --width 1920 --height 540

# find + GNU parallel.
find . -name '*.flac' -print0 | parallel -0 songsee {} --style viridis -o {.}.png

songsee is single-threaded internally, so parallelism comes from running multiple processes.

  • Visualizations — what each panel shows.
  • Palettes — color maps applied at render time.
  • Pipeline — windowing, FFT, normalization details.
  • CLI — every flag with its default.