ZeroUtil

Extract Audio from Video

Pull the audio track out of MP4, WebM, MOV and MKV. Output as MP3, WAV, AAC or Opus. Files auto-deleted after 15 minutes.

How to extract audio from video

  1. Drop your video file onto the upload area. MP4, WebM, MOV, and MKV up to 500 MB are accepted; the file uploads to api.zeroutil.com over HTTPS for server-side processing.
  2. Pick the output format: MP3 at 192 kbps for universal compatibility, WAV for uncompressed PCM, AAC at 192 kbps for modern players, or Opus at 128 kbps for the smallest high-quality result.
  3. Press Extract audio. The job enters a BullMQ queue, FFmpeg drops the video stream and re-encodes the audio track, and the page polls the job status until it completes.
  4. Download the audio file via the signed URL the page surfaces. Both the original video and the extracted audio are auto-deleted from the server within 15 minutes.
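
The polling in step 3 can be sketched as a small loop. This is a minimal illustration, not the page's actual client code: the status lookup is passed in as a function, and the "state", "downloadUrl" and "error" keys are hypothetical names chosen for the example (only the state names mirror BullMQ's waiting, active, completed and failed).

```python
import time
from typing import Callable


def wait_for_job(fetch_status: Callable[[], dict],
                 interval_s: float = 2.0,
                 timeout_s: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job completes and return its download URL.

    fetch_status is supplied by the caller and returns the job's current
    state as a dict. The dict keys used here are hypothetical, not a
    documented response format.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status()
        if job["state"] == "completed":
            return job["downloadUrl"]
        if job["state"] == "failed":
            raise RuntimeError(job.get("error", "extraction failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish in time")
        # Still waiting or active: pause before asking again.
        sleep(interval_s)
```

Passing sleep as a parameter keeps the loop easy to test and makes it simple to swap in a backoff strategy later.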

What the extractor does under the hood

This is a backend tool, not a client-side widget. The page POSTs the multipart upload to api.zeroutil.com/process/video with op extract-audio. The Hono server validates the request, persists the upload to a temporary directory, and pushes a job onto Redis-backed BullMQ. A worker process spawns FFmpeg with arguments along the lines of -i input.mp4 -vn -c:a libmp3lame -b:a 192k output.mp3: -vn drops the video stream, -c:a selects the encoder (libmp3lame for MP3, pcm_s16le for WAV, libfdk_aac or aac for AAC, libopus for Opus), and -b:a sets the target bitrate. The output is moved to a download directory and exposed via an HMAC-signed URL with a 15-minute expiry.
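
The format-to-encoder mapping described above can be sketched as a small argument builder. The encoder names and bitrates come from the text; the function name, the output file stem, and the choice of file extension per format are assumptions made for the example, not the worker's actual code.

```python
# Encoder and target bitrate per output format, as described in the text.
ENCODERS = {
    "mp3": ("libmp3lame", "192k"),
    "wav": ("pcm_s16le", None),   # uncompressed PCM, so no bitrate flag
    "aac": ("aac", "192k"),       # libfdk_aac is an option where the build has it
    "opus": ("libopus", "128k"),
}


def build_ffmpeg_args(src: str, fmt: str, dst_stem: str = "output") -> list:
    """Return an FFmpeg argument list that drops video and encodes audio."""
    if fmt not in ENCODERS:
        raise ValueError(f"unsupported output format: {fmt}")
    codec, bitrate = ENCODERS[fmt]
    args = ["ffmpeg", "-i", src, "-vn", "-c:a", codec]
    if bitrate is not None:
        args += ["-b:a", bitrate]
    # Assumption: the extension equals the format name. A real service
    # might prefer .m4a for AAC; the source does not say.
    args.append(f"{dst_stem}.{fmt}")
    return args
```

Building the command as a list rather than a single string means it can be handed to a process spawner without shell quoting, which matters when file names come from user uploads.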

The 15-minute auto-delete is enforced by a cleanup cron that scans the upload and download directories every minute and unlinks anything older than the FILE_EXPIRY_MIN config (default 15). Each signed URL carries its own 15-minute expiry, so even a leaked URL stops working at the same boundary; the signing secret is rotated independently of the file lifetime. There is no logging of audio content, no transcription, and no retention beyond the deletion window.
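
Both expiry mechanisms can be sketched in a few lines. This is an illustration under assumptions, not the server's code: FILE_EXPIRY_MIN is the only name taken from the text, while the function names, the "path:expires" message layout, and the use of SHA-256 are choices made for the example.

```python
import hashlib
import hmac
import time
from pathlib import Path
from typing import Optional

FILE_EXPIRY_MIN = 15  # config name from the text; default 15 minutes


def sign(path: str, secret: bytes, now: Optional[float] = None):
    """Return (expires, signature) for a download path."""
    issued = time.time() if now is None else now
    expires = int(issued + FILE_EXPIRY_MIN * 60)
    message = f"{path}:{expires}".encode()  # hypothetical message layout
    return expires, hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(path: str, expires: int, sig: str, secret: bytes,
           now: Optional[float] = None) -> bool:
    """Accept only an untampered signature that has not yet expired."""
    current = time.time() if now is None else now
    message = f"{path}:{expires}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig) and current < expires


def sweep_expired(directory: Path, now: Optional[float] = None) -> list:
    """Unlink files older than FILE_EXPIRY_MIN; return the removed names."""
    current = time.time() if now is None else now
    cutoff = current - FILE_EXPIRY_MIN * 60
    removed = []
    for entry in directory.iterdir():
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)
```

Because the expiry timestamp is part of the signed message, a client cannot extend a link by editing the timestamp: the signature would no longer match.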

When this tool earns its keep

  • Extracting the audio from a Zoom or Google Meet recording to feed into a separate transcription pipeline.
  • Pulling the music bed out of a wedding or family video to identify a song with Shazam or AudD.
  • Salvaging an interview when only the video file survived a backup mishap and you need an MP3 for the editor.
  • Producing a ringtone source from a short video clip, combined with the audio trimmer to cut to length.
  • Extracting podcast audio from a YouTube re-upload of a livestream when the original RSS feed is gone.
  • Creating an MP3 of a lecture or conference talk from a screen-capture MP4 for offline listening on a phone.

Common pitfalls and edge cases

  • Lossy-on-lossy quality drop. YouTube and most video sources store audio as AAC at 128-192 kbps. Re-encoding to MP3 introduces a second compression pass; pick a higher target bitrate (192 kbps MP3 from 128 kbps AAC) or choose WAV to avoid it entirely.
  • WAV files are huge. Roughly 10 MB per minute at 44.1 kHz, 16-bit stereo. A 90-minute movie soundtrack comes to over 900 MB; only pick WAV when you genuinely need a sample-exact copy for DAW editing or archival.
  • Opus is not universally supported. Chrome, Firefox, Edge, and Android play Opus natively; iOS only added support in iOS 17, and many older Bluetooth devices fall back to lower-quality streaming. If the destination is unknown, MP3 is safer.
  • The source caps the ceiling. If the video has 64 kbps mono audio (common for old phone uploads), the extracted MP3 cannot sound better than the source no matter the target bitrate.
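  • Multi-track audio collapses to the default. When no stream is specified, FFmpeg keeps a single audio stream (the one with the most channels, or the first on a tie). Movies with separate director-commentary tracks lose the alternates; if you need a specific stream, pre-process with FFmpeg locally and select it with -map (for example -map 0:a:1 for the second audio track).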
  • Some MOV files have AC-3 audio from older camcorders, which re-encodes cleanly but takes longer than AAC. Expect a small additional latency on those.
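
For the multi-track case, it helps to list the audio streams before choosing one. The sketch below parses ffprobe's JSON output into FFmpeg stream specifiers; the helper name and the output line format are made up for the example, and the ffprobe arguments are shown as a constant rather than executed.

```python
import json

# Arguments that ask ffprobe for audio streams only, as JSON.
# Append the input file path before running.
FFPROBE_ARGS = [
    "ffprobe", "-v", "error", "-select_streams", "a",
    "-show_entries", "stream=index,codec_name,channels:stream_tags=language",
    "-of", "json",
]


def describe_audio_streams(ffprobe_json: str) -> list:
    """Turn ffprobe JSON into lines like '0:a:1 aac 2ch lang=eng'."""
    streams = json.loads(ffprobe_json).get("streams", [])
    lines = []
    for position, stream in enumerate(streams):
        codec = stream.get("codec_name", "?")
        channels = stream.get("channels", "?")
        language = stream.get("tags", {}).get("language", "und")
        # 0:a:N means "input 0, N-th audio stream", usable with -map.
        lines.append(f"0:a:{position} {codec} {channels}ch lang={language}")
    return lines
```

Once the wanted track is identified, a local run such as ffmpeg -i in.mkv -map 0:a:1 -vn -c:a libmp3lame -b:a 192k out.mp3 extracts that stream instead of the default.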

Container formats and audio codecs in 2026

MP4 (ISO/IEC 14496-14) is a container built on the ISO Base Media File Format (ISO/IEC 14496-12), almost universally paired with H.264 video and AAC audio. WebM is Google's restricted profile of Matroska, paired with VP8/VP9/AV1 video and Vorbis/Opus audio. MOV is Apple's QuickTime container, structurally close to MP4 because the MP4 format was derived from QuickTime. MKV (Matroska) is the most flexible container, accepting nearly any codec combination. The audio codec inside the container determines the realistic quality ceiling: AAC and Opus are the modern defaults, MP3 is the legacy interoperability format, and PCM is what raw WAV captures. Opus is specified in RFC 6716; its hybrid design, which combines the SILK and CELT coders, is the technical reason it consistently outperforms MP3 at lower bitrates.

Alternatives and when they beat this tool

Local FFmpeg (ffmpeg -i in.mp4 -vn -c:a libmp3lame -b:a 192k out.mp3) is the fastest option for batch jobs and the right pick when you need precise control over codec parameters or stream selection. Audacity (free, cross-platform, with its optional FFmpeg library installed) imports a video and exports the audio with a GUI, which is handy when you also want to edit. yt-dlp can fetch just the audio from a YouTube URL instead of the full video, useful when the source is online rather than already on your disk. The on-page extractor wins when your video is local, you do not want to install FFmpeg, and you would rather not hand private content to a free service whose retention policy is unclear: here both files are deleted within 15 minutes.

Frequently Asked Questions

Does extracting audio reduce quality?

Lossy formats (MP3, AAC, Opus) re-encode the audio, which means some detail is discarded. For most material the difference is inaudible at 192 kbps. To avoid any further loss choose WAV: it stores the decoded audio as uncompressed PCM, so no second lossy pass is applied. Note that the source audio in a video is usually already compressed (AAC inside MP4), so the absolute ceiling is set by the source.

Why is the MP3 quality lower than the original?

YouTube and most video sources store audio as AAC at roughly 128-192 kbps. Re-encoding from AAC to MP3 introduces a second lossy compression pass on top, called transcoding loss. To minimize the loss use the same bitrate or higher (192 kbps MP3 from 128 kbps AAC is fine), or pick WAV to skip the second compression entirely.

Can I extract audio without re-encoding?

In theory yes (FFmpeg with `-c:a copy` writes the audio stream as-is), but the output container has to match the codec. AAC inside MP4 can be copied as M4A; MP3 inside MKV can be copied as MP3. We default to re-encoding because it produces a predictable, universally playable file. If you need a stream copy, ask for it as a feature request.
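
For a local stream copy, the main decision is which output extension suits the source codec. A minimal sketch, assuming a small hand-picked codec table (the table and the function name are illustrative, and other codecs may need different containers):

```python
from typing import Optional

# Source codec name (as reported by ffprobe) -> output extension whose
# container can hold that codec without re-encoding. Illustrative subset.
COPY_EXTENSIONS = {
    "aac": "m4a",
    "mp3": "mp3",
    "opus": "opus",
    "vorbis": "ogg",
    "flac": "flac",
}


def build_copy_args(src: str, codec_name: str,
                    dst_stem: str = "output") -> Optional[list]:
    """Return FFmpeg args for a lossless stream copy, or None if the
    codec is not in the table and re-encoding is the safer route."""
    ext = COPY_EXTENSIONS.get(codec_name)
    if ext is None:
        return None
    return ["ffmpeg", "-i", src, "-vn", "-c:a", "copy", f"{dst_stem}.{ext}"]
```

Returning None for unknown codecs leaves the caller free to fall back to re-encoding, which matches the reasoning for the default described above.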

What is the difference between MP3 and AAC?

AAC is the newer, more efficient codec. At the same bitrate AAC sounds noticeably cleaner than MP3, especially on cymbals, applause and other high-frequency material. The downside is compatibility - very old devices and some legacy players only handle MP3. For 2026 use cases AAC is usually the better technical choice; MP3 wins only on universal compatibility.

How big will the audio file be?

At 192 kbps a 60-minute MP3 is around 80-90 MB; at 320 kbps it is 130-140 MB. WAV at 44.1 kHz stereo is ~10 MB per minute, so a one-hour file is ~600 MB. Opus at 128 kbps is the smallest of the lossy options at ~55 MB per hour.
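
These figures follow from simple arithmetic, sketched below. The estimate assumes constant bitrate and ignores container overhead and metadata, and the function names are made up for the example. The results line up with the numbers above when expressed in binary megabytes (MiB).

```python
MIB = 1024 * 1024  # bytes per binary megabyte


def lossy_size_bytes(bitrate_kbps: int, minutes: int) -> int:
    """Size of a constant-bitrate stream: bits per second times seconds, over 8."""
    return bitrate_kbps * 1000 * minutes * 60 // 8


def wav_size_bytes(minutes: int, sample_rate: int = 44100,
                   channels: int = 2, bits_per_sample: int = 16) -> int:
    """Size of uncompressed PCM: every sample of every channel is stored."""
    bytes_per_second = sample_rate * channels * bits_per_sample // 8
    return bytes_per_second * minutes * 60


for label, size in [
    ("MP3 192 kbps, 60 min", lossy_size_bytes(192, 60)),
    ("MP3 320 kbps, 60 min", lossy_size_bytes(320, 60)),
    ("Opus 128 kbps, 60 min", lossy_size_bytes(128, 60)),
    ("WAV 44.1 kHz stereo, 60 min", wav_size_bytes(60)),
]:
    print(f"{label}: {size / MIB:.0f} MiB")
```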

Why is the output file larger than expected?

Most likely because the source video had lower-bitrate audio than the output target (e.g. a phone clip with 96 kbps audio extracted to 192 kbps MP3 comes out roughly twice the size of the original audio track, with no gain in quality). Or you picked WAV, which stores raw uncompressed samples. To make a smaller file pick MP3 or Opus at a lower bitrate.

Are my files private?

Files travel over HTTPS to api.zeroutil.com (EU server). FFmpeg processes them locally on the server, returns a signed download URL, and both the input video and the extracted audio are auto-deleted after 15 minutes. We do not log content, do not transcribe audio, and do not retain anything beyond the deletion window.
