date 28.Nov.2021

■ Extract time series data from music files, with 100 lines of C++


Amazing as it sounds, any musical masterpiece is really just a squiggly waveform: a bunch of sine waves added together, then digitized by sampling it 44,100 times each second (CD quality), like the wave visualizations you see in Windows Media Player. This raw (uncompressed) sound signal is what makes WAV files so huge. Audio formats like MP3, on the other hand, compress the sound wave so that it sounds almost as good at a fraction of the storage.

sound signal in time
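
To make the "sum of sine waves, sampled 44,100 times a second" idea concrete, here is a minimal sketch that synthesizes a short tone as PCM samples and works out how much storage raw audio needs. The names `Partial`, `synthesize` and `pcmBytes`, and the choice of 16-bit samples, are my own assumptions for illustration; they are not part of the downloadable project.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// One sine component of the signal: frequency in Hz, amplitude in [0, 1].
struct Partial {
    double frequency;
    double amplitude;
};

// Sample a sum of sine waves at `sampleRate` Hz and quantize to 16-bit PCM.
// The sum is clamped to [-1, 1] so a loud mix clips instead of wrapping around.
std::vector<std::int16_t> synthesize(const std::vector<Partial>& partials,
                                     double seconds, unsigned sampleRate = 44100)
{
    const double kTwoPi = 6.283185307179586;
    const std::size_t count = static_cast<std::size_t>(seconds * sampleRate);
    std::vector<std::int16_t> samples(count);
    for (std::size_t n = 0; n < count; ++n) {
        const double t = static_cast<double>(n) / sampleRate;  // time of sample n
        double value = 0.0;
        for (const Partial& p : partials)
            value += p.amplitude * std::sin(kTwoPi * p.frequency * t);
        if (value > 1.0) value = 1.0;
        if (value < -1.0) value = -1.0;
        samples[n] = static_cast<std::int16_t>(std::lround(value * 32767.0));
    }
    return samples;
}

// Storage needed for raw PCM: rate x channels x bytes per sample x duration.
std::uint64_t pcmBytes(unsigned sampleRate, unsigned channels,
                       unsigned bitsPerSample, double seconds)
{
    const double bytesPerSecond =
        static_cast<double>(sampleRate) * channels * (bitsPerSample / 8);
    return static_cast<std::uint64_t>(bytesPerSecond * seconds);
}
```

For example, `synthesize({{440.0, 0.5}, {880.0, 0.25}}, 1.0)` gives one second of a 440 Hz tone plus its octave, and `pcmBytes(44100, 2, 16, 180.0)` shows that a three-minute stereo track needs 31,752,000 bytes (about 32 MB) before any compression.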

Any kind of sound processing (e.g. finding similar songs by content) needs the uncompressed data as its input, e.g. for Fourier transforms (FFT), MFCC extraction etc. There are tons of different audio compression algorithms and file formats (MP3, WMA, MPA, CDA, AIF, OGG, AC3, M4A, FLAC, RMI, ...), each needing its own codec to decode the sound data. Finding a single library that decodes all these formats seems like an impossible task. There are cross-platform solutions like FFmpeg, OpenCV(?) and partial solutions like mpg123 (which only decodes MP3).

If you develop on Windows, on the other hand, don't care about cross-platform support, and target Windows Vista or later, there's Microsoft Media Foundation (MMF), the evolution of the DirectShow of yore. Media Foundation is a combination of a C++ API and several COM objects that deal with reading, decoding and playing media files (audio and video). The full MMF is quite complex, dealing with topologies, sessions and transforms. Luckily for us, MMF's source reader objects were designed just for the task at hand: loading any media file and picking the correct installed MFT (Media Foundation Transform) codec to extract raw audio (they also decode video frames). The procedure is quite easy: create a source reader for the file, ask it for uncompressed PCM output, then read decoded samples in a loop until the stream ends.
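
A minimal sketch of that procedure follows. It is not the code from the downloadable project: the `PcmAudio` struct and the `decodeAudio` function are hypothetical names of mine, error handling is reduced to passing the HRESULT along, and the caller is assumed to have already called `CoInitializeEx` and `MFStartup(MF_VERSION)`. The Media Foundation part sits behind `#ifdef _WIN32` because those headers only exist on Windows.

```cpp
#include <cstdint>
#include <vector>

// Decoded audio: raw interleaved PCM bytes plus the format needed to read them.
struct PcmAudio {
    std::uint32_t sampleRate = 0;     // frames per second, e.g. 44100
    std::uint32_t channels = 0;       // 1 = mono, 2 = stereo
    std::uint32_t bitsPerSample = 0;  // reported by the decoder, often 16
    std::vector<unsigned char> bytes;

    // Duration implied by the byte count; 0 while the format is still empty.
    double seconds() const {
        const double bytesPerSecond =
            static_cast<double>(sampleRate) * channels * (bitsPerSample / 8);
        return bytesPerSecond > 0 ? bytes.size() / bytesPerSecond : 0.0;
    }
};

#ifdef _WIN32  // Media Foundation headers exist only on Windows
#include <windows.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")

// Decode the first audio stream of a media file into `audio`.
// Assumption: the caller already ran CoInitializeEx() and MFStartup(MF_VERSION).
HRESULT decodeAudio(const wchar_t* path, PcmAudio& audio)
{
    const DWORD stream = (DWORD)MF_SOURCE_READER_FIRST_AUDIO_STREAM;
    IMFSourceReader* reader = nullptr;
    IMFMediaType* wanted = nullptr;  // partial type: "audio, uncompressed PCM"
    IMFMediaType* actual = nullptr;  // complete type the reader settled on

    // Step 1: open the file with a source reader.
    HRESULT hr = MFCreateSourceReaderFromURL(path, nullptr, &reader);

    // Step 2: ask for PCM output so the reader loads a suitable decoder.
    if (SUCCEEDED(hr)) hr = MFCreateMediaType(&wanted);
    if (SUCCEEDED(hr)) hr = wanted->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio);
    if (SUCCEEDED(hr)) hr = wanted->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM);
    if (SUCCEEDED(hr)) hr = reader->SetCurrentMediaType(stream, nullptr, wanted);
    if (SUCCEEDED(hr)) hr = reader->GetCurrentMediaType(stream, &actual);
    if (SUCCEEDED(hr)) {
        audio.sampleRate = MFGetAttributeUINT32(actual, MF_MT_AUDIO_SAMPLES_PER_SECOND, 0);
        audio.channels = MFGetAttributeUINT32(actual, MF_MT_AUDIO_NUM_CHANNELS, 0);
        audio.bitsPerSample = MFGetAttributeUINT32(actual, MF_MT_AUDIO_BITS_PER_SAMPLE, 0);
    }

    // Step 3: pull decoded samples until the end of the stream.
    while (SUCCEEDED(hr)) {
        DWORD flags = 0;
        IMFSample* sample = nullptr;
        hr = reader->ReadSample(stream, 0, nullptr, &flags, nullptr, &sample);
        if (SUCCEEDED(hr) && sample) {
            IMFMediaBuffer* buffer = nullptr;
            BYTE* data = nullptr;
            DWORD length = 0;
            hr = sample->ConvertToContiguousBuffer(&buffer);
            if (SUCCEEDED(hr)) hr = buffer->Lock(&data, nullptr, &length);
            if (SUCCEEDED(hr)) {
                audio.bytes.insert(audio.bytes.end(), data, data + length);
                buffer->Unlock();
            }
            if (buffer) buffer->Release();
        }
        if (sample) sample->Release();
        if (flags & MF_SOURCE_READERF_ENDOFSTREAM) break;
    }

    if (actual) actual->Release();
    if (wanted) wanted->Release();
    if (reader) reader->Release();
    return hr;
}
#endif
```

A call would look something like `PcmAudio audio; decodeAudio(L"song.mp3", audio);` (the file name is a placeholder), after which `audio.seconds()` gives the track length. The sketch keeps the decoded bytes untyped on purpose, since the decoder decides the bit depth unless you ask for a specific one.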

Media Foundation source readers can also do resampling and channel merging, both essential steps for preprocessing audio. This way you can standardize all your variable sampling frequencies (44.1 kHz, 48 kHz, 22.05 kHz), mono/stereo channels and compression formats, and extract a homogeneous PCM signal. You can even extract audio streams from movies. I put together a simple Visual Studio project (C++) that loads a media file and shows the first few milliseconds of the sound wave. No bells and whistles, but it demonstrates how to decompress any media file with very little programming effort.
Click to download soundwave C++ source code (50 KB)
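
As a rough illustration of the standardizing step, the sketch below asks an existing source reader for one fixed output format (16-bit PCM at a chosen rate and channel count) and computes how many samples make up the first few milliseconds. `requestPcmFormat` and `samplesInWindow` are hypothetical helpers of mine, not functions from the project or from Media Foundation. I am also assuming the installed Windows version can convert to the requested format; if it cannot, `SetCurrentMediaType` returns a failure HRESULT and you have to fall back to the decoder's native format.

```cpp
#include <cstddef>
#include <cstdint>

// Number of interleaved samples covering the first `milliseconds` of audio.
std::size_t samplesInWindow(std::uint32_t sampleRate, std::uint32_t channels,
                            std::uint32_t milliseconds)
{
    const std::uint64_t frames =
        static_cast<std::uint64_t>(sampleRate) * milliseconds / 1000;
    return static_cast<std::size_t>(frames * channels);
}

#ifdef _WIN32  // Media Foundation headers exist only on Windows
#include <windows.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")

// Ask the reader to deliver 16-bit PCM at the given rate and channel count,
// e.g. requestPcmFormat(reader, 44100, 1) for 44.1 kHz mono.
HRESULT requestPcmFormat(IMFSourceReader* reader, UINT32 sampleRate, UINT32 channels)
{
    const UINT32 bytesPerFrame = channels * 2;  // 2 bytes per 16-bit sample
    IMFMediaType* type = nullptr;

    HRESULT hr = MFCreateMediaType(&type);
    if (SUCCEEDED(hr)) hr = type->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio);
    if (SUCCEEDED(hr)) hr = type->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM);
    if (SUCCEEDED(hr)) hr = type->SetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, sampleRate);
    if (SUCCEEDED(hr)) hr = type->SetUINT32(MF_MT_AUDIO_NUM_CHANNELS, channels);
    if (SUCCEEDED(hr)) hr = type->SetUINT32(MF_MT_AUDIO_BITS_PER_SAMPLE, 16);
    if (SUCCEEDED(hr)) hr = type->SetUINT32(MF_MT_AUDIO_BLOCK_ALIGNMENT, bytesPerFrame);
    if (SUCCEEDED(hr)) hr = type->SetUINT32(MF_MT_AUDIO_AVG_BYTES_PER_SECOND,
                                            sampleRate * bytesPerFrame);
    // The reader either accepts the format or reports why it cannot.
    if (SUCCEEDED(hr)) hr = reader->SetCurrentMediaType(
        (DWORD)MF_SOURCE_READER_FIRST_AUDIO_STREAM, nullptr, type);

    if (type) type->Release();
    return hr;
}
#endif
```

With 44.1 kHz mono output, `samplesInWindow(44100, 1, 10)` says the first 10 milliseconds are just 441 samples, which is all you need to plot the start of the wave.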

Minimum requirements: Windows Vista or later


©2002-2021 ZABKAT LTD, all rights reserved