As I was learning XAudio2 I came across countless tutorials showing how to read in uncompressed .wav files and feed them into an XAudio2 source voice. What was even worse was that most of these tutorials reinvented the wheel on parsing and validating a .wav file (even the MSDN sample “How to: Load Audio Data Files in XAudio2” performs such manual parsing). Reinventing the wheel is rarely a good thing, and you also might not want to use uncompressed audio files in your game because, well... they are just too big! The .mp3 compression format typically reduces audio file size by roughly 10x at common bitrates, with little noticeable degradation in sound quality for most listeners. This would certainly be great for the music your games play!
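To put a rough number on that size difference, here is a small back-of-the-envelope sketch. The figures are assumptions picked for illustration (CD-quality PCM at 44.1 kHz, 16-bit, stereo, versus a constant 128 kbps compressed stream), and the helper names PcmBytes and CompressedBytes are made up for this post.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to store uncompressed PCM audio of the given length.
constexpr std::uint64_t PcmBytes(std::uint32_t sampleRate, std::uint32_t channels,
                                 std::uint32_t bitsPerSample, std::uint32_t seconds)
{
    return static_cast<std::uint64_t>(sampleRate) * channels * (bitsPerSample / 8) * seconds;
}

// Bytes needed for a constant-bitrate compressed stream (bitrate in bits per second).
constexpr std::uint64_t CompressedBytes(std::uint32_t bitsPerSecond, std::uint32_t seconds)
{
    return static_cast<std::uint64_t>(bitsPerSecond) / 8 * seconds;
}
```

For a 3 minute song, PcmBytes(44100, 2, 16, 180) comes out at about 31.8 MB while CompressedBytes(128000, 180) is about 2.9 MB, which is where the "roughly 10x" figure comes from. Other bitrates will shift the ratio.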
Microsoft Media Foundation
Microsoft Media Foundation, as described by Microsoft, is the next generation multimedia platform for Windows. It was introduced as a replacement for DirectShow and offers capabilities such as the following:
- Playing Media
- Transcoding Media
- Decoding Media
- Encoding Media
NOTE: I use Media to represent audio, video, or a combination of both
The Pipeline Architecture
Media Foundation is well architected and consists of many components. These components are designed to connect together like Lego pieces to produce what is known as a Media Foundation pipeline. A full pipeline reads a media file from some location, such as the file system, sends it through one or more optional components that can transform the media in some way, and finally sends it to a renderer that forwards the media to some output device.
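To make the "Lego pieces" idea a bit more concrete, here is a tiny conceptual sketch in plain C++. It is not Media Foundation code: PipelineSketch, Samples and the three stage types are hypothetical names invented for this illustration, and a real MF pipeline is built from COM objects rather than std::function.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

using Samples = std::vector<float>;

// Hypothetical model of a pipeline: one source, optional transforms, one sink.
class PipelineSketch
{
public:
    using Source = std::function<Samples()>;
    using Transform = std::function<Samples(Samples)>;
    using Sink = std::function<void(const Samples&)>;

    PipelineSketch(Source source, Sink sink)
        : source_(std::move(source)), sink_(std::move(sink))
    {
        assert(source_ && sink_);
    }

    // Transforms run in the order they were added.
    void AddTransform(Transform transform)
    {
        transforms_.push_back(std::move(transform));
    }

    // Pull data from the source, pass it through every transform, hand it to the sink.
    void Run() const
    {
        Samples data = source_();
        for (const auto& transform : transforms_)
        {
            data = transform(std::move(data));
        }
        sink_(data);
    }

private:
    Source source_;
    std::vector<Transform> transforms_;
    Sink sink_;
};
```

In this picture the source plays the role of the file reader, each transform stands in for something like a decoder or an effect, and the sink stands in for the renderer.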
The Media Foundation Source Reader
The Source Reader was introduced to allow applications to utilize features of Media Foundation without having to build a full MF pipeline. For example, you might want to read and possibly decode an audio file and then pass it to the XAudio2 engine for playback.
A Source Reader can be thought of as a component that reads a media file and produces media samples to be consumed by your application in any way you see fit.
Media Types
Media Types are used in MF to describe the format of a particular media stream, for example one read from a file. Your application generally uses media types to determine the format and the type of media in the stream. Objects within Media Foundation, such as the source reader, use them as well, for example to load the correct decoder for the output media type you request.
Parts of a Media Type
Media Types consist of 2 parts that provide information about the type of media in a data stream. The 2 parts are described below:
- A Major Type
  - The Major Type indicates the type of data (audio or video)
- A Subtype
  - The Subtype indicates the format of the data (compressed MP3, uncompressed PCM, etc.)
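As a mental model, the two parts can be pictured as a small pair of values. The sketch below is plain C++ with made-up names (MajorTypeSketch, SubTypeSketch, MediaTypeSketch); in Media Foundation itself both parts are GUIDs stored as attributes on an IMFMediaType, as the real code later in this post shows.

```cpp
#include <cassert>

// Hypothetical stand-ins for the GUIDs Media Foundation uses.
enum class MajorTypeSketch { Audio, Video };
enum class SubTypeSketch { Pcm, Float, Mp3, Aac, H264 };

// A media type is the pair (major type, subtype).
struct MediaTypeSketch
{
    MajorTypeSketch major;
    SubTypeSketch sub;
};

// Audio counts as uncompressed when its subtype is integer PCM or IEEE float.
constexpr bool IsUncompressedAudio(const MediaTypeSketch& type)
{
    return type.major == MajorTypeSketch::Audio &&
           (type.sub == SubTypeSketch::Pcm || type.sub == SubTypeSketch::Float);
}
```

An .mp3 file would be (Audio, Mp3) and so needs decoding, while a typical .wav file would be (Audio, Pcm) and can be handed over as-is.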
Getting our hands dirty
With the basics out of the way, let’s now see how we can utilize Media Foundation’s Source Reader to read in any type of audio file (compressed or uncompressed) and extract the bytes to be sent to XAudio2 for playback.
First things first: before we can begin using Media Foundation we must load and initialize the framework within our application. This is done with a call to MFStartup(MF_VERSION). We should also be good citizens and unload it once we are done using it with MFShutdown(). This seems like a great opportunity to use the RAII idiom to create a class that handles all of this for us.
    struct MediaFoundationInitialize
    {
        MediaFoundationInitialize()
        {
            HR(MFStartup(MF_VERSION));
        }

        ~MediaFoundationInitialize()
        {
            HR(MFShutdown());
        }
    };

    int __stdcall wWinMain(HINSTANCE, HINSTANCE, PWSTR, int)
    {
        MediaFoundationInitialize mf{};

        return 0;
    }
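The HR(...) wrapper used throughout the code is never defined in this post. One plausible shape for it, offered only as a sketch, is a function that throws when a call reports failure. To keep the sketch compilable anywhere it uses a stand-in type HResultSketch instead of the real Windows HRESULT, and ComException is a name invented here. The only fact it relies on is that a negative HRESULT means failure, which is what the FAILED() macro tests.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Stand-in for the Windows HRESULT type (a 32-bit signed integer).
using HResultSketch = std::int32_t;

// Hypothetical exception carrying the failing result code.
struct ComException : std::runtime_error
{
    explicit ComException(HResultSketch hr)
        : std::runtime_error("COM call failed with HRESULT " + std::to_string(hr)),
          result(hr)
    {
    }

    HResultSketch result;
};

// Throw if the call failed; negative values indicate failure.
inline void HR(HResultSketch hr)
{
    if (hr < 0)
    {
        throw ComException(hr);
    }
}
```

One caveat with this style: if HR throws inside a destructor, such as the MFShutdown() call above, the program terminates, so real code may prefer to just log a failed shutdown.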
Once Media Foundation has been initialized the next thing we need to do is create the source reader. This is done using the MFCreateSourceReaderFromURL() factory method that accepts the following 3 arguments.
- Location to the media file on disk
- Optional list of attributes that will configure settings that affect how the source reader operates
- The output parameter of the newly allocated source reader
    int __stdcall wWinMain(HINSTANCE, HINSTANCE, PWSTR, int)
    {
        MediaFoundationInitialize mf{};

        // Create Attribute Store
        ComPtr<IMFAttributes> sourceReaderConfiguration;
        HR(MFCreateAttributes(sourceReaderConfiguration.GetAddressOf(), 1));
        HR(sourceReaderConfiguration->SetUINT32(MF_LOW_LATENCY, true));

        // Create Source Reader
        ComPtr<IMFSourceReader> sourceReader;
        HR(MFCreateSourceReaderFromURL(
            L"C:\\Users\\TraGicCode\\Desktop\\394506-n-a--1450673416.mp3",
            sourceReaderConfiguration.Get(),
            sourceReader.GetAddressOf()));

        return 0;
    }
Notice we set 1 attribute for our source reader:
- MF_LOW_LATENCY – This attribute informs the source reader that we want data as quickly as possible, for near-real-time operations
With the source reader created and attached to our media file we can query the source reader for the native media type of the file. This allows us to do some validation, such as verifying that the file is indeed an audio file, and to check whether it's compressed so that we can branch off and perform the extra work MF needs to decompress it.
    int __stdcall wWinMain(HINSTANCE, HINSTANCE, PWSTR, int)
    {
        MediaFoundationInitialize mf{};

        // Create Attribute Store
        ComPtr<IMFAttributes> sourceReaderConfiguration;
        HR(MFCreateAttributes(sourceReaderConfiguration.GetAddressOf(), 1));
        HR(sourceReaderConfiguration->SetUINT32(MF_LOW_LATENCY, true));

        // Create Source Reader
        ComPtr<IMFSourceReader> sourceReader;
        HR(MFCreateSourceReaderFromURL(
            L"C:\\Users\\TraGicCode\\Desktop\\394506-n-a--1450673416.mp3",
            sourceReaderConfiguration.Get(),
            sourceReader.GetAddressOf()));

        // Query information about the media file
        ComPtr<IMFMediaType> nativeMediaType;
        HR(sourceReader->GetNativeMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nativeMediaType.GetAddressOf()));

        // Check if media file is indeed an audio file
        GUID majorType{};
        HR(nativeMediaType->GetGUID(MF_MT_MAJOR_TYPE, &majorType));
        if (MFMediaType_Audio != majorType)
        {
            throw NotAudioFileException{};
        }

        // Check if media file is compressed or uncompressed
        // (the subtype lives under MF_MT_SUBTYPE, not MF_MT_MAJOR_TYPE)
        GUID subType{};
        HR(nativeMediaType->GetGUID(MF_MT_SUBTYPE, &subType));
        if (MFAudioFormat_Float == subType || MFAudioFormat_PCM == subType)
        {
            // Audio File is uncompressed
        }
        else
        {
            // Audio file is compressed
        }

        return 0;
    }
If the audio file happens to be compressed (such as if we were reading in an .mp3 file) then we need to inform the source reader that we would like it to decode the audio so that it can be sent to our audio device. This is done by creating a partial media type object and setting the major type and subtype for the kind of output we would like. When it is passed to the source reader, the reader looks through the system for registered decoders that can perform the requested conversion. Calling IMFSourceReader::SetCurrentMediaType() will succeed if a suitable decoder exists and fail otherwise.
    int __stdcall wWinMain(HINSTANCE, HINSTANCE, PWSTR, int)
    {
        MediaFoundationInitialize mf{};

        // Create Attribute Store
        ComPtr<IMFAttributes> sourceReaderConfiguration;
        HR(MFCreateAttributes(sourceReaderConfiguration.GetAddressOf(), 1));
        HR(sourceReaderConfiguration->SetUINT32(MF_LOW_LATENCY, true));

        // Create Source Reader
        ComPtr<IMFSourceReader> sourceReader;
        HR(MFCreateSourceReaderFromURL(
            L"C:\\Users\\TraGicCode\\Desktop\\394506-n-a--1450673416.mp3",
            sourceReaderConfiguration.Get(),
            sourceReader.GetAddressOf()));

        // Query information about the media file
        ComPtr<IMFMediaType> nativeMediaType;
        HR(sourceReader->GetNativeMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nativeMediaType.GetAddressOf()));

        // Check if media file is indeed an audio file
        GUID majorType{};
        HR(nativeMediaType->GetGUID(MF_MT_MAJOR_TYPE, &majorType));
        if (MFMediaType_Audio != majorType)
        {
            throw NotAudioFileException{};
        }

        // Check if media file is compressed or uncompressed
        // (the subtype lives under MF_MT_SUBTYPE, not MF_MT_MAJOR_TYPE)
        GUID subType{};
        HR(nativeMediaType->GetGUID(MF_MT_SUBTYPE, &subType));
        if (MFAudioFormat_Float == subType || MFAudioFormat_PCM == subType)
        {
            // Audio File is uncompressed
        }
        else
        {
            // Audio file is compressed
            // Inform the SourceReader we want uncompressed data
            // This causes it to look for decoders to perform the request we are making
            ComPtr<IMFMediaType> partialType = nullptr;
            HR(MFCreateMediaType(partialType.GetAddressOf()));

            // We want Audio
            HR(partialType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio));

            // We want uncompressed data
            HR(partialType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM));

            HR(sourceReader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, nullptr, partialType.Get()));
        }

        return 0;
    }
Now that we have the source reader configured we must next create a WAVEFORMATEX object from the source reader's current media type. This data structure essentially represents the fmt chunk in a RIFF file. It is needed so that XAudio2, or more generally anything that wants to play the audio, knows the format of the data (channel count, sample rate, bits per sample) and therefore the speed at which playback should happen. This is done by calling MFCreateWaveFormatExFromMFMediaType(), which is a free function rather than a method of IMFSourceReader. We pass it the following 3 arguments (a fourth flags parameter has a default value in C++):
- The current media type of the source reader
- The address of a WAVEFORMATEX pointer that receives a structure allocated by the function (release it with CoTaskMemFree() when you are done with it)
- The address of an unsigned int that will be filled in with the size of the above struct
    int __stdcall wWinMain(HINSTANCE, HINSTANCE, PWSTR, int)
    {
        MediaFoundationInitialize mf{};

        // Create Attribute Store
        ComPtr<IMFAttributes> sourceReaderConfiguration;
        HR(MFCreateAttributes(sourceReaderConfiguration.GetAddressOf(), 1));
        HR(sourceReaderConfiguration->SetUINT32(MF_LOW_LATENCY, true));

        // Create Source Reader
        ComPtr<IMFSourceReader> sourceReader;
        HR(MFCreateSourceReaderFromURL(
            L"C:\\Users\\TraGicCode\\Desktop\\394506-n-a--1450673416.mp3",
            sourceReaderConfiguration.Get(),
            sourceReader.GetAddressOf()));

        // Query information about the media file
        ComPtr<IMFMediaType> nativeMediaType;
        HR(sourceReader->GetNativeMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nativeMediaType.GetAddressOf()));

        // Check if media file is indeed an audio file
        GUID majorType{};
        HR(nativeMediaType->GetGUID(MF_MT_MAJOR_TYPE, &majorType));
        if (MFMediaType_Audio != majorType)
        {
            throw NotAudioFileException{};
        }

        // Check if media file is compressed or uncompressed
        // (the subtype lives under MF_MT_SUBTYPE, not MF_MT_MAJOR_TYPE)
        GUID subType{};
        HR(nativeMediaType->GetGUID(MF_MT_SUBTYPE, &subType));
        if (MFAudioFormat_Float == subType || MFAudioFormat_PCM == subType)
        {
            // Audio File is uncompressed
        }
        else
        {
            // Audio file is compressed
            // Inform the SourceReader we want uncompressed data
            // This causes it to look for decoders to perform the request we are making
            ComPtr<IMFMediaType> partialType = nullptr;
            HR(MFCreateMediaType(partialType.GetAddressOf()));

            // We want Audio
            HR(partialType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio));

            // We want uncompressed data
            HR(partialType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM));

            HR(sourceReader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, nullptr, partialType.Get()));
        }

        // Ask the source reader for the (now uncompressed) output type
        ComPtr<IMFMediaType> uncompressedAudioType = nullptr;
        HR(sourceReader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, uncompressedAudioType.GetAddressOf()));

        // Convert the media type into a WAVEFORMATEX allocated by Media Foundation
        WAVEFORMATEX* waveformatex = nullptr;
        UINT32 waveformatlength{};
        HR(MFCreateWaveFormatExFromMFMediaType(uncompressedAudioType.Get(), &waveformatex, &waveformatlength));

        // The structure was allocated for us, so we must free it
        CoTaskMemFree(waveformatex);

        return 0;
    }
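To see why this structure tells a player "the speed at which playback should happen", it helps to look at how its main fields relate to each other for plain PCM data. The sketch below uses a portable mirror of those fields (WaveFormatSketch and MakePcmFormat are names made up for this post, not part of any Windows header); the comments give the corresponding WAVEFORMATEX member names.

```cpp
#include <cassert>
#include <cstdint>

// Portable mirror of the WAVEFORMATEX fields that matter for PCM playback.
struct WaveFormatSketch
{
    std::uint16_t channels;       // nChannels
    std::uint32_t samplesPerSec;  // nSamplesPerSec
    std::uint16_t bitsPerSample;  // wBitsPerSample
    std::uint16_t blockAlign;     // nBlockAlign: bytes per frame (one sample per channel)
    std::uint32_t avgBytesPerSec; // nAvgBytesPerSec: bytes consumed per second of playback
};

// Fill in the derived fields the way they are defined for PCM data.
inline WaveFormatSketch MakePcmFormat(std::uint16_t channels,
                                      std::uint32_t samplesPerSec,
                                      std::uint16_t bitsPerSample)
{
    WaveFormatSketch format{};
    format.channels = channels;
    format.samplesPerSec = samplesPerSec;
    format.bitsPerSample = bitsPerSample;
    format.blockAlign = static_cast<std::uint16_t>(channels * bitsPerSample / 8);
    format.avgBytesPerSec = samplesPerSec * format.blockAlign;
    return format;
}
```

For 16-bit stereo at 44.1 kHz this gives 4 bytes per frame and 176400 bytes per second, which is the rate at which the audio engine will chew through the decoded bytes.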
Lastly, we synchronously read all the audio from the file and store it in a vector<BYTE>.
NOTE: In production software you would definitely not want to synchronously read the whole file into memory. This is only meant for this example.
    int __stdcall wWinMain(HINSTANCE, HINSTANCE, PWSTR, int)
    {
        MediaFoundationInitialize mf{};

        // Create Attribute Store
        ComPtr<IMFAttributes> sourceReaderConfiguration;
        HR(MFCreateAttributes(sourceReaderConfiguration.GetAddressOf(), 1));
        HR(sourceReaderConfiguration->SetUINT32(MF_LOW_LATENCY, true));

        // Create Source Reader
        ComPtr<IMFSourceReader> sourceReader;
        HR(MFCreateSourceReaderFromURL(
            L"C:\\Users\\TraGicCode\\Desktop\\394506-n-a--1450673416.mp3",
            sourceReaderConfiguration.Get(),
            sourceReader.GetAddressOf()));

        // Query information about the media file
        ComPtr<IMFMediaType> nativeMediaType;
        HR(sourceReader->GetNativeMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nativeMediaType.GetAddressOf()));

        // Check if media file is indeed an audio file
        GUID majorType{};
        HR(nativeMediaType->GetGUID(MF_MT_MAJOR_TYPE, &majorType));
        if (MFMediaType_Audio != majorType)
        {
            throw NotAudioFileException{};
        }

        // Check if media file is compressed or uncompressed
        // (the subtype lives under MF_MT_SUBTYPE, not MF_MT_MAJOR_TYPE)
        GUID subType{};
        HR(nativeMediaType->GetGUID(MF_MT_SUBTYPE, &subType));
        if (MFAudioFormat_Float == subType || MFAudioFormat_PCM == subType)
        {
            // Audio File is uncompressed
        }
        else
        {
            // Audio file is compressed
            // Inform the SourceReader we want uncompressed data
            // This causes it to look for decoders to perform the request we are making
            ComPtr<IMFMediaType> partialType = nullptr;
            HR(MFCreateMediaType(partialType.GetAddressOf()));

            // We want Audio
            HR(partialType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio));

            // We want uncompressed data
            HR(partialType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM));

            HR(sourceReader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, nullptr, partialType.Get()));
        }

        // Ask the source reader for the (now uncompressed) output type
        ComPtr<IMFMediaType> uncompressedAudioType = nullptr;
        HR(sourceReader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, uncompressedAudioType.GetAddressOf()));

        // Convert the media type into a WAVEFORMATEX allocated by Media Foundation
        WAVEFORMATEX* waveformatex = nullptr;
        UINT32 waveformatlength{};
        HR(MFCreateWaveFormatExFromMFMediaType(uncompressedAudioType.Get(), &waveformatex, &waveformatlength));

        std::vector<BYTE> bytes;

        while (true)
        {
            // Get Sample (declared per iteration so the previous sample is released)
            ComPtr<IMFSample> sample;
            DWORD flags{};
            HR(sourceReader->ReadSample(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nullptr, &flags, nullptr, sample.GetAddressOf()));

            // Check for eof
            if (flags & MF_SOURCE_READERF_ENDOFSTREAM)
            {
                break;
            }

            // ReadSample can succeed without producing a sample (for example on a stream gap)
            if (!sample)
            {
                continue;
            }

            // Convert data to contiguous buffer
            ComPtr<IMFMediaBuffer> buffer;
            HR(sample->ConvertToContiguousBuffer(buffer.GetAddressOf()));

            // Lock Buffer & copy to local memory
            BYTE* audioData = nullptr;
            DWORD audioDataLength{};
            HR(buffer->Lock(&audioData, nullptr, &audioDataLength));
            bytes.insert(bytes.end(), audioData, audioData + audioDataLength);

            // Unlock Buffer
            HR(buffer->Unlock());
        }

        // The WAVEFORMATEX was allocated for us, so we must free it
        CoTaskMemFree(waveformatex);

        return 0;
    }
Now that we have the WAVEFORMATEX object and the vector<BYTE> holding our decoded audio, we are ready to send it to XAudio2 for playback!
    int __stdcall wWinMain(HINSTANCE, HINSTANCE, PWSTR, int)
    {
        MediaFoundationInitialize mf{};

        // Create Attribute Store
        ComPtr<IMFAttributes> sourceReaderConfiguration;
        HR(MFCreateAttributes(sourceReaderConfiguration.GetAddressOf(), 1));
        HR(sourceReaderConfiguration->SetUINT32(MF_LOW_LATENCY, true));

        // Create Source Reader
        ComPtr<IMFSourceReader> sourceReader;
        HR(MFCreateSourceReaderFromURL(
            L"C:\\Users\\TraGicCode\\Desktop\\394506-n-a--1450673416.mp3",
            sourceReaderConfiguration.Get(),
            sourceReader.GetAddressOf()));

        // Query information about the media file
        ComPtr<IMFMediaType> nativeMediaType;
        HR(sourceReader->GetNativeMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nativeMediaType.GetAddressOf()));

        // Check if media file is indeed an audio file
        GUID majorType{};
        HR(nativeMediaType->GetGUID(MF_MT_MAJOR_TYPE, &majorType));
        if (MFMediaType_Audio != majorType)
        {
            throw NotAudioFileException{};
        }

        // Check if media file is compressed or uncompressed
        // (the subtype lives under MF_MT_SUBTYPE, not MF_MT_MAJOR_TYPE)
        GUID subType{};
        HR(nativeMediaType->GetGUID(MF_MT_SUBTYPE, &subType));
        if (MFAudioFormat_Float == subType || MFAudioFormat_PCM == subType)
        {
            // Audio File is uncompressed
        }
        else
        {
            // Audio file is compressed
            // Inform the SourceReader we want uncompressed data
            // This causes it to look for decoders to perform the request we are making
            ComPtr<IMFMediaType> partialType = nullptr;
            HR(MFCreateMediaType(partialType.GetAddressOf()));

            // We want Audio
            HR(partialType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio));

            // We want uncompressed data
            HR(partialType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM));

            HR(sourceReader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, nullptr, partialType.Get()));
        }

        // Ask the source reader for the (now uncompressed) output type
        ComPtr<IMFMediaType> uncompressedAudioType = nullptr;
        HR(sourceReader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, uncompressedAudioType.GetAddressOf()));

        // Convert the media type into a WAVEFORMATEX allocated by Media Foundation
        WAVEFORMATEX* waveformatex = nullptr;
        UINT32 waveformatlength{};
        HR(MFCreateWaveFormatExFromMFMediaType(uncompressedAudioType.Get(), &waveformatex, &waveformatlength));

        std::vector<BYTE> bytes;

        while (true)
        {
            // Get Sample (declared per iteration so the previous sample is released)
            ComPtr<IMFSample> sample;
            DWORD flags{};
            HR(sourceReader->ReadSample(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nullptr, &flags, nullptr, sample.GetAddressOf()));

            // Check for eof
            if (flags & MF_SOURCE_READERF_ENDOFSTREAM)
            {
                break;
            }

            // ReadSample can succeed without producing a sample (for example on a stream gap)
            if (!sample)
            {
                continue;
            }

            // Convert data to contiguous buffer
            ComPtr<IMFMediaBuffer> buffer;
            HR(sample->ConvertToContiguousBuffer(buffer.GetAddressOf()));

            // Lock Buffer & copy to local memory
            BYTE* audioData = nullptr;
            DWORD audioDataLength{};
            HR(buffer->Lock(&audioData, nullptr, &audioDataLength));
            bytes.insert(bytes.end(), audioData, audioData + audioDataLength);

            // Unlock Buffer
            HR(buffer->Unlock());
        }

        // Create XAudio2 stuff
        auto xAudioEngine = CreateXAudioEngine();
        auto masteringVoice = CreateMasteringVoice(xAudioEngine);
        auto sourceVoice = CreateSourceVoice(xAudioEngine, *waveformatex);

        XAUDIO2_BUFFER xAudioBuffer{};
        xAudioBuffer.AudioBytes = static_cast<UINT32>(bytes.size());
        xAudioBuffer.pAudioData = bytes.data();
        xAudioBuffer.pContext = nullptr;

        HR(sourceVoice->Start());
        HR(sourceVoice->SubmitSourceBuffer(&xAudioBuffer));

        // Sleep for some time to hear the song by preventing the main thread from exiting
        // XAudio2 plays the sound on a separate audio thread :)
        Sleep(1000000);

        // The WAVEFORMATEX was allocated for us, so we must free it
        CoTaskMemFree(waveformatex);

        return 0;
    }
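The Sleep(1000000) call simply waits a long, arbitrary time. Since we know both the number of decoded bytes and the byte rate from the WAVEFORMATEX, one alternative is to compute how long the buffer will actually play. This is only a sketch under the assumption that the data is uncompressed, constant-rate audio; PlaybackDuration is a helper name made up for this post.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Length of an uncompressed audio buffer, given the format's average byte rate
// (the nAvgBytesPerSec member of WAVEFORMATEX).
inline std::chrono::milliseconds PlaybackDuration(std::size_t audioBytes,
                                                  std::uint32_t avgBytesPerSec)
{
    assert(avgBytesPerSec != 0);
    const std::uint64_t ms = static_cast<std::uint64_t>(audioBytes) * 1000 / avgBytesPerSec;
    return std::chrono::milliseconds(static_cast<std::chrono::milliseconds::rep>(ms));
}
```

In the listing above this could be used roughly as Sleep(static_cast<DWORD>(PlaybackDuration(bytes.size(), waveformatex->nAvgBytesPerSec).count())), ideally with a little padding since the audio thread needs a moment to start.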
And there you have it. Not too bad if you ask me!