About two years ago, I wrote an article titled "The Sorry State of Sound in Linux", hoping to get some sound issues in Linux fixed. Now, two years later, a lot has changed, and it's time to take another look at the state of sound in Linux today.
A quick summary of the last article for those who didn't read it:
- Sound in Linux has an interesting history, and it historically lacked sound mixing on cards that relied on software rather than hardware mixing.
- Many sound servers were created to solve the mixing issue.
- Many libraries were created to solve multiple back-end issues.
- ALSA replaced OSS version 3 in the Kernel source, attempting to fix existing issues.
- There was a closed source OSS update which was superb.
- Linux distributions have been removing OSS support from applications in favor of ALSA.
- The average sound developer prefers a simple API.
- Portability is a good thing.
- Users are having issues in certain scenarios.
Now much has changed, namely:
- OSS is now free and open source once again.
- PulseAudio has become widespread.
- Existing libraries have been improved.
- New Linux Distributions have been released, and some existing ones have attempted an overhaul of their entire sound stack to improve users' experience.
- People have read the last article, are more knowledgeable than before, and in some cases have become more opinionated than before.
- I personally have looked much closer at the issue to provide even more relevant information.
Let's take a closer look at the pros and cons of OSS and ALSA as they are, not five years ago, not last year, not last month, but as they are today.
First off, ALSA.
ALSA consists of three components. The first is the set of drivers in the Kernel, with an API exposed for the other two components to communicate with. The second is a sound developer API that allows developers to create programs which communicate with ALSA. The third is a sound mixing component which can be placed between the other two to allow multiple programs using the ALSA API to output sound simultaneously.
To help make sense of the above, here is a diagram:
Note: the diagrams presented in this article were made by me, a very bad artist, and I don't plan to win any awards for them. They may not be 100% accurate down to the last detail, but they are accurate enough to give the average user an idea of what is going on behind the scenes.
A sound developer who wishes to output sound in their application can take any of the following routes with ALSA:
- Output using ALSA API directly to ALSA's Kernel API (when sound mixing is disabled)
- Output using ALSA API to sound mixer, which outputs to ALSA's Kernel API (when sound mixing is enabled)
- Output using OSS version 3 API directly to ALSA's Kernel API
- Output using a wrapper API which outputs using any of the above 3 methods
As can be seen, ALSA is quite flexible: it has the sound mixing which OSSv3 lacked, but still provides legacy OSSv3 support for older programs. It also offers the option of disabling sound mixing in cases where the mixing reduces quality in any way, or introduces latency which the end user may not want at a particular time.
Two points should be clear: ALSA's sound mixing is optional and happens outside the Kernel, and the path taken by ALSA's legacy OSS API lacks sound mixing.
An obvious con should be seen here: ALSA, which was initially designed to fix the sound mixing issue at a lower and more direct level than a sound server, still doesn't provide mixing for "older" programs.
Obvious pros are that ALSA is free, open source, has sound mixing, can work with multiple sound cards (all of which OSS lacked during much of version 3's lifespan), is included as part of the Kernel source, and tries to cater to old and new programs alike.
The less obvious cons are that ALSA is Linux only: it doesn't exist on FreeBSD, Solaris, Mac OS X, or Windows. Also, the average developer finds ALSA's native API too hard to work with, though that is debatable.
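For the curious, here is roughly what the simplest possible ALSA playback path looks like in C, using alsa-lib's convenience setup call. This is only a sketch: the "default" device name and the parameters are illustrative, real code needs proper error handling, and the traditional hw_params route alluded to later is considerably more verbose.

    #include <alsa/asoundlib.h>

    int main(void)
    {
        snd_pcm_t *pcm;
        short buf[44100 * 2] = {0};   /* one second of 16-bit stereo silence */

        /* Open the default ALSA playback device. */
        if (snd_pcm_open(&pcm, "default", SND_PCM_STREAM_PLAYBACK, 0) < 0)
            return 1;

        /* One-call setup: signed 16-bit, interleaved, stereo, 44.1 kHz,
           allow resampling, roughly 0.5 s of buffering. */
        if (snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
                               SND_PCM_ACCESS_RW_INTERLEAVED,
                               2, 44100, 1, 500000) < 0)
            return 1;

        snd_pcm_writei(pcm, buf, 44100);   /* measured in frames, not bytes */
        snd_pcm_drain(pcm);
        snd_pcm_close(pcm);
        return 0;
    }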
Now let's take a look at OSS today. OSS is currently at version 4, and is a completely different beast than OSSv3 was.
Where OSSv3 went closed source, OSSv4 is open source today, under the GPL, 3-clause BSD, and CDDL licenses.
While the decade-old OSSv3 was included in the Linux Kernel source, the new, greatly improved OSSv4 is not, and thus may be a bit harder for the average user to try out. The older OSSv3 lacked sound mixing and support for multiple sound cards; OSSv4 does not. Unfortunately, most people who discuss OSS, or who try it to see how it stacks up against ALSA, are referring to or testing the decade-old version, which distorts the facts as they are today.
Here's a diagram of OSSv4:
A sound developer wishing to output sound has the following routes on OSSv4:
- Output using OSS API right into the Kernel with sound mixing
- Output using ALSA API to the OSS API with sound mixing
- Output using a wrapper API to any of the above methods
Unlike with ALSA, when using OSSv4 the end user always has sound mixing. Also, because the sound mixing runs in the Kernel itself, it doesn't suffer from the latency ALSA generally has.
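To illustrate how simple the OSS programming model is (something we'll come back to in the API discussion below), here is a minimal playback sketch in C. The /dev/dsp path and the ioctl() calls are the standard OSS conventions, but treat this as a sketch; real code naturally needs error handling.

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/soundcard.h>

    int main(void)
    {
        int fmt = AFMT_S16_LE, channels = 2, rate = 44100;
        short buf[44100 * 2] = {0};   /* one second of 16-bit stereo silence */

        /* OSS exposes the sound card as a plain device file. */
        int fd = open("/dev/dsp", O_WRONLY);
        if (fd < 0)
            return 1;

        ioctl(fd, SNDCTL_DSP_SETFMT, &fmt);        /* sample format */
        ioctl(fd, SNDCTL_DSP_CHANNELS, &channels); /* channel count */
        ioctl(fd, SNDCTL_DSP_SPEED, &rate);        /* sample rate   */

        write(fd, buf, sizeof(buf));               /* measured in bytes */
        close(fd);
        return 0;
    }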
Although OSSv4 does offer its own ALSA emulation layer, it's pretty bad, and I haven't found a single ALSA program which is able to output via it properly. However, this isn't an issue, since as mentioned above, ALSA's own sound developer API can output to OSS, providing perfect compatibility with ALSA applications today. You can read more about how to set that up in one of my recent articles.
ALSA's own library is able to do this, because it's actually structured as follows:
As you can see, it can output to either the OSS or the ALSA Kernel back-end (and to other back-ends as well, which will be discussed further down).
Since both OSS and ALSA based programs can use either an OSS or an ALSA Kernel back-end, the differences between the two are quite subtle (note, we're not discussing OSSv3 here). They boil down to the following, which I know from research and testing, and which is not immediately obvious:
- OSS always has sound mixing; ALSA does not.
- OSS's sound mixing is of higher quality than ALSA's, due to OSS using more precise math in its mixing.
- OSS has less latency than ALSA when mixing sound, because everything runs within the Linux Kernel.
- OSS offers per-application volume control; ALSA does not.
- ALSA lets the Operating System go into suspend mode while sound is playing and come out of it with sound still playing; OSS, on the other hand, needs the application to restart sound.
- OSS is the only option for certain sound cards, where the ALSA drivers for that card are either really bad or non-existent.
- ALSA is the only option for certain sound cards, where the OSS drivers for that card are either really bad or non-existent.
- ALSA is included in Linux itself and is easy to get a hold of; OSS (v4) is not.
Now the question is: where does the average user fall in the above categories? If the user has a sound card which only works (well) with one or the other, then obviously they should use the one that works properly. Of course, a user may want to try both to see if one performs better than the other.
If the user really needs a program to output sound right up until Linux goes into suspend mode, and then continue where it left off when resuming, then ALSA is (currently) the only option. I personally don't find this to be a problem, and furthermore I doubt that a large percentage of users even use suspend in Linux. Suspend in general isn't great in Linux, since some rogue piece of hardware, like a network or video card, tends to screw it up.
If the user doesn't want a hassle, ALSA also seems the obvious choice, as it ships directly with the Linux Kernel, so it's much easier for the user to use a modern ALSA than a modern OSS. However, it should be up to the Linux Distribution to handle these situations; to the end user, switching from one to the other should be seamless and transparent. More on this later.
Yet we also see that, due to better sound mixing and lower latency when mixing is involved, OSS is the better choice, as long as none of the above issues are present. The better mixing is generally only noticed at higher volume levels or in rare cases, and the latency I'm referring to is generally only a problem if you play heavy-duty games, not if you just want to listen to some music or watch a video.
But wait, this is all about the back-end. What about the whole developer API issue?
Many people like to point fingers at the various APIs (I did too, to some extent, in my previous article), but they really don't get it. First off, this is how your average sound wrapper API works:
The program outputs sound using a wrapper such as OpenAL, SDL, or libao, and the sound then goes to the appropriate high-level or low-level back-end, without the user having to worry about it.
Since the back-ends can be the sound APIs of various Operating Systems, these wrappers allow a developer to quite easily write a program which has sound on Windows, Mac OS X, Linux, and more.
Some, like Adobe, like to say that this is some kind of problem which makes it impossible to output sound in Linux. Nothing could be further from the truth.
Graphs like these are very misleading. OpenAL, SDL, libao, GStreamer, NAS, Allegro, and more all exist on Windows too. I don't see anyone complaining there.
I can make a similar diagram for Windows:
The above diagram is by no means complete, as there's XAudio, other wrapper libs, and even some Windows-only sound libraries whose names I've forgotten.
This by no means bothers anybody, and should not be made an issue.
In terms of usage, the libraries stack up as follows:
OpenAL - Powerful, tricky to use, great for "3D audio". I personally was able to get a lot done by following a couple of examples, and only spent an hour or two adding sound to an application.
SDL - Simplistic, uses a callback API, decent if it fits your program design. I personally was able to add sound to an application in half an hour with SDL, although I don't think it fits every case load.
libao - Very simplistic, incredibly easy to use, although problematic if you need your application not to block on sound output. I added sound to a multitude of applications using libao in a matter of minutes. It's just a bit more annoying if you need to give your program its own sound thread, so again it depends on the case load.
I haven't played with the other sound wrappers, so I can't comment on them, but the same ideas play out with each and every one.
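To make the wrapper simplicity concrete, here is a minimal libao playback sketch in C. It assumes libao can pick a sane default driver on the system (which may itself be OSS, ALSA, or a sound server), and most error handling is omitted.

    #include <ao/ao.h>
    #include <string.h>

    int main(void)
    {
        ao_sample_format fmt;
        short buf[44100 * 2] = {0};   /* one second of 16-bit stereo silence */

        ao_initialize();

        memset(&fmt, 0, sizeof(fmt));
        fmt.bits = 16;
        fmt.channels = 2;
        fmt.rate = 44100;
        fmt.byte_format = AO_FMT_LITTLE;

        /* Let libao choose the default back-end driver. */
        ao_device *dev = ao_open_live(ao_default_driver_id(), &fmt, NULL);
        if (!dev)
            return 1;

        ao_play(dev, (char *)buf, sizeof(buf));   /* blocking write, in bytes */

        ao_close(dev);
        ao_shutdown();
        return 0;
    }

The blocking ao_play() call is also exactly where the threading caveat mentioned above comes from.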
Then of course there are the actual OSS and ALSA APIs on Linux. Now why would anyone use them when there are lovely wrappers that are more portable and can be matched to any particular case load? In the average case this is in fact true, and there is no reason to use OSS's or ALSA's API to output sound. But in some cases a wrapper API can add latency which you may not want, and you may not need any of the advantages of using a wrapper API.
Here's a breakdown of how OSS and ALSA's APIs stack up.
OSSv3 - Easy to use, most developers I spoke to like it, exists on every UNIX but Mac OS X. I added sound to applications using OSSv3 in 10 minutes.
OSSv4 - Mostly backwards compatible with v3, even easier to use, exists on every UNIX except Mac OS X (and except Linux when using the ALSA back-end), and has sound re-sampling and AC3 decoding out of the box. I added sound to several applications using OSSv4 in 10 minutes each.
ALSA - Hard to use, most developers I spoke to dislike it, poorly documented, not available anywhere but Linux. Some developers, however, prefer it, as they feel it gives them more flexibility than the OSS API. I personally spent 3 hours trying to make heads or tails of the documentation and add sound to an application. Then I found sound only worked on the machine I was developing on, and had to spend another hour going over the docs and tweaking my code to get it working on both machines I was testing on at the time. Finally, I released my application with the ALSA back-end, only to find several people complaining about no sound, and I started receiving patches from several developers. Many of those patches fixed sound on their machines, but broke sound on one of mine. Here we are a year later, and after many hours wasted by several developers, my application's ALSA output now seems to work decently on all machines tested, but I sure don't trust it. We as developers don't need these kinds of issues. Of course, you're free to disagree, and even cite examples of how you figured out the documentation, added sound quickly, and had it work flawlessly everywhere for everyone who tested your application. I must just be stupid.
Now, I previously thought the OSS vs. ALSA API issue was significant to end users, in so far as what they're locked into, but really it only matters to developers. The main issue, though, is that if I want to take advantage of all the extra features OSSv4's API has to offer (and I do), I have to use the OSS back-end. Users, however, don't have to care about this, unless they use programs which take advantage of these features, of which there are few.
However, regarding wrapper APIs, I did find a few interesting results when testing them in a variety of programs:
App -> libao -> OSS API -> OSS Back-end - Good sound, low latency.
App -> libao -> OSS API -> ALSA Back-end - Good sound, minor latency.
App -> libao -> ALSA API -> OSS Back-end - Good sound, low latency.
App -> libao -> ALSA API -> ALSA Back-end - Bad sound, horrible latency.
App -> SDL -> OSS API -> OSS Back-end - Good sound, really low latency.
App -> SDL -> OSS API -> ALSA Back-end - Good sound, minor latency.
App -> SDL -> ALSA API -> OSS Back-end - Good sound, low latency.
App -> SDL -> ALSA API -> ALSA Back-end - Good sound, minor latency.
App -> OpenAL -> OSS API -> OSS Back-end - Great sound, really low latency.
App -> OpenAL -> OSS API -> ALSA Back-end - Adequate sound, bad latency.
App -> OpenAL -> ALSA API -> OSS Back-end - Bad sound, bad latency.
App -> OpenAL -> ALSA API -> ALSA Back-end - Adequate sound, bad latency.
App -> OSS API -> OSS Back-end - Great sound, really low latency.
App -> OSS API -> ALSA Back-end - Good sound, minor latency.
App -> ALSA API -> OSS Back-end - Great sound, low latency.
App -> ALSA API -> ALSA Back-end - Good sound, bad latency.
If you're having a hard time trying to wrap your head around the above chart, here's a summary:
- The OSS back-end always has good sound, except when using OpenAL via the ALSA API to output to it.
- The ALSA back-end generally sounds better and has lower latency when fed via the OSS API (generally because that path avoids any sound mixing, as per an earlier diagram).
- OSS related technology is generally the way to go for best sound.
But wait, where do sound servers fit in?
Sound servers were initially created to deal with a problem in OSSv3 which no longer exists, namely the lack of sound mixing. The sound server stack today looks something like this:
As should be obvious, these sound servers today do nothing except add latency, and should be done away with. KDE 4 has moved away from the aRts sound server, and instead uses a wrapper API known as Phonon, which can deal with a variety of back-ends (some of which can themselves go through a particular sound server if need be).
However, as mentioned above, ALSA's mixing is not of the same high quality as OSS's, and ALSA also lacks some nice features, such as per-application volume control.
Now, one could turn off ALSA's low quality mixer, or have an application do its own volume control internally by modifying the sound wave it's outputting, but neither choice is friendly towards users or developers.
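"Doing its own volume control" simply means scaling the PCM samples before handing them to the sound API. A rough sketch of the idea, assuming 16-bit samples:

    #include <stdint.h>
    #include <stddef.h>

    /* Scale 16-bit PCM samples by a volume factor between 0.0 and 1.0.
       This is the kind of work each application has to do itself when the
       mixer below it offers no per-application volume control. */
    static void apply_volume(int16_t *samples, size_t count, float volume)
    {
        for (size_t i = 0; i < count; i++) {
            float scaled = samples[i] * volume;

            /* Clamp, so a factor above 1.0 can't wrap around. */
            if (scaled > 32767.0f)
                scaled = 32767.0f;
            else if (scaled < -32768.0f)
                scaled = -32768.0f;

            samples[i] = (int16_t)scaled;
        }
    }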
Seeing this, Fedora and Ubuntu have both stepped in with a so-called state of the art sound server known as PulseAudio.
If you remember this:
As you can see, ALSA's API can also output to PulseAudio, meaning programs written using ALSA's API can output to PulseAudio and use its higher quality sound mixer seamlessly, without requiring any modification of old programs. PulseAudio is also able to send sound to another PulseAudio server on the network to output sound remotely. PulseAudio's stack looks something like this:
As you can see, it looks very complex, and a 100% accurate breakdown of PulseAudio is even more complex.
Thanks to PulseAudio being so advanced, most of the wrapper APIs can output to it, and Fedora and Ubuntu ship with all of that set up for the end user. It can in some cases also receive sound written for another sound server, such as ESD, without requiring ESD to run on top of it. It also means that many programs now go through many layers before they reach the sound card.
Some see PulseAudio as the new voodoo that will be our savior: sound written to any particular API can be output via it, and it has great mixing to boot.
Except many users who play games, for example, are crying that this adds a TREMENDOUS amount of latency, which is very noticeable even in not-so-high-end games. Users don't like hearing enemies explode a full 3 seconds after they saw the enemy explode on screen. Don't let anyone kid you: there's no way a sound server, especially one with this level of bloat and complexity, will ever work with anything approaching the low latency acceptable for games.
Compare the insanity that is PulseAudio with this:
Which do you think looks like a better sound stack, considering that their sound mixing, per application volume control, compatibility with applications, and other features are on par?
And yes, let's not forget the applications. I'm frequently told that some application is written to use a particular API, and therefore either OSS or ALSA needs to be the back-end it uses. However, as explained above, either API can be used on either back-end. If set up right, you don't have to go without sound in newer versions of Flash when using the OSS back-end.
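For example, one common way to do this (assuming the OSS output plugin from the alsa-plugins package is installed) is an ~/.asoundrc along these lines; treat it as a sketch of the idea and see the article mentioned earlier for the full setup:

    # Route programs that use the ALSA API to the OSS back-end,
    # via the "oss" plugin from alsa-plugins.
    pcm.!default {
        type oss
        device /dev/dsp
    }

    ctl.!default {
        type oss
        device /dev/mixer
    }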
So where are we today exactly?
The biggest issue I find is that the distributions simply aren't set up to make the choice easy for users. Debian and derivatives provide a Linux sound base package to select whether you want OSS or ALSA to be your back-end, except it really doesn't do anything. Here's what we need from such a package:
- On selecting OSS, it should install the latest OSS package, as well as ALSA's ALSA API->OSS back-end interface, and set it up.
- At minimum, configure an installed OpenAL to use the OSS back-end, and preferably SDL, libao, and other wrapper libraries as well.
- Recognize the setting when installing a new application or wrapper library and configure that to use OSS as well.
- Do all the above in reverse when selecting ALSA instead.
Such a setup would allow users to easily switch between the two if their sound card only worked with the one which wasn't the distribution's default. It would also allow users to objectively test which one works better for them, if they care to and desire the best possible setup. Users should be given this capability. I personally believe OSS is superior, but we should leave the choice up to the user if they don't like whichever is the system default.
Now I repeatedly hear the claim: "But, but, OSS was taken out of the Linux Kernel source, it's never going to be merged back in!"
Let's analyze that objectively. Does it matter what is included in the default Linux Kernel? Can we not use VirtualBox instead of KVM when KVM is part of the Linux Kernel and VirtualBox isn't? Can we not use KDE or GNOME when neither of them are part of the Linux Kernel?
What matters in the end is what the distributions support, not what's built in. Who cares what's built in? The only difference is that the Kernel developers themselves won't maintain anything that isn't officially part of the Kernel, but that is precisely the job the various distributions fill, ensuring that their Kernel modules and related packages work shortly after each new Kernel comes out.
Anyways, a few closing points.
I believe OSS is the superior solution over ALSA, although your mileage may vary. It'd be nice if OSS and ALSA just shared all their drivers, so we wouldn't have the issue where one supports a particular sound card but the other doesn't.
OSS should get suspend support and anything else it lacks in comparison to ALSA, even if insignificant. Here's a hint: why doesn't Ubuntu hire the OSS author, who is currently looking for a job, and have him polish these last few cases for the end user? Also, throw some people at improving the existing volume control widgets to work better with the new OSSv4, and maybe get stuff like HAL to recognize OSSv4 out of the box.
Problems should be fixed directly, not in a roundabout manner as is done with PulseAudio; that garbage needs to go. If users need remote sound (and few do), one should simply be able to map /dev/dsp over NFS and output everything to OSS that way, achieving network transparency at the file level, as UNIX was designed for (everything is a file), instead of all the non-UNIX hacks in place today in regard to sound.
The distributions really need to get their act together. In recent times, Draco Linux has come out, which is OSS only, and Arch Linux seems to treat OSSv4 as a full-fledged citizen, giving the end user a choice. However, I'm told they're both bad in the ALSA compatibility department, not setting it up properly for the end user, and in the case of Arch Linux, requiring the user to modify the config files of each application/library that uses sound.
OSS is portable thanks to its OS abstraction API, making it relevant to the UNIX world as a whole, unlike ALSA. FreeBSD, however, uses its own take on OSS to avoid the abstraction API, but it's still mostly compatible, and one can install the official OSSv4 on FreeBSD if so desired.
Sound in Linux really doesn't have to be that sorry after all; the distributions just have to get their act together and stop all the finger pointing, propaganda, and FUD going around, which is only relevant to ancient versions of OSS, if not downright irrelevant or untrue. Let's stop the madness being perpetrated by the likes of Adobe, the PulseAudio propaganda machine, and whoever else is out there. Let's be objective and use the best solutions instead of settling for mediocrity or hack upon hack.