
Wednesday, December 14, 2011


Progression and Regression of Desktop User Interfaces



As this Gregorian year comes to a close, with various new interfaces out now, and some new ones on the horizon, I decided to recap my personal experiences with user interfaces on the desktop, and see what the future will bring.

When I was younger, there were a few desktop environments floating around, and I'd seen a couple of them at school or at a friend's house. But the first one I had on my own personal computer, and really played around with, was Windows 3.

Windows 3 came with something called Program Manager. Here's what it looked like:




The basic idea was that you had a "start screen", where you had your various applications grouped by their general category. Within each of these "groups", you had shortcuts to the specific applications. Certain apps, like the calculator, you only used in a small window, but most serious apps were used maximized. If you wanted to switch to another running app, you either pressed the alt+tab keyboard shortcut to cycle through them, or you minimized everything, where you then saw a screen listing all the currently running applications.

Microsoft also shipped a very nice file manager, known as "File Manager", which alone made it worthwhile to use Windows. It was rather primitive though, and various companies released add-ons for it to greatly enhance its abilities. I particularly loved Wiz File Manager Pro.



Various companies also made add-ons for Program Manager, such as to allow nested groups, or shortcuts directly on the main screen outside of a group, or the ability to change the icons of selected groups. Microsoft would've probably built some of these add-ons in if it continued development of Program Manager.

Now not everyone used Windows all the time back then, but only selectively started it up when they wanted something from it. I personally did everything in DOS unless I wanted to use a particular Windows app, such as file management or painting, or copying and pasting stuff around. Using Windows all the time could be annoying as it slowed down some DOS apps, or made some of them not start at all due to lack of memory and other issues.

In the summer of 1995, Microsoft released Windows 4 to the world. It came bundled with MS-DOS 7, and provided a whole new user experience.


Now the back-most window, the "desktop", no longer contained a list of the running programs (in explorer.exe mode), but rather you could put your own shortcuts and groups and groups of groups there. Instead, running programs would appear at the bottom of the screen in a "taskbar". The taskbar also contained a clock, and a "start menu" to launch applications. Some always-running applications which were meant to be background tasks appeared as a tiny icon next to the clock in an area known as a "tray".

This was a major step forward in usability. Now, no matter which application you were currently using, you could see all the ones that were currently running on the edge of your screen. You could also easily click on one of them to switch to it. You didn't need to minimize all to see them anymore. Thanks to the start menu, you could also easily launch all your existing programs without needing to minimize all back down to Program Manager. The clock always being visible was also a nice touch.

Now when this came out, I could appreciate these improvements, but at the same time, I also hated it. A lot of us were using 640x480 resolutions on 14" screens back then. Having something steal screen space was extremely annoying. Also with how little memory systems had back at the time (4 or 8 MB of RAM), you generally weren't running more than 2 or 3 applications at a time and could not really appreciate the benefits of having an always present taskbar. Some people played with taskbar auto-hide because of this.

The start menu was also a tad ridiculous. Lots of clicks were needed to get anywhere. The default appearance also had too many useless things on it.

Did anyone actually use help? Did most people launch things from documents? Microsoft released a nice collection of utilities called "PowerToys" which contained "TweakUI" which you could use to make things work closer to how you want.

The default grouping of programs installed within the start menu was quite messy though. Software installers would not organize their shortcuts into the groups that Windows came with, but each installed their apps into their own unique group. Having 50 submenus pop out was rather unwieldy, and I personally organized each app after I installed it, grouping into "System Tools", "Games", "Internet Related Applications", and so on. It was annoying to do all this manually though, as when removing an app, you had to remove its group manually. On upgrades, one would also have to readjust things each time.

Windows 4 also came with the well known Windows Explorer file manager to replace the older one. It was across the board better than the vanilla version of File Manager that shipped with Windows 3.

I personally started dual booting MS-DOS 6.22 + Windows for Workgroups 3.11 and Windows 95 (and later tri-booted with OS/2 Warp). Basically I used DOS and Windows for pretty much everything, and Windows 95 for those apps that required it. Although I managed to get most apps to work with Windows 3 using Win32s.

As I got a larger screen and more RAM though, I finally started to appreciate what Windows 4 offered, and started to use it almost exclusively. I still exited Windows into DOS 7 though for games that needed to use more RAM, or ran quicker that way on our dinky processors from back then.

Then Windows 4.10 / Internet Explorer 4 came out, which offered a couple of improvements. First was "quick launch", which allowed you to put shortcuts directly on your taskbar. You could also make more than one taskbar and put all your shortcuts on it. I personally loved this feature: I put one taskbar on top of my screen, and loaded it with shortcuts to all my common applications, and had one on the bottom for classical use. Now I only had to dive into the start menu for applications I rarely used.

It also offered a feature called "active desktop" which made the background of the desktop not just an image, but a web page. I initially loved the feature, as I edited my own page, and stuck in an input line which I would use to launch a web browser to my favorite search engine at the time (which changed monthly) with my text already searched for. After a while active desktop got annoying though, as every time IE crashed, it disabled it, and you had to deal with extra error messages, and go turn it on manually.

By default this new version also made every Windows Explorer window have this huge sidebar stealing your precious screen space. Thankfully though, you could turn it off.

All in all though, as our CPUs got faster, RAM became cheaper, and large screens more available, this interface was simply fantastic. I stopped booting into other OSs, or exiting Windows itself.

Then Windows 5 came out for consumers, and UI wise, there weren't really any significant changes. The default look used these oversized bars and buttons on each window, but one could easily turn that off. The start menu got a bit of a rearrangement to now feature your most used programs up front, and various actions got pushed off to the side. Since I already put my most used programs on my quick launch on top, this start menu was a complete waste of space. Thankfully, it could also be turned off. I still kept using Windows 98 though, as I didn't see any need for this new Windows XP, and it was just a memory hog in comparison at the time.

What was more interesting to me however was that at work, all our machines ran Linux with GNOME and KDE. When I first started working there, they made me take a course on how to use Emacs, as every programmer needs a text editor. I was greatly annoyed by the thing however: where was my shift highlighting, with shift+delete and shift+insert or ctrl+x and ctrl+v cut and paste? Thankfully I soon found gedit and kedit, which were like edit/notepad/wordpad but for Linux.

Now I personally don't use a lot of windowsy type software often. My primary usage of a desktop consists of using a text editor, calculator, file manager, console/terminal/prompt, hex editor, paint, cd/dvd burner, and web browser. Only rarely do I launch anything else. Perhaps a PDF, CHM, or DJVU reader when I need to read something.

After using Linux with GNOME/KDE at work for a while, I started to bring more of my work home with me and wanted these things installed on my own personal computer. So dual booting Windows 98 + Linux was the way to go. I started trying to tweak my desktop a bit, and found that KDE was way more capable than GNOME, as were most of their similar apps that I was using. KDE basically offered me everything that I loved about Windows 98, but on steroids. KWrite/KATE was notepad/wordpad but on steroids. The syntax highlighting was absolutely superb. KCalc was a fine calc.exe replacement. Konqueror made Windows Explorer seem like a joke in comparison.

Konqueror offered network transparency, thumbnails of everything rather quickly, even of text files (no pesky thumbs.db either!). An info list view which was downright amazing:
This is a must have feature here. Too often are documents distributed under meaningless file names. With this, and many other small features, nothing else I've seen has even come close to Konqueror in terms of file management.

Konsole was just like my handy DOS Prompt, except with tab support, and better maximizing, and copy and paste support. KHexEdit was simply better than anything I had for free on Windows. KolourPaint is mspaint.exe with support for way more image formats. K3b was also head and shoulders above Easy CD Creator or Nero Burning ROM. For Web Browsers, I was already using Mozilla anyway on Windows, and had the same option on Linux too.

For the basic UI, not only did KDE have everything I liked about Windows, it came with an organized start menu, which also popped out directly instead of from a "Programs" submenu.

The taskbar was also enhanced so that I could stick various applets on it. I could stick volume control directly on it. Not a button which popped out a slider, the sliders themselves could appear on the taskbar. Now for example, I could easily adjust my microphone volume directly, without popping up or clicking on anything extra. There was an eyedropper which you could just push to find out the HTML color of anything appearing on the screen - great for web developers. Another thing which I absolutely love, I can see all my removable devices listed directly on my taskbar. If I have a disk in my drive, I see an icon for that drive appearing directly on my taskbar, and I can browse it, burn to it, eject it, whatever. With this, everything I need is basically at my finger tips.

Before long I found myself using Linux/KDE all the time. On newer machines I started to dual boot Windows XP with Linux/KDE, so I could play the occasional Windows game when I wanted to, but for real work, I'd be using Linux.

Then KDE 4 came out, and basically half the stuff I loved about KDE was removed. No longer is it Windows on steroids. Now KDE 4.0 was not intended for public consumption. Somehow all the distros except for Debian seemed to miss that. Everyone wants to blame KDE for miscommunicating this, but it's quite clear 4.0 was only for developers if you watched the KDE 4 release videos. Any responsible package maintainer with a brain in their skull should've also realized that 4.0 was not ready for prime time. Yet it seems the people managing most distros are idiots that just need the "latest" version of everything, ignoring whether it's better or not, or even stable.

At the time, all these users were upset, and all started switching to GNOME. I don't know why anyone who truly loved the power KDE gave you would do that. If KDE 3 > GNOME 2 > KDE 4, why did people migrate to GNOME 2 when they could just not "upgrade" from KDE 3? It seems to me that people never really appreciated what KDE offers in the first place if they bothered moving to GNOME instead of keeping what was awesome.

Nowadays people tell me that KDE 4 has feature parity with KDE 3, but I have no idea what they're talking about. The Konqueror info list feature that I described above still doesn't seem to exist in KDE 4. You can no longer have applets directly on your taskbar. Now I have to click a button to pop up a list of my devices, and only then can I do something with them. No way to quickly glance to see what is currently plugged in. Konsole's tabs now stretch to the full width of your screen for some pointless reason. If you want to switch between tabs with your mouse, prepare for carpal tunnel syndrome. Who thought that icons should grow if they can? It's exactly like those idiotic theoretical professors who spout that CPU usage must be maximized at all times, and therefore use algorithms that schedule things for maximum power draw, despite that in normal cases performance does not improve by using these algorithms. I'd rather pack in more data if possible, having multiple columns of information instead of just huge icons.

KHexEdit has also been completely destroyed. No longer is the column count locked to hex (16). I can't imagine anyone who seriously uses a hex editor designed the new version. For some reason, KDE now also has to act like your mouse only has one button, and right click context menus vanished from all over the place. It's like the entire KDE development team got invaded by Mac and GNOME users who are too stupid to deal with anything other than a big button in the middle of the screen.

Over in the Windows world, Windows 6 (with 6.1 comically being consumerized as "Windows 7") came out with a bit of a revamp. The new start menu seems to fail basic quality assurance tests for anything other than default settings. Try to set things to offer a classic start menu, and this is what you get:


If you use extra large icons for your grandparents, you also find that half of the Control Panel is now inaccessible. Ever since Bill Gates left, it seems Windows is going down the drain.

But the problems are hardly confined to KDE and Windows; GNOME 3 is also a major step back according to what most people tell me. Many of these users are now migrating to XFCE. If you like GNOME 2, what are you migrating to something else for? And what is it with people trying to fix what isn't broken? If you want to offer an alternate interface, great, but why break or remove the one you already have?

Now a new version of Windows is coming out with a new interface being called "Metro". They should really be calling it "Retro". It's Windows 3 Program Manager with a bunch of those third party add-ons, with a more modern look to it. Gone is the Windows 4+ taskbar which let you see what was running and easily switch applications with the mouse. Now you'll need to press the equivalent of a minimize all to get to the actual desktop, and another kind of minimize to get back to Program Manager, or "start screen" as they're now calling it, to launch something else.

So say goodbye to all the usability and productivity advantages Windows 4 offered us, they want to take us back to the "dark ages" of modern computing. Sure a taskbar-less interface makes sense on handheld devices with tiny screens or low resolution, but on modern 19"+ screens? The old Windows desktop+taskbar in the upcoming version of Windows is now just another app in their Metro UI. So "Metro" apps won't appear on the classic taskbar, and "classic" applications won't appear on the Metro desktop where running applications are listed.

I'm amazed at how self destructive the entire market became over the last few years. I'm not even sure who to blame, but someone has to do something about it. It's nice to see that a small group of people took KDE 3.5, and are continuing to develop it, but they're rebranding everything with crosses everywhere and calling it "Trinity" desktop. Just what we need, to bring religious issues now into desktop environments. What next? The political desktop?

Thursday, June 3, 2010


Undocumented APIs



This topic has been kicked around all over for at least a decade. It's kind of hard to read a computer programming magazine, third party documentation, technology article site, third party developer mailing list or the like without running into it.

Inevitably someone always mentions how some Operating System has undocumented APIs, and the creators of the OS are using those undocumented APIs, but telling third party developers not to use them. There will always be the rant about unfair competition, hypocrisy, and other evils.

Large targets are Microsoft and Apple, for creating undocumented APIs that are used by Internet Explorer, Microsoft Office, Safari, iPhone development, and other in-house applications. To the detriment of Mozilla, OpenOffice.org, third parties, and hobbyist developer teams.

If you wrote one of those articles/rants/complaints, or agreed with them, I'm asking you to turn in your Geek Membership card now. You're too stupid, too idiotic, too dumb, too nearsighted, and too vilifying (add other derogatory adjectives you can think of) to be part of our club. You have no business writing code or even discussing programming.

Operating Systems and other applications that communicate via APIs provide public interfaces that are supposed to be stable. If you find a function in official public documentation, that generally signifies it is safe to use the function in question, and the company will support it for the foreseeable future versions. If a function is undocumented, it's because the company doesn't think they will be supporting the function long term.

Now new versions which provide even the exact same API inevitably break things. It is so much worse for APIs which you're not supposed to be using in the first place. Yet people who complain about undocumented APIs, also complain when the new version removes a function they were using. They blame the company for purposely removing features they knew competition or hobbyists were using. That's where the real hypocrisy is.

As much as you hate a new version breaking your existing software, so does Microsoft, Apple, and others. The popularity of their products is tied to what third party applications exist. Having popular third party applications break on new versions is a bad thing. That's why they don't document APIs they plan on changing.

Now why do they use them then? Simple, you can write better software when you intimately understand the platform you're developing for, and write the most direct code possible. When issues occur because of their use, it's easy for the in-house teams to test their own software for compatibility with upcoming versions and fix them. If Microsoft sees a new version of Windows break Internet Explorer, they can easily patch the Internet Explorer provided with the upcoming version, or provide a new version altogether. But don't expect them to go patching third party applications, especially ones which they don't have the source to. They may have done so in the past, and introduced compatibility hacks, but this really isn't their problem, and you should be grateful when they go out of their way to do so.

So quit your complaining about undocumented APIs, or please go around introducing yourself as "The Dumb One".

Friday, November 6, 2009


FatELF Dead?



A while back, someone came up with a project called FatELF. I won't go into the exact details of all it's trying to accomplish, but the basic idea was that just like Mac OS X has universal binaries using the Mach-O object format which can run on multiple architectures, the same should be possible with software for Linux and FreeBSD, which use the ELF object format.

The creators of FatELF cite many different reasons why FatELF is a good idea, which most of us probably disagree with. But I found it could solve a pretty crucial issue today.

The x86 line of processors, which is what everyone uses for their home PCs, recently switched from 32-bit to 64-bit. 64-bit x86, known as x86-64, is backwards compatible with the old architecture. However, programs written for the new one generally run faster.

x86-64 CPUs contain more registers than traditional x86-32 ones, so a CPU can juggle more data internally without offloading it to much slower RAM. Also, most distributions offered precompiled binaries designed for a very low common denominator, generally a 486 or the original Pentium. Programs compiled for these older processors can't take advantage of many of the improvements made to the x86 line in the past 15 years. A distribution which targets the lowest common denominator for x86-64 on the other hand is targeting a much newer architecture, where every chip already contains MMX, SSE, and similar technologies, and other general enhancements.

Installing a distribution geared for x86-64 can mean a much better computing experience for the most part. Except certain programs unfortunately are not yet 64 bit ready, or are closed source and can't be easily recompiled. In the past year or two, a lot of popular proprietary software were ported by their companies to x86-64, but some which are important for business fail completely under x86-64, such as Cisco's Webex.

x86-32 binaries can run on x86-64, provided all the libraries they need are available on the system. However, many distributions don't provide x86-32 libraries on their x86-64 platform, or they provide only a couple, or provide ones which simply don't work.

All these issues could be fixed if FatELF was supported by the operating system. A distribution could provide an x86-64 platform, with all the major libraries containing both 32 and 64 bit versions within. Things like GTK, Qt, cURL, SDL, libao, OpenAL, and so on. We wouldn't have to worry about one of these libraries conflicting when installing two variations, or simply missing from the system.

It would make it easier on those on an x86-64 platform, knowing they can run any binary they get elsewhere without a headache. It would also ease deployment issues for those that don't do anything special to take advantage of x86-64, and just want to pass out a single version of their software.

I as a developer have to have an x86-32 chroot on all my development systems to make sure I can produce 32 bit binaries properly, which is also a hassle. All too often I have to jump back and forth between a 32 bit shell to compile the code, and a 64 bit shell where I have the rest of my software needed to analyze it, and commit it.

But unfortunately, it now seems FatELF is dead, or on its way.

I wish we could find a fully working solution to the 32 on 64 bit problem that crops up today.

Thursday, June 18, 2009


State of sound in Linux not so sorry after all



About two years ago, I wrote an article titled "The Sorry State of Sound in Linux", hoping to get some sound issues in Linux fixed. Now, two years later, a lot has changed, and it's time to take another look at the state of sound in Linux today.


A quick summary of the last article for those that didn't read it:
  • Sound in Linux has an interesting history, and historically lacked sound mixing on cards that relied on software rather than hardware mixing.
  • Many sound servers were created to solve the mixing issue.
  • Many libraries were created to solve multiple back-end issues.
  • ALSA replaced OSS version 3 in the Kernel source, attempting to fix existing issues.
  • There was a closed source OSS update which was superb.
  • Linux distributions have been removing OSS support from applications in favor of ALSA.
  • The average sound developer prefers a simple API.
  • Portability is a good thing.
  • Users are having issues in certain scenarios.


Now much has changed, namely:
  • OSS is now free and open source once again.
  • PulseAudio has become widespread.
  • Existing libraries have been improved.
  • New Linux Distributions have been released, and some existing ones have attempted an overhaul of their entire sound stack to improve users' experience.
  • People read the last article, and have more knowledge than before, and in some cases, have become more opinionated than before.
  • I personally have looked much closer at the issue to provide even more relevant information.


Let's take a closer look at the pros and cons of OSS and ALSA as they are, not five years ago, not last year, not last month, but as they are today.

First off, ALSA.
ALSA consists of three components. First part is drivers in the Kernel with an API exposed for the other two components to communicate with. Second part is a sound developer API to allow developers to create programs which communicate with ALSA. Third part is a sound mixing component which can be placed between the other two to allow multiple programs using the ALSA API to output sound simultaneously.

To help make sense of the above, here is a diagram:


Note, the diagrams presented in this article are made by myself, a very bad artist, and I don't plan to win any awards for them. Also they may not be 100% absolutely accurate down to the last detail, but accurate enough to give the average user an idea of what is going on behind the scenes.

A sound developer who wishes to output sound in their application can take any of the following routes with ALSA:
  • Output using ALSA API directly to ALSA's Kernel API (when sound mixing is disabled)
  • Output using ALSA API to sound mixer, which outputs to ALSA's Kernel API (when sound mixing is enabled)
  • Output using OSS version 3 API directly to ALSA's Kernel API
  • Output using a wrapper API which outputs using any of the above 3 methods


As can be seen, ALSA is quite flexible, has sound mixing which OSSv3 lacked, but still provides legacy OSSv3 support for older programs. It also offers the option of disabling sound mixing in cases where the sound mixing reduced quality in any way, or introduced latency which the end user may not want at a particular time.

Two points should be clear, ALSA has optional sound mixing outside the Kernel, and the path ALSA's OSS legacy API takes lacks sound mixing.

An obvious con should be seen here: ALSA, which was initially designed to fix the sound mixing issue at a lower and more direct level than a sound server, doesn't provide sound mixing for "older" programs.

Obvious pros are that ALSA is free, open source, has sound mixing, can work with multiple sound cards (all of which OSS lacked during much of version 3's lifespan), and included as part of the Kernel source, and tries to cater to old and new programs alike.

The less obvious cons are that ALSA is Linux only, it doesn't exist on FreeBSD or Solaris, or Mac OS X or Windows. Also, the average developer finds ALSA's native API too hard to work with, but that is debatable.


Now let's take a look at OSS today. OSS is currently at version 4, and is a completely different beast than OSSv3 was.
Where OSSv3 went closed source, OSSv4 is open sourced today, under GPL, 3 clause BSD, and CDDL.
While a decade old OSS was included in the Linux Kernel source, the new greatly improved OSSv4 is not, and thus may be a bit harder for the average user to try out. Older OSSv3 lacked sound mixing and support for multiple sound cards, OSSv4 does not. Most people who discuss OSS or try OSS to see how it stacks up against ALSA unfortunately are referring to, or are testing out the one that is a decade old, providing a distortion of the facts as they are today.

Here's a diagram of OSSv4:
A sound developer wishing to output sound has the following routes on OSSv4:
  • Output using OSS API right into the Kernel with sound mixing
  • Output using ALSA API to the OSS API with sound mixing
  • Output using a wrapper API to any of the above methods


Unlike in ALSA, when using OSSv4, the end user always has sound mixing. Also because sound mixing is running in the Kernel itself, it doesn't suffer from the latency ALSA generally has.

Although OSSv4 does offer their own ALSA emulation layer, it's pretty bad, and I haven't found a single ALSA program which is able to output via it properly. However, this isn't an issue, since as mentioned above, ALSA's own sound developer API can output to OSS, providing perfect compatibility with ALSA applications today. You can read more about how to set that up in one of my recent articles.
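
The gist of that setup is just a couple of lines of ALSA configuration. Here's a minimal sketch, assuming the OSS output plugin from the alsa-plugins package is installed and your OSS device nodes are the usual /dev/dsp and /dev/mixer:

#Place in ~/.asoundrc or /etc/asound.conf to route the ALSA API to the OSS back-end
pcm.!default {
    type oss
    device /dev/dsp
}

ctl.!default {
    type oss
    device /dev/mixer
}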

ALSA's own library is able to do this, because it's actually structured as follows:

As you can see, it can output to either OSS or ALSA Kernel back-ends (other back-ends too which will be discussed lower down).

Since both OSS and ALSA based programs can use an OSS or ALSA Kernel back-end, the differences between the two are quite subtle (note, we're not discussing OSSv3 here), boil down to what I know from research and testing, and are not immediately obvious.

OSS always has sound mixing, ALSA does not.
OSS sound mixing is of higher quality than ALSA's, due to OSS using more precise math in its sound mixing.
OSS has less latency compared to ALSA when mixing sound due to everything running within the Linux Kernel.
OSS offers per application volume control, ALSA does not.
ALSA can have the Operating System go into suspend mode when sound was playing and come out of it with sound still playing, OSS on the other hand needs the application to restart sound.
OSS is the only option for certain sound cards, as ALSA drivers for a particular card are either really bad or non existent.
ALSA is the only option for certain sound cards, as OSS drivers for a particular card are either really bad or non existent.
ALSA is included in Linux itself and is easy to get ahold of, OSS (v4) is not.

Now the question is where does the average user fall in the above categories? If the user has a sound card which only works (well) with one or the other, then obviously they should use the one that works properly. Of course a user may want to try both to see if one performs better than the other one.

If the user really needs to have a program output sound right until Linux goes into suspend mode, and then continues where it left off when resuming, then ALSA is (currently) the only option. I personally don't find this to be a problem, and furthermore I doubt it's a large percentage of users that even use suspend in Linux. Suspend in general in Linux isn't great, due to some rogue piece of hardware like a network or video card which screws it up.

If the user doesn't want a hassle, ALSA also seems the obvious choice, as it's shipped directly with the Linux Kernel, so it's much easier for the user to use a modern ALSA than it is a modern OSS. However it should be up to the Linux Distribution to handle these situations, and to the end user, switching from one to the other should be seamless and transparent. More on this later.

Yet we also see due to better sound mixing and latency when sound mixing is involved, that OSS is the better choice, as long as none of the above issues are present. But the better mixing is generally only noticed at higher volume levels, or rare cases, and latency as I'm referring to is generally only a problem if you play heavy duty games, and not a problem if you just want to listen to some music or watch a video.


But wait this is all about the back-end, what about the whole developer API issue?

Many people like to point fingers at the various APIs (I myself did too to some extent in my previous article). But they really don't get it. First off, this is how your average sound wrapper API works:

The program outputs sound using a wrapper, such as OpenAL, SDL, or libao, and then sound goes to the appropriate high level or low level back-end, and the user doesn't have to worry about it.

Since the back-ends can be various Operating Systems sound APIs, they allow a developer to write a program which has sound on Windows, Mac OS X, Linux, and more pretty easily.

Some like Adobe like to say how this is some kind of problem, and makes it impossible to output sound in Linux. Nothing could be further from the truth. Graphs like these are very misleading. OpenAL, SDL, libao, GStreamer, NAS, Allegro, and more all exist on Windows too. I don't see anyone complaining there.

I can make a similar diagram for Windows:

This above diagram is by no means complete, as there's XAudio, other wrapper libs, and even some Windows only sound libraries which I've forgotten the name of.

This by no means bothers anybody, and should not be made an issue.

In terms of usage, the libraries stack up as follows:
OpenAL - Powerful, tricky to use, great for "3D audio". I personally was able to get a lot done by following a couple of examples, and only spent an hour or two adding sound to an application.
SDL - Simplistic, uses a callback API, decent if it fits your program design. I personally was able to add sound to an application in half an hour with SDL, although I don't think it fits every case load.
libao - Very simplistic, incredibly easy to use, although problematic if you need your application not to block on sound output. I added sound to a multitude of applications using libao in a matter of minutes. I just think it's a bit more annoying to do if you need to give your program its own sound thread, so again it depends on the case load.
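
To give you an idea of just how simple libao is, here's a minimal sketch of playing a buffer you've already filled elsewhere, assuming signed 16-bit little endian stereo at 44100 Hz:

#include <ao/ao.h>
#include <string.h>

void play_buffer(char *buffer, int size) //Buffer assumed to hold 16-bit stereo samples at 44100 Hz
{
    ao_initialize();

    ao_sample_format format;
    memset(&format, 0, sizeof(format));
    format.bits = 16;
    format.channels = 2;
    format.rate = 44100;
    format.byte_format = AO_FMT_LITTLE;

    ao_device *device = ao_open_live(ao_default_driver_id(), &format, 0); //0 means default driver options
    if (device)
    {
        ao_play(device, buffer, size); //Blocks until the whole buffer has been handed off
        ao_close(device);
    }
    ao_shutdown();
}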

I haven't played with the other sound wrappers, so I can't comment on them, but the same ideas are played out with each and every one.

Then of course there's the actual OSS and ALSA APIs on Linux. Now why would anyone use them when there are lovely wrappers that are more portable, customized to match any particular case load? In the average case, this is in fact true, and there is no reason to use OSS or ALSA's API to output sound. In some cases though, using a wrapper API can add latency which you may not want, and you may not need any of the advantages a wrapper API provides, in which case going directly to OSS or ALSA makes sense.

Here's a breakdown of how OSS and ALSA's APIs stack up.
OSSv3 - Easy to use, most developers I spoke to like it, exists on every UNIX but Mac OS X. I added sound to applications using OSSv3 in 10 minutes.
OSSv4 - Mostly backwards compatible with v3, even easier to use, exists on every UNIX except Mac OS X and Linux when using the ALSA back-end, has sound re-sampling, and AC3 decoding out of the box. I added sound to several applications using OSSv4 in 10 minutes each.
ALSA - Hard to use, most developers I spoke to dislike it, poorly documented, not available anywhere but Linux. Some developers however prefer it, as they feel it gives them more flexibility than the OSS API. I personally spent 3 hours trying to make heads or tails out of the documentation and add sound to an application. Then I found sound only worked on the machine I was developing on, and had to spend another hour going over the docs and tweaking my code to get it working on both machines I was testing on at the time. Finally, I released my application with the ALSA back-end, to find several people complaining about no sound, and started receiving patches from several developers. Many of those patches fixed sound on their machine, but broke sound on one of my machines. Here we are a year later, and my application after many hours wasted by several developers, ALSA now seems to output sound decently on all machines tested, but I sure don't trust it. We as developers don't need these kinds of issues. Of course, you're free to disagree, and even cite examples how you figured out the documentation, added sound quickly, and have it work flawlessly everywhere by everyone who tested your application. I must just be stupid.
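
To back up the timing claims above, here's roughly what the OSSv3-style route looks like (OSSv4 accepts the same calls); a sketch assuming the same kind of 16-bit stereo buffer at 44100 Hz as before:

#include <sys/ioctl.h>
#include <sys/soundcard.h>
#include <fcntl.h>
#include <unistd.h>

void play_buffer(char *buffer, int size) //Buffer assumed to hold 16-bit stereo samples at 44100 Hz
{
    int fd = open("/dev/dsp", O_WRONLY);
    if (fd == -1) { return; }

    int format = AFMT_S16_LE, channels = 2, rate = 44100;
    ioctl(fd, SNDCTL_DSP_SETFMT, &format);     //Sample format
    ioctl(fd, SNDCTL_DSP_CHANNELS, &channels); //Stereo
    ioctl(fd, SNDCTL_DSP_SPEED, &rate);        //Sample rate
    write(fd, buffer, size); //Sound is just a write() away
    close(fd);
}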

Now I previously thought the OSS vs. ALSA API issue was significant to end users, in so far as what they're locked into, but really it only matters to developers. The main issue is though, if I want to take advantage of all the extra features that OSSv4's API has to offer (and I do), I have to use the OSS back-end. Users however don't have to care about this one, unless they use programs which take advantage of these features, which there are few of.

However regarding wrapper APIs, I did find a few interesting results when testing them in a variety of programs.
App -> libao -> OSS API -> OSS Back-end - Good sound, low latency.
App -> libao -> OSS API -> ALSA Back-end - Good sound, minor latency.
App -> libao -> ALSA API -> OSS Back-end - Good sound, low latency.
App -> libao -> ALSA API -> ALSA Back-end - Bad sound, horrible latency.
App -> SDL -> OSS API -> OSS Back-end - Good sound, really low latency.
App -> SDL -> OSS API -> ALSA Back-end - Good sound, minor latency.
App -> SDL -> ALSA API -> OSS Back-end - Good sound, low latency.
App -> SDL -> ALSA API -> ALSA Back-end - Good sound, minor latency.
App -> OpenAL -> OSS API -> OSS Back-end - Great sound, really low latency.
App -> OpenAL -> OSS API -> ALSA Back-end - Adequate sound, bad latency.
App -> OpenAL -> ALSA API -> OSS Back-end - Bad sound, bad latency.
App -> OpenAL -> ALSA API -> ALSA Back-end - Adequate sound, bad latency.
App -> OSS API -> OSS Back-end - Great sound, really low latency.
App -> OSS API -> ALSA Back-end - Good sound, minor latency.
App -> ALSA API -> OSS Back-end - Great sound, low latency.
App -> ALSA API -> ALSA Back-end - Good sound, bad latency.

If you're having a hard time trying to wrap your head around the above chart, here's a summary:
  • OSS back-end always has good sound, except when using OpenAL->ALSA to output to it.
  • ALSA generally sounds better when using the OSS API, and has lower latency (generally because that avoids any sound mixing as per an earlier diagram).
  • OSS related technology is generally the way to go for best sound.


But wait, where do sound servers fit in?

Sound servers were initially created to deal with a problem caused by OSSv3 which is now non-existent, namely the lack of sound mixing. The sound server stack today looks something like this:

As should be obvious, these sound servers today do nothing except add latency, and should be done away with. KDE 4 has moved away from the aRts sound server, and instead uses a wrapper API known as Phonon, which can deal with a variety of back-ends (some of which can themselves go through a particular sound server if need be).

However as mentioned above, ALSA's mixing is not of the same high quality as OSS's is, and ALSA also lacks some nice features such as per application volume control.

Now one could turn off ALSA's low quality mixer, or have an application do its own volume control internally by modifying the sound wave it's outputting, but these choices aren't friendly towards users or developers.

Seeing this, Fedora and Ubuntu have both stepped in with a so-called state-of-the-art sound server known as PulseAudio.

If you remember this:

As you can see, ALSA's API can also output to PulseAudio, meaning programs written using ALSA's API can output to PulseAudio and use PulseAudio's higher quality sound mixer seamlessly without requiring the modification of old programs. PulseAudio is also able to send sound to another PulseAudio server on the network to output sound remotely. PulseAudio's stack is something like this:

As you can see it looks very complex, and a 100% accurate breakdown of PulseAudio is even more complex.

Thanks to PulseAudio being so advanced, most of the wrapper APIs can output to it, and Fedora and Ubuntu ship with all that set up for the end user. It can in some cases also receive sound written for another sound server such as ESD, without requiring ESD to run on top of it. It also means that many programs are now going through many layers before they reach the sound card.

Some have seen PulseAudio as the new Voodoo which is our new savior, sound written to any particular API can be output via it, and it has great mixing to boot.

Except many users who play games for example are crying that this adds a TREMENDOUS amount of latency, and is very noticeable even in not so high-end games. Users don't like hearing enemies explode a full 3 seconds after they saw the enemy explode on screen. Don't let anyone kid you, there's no way a sound server, especially one with this level of bloat and complexity, can ever achieve anything approaching the low latency acceptable for games.

Compare the insanity that is PulseAudio with this:

Which do you think looks like a better sound stack, considering that their sound mixing, per application volume control, compatibility with applications, and other features are on par?

And yes, let's not forget the applications. I'm frequently told about how some application is written to use a particular API, therefore either OSS or ALSA need to be the back-end they use. However as explained above, either API can be used on either back-end. If set up right, you don't have to lack sound in newer versions of Flash when using the OSS back-end.

So where are we today exactly?
The biggest issue I find is that the distributions simply aren't set up to make the choice easy on the users. Debian and derivatives provide a Linux sound base package to select whether you want OSS or ALSA to be your back-end, except it really doesn't do anything. Here's what we do need from such a package:
  • On selecting OSS, it should install the latest OSS package, as well as ALSA's ALSA API->OSS back-end interface, and set it up.
  • Minimally configure an installed OpenAL to use OSS back-end, and preferably SDL, libao, and other wrapper libraries as well.
  • Recognize the setting when installing a new application or wrapper library and configure that to use OSS as well.
  • Do all the above in reverse when selecting ALSA instead.

Such a setup would allow users to easily switch between them if their sound card only worked with the one which wasn't the distribution's default. It would also easily allow users to objectively test which one works better for them if they care to, and desire to use the best possible setup they can. Users should be given this capability. I personally believe OSS is superior, but we should leave the choice up to the user if they don't like whichever is the system default.

Now I repeatedly hear the claim: "But, but, OSS was taken out of the Linux Kernel source, it's never going to be merged back in!"

Let's analyze that objectively. Does it matter what is included in the default Linux Kernel? Can we not use VirtualBox instead of KVM when KVM is part of the Linux Kernel and VirtualBox isn't? Can we not use KDE or GNOME when neither of them are part of the Linux Kernel?

What matters in the end is what the distributions support, not what's built in. Who cares what's built in? The only difference is that the Kernel developers themselves won't maintain anything not officially part of the Kernel, but that's precisely the job that the various distributions fill, ensuring their Kernel modules and related packages work shortly after each new Kernel comes out.

Anyways, a few closing points.

I believe OSS is the superior solution over ALSA, although your mileage may vary. It'd be nice if OSS and ALSA just shared all their drivers, not having an issue where one has support for one sound card, but not the other.

OSS should get suspend support and anything else it lacks in comparison to ALSA even if insignificant. Here's a hint, why doesn't Ubuntu hire the OSS author and get it more friendly in these last few cases for the end user? He is currently looking for a job. Also throw some people at it to improve the existing volume controlling widgets to be friendlier with the new OSSv4, and maybe get stuff like HAL to recognize OSSv4 out of the box.

Problems should be fixed directly, not in a roundabout manner as is done with PulseAudio; that garbage needs to go. If users need remote sound (and few do), one should just be easily able to map /dev/dsp over NFS, and output everything to OSS that way, achieving network transparency on the file level as UNIX was designed for (everything is a file), instead of all these non-UNIX hacks in place today in regards to sound.

The distributions really need to get their act together. In recent times Draco Linux has come out which is OSS only, and Arch Linux seems to treat OSSv4 as a full fledged citizen to the end user, giving them choice, although I'm told they're both bad in the ALSA compatibility department, not setting it up properly for the end user, and in the case of Arch Linux, requiring the user to modify the config files of each application/library that uses sound.

OSS is portable thanks to its OS abstraction API, being more relevant to the UNIX world as a whole, unlike ALSA. FreeBSD however uses their own take on OSS to avoid the abstraction API, but it's still mostly compatible, and one can install the official OSSv4 on FreeBSD if they so desire.

Sound in Linux really doesn't have to be that sorry after all, the distributions just have to get their act together, and stop with all the finger pointing, propaganda, and FUD that is going around, which is only relevant to ancient versions of OSS, if not downright irrelevant or untrue. Let's stop the madness being perpetrated by the likes of Adobe, PulseAudio propaganda machine, and whoever else out there. Let's be objective and use the best solutions instead of settling for mediocrity or hack upon hack.

Tuesday, May 12, 2009


What the heck is a PC?



So the other day a friend of mine comes over to discuss some business strategy. After chatting for a bit, he asks me if he could use one of my computers to check his E-Mail. Happily obliging, I log into one of my x86-64 machines running Linux/KDE, open up Firefox for him, and tell him to go check.

So my friend sits down, grabs for the mouse and keyboard, and starts looking at the screen with a weird expression on his face. After a couple of seconds, he clicks on the address bar, looks around a bit more, then he motions to me that he could use some help.

I asked him if he used a particular web based e-mail service, he answered in the affirmative that he uses G-Mail. So I typed in the address to G-Mail for him, which brought up the login page asking for his username and password.

He again looks at the screen, types in his username, then pauses again. He turns to me and tells me he feels really uncomfortable using this system, do I have a PC available that he can use?

This question made me completely flabbergasted. My friend is technically oriented, he's been using computers for a decade, he knows various operating systems exist, he knows various web browsers exist. He uses Firefox both at home and work. He even does some VBA scripting in his Excel spreadsheets. What in the world is scaring him about Firefox in Linux/KDE?

As the story continues, I turn to my friend and tell him this is a PC. He looks back at the screen, looks a bit startled, then seems to think for a few moments. Finally he turns to me and says, you know, Windows? So I reboot the machine into Windows XP (all my computers are minimally dual boot), and open up Firefox for him. He sits down happily and checks his E-Mail.

This experience left me thinking though. What the heck is a PC? It seems some people today have got it in their skulls that PC is a connotation for Windows. PC-DOS, PC-BSD, and other OSs must appear oxymoronish to these people if they ever saw them.

Thinking about it, for years I viewed a PC as something IBM put together, most importantly containing an x86 processor. The basic machine changed over the years, but it was always x86. But then Apple recently released computers on x86 chips too, and they vehemently deny that they use PC hardware. Their basic difference being they don't use any classic PC BIOS.

Should we now take the term PC to mean x86+BIOS? Thinking more about it, a PC should probably be defined as a computer capable of running MS/PC DOS. Maybe even Apple decided to use their whole EFI setup just so they can classify their computers as not PCs. And thanks to dumb marketing, people are now thinking PC means running Windows.

So, do you know anyone who is scared of the same browser on a different OS? What do you think of when you hear "PC"? Has the PC definition changed? Should it change?

Saturday, November 3, 2007


PATH_MAX simply isn't



Many C/C++ programmers at some point may run into a limit known as PATH_MAX. Basically, if you have to keep track of paths to files/directories, how big does your buffer have to be?
Most Operating Systems/File Systems I've seen, limit a filename or any particular path component to 255 bytes or so. But a full path is a different matter.

Many programmers will immediately tell you that if your buffer is PATH_MAX, or PATH_MAX+1 bytes, it's long enough. A good C++ programmer of course would use C++ strings (std::string or similar with a particular API) to avoid any buffer length issues. But even when having dynamic strings in your program taking care of the nitty gritty issue of how long your buffers need to be, they only solve half the problem.

Even a C++ programmer may at some point want to call the getcwd() or realpath() (fullpath() on Windows) functions, which take a pointer to a writable buffer, and not a C++ string, and according to the standard, they don't do their own allocation. Even ones that do their own allocation very often just allocate PATH_MAX bytes.

getcwd() is a function to return what the current working directory is. realpath() can take a relative or absolute path to any filename, containing .. or levels of /././. or extra slashes, and symlinks and the like, and return a full absolute path without any extra garbage. These functions have a flaw though.

The flaw is that PATH_MAX simply isn't. Each system can define PATH_MAX to whatever size it likes. On my Linux system, I see it's 4096, on my OpenBSD system, I see it's 1024, on Windows, it's 260.

Now performing a test on my Linux system, I noticed that it limits a path component to 255 characters on ext3, but it doesn't stop me from making as many nested ones as I like. I successfully created a path 6000 characters long. Linux does absolutely nothing to stop me from creating such a large path, nor from mounting one large path on another. Running getcwd() in such a large path, even with a huge buffer, fails, since it doesn't work with anything past PATH_MAX.

Even a commercial OS like Mac OS X defines it as 1024, but tests show you can create a path several thousand characters long. Interestingly enough, OSX's getcwd() will properly identify a path which is larger than its PATH_MAX if you pass it a large enough buffer with enough room to hold all the data. This is possible, because the prototype for getcwd() is:
char *getcwd(char *buf, size_t size);


So a smart getcwd() can work if there's enough room. But unfortunately, there is no way to determine how much space you actually need, so you can't allocate it in advance. You'd have to keep allocating larger and larger buffers hoping one of them will finally work, which is quite retarded.
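
Here's what that guessing game looks like; a sketch which only pays off on systems whose getcwd() will actually use the extra room (like OS X above), and is wasted effort on those that give up at PATH_MAX regardless:

#include <unistd.h>
#include <errno.h>
#include <string>
#include <vector>

bool getcwd_guessing(std::string& path)
{
    std::vector<char> buffer(256);
    while (!getcwd(&buffer[0], buffer.size()))
    {
        if (errno != ERANGE) { return(false); } //Some other failure, give up
        buffer.resize(buffer.size()*2); //Not enough room, guess again with twice the space
    }
    path = &buffer[0];
    return(true);
}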

Since a path can be longer than PATH_MAX, the define is useless, writing code based off of it is wrong, and the functions that require it are broken.

An exception to this is Windows. It doesn't allow any paths to be created larger than 260 characters. If the path was created on a partition from a different OS, Windows won't allow anything to access it. It sounds strange that such a small limit was chosen, considering that FAT has no such limit imposed, and NTFS allows paths to be 32768 characters long. I can easily imagine someone with a sizable audio collection having a 300+ character path like so:
"C:\Documents and Settings\Jonathan Ezekiel Cornflour\My Documents\My Music\My Personal Rips\2007\Technological\Operating System Symphony Orchestra\The GNOME Musical Men\I Married Her For Her File System\You Don't Appreciate Marriage Until You've Noticed Tax Pro's Wizard For Married Couples.Track 01.MP5"


Before we forget, here's the prototype for realpath:
char *realpath(const char *file_name, char *resolved_name);


Now looking at that prototype, you should immediately say to yourself, but where's the size value for resolved_name? We don't want a buffer overflow! Which is why OSs will implement it based on the PATH_MAX define.
The resolved_name argument must refer to a buffer capable of storing at least PATH_MAX characters.

Which basically means, it can never work on a large path, and no clever OS can implement around it, unless it actually checks how much RAM is allocated on that pointer using an OS specific method - if available.

For these reasons, I've decided to implement getcwd() and realpath() myself. We'll discuss the exact specifics of realpath() next time, for now however, we will focus on how one can make their own getcwd().

The idea is to walk up the tree from the working directory, till we reach the root, along the way noting which path component we just went across.
Every modern OS has a stat() function which can take a path component and return information about it, such as when it was created, which device it is located on, and the like. All these OSs except for Windows return the fields st_dev and st_ino which together can uniquely identify any file or directory. If those two fields match the data retrieved in some other way on the same system, you can be sure they're the same file/directory.
To start, we'd determine the unique ID for . and /, once we have those, we can construct our loop. At each step, when the current doesn't equal the root, we can change directory to .., then scan the directory (using opendir()+readdir()+closedir()) for a component with the same ID. Once a matching ID is found, we can denote that as the correct name for the current level, and move up one.

Code demonstrating this in C++ is as follows:


//Headers needed for the calls below
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <dirent.h>
#include <cstring>
#include <string>
#include <vector>
#include <utility>

bool getcwd(std::string& path)
{
    typedef std::pair<dev_t, ino_t> file_id;

    bool success = false;
    int start_fd = open(".", O_RDONLY); //Keep track of start directory, so can jump back to it later
    if (start_fd != -1)
    {
        struct stat sb;
        if (!fstat(start_fd, &sb))
        {
            file_id current_id(sb.st_dev, sb.st_ino);
            if (!stat("/", &sb)) //Get info for root directory, so we can determine when we hit it
            {
                std::vector<std::string> path_components;
                file_id root_id(sb.st_dev, sb.st_ino);

                while (current_id != root_id) //If they're equal, we've obtained enough info to build the path
                {
                    bool pushed = false;

                    if (!chdir("..")) //Keep recursing towards root each iteration
                    {
                        DIR *dir = opendir(".");
                        if (dir)
                        {
                            dirent *entry;
                            while ((entry = readdir(dir))) //We loop through each entry trying to find where we came from
                            {
                                if ((strcmp(entry->d_name, ".") && strcmp(entry->d_name, "..") && !lstat(entry->d_name, &sb)))
                                {
                                    file_id child_id(sb.st_dev, sb.st_ino);
                                    if (child_id == current_id) //We found where we came from, add its name to the list
                                    {
                                        path_components.push_back(entry->d_name);
                                        pushed = true;
                                        break;
                                    }
                                }
                            }
                            closedir(dir);

                            if (pushed && !stat(".", &sb)) //If we have a reason to continue, we update the current dir id
                            {
                                current_id = file_id(sb.st_dev, sb.st_ino);
                            }
                        }//Else, Uh oh, can't read information at this level
                    }
                    if (!pushed) { break; } //If we didn't obtain any info this pass, no reason to continue
                }

                if (current_id == root_id) //Unless they're equal, we failed above
                {
                    //Build the path, will always end with a slash
                    path = "/";
                    for (std::vector<std::string>::reverse_iterator i = path_components.rbegin(); i != path_components.rend(); ++i)
                    {
                        path += *i+"/";
                    }
                    success = true;
                }
                fchdir(start_fd);
            }
        }
        close(start_fd);
    }

    return(success);
}


Before we accept that as the de facto method to use in your application, let us discuss the flaws.

As mentioned above, it doesn't work on Windows, but a simple #ifdef for Windows can just make it a wrapper around the built in getcwd() with a local buffer of size PATH_MAX, which is fine for Windows, and pretty much no other OS.
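
Something along these lines should do it; a sketch assuming the Microsoft CRT, where _getcwd() and _MAX_PATH come from direct.h and stdlib.h:

#ifdef _WIN32
#include <direct.h>
#include <stdlib.h>

bool getcwd(std::string& path)
{
    char buffer[_MAX_PATH]; //260 on Windows, which really is the limit there
    if (!_getcwd(buffer, sizeof(buffer))) { return(false); }
    path = buffer;
    return(true);
}
#endif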

This function uses the name getcwd() which can conflict with the built in C based one which is a problem for certain compilers. The fix is to rename it, or put it in its own namespace.

Next, the built in getcwd() implementations I checked only have a trailing slash on the root directory. I personally like having the slash appended, since I'm usually concatenating a filename onto it, but note that if you're not using it for concatenation, but to pass to functions like access(), stat(), opendir(), chdir(), and the like, an OS may not like doing the call with a trailing slash. I've only noticed that being an issue with DJGPP and a few functions. So if it matters to you, the loop near the end of the function can easily be modified to not have the trailing slash, except in the case that the root directory is the entire path.

This function also changes the directory in the process, so it's not thread safe. But then again, many built in implementations aren't thread safe either. If you use threads, calculate all the paths you need prior to creating the threads. Which is probably a good idea, and keep using path names based off of your absolute directories in your program, instead of changing directories during the main execution elsewhere in the program. Otherwise, you'll have to use a mutex around the call, which is also a valid option.
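
For completeness, a sketch of the mutex option with POSIX threads, which of course only helps if every caller goes through it:

#include <pthread.h>

static pthread_mutex_t cwd_mutex = PTHREAD_MUTEX_INITIALIZER;

bool getcwd_locked(std::string& path)
{
    pthread_mutex_lock(&cwd_mutex); //Only one thread at a time gets to play with the working directory
    bool success = getcwd(path); //The chdir()-based function from above
    pthread_mutex_unlock(&cwd_mutex);
    return(success);
}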

There could also be the issue that some level of the path isn't readable. Which can happen on UNIX, where to enter a directory, one only needs execute permission, and not read permission. I'm not sure what one can do in that case, except maybe fall back on the built in one hoping it does some magical Kernel call to get around it. If anyone has any advice on this one, please post about it in the comments.

Lastly, this function is written in C++, which is annoying for C users. The std::vector can be replaced with a linked list keeping track of the components, and at the end, allocate the buffer size needed, and return the allocated buffer. This requires the user to free the buffer on the outside, but there really isn't any other safe way of doing this.
Alternatively, instead of a linked list, a buffer which is constantly reallocated can be used while building the path, constantly memmove()'ing the built components over to the higher part of the buffer.

During the course of the rest of the program, all path manipulation should be using safe allocation managing strings such as std::string, or should be based off of the above described auto allocating getcwd() and similar functions, and constantly handling the memory management, growing as needed. Be careful when you need to get any path information from elsewhere, as you can never be sure how large it will be.

I hope developers realize that when not on Windows, using the incorrect define PATH_MAX is just wrong, and fix their applications. Next time, we'll discuss how one can implement their own realpath().