
Sunday, April 27, 2014


LibreSSL: The good and the bad



OpenSSL & LibreSSL

OpenBSD recently forked the popular SSL/TLS library OpenSSL into LibreSSL. Most of the reaction to this that I've seen tends to be pretty angry. People don't like the idea of a project being forked, they'd rather people work together, and have the OpenBSD team instead join OpenSSL.

Now, for those of you that don't know it, OpenSSL is at the same time the best and most popular SSL/TLS library available, and utter junk. It's the best because it covers a wide array of the capabilities that exist across the many standards that make up SSL/TLS, and has seen years of development to iron out a multitude of issues and attacks levied against the specifications. The developers also seem to know a lot more about programming than the developers behind some of the competing SSL/TLS libraries. There are a ton of gotchas when it comes to developing an SSL/TLS stack, way beyond other areas of development, and things must be done absolutely correctly.

Cryptographic development challenges

Aside from requiring meticulous programming to avoid typical programming issues and specific problems of the language in question, for SSL/TLS, one also needs to worry about:
  • Ensuring the library works correctly in a ton of cases which are difficult to test for, where unit tests cannot be exhaustive and regression testing is far from complete.
  • Working in a completely hostile environment where every single outside variable must be validated on its own, and as part of a larger whole.
  • Ensuring data is correct both as raw data and in its particular meaning.
  • Ensuring data-dependent operations run in actual constant time, despite differences in their values, to avoid timing attacks (see the sketch after this list).
  • Ensuring caught errors aren't handled immediately, but only once the cryptographic unit as a whole completes, in order to avoid timing attacks. The rest of the unit must function normally despite the error(s).
  • Taking random data from an unpredictable source, with random streams showing no detectable patterns or biases.
  • Protecting sensitive/secret data, without mathematical trickery or the computer environment somehow revealing it.
  • Ensuring data is wiped clean, without compiler optimizations or a virtual machine ignoring what they deem to be pointless operations.
  • Living with the inability to use some high-level languages, because they lack a way to force cleanup of primitive data types, and their error handling mechanisms may leave no way to wipe data, or may duplicate data behind your back.
  • Accepting that almost every single thing which is the right way of doing things elsewhere is completely wrong where cryptography is concerned.
  • Keeping things extremely well optimized, as otherwise the expense is too high to be viable in most scenarios.
The above list is hardly exhaustive, and is pretty generic. Various cryptographic algorithms also have a ton of specific issues with them that need to be avoided. These issues are not necessarily specified in the standards, but have been learned over time from those that made mistakes, or by various research.
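
To make the constant-time point above concrete, here's a minimal sketch of a data-independent comparison in C. The function name is mine, not from any particular library; real libraries provide equivalents such as OpenSSL's CRYPTO_memcmp().

#include <stddef.h>

/*
 * Compare two equal-length buffers without branching on the data.
 * Unlike memcmp(), this does not return early at the first mismatch,
 * so the running time does not leak how many leading bytes matched.
 * Returns 0 if equal, nonzero otherwise.
 */
int ct_compare(const unsigned char *a, const unsigned char *b, size_t n)
{
  unsigned char diff = 0;
  size_t i;

  for (i = 0; i < n; i++)
  {
    diff |= a[i] ^ b[i]; /* accumulate differences; no early exit */
  }
  return diff; /* 0 only if every byte matched */
}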

Some examples of things to avoid, which are part of the collective knowledge of SSL/TLS developers:
  • Particular constants with certain algorithms must be avoided.
  • Particular constants with certain algorithms must be used.
  • Certain algorithms don't work well together.
  • Strings must also be handled as raw data.
  • Certain magnitudes are dangerous.
  • Some platforms process certain algorithms or data in a way which must be worked around.
  • Data broken up in certain ways is dangerous.
Now, one can read the specifications and implement them using best software engineering practices, and best cryptographic engineering practices, but where does one learn all the things to avoid? I've read 3 different books which cover how to implement significant parts or all of SSL/TLS, and each one listed unique gotchas that weren't in the other two. They're not mentioned in the various standards, and many of them are far from obvious. Unless you've spent years developing SSL/TLS, and took note of every mistake ever made, you're probably doing it wrong. In fact, most alternative SSL/TLS implementations I've seen make these various mistakes that OpenSSL already learned from.

Who should design cryptographic libraries


In order to create a proper SSL/TLS implementation you need to be a master of:
  • Cryptographic algorithms.
  • Cryptographic practice.
  • Software engineering.
  • Software optimization.
  • The language(s) used.
  • Domain specific knowledge.
Rarely is a developer a true master of even one of these, let alone a master of both the algorithms and software engineering. Which means most SSL/TLS libraries will either fail at being at the forefront of cryptography, or fail at nice sane design, or both.

This is also why OpenSSL is utter junk. Some of its developers may have been decent at both the algorithms and software engineering, but not particularly good at either of them. Other developers may have been a master of one, but absolutely abysmal at the other. Due to its popularity, OpenSSL is also a dumping ground for every new cryptographic idea, good or bad, and is constantly pushed in every direction, being spread far too thin.

The OpenSSL API for most things is absolutely horrid. No self-respecting software engineer could have designed them. Objects which in reality share a sibling or parent-child relationship are implemented drastically differently, with dissimilar methods to work with them. Methods need to be used on objects in a certain unintuitive order for no apparent reason, or else things break.

The source code under the hood is terrible too. There's a huge lack of consistency. Operations are done in a very roundabout manner. Issues with one platform are solved by making things worse for all other platforms. Most things are not implemented particularly well. In fact, the only thing OpenSSL is particularly good at is being ubiquitous.

Alternatives to OpenSSL

Now, several of the alternatives are much more nicely engineered, with saner APIs. However, they all seem to fail basic quality in certain areas. For example, ASN.1 handling is generally written by people who don't know the ins and outs of data structures, algorithms, and best practice. If you want good ASN.1, you'll need a database design expert to create it for you, not the engineers who are better at cryptography or simple API design. It should also be repeated that nice engineering doesn't equate to security; there's just so much collective messy knowledge which needs to be added.

So now enter the OpenBSD developers, who have a track record for meticulous programming and generally know what they're doing. They're looking to vastly clean up OpenSSL. I've been waiting for something like this for years. They're taking the best SSL/TLS library and bringing it up to decent engineering standards.

Fixing OpenSSL itself within the OpenSSL team and development structure is really not an option. Since it's a dumping ground, it lacks the quality control that's needed. Since its main aim is to be ubiquitous, the ancient platforms will inherently drag down the modern ones.

LibreSSL progress

So far, the OpenBSD fork of OpenSSL has deleted tons of code. This is crucial: the more code there is, the more opportunity for bugs. Newbie programmers don't fully understand this point; they think the more verbose some code is, the better. However, more code means more room for mistakes, and it also increases the code size. The more code in a function, the harder it is to fully comprehend all of it. We humans have limits to how much data we can juggle in our heads. To ensure things can be fully conceptualized and analyzed, code has to be as short and as modular as possible. Using library routines for similar techniques, and removing the similar yet slightly different copies of code littered throughout, as OpenBSD is doing, will ensure much higher levels of quality.

OpenBSD is also working to ensure LibreSSL does not contain the year 2038 problem, extending compatibility far into the future. Some of the arbitrary method-ordering requirements have been removed, making development with LibreSSL less error prone. In a very short time, LibreSSL has become much leaner and more correct than OpenSSL ever was.

LibreSSL pitfalls


However, with the good there is also bad. LibreSSL aims to be compatible with existing software using OpenSSL, which means the brain-damaged APIs will continue to exist. LibreSSL is also currently being modified to use OpenBSD-specific functionality wherever OpenBSD's existing technology is more secure than what OpenSSL was doing. This means that, for the time being, LibreSSL will only work correctly on OpenBSD. Some of that functionality won't exist on other platforms, or will exist but be nowhere near as secure as OpenBSD's implementation, potentially making ports which seem straightforward actually less secure than OpenSSL currently is.

Once I saw what OpenBSD was doing, I predicted to myself that there would be those looking to port LibreSSL to other platforms and failing to account for all the security considerations. It seems I was right. In just a few short days since LibreSSL's announcement, I'm seeing a multitude of porting projects pop up for Linux, Windows, OS X, FreeBSD, and more, all by developers who may know some software engineering, but don't know the first thing about proper cryptographic implementations, or understand the specific advantages of the OpenBSD implementations.

These porting implementations are failing many of the generic things listed above:
  • Memory wiping is being optimized out (see the sketch below).
  • Random values are not being seeded with proper entropy.
  • Random values contain noticeable patterns or bias.
  • Bugs fixed in OpenSSL by switching from generic memory functions to OpenBSD's are being reintroduced when ports switch back to the generic functions, or to naive reimplementations of the OpenBSD ones.
  • Constant-time functions aren't constant on the platforms being ported to.
Developers not understanding what they're doing is why LibreSSL was created in the first place. Unfortunately these porters are not security experts, and are creating or nabbing naive implementations from various sources. They're sometimes also grabbing source from OpenBSD itself, or from other security focused projects, without realizing the source in question is compiled with certain parameters or with special compilers to gain its security; without the same build setup, the implementation is hardly secure.
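
To illustrate the memory wiping point from the list above: a plain memset() before free() is a dead store which the compiler may legally delete. One common workaround, and this is a sketch of the general trick rather than any project's actual code, is to call memset() through a volatile function pointer, which is roughly the service OpenBSD's explicit_bzero() exists to provide:

#include <string.h>

/*
 * The compiler cannot prove that a call through a volatile function
 * pointer is side-effect free, so the wipe survives dead-store
 * elimination. This is a common trick, not a guarantee; prefer a real
 * primitive like explicit_bzero() where the platform provides one.
 */
static void *(*const volatile memset_v)(void *, int, size_t) = memset;

void wipe(void *buf, size_t len)
{
  memset_v(buf, 0, len);
}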

As with OpenSSH, in the future, the OpenBSD team will probably be providing a portable LibreSSL. But until then, avoid cheap knockoffs.

LibreSSL future

Will LibreSSL be successful? From what I've seen, it already is. You just may not have it available for your platform at the moment. The API is still crud, and as much as the OpenBSD team is cleaning things up, their code could usually go a bit further in terms of brevity and clarity. However, the library will undoubtedly be superior to OpenSSL in use, and will serve as a much cleaner starting point for documenting how to create other implementations. Let's just hope the OpenBSD team is, or will learn to be, as knowledgeable regarding the SSL/TLS algorithms and gotchas as the OpenSSL team was.

Now, before you start clamoring to create a high-level implementation of SSL/TLS, or complaining that C is bad and it should be written in Java, consider that it's not even possible to program a secure SSL/TLS library in Java! (Source: Cryptography Engineering, page 122.)

Monday, March 26, 2007


Methods for safe string handling



Every now and then you hear about how a buffer overflow was discovered in some program. Immediately, everyone jumps on the story with their own take on the matter. Fans of a different programming language will say: "of course this wouldn't have happened if you'd have used my programming language". Secure library advocates will say: "you should have used that library instead". While experts of that language will say: "The guy was an idiot, he coded that all wrong".

I'd like to look at basic "C string" handling in C. We're talking about functions like strlen(), strcpy(), strcat(), and strcmp(). These functions report the length of, copy, concatenate, and compare C strings, respectively. A C string is an array of characters terminated by a null character which signifies the end. strlen() finds the length by seeing how far from the beginning the null is. strcat() finds the null in the first C string, and then copies characters from the second string to that location until it finds the second string's null. So on and so forth with all the C string functions.
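
For illustration, here are simplified sketches of how strlen() and strcat() work internally (my own versions, not the optimized library implementations):

#include <stddef.h>

/* Walk the array until the terminating null; its offset is the length. */
size_t my_strlen(const char *s)
{
  const char *p = s;
  while (*p)
    p++;
  return (size_t)(p - s);
}

/* Find the null at the end of dest, then copy src (including its null)
 * starting there. Note there is no check that dest has room: that is
 * exactly where the buffer overflows come from. */
char *my_strcat(char *dest, const char *src)
{
  char *p = dest;
  while (*p)
    p++;
  while ((*p++ = *src++))
    ;
  return dest;
}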

Now, these functions are seen by some as inherently broken, since they can read/write data right off the end of the buffer, and the terminating nulls can sometimes vanish if one isn't careful. You see these kinds of issues with C++ programs too. Usually when a Java programmer hears about this, they tell you to use Java, since Java has a built in string class which can't get screwed up by any of these simple operations. Learned C++ programmers know that C++ also has a built in string class, just as good as, if not better than, Java's. Knowledgeable C++ programmers will use C++'s strings to avoid all these issues, and get really nice features, such as sane string comparison using ==.

Switching from C++ to Java because people know that Java has a string class is kind of ridiculous. Yet I keep hearing from all kinds of people how programmers should switch to Java because of it. One using C++ should generally use the C++ class, unless they have a good reason otherwise. It's a shame most C++ programmers use C strings instead of C++ ones, probably because they don't know about them. However, none of this helps when programming in pure C.

For pure C, you can turn to one of the libraries for handling strings, such as SafeStr. But most people won't choose to go this route. Due to this, care has to be taken.

Now I won't kid you: if there's a buffer overflow in a C program, it is the programmer's fault. If s/he was more careful, it wouldn't have happened. However, some areas of code are large, have many code paths, and are downright confusing. In those cases, it's easy to screw up. To help prevent screwing up, there exist some more C functions for handling strings properly, and some C libraries also provide extra non standard functions.

In answer to just overflowing a buffer when copying or concatenating, "n" versions are provided: strncpy() and strncat(). These take a third parameter telling them how much data they're dealing with. strncpy()'s n is how big the buffer is, and it'll only copy up to that many characters. strncat()'s n is how many characters may be stuck onto the end of the first string. In the case of strncpy(), if no null is found in the copy process, the result is not null terminated, leaving us with the other problem. strncat(), on the other hand, will always null terminate, because it writes up to n+1 bytes whenever the second string's end isn't reached. Meaning you have to tell strncat() the number of remaining bytes in the buffer minus 1. This leads to confusion, because the two n's mean different things, and because strncpy() introduces another problem.
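
A sketch of careful usage of both, to make the differing meanings of n concrete (the input names here are hypothetical):

#include <string.h>

void demo(const char *input, const char *more_input)
{
  char buf[16];

  /* strncpy()'s n is the full buffer size. If input is 16 characters
   * or longer, no null is written, so we must terminate manually. */
  strncpy(buf, input, sizeof(buf));
  buf[sizeof(buf) - 1] = 0;

  /* strncat()'s n is how many characters may still be appended, NOT
   * the buffer size. It always writes a null after them, hence -1. */
  strncat(buf, more_input, sizeof(buf) - strlen(buf) - 1);
}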

There's also strncmp(), for specifying the maximum number of characters to compare, which you can use if one of the strings isn't null terminated. Surprisingly, however, there is no strnlen() to check how many characters a string has without running off the end. Considering that strncpy() doesn't always null terminate, that sounds like a useful feature to have.

Taking this into account, in some C libraries you'll find strnlen(), which returns the length, or the value of n in the case where no null was found within the first n bytes. For those needing it, it's an easy function to implement yourself:
size_t strnlen(const char *s, size_t n)
{
  /* memchr() stops after n bytes, so it never runs off the end of s. */
  const char *p = (const char *)memchr(s, 0, n);
  return(p ? p-s : n);
}


It would be intelligent to follow up every call to "strncpy(s, n)" with "s[n-1] = 0" to terminate it yourself, but this hardly helps the confusion. Also take into account that str[n][cpy/cat]() return the destination as their return value, so you'll sometimes see code like:
if (!strcmp(strncpy(buf, entered_text, sizeof(buf)), param))
{
  do_something();
}

However, this code is broken, as buf may not be null terminated. A correct version would perhaps be (min() here being an assumed helper macro, not standard C):
if (!strncmp(strncpy(buf, entered_text, sizeof(buf)), param, min(sizeof(param), sizeof(buf))))
{
  do_something();
}

Which is ugly at best.

To solve and work around these issues, OpenBSD has invented the strlcpy() and strlcat() functions, which have been implemented in all BSD derivatives (including Solaris and Mac OS X). Manpage here.

I found the standard descriptions confusing at best, though. Here's my take on it after some study:
size_t strlcpy(char *dest, const char *src, size_t size);

Description:
    The strlcpy() function copies the C string pointed to by src (including the null) to
the array pointed to by dest. However, not more than size bytes of src are copied. Meaning
at most size-1 characters will be copied. The copy will always be null terminated, unless
size was 0, in which case nothing is done to dest.

Return Value:
    Return is always the length of the string it tried to create, meaning strlen(src).
If the return value is <size, everything was copied.


With strlcpy(), you can run it once with a size of 0 if you want to find out how much you need to allocate. Although pointless, you'd be better off with strlen(). Now this won't help much if src isn't null terminated, but it should avoid issues you have with misusing the return value like in the case offered above, or when there wasn't enough room. If you always pass strlcpy() the sizeof() the buffer, or the value passed to malloc() as the case may be, you should be safe.
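
In practice the idiom looks something like this (a sketch, with a hypothetical buffer, assuming strlcpy() is available on your platform):

#include <string.h>

void copy_name(const char *src)
{
  char buf[32];

  /* strlcpy() returns strlen(src); a return >= sizeof(buf) means
   * the copy was truncated, and we can react accordingly. */
  if (strlcpy(buf, src, sizeof(buf)) >= sizeof(buf))
  {
    /* handle truncation: allocate a bigger buffer, error out, etc. */
  }
}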

If you read the manpage, you also see a usefulness to the return value. A problem with constantly using strcat() is that you have to keep iterating through the former strings, leading to a speed loss. With strlcpy(), you can do the following for concatenation:

if ((n1 = strlcpy(a, b, sizeof(a))) < sizeof(a))
{
  if ((n2 = strlcpy(a+n1, c, sizeof(a)-n1)) < sizeof(a)-n1)
  {
    if ((n3 = strlcpy(a+n1+n2, d, sizeof(a)-n1-n2)) < sizeof(a)-n1-n2)
    {
      etc...
    }
  }
}

A nice trick for mass concatenation, although, as the manpage points out, it's ridiculous and negates strlcat().
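
For comparison, here's roughly what the strlcat() version of the same chain looks like (reusing the a, b, c, d names from above); each call rescans a to find its end, which is the speed loss the trick above avoids:

/* Each test checks the return against the full buffer size, so the
 * chain stops cleanly as soon as anything would be truncated. */
if (strlcpy(a, b, sizeof(a)) < sizeof(a) &&
    strlcat(a, c, sizeof(a)) < sizeof(a) &&
    strlcat(a, d, sizeof(a)) < sizeof(a))
{
  /* everything fit */
}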

Moving onwards, the manpage listed above for strlcpy() is also for strlcat(), yet as above, I found it a bit confusing too. Here's my take:

size_t strlcat(char *dest, const char *src, size_t size);

Description:
    The strlcat() function appends the src string to the dest string, overwriting the ‘\0’
character at the end of dest, and then adds a terminating ‘\0’ character. However, not more
than size-strlen(dest) bytes of src are copied. Meaning a maximum of size-1 characters will
fill dest in the end. The copy will always be null terminated, unless size was less than or
equal to the length of dest, or dest is not null terminated within size bytes, in which case
nothing is done to dest.

Return Value:
    Return is the number of characters needed to hold the result when dest is initially
null terminated and its length is less than size. Otherwise the return is size+strlen(src).
If the return value is <size, everything was copied.


What's nice about strlcat() is that for the size param, you can pass it the sizeof() or the malloc() value like you do for strlcpy(). But beware the return value, the OpenBSD code is rightly commented as follows: "Returns strlen(src) + min(siz, strlen(initial dst))". Take a moment to comprehend that.
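
A worked example of that return value (the buffer contents are my own illustration):

#include <string.h>

void demo(void)
{
  char buf[8] = "abcde"; /* strlen(buf) == 5, sizeof(buf) == 8 */

  /* Room for sizeof(buf) - strlen(buf) - 1 == 2 more characters, so
   * buf becomes "abcdexy". The return value is
   * strlen(src) + min(siz, strlen(initial dst)) = 3 + 5 = 8, and
   * 8 >= sizeof(buf) tells us truncation occurred. */
  size_t r = strlcat(buf, "xyz", sizeof(buf));
  (void)r; /* r == 8 */
}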

If you're not using a BSD and you want these functions, the OpenBSD implementations of strlcpy() and strlcat() are freely available from the OpenBSD source tree. Be wary of some of the other implementations you find online. I looked at some of them, and they acted differently in some corner cases. One I looked at even crashed in one of the corner cases.

However, looking at that code, it is a bit messy. And reviewing our previous multiple concatenation case, which is also spoken about in the manpage, these functions come off as a bit weak. If one wants nice multiple concatenation without too much fuss, one would normally use snprintf() (C99) with a bunch of "%s%s%s" as the format.
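
For reference, a sketch of that snprintf() idiom (the names here are hypothetical):

#include <stdio.h>

void join3(char *a, size_t size, const char *b, const char *c, const char *d)
{
  /* snprintf() always null terminates (given size > 0) and returns
   * the length the full result would require, so a return >= size
   * signals truncation. */
  int n = snprintf(a, size, "%s%s%s", b, c, d);
  if (n < 0 || (size_t)n >= size)
  {
    /* encoding error or truncation */
  }
}

I myself, though, prefer a more elegant solution to all of this.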

I have therefore created the following logical extension of OpenBSD's l functions. I give you strlmrg():
#include <stdarg.h>
#include <string.h>

/* Merge any number of strings into dest, which is of the given size.
 * The argument list must end with a null pointer. Uses the strnlen()
 * implemented above. Returns the total length needed, in the style of
 * strlcpy()/strlcat(). */
size_t strlmrg(char *dest, size_t size, ...)
{
  char *s, *end = dest + (size-1);
  size_t needed = 0;

  va_list ap;
  va_start(ap, size);

  while ((s = va_arg(ap, char *)))
  {
    if (s == dest) /* Source is the current write position; skip over it. */
    {
      size_t n = strnlen(s, (end+1)-s);
      needed += n;
      dest += n;
    }
    else
    {
      needed += strlen(s);
      if (dest && (dest < end))
      {
        while (*s && (dest < end)) /* Copy while room remains. */
        {
          *dest++ = *s++;
        }
        *dest = 0; /* Always null terminate what was written. */
      }
    }
  }

  va_end(ap);
  return(needed);
}

Pass strlmrg() the destination buffer, its size (from sizeof() or the param to malloc()), and all the strings you want concatenated, followed by a null pointer.
Example 1:
printf("%zu; %s\n", strlmrg(line, sizeof(line), "I ", "Went ", "To ", "The ", "Park.", NULL), line);

It would print: "19; I Went To The Park."
Example 2:
n = strlmrg(buffer, sizeof(buffer), a, b, c, d, e, f, (void *)0);

Which would concatenate a through f inside buffer (given that they fit), and return the number of characters needed. Note that it returns how many characters would be copied given unlimited room, so you can use it to determine the size to allocate. See this example:
size_t n = strlmrg(0, 0, a, b, c, (void *)0);
char *p = malloc(n+1); //+1 for the null
strlmrg(p, n+1, a, b, c, (void *)0); //Again, +1 for the null

When strlmrg() returns less than size, everything was merged in. The result is always null terminated, except when dest is null, size is 0, or one of the source pointers matches the location it is currently copying to.
You should avoid passing a location from the destination buffer as one of the source pointers. If you do pass such an overlapping source pointer, and it isn't null terminated before size is reached, you will get size as the return value instead of the full length. Also, don't pass it any non null terminated source pointer, or forget the trailing null pointer.

Once we have strlmrg() implemented, it also paves the way for simple and straightforward implementations of strlcpy() and strlcat().
size_t strlcpy(char *dest, const char *src, size_t size)
{
  return(strlmrg(dest, size, src, (void *)0));
}

size_t strlcat(char *dest, const char *src, size_t size)
{
  return(strlmrg(dest, size, dest, src, (void *)0));
}

And unlike the official ones, these won't crash if dest or src is null. I tested these wrappers, and they seemed to match results with the official ones in every regular and edge case I tried.

I also tested strlmrg() in a variety of cases, and it seems to be very good and secure. If you find a bug, or have an improvement to offer, feel free to post about it.

Thoughts?