Saturday, December 13, 2014

Effective Modern C++ Quick Review and Sale

For those of you familiar with Scott Meyers amazing Effective C++ and Effective STL (both of which I highly recommend to C++ programmers), Effective Modern C++ is now out which covers best practices for new features introduced in C++ 2011, as well as some upcoming features of C++ 2014.

For one more day only, you can also get $10 off any of the above links to Amazon if you use the code BOOKDEAL25 when you checkout (if your order totals $40 or more).

For those of you unfamiliar with the series, I'll provide you with a quick review.

Effective C++ is an excellent book for C++ beginners and experts alike. It helps beginners understand what is and how to choose the best tool for the job when C++ offers multiple ways of doing things. It can also help even experts to better understand how the language works, and what is and isn't good practice. You'll learn how to correctly navigate this large language, and end up writing programs where memory management or other resource issues are a thing of the past. You'll learn how to design programming interfaces that look sane, and you won't come back to them later asking yourself what drugs you were on when you originally wrote the code.

Effective STL dives more into the standard library of C++, helping the developer understand which container or algorithm is the best tool for the job, and how to use them most effectively. If you've ever wondered to yourself if you should use an std::vector or std::list, you need to read this book. If you've ever wondered what functors are or why bother with features which at first glance may seem like a roundabout way of doing things, you'll unlock the true potential of these features and more, and learn how what may seem weird to the untrained eye is actually powerful and produces better code.

Effective Modern C++ introduces the reader to new features added to the language in recent years, and as the other books in the series, helps them to better understand them, get the best out of them, and use them correctly. You'll learn how to choose and make the best out of the new resource management introduced with std::shared_ptr, std::unique_ptr, and std::weak_ptr, avoid common pitfalls with the new threading library, new functional programming techniques exposed in modern C++, as well as how to achieve even more optimization with all the new features.

The series also includes More Effective C++, however I do not recommend this book. Unlike the previously mentioned books, the latest edition of More Effective C++ came out before C++ was even standardized, and has been out of date for well over a decade. It does explain certain features of the language, most of which are still present today, but those features are best covered by an introduction to the language itself like The C++ Programming Language (hardcover), Principals and Practice Using C++, and The C++ Standard Library. One thing More Effective C++ covers that isn't in these great C++ learning and reference books is how to use C and C++ together effectively. Although in my opinion that's better covered in the more C focused book Expert C Programming: Deep C Secrets.

For those of you thinking to yourself that C++ is a horrible language, especially since it has so many books printed for it on how to correctly use it and get the best out of it, you might want to realize that most well established languages have such books. For example, Effective Java, Effective JavaScript, and the groundbreaking JavaScript: The Good Parts. Proper programming design is an art, and one could even go as far to say that if a book of this nature does not exist for a language, it's because no one has yet figured out how to really use it effectively.

Thursday, October 2, 2014

What's really wrong with systemd?

Unless you've been living under a rock or on a park bench during the past two years, you've probably heard of systemd and the numerous issues and controversies involved. You probably also have also heard about some new alternatives.

Now instead of boring you with long lengthy arguments and debatable concepts and anecdotes, I'm going to boil the core problem with systemd down to two simple points:

  • The singular aim of systemd is to get other projects to depend on it.
  • The systemd developers have a track record for not knowing what they're doing, thereby creating very fragile and unwieldy problematic software.
Now if you're looking for proof of these two points, you're free to read through the aforementioned pages I linked to, as well as search online for more. I hope you've enjoyed my summary.

Monday, June 30, 2014

Memory management in C and auto allocating sprintf() - asprintf()

Memory Management

Memory management in C is viewed by some to be quite tricky. One needs to work with pointers that can point anywhere in memory, and if misused, cause a program to misbehave, or worse.

The basic functions to allocate and deallocate memory in C are malloc() and free() respectively. The malloc() function takes a size in bytes of how much to allocate, and returns a pointer to the allocated memory to be used. Upon failure, a null pointer, which can be thought of as a pointer pointing to 0 is returned. The free() function takes the pointer returned by malloc(), and deallocates the memory, or frees up the memory it was once pointing to.

To work with malloc() in simple situations, typically, code along the following lines is used:
void *p = malloc(size);
if (p)
{
  ... work with p ...
  free(p);
}
else
{
  ... handle error scenario ...
}

Unfortunately many experienced programmers forget to handle the failure scenario. I've even heard some say they purposely don't, as they have no clue how to proceed, and just letting the program crash is good enough for them. If you meet someone who makes that argument, revoke their programming license. We don't need such near sighted idiots writing libraries.

In any case, even the above can lead to some misuse. After this block of code runs, what is p now pointing to?

After the above code runs, in the case that malloc() succeeded, p is now pointing to memory in middle of nowhere, and can't be used. This is known as a dangling pointer. Dangling pointers can be dangerous, as an if clause as above will think the pointer is valid, and may do something with it, or lead to the infamous use after free bug. This becomes more likely to occur as the situation becomes more complicated and there are loops involved, and how malloc() and free() interact can take multiple paths.

Pointers involved with memory management should always be pointing at 0 or at allocated memory. Anything else is just asking for trouble. Therefore, I deem any direct use of free() dangerous, as it doesn't set the pointer to 0.

So if free() is considered harmful, what should one use?

In C++, I recommend the following:

static inline void insane_free(void *&p)
{
  free(p);
  p = 0;
}

This insane_free() is now a drop in replacement for free(), and can be used instead. (Since C++ programs normally use new and delete/delete[] instead, I leave it as an exercise to the reader how to work with those.)

However, C doesn't support direct references. One can pass a pointer by a pointer to accomplish similar results, but that becomes clunky and is not a drop in replacement. So in C, I recommend the following:
#define insane_free(p) { free(p); p = 0; }
It makes use of the preprocessor, so some may consider it messy, but it can be used wherever free() currently is. One could also name the macro free in order to automatically replace existing code, but it's best not to program that way, as you begin to rely on these semantics. This in turn means someone copying your code may think a call to free() is the normal free() and not realize something special is occurring when they copy it elsewhere without the macro.

Correct usage in simple cases is then:
void *p = malloc(size);
if (p)
{
  ... work with p ...
  insane_free(p);
}
else
{
  ... handle error scenario ...
}
If you think using a wrapper macro or function is overkill, and just always manually assigning the pointer to 0 after freeing is the way to go, consider that it's unwieldy to constantly do so, and you may forget to. If the above technique was always used, all use after free bugs would never have occurred in the first place.

Something else to be aware of is that there's nothing wrong with calling free(0). However, calling free() upon a pointer which is not null and not pointing to allocated memory is forbidden and will crash your program. So stick to the advice here, and you may just find memory management became significantly easier.

If all this talk of pointers is beyond you, consider acquiring Understanding and Using C Pointers.

sprintf() and asprintf()

If you do a lot of C programming, at some point, you probably have used the sprintf() function, or its safer counterpart snprintf().

These two functions sprintf() and snprintf() act like printf(), but instead of printing to the standard output, they print to a fixed-length string buffer. Now a fixed-length string buffer is great and all, but what if you wanted something which automatically allocated the amount it needed?

Enter asprintf(), a function which acts like sprintf(), but is auto-allocating, and needs no buffer supplied. This function was invented ages ago by GLIBC, shortly thereafter copied to the modern BSDs, and found its way further still into all sorts of libraries, although is not yet ubiquitous.

Let's compare the prototypes of the two:
int sprintf(char *buffer, const char *format, ...); 
int asprintf(char **ret, const char *format, ...);
The sanest approach would have been for a function like asprintf() to have the prototype of:
char *asprintf(const char *format, ...);
But its creators wanted to make it act like sprintf(), and its design can also be potentially more useful.

Instead of passing asprintf() a buffer, a pointer to a variable of type char * needs to be passed, like so:
char *buffer;
asprintf(&buffer, ...whatever...);
Now how asprintf() actually works is no big secret. The C99 standard specified that snprintf() upon failure should return the amount of characters that would be needed to contain its output. Which means that conceptually something along the following lines would be all that asprintf() needs to do:
char *buffer = malloc(snprintf(0, 0, format, data...)+1);
sprintf(buffer, format, data...);
Of course though, the above taken verbatim would be incorrect, because it mistakenly assumes that nothing can go wrong, such as the malloc() or snprintf() failing.

First let's better understand what the *printf() functions return. Upon success, they return the amount of characters written to the string (which does not include the trailing null byte). Or in other words, the return value is equivalent to calling strlen() on the data being output, which can save you needing to use a strlen() call with sprintf() or similar functions for certain scenarios. Upon failure, for whatever reason, the return is -1. Of course there's the above mentioned exception to this with snprintf(), where the amount of characters needed to contain the output would be returned instead. If during the output, the size overflows (exceeds INT_MAX), many implementations will return a large negative value (failure with snprintf(), or success with all the functions).

Like the other functions, asprintf() also returns an integer of the nature described above. Which means working with asprintf() should go something like this:
char *buffer;
if (asprintf(&buffer, ...whatever...) != -1)
{
  do_whatever(buffer);
  insane_free(buffer);
}
However, unlike the other functions, asprintf() has a second return value, its first argument, or what the function sees as *ret. To comply with the memory management discussion above, this should also be set to 0 upon failure. Unfortunately, many popular implementations, including those in GLIBC and MinGW fail to do so.

Since I develop with the above systems, and I'm using asprintf() in loops with multiple paths, it becomes unwieldy to need to pass around the buffer and the returned int, so I'd of course want saner semantics which don't leave dangling pointers in my program.

In order to correct such mistakes, I would need to take code from elsewhere, or whip up my own function. Now I find developing functions such as these to be relatively simple, but even so, I always go to check other implementations to see if there's any important points I'm missing before I go implement one. Maybe, I'll even find one which meets my standards with a decent license which I can just copy verbatim.

In researching this, to my shock and horror, I came across implementations which properly ensure that *ret is set to 0 upon failure, but the returned int may be undefined in certain cases. That some of the most popular implementations get one half wrong, and that some of the less popular get the other half wrong is just downright terrifying. This means that there isn't any necessarily portable way to check for failure with the different implementations. I certainly was not expecting that, but with the amount of horrible code out there, I guess I really shouldn't be surprised anymore.

Also in the course of research, besides finding many implementations taking a non-portable approach, many have problems in all sorts of edge cases. Such as mishandling overflow, or not realizing that two successive calls to a *printf() function with the same data may not necessarily yield the same results. Some try to calculate the length needed with some extra logic and only call sprintf() once, but this logic may not be portable, or always needs updating as new types are added to the format string as standards progress, or the C library decided to offer new features. Some of the mistakes I found seem to be due to expecting a certain implementation of underlying functions, and then later the underlying functions were altered, or the code was copied verbatim to another library, without noting the underlying functions acted differently.

So, once again, I'm finding myself needing to supply the world with more usable implementations.

Let's dive into how to implement asprinf().

Every one of these kind of functions actually has two variants, the regular which takes an unlimited amount of arguments, and the v variants which take a va_list (defined in stdarg.h) after the format argument instead. These va_lists are what ... gets turned into after use, and in fact, every non-v *printf() function is actually wrapped to a counterpart v*printf() function. This makes implementing asprintf() itself quite straight forward:



To fix the aforementioned return problems, one could also easily throw in here a check upon the correct return variable used in the underlying vasprintf() implementation and use it to set the other. However, that's not a very portable fix, and the underlying implementation of vasprintf() can have other issues as described above.

A straight forward implementation of vasprintf() would be:



As long as you have a proper C99 implementation of stdarg.h and vsnprintf(), you should be good to go. However, some systems may have vsnprintf() but not va_copy(). The va_copy() macro is needed because a va_list may not just be a simple object, but have handles to elsewhere, and a deep copy is needed. Since vsnprintf() being passed the original va_list may modify its contents, a copy is needed because the function is called twice.

Microsoft Visual C++ (MSVC, or Microsoft Vs. C++ as I like to think of it) up until the latest versions has utterly lacked va_copy(). This and several other  systems that lack it though usually have simple va_lists that can be shallow copied. To gain compatibility with them, simply employ:


#ifndef va_copy 
#define va_copy(dest, src) dest = src 
#endif

Be warned though that if your system lacks va_copy(), and a deep copy is required, using the above is a recipe for disaster.

Once we're dealing with systems where shallow copy works though, the following works just as well, as vsnprintf() will be copying the va_list it receives and won't be modifying other data.



Before we go further, there's two points I'd like to make.
  • Some implementations of vsnprintf() are wrong, and always return -1 upon failure, not the size that would've been needed. On such systems, another approach will need to be taken to calculate the length required, and the implementations here of vasprintf() (and by extension asprintf()) will just always return -1 and *ret (or *strp) will be 0.
  • The code if ((r < 0) || (r > size)) could instead be if (r != size), more on that later.
Now on Windows, vsnprintf() always returns -1 upon failure, in violation of the C99 standard. However, in violation of Microsoft's own specifications, and undocumented, I found that vsnprintf() with the first two parameters being passed 0 as in the above code actually works correctly. It's only when you're passing data there that the Windows implementation violates the spec. But in any case, relying on undocumented behavior is never a good idea.

On certain versions of MinGW, if __USE_MINGW_ANSI_STDIO is defined before stdio.h is included, it'll cause the broken Windows *printf() functions to be replaced with C99 standards compliant ones.

In any case though, Windows actually provides a different function to retrieve the needed length, _vscprintf(). A simple implementation using it would be:



This however makes the mistake of assuming that vsnprintf() is implemented incorrectly as it currently is with MSVC. Meaning this will break if Microsoft ever fixes the function, or you're using MinGW with __USE_MINGW_ANSI_STDIO. So better to use:



Lastly, let me return to that second point from earlier. The vsnprintf() function call the second time may fail because the system ran out of memory to perform its activities once the call to malloc() succeeds, or something else happens on the system to cause it to fail. But also, in a multi-threaded program, the various arguments being passed could have their data change between the two calls.

Now if you're calling functions while another thread is modifying the same variables you're passing to said function, you're just asking for trouble. Personally, I think that all the final check should do is ensure that r is equal to size, and if not, something went wrong, free the data (with insane_free() of course), and set r to -1. However, any value between 0 and size (inclusive), even when not equal to size means the call succeeded for some definition of success, which the above implementations all allow for (except where not possible Microsoft). Based on this idea, several of the implementations I looked at constantly loop while vsnprintf() continues to indicate that the buffer needs to be larger. Therefore, I'll provide such an implementation as well:



Like the first implementation, if all you lacked was va_copy(), and shallow copy is fine, it's easy to get this to work on your platform as described above. But if vsnprintf() isn't implemented correctly (hello MSVC), this will always fail.

All the code here including the appropriate headers, along with notes and usage examples are all packed up and ready to use on my asprintf() implementation website. Between everything offered, you should hopefully find something that works well for you, and is better than what your platform provides, or alternative junk out there.

As always, I'm only human, so if you found any bugs, please inform me.

Sunday, June 22, 2014

Avoid incorrect ChaCha20 implementations

ChaCha20 is a stream cipher which is gaining a lot of popularity of late. Practically every library today which provides ciphers seems to have it as an addition in their latest releases.

In cryptography, there are two kinds of ciphers, block ciphers and stream ciphers. Block ciphers are where the underlying algorithm works with data with a certain fixed chunk size (or block). Popular blocks sizes are 16 and 64 bytes. Stream ciphers are effectively block ciphers where the chunk size is a single byte.

Classical stream ciphers, such as RC4, can work with data of arbitrary size, although every single byte is dependent on every previous byte. Which means encryption/decryption cannot begin in the middle of some data, and maintain compatibility where some other starting point was used. Block ciphers generally can have their blocks encrypted and decrypted arbitrarily, with none dependent upon any other, however, they cannot work with data of arbitrary size.

In order to allow block ciphers to work with data of arbitrary size, one needs to pad the data to be encrypted to a multiple of the block size. However, a clever alternative is counter mode.

Different modes for working with block ciphers exist. Some try to improve security by making each block depend on every other, some utilize various interesting techniques for other properties.  Counter mode does not encrypt the desired data (the plaintext) directly, rather, an ever incrementing counter is encrypted. The result of this encryption is then xored with the desired data.

Counter mode effectively turns a block cipher into a stream cipher, as the plaintext is never actually passed to the block cipher. Rather, a counter which is a multiple of the block size is used. One can always xor bytes with an arbitrary size, and since that is the only step in counter mode against the plain text, it is effectively a stream cipher. Since the underlying cipher can be a block cipher with no dependency between blocks, this kind of stream cipher also allows one to jump ahead to any particular multiple of the block size in the data, and begin encryption/decryption from there.

Now while ChaCha20 is advertised as a stream cipher, it's actually designed as a block cipher in counter mode. The internal design mostly mirrors that of typical counter mode design, except that the counter components are directly fused with a large but simple block cipher. Since it's really a block cipher, it has an internal block size, and also allows one to jump ahead to some multiple of it.

Since ChaCha20 is considered to have a great level of security, and all these other wonderful properties, it's starting to see a lot of use. However, practically every implementation I'm seeing is either utterly broken, or has some ridiculous API.

Common ChaCha20 implementation mistakes:
  • Implemented as a typical block cipher, not allowing usage with arbitrary amounts of bytes, or worse, the API allows for it, but produces incorrect results.
  • Implemented as a typical stream cipher with no way to jump ahead.
  • Failing on Big-Endian systems.

The first mistake I listed is the most common. If some software is only using ChaCha20 internally, and always using it in a multiple of its block size (or it's all the crummy API offers), then things are fine. But if it's a library which is inviting others to use it, and it can be used incorrectly, expect disaster to ensue.

The reference implementation of ChaCha20 was designed that an arbitrary amount of data can be encrypted, as long as all but the last usage of the API was a multiple of the block size. This was also mentioned in its documentation. However, practically every other implementation out there copies this design in some way, but makes no note of it. Worse yet, some libraries are offering ChaCha20 with this implementation flaw alongside other stream ciphers with an identical API whereas those can be used arbitrarily throughout.

Essentially, this means if you're using ChaCha20 right now in a continuous fashion with chunks of various sizes, your data is being encrypted incorrectly, and won't be interoperable with other implementations. These broken implementations are able to output exactly one chunk correctly which is not a multiple of the block size, which destroys their internal buffers, and screws up every output thereafter.

I noticed a similar situation with hash algorithm implementations several years back. However, most hash implementations are fine. Yet with ChaCha20, practically every implementation I looked at recently was broken.

Since this situation cannot stand, especially with ChaCha20 gaining speed, I am providing a simple implementation without these flaws. This implementation is designed to be correct, portable, and simple. (Those wanting an optimized version of this should consider paying for more optimized routines)

Usage of the C99 API I designed is as follows:

Custom type: chacha20_ctx
This type is used as a context for a state of encryption.

To initialize:
void chacha20_setup(chacha20_ctx *ctx, const uint8_t *key, size_t length, uint8_t nonce[8]);

The encryption key is passed via a pointer to a byte array and its length in bytes. The key can be 16 or 32 bytes. The nonce is always 8 bytes.

Once initialized, to encrypt data:
void chacha20_encrypt(chacha20_ctx *ctx, const uint8_t *in, uint8_t *out, size_t length);

You can pass an arbitrary amount of data to be encrypted, just ensure the output buffer is always at least as large as the input buffer. This function can be called repeatedly, and it doesn't matter what was done with it previously.

To decrypt data, initialize, and then call the decryption function:
void chacha20_decrypt(chacha20_ctx *ctx, const uint8_t *in, uint8_t *out, size_t length);

For encryption or decryption, if you want to jump ahead to a particular block:
void chacha20_counter_set(chacha20_ctx *ctx, uint64_t counter);

Counter is essentially the number of the next block to encrypt/decrypt. ChaCha20's internal block size is 64 bytes, so to calculate how many bytes are skipped by a particular counter value, multiply it by 64.

In addition to just providing a library, I gathered the test vectors that were submitted for various RFCs, and included a series of unit tests to test it for correctness.

For fun, since I'm also playing around a bit with LibreSSL these days, I wrapped its API up in the API I described above. The wrapper is included in my package with the rest of the code, however it is currently not designed for serious usage outside of the included test cases.

Since I already whipped up some unit tests that anyone can use, I'll leave it as an exercise to the reader to determine which libraries are and aren't implemented correctly.

I tried to ensure my library is bug free, but I am only human. If you find a mistake, please report it.

Wednesday, May 21, 2014

LibreSSL porting update

I've recently covered some issues with LibreSSL and some common porting mistakes.

Since these articles came out, I've noticed two broken ports I saw prior seem to have vanished. One port has seen significant improvement in response to these articles, although still has significant concerns. And worst of all, more ports are popping up.

The official team has since reiterated some of these concerns, and I also wrote two articles regarding some concerns with random data.

Unfortunately, many of these ports are continuing to rely on arc4random() implementations on certain OSs or from certain portability libraries. These OSs or libraries may be copying some or all of the code from OpenBSD, but they are not copying the implementation.

To demonstrate this, let's see how different implementations of arc4random() work across fork() using the following test code:

/*
Blogger is refusing to allow me to list the headers without trying to escape the signs.
So they are: stdio.h, stdlib.h, stdint.h, unistd.h, sys/wait.h
And on Linux: bsd/stdlib.h
*/

int main()
{
  int children = 3;
  pid_t pid = getpid();

  printf("parent process %08x: %08x %08x\n", (uint32_t)pid, arc4random(), arc4random());
  fflush(stdout);

  while (children--)
  {
    pid_t pid = fork();
    if (pid > 0) //Parent
    {
      waitpid(pid, 0, 0);
    }
    else if (pid == 0) //Child
    {
      pid = getpid();
      printf(" child process %08x: %08x %08x\n", (uint32_t)pid, arc4random(), arc4random());
      fflush(stdout);
      _exit(0);
    }
    else //Error
    {
      perror(0);
      break;
    }
  }
  printf("parent process %08x: %08x %08x\n", (uint32_t)pid, arc4random(), arc4random());
  fflush(stdout);

  return(0);
}


OpenBSD (the reference implementation):
parent process 0000660d: beb04672 aa183dd0
 child process 00001a2a: e52e0b25 764966bb
 child process 00007eb7: 27619dd1 a7c0df81
 child process 000039f5: 33daf1f1 4524c6c6
parent process 0000660d: 1eb05b45 d3956c43
Linux with libbsd 0.6:
parent process 000031cb: 2bcaaa9a 01532d3f
 child process 000031cc: 3b43383f 4fbbb4d5
 child process 000031cd: 3b43383f 4fbbb4d5
 child process 000031ce: 3b43383f 4fbbb4d5
parent process 000031cb: 3b43383f 4fbbb4d5
 NetBSD 6.1.2:
parent process 0000021a: 4bc81424 958bf90f
 child process 0000021f: c0681a36 5a3f8bdb
 child process 00000022: c0681a36 5a3f8bdb
 child process 000001fc: c0681a36 5a3f8bdb
parent process 0000021a: c0681a36 5a3f8bdb
FreeBSD 9.2:
parent process 0000032e: 03d19ad2 543c5fa4
 child process 0000032f: 6e3a1214 57b74381
 child process 00000330: 6e3a1214 57b74381
 child process 00000331: 6e3a1214 57b74381
parent process 0000032e: 6e3a1214 57b74381
DragonFlyBSD 3.4.3:
parent process 0000030a: cb987922 8f94fb58
 child process 0000030b: 65047965 1ebdc52b
 child process 0000030c: 65047965 1ebdc52b
 child process 0000030d: 65047965 1ebdc52b
parent process 0000030a: 65047965 1ebdc52b

So in looking at this data, one can see that on OpenBSD the random data is utterly different between the parent and all the various children. However, in all the ports of the function, the parent and children all share the exact same state after the fork() call. This situation is fine for single-process programs, but is a disaster in multi-process ones.

Since LibreSSL is having its random needs all being supplied by arc4random*(), and it can be used by multi-process servers, there is a serious porting problem here.

I covered this problem without elaboration in my previous article. See there for some solutions. The OpenBSD team is saying randomness is the responsibility of the OS, and for many of the issues involved, they are quite right. However, remember your ports cannot necessarily rely on the random functions provided by your OS. Even if the OSs fix them in future versions, one still has to be mindful of porting to older ones, so ensure your port doesn't rely too much on the OS.

Tuesday, May 20, 2014

Dealing with randomness

Two weeks ago, I wrote an article regarding randomness on UNIX systems and libraries. In it, I dealt with some theoretical API issues, and real world issues of libraries being horribly misdesigned. Today I'd like to focus more on the current state of things, and further discuss real world problems.

Randomness

To start, let us understand what randomness means. Any perceived randomness on your part is your inability to track all the variables. Meaning that so called randomness to us mere mortals is something that we do not know nor can predict, at least not with the tools we have at our disposal.

The ideas behind randomness and determinism have long been battled over by philosophers and scientists, with the former debating whether it's possible for such things to exist, and the latter hunting for its existence. But thankfully, in terms of cryptography, even though randomness doesn't exist, we can make believe it does, and we can generate data that appears random to anyone from the outside. We can do so by using variables that are hard to track and influence, and extremely confusing algorithms.

Collecting unpredictable values is a challenge, and nothing is ever as unpredictable as one might think. Even if we were to look to quantum mechanics, which today is believed to be the closest thing to random that we know of, and measure some quantum behavior, the data might still be predicable, and therefore not random enough for cryptography needs. Cryptography Engineering pages 138-139 covers this in more detail, without even getting into possible advancements in the field. All the more so, being completely unpredictable is quite challenging using more conventional means.

Essentially, we're left with trying to do our best, without really being able to ensure we're doing things well. The less information we leak, and more unpredictable we behave, the better.

Expectations


We tend to make assumptions. We make assumptions that the people behind Linux know how to create a great operating system. We assumed the OpenSSL team knew how to create a secure library. We assume the OpenBSD developers know safe software engineering. However, assumptions may be wrong. Never assume, research and test it.

Like others, I tend to assume that different pieces of software were written correctly. A couple of years back, I was having trouble with a web server in conjunction with IPv6, which was built on top of a network server library. I assumed the library written by supposed experts with IPv6 knowledge knew what they were doing, and blamed the newness of IPv6, and assumed the issue was not with the library but with the network stacks in the OS.

For unrelated reasons I decided to improve my network development knowledge and purchased Unix Network Programming 3rd Edition, which is now over a decade old, yet includes instructions on properly working with IPv6. Turns out I was able to fix the bug in the networking library I was using by modifying a single line in it. After reading this book, I also realized why I was having some rare hard to reproduce bug in another application, and fixed the issue in two minutes.

The aforementioned book is considered the bible in terms of sockets programming. It explains common gotchas, and shows how to build robust networking software which works properly. We would assume the developers of a major networking library would have read it, but experience shows otherwise.

I hardly need to elaborate how this has applied elsewhere (hello OpenSSL).

False messiahs

Most software is written by people who aren't properly trained for their positions. They either get thrust into it, or dabble in something for fun, and it turns into something popular which people all over the globe use regularly. Suddenly popularity becomes the measurement for gauging expertise. Some kid with a spare weekend becomes the new recognized authority in some field with no real experience. People who end up in this position tend to start doing some more research to appear knowledgeable, and receive unsolicited advice, which to some extent propels them to be an expert in their field, at least at a superficial level. This can then further reinforce the idea that the previously written software was correct.

Situations like this unfortunately even apply to security libraries and software which needs to be secure. We can read a book, which covers some important design nuance, and we'll think to ourselves that if we laymen read the popular book, surely those who need to be aware of its contents did, and assume the software we depend on every day is written properly. I've read a few books and papers over the years on proper random library gotchas, design, and more, yet used to simply use OpenSSL's functions in my code, as I assumed the OpenSSL developers read these books too and already took care of things for me.

Recently, the OpenBSD team actually reviewed OpenSSL's random functions, and realized that no, the OpenSSL developers did not know what they were doing. Since the Debian key generation fiasco a couple of years ago revealed they were using uninitialized variables as part of randomness generation, I suspected they didn't know what they were doing, but never bothered looking any deeper. It's scary to realize you may in fact know how to handle things better than the so-called experts.

I wrote an article the other day regarding how to protect private keys. A friend of mine after reading it asked me incredulously: You mean Apache isn't doing this?!

We have to stop blindly believing in certain pieces of software or developers. We must educate ourselves properly, and then ensure everything we use lives up to the best practices we know about.

UNIX random interfaces

Linux invented two interfaces for dealing with random data which were then copied by the other UNIX-like operating systems. /dev/random which produces something closely related to what the OS believes is random data it collected, and /dev/urandom which produces an endless stream of supposedly cryptographically secure randomish data. The difference in the output between the two of them should not be discernible, so theoretically, using one in place of the other should not make a difference.

There's a ton of websites online which tell developers to use only /dev/urandom, since the whole point of a cryptographically-secure pseudo-random number generator is to appear random. So who needs /dev/random anyway, and finite amounts of randomishness is problematic, so everyone should just use something which is unlimited. Then the Linux manual page will be blamed as a source of fostering confusion for suggesting there's situations that actually require /dev/random.

Now those who have an ounce of critical thinking within themselves should be questioning, if there's never a point with /dev/random, why did Linux invent it in the first place? Why does it continue to provide it with the same semantics that it always had? In fact, the manual page is doing nothing more than paraphrasing and explaining the comments in the source code:
 * The two other interfaces are two character devices /dev/random and
 * /dev/urandom.  /dev/random is suitable for use when very high
 * quality randomness is desired (for example, for key generation or
 * one-time pads), as it will only return a maximum of the number of
 * bits of randomness (as estimated by the random number generator)
 * contained in the entropy pool.
 *
 * The /dev/urandom device does not have this limit, and will return
 * as many bytes as are requested.  As more and more random bytes are
 * requested without giving time for the entropy pool to recharge,
 * this will result in random numbers that are merely cryptographically
 * strong.  For many applications, however, this is acceptable. 
 
There's actually a number of problems with pseudo-random number generators. They leak information about their internal states as they output their data. Remember, data is being generated from an internal state, there's an input which generates the stream of output, it's not just magically created out of thin air. The data being output will also eventually cycle. Using the same instance of a pseudo-random number generator over and over is a bad idea, as its output will become predictable. Not only will its future output become predictable, anything it once generated will also be known. Meaning pseudo-random number generators lack what is known as forward secrecy.

Now if you believe the hype out there that for every situation, one should only use /dev/urandom, then you're implying you don't trust the developers of the Linux random interfaces to know what they're talking about. If they don't know what they're talking about, then clearly, they also don't know what they're doing. So why are you using any Linux supplied random interface, after all, Linux obviously handles randomness incorrectly!

FreeBSD actually makes the above argument, as they only supply the /dev/urandom interface (which is known as /dev/random on FreeBSD), which uses random algorithms created by leading cryptographers, and nothing remotely like what Linux is doing. Each of the BSDs in fact claim that their solution to randomness is superior to all the other solutions found among competing operating systems.

Linux on the other hand takes an approach where it doesn't trust the cryptographic algorithms out there. It has its own approach where it collects data, estimates the entropy of that data, and uses its own modified algorithms for producing the randomness that it does. Cryptographers are constantly writing papers about how non-standard Linux is in what it does, and numerous suggestions are made to improve it, in one case, even a significant patch.

Now I personally dislike Linux's approach to throw cryptographic practice out the window, but on the other hand, I like their approach in not putting too much faith into various algorithms. I like FreeBSD's approach to using well designed algorithms by cryptographers, but I dislike their inability to ensure the highest levels of security for when you really need it.

The cryptographers out there are actually divided on many of the issues. Some believe entropy estimation is utterly flawed, and cryptographically strong algorithms are all you need. Others believe that such algorithms are more easily attacked and prone to catastrophic results, and algorithms combined with pessimistic estimators and protective measures should be used.

In any case, while the various operating systems may be basing themselves on various algorithms, are they actually implementing them properly? Are they being mindful of all the issues discussed in the various books? I reviewed a couple of them, and I'm not so sure.

A strong point to consider for developers is that even though OS developers may improve their randomness in response to various papers those developers happen to stumble across or gets shoved in their faces, it doesn't necessarily mean you're free from worrying about the issues in your applications. It's common to see someone compiling and using some modern software package on say Red Hat Enterprise Linux 5, with its older version of Linux. I recently got a new router which allows one to SSH into it. Doing so, I saw it had some recentish version of BusyBox and some other tools on it, but was running on Linux 2.4!

Situations to be mindful of

There are many situations one must be mindful of when providing an interface to get random data. This list can be informative, but is not exhaustive.

Your system may be placed onto a router or similar device which generates private keys upon first use. These devices are manufactured in bulk, and are all identical. These devices also generally lack a hardware clock. In typical operating conditions, with no special activity occurring, these devices will be generating identical keys, there's nothing random about them.

Similar to above, a full system may have an automated install process deploying on many machines. Or an initial virtual machine image is being used multiple times. (Do any hosting companies make any guarantees about the uniqueness of a VM you purchase from them? Are preinstalled private keys identical for all customers?)

A system may be designed to use a seed file, a file which contains some random data, which is stored on disk to ensure a random state for next boot-up. The system may be rebooted at some point after the seed file is used, but before it is updated, causing an identical boot state.

There can be multiple users on a system, which can influence or retrieve random data. Other users may very well know the sequences generated before and after some data was generated for another user. That can then be used in turn to determine what random data was generated for the other user.

Hardware to generate random data may contain a backdoor, and should not be fully trusted.

Hardware to generate random data may break, or be influenced in some manner, causing the generated data to not actually be random.

Two machines next to each other may be measuring the same environmental conditions, leading one machine to know the random data being generated for the other.

A process may be inheriting its state from its parent, where both of them will be generating the same sequences, or multiple children will be generating the same sequences.

Wall-clock time can be adjusted, allowing the same time to occur multiple times, which in turn will lead to identical random states where wall-clock time is the only differentiating factor.

Possible Solutions

A proper solution for identical manufacturing or deployments would be to write a unique initial state to each one. However, this cannot be relied upon, as it requires compliance by the manufacturer or deployer, which can increase costs and complexity, and good tools to do so are not available.

Devices generally have components which contain serial numbers. These serial numbers should be mixed into initial states, minimally ensuring that identical devices do not have identical states. As an example, routers will have different MAC addresses. They may even have multiple MAC addresses for different sides of the router, or for wireless. Be aware however that it is possible for an outsider to collect all the MAC addresses, and thus reveal the initial state.

If a hardware clock is available, it should be mixed into the boot-up state to differentiate the boot-up state from previous boots and other identical machines that were booted at different times.

Random devices should not emit data until some threshold of assurances are reached. Linux and NetBSD provide an API for checking certain thresholds, although race conditions make the API a bit useless for anything other than ensuring the devices are past an initial boot state. FreeBSD now ensures its random device does not emit anything till some assurances are met, but older versions did not carry this guarantee. Also be wary of relying on /dev/random for assurances here, as my prior random article demonstrated that the device under that path may be swapped for /dev/urandom, a practice done by many sysadmins or device manufacturers.

Seed-files should not solely be relied upon, as embedded devices may not have them, in addition to the reboot issue. Above techniques should help with this.

The initial random state provided to applications should be different for each user and application, so they are not all sharing data in the same sequence.

Hardware generators should not be used without some pattern matching and statistical analysis to ensure they are functioning properly. Further, what they generate should only be used as a minor component towards random data generation, so they cannot solely influence the output, and prevent machines in the same environment producing the same results.

Due to application states being cloned by fork() (or similar), a system wide random interface can be more secure than an application state (thus OpenSSL when directly using /dev/urandom can be more secure than various flavors of LibreSSL). For application level random interfaces, they should have their states reset upon fork(). Mixing in time, process ID, and parent process ID can help.

Always use the most accurate time possible. Also, mix in the amount of time running (which cannot be adjusted), not just wall-clock time.

Many of the above solutions alone do not help much, as such data is readily known by others. This is a common library mistake where libraries tend to try system random devices, and if that fails, fall back on some of these other sources of uniquish data. Rather, system random devices should be used, which mix in all these other sources of uniquish data, and upon failure, go into failure mode, not we can correct this extremely poorly mode.

Usage scenarios

Not only does random data have to be generated correctly, it has to be used correctly too.

A common scenario is attempting to get a specific amount of random values. The typical approach is to divide the random value returned by the amount of possible values needed, and take the remainder. Let me demonstrate why this is wrong. Say your random number generator can generate 32 possible values, meaning any number from 0 to and including 31, and we only need to choose between five possible values.


As can be seen from this chart, modulo reducing the random number will result in 0 and 1 being more likely than 2, 3, 4. Now if we were to suppose lower numbers are inherently better for some reason, we would consider such usage fine. But in that scenario, our random number generator should be:
 function GetInherentlyGoodRandomNumber()
{
   return 0 //Lower numbers are inherently better!
}
Being that the above is obviously ridiculous, we want to ensure equal distribution. So in the above scenario, if 30 or 31 occurs, a new random number should be chosen.

Similar issues exist with floating point random numbers. One generally uses floating point random numbers when one wants to use random values with certain probabilities. However, typical algorithms do not provide good distributions. There is an entire paper on this topic, which includes some solutions.

The C library's fork() call (when it exists) should be resetting any random state present in C library random functions, but unfortunately that's not done in the implementations I've checked (hello arc4random). The issue extends to external libraries, which most of them never even considered supplying secure_fork() or similar. Be wary to never use fork() directly. You almost always want a wrapper which handles various issues, not just random states.

Further reading

Pretty much everything I stated here is a summary or rehash of information presented in important books and papers. Now based on my past experiences that I described above, you're probably not going to read anything further. However, if the topics here apply to you, you really should read them.


Saturday, May 17, 2014

Protecting private keys

Web servers use private keys which they alone have in order to secure connections with users. Private keys must be protected at all costs.

In order to protect private keys on disk, one generally encrypts them with a password, which is then needed by the web server upon launch in order to decrypt and use it in memory. However, if measures aren't taken to secure the memory containing the private key, it can be stolen from there too, which would be catastrophic.

Normally, one doesn't need to worry about outsiders getting a hold of data from memory unless the attackers have direct access to the server itself. But bugs like Heartbleed allow remote users to grab random data from memory. Once data is in memory, the application and all the libraries it uses could divulge secret data if there is a buffer overflow lurking somewhere.

To protect against exploiting such bugs, one should ensure that buffer overflows do not have access to memory containing private data. The memory containing private keys and similar kinds of data should be protected, meaning nothing should be allowed to read from them, not even the web server itself.

Now obviously a program needs some access to a private key in order to work with it, so it can't just prevent all access from it. Rather, once a private key or similar data is loaded into memory, that memory should have its read permissions removed. When, and only when some activity needs to be performed with the private key, read permissions can be restored, the activity performed, and then read permissions revoked. This will ensure the rest of the application cannot access what it does not need, nor should be allowed to access.

On UNIX systems, one can use mprotect() to change the permission on a page of memory. On Windows, one can use VirtualProtect().

The above however has a crucial flaw - multi-threading. In a threaded application, all threads have access to data of other threads. So while one thread may be performing some critical private key related code, and allows read access for the moment, another thread can read it outside of the critical portion of code too. Therefore, even more isolation is needed.

To truly isolate the code that uses a private key and similar data, all the code that handles that stuff should be placed into its own process. The rest of the application can then request that well defined activities be performed via a pipe or other form of inter-process communication. This will also ensure that other kinds of bugs in the application, such as buffer overflows that allow arbitrary code execution cannot reestablish read access to the secret data.

On UNIX systems, one can use fork() to create a process which is still part of the same application. On all systems, the separate process can be a separate application with a well defined restrictive IPC API with limited and secured access by the web server.

No insecure library or silly bug in your web server should ever allow the application to divulge such secrets. If such services are not utilizing the techniques above, then they're just biding their time until the next Heartbleed.

Saturday, May 10, 2014

Copying code != Copying implementation

A little over a week ago, I wrote an article about some common porting mistakes for a new library - LibreSSL. I've since received multiple questions about copying code from other projects. Mainly, if the original project uses certain code or another popular library is using some compatibility code, what is wrong with copying one of those directly?

The most obvious problem with copying from some compatibility library is that they may not be written properly. The primary concern behind most compatibility libraries is that things should compile. Just because things compile doesn't mean errors and all scenarios are being handled properly, or that the implementation is in any way secure. The popular libbsd and many GNU library shims may seem to get the job done, but only superficially.

However, the real problem is that copying some code does not mean you copied the implementation. There's a lot more to an implementation than just some code. For example, perhaps the source in question is being compiled with certain parameters or with special compilers, and the code is written specifically for that situation. If you copied the code but not the build environment, you only nabbed part of the implementation.

Another point to consider is if the code you copied is even used how you think it is. Let's look at a function from NetBSD:

int
consttime_memequal(const void *b1, const void *b2, size_t len)
{
 const char *c1 = b1, *c2 = b2;
 int res = 0;

 while (len --)
  res |= *c1++ ^ *c2++;

 /*
  * If the compiler for your favourite architecture generates a
  * conditional branch for `!res', it will be a data-dependent
  * branch, in which case this should be replaced by
  *
  * return (1 - (1 & ((res - 1) >> 8)));
  *
  * or rewritten in assembly.
  */
 return !res;
}
Note how the comment here says that this code may not function correctly on all architectures. In fact, some architectures may have an implementation in assembly which works correctly, whereas the C version does not.

NetBSD, like most C libraries, offers assembly variants for certain functions on certain architectures. If you copy a C file from it without the assembly files for the architecture in question, then the implementation was not copied, and in fact, may even produce incorrect results. You can't always depend on scanning comments either, as previous versions of this C file didn't contain the comment.

Now, let's say you copied files and their build environment, including possible assembly variants, does that now mean the implementation was definitively copied? No! Some functions used in the copied code may actually depend on certain characteristics within this particular implementation. Let me provide an example. Let's say consttime_memequal() was implemented as follows:

int consttime_memequal(const void *s1, const void *s2, size_t n)
{
   return !memcmp(s1, s2, n);
}
consttime_memequal() must run in a constant amount of time regardless if whether its two parameters are equal or not. Yet this version here wraps directly to memcmp(), which does not make such guarantees. Assuming this version was correct on the platform in question, it is correct because of an external factor, not because of the C code presented here.

Some compilers may provide their own version of memcmp() which they'll use instead of the C library's, such as GCC. Perhaps some compiler provides memcmp(), along with a compile option to make all memcmp() operations constant-time. The file containing this function can then be compiled with constant-time-memcmp enabled, and now everything else can call consttime_memequal() without any special compilers or build options.

Now while the above is possible, it's not the only possibility. Perhaps memcmp() on the platform in question just happens to be constant-time for some reason. In such a scenario, even if you used the exact same compiler with the exact same build options, and the exact same code, your implementation is still incorrect, because your underlying functions behave differently.

Bottom line, just copying code is incorrect. If some function is supposed to be doing something with certain properties beyond just fulfilling some activity (perhaps constant time, or defying compiler optimizations), then you must review the entire implementation and determine what aspect of it gives it this property. Without doing so, you're only fooling yourself into believing you copied an implementation.

Implementation copying checklist:
  • Understand what you're attempting to copy, and what properties it carries.
  • Ensure what you're copying actually performs its objectives.
  • Ensure you copy the entire implementation.
  • Ensure there's nothing about the compiler, compile options, or other aspects of the build environment which you forgot to copy.
  • Ensure you copy any needed alternatives for certain architectures.
  • Ensure the code does not depend on different implementations of functions you already have.
  • Test everything for correctness, including matching output for error situations, and extra properties.

Saturday, May 3, 2014

A good idea with bad usage: /dev/urandom

Last week, I wrote two articles pointing out issues with unofficial porting efforts of LibreSSL. In these articles, I highlighted some issues that I currently see going on with some of these projects.

In the second article, I called attention to poor arc4random_buf() implementations being created, specifically saying: "Using poor sources of entropy like /dev/urandom on Linux, or worse, gettimeofday(), and using them to generate long-lived keys." Which seemed to irk some. Now for those that care to understand the quoted issue in its proper context should go look for the current LibreSSL porting projects they can find via Google, and review their source. However, most seem to be understanding this as some kind of argument about /dev/random versus /dev/urandom. It isn't. So without further ado:

Poorly seeding a cryptographically secure pseudorandom number generator (CSPRNG)

In order for many forms of cryptography to work properly, they depend on secrecy, unpredictability, and uniqueness. Therefore, we need a good way to use many unpredictable values.

Now randomness doesn't really exist. When we humans see something as random, it's only because we don't know or understand all the details. Therefore, any perceived randomness on your part is your inability to track all the variables.

Computers are very complex machines, where many different components are all working independently, and in ways that are hard to keep track of externally. Therefore, the operating system is able to collect variables from all the various hardware involved, their current operating conditions, how they handle certain tasks, how long they take, how much electricity they use, and so on. These variables can now be combined together using very confusing and ridiculous algorithms, which essentially throw all the data through a washing machine and the world's worst roller coaster. This result is what we call entropy.

Now entropy might not be that large, and can only provide a small amount of unique random values to play with. However, a small amount of random values can be enough to create trillions of pseudo-random values. To do so, one uses some of the aforementioned ridiculous algorithms to produce two values. One value is never seen outside the pseudo-random number generator, and is used as part of the calculations for the next time a random value is requested, and the other value is output as the pseudo-random value generated for a single request.




A construct of this nature can allow an unlimited amount of "random" values to be generated. As long as this technique never repeats and is unpredictable, then it is cryptographically secure. Of course since the algorithm is known, the entropy seeding it must be a secret, otherwise it is completely predictable.

Different algorithms have different properties. OpenBSD's arc4random set of functions are known to be able to create a very large amount of good random values from a very little amount of entropy. Of course the better entropy it is supplied with, the better it can perform, so you'll always want the best entropy possible. As with any random number generator, supply it with predictable values, and its entire security is negated.

arc4random and /dev/(u)random

So, how does one port the arc4random family to Linux? Linux is well known for inventing and supplying two default files, /dev/random and /dev/urandom (unlimited random). The former is pretty much raw entropy, while the latter is the output of a CSPRNG function like OpenBSD's arc4random family. The former can be seen as more random, and the latter as less random, but the differences are extremely hard to measure, which is why CSPRNGs work in the first place. Since the former is only entropy, it is limited as to how much it can output, and one needing a lot of random data can be stuck waiting a while for it to fill up the random buffer. Since the latter is a CSPRNG, it can keep outputting data indefinitely, without any significant waiting periods.

Now theoretically, one can make the arc4random_buf() function a wrapper around /dev/urandom, and be done with it. The only reason not to is because one may trust the arc4random set of algorithms more than /dev/urandom. In that case, would /dev/urandom be trusted enough to seed the arc4random algorithms, which is then in turn used for many other outputs, some of which end up in RSA keys, SSH keys, and so on? I'll leave that question to the cryptographic experts. But I'll show you how to use /dev/urandom poorly, how to attack the design, and corrective efforts.

First, take a look at how one project decided to handle the situation. It tries to use /dev/urandom, and in a worse case scenario uses gettimeofday() and other predictable data. Also, in a case where the read() function doesn't return as much as requested for some reason, but perhaps returned a decent chunk of it, the gettimeofday() and getpid() calls will overwrite what was returned.

This very much reminds me of why the OpenBSD team removed the OpenSSL techniques in the first place. You do not want to use a time function as your primary source of entropy, nor with range limited values, and other dumb things. If you're going to use time, at the very least use clock_gettime() which provides time resolution that is 1,000 times better than gettimeofday(), and can provide both current time, and monotonic time (time which cannot go backwards). Additionally, both gettimeofday() and clock_gettime() on 64-bit systems will return values where the top half of their data will be zeroed out, so you'll want to ensure you throw that away, and not just use their raw data verbatim. Further, how this one project uses /dev/urandom, like most other projects, is terrible.

Attacking /dev/(u)random usage

A naive approach to use /dev/urandom (and /dev/random) is as follows:

int fd = open("/dev/urandom", O_RDONLY);
if (fd != -1)
{
  uint8_t buffer[40];
  if (read(fd, buffer, sizeof(buffer)) == sizeof(buffer)))
  {
    //Do what needs to be done with the random data...
  }
  else
  {
    //Read error handling
  }
  close(fd);
}
else
{
  //Open error handling
}

This tries to open the file, ensures it was opened, tries to read 40 bytes of data, and continues what it needs to do in each scenario. Note, 40 bytes is the amount arc4random wants, which also happens to be 8 bytes more than the Linux manual page says you should be using at a time from its random device.

The first common mistake here is using read() like this. read() can be interrupted. Normally it can't be interrupted for regular files, but this device is not a regular file. Some random device implementations specifically document that read() can be interrupted upon them.

So now our second attempt:

//Like read(), but keep reading upon interuption, until everything possible is read
ssize_t insane_read(int fd, void *buf, size_t count)
{
  ssize_t amount_read = 0;
  while ((size_t)amount_read < count)
  {
    ssize_t r = read(fd, (char *)buf+amount_read, count-amount_read);
    if (r > 0) { amount_read += r; }
    else if (!r) { break; }
    else if (errno != EINTR)
    {
      amount_read = -1;
      break;
    }
  }
  return(amount_read);
}

int success = 0;
int fd = open("/dev/urandom", O_RDONLY);
if (fd != -1)
{
  uint8_t buffer[40];
  ssize_t amount = insane_read(fd, buffer, sizeof(buffer)); //Grab as much data as we possibly can
  close(fd);

  if (amount > 0)
  {
    if (amount < sizeof(buffer))
    {
      //Continue filling with other sources
    }
    //Do what needs to be done with random data...
    success = 1; //Yay!
  }
}

if (!success)
{
  //Error handling
}

With this improved approach, we know we're reading as much as possible, and if not, then we can try using lesser techniques to fill in the missing entropy required. So far so good, right?

Now, let me ask you, why would opening /dev/(u)random fail in the first place? First, it's possible the open was interrupted, as may happen on some implementations. So the open() call should probably be wrapped like read() is. In fact, you may consider switching to the C family fopen() and fread() calls which handle these kinds of problems for you. However, opening could be failing because the application has reached its file descriptor limit, which is even more prevalent with the C family of file functions. Another possibility is that the file doesn't even exist. Go ahead, try to delete the file as the superuser, nothing stops you. Also, you have to consider that applications may be running inside a chroot, and /dev/ entries don't exist.

I'll cover some alternative approaches for the above problems later. But if you managed to open and read all the data needed, everything is great, right? Wrong! How do you even know if /dev/(u)random is random in the first place? This may sound like a strange question, but it isn't. You can't just trust a file because of it's path. Consider an attacker ran the following:

void sparse_1gb_overwrite(const char *path)
{
  int fd;
  char byte = 0;
  //No error checking
  unlink(path);
  fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
  lseek(fd, 1073741822, SEEK_SET);
  write(fd, &byte, 1);
  close(fd);
}

sparse_1gb_create("/dev/random");
sparse_1gb_create("/dev/urandom");

Now both random devices are actually large sparse files with known data. This is worse than not having access to these files, in fact, the attacker is able to provide you with a seed of his own choosing! A strong cryptographic library should not just assume everything is in a proper state. In fact, if you're using a chroot, you're already admitting you don't trust what will happen on the file system, and you want to isolate some applications from the rest of the system.

So the next step is to ensure that the so called device you're opening is actually a device:

int success = 0;
int fd = insane_open("/dev/urandom", O_RDONLY);
if (fd != -1)
{
  struct stat stat_buffer;
  if (!fstat(fd, &stat_buffer) && S_ISCHR(stat_buffer.st_mode)) //Make sure we opened a character device!
  {
    uint8_t buffer[40];
    ssize_t amount = insane_read(fd, buffer, sizeof(buffer)); //Grab as much data as we possibly can

    if (amount > 0)
    {
      if (amount < sizeof(buffer))
      {
        //Continue filling with other sources
      }
      //Do what needs to be done with random data...
      success = 1; //Yay!
    }
  }
  close(fd);
}

if (!success)
{
  //Error handling
}


So now we're out of the woods, right? Unfortunately not yet. How do you know you opened the correct character device? Maybe /dev/urandom is symlinked to /dev/zero? You can run lstat() on /dev/urandom initially, but that has TOCTOU issues. We can add the FreeBSD/Linux extension O_NOFOLLOW to the open command, but then /dev/urandom can't be used when it's linked to /dev/random as it is on FreeBSD, or linked to some other location entirely as on Solaris. Furthermore, avoiding symlinks is not enough:

void dev_zero_overwrite(const char *path)
{
  unlink(path);
  mknod(path, S_IFCHR | 0644, makedev(1, 5));
}

dev_zero_overwrite("/dev/random");
dev_zero_overwrite("/dev/urandom");

If an attacker manages to run the above code on Linux, the random devices are now both in their very essence /dev/zero!

Here's a list of device numbers for the random devices on various Operating Systems:




Linux FreeBSD DragonFlyBSD NetBSD OpenBSD Solaris
/dev/random 1:8 0:10 8:3 46:0 45:0 0:0
/dev/urandom 1:9

8:4 46:1 45:2 0:1
/dev/srandom







45:1

/dev/arandom







45:3


If your application is running with Superuser privileges, you can actually create these random devices anywhere on the fly:
int result = mknod("/tmp/myurandom", S_IFCHR | 0400, makedev(1, 9)); //Create "/dev/urandom" on Linux

Of course, after opening, you want to ensure that you're using what you think you're using:

int success = 0;
int fd = insane_open("/dev/urandom", O_RDONLY);
if (fd != -1)
{
  struct stat stat_buffer;
  if (!fstat(fd, &stat_buffer) && S_ISCHR(stat_buffer.st_mode) &&
      ((stat_buffer.st_rdev == makedev(1, 8)) || (stat_buffer.st_rdev == makedev(1, 9)))) //Make sure we opened a random device
  {
    uint8_t buffer[40];
    ssize_t amount = insane_read(fd, buffer, sizeof(buffer)); //Grab as much data as we possibly can

    if (amount > 0)
    {
      if (amount < sizeof(buffer))
      {
        //Continue filling with other sources
      }
      //Do what needs to be done with random data...
      success = 1; //Yay!
    }
  }
  close(fd);
}

The Linux user manual page for the random devices explicitly informs the reader of these magic numbers, so hopefully they won't change. I have no official sources for the magic numbers on the other OSs. Now, you'll notice that I checked here that the file descriptor was open to either Linux random device. A system administrator may for some reason replace one with the other, so don't necessarily rely on a proper system using the expected device under the device name you're trying to use.

This brings us back to /dev/random versus /dev/urandom. Different OSs may implement these differently. On FreeBSD for example, there is only the former, and it is a CSPRNG. MirBSD offers five different random devices with all kinds of semantics, and who knows how a sysadmin may shuffle them around. On Linux and possibly others, /dev/urandom has a fatal flaw that in fact it may not be seeded properly, so blind usage of it isn't a good idea either. Thankfully Linux and NetBSD offer the following:

int data;
int result = ioctl(fd, RNDGETENTCNT, &data); //Upon success data now contains amount of entropy available in bits

This ioctl() call  will only work on a random device, so you can use this instead of the fstat() call on these OSs. You can then check data to ensure there's enough entropy to do what you need to:

int success = 0;
int fd = insane_open("/dev/urandom", O_RDONLY);
if (fd != -1)
{
  uint8_t buffer[40];
  int entropy;
  if (!ioctl(fd, RNDGETENTCNT, &entropy) && (entropy >= (sizeof(buffer) * 8))) //This ensures it's a random device, and there's enough entropy
  {
    ssize_t amount = insane_read(fd, buffer, sizeof(buffer)); //Grab as much data as we possibly can

    if (amount > 0)
    {
      if (amount < sizeof(buffer))
      {
        //Continue filling with other sources
      }
      //Do what needs to be done with random data...
      success = 1; //Yay!
    }
  }
  close(fd);
}

However, there may be a TOCTOU between the ioctl() and the read() call, I couldn't find any data on this, so take the above with a grain of salt. This also used to work on OpenBSD too, but they removed the ioctl() command RNDGETENTCNT a couple of versions back.

Linux has one other gem which may make you want to run away screaming. Look at the following from the manual for random device ioctl()s:

RNDZAPENTCNT, RNDCLEARPOOL
Zero the entropy count of all pools and add some system data (such as wall clock) to the pools.
If that last one does what I think it does, is any usage ever remotely safe?

/dev/(u)random conclusion


After all this, we now know the following:
  • This is hardly an interface which is easy to use correctly (and securely).
  • On some OSs there may not be any way to use it correctly (and securely).
  • Applications not running with Superuser privileges may have no way to access the random device.
  • An application at its file descriptor limit cannot use it.
  • Many portability concerns.

For these reasons, OpenBSD created arc4random_buf() in the first place. It doesn't suffer from these above problems. The other BSDs also copied it, although they may be running on older less secure implementations.

Alternatives


In addition to a CSPRNG in userspace, the BSDs also allow for a way to get entropy directly from the kernel:

#define NUM_ELEMENTS(x) (sizeof(x)/sizeof((x)[0]))
uint8_t buffer[40];
size_t len = sizeof(buffer);
int mib[] = { CTL_KERN, KERN_RND };
int result = sysctl(mib, NUM_ELEMENTS(mib), buffer, &len, 0, 0);


KERN_RND may also be replaced with KERN_URNDKERN_ARND, and others on the various BSDs. FreeBSD also has a similar interface to determine if the kernel's CSPRNG is properly seeded, see the FreeBSD user manual page for more details. DragonFlyBSD also provides read_random() and read_random_unlimited() as direct no-nonsense interfaces to the underlying devices for /dev/random and /dev/urandom.

Now Linux used to provide a similar interface as the BSDs:
#define NUM_ELEMENTS(x) (sizeof(x)/sizeof((x)[0]))
uint8_t buffer[40];
size_t len = sizeof(buffer);
int mib[] = { CTL_KERN, KERN_RANDOM, RANDOM_UUID };
int result = sysctl(mib, NUM_ELEMENTS(mib), buffer, &len, NULL, 0);

However, this interface was removed a couple of versions back. Kind of makes you wonder why there is no sane simple to use standardized API everywhere that doesn't depend on a house of cards. This is the kind of thing you would think would be standardized in POSIX.

Conclusion


There does not seem to be any good cross-platform techniques out there. Anything done with the current technology will probably require a ton of conditional compiles, annoying checks, and ridiculous workarounds. Although, there should be enough basis here to come up with something with a high confidence level. It'd be interesting to see what the LibreSSL team (or the OpenSSH team) comes up with when they get around to porting what's needed here. As I said before, until then, avoid the non-official ports out there, which use poor sources of entropy like an insecure use of /dev/urandom on Linux, which then falls back on gettimeofday(), and is used to generate long-lived keys.