Saturday, May 3, 2014


A good idea with bad usage: /dev/urandom



Last week, I wrote two articles pointing out issues with unofficial porting efforts of LibreSSL. In these articles, I highlighted some issues that I currently see going on with some of these projects.

In the second article, I called attention to poor arc4random_buf() implementations being created, specifically saying: "Using poor sources of entropy like /dev/urandom on Linux, or worse, gettimeofday(), and using them to generate long-lived keys." This seemed to irk some. Those who care to understand the quoted issue in its proper context should find the current LibreSSL porting projects via Google and review their source. However, most seem to have understood this as some kind of argument about /dev/random versus /dev/urandom. It isn't. So without further ado:

Poorly seeding a cryptographically secure pseudorandom number generator (CSPRNG)

Many forms of cryptography depend on secrecy, unpredictability, and uniqueness in order to work properly. Therefore, we need a good way to obtain many unpredictable values.

Now randomness doesn't really exist. When we humans see something as random, it's only because we don't know or understand all the details. Therefore, any perceived randomness on your part is your inability to track all the variables.

Computers are very complex machines, where many different components are all working independently, and in ways that are hard to keep track of externally. Therefore, the operating system is able to collect variables from all the various hardware involved, their current operating conditions, how they handle certain tasks, how long they take, how much electricity they use, and so on. These variables can now be combined together using very confusing and ridiculous algorithms, which essentially throw all the data through a washing machine and the world's worst roller coaster. This result is what we call entropy.

Now the entropy gathered might not be that large, and can only provide a small number of unique random values to play with. However, a small number of random values can be enough to create trillions of pseudo-random values. To do so, one uses some of the aforementioned ridiculous algorithms to produce two values. One value is never seen outside the pseudo-random number generator, and is used as part of the calculations for the next time a random value is requested; the other value is output as the pseudo-random value generated for a single request.




A construct of this nature can allow an unlimited amount of "random" values to be generated. As long as this technique never repeats and is unpredictable, then it is cryptographically secure. Of course, since the algorithm is known, the entropy seeding it must be a secret; otherwise it is completely predictable.
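
To make the shape of such a construct concrete, here's a toy sketch (absolutely not cryptographically secure; the constants are just from a well-known linear congruential generator) showing the split between the hidden internal state and the emitted output:

#include <stdint.h>

static uint64_t toy_state; //Seeded from entropy, never exposed outside the generator

void toy_seed(uint64_t entropy)
{
  toy_state = entropy;
}

uint32_t toy_rand(void)
{
  //Advance the hidden state; only a value derived from it is ever output
  toy_state = (toy_state * 6364136223846793005ULL) + 1442695040888963407ULL;
  return((uint32_t)(toy_state >> 32));
}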

Different algorithms have different properties. OpenBSD's arc4random set of functions is known to be able to create a very large amount of good random values from a very small amount of entropy. Of course, the better the entropy it is supplied with, the better it can perform, so you'll always want the best entropy possible. As with any random number generator, supply it with predictable values, and its entire security is negated.

arc4random and /dev/(u)random

So, how does one port the arc4random family to Linux? Linux is well known for inventing and supplying two default files, /dev/random and /dev/urandom (unlimited random). The former is pretty much raw entropy, while the latter is the output of a CSPRNG function like OpenBSD's arc4random family. The former can be seen as more random, and the latter as less random, but the differences are extremely hard to measure, which is why CSPRNGs work in the first place. Since the former is only entropy, it is limited as to how much it can output, and one needing a lot of random data can be stuck waiting a while for it to fill up the random buffer. Since the latter is a CSPRNG, it can keep outputting data indefinitely, without any significant waiting periods.

Now theoretically, one can make the arc4random_buf() function a wrapper around /dev/urandom, and be done with it. The only reason not to is because one may trust the arc4random set of algorithms more than /dev/urandom. In that case, would /dev/urandom be trusted enough to seed the arc4random algorithms, which is then in turn used for many other outputs, some of which end up in RSA keys, SSH keys, and so on? I'll leave that question to the cryptographic experts. But I'll show you how to use /dev/urandom poorly, how to attack the design, and corrective efforts.

First, take a look at how one project decided to handle the situation. It tries to use /dev/urandom, and in a worst-case scenario uses gettimeofday() and other predictable data. Also, in the case where the read() function doesn't return as much as was requested, even if it returned a decent chunk of it, the gettimeofday() and getpid() calls will overwrite what was returned.

This very much reminds me of why the OpenBSD team removed the OpenSSL techniques in the first place. You do not want to use a time function as your primary source of entropy, nor range-limited values, nor other such dumb things. If you're going to use time, at the very least use clock_gettime(), which provides time resolution 1,000 times better than gettimeofday(), and can provide both current time and monotonic time (time which cannot go backwards). Additionally, on 64-bit systems, the values gettimeofday() and clock_gettime() return have range-limited subsecond fields, meaning the top half of that data will always be zero, so you'll want to throw those bits away and not just use the raw structures verbatim. Further, how this one project uses /dev/urandom, like most other projects, is terrible.
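
To illustrate, here's a sketch of grabbing both clocks with clock_gettime() and packing only the meaningful bits together (the particular packing below is arbitrary, and error checking is omitted):

#include <stdint.h>
#include <time.h>

uint64_t time_bits(void)
{
  struct timespec rt, mt;
  clock_gettime(CLOCK_REALTIME, &rt);
  clock_gettime(CLOCK_MONOTONIC, &mt); //Monotonic time cannot go backwards
  //tv_nsec is always below one billion, so only its low 30 bits carry data
  return(((uint64_t)rt.tv_sec << 30) ^ (uint64_t)rt.tv_nsec ^
         ((uint64_t)mt.tv_nsec << 34) ^ (uint64_t)mt.tv_sec);
}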

Attacking /dev/(u)random usage

A naive approach to use /dev/urandom (and /dev/random) is as follows:

int fd = open("/dev/urandom", O_RDONLY);
if (fd != -1)
{
  uint8_t buffer[40];
  if (read(fd, buffer, sizeof(buffer)) == (ssize_t)sizeof(buffer))
  {
    //Do what needs to be done with the random data...
  }
  else
  {
    //Read error handling
  }
  close(fd);
}
else
{
  //Open error handling
}

This tries to open the file, ensures it was opened, tries to read 40 bytes of data, and continues what it needs to do in each scenario. Note, 40 bytes is the amount arc4random wants, which also happens to be 8 bytes more than the Linux manual page says you should be using at a time from its random device.

The first common mistake here is using read() like this. read() can be interrupted. Normally it can't be interrupted for regular files, but this device is not a regular file. Some random device implementations specifically document that read() can be interrupted upon them.

So now our second attempt:

//Like read(), but keep reading upon interruption, until everything possible is read
ssize_t insane_read(int fd, void *buf, size_t count)
{
  ssize_t amount_read = 0;
  while ((size_t)amount_read < count)
  {
    ssize_t r = read(fd, (char *)buf+amount_read, count-amount_read);
    if (r > 0) { amount_read += r; }
    else if (!r) { break; }
    else if (errno != EINTR)
    {
      amount_read = -1;
      break;
    }
  }
  return(amount_read);
}

int success = 0;
int fd = open("/dev/urandom", O_RDONLY);
if (fd != -1)
{
  uint8_t buffer[40];
  ssize_t amount = insane_read(fd, buffer, sizeof(buffer)); //Grab as much data as we possibly can
  close(fd);

  if (amount > 0)
  {
    if ((size_t)amount < sizeof(buffer))
    {
      //Continue filling with other sources
    }
    //Do what needs to be done with random data...
    success = 1; //Yay!
  }
}

if (!success)
{
  //Error handling
}

With this improved approach, we know we're reading as much as possible, and if not, then we can try using lesser techniques to fill in the missing entropy required. So far so good, right?

Now, let me ask you, why would opening /dev/(u)random fail in the first place? First, it's possible the open was interrupted, as may happen on some implementations. So the open() call should probably be wrapped like read() is. In fact, you may consider switching to the C family fopen() and fread() calls, which handle these kinds of problems for you. However, opening could also be failing because the application has reached its file descriptor limit, a problem which is even more prevalent with the C family of file functions. Another possibility is that the file doesn't even exist. Go ahead, try to delete the file as the superuser, nothing stops you. Also, you have to consider that applications may be running inside a chroot, where /dev/ entries don't exist.
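
The examples that follow call insane_open(); here's a minimal sketch of such a wrapper, assuming an interrupted open() is the only failure worth retrying:

#include <fcntl.h>
#include <errno.h>

//Like open(), but keep retrying upon interruption
int insane_open(const char *path, int flags)
{
  int fd;
  do
  {
    fd = open(path, flags);
  } while ((fd == -1) && (errno == EINTR));
  return(fd);
}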

I'll cover some alternative approaches for the above problems later. But if you managed to open and read all the data needed, everything is great, right? Wrong! How do you even know if /dev/(u)random is random in the first place? This may sound like a strange question, but it isn't. You can't just trust a file because of its path. Consider what happens if an attacker ran the following:

void sparse_1gb_overwrite(const char *path)
{
  int fd;
  char byte = 0;
  //No error checking
  unlink(path);
  fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
  lseek(fd, 1073741822, SEEK_SET);
  write(fd, &byte, 1);
  close(fd);
}

sparse_1gb_overwrite("/dev/random");
sparse_1gb_overwrite("/dev/urandom");

Now both random devices are actually large sparse files with known data. This is worse than not having access to these files; in fact, the attacker is able to provide you with a seed of his own choosing! A strong cryptographic library should not just assume everything is in a proper state. In fact, if you're using a chroot, you're already admitting you don't trust what will happen on the file system, and you want to isolate some applications from the rest of the system.

So the next step is to ensure that the so called device you're opening is actually a device:

int success = 0;
int fd = insane_open("/dev/urandom", O_RDONLY);
if (fd != -1)
{
  struct stat stat_buffer;
  if (!fstat(fd, &stat_buffer) && S_ISCHR(stat_buffer.st_mode)) //Make sure we opened a character device!
  {
    uint8_t buffer[40];
    ssize_t amount = insane_read(fd, buffer, sizeof(buffer)); //Grab as much data as we possibly can

    if (amount > 0)
    {
      if ((size_t)amount < sizeof(buffer))
      {
        //Continue filling with other sources
      }
      //Do what needs to be done with random data...
      success = 1; //Yay!
    }
  }
  close(fd);
}

if (!success)
{
  //Error handling
}


So now we're out of the woods, right? Unfortunately not yet. How do you know you opened the correct character device? Maybe /dev/urandom is symlinked to /dev/zero? You can run lstat() on /dev/urandom initially, but that has TOCTOU issues. We can add the FreeBSD/Linux extension O_NOFOLLOW to the open command, but then /dev/urandom can't be used when it's linked to /dev/random as it is on FreeBSD, or linked to some other location entirely as on Solaris. Furthermore, avoiding symlinks is not enough:

void dev_zero_overwrite(const char *path)
{
  unlink(path);
  mknod(path, S_IFCHR | 0644, makedev(1, 5));
}

dev_zero_overwrite("/dev/random");
dev_zero_overwrite("/dev/urandom");

If an attacker manages to run the above code on Linux, the random devices are now both in their very essence /dev/zero!

Here's a list of device numbers for the random devices on various Operating Systems:




              Linux   FreeBSD   DragonFlyBSD   NetBSD   OpenBSD   Solaris
/dev/random    1:8     0:10         8:3         46:0     45:0      0:0
/dev/urandom   1:9       -          8:4         46:1     45:2      0:1
/dev/srandom    -        -           -            -      45:1       -
/dev/arandom    -        -           -            -      45:3       -

(A dash means the OS has no such device; FreeBSD's /dev/urandom is a symlink to /dev/random.)


If your application is running with Superuser privileges, you can actually create these random devices anywhere on the fly:
int result = mknod("/tmp/myurandom", S_IFCHR | 0400, makedev(1, 9)); //Create "/dev/urandom" on Linux
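
Here's a sketch of how such an on-the-fly node might be used, reusing the insane_open() wrapper from above, and assuming the target file system permits device nodes (many systems mount /tmp nodev, so a private directory may be needed):

//Needs <sys/stat.h> for mknod(), and <sys/sysmacros.h> or <sys/types.h> for makedev()
int fd = -1;
if (!mknod("/tmp/myurandom", S_IFCHR | 0400, makedev(1, 9)))
{
  fd = insane_open("/tmp/myurandom", O_RDONLY);
  unlink("/tmp/myurandom"); //The already-open descriptor remains usable
}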

Of course, after opening, you want to ensure that you're using what you think you're using:

int success = 0;
int fd = insane_open("/dev/urandom", O_RDONLY);
if (fd != -1)
{
  struct stat stat_buffer;
  if (!fstat(fd, &stat_buffer) && S_ISCHR(stat_buffer.st_mode) &&
      ((stat_buffer.st_rdev == makedev(1, 8)) || (stat_buffer.st_rdev == makedev(1, 9)))) //Make sure we opened a random device
  {
    uint8_t buffer[40];
    ssize_t amount = insane_read(fd, buffer, sizeof(buffer)); //Grab as much data as we possibly can

    if (amount > 0)
    {
      if ((size_t)amount < sizeof(buffer))
      {
        //Continue filling with other sources
      }
      //Do what needs to be done with random data...
      success = 1; //Yay!
    }
  }
  close(fd);
}

The Linux manual page for the random devices explicitly informs the reader of these magic numbers, so hopefully they won't change. I have no official sources for the magic numbers on the other OSs. Now, you'll notice that I checked here that the file descriptor was open to either Linux random device. A system administrator may for some reason replace one with the other, so don't assume a proper system necessarily keeps the expected device under the device name you're trying to use.

This brings us back to /dev/random versus /dev/urandom. Different OSs may implement these differently. On FreeBSD for example, there is only the former, and it is a CSPRNG. MirBSD offers five different random devices with all kinds of semantics, and who knows how a sysadmin may shuffle them around. On Linux and possibly others, /dev/urandom has a fatal flaw: it may not be seeded properly yet, so blind usage of it isn't a good idea either. Thankfully Linux and NetBSD offer the following:

int data;
int result = ioctl(fd, RNDGETENTCNT, &data); //Upon success data now contains amount of entropy available in bits

This ioctl() call will only work on a random device, so you can use it instead of the fstat() call on these OSs. You can then check data to ensure there's enough entropy to do what you need to:

int success = 0;
int fd = insane_open("/dev/urandom", O_RDONLY);
if (fd != -1)
{
  uint8_t buffer[40];
  int entropy;
  if (!ioctl(fd, RNDGETENTCNT, &entropy) && (entropy >= (sizeof(buffer) * 8))) //This ensures it's a random device, and there's enough entropy
  {
    ssize_t amount = insane_read(fd, buffer, sizeof(buffer)); //Grab as much data as we possibly can

    if (amount > 0)
    {
      if ((size_t)amount < sizeof(buffer))
      {
        //Continue filling with other sources
      }
      //Do what needs to be done with random data...
      success = 1; //Yay!
    }
  }
  close(fd);
}

However, there may be a TOCTOU issue between the ioctl() and the read() call; I couldn't find any data on this, so take the above with a grain of salt. This used to work on OpenBSD too, but they removed the ioctl() command RNDGETENTCNT a couple of versions back.

Linux has one other gem which may make you want to run away screaming. Look at the following from the manual for random device ioctl()s:

RNDZAPENTCNT, RNDCLEARPOOL
    Zero the entropy count of all pools and add some system data (such as wall clock) to the pools.

If that last one does what I think it does, is any usage ever remotely safe?

/dev/(u)random conclusion


After all this, we now know the following:
  • This is hardly an interface which is easy to use correctly (and securely).
  • On some OSs there may not be any way to use it correctly (and securely).
  • Applications not running with Superuser privileges may have no way to access the random device.
  • An application at its file descriptor limit cannot use it.
  • There are many portability concerns.

For these reasons, OpenBSD created arc4random_buf() in the first place. It doesn't suffer from the problems above. The other BSDs also copied it, although they may be running older, less secure implementations.

Alternatives


In addition to a CSPRNG in userspace, the BSDs also allow for a way to get entropy directly from the kernel:

#define NUM_ELEMENTS(x) (sizeof(x)/sizeof((x)[0]))
uint8_t buffer[40];
size_t len = sizeof(buffer);
int mib[] = { CTL_KERN, KERN_RND };
int result = sysctl(mib, NUM_ELEMENTS(mib), buffer, &len, 0, 0);


KERN_RND may also be replaced with KERN_URND, KERN_ARND, and others on the various BSDs. FreeBSD also has a similar interface to determine if the kernel's CSPRNG is properly seeded; see the FreeBSD manual page for more details. DragonFlyBSD also provides read_random() and read_random_unlimited() as direct no-nonsense interfaces to the underlying devices for /dev/random and /dev/urandom.

Now Linux used to provide a similar interface to the BSDs':
#define NUM_ELEMENTS(x) (sizeof(x)/sizeof((x)[0]))
uint8_t buffer[40];
size_t len = sizeof(buffer);
int mib[] = { CTL_KERN, KERN_RANDOM, RANDOM_UUID };
int result = sysctl(mib, NUM_ELEMENTS(mib), buffer, &len, NULL, 0);

However, this interface was removed a couple of versions back. Kind of makes you wonder why there is no sane, simple-to-use, standardized API available everywhere that doesn't depend on a house of cards. This is the kind of thing you would think would be standardized in POSIX.

Conclusion


There do not seem to be any good cross-platform techniques out there. Anything done with the current technology will probably require a ton of conditional compiles, annoying checks, and ridiculous workarounds. Still, there should be enough of a basis here to come up with something with a high confidence level. It'd be interesting to see what the LibreSSL team (or the OpenSSH team) comes up with when they get around to porting what's needed here. As I said before, until then, avoid the non-official ports out there, which use poor sources of entropy like an insecure use of /dev/urandom on Linux, which then falls back on gettimeofday(), and is used to generate long-lived keys.

43 comments:

Max said...

I wasn't expecting this post so quickly. Thank you.

I originally blew off the concerns about an attacker monkeying with /dev/urandom because an attacker would need to have administrative privileges to do so, and if they have that they can already do a lot more damage (Raymond Chen refers to this as requiring the attacker to be on the other side of an airtight hatch: https://www.google.com/search?q=airtight+hatch+oldnewthing ). However, the machine you're on may not be the attacker's final target; your SSL keys might be (or, rather, your customers' private data might be the final target, and the attacker hopes to get that data through administrative rights on the machine).

henke37 said...

Next up, why you shouldn't use the unprivileged assembly instruction to load random data.

insane coder said...

Max: I was already asked to write this yesterday, and was half way done by the time you wrote your first comment. So yeah, it seems quick :)

Now, regarding your points:
Not necessarily an attacker either; the NetBSD manpage *recommends* you shuffle around your random devices to what you think is best. If you program to expect one design and get the other, you're going to get problems.

And don't forget that it may be seeded incorrectly. Thankfully you can use the ioctl() to check for that.

In any case, one shouldn't make assumptions in secure programs; they should be verified. And (/dev/urandom || gettimeofday()) is a very poor source of entropy. The first may fail for a multitude of reasons, and the last one, especially when done terribly, is a joke.

insane coder said...

Another issue with the entire design of the Linux random devices is that even a regular user can run: cat /dev/random > /dev/null & cat /dev/zero > /dev/random

If that's running, entropy will be degraded.

>sysctl kernel.random
kernel.random.poolsize = 4096
kernel.random.entropy_avail = 12

12 bits of entropy is not a good situation to be in...

Unknown said...

Hi, thanks for the feedback insane coder. I had posted this version more as a PoC to the OpenSSH mailing list, you can follow the discussion there: http://www.gossamer-threads.com/lists/openssh/dev/58537

I'll note your comments on the project issue tracker so that they're not lost.

insane coder said...

Hi Unknown,

I'm not saying you should follow any of these blindly, but take a look at:
https://github.com/libevent/libevent/blob/master/arc4random.c
https://github.com/nmathewson/libottery/blob/master/src/ottery_entropy_urandom.c

For explicit_bzero(), you might want to consider:
void explicit_bzero(void *p, size_t n) __attribute__((optimize("O0")))
Or: http://cvsweb.netbsd.org/bsdweb.cgi/src/common/lib/libc/string/Attic/explicit_bzero.c?rev=1.1&content-type=text/x-cvsweb-markup&only_with_tag=MAIN

Some platforms have memset_s(), and Microsoft has SecureZeroMemory().
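
Another common sketch routes the call through a volatile function pointer, which the compiler cannot prove still points at memset and therefore cannot elide:

#include <string.h>

static void *(* volatile memset_ptr)(void *, int, size_t) = memset;

void explicit_bzero_sketch(void *p, size_t n)
{
  memset_ptr(p, 0, n);
}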

dlg said...

there's no TOCTOU problem between an ioctl and a read like you describe unless you're in a threaded program and the other thread is doing things you're really unaware of.

ps, dont use threads.

insane coder said...

dlg: How do you know another process won't be using the device at the same time and alter the value returned from the ioctl() before the read()?

Based on some quick testing on Linux, that very much seems like a possibility. Other OSs may act differently.

dlg said...

ah, i thought you were asking if the device an fd was referencing could change between ioctl and read.

the state of the device can obviously change between syscalls unless it makes attempts to be opened by only one thing at a time, and isn't affected by something like hotplug or interrupts. however dup and fork kind of ruin that too.

insane coder said...

According to manpages on other platforms, each process trying to open the random device gets a unique random device, so there would be no race condition on them.

I don't really understand what Linux's random "API" is trying to accomplish. It is very hard to use correctly, and some of its features just seem useless or deceptive.

Dobin Rutishauser said...

Some of your statements contradict the following article:
Myths about urandom

IMHO, an initial seed of 128 bits should be enough to produce random numbers forever, no need for reseeding. So /dev/urandom is not less secure than /dev/random.

May you comment on this issue?

insane coder said...

Hello Dobin,

Where did I say /dev/urandom is less secure than /dev/random?

Dobin Rutishauser said...

Hmm, after reading the article again, it seems you did not explicitly state this fact :-)

But:
So, how does one port the arc4random family to Linux? Linux is well known for inventing and supplying two default files, /dev/random and /dev/urandom (unlimited random). The former is pretty much raw entropy, while the latter is the output of a CSPRNG function like OpenBSD's arc4random family

The thing with "pretty much raw entropy" caught my eye. According to the mentioned article, /dev/random is the same as /dev/urandom, but will block if the super-mega-magic entropy-measurement function thinks it does not have enough entropy.

insane coder said...

Hi Dobin,

I tried giving an explanation which should be simple for a layman to follow of what goes on. I wouldn't take it as precise fact, especially when there's a washing machine involved in the explanation ;)

The issues between /dev/random and /dev/urandom are slightly more complex than just blocking. How does /dev/urandom keep supplying an unlimited amount of data? Will its algorithm ever start becoming predictable? How does this interact with seeding the algorithm in arc4random, which also reseeds from time to time?

As is well known, the primary issue with /dev/urandom is when you use it in a library which uses it once to seed, and is started at bootup, say within Apache. To combat the issues there, 1) I showed that the ioctl() can be used to see if /dev/urandom has entropy (whether that works properly or not is another matter). 2) I suggested that arc4random_buf() should in fact be a direct wrapper around /dev/urandom, so you never have to worry about some initial poor seed infecting an application's entire life time, as you'll just call it directly each time you need some random.

William Ahern said...

The RANDOM_UUID sysctl wasn't removed, but the syscall mode of access was simply made optional with the kernel configuration option CONFIG_SYSCTL_SYSCALL.

Ubuntu 14.04 still builds with sysctl syscalls, and so the method I posted still works, while Red Hat apparently removed them awhile ago. I haven't tested other distributions.

This is pretty lame on the part of Red Hat, because quite a few applications use the RANDOM_UUID method. You can still, for example, cat /proc/sys/kernel/random/uuid, you just cannot access it without /proc.

Here's a choice quote from the Tor source code:

/* On linux, sysctl is deprecated. Because proc is so awesome that you shouldn't _want_ to write portable code, I guess? */

Dobin Rutishauser said...

Hey Mr. Insane

I also don't know if the normal /dev/urandom is a cryptographically secure PRNG. It should be ;-)

1) To seed arc4random with /dev/[u]random, it's like:
"randomness from measuring IRQ timings" -> (/dev/[u]random) PRNG -> seed arc4random

2) And the better way:
"randomness from measuring IRQ timings" -> seed arc4random

Is 1) really much worse than 2)? I can't answer that either.


Nevertheless, good writeup about the difficulties in "just acquiring" the random numbers by userspace code in hostile environment :-)

insane coder said...

Hi Dobin,

Any randomness should first be run through some algorithm to remove biases that a single source may carry.
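
A classic example of such an algorithm is Von Neumann's debiasing trick; a sketch, assuming the raw bits are independent (next_raw_bit here stands in for whatever fetches a raw bit from your source):

//Take raw bits in pairs: emit the first bit of a 01 or 10 pair, discard 00 and 11
//This removes constant bias from independent bits, at the cost of throwing bits away
int von_neumann_bit(int (*next_raw_bit)(void))
{
  for (;;)
  {
    int a = next_raw_bit();
    int b = next_raw_bit();
    if (a != b) { return(a); }
  }
}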

If you're very interested in this, you may want to read:
http://eprint.iacr.org/2013/338.pdf
http://www.oreilly.com/catalog/secureprgckbk/chapter/ch11.pdf (this covers real world use too!)

There's also been some talk lately about a backdoor in Linux's RNG, read through some commit messages here: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/drivers/char/random.c?showmsg=1

Some random devices may not be too secure; check out /dev/srandom and /dev/prandom on MirBSD. This page covers a lot of interesting material regarding different devices on different OSs and testing: https://calomel.org/entropy_random_number_generators.html

Lastly, you're assuming that arc4random is secure with a good random seed. I personally haven't seen any papers on its strengths and weaknesses, or how it compares to alternatives. I'd love if someone who really knows their stuff would determine how arc4random stacks up to Linux's method, and the pros and cons of one seeding the other. If arc4random is truly a real improvement that can be applied after a /dev/urandom step, then there's no reason why it shouldn't be added as the final pass to the output of /dev/(u)random itself.


I'm glad you enjoyed my article.

Bob Beck said...

"Why doesn't linux have a sane API" - the same reason linux doesn't have a lot of sane API's: Usually, it looks like this.

insane coder said...

Hi Bob,

In order for Linux to have a sane API, it would have to provide a sane interface for the C library to then provide a sane interface to the application developers.

The patch there is solely for glibc, and makes all the same mistakes this article is rallying against. I would reject it too.

insane coder said...

Since enough people were writing to me about /dev/random vs. /dev/urandom, I wrote an article about it and other practical problems.

Joseph M. Schwartz said...

Hey, have you seen http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libcrypto/crypto/

Do you think they handle entropy properly?

insane coder said...

Hi Joseph,

Yes, I saw that a couple of days ago.
Note, that is being constantly worked on, so my remarks may no longer apply by the time you read this.

1) It seems they noted some of the information presented here, so that's good.

2) They seem not to handle some issues mentioned here, or in my other article.

Notably, they're relying on /dev/urandom to be returning good data, which may not be true close to system start up. The approach they're taking if reading /dev/urandom fails should really be used in addition to /dev/urandom, not instead of. Combining data from all over (hashing it all together) will serve to improve the basic situation. Of course there is still the question as to whether not being able to combine in /dev/(u)random as one of the sources should result in an abort.

They're using the sysctl() call on Linux, which doesn't exist. I don't know which distribution has an actual system call there (even with what they're doing), but not a single one of my servers, which have multiple distros, some even 5 years old, supports it. Even if a distro does offer that function, I'm not sure it does what they think it does.

Since I don't have access to a system which supports it, I haven't actually tested the function to see what it returns. But based on some of the information I've read about it, it may only return a limited amount of data, and it won't be in the format you're expecting. Meaning, you really need to check how much you're receiving, and loop if need be. Also, you'll have to convert the encoding of the data to raw binary.

This may also be entirely moot, because if the function does work, it may very well just wrap to file system reads, in which case you're not really gaining anything, and you surely cannot trust the data you receive from it.

However, you may be able to mount proc in cases where you cannot mount devtmpfs (or whatever flavor alternate is available), in which case it can be worthwhile to read it via the file system directly, perhaps with some preads to ensure successive calls are returning different results.

3) There's a lot more they can be doing to gather entropy, especially for bad situations. As I described in my other article, grabbing hardware serial numbers, perhaps from the network devices, can help in some situations. SIOCGIFCONF and friends can help, and on some OSs, that will also return some real entropy from the network devices. There's getifaddrs() too, although that can be error prone if you're not careful, and you really need to test how you're parsing the returned data if certain whacky network devices are present (with SIOCGIFCONF, you can just hash the buffer as a whole, and the only parsing you need is perhaps for simple SIOCGIFHWADDR calls on OSs where available).

On x86 CPUs, the cpuid instruction can also return some unique to processor details, and on modern ones, entropy as well, as some cpuid calls return details about temperature and power.

4) Taking the address of main is a bad idea. Depending on how the library is linked/loaded, that's going to fail.

5) Based on the idea of taking addresses, sbrk(0) should be mixed in. The address of environ can be good too. Linux has 3 pointers, etext, edata, and end (on Solaris, _etext, _edata, _end), which point to different areas of the memory layout and should be mixed in too. Some other C functions may also be in separate libraries and can move around a bit, for example, malloc, clock_gettime, and pthread_mutex_lock, so those are good in addition to printf.

insane coder said...

6) Regarding time with clock_gettime(), I use the following in my libraries:

#ifdef CLOCK_REALTIME_PRECISE
TIME_UPDATE(CLOCK_REALTIME_PRECISE)
#else
TIME_UPDATE(CLOCK_REALTIME) //Everyone who has clock_gettime() has CLOCK_REALTIME
#endif

#ifdef CLOCK_HIGHRES
TIME_UPDATE(CLOCK_HIGHRES)
#elif defined(CLOCK_MONOTONIC_RAW)
TIME_UPDATE(CLOCK_MONOTONIC_RAW)
#elif defined(CLOCK_MONOTONIC_PRECISE)
TIME_UPDATE(CLOCK_MONOTONIC_PRECISE)
#elif defined(CLOCK_MONOTONIC)
TIME_UPDATE(CLOCK_MONOTONIC)
#endif
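
(TIME_UPDATE itself isn't shown above; a hypothetical definition, with entropy_mix() standing in for whatever pool-mixing function the surrounding library uses, might look like:)

#include <time.h>

#define TIME_UPDATE(clock_id) \
{ \
  struct timespec ts; \
  if (!clock_gettime((clock_id), &ts)) { entropy_mix(&ts, sizeof(ts)); } \
}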

So if clock_gettime() is present, it'll use the most accurate real time, and for those also supporting monotonic time, the most accurate one of that. Of course their approach of mixing in multiple of these is good, but they're not always hitting some of the more important ones.

7) The use of gettimeofday() when you have clock_gettime() is pretty pointless. However, alongside clock_gettime(), it can be useful to use clock_getres() for each timer, as those will differ system to system.

8) times() may have other useful data to mix in.

9) On Linux, there's probably a few more getauxval() values that can be useful to mix in. A fallback for implementing that function for older GLIBCs wouldn't be a bad idea either.

Okay, I think that pretty much wraps it up.

Joseph M. Schwartz said...

Wow, I wasn't expecting you to be so thorough. Thank you!

You mentioned you have your own libraries, is there anything you do in your libraries you have not covered?

If all your suggestions are implemented, would you yourself use the LibReSSL RAND API?

insane coder said...

Hi Joseph,

>You mentioned you have your own libraries, is there anything you do in your libraries you have not covered?

All the important things I do is more or less covered in the articles here, and the resources I link to.

As far as collecting entropy goes, the biggest difference between my implementation and LibreSSL's is that I have a list of the random device numbers for OSs where they do not fluctuate (the BSDs fluctuate), and on some OSs, use their APIs for getting the device numbers (for example NetBSD provides such an API). These device numbers are used to verify the devices being opened, and also attempt to mknod() them if they do not exist in the usual location.

For the OSs that support it, I also attempt to mount devfs or devtmpfs.

>If all your suggestions are implemented, would you yourself use the LibReSSL RAND API?

No, I wouldn't.

Today, with the proliferation of virtual machines, especially for hosting, traditional random APIs no longer cut it. As I mentioned in my other article, virtual machines can be rolled back to a particular state, or are sometimes cloned from the same state. These things happen in real hosting environments in response to malfunctions, attacks, or heavy load.

Mixing in the current time (and perhaps MAC address) into the entropy upon the initial machine boot is crucial (see my previous response regarding combining /dev/(u)random with the other functions). Now while that will guard against several problems, it wouldn't help against live state loading, nor cases where the VM hasn't updated the real time yet.

Buffered APIs like arc4random*() and RAND_*() layered over it will be generating identical data in these situations, and can be attacked. Mixing in the current time and any hardware RNGs if available on every use can help, but you really need to be using hedged cryptography.


The APIs provided by OpenBSD and LibreSSL also don't cover all the scenarios I need them to. For example, I sometimes need a uniform number of a particular type within a particular range, and arc4random_uniform() doesn't actually provide uniform results, doesn't simply allow for ranges, and only works with a single type.

When using a standalone command to generate RSA keys, I'd also prefer if it blocked on /dev/random, but this point is debatable.

Joseph M. Schwartz said...

"but you really need to be using hedged cryptography"

What is that? I do not see you talking about this anywhere.

"arc4random_uniform() doesn't actually provide uniform results"

What do you mean? I see this in the manual "arc4random_uniform() will return a uniformly distributed random number less than upper_bound. arc4random_uniform() is recommended over constructions like “arc4random() % upper_bound” as it avoids "modulo bias" when the upper bound is not a power of two."

"only works with a single type"

So what? You can easily extend the functions to other types as needed.

insane coder said...

Hi Joseph,

>"but you really need to be using hedged cryptography"

>What is that? I do not see you talking about this anywhere.

I mentioned the overall idea of it, and link to the paper in the comments of my other article.

There's also a video at Microsoft Research, although the site doesn't seem to be loading for me at the moment. I got a direct link to the MP4 out of Google's cache.

The essence of the idea is that you should ensure the randomness generated is generated specifically for whatever it is you're generating it for. So for example, if you're generating a random temporary password for a user, you mix the user's ID or user's name into the entropy pool before obtaining the random data. If you're generating a temporary symmetric encryption key for a remote connection, you can mix in the IP address, and so on. This ensures that if two different users or machines or whatever are working off the same state, they'll still get different results between the two of them.

This will need a slightly different API, and applications have to make use of it where applicable, but OpenBSD doesn't even provide an underlying API to accomplish this. They used to have an arc4random_addrandom() function which could theoretically get the job done, but they have since removed it. In any case, that API was poorly designed; the hedged randomness function should combine the various functions as needed internally to prevent misuse, and ensure there aren't any gotchas occurring between the two calls.
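
To sketch the pattern (the mixing below is a toy FNV-1a placeholder; a real implementation would use a keyed hash such as HMAC over the entropy and the context):

#include <stdint.h>
#include <stddef.h>

static uint64_t hedge_mix(uint64_t h, const void *data, size_t length)
{
  const uint8_t *p = (const uint8_t *)data;
  while (length--) { h = (h ^ *p++) * 1099511628211ULL; } //FNV-1a style step
  return(h);
}

//Fold request-specific context (user name, peer IP, ...) into the pooled
//entropy, so two machines resumed from an identical state still diverge
uint64_t hedged_value(uint64_t pool_entropy, const void *context, size_t context_length)
{
  return(hedge_mix(pool_entropy ^ 14695981039346656037ULL, context, context_length));
}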

>I see this in the manual "arc4random_uniform() will return a uniformly distributed random number less than upper_bound. arc4random_uniform() is recommended over constructions like “arc4random() % upper_bound” as it avoids "modulo bias" when the upper bound is not a power of two."

Yes, but that's not actually true. Like most naive implementations out there, the results aren't uniform and contain bias. Test it yourself if you don't believe me.

insane coder said...

>So what? You can easily extend the functions to other types as needed.

You just condoned the worst sin in all of programming: code repetition.

As you've probably seen in the LibreSSL changelogs, OpenSSL was filled with cases where a bunch of code could be replaced with a single call to asprintf(). Now it's easy to write your own asprintf(), although there could be gotchas involved as I demonstrated. Functions make the code clearer, and ensure you don't screw up, or if you do, you only need to fix it once.

Unfortunately, with the UNIX principle of do just one thing and do it well, there's a severe lack of higher level functions in the various APIs. Sure, every now and then someone invents a higher level function like asprintf(), but for the most part, you find UNIX libraries filled with code repetition, and OpenBSD is no exception.

Take this example, which I noticed someone pointed out on OpenSSL Valhalla Rampage, see: http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libssl/src/crypto/engine/eng_list.c
In revision 1.11, issetugid() was placed before every getenv() call throughout the library. This kind of usage already indicates there should be a combined function. If not in the C library, at least internal to LibreSSL itself. In fact, GLIBC already noticed this issue is frequent and supplies a function secure_getenv().
Now look at revision 1.12: Theo had to correct a mistake he made in using these two functions separately. That would likely not have happened if there was a combined function call available.
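
A combined helper along those lines might look like this sketch, assuming issetugid() is available (GLIBC's secure_getenv() does the equivalent check internally):

#include <stdlib.h>
#include <unistd.h>

char *getenv_secure_sketch(const char *name)
{
  if (issetugid()) { return(NULL); } //Refuse environment data in a setuid/setgid context
  return(getenv(name));
}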

POSIX is finally starting to get it, and is adding more functions for atomicity reasons. Unfortunately this messed up development style and the ensuing issues and complications is quite prevalent throughout OpenBSD and other projects. As I mentioned in this article, read() is problematic, and generally needs to be wrapped in checks for EAGAIN, EINTR, and general looping. Yet these projects generally just employ the same code repetition everywhere leading to ugly bloated code and potential problems.

I'm all for not complicating the underlying C library, and providing new functions which may have questionable APIs ending up in public use. But something of LibreSSL's nature should have its own internal higher level functions to use wherever needed. I'd welcome another several thousand line source reduction by doing so.

insane coder said...

Besides different types for different sized integers, my random library also includes functions like these:
void rand_text_hedge_whitelist(char *buffer, size_t buffer_length, const void *hedge, size_t hedge_length, const char *whitelist);
void rand_text_hedge_vec_whitelist(char *buffer, size_t buffer_length, const struct iovec *hedge_iov, size_t iov_amount, const char *whitelist);

These can be used for generating temporary passwords from certain allowed characters, ensuring the returned random string is hedged and null terminated.

Now such functions could easily be built from more primitive functions, but we want to ensure our developers use these things correctly, and provide higher level functions for both ease and correctness. It's a good thing too, because shortly after creating these, a coworker of mine pointed out how these two functions had a subtle bug due to the interactions of the primitives used to build them. I'm glad I only had to fix that in one place, and not throughout a ton of code recreating these.

To give you an idea of what I'm talking about here, my random API has over 70 functions. My string APIs (which go way beyond the silly things like strcpy(), strncpy(), strlcpy()) has several hundred functions. With consistent naming and APIs which were well thought out, developers also remember the different ones, and how they work and don't need to keep looking them up. More than once, one of our developers came across a certain situation where they needed some particular string handling function, just typed in what they thought it would be called if it existed along with its parameters, and figured they'd go add it to our massive list, but turns out it was already put there by a previous developer, along with the API they thought it would have. Now that's intuitive and useful, but unfortunately rarely exists, and all too often, a lot of functions leads to confusing misdesigned APIs. So don't expect such things to work in anything as messed up as a public C library. More often than not, in such cases, you end up with the garbage Microsoft provides.

Joseph M. Schwartz said...

Thanks for the information.
Hedged cryptography sounds like a good idea.

How can I test arc4random_uniform() for bias?

Your other points make sense in theory, but I think you're going a wee bit overboard.

insane coder said...

Hi Joseph,

>How can I test arc4random_uniform() for bias?

Rip out the random function and replace with one of your own which returns 0, then 1, then 2, etc...

Replace the loop with a failure mode.

Collect statistics on returned data.


Perhaps I should write an article about this.
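
For example, here's a sketch of that procedure for an 8-bit refit, assuming the function under test takes the would-be random value as a parameter and signals "reloop" by returning UINT32_MAX:

#include <stdint.h>

//Returns 1 if, for every upper bound, each reachable output is hit equally often
int test_uniform_exhaustive(uint32_t (*uniform)(uint8_t upper_bound, uint8_t selected_rand))
{
  unsigned bound;
  for (bound = 2; bound <= 255; ++bound)
  {
    unsigned counts[256] = {0};
    unsigned r, i;
    for (r = 0; r <= 255; ++r)
    {
      uint32_t value = uniform((uint8_t)bound, (uint8_t)r);
      if (value != UINT32_MAX) { ++counts[value]; } //UINT32_MAX means reloop
    }
    for (i = 1; i <= bound; ++i)
    {
      if (counts[i] != counts[0]) { return(0); } //Bias detected
    }
  }
  return(1);
}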

Jorden Verwer said...

Note: I'm posting this from my business account because I can apparently only post with a Google account here, but my employer has nothing to do with what I write here. It's not that I can foresee any conflict of interest, but you can never be too careful with these things.

> Rip out the random function and replace with one of your own which returns 0, then 1, then 2, etc...
>
> Replace the loop with a failure mode.
>
> Collect statistics on returned data.

Assuming I understood you correctly, I did exactly that for a few different upper bounds (not exceeding 42) and found no differences between the expected and actual distributions. Could you clarify when the problem occurs? Does it only happen with large upper bounds? Could you post some sample code, perhaps? I've studied this function in serious detail and am absolutely convinced that it's 100% correct, so your claims that it does the wrong thing have me puzzled. Also, if this is (in your words) a naive implementation, what would a proper implementation look like?

insane coder said...

Hello Jorden,

I modified the function to use a uint8_t instead of uint32_t, upper bound of 255 (254 really the way it works), and collected data. Suffice it to say, it fails, although performed better than some other approaches I saw.

I can't comment on using it with 42; I'm not aware of any native integer size which provides a limit of that number.

The function is incorrect, because the distribution is not uniform. For every single upper bound, the amount of hits for each number below it should be equal to that of every other number. For the maximum upper bound, the count for every number should be exactly 1. For some values for various upper bounds, you'll have results which do not match that.

If in proper testing, it did work correctly, perhaps that was of a version other than the ones I looked at. What I said certainly held true of the latest version in CVS at the time I wrote the above. I can double check with latest CVS again when I get a chance.

If need be, I can write an article about proper testing and implementation.

Jorden Verwer said...

That function did not change in the OpenBSD tree since 2012, although recently it was moved to a separate file, without any changes to the actual code.

I did not change the return type of the random function, but 42 was the largest value of the upper bound argument I supplied to the uniform distribution function. Larger values require a larger array to store the results and more time to verify them, so there is an upper limit to the values that are feasible (although it's obviously a lot higher than 42).

If you changed the type to uint8_t, you may have run afoul of integer promotions. I would check that first if I were you. Of course, if you post your code here, everyone can take a look at it to see what your approach was.

insane coder said...

Hello Jorden,

I did not change the return type of the function. I redid the logic with uint8_t to test it, as testing a uint32_t would require more resources than I was willing to spend.

If one is not going to do an exhaustive test of the logic, it can't be verified to be correct.

As I said before I can write an article about how to handle this sort of thing properly.

Jorden Verwer said...

Well, I feel like we're going in circles. Yes, it might be interesting to write an article about how to handle this properly, but in the meantime there's a more immediate issue to take on.

You wrote that arc4random_uniform is not implemented correctly. If this is true, it's a serious problem that needs to be solved immediately, given that this function has become something of an informal standard that half the world is now using. However, if it's not true, we need to clear that up, so as not to cast unnecessary doubt on a function that's actually correct.

I agree that only an exhaustive test can serve as proof that a function is correct. In fact, even an exhaustive test isn't necessarily sufficient, given that there are differences between platforms, compilers, etc. I did not claim that I performed an exhaustive test, but I did try to keep the alterations to the function minimal. I was already convinced that the function was correct based on an in-depth analysis of its code, and running the test (based on my initial understanding of what you did) reassured me. It is, however, still theoretically possible that I'm wrong and that the function is defective. Given that I was the one who suggested the latest change to this function, after MUCH deliberation, I'd like to consider myself something of an expert ("in a very narrow field", to quote a recently released humorous movie about being the only engineer in business meetings). That's why I believe I may know the explanation for your test results.

What you seem to have done (again, this is based on my understanding of what you wrote here, so please do correct me if I'm wrong) is to replace uint32_t with uint8_t and test that version exhaustively. While you may have performed an exhaustive test on your altered version, you did not test the original function exhaustively, nor did you technically test the original function AT ALL. This is important, because it means that conclusions about the correctness of the altered function (no matter if they are positive or negative) do not necessarily carry over to the original function.

Allow me to elaborate. There's an implicit assumption in the current code that uint32_t is not a candidate for integer promotions, i.e. that the arithmetic will be carried out in 32 bits, not in 64 bits or some other number of bits that is not 32. This assumption holds for all platforms on which OpenBSD runs, and for the platforms that have imported this code it probably holds as well. However, similar assumptions do not hold for uint8_t. Such values will be promoted to 32 bits (signed, probably) and that will affect the result. So, if you replaced all occurrences of uint32_t with uint8_t, you broke the code. Results obtained by running the altered function will be incorrect. However, this does not say anything at all about the correctness of the original function.

Will you help me confirm my hypothesis? If so, and if my analysis of how you performed the tests is correct, then please correct your altered version and report back on your findings.

All you need to do is take this one line:

min = -upper_bound % upper_bound;

And then replace it by these two:

min = -upper_bound;
min %= upper_bound;

insane coder said...

Hi Jorden,

I saw what you were saying and looked over my function again, and I noticed I placed my cast around the wrong segment, so I'm entirely to blame here.

I now plugged this function into my C++ uniform testing framework:

uint32_t uniform_openbsd(uint8_t upper_bound, uint8_t selected_rand)
{
  if (upper_bound < UINT8_MAX)
  {
    ++upper_bound;
    if (upper_bound >= 2)
    {
      uint8_t min = static_cast<uint8_t>(-upper_bound) % upper_bound;
      if (selected_rand < min) { return(UINT32_MAX); }
      selected_rand %= upper_bound;
    }
    else { selected_rand = 0; }
  }
  return(selected_rand);
}

And results are perfectly uniform for every possible combination.

My testing framework includes many uniform functions found in various libraries and books (most of them considerably easier to correctly port to uint8_t and this API than OpenBSD's method). Nearly every implementation I've run across fails an exhaustive test (and nearly every port was double and triple checked for correctness), and it seems in my haste to recently add OpenBSD's to it, I mistakenly cast the wrong component, and assumed that like many other implementations it carried similar mistakes.

I'm sorry for the false alarm, and can now confirm with exhaustive tests that OpenBSD is one of three implementations that I know of that are correctly implemented.

Jorden Verwer said...

Ah, that's definitely good news. Thank you for admitting your mistake.

What are the other two correct implementations, by the way?

insane coder said...

Hello Jorden,

>Ah, that's definitely good news. Thank you for admitting your mistake.

If one cannot admit when they make a mistake, they should not be allowed to write anything.

>What are the other two correct implementations, by the way?

The other implementations are libottery and a corrected version I wrote based upon the one found in the Secure Programming Cookbook (In the SPC itself, it mistakes > for >=, and mishandles the case where the specified range is the entire range of possible values, but fixing it is easy).

Since I now have 3 working implementations, I ran an additional statistic regarding relooping, and noticed that OpenBSD, unlike the other two, has optimal relooping in all scenarios. The other two both cut off a whole set of random values when the range is a power of 2 minus 1. Meaning their retry rate is a bit worse than 50% on a power of 2 minus 1. Nothing terrible, but it can be slightly slower to find a random value in those cases.

Additionally, OpenBSD and my corrected SPC implementation have a faster algorithm than libottery's, as the last division operation for the former is only run when the value is valid, whereas the latter needs to run a division upon each random value received, even if thrown away.

Jorden Verwer said...

Interesting stuff. It looks like there would be more than enough material for a real article, if you can find the time for it. This subject is still poorly understood by many, but the awareness is growing. A whole generation of programmers has grown up with the modulo trick, without realizing that it's subtly (and sometimes not so subtly) wrong. They are now slowly waking up to the reality that they can't do it that way if the distribution has to be truly uniform.

insane coder said...

> It looks like there would be more than enough material for a real article, if you can find the time for it.

Unfortunately I have a backlog of 3 articles that are partially completed, and I keep getting requests from others (many regarding LibreSSL, especially lately regarding fork() protection), and I still have real code that needs to be written.

> A whole generation of programmers has grown up with the modulo trick, without realizing that it's subtly (and sometimes not so subtly) wrong.

Indeed, however it's so much worse, and there are many libraries and books which claim to offer something better; sometimes they are a bit better, but fall short of the ideal.

Of the ones I reviewed, the algorithms are usually far off the mark. The only reason why I corrected the one from SPC is because it was so close to correctness. Perhaps in later printings of the book they even corrected it, I know they used to publish a lot of errata.

As I mentioned above, there's also so many other issues with random related functions. Needing random strings is common, and some of the solutions I've seen are horribly broken. If you think uniformity is bad, you should see how some developers try to generate a password which matches minimum requirements of certain classes of characters (I saw one implementation which affixes A0! to the end of the string, and fills the rest with mersenne twister % 26 + 'a'). I've only ever seen one correct implementation for this. Nobody gets floating point random correct either in a portable manner, and I haven't come across a single ideal implementation yet.

insane coder said...

Jorden, regarding your comment about people not necessarily knowing why modulo is wrong, I did write about it here with pertinent details and an example, along with a number of other issues: http://insanecoding.blogspot.com/2014/05/dealing-with-randomness.html

I tried to be brief with this and the other issues, leaving just enough to get the point across. If there needs to be more elaboration on certain issues or solutions, I can do so.

Jorden Verwer said...

Yes, you mentioned the problem, but not really the solution, apart from "So in the above scenario, if 30 or 31 occurs, a new random number should be chosen". While that is technically correct, I would not trust most programmers to write a correct implementation based on that sentence alone. ;)

I came up with this approach independently (although, in my case, it was an optimization of my earlier "just keep requesting random numbers until the result is in range" approach, which (for single bytes) isn't actually as unreasonable a solution as it may sound) and wrote my own implementation a bit more than two years ago, only to find out that I wasn't the first to think of that solution. I know of several implementations, but you seem to have done more in-depth research.

It seems that the results of your testing would be an interesting basis for an article. Of course, I understand that you may have other priorities.