Saturday, May 10, 2014

Copying code != Copying implementation

A little over a week ago, I wrote an article about some common porting mistakes for a new library - LibreSSL. I've since received multiple questions about copying code from other projects. Mainly, if the original project uses certain code or another popular library is using some compatibility code, what is wrong with copying one of those directly?

The most obvious problem with copying from some compatibility library is that they may not be written properly. The primary concern behind most compatibility libraries is that things should compile. Just because things compile doesn't mean errors and all scenarios are being handled properly, or that the implementation is in any way secure. The popular libbsd and many GNU library shims may seem to get the job done, but only superficially.

However, the real problem is that copying some code does not mean you copied the implementation. There's a lot more to an implementation than just some code. For example, perhaps the source in question is being compiled with certain parameters or with special compilers, and the code is written specifically for that situation. If you copied the code but not the build environment, you only nabbed part of the implementation.

Another point to consider is if the code you copied is even used how you think it is. Let's look at a function from NetBSD:

consttime_memequal(const void *b1, const void *b2, size_t len)
 const char *c1 = b1, *c2 = b2;
 int res = 0;

 while (len --)
  res |= *c1++ ^ *c2++;

  * If the compiler for your favourite architecture generates a
  * conditional branch for `!res', it will be a data-dependent
  * branch, in which case this should be replaced by
  * return (1 - (1 & ((res - 1) >> 8)));
  * or rewritten in assembly.
 return !res;
Note how the comment here says that this code may not function correctly on all architectures. In fact, some architectures may have an implementation in assembly which works correctly, whereas the C version does not.

NetBSD, like most C libraries, offers assembly variants for certain functions on certain architectures. If you copy a C file from it without the assembly files for the architecture in question, then the implementation was not copied, and in fact, may even produce incorrect results. You can't always depend on scanning comments either, as previous versions of this C file didn't contain the comment.

Now, let's say you copied files and their build environment, including possible assembly variants, does that now mean the implementation was definitively copied? No! Some functions used in the copied code may actually depend on certain characteristics within this particular implementation. Let me provide an example. Let's say consttime_memequal() was implemented as follows:

int consttime_memequal(const void *s1, const void *s2, size_t n)
   return !memcmp(s1, s2, n);
consttime_memequal() must run in a constant amount of time regardless if whether its two parameters are equal or not. Yet this version here wraps directly to memcmp(), which does not make such guarantees. Assuming this version was correct on the platform in question, it is correct because of an external factor, not because of the C code presented here.

Some compilers may provide their own version of memcmp() which they'll use instead of the C library's, such as GCC. Perhaps some compiler provides memcmp(), along with a compile option to make all memcmp() operations constant-time. The file containing this function can then be compiled with constant-time-memcmp enabled, and now everything else can call consttime_memequal() without any special compilers or build options.

Now while the above is possible, it's not the only possibility. Perhaps memcmp() on the platform in question just happens to be constant-time for some reason. In such a scenario, even if you used the exact same compiler with the exact same build options, and the exact same code, your implementation is still incorrect, because your underlying functions behave differently.

Bottom line, just copying code is incorrect. If some function is supposed to be doing something with certain properties beyond just fulfilling some activity (perhaps constant time, or defying compiler optimizations), then you must review the entire implementation and determine what aspect of it gives it this property. Without doing so, you're only fooling yourself into believing you copied an implementation.

Implementation copying checklist:
  • Understand what you're attempting to copy, and what properties it carries.
  • Ensure what you're copying actually performs its objectives.
  • Ensure you copy the entire implementation.
  • Ensure there's nothing about the compiler, compile options, or other aspects of the build environment which you forgot to copy.
  • Ensure you copy any needed alternatives for certain architectures.
  • Ensure the code does not depend on different implementations of functions you already have.
  • Test everything for correctness, including matching output for error situations, and extra properties.


insane coder said...

On this topic, The Hidden Cost of Software Reuse may also be of interest.

umphy said...

Just out of curiosity, how applicable do you think this applies to higher level programming languages?

insane coder said...

Extremely applicable. All the time I'm running into some JavaScript snippet which works great with the built in JS standard library implementation on one browser, but not on another.

You'll run into these issues wherever you have different compilers, compiler options, or different standard library implementations for the language in question.

However, for many of the higher level programming languages, there's often only a single implementation, and little to no options, so you won't find these issues there at first. They begin to pop up once a second implementation is created and becomes popular.