Tuesday, April 2, 2013


Designing C++ functions to write/save to any storage mechanism



Problem

A common issue when dealing with a custom object or any kind of data is to create some sort of save functionality with it, perhaps writing some text or binary to a file. So what is the correct C++ method to allow an object to save its data anywhere?

An initial approach to allow some custom object to be able to save its data to a file is to create a member function like so:
void save(const char *filename);
While this is perfectly reasonable, what if I want something more advanced than that? Say I don't want the data to be saved as its own separate file, but would rather the data be written to some file that is already open, to a particular location within it? What if I'd rather save the data to a database? How about send the data over the network?

Naive Approach


When C++ programmers hear the initial set of requirements, they generally look to one of two solutions:

The first is to allow for a save function which can take an std::ostream, like so:
void save(std::ostream &stream);
C++ out of the box offers std::cout as an instance of an std::ostream which writes to the screen. C++ offers a derived class std::ofstream (std::fstream) which can save to files on disk. C++ also offers a derived class std::ostringstream which saves file to a C++ string.

With these options, you can display the data on the screen, save it to an actual file, or save it to a string, which you can then in turn save it wherever you want.

The next option programmers look to is to overload  std::basic_ostream::operator<< for the custom object. This way one can simply write:
mystream << myobject;
And then the object can be written to any C++ stream.

Either of these techniques pretty much work, but can be a bit annoying when you want a lot of flexibility and performance.

Say I wanted to save my object over the network, what do I do? I could save it to a string stream, grab the string, and then send that over the network, even though that seems a bit wasteful.

And for a similar case, say I have an already open file descriptor, and wish to save my object to it, do I also use a string stream as an intermediary?

Since C++ is extensible, one could actually create their own std::basic_streambuf derived class which works with file descriptors, and attach it to an std::ostream, which can then be used with anything that works with a stream for output. I'm not going to go into the details how to do that here, but The C++ Standard Library explains the general idea, and provides a working file descriptor streambuf example and shows how to use it with stream functions. You can also find some ready made implementations online with a bit of searching, and some compilers may even include a solution out of the box in their C++ extensions.

On UNIX systems, once you have a stream which works with file descriptors, you can now send data over the network, as sockets themselves are file descriptors. On Windows, you'll need a separate class which works with SOCKETs. Of course to turn a file descriptor streambuf into a SOCKET streambuf is trivial, and can probably be done with a few well crafted search and replace commands.

Now this may have solved the extra string overhead with file descriptors and networking, but what about if I want to save to a database? What about if I'm working with C's FILE *? Does one now have to implement a new wrapper for each of these (or pray the compiler offers an extension, or one can be found online)? The C++ stream library is actually a bit bloaty, and creating your own streambufs is somewhat annoying, especially if you want to do it right and allow for buffering. Many stream related library code you find online are also of poor quality. Surely there must be a better option, right?

Solution


If we look back at how C handles this problem, it uses function pointers, where the function doing the writing receives a callback to use for the actual writing, and the programmer using it can make the writing go anywhere. C++ of course includes this ability, and even takes it much further, in the form of function objects, and even further in C++ 2011.

Let's start with an example.
template<typename WriteFunction>
void world(WriteFunction func)
{
  //Do some stuff...
  //Do some more stuff...
  func("World", 5); //Write 5 characters via callback
  //Do some more stuff...
  unsigned char *data = ...;
  func(data, data_size); //Write some bytes
}
The template function above is expecting any function pointer which can be used to write data by passing it a pointer and a length. A proper signature would be something like the following:
void func(const void *data, size_t length);
Creating such a function is trivial. However, to be useful, writing needs to also include a destination of some sort, a device, a file, a database row, and so on, which makes function objects more powerful.
#include <cstdio>

class writer_file
{
  std::FILE *handle;
  public:
  writer_file(std::FILE *handle) : handle(handle) {}
  inline void operator()(const void *data, size_t length)
  {
    std::fwrite(data, 1, length, handle);
  }
};
Which can be used as follows:
world(writer_file(stdout));
Or perhaps:
std::FILE *fp = fopen("somefile.bin", "wb");
world(writer_file(fp));
std::close(fp);
As can be seen, our World function can write to any FILE *.

To allow any char-based stream to be written, the following function object will do the trick:
#include <ostream>

class writer_stream
{
  std::ostream *handle;
  public:
  writer_stream(std::ostream &handle) : handle(&handle) {}
  inline void operator()(const void *data, size_t length)
  {
    handle->write(reinterpret_cast<const char *>(data), length);
  }
};
You can call this with:
world(writer_stream(std::cout));
Or anything in the ostream family.

If for some reason we wanted to write to strings, it's easy to create a function object for them too, and we can use the string directly without involving a string stream.
#include <string>

class writer_string
{
  std::string *handle;
  public:
  writer_string(std::string &handle) : handle(&handle) {}
  inline void operator()(const void *data, size_t length)
  {
    handle->append(reinterpret_cast<const char *>(data), length);
  }
};
If you're worried about function objects being slow, then don't. Passing a function object like this to a template function has no overhead. The compiler is able to see a series of direct calls, and throws all the extraneous details away. It is as if the body of World is calling the write function to the handle passed to it directly. For more information, see Effective STL Item 46.

If you're wondering why developers forgo function pointers and function objects for situations like this, it is because C++ offers so much with its stream classes, which are also very extensible (and are often extended), they completely forget there are other options. The stream classes are also designed for formatting output, and working with all kinds of special objects. But if you just need raw writing or saving of data, the stream classes are overkill.

C++ 2011


Now C++ 2011 extends all this further in a multiple of ways.

std::bind()


First of all, C++ 2011 offers std::bind() which allows for creating function object adapters on the fly. std::bind() can take an unlimited amount of parameters. The first must be a function pointer of some sort, the next is optionally an object to work on in the case of a member function pointer, followed by the parameters to the function. These parameters can be hard coded by the caller, or bound via placeholders by the callee.

Here's how you would use std::bind() for using fwrite():
#include <functional>
world(std::bind(std::fwrite, std::placeholders::_1, 1, std::placeholders::_2, stdout));
Let us understand what is happening here. The function being called is std::fwrite(). It has 4 parameters. It's first parameter is the first parameter by the callee, denoted by std::placeholders::_1. The second parameter is being hard coded to 1 by the caller. The third parameter is the second parameter from the callee denoted by std::placeholders::_2. The fourth parameter is being hardcoded by the caller to stdout. It could be set to any FILE * as needed by the caller.

Now we'll see how this works with objects. To use with a stream, the basic approach is as follows:
world(std::bind(&std::ostream::write, &std::cout, std::placeholders::_1, std::placeholders::_2));
Note how we're turning a member function into a pointer, and we're also turning cout into a pointer so it can be passed as std::ostream::write's this pointer. The callee will pass its first and second parameters as the parameters to the stream write function. However, the above has a slight flaw, it will only work if writing is done with char * data. We can solve that with casting.
world(std::bind(reinterpret_cast<void (std::ostream::*)(const void *, size_t)>(&std::ostream::write), &std::cout, std::placeholders::_1, std::placeholders::_2));
Take a moment to notice that we're not just casting it to the needed function pointer, but as a member function pointer of std::ostream.

You might find doing this a bit more comfortable than using classical function objects. However, function objects still have their place, wherever functions do. Remember, functions are about re-usability, and some scenarios are complicated enough that you want to pull out a full blown function object.

For working with file descriptors, you might be tempted to do the following:
world(std::bind(::write, 1, std::placeholders::_1, std::placeholders::_2));
This here will have World write to file descriptor 1 - generally standard output. However this simple design is a mistake. Write can be interrupted by signals and needs to be resumed manually (by default, except on Solaris), among other issues, especially if the file descriptor is some kind of pipe or a socket. A proper write would be along the following lines:
#include <system_error>
#include <unistd.h>

class writer_fd
{
  int handle;
  public:
  writer_fd(int handle) : handle(handle) {}
  inline void operator()(const void *data, size_t length)
  {
    while (length)
    {
      ssize_t r = ::write(handle, data, length);
      if (r > 0) { data = static_cast<const char *>(data)+r; length -= r; }
      else if (!r) { break; }
      else if (errno != EINTR) { throw std::system_error(errno, std::system_category()); }
    }
  }
};

Lambda Functions

Now you might be wondering, why C++ 2011 stopped with std::bind(), what if the function body needs more than just a single function call that can be wrapped up in an adapter? That's where lambda functions come in.
world([&](const void *data, size_t length){ std::fwrite(data, 1, length, stdout); });
world([&](const void *data, size_t length){ std::cout.write(static_cast<const char *>(data), length); });
Note the ridiculous syntax. The [](){} combination signifies we are working with a lambda function. The [] receives a function scope, in this case &, which means that the function operates fully within its parent-scope, and has direct access to all its data. The rest you should already be well familiar with. You can change the stdout or the cout in the body of the lambda function to use your FILE * or ostream as necessary.

Let us look at an example of having our World function write directly to a buffer.
#include <cstring>

void *p = ...; //Point p at some buffer which has enough room to hold the contents needed to be written to it.
world([&](const void *data, size_t length){ std::memcpy(p, data, length); p = static_cast<char *>(p) + length; });
There's a very important point in this example. There is a pointer which is initialized to where writing should begin. Every time data is written, the pointer is incremented. This ensures that if World calls the passed write function multiple times, it will continue to work correctly. This was not needed for files above, as their write pointer increments automatically, or with std::string, where append always writes to the end, wherever it now is.

Be careful writing like this though, you must ensure in advance that your buffer is large enough, perhaps if your object has a way of reporting how much data the next call to its save or write function needs to generate. If it doesn't and you're winging it, something like the following is in order:
#include <stdexcept>

class writer_buffer
{
  void *handle, *limit;
  public:
  writer_buffer(void *handle, size_t limit) : handle(handle), limit(static_cast(handle)+limit) {}
  inline void operator()(const void *data, size_t length)
  {
    if ((static_cast<char *>(handle) + length) > limit) { throw std::out_of_range("writer_buffer"); }
    std::memcpy(handle, data, length);
    handle = static_cast<char *>(handle) + length;
  }
};
You can use it as follows:
#include <cstdlib>

size_t amount = 1024; //A nice number!
void *buffer = std::malloc(amount);
world(writer_buffer(buffer, amount));
Now an exception will be thrown if the callee tries to write more data than it should.

std::function

Lastly, C++ 2011 added the ability for more verbose type checking on function objects, and the ability to create the save/write function as a normal function as opposed to a template function. That ability is a general reusable function object facade, std::function.

To rewrite World to use it, we'd do as follows:
void world(std::function<void (const void *, size_t)> func)
{
  //Do some stuff...
  //Do some more stuff...
  func("World", 5); //Write 5 characters via callback
  //Do some more stuff...
  unsigned char *data = ...;
  func(data, data_size); //Write some bytes
}
With std::function, the type is now made explicit instead of being a template. It is anything which receives any kind of buffer and its length, and returns nothing. This can ensure that callers will always use a compatible function as intended by the library designer. For example, in our case, the caller only needs to ensure that data can be passed via a char * and an unsigned char *, based on how World uses the callback function. If World was now modified to also output an int *, less capable callers would now break. std::function can ensure that things are designed properly up front. With std::function, you can now also restructure your code to place various components in different compilation units if you so desire, although perhaps at a performance penalty.

Conclusion

To wrap up, you should now understand some features of C++ that are not as commonly used, or some new features of C++ 2011 that you may not be familiar with. You should now also have some ideas about generic code which should help you improve code you write.

Many examples above were given only with one methodology, although they can be implemented with some of the others. For practice, try doing this yourself. Also try applying these ideas to other kinds of storage mechanisms not covered here, doing so should now be rather trivial for you.

Remember, while this was done with a few standard examples and for writing, it can be extended to all handles Win32 offers, or for reading, or for anything else.