Tuesday, April 2, 2013

Designing C++ functions to write/save to any storage mechanism

Problem

A common issue when dealing with a custom object or any kind of data is to create some sort of save functionality with it, perhaps writing some text or binary to a file. So what is the correct C++ method to allow an object to save its data anywhere?

An initial approach to allow some custom object to be able to save its data to a file is to create a member function like so:
void save(const char *filename);
While this is perfectly reasonable, what if I want something more advanced than that? Say I don't want the data to be saved as its own separate file, but would rather the data be written to some file that is already open, to a particular location within it? What if I'd rather save the data to a database? How about send the data over the network?

Naive Approach


When C++ programmers hear the initial set of requirements, they generally look to one of two solutions:

The first is to allow for a save function which can take an std::ostream, like so:
void save(std::ostream &stream);
C++ out of the box offers std::cout as an instance of an std::ostream which writes to the screen. C++ offers a derived class std::ofstream (std::fstream) which can save to files on disk. C++ also offers a derived class std::ostringstream which saves file to a C++ string.

With these options, you can display the data on the screen, save it to an actual file, or save it to a string, which you can then in turn save it wherever you want.

The next option programmers look to is to overload  std::basic_ostream::operator<< for the custom object. This way one can simply write:
mystream << myobject;
And then the object can be written to any C++ stream.

Either of these techniques pretty much work, but can be a bit annoying when you want a lot of flexibility and performance.

Say I wanted to save my object over the network, what do I do? I could save it to a string stream, grab the string, and then send that over the network, even though that seems a bit wasteful.

And for a similar case, say I have an already open file descriptor, and wish to save my object to it, do I also use a string stream as an intermediary?

Since C++ is extensible, one could actually create their own std::basic_streambuf derived class which works with file descriptors, and attach it to an std::ostream, which can then be used with anything that works with a stream for output. I'm not going to go into the details how to do that here, but The C++ Standard Library explains the general idea, and provides a working file descriptor streambuf example and shows how to use it with stream functions. You can also find some ready made implementations online with a bit of searching, and some compilers may even include a solution out of the box in their C++ extensions.

On UNIX systems, once you have a stream which works with file descriptors, you can now send data over the network, as sockets themselves are file descriptors. On Windows, you'll need a separate class which works with SOCKETs. Of course to turn a file descriptor streambuf into a SOCKET streambuf is trivial, and can probably be done with a few well crafted search and replace commands.

Now this may have solved the extra string overhead with file descriptors and networking, but what about if I want to save to a database? What about if I'm working with C's FILE *? Does one now have to implement a new wrapper for each of these (or pray the compiler offers an extension, or one can be found online)? The C++ stream library is actually a bit bloaty, and creating your own streambufs is somewhat annoying, especially if you want to do it right and allow for buffering. Many stream related library code you find online are also of poor quality. Surely there must be a better option, right?

Solution


If we look back at how C handles this problem, it uses function pointers, where the function doing the writing receives a callback to use for the actual writing, and the programmer using it can make the writing go anywhere. C++ of course includes this ability, and even takes it much further, in the form of function objects, and even further in C++ 2011.

Let's start with an example.
template<typename WriteFunction>
void world(WriteFunction func)
{
  //Do some stuff...
  //Do some more stuff...
  func("World", 5); //Write 5 characters via callback
  //Do some more stuff...
  unsigned char *data = ...;
  func(data, data_size); //Write some bytes
}
The template function above is expecting any function pointer which can be used to write data by passing it a pointer and a length. A proper signature would be something like the following:
void func(const void *data, size_t length);
Creating such a function is trivial. However, to be useful, writing needs to also include a destination of some sort, a device, a file, a database row, and so on, which makes function objects more powerful.
#include <cstdio>

class writer_file
{
  std::FILE *handle;
  public:
  writer_file(std::FILE *handle) : handle(handle) {}
  inline void operator()(const void *data, size_t length)
  {
    std::fwrite(data, 1, length, handle);
  }
};
Which can be used as follows:
world(writer_file(stdout));
Or perhaps:
std::FILE *fp = fopen("somefile.bin", "wb");
world(writer_file(fp));
std::close(fp);
As can be seen, our World function can write to any FILE *.

To allow any char-based stream to be written, the following function object will do the trick:
#include <ostream>

class writer_stream
{
  std::ostream *handle;
  public:
  writer_stream(std::ostream &handle) : handle(&handle) {}
  inline void operator()(const void *data, size_t length)
  {
    handle->write(reinterpret_cast<const char *>(data), length);
  }
};
You can call this with:
world(writer_stream(std::cout));
Or anything in the ostream family.

If for some reason we wanted to write to strings, it's easy to create a function object for them too, and we can use the string directly without involving a string stream.
#include <string>

class writer_string
{
  std::string *handle;
  public:
  writer_string(std::string &handle) : handle(&handle) {}
  inline void operator()(const void *data, size_t length)
  {
    handle->append(reinterpret_cast<const char *>(data), length);
  }
};
If you're worried about function objects being slow, then don't. Passing a function object like this to a template function has no overhead. The compiler is able to see a series of direct calls, and throws all the extraneous details away. It is as if the body of World is calling the write function to the handle passed to it directly. For more information, see Effective STL Item 46.

If you're wondering why developers forgo function pointers and function objects for situations like this, it is because C++ offers so much with its stream classes, which are also very extensible (and are often extended), they completely forget there are other options. The stream classes are also designed for formatting output, and working with all kinds of special objects. But if you just need raw writing or saving of data, the stream classes are overkill.

C++ 2011


Now C++ 2011 extends all this further in a multiple of ways.

std::bind()


First of all, C++ 2011 offers std::bind() which allows for creating function object adapters on the fly. std::bind() can take an unlimited amount of parameters. The first must be a function pointer of some sort, the next is optionally an object to work on in the case of a member function pointer, followed by the parameters to the function. These parameters can be hard coded by the caller, or bound via placeholders by the callee.

Here's how you would use std::bind() for using fwrite():
#include <functional>
world(std::bind(std::fwrite, std::placeholders::_1, 1, std::placeholders::_2, stdout));
Let us understand what is happening here. The function being called is std::fwrite(). It has 4 parameters. It's first parameter is the first parameter by the callee, denoted by std::placeholders::_1. The second parameter is being hard coded to 1 by the caller. The third parameter is the second parameter from the callee denoted by std::placeholders::_2. The fourth parameter is being hardcoded by the caller to stdout. It could be set to any FILE * as needed by the caller.

Now we'll see how this works with objects. To use with a stream, the basic approach is as follows:
world(std::bind(&std::ostream::write, &std::cout, std::placeholders::_1, std::placeholders::_2));
Note how we're turning a member function into a pointer, and we're also turning cout into a pointer so it can be passed as std::ostream::write's this pointer. The callee will pass its first and second parameters as the parameters to the stream write function. However, the above has a slight flaw, it will only work if writing is done with char * data. We can solve that with casting.
world(std::bind(reinterpret_cast<void (std::ostream::*)(const void *, size_t)>(&std::ostream::write), &std::cout, std::placeholders::_1, std::placeholders::_2));
Take a moment to notice that we're not just casting it to the needed function pointer, but as a member function pointer of std::ostream.

You might find doing this a bit more comfortable than using classical function objects. However, function objects still have their place, wherever functions do. Remember, functions are about re-usability, and some scenarios are complicated enough that you want to pull out a full blown function object.

For working with file descriptors, you might be tempted to do the following:
world(std::bind(::write, 1, std::placeholders::_1, std::placeholders::_2));
This here will have World write to file descriptor 1 - generally standard output. However this simple design is a mistake. Write can be interrupted by signals and needs to be resumed manually (by default, except on Solaris), among other issues, especially if the file descriptor is some kind of pipe or a socket. A proper write would be along the following lines:
#include <system_error>
#include <unistd.h>

class writer_fd
{
  int handle;
  public:
  writer_fd(int handle) : handle(handle) {}
  inline void operator()(const void *data, size_t length)
  {
    while (length)
    {
      ssize_t r = ::write(handle, data, length);
      if (r > 0) { data = static_cast<const char *>(data)+r; length -= r; }
      else if (!r) { break; }
      else if (errno != EINTR) { throw std::system_error(errno, std::system_category()); }
    }
  }
};

Lambda Functions

Now you might be wondering, why C++ 2011 stopped with std::bind(), what if the function body needs more than just a single function call that can be wrapped up in an adapter? That's where lambda functions come in.
world([&](const void *data, size_t length){ std::fwrite(data, 1, length, stdout); });
world([&](const void *data, size_t length){ std::cout.write(static_cast<const char *>(data), length); });
Note the ridiculous syntax. The [](){} combination signifies we are working with a lambda function. The [] receives a function scope, in this case &, which means that the function operates fully within its parent-scope, and has direct access to all its data. The rest you should already be well familiar with. You can change the stdout or the cout in the body of the lambda function to use your FILE * or ostream as necessary.

Let us look at an example of having our World function write directly to a buffer.
#include <cstring>

void *p = ...; //Point p at some buffer which has enough room to hold the contents needed to be written to it.
world([&](const void *data, size_t length){ std::memcpy(p, data, length); p = static_cast<char *>(p) + length; });
There's a very important point in this example. There is a pointer which is initialized to where writing should begin. Every time data is written, the pointer is incremented. This ensures that if World calls the passed write function multiple times, it will continue to work correctly. This was not needed for files above, as their write pointer increments automatically, or with std::string, where append always writes to the end, wherever it now is.

Be careful writing like this though, you must ensure in advance that your buffer is large enough, perhaps if your object has a way of reporting how much data the next call to its save or write function needs to generate. If it doesn't and you're winging it, something like the following is in order:
#include <stdexcept>

class writer_buffer
{
  void *handle, *limit;
  public:
  writer_buffer(void *handle, size_t limit) : handle(handle), limit(static_cast(handle)+limit) {}
  inline void operator()(const void *data, size_t length)
  {
    if ((static_cast<char *>(handle) + length) > limit) { throw std::out_of_range("writer_buffer"); }
    std::memcpy(handle, data, length);
    handle = static_cast<char *>(handle) + length;
  }
};
You can use it as follows:
#include <cstdlib>

size_t amount = 1024; //A nice number!
void *buffer = std::malloc(amount);
world(writer_buffer(buffer, amount));
Now an exception will be thrown if the callee tries to write more data than it should.

std::function

Lastly, C++ 2011 added the ability for more verbose type checking on function objects, and the ability to create the save/write function as a normal function as opposed to a template function. That ability is a general reusable function object facade, std::function.

To rewrite World to use it, we'd do as follows:
void world(std::function<void (const void *, size_t)> func)
{
  //Do some stuff...
  //Do some more stuff...
  func("World", 5); //Write 5 characters via callback
  //Do some more stuff...
  unsigned char *data = ...;
  func(data, data_size); //Write some bytes
}
With std::function, the type is now made explicit instead of being a template. It is anything which receives any kind of buffer and its length, and returns nothing. This can ensure that callers will always use a compatible function as intended by the library designer. For example, in our case, the caller only needs to ensure that data can be passed via a char * and an unsigned char *, based on how World uses the callback function. If World was now modified to also output an int *, less capable callers would now break. std::function can ensure that things are designed properly up front. With std::function, you can now also restructure your code to place various components in different compilation units if you so desire, although perhaps at a performance penalty.

Conclusion

To wrap up, you should now understand some features of C++ that are not as commonly used, or some new features of C++ 2011 that you may not be familiar with. You should now also have some ideas about generic code which should help you improve code you write.

Many examples above were given only with one methodology, although they can be implemented with some of the others. For practice, try doing this yourself. Also try applying these ideas to other kinds of storage mechanisms not covered here, doing so should now be rather trivial for you.

Remember, while this was done with a few standard examples and for writing, it can be extended to all handles Win32 offers, or for reading, or for anything else.

Tuesday, March 19, 2013

OAuth - A great way to cripple your API

Intro

 A few years ago, the big social networking sites were looking for a secure way to allow their users to safely use any and all untrusted software to access their own personal accounts. So that a user could have their account on one social networking site safely interact with their account on another social networking site. They also wanted to allow for users to be able to allow their accounts to interact safely with various untrusted web applications and web sites.

In order to safely allow untrusted software to access a user's account, a few points needed to be kept in mind.
  • Untrusted software should not have access to a user's credentials, in case the software is compromised, passwords will not be stolen, as user's passwords are not stored by the software.
  • Untrusted software should not have full access to a user's account, but only limited access as defined by the user. In the same vein, giving the software user's personal credentials will allow unlimited access to a user's account, which could be used maliciously.
  • A user should be able to revoke the permission they granted to allow particular untrusted software to work, in case they turn malicious, despite the limited access they have.
A solution that was developed for this use-case was created - OAuth. OAuth's primary use-case is the one described above.

Implementation

OAuth is generally implemented in a fashion where the untrusted software is accessed by a user via a standard web browser. In order for a user to authorize that software, the software will redirect the user's browser to a page on the social networking site, with a couple of parameters sent to it. Now that the user is on his social networking site, the untrusted software no longer has control over what the user is doing, as the user left the untrusted software, allowing the user to safely log in to his social networking account. Then, the social networking site will see the parameters it was passed, and ask the user if he or she wants to authorize the software in question, and what kind of access to the user's account it should be given.

If and when the user has authorized the untrusted software, the social networking site can then report back to the untrusted software that it was granted a certain amount of access, and give it some credentials to use. These credentials are unique to the user and the software in question. The social networking site allows for a user to later on see a list of software authorized, and revoke the unique credentials given to any one of them, or modify the amount of access a particular set of credentials has.

Problems

There is no standard

Now above, I said OAuth is generally implemented in this fashion. I say generally, because unlike standard HTTP authentication schemes (Basic, Digest, and others), OAuth is a big grab bag of ideas which can be mixed and matched in an infinite amount of ways, and also allows for developers to make their own unique tweaks to their personal implementations.

With the standard HTTP authentication schemes, once a developer knows the protocol, and implemented it with one web site, that exact same knowledge can be reused to logging into any other web site that supports the standard. Likewise, software libraries can be made to handle HTTP authentication, and all a third party developer needs to do is specify to the library which credentials should be used, and then everything works as expected.

With OAuth, once a developer learns how to authenticate with it to one web site, it is likely that the same developer will need to relearn how to connect to every other web site using OAuth. This further means that every web site which supports OAuth needs to document exactly how it is implementing it, and what tweaks are in use. It also means that no library can be written which can simply support every OAuth implementation out there. This places a great burden on developers on both sides. It can also greatly increase frustration for less able users, when their favorite library works great with one site they support, but are unable to extend their software to another site, because their library lacks some aspect of OAuth for this new site, or a unique OAuth tweak on this new site renders the library incompatible.

Here are some choice quotes from the official RFC:
  • However, as a rich and highly extensible framework with many optional components, on its own, this specification is likely to produce a wide range of non-interoperable implementations.
  • This framework was designed with the clear expectation that future work will define prescriptive profiles and extensions necessary to achieve full web-scale interoperability.

Another issue is that while various standard HTTP authentication schemes have well understood security margins, OAuth is entirely variable. On one web site, its OAuth implementation may be secure, while on another, its implementation may be Swiss cheese. Even though a security consultant should generally look over any authorization implementation to ensure mistakes were not made, laymen can have a good understanding how reliable and secure their standardized authentication scheme is. A manager can ask their developers if some bullet points were adhered to, and then be reasonably confident in their security. Whereas with OAuth, the entire (complex) implementation needs to be reviewed from top to bottom by a top security professional to ensure it is secure. A manager has no adequate set of bullet points to discuss with their developers, as unique implementation details will drastically change the applicability of various points, with many important details still missing. At best, a manager can only get a false sense of security when OAuth is in use. The designers of OAuth respond to this point with a 71 page document of security issues that need to be dealt with!

What all this boils down to, is that OAuth is really just a set of ideas and recommendations on how to implement a unique authorization scheme. OAuth does not allow for interoperability as standards (usually) guarantee. Essentially, OAuth ensures that API authentication with your web site will be confusing, and will only work for the exact use-case for social networking sites described above.

Crippling Design - APUI

The common OAuth use-case described above is to allow for a user on one web site to allow their software to communicate for them with another web site. The workflow described above requires that a user navigate between web sites using their browser and manually enter their credentials and authorization to various aspects of their account as part of the overall software authorization process.

This means that OAuth (at least how it's normally implemented) only works between two web sites, with individual users, and that user intervention is required in order for software to authenticate with another web site. The fact that user intervention is required with manual input means that any API behind OAuth is not an API - Application Programming Interface, but an APUI - Application Programming User Interface, meaning user intervention is required for functionality. All in all, this cripples your API, or APUI as it should now properly be called.

Let us now focus on what cannot be properly done with OAuth in place:
  • Third party software which is not part of a web site cannot interact with an OAuth APUI.
  • One user in third party software cannot act on behalf of another user via an OAuth APUI.
  • Third party software cannot run automated processes on an OAuth APUI.
  • Organizations cannot ensure tight integration between various software and services they use.
Let us review an example case where a company launches a new online service for coordinating schedules and personal calendars between people. We'll call this new online service Calendar. The developers of Calendar create an APUI for it using OAuth, so third party software can integrate with Calendar.

As described above, a user needs to navigate between one web site and another in order to authorize software, with the two web sites sending each other information. What if one wants to integrate something which isn't a web site with Calendar? There's tons of desktop calendaring applications, Microsoft Outlook, Mozilla Lightning, Evolution, KOrganizer, Kontact, and more. If their developers want to integrate with the Calendar service, they need to embed a web browser in their applications now?

Furthermore, that may not even help, if the OAuth workflow requires information be sent to an existing web site, what web site is associated with a user's personal copy of software? Developers of that software are unlikely to create a web site with user accounts just for this integration. Even if they did, if the site goes down (temporarily), then the integration stops working. Even though, in theory, there should be no dependance of some extra web site between Calendar and the software trying to integrate with it.

Insecure

Also, as a security consideration, if an application does embed a web browser in it so it can authenticate with OAuth, the first requirement is no longer met - Untrusted software should not have access to a user's credentials. When using a standard web browser, once a user leaves the third party software's site and redirects to Calendar's site, the third party software cannot steal the information the user is entering. But when the third party software itself is the one which browses to Calendar's site, then it has access to everything the user is entering, including passwords.

Actually, this attack works on OAuth in every circumstance, including web site to web site, and I've actually used it in practice. A web site can embed a web browser via a Java Applet or similar, or have a web browser server side which presents the OAuth log in page to the user, but slightly modified to have all the data entered pass through the third party site. Therefore OAuth doesn't even fulfill its own primary security objective!

Incompatible with Enterprise

Next, once we go to the enterprise level, OAuth starts becoming much worse. Say a boss has his secretary manage his calendar for him, which is a very common scenario. In many OAuth setups, he cannot enter his Calendar credentials into the company-wide calendaring software running on a secure company server, which the secretary can then access. Rather, he would need to enter it directly in the browser the secretary uses, and stay there checking off many options. Also it is common that OAuth implementations are using security tokens which expire, meaning the boss will need to keep reentering his Calendar credentials again and again. Most bosses will just get fed up and give his secretary his credentials, especially if he's not physically near his secretary. It is also likely that the secretary will then write the password down. This gets compounded if multiple secretaries manage his schedule.

Now say an organization has software for which it would like to run background processes for all its users with Calendar. Perhaps every week it would like to automatically analyze the Calendar accounts of all department chiefs to find a good time for a weekly meeting. How would this work exactly? Every department chief would now have to go and enter their credentials into this software? Then do it again each time security tokens expire from past authentications?


Of course this is only the beginning. Since OAuth was designed for the likes of Twitter and Facebook which cater to individual personal accounts, the common implementations do not allow for hierarchical permissions as enterprise organizations need.

In enterprise infrastructure, software is already used which has well defined roles and access rights for ever single user. This infrastructure should be deciding who can do what, and to what level one user can act on behalf of another user. Once an organization purchases accounts for all their employees from Calendar, no employee should be able to turn off or limit what the enterprise management software can do with their Calendar account. The enterprise management software is the one who needs to make such decisions. This flies in the face of Untrusted software should not have full access to a user's account, but only limited access as defined by the user and A user should be able to revoke the permission they granted to allow particular untrusted software to work.

The amount of work involved to tightly integrate Calendar with existing infrastructure is also enormous from a user perspective. Every single account now has to have a user navigate across web pages and check all kinds of boxes until all the users are integrated. OAuth implementations generally forget to have administrator accounts which can do everything on behalf of other users in their organization.

It should be obvious at this point whether enterprise organizations would be willing to purchase accounts for their entire staff with Calendar, when integration into their existing infrastructure or new endeavors with the service will be difficult if not outright impossible. Imagine how much money Calendar sales now stand to lose from being unattractive to enterprise clients.

Repel third party developers

OAuth implementations also generally require that any software which uses their APUIs be preregistered in advance. This places extra burden on third party developers. The OAuth implementations also commonly require that the third party web site integrating be preregistered in advance, so you can say goodbye to your application being compatible with staging servers which may be at another URL. It also makes software much less attractive when one party sells software to another party, as every new client now needs to go through an application registration process for the URLs they plan to setup the software they are purchasing. Therefore third party developers are less likely to be interested in selling software which integrates with Calendar, as there's hassle involved with every single sale.

Recap

  • OAuth is not a standard, but a set of ideas, which does not allow third party developers to reuse knowledge, and places extra documentation burden on implementers.
  • OAuth as a whole has undefined security characteristics.
  • OAuth doesn't properly fulfill its primary security objectives.
  • OAuth doesn't work well outside social networking web site use-cases.
  • OAuth services are unattractive to enterprise organizations looking to integrate such services into their infrastructure.
  • OAuth services are less likely to have professional third parties sell software based upon them.


Solutions


If you're looking to implement authorization for your API, I recommend to sticking with well understood secure designs, such as HTTP Basic Authentication over SSL/TLS (or HTTP Digest Authentication).

In order to achieve a situation where users can securely authorize third party software, without giving over their personal credentials (passwords), I recommend that these services have a page where they can generate new credentials (keys) which the user can copy and paste. They can then name these keys themselves (avoiding application registration hassle), and set permissions upon them themselves. Since the user is the one initiating the key creation, and copying and pasting it themselves, they cannot fall prey to a man-in-the-middle attack where the third party software initiates the authorization process.

But remember the use-cases described here, and ensure that organizations have a way to access all user accounts company-wide, and without individual users being able to disable or limit that access.

Conclusion


If you want your service to be used by everyone out there, be well supported by third parties, and to have them create all kinds of interesting software with it, do not use OAuth.

Even the original social networking sites behind OAuth decided they really need other options for different use-cases, such as Twitter's xAuth, or Yahoo offering Direct OAuth, which turns the entire scheme into a more complicated version of HTTP Basic Authentication, with no added benefits. Perhaps the most damaging point against OAuth, is that the original designer behind it decided to remove his name from the specification, and is washing his hands clean of it.

I find it really amazing at how blind many big players are these days to all the problems with OAuth. When I first heard that IBM, a major enterprise player started offering services only accessible with typical utterly crippled OAuth, I was overcome with disbelief. Yet at the same time, I hear that they're wondering why they're not seeing the sales they used to with other services they offer, or compared to the competition.

I'm even more amazed to see other big companies throwing away their currently working authentication systems for OAuth. Followed by them wondering why many third party developers are not upgrading to support the new authentication scheme, and clients jumping ship to inferior services.

It seems many developers and managers out there simply don't get it. If you know anyone like that, show them this article.