Tuesday, July 20, 2010


Time to shutdown the factory for code violations



There's a common problem in software development regarding how to create an object's type dynamically in C++ and similar languages. If I have a collection of classes (they all inherit from the same base), I can use a base pointer to any instance, and perform operations as needed. The issue is generating that pointer in the first place.

Say I have an image class, with subclasses of image_png, image_jpeg, image_gif, and so on. I can use an image pointer along with its virtual functions to do an operation like image_pointer->resize(new_dimensions), and have it do whatever needs to be done for the image type in question to resize it and write the data properly.

This idea applies to all kinds of projects over and over again. Another example could be a compression class. Where I have the class compress, along with subclasses compress_zip, compress_gzip, compress_7zip, and so on. I can do compress_pointer->uncompress(in_file, out_file), and have that work appropriately with virtual functions.

However my image pointer and my compress pointer need to be pointing at the right kind of object for this to work in the first place. My image is sent over the network, I have information at runtime of image/jpeg and the binary data, and I need to load that into the correct subclass image_jpeg. My compressed file is on the hard drive, I read its type via its extension or its header, again I need my pointer to point at the appropriate subtype. I don't have this information at compile time, so how do I initialize my pointer?

In this case, the idea of virtual constructors, or factories come into play. The simplest factory would be in the form of an if/else if structure, or a switch.
image *image_factory(const char *mime_type, const uint8_t *binary_data, size_t size)
{
  image *image_pointer;
  if (!strcmp(mime_type, "image/jpeg")) { image_pointer = new image_jpeg(binary_data, size); }
  else if (!strcmp(mime_type, "image/png")) { image_pointer = new image_png(binary_data, size); }
  else if (!strcmp(mime_type, "image/gif")) { image_pointer = new image_gif(binary_data, size); }
  else { throw&nbsp"Unsupported image type"; }
  return image_pointer;
}

compress *compress_factory(COMPRESS_TYPES requested_save_format)
{
  compress *compress_pointer;
  switch (requested_save_format)
  {
    case COMPRESS_ZIP: compress_pointer = new compress_zip; break;
    case COMPRESS_GZIP: compress_pointer = new compress_gzip; break;
    case COMPRESS_7ZIP: compress_pointer = new compress_7zip; break;
    default: throw std::runtime_error("Unknown enum value");
  }
  return compress_pointer;
}


Now the above functions allow us to get the base pointer initialized appropriately, but they have some drawbacks.

  • A single function needs to know about every type.

  • If which types are supported changes during runtime, perhaps based on which DLLs are found, the factory function becomes more complex, and has to be specifically tailored for each application.

  • If there are multiple constructors, and in some cases one is needed, and in some cases another, suddenly we need to recreate the same logic in multiple factory functions, one factory function per constructor.

  • This style of factory function has to be recreated for every single collection of classes present, the logic behind a factory function is not abstracted away.


Based on these drawbacks, obviously such a method is the incorrect solution, even though it is the most popular method used.

In the case where we have variables mapping to other variables, such as 1 = "Apple", 2 = "Orange", 3 = "Tomato", the obvious solution would be to setup an array or use a map. If the mapping needed is between an integer and a variable, going from integer -> variable is simply a matter of indexing into an array. When dealing with non integer types, the standard std::map fits the bill rather nicely, as it can allow any type to become an index.

So the question becomes, can we map a variable to a type itself? With a bit of bootstrapping, the answer is actually yes! What we need are function pointers to replace code logic.

We can't create a function pointer directly to a constructor, so we'll need a bit of bootstrapping. We can create function pointers to global functions, or static member functions.

In our compression case we can do the following:
static compress *compress_zip::construct() { return new compress_zip; }
static compress *compress_gzip::construct() { return new compress_gzip; }
static compress *compress_7zip::construct() { return new compress_7zip; }


We'll place a static construct() within each of our classes we'd like to be able to instantiate dynamically, and then we can use them in a map like so:
std::map<COMPRESS_TYPES, compress *(*)()> compress_factory;
compress_factory[COMPRESS_ZIP] = compress_zip::construct;
compress_factory[COMPRESS_GZIP] = compress_gzip::construct;
compress_factory[COMPRESS_7ZIP] = compress_7zip::construct;


Once we have a map in place that can be constructed at runtime, we no longer have most of our above issues. No canned function needs to be created which knows about all the supported types in advance, which needs modification for each new type. Values are only added to the map that the required DLLs were found for it, or other runtime constraints. The entire idea is also abstracted away, and we don't need to constantly recreate new factory functions for each family of objects. We simply just initialize a factory object and make use of it.

Now how exactly do we use our map? The idea which is rather popular and expounded in book after book, is to create a factory template class something similar to the following:
template <typename identifier, typename product, typename function>
struct factory
{
  void set_mapping(const identifier &id, function f) { mapping[id] = f; }
  product *construct(const identifier &id)
  {
    function f = mapping.at(id);
    return f();
  }

  private:
  std::map<identifier, function> mapping;
};


Now that we have all that encapsulated, we can do the following:
factory<COMPRESS_TYPES, compress, compress *(*)()> myfactory;
myfactory.set_mapping(COMPRESS_ZIP, compress_zip::construct);
...
compress *compress_pointer = myfactory.construct(requested_save_format);


However, such a method has a serious drawback. What if my constructor needs parameters? As it does in the image example above? Do I end up creating a new factory encapsulation class each time? One library out there solves this problem with ~800 lines of code (read Modern C++ Design for more info). Basically the construct function in the factory class is overloaded a bunch of times with template parameters over and over again, for some conceivable amount of parameters a function might take.
AbstractProduct *CreateObject(const IdentifierType &id,
                              Parm1 p1, Parm2 p2, Parm3 p3, Parm4 p4, Parm5 p5, Parm6 p6, Parm7 p7, Parm8 p8)
{
  typename IdToProductMap::iterator i = associations_.find(id);
  if (i != associations_.end()) { return (i->second)(p1, p2, p3, p4, p5, p6, p7, p8); }
  return this->OnUnknownType(id);
}

The above snippet is repeated 16 times in that library with 0-15 parameters. First of all, such code commits the unforgivable sin of code repetition, and is limited to 15 parameters. What if I need more? Sin more?

The entire idea of a factory which produces an object within itself needs to be shutdown. By trying to be overly clever and encapsulate more than needed, object factories turn into nothing but bloat and horrible code.

Simplicity is king. Therefore, the map should NOT be encapsulated, nor should an object be directly produced. Our so called factory should only produce the appropriate mold needed to form our object.

In the case of images:
static image *image_jpeg::construct(const uint8_t *binary_data, size_t size)
{
  return new image_jpeg(binary_data, size);
}

static image *image_png::construct(const uint8_t *binary_data, size_t size)
{
  return new image_png(binary_data, size);
}

static image *image_gif::construct(const uint8_t *binary_data, size_t size)
{
  return new image_gif(binary_data, size);
}

std::map<std::string, image *(*)(const uint8_t *, size_t)> image_factory;
image_factory["image/jpeg"] = image_jpeg::construct;
image_factory["image/png"] = image_png::construct;
image_factory["image/gif"] = image_gif::construct;


Now to create an image, simply do the following when you want to create an image based on run time variables:
image *image_pointer = image_factory.at(mime_type)(binary_data, size);

Notice that code would take the place of:
image *image_pointer = new image_png(binary_data, size);


Now the creation of a particular object at runtime looks pretty much like creating a specific object, and can feel natural. Just like new which throws an error when it fails (which can happen here too), std::map::at() throws an error when the requested index is not found, so you can reuse the same type of exception handling you're already using when creating objects in general.

If you'd like a complete example which you can play with yourself, try this:
#include <iostream>
#include <string>
#include <map>
#include <stdexcept>
#include <cstdlib>

class Base
{
  int x, y;
  public:
  Base(int x, int y) : x(x), y(y) {}
  virtual ~Base() {};
  virtual int mult() { return x*y; }
  virtual int add() { return x+y; }

  static Base *create(int x, int y) { return new Base(x, y); }
};

class Derived : public Base
{
  int z;
  public:
  Derived(int x, int y) : Base(x, y), z(1) {}
  void setZ(int z) { this->z = z; }
  virtual int mult() { return Base::mult()*z; }
  virtual int add() { return Base::add()+z; }

  static Base *create(int x, int y) { return new Derived(x, y); }
};

int main(int argc, const char *const *const argv)
{
  //Get factory ready
  std::map<std::string, Base *(*)(int, int)> factory;

  //Add some rules
  factory["base"] = Base::create;
  factory["derived"] = Derived::create;

  try
  {
    //Instantiate based on run-time variables
    Base *obj = factory.at(argv[1])(4, 5); //Note: instead of new Base(4, 5) or new Derived(4, 5)

    //Set Z if derived and requested
    if (argc > 2)
    {
      if (Derived *d = dynamic_cast<Derived *>(obj)) { d->setZ(std::atoi(argv[2])); }
    }

    //Output
    std::cout << obj->mult() << ' ' << obj->add() << std::endl;

    //Cleanup
    delete obj;
  }
  catch (const std::exception &e) { std::cout << "Error occured: " << e.what() << std::endl; }

  return 0;
}


Here's what the output looks like:
/tmp> g++-4.4 -Wall -o factory_test factory_test.cpp
/tmp> ./factory_test
Error occured: basic_string::_S_construct NULL not valid
/tmp> ./factory_test base
20 9
/tmp> ./factory_test derived
20 10
/tmp> ./factory_test derived 6
120 15
/tmp> ./factory_test cows
Error occured: map::at
/tmp>


Basically the only issue we're left with is that for each class you want to be able to use a factory for, you have to create a static member function for each constructor you want to use with a particular factory. I don't think there is away around that. It's a minor inconvenience, but in the end offers a lot of power in a clean fashion.

This idea of combining a map with a function pointer can be expanded to other areas as well. You can use this method for calling functions in general based on any runtime input. If your class has many different member functions, each with the same parameters, but for different operations, consider creating a map to member function pointers, and use the input as an index into a map to call the appropriate member function.

Suggestions, improvements, and comments welcome.

12 comments:

AndreasFrische said...

c.s. student here. Really appreciate your blog

LB said...

> ? One library out there solves this problem with ~800 lines of code.

Ha-ha-ha... This is actually Alexandrescu's Loki :-) But I am on your side: I also think that excessive templatization and clever template tricks really add more problems (because of obfuscation) than they solve.

cottonvibes said...

Hi, this was an interesting article.

However I have a question, you said the typical factory implementation has this problem:
"If there are multiple constructors, and in some cases one is needed, and in some cases another, suddenly we need to recreate the same logic in multiple factory functions, one factory function per constructor."

Doesn't your method also have the same problem:
"std::map<std::string, image *(*)(const uint8_t *, size_t)> image_factory;"

Specifically in that example image_factory will only take those 2 parameters; but what if we have a constructor taking 3 paramters?
Wouldn't we need to use a seperate map...

Futhermore, using a function as a factory has the ability to be overloaded with the different parameters. While if you end up having different map-factories, you have to come up with different identifier names which could be undesirable.

The last limitation is the map-version has to be generated (which could cause problems in a threaded environment), while using a typical function-based factory will be thread safe.

The map-version is a nice idea for a runtime generated factory that depends on factors not available at compile-time; but it seems like the function-based factory still has its own advantages.

LB said...

The last limitation is the map-version has to be generated (which could cause problems in a threaded environment), while using a typical function-based factory will be thread safe.

You can always init map during init. of global objects.

cottonvibes said...

You can always init map during init. of global objects.

Well that still can have problems which the typical function-factory doesn't have.

What if you have a global class/struct that needs to use the factory in its constructor? Now you either have to code work-arounds, or you can just stick to the typical function-factory.
(global object initialization order is not defined between different translation units, so its not garanteed that the map will be initialized when used by another global object's constructor)

LB said...

What if you have a global class/struct that needs to use the factory in its constructor?

Singleton will help.

cottonvibes said...

Singleton will help.

Sure as I said there's work-arounds, but that ends up being a lot more code than a simple function-factory which doesn't have this problem.

Essentially my point is, the function-factory could end up being the better way to go depending on the needs.
The map-factory is indeed a good alternative to know about, but it does have its own drawbacks.

When I look at the pros and cons between the two, it seems to me that the function factory is the way to go when you're dealing with a simple factory that won't change at runtime.
While the map-factory is the way to go when you need versatility and runtime generated content.

insane coder said...

Hi cottonvibes.

"If there are multiple constructors, and in some cases one is needed, and in some cases another, suddenly we need to recreate the same logic in multiple factory functions, one factory function per constructor."

>Doesn't your method also have the same problem?

No, because it's not a function. You can map each type of constructor wrapper individually.

If you really want to use a single map, you can do it, but you'll need more boot strapping work, and use functionoids.

>The last limitation is the map-version has to be generated (which could cause problems in a threaded environment), while using a typical function-based factory will be thread safe.

I'm not at all sure what you're referring to. You trying to initialize the map from two threads at once? If so, use a mutex just like you would for any similar programming problem.

>it seems like the function-based factory still has its own advantages.

Not really. In fact, it's a bad design idiom. Don't use it. If you need more info, read Modern C++ Design Chapter 8.

>What if you have a global class/struct that needs to use the factory in its constructor?

You should NEVER be making global class instances in your program. See Effective C++ 3rd Edition Item #4.

insane coder said...

Okay, I take back what I said above about using a functionoid, there are better and simpler ways.

Method 1:

Make use of an std::pair or an std::tuple.

std::map<std::string, std::pair<image *(*)(const uint8_t *, size_t), image *(*)(const uint8_t *, size_t, 3rd_param_type)>> image_factory;
image_factory["image/jpeg"] = std::make_pair(image_jpeg::construct, image_jpeg::construct2);

And so on. Then add a .first or .second as needed.


Method 2:

Make each construct function take a variable amount of arguments.

Create a macro which pulls out the arguments, which sees how many there are, and use if/else if structure to call return new TYPE(params). Throw an error if a match isn't found.

Use the macro in each construct function.

Now when using the function pointer returned, pass as many parameters as you like.

cottonvibes said...

Hi Insane Coder, thanks for the reply.

I think your "Method 1" idea is a pretty interesting solution to the problem. I think its the nicer option compared to the alternatives.

"You trying to initialize the map from two threads at once? If so, use a mutex just like you would for any similar programming problem."

Sure, I'm not saying there aren't solutions to this problem; but the problem doesn't even exist in a simple function-factory, so its worth mentioning...

"You should NEVER be making global class instances in your program. See Effective C++ 3rd Edition Item #4."

Well I don't have that book, nor do I plan on actively going out to get it.
Sure global class instances are usually a bad idea, but I don't like the word "NEVER"...

insane coder said...

>Sure, I'm not saying there aren't solutions to this problem; but the problem doesn't even exist in a simple function-factory, so its worth mentioning...

The problem you're describing only arises in certain cases if you're not careful, and is easily solved. A factory function has issues which are insurmountable.

>Well I don't have that book, nor do I plan on actively going out to get it.

That's a shame. It's a bad idea to write a lot of C++ code without being familiar with the material in that book.

>Sure global class instances are usually a bad idea, but I don't like the word "NEVER"...

Replace usually with ALWAYS.

Creating a global class can many times run into the issue of it being used before it is initialized.

It can introduce random runtime undefined behavior if there are dependencies involved.

It prevents dynamic loading.

It breaks tools which do interesting things by hijacking main. Be it debugging, or initialization routines that replace code blocks with the appropriate optimized version for the CPU in question, or anything along those lines.

Sure you can get away with creating a global object if you don't have too many of them most of the time, but do you really want to run into issues, or prevent your program from interacting with any useful tools that can't have pre main constructors called?

There's also a major issue with deconstruction of global objects which can in certain C libraries cause a crash on exit if a deconstructor makes use of another object. See the C/C++ errata.

There's other issues too which I'm sure you can find discussed elsewhere.

So the short answer is, yes, NEVER use global objects. LB is correct that a singleton should be used. It's not a workaround, you should use a singleton with every global object to avoid making it a global object. ALWAYS. A Meyer's Singleton is usually good, unless you have objects which depend on each other in the deconstructor.

And since you brought up threads before, global objects with threads is even worse. They're trying to resolve some of those issues for C++-201x.

insane coder said...

A follow up has been posted.

It resolves the drawbacks described in this article, as well as offers a solution to one of the problems mentioned in the comments.