Saturday, July 24, 2010


Simplifying bootstrapping for virtual constructors



Last week I demonstrated a solution to the virtual constructor problem. My solution avoids many issues with the factory function solution. Yet it did require some bootstrapping to use.

The bootstrapping required a new function to be created for every single derived class that needs to be virtualized. When working with many derived classes, this becomes unacceptable. It's bad enough that we need to generate a map to solve this problem; should we have to create additional functions as well? Each time a new derived class is added, should I have to go out of my way with two steps?

It turns out that by making use of templates, we can combine the definition and function generation steps.

//Each construct() below is declared static inside its class
compress *compress_zip::construct() { return new compress_zip; }
compress *compress_gzip::construct() { return new compress_gzip; }
compress *compress_7zip::construct() { return new compress_7zip; }


Instead of creating the above, and using it as follows:

std::map<COMPRESS_TYPES, compress *(*)()> compress_factory;
compress_factory[COMPRESS_ZIP] = compress_zip::construct;
compress_factory[COMPRESS_GZIP] = compress_gzip::construct;
compress_factory[COMPRESS_7ZIP] = compress_7zip::construct;


First create a single construct template function:

template <typename T>
compress *compress_construct()
{
  return new T;
}


This has to be done only once.

Now when adding to the map, we can do the following:

std::map<COMPRESS_TYPES, compress *(*)()> compress_factory;
compress_factory[COMPRESS_ZIP] = compress_construct<compress_zip>;
compress_factory[COMPRESS_GZIP] = compress_construct<compress_gzip>;
compress_factory[COMPRESS_7ZIP] = compress_construct<compress_7zip>;


If some new type now comes along, simply add it with a single line. The template generates a new function the first time it is used with that type, so you no longer have to write one yourself. Now we have truly managed to map a type directly to an identifier.
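To make the compress snippets concrete, here is a minimal sketch of how adding a type could look. The compress base class with its name() member, the enum values, the compress_bzip2 class, and the make_compress_factory() helper are all assumptions made up for illustration, since the real classes aren't shown in this post.

```cpp
#include <map>
#include <string>

// Hypothetical identifiers; the real COMPRESS_TYPES enum isn't shown in this post
enum COMPRESS_TYPES { COMPRESS_ZIP, COMPRESS_GZIP, COMPRESS_BZIP2 };

// Hypothetical stand-in for the real compress hierarchy
struct compress
{
  virtual ~compress() {}
  virtual std::string name() const = 0;
};

struct compress_zip   : compress { std::string name() const { return "zip"; } };
struct compress_gzip  : compress { std::string name() const { return "gzip"; } };
struct compress_bzip2 : compress { std::string name() const { return "bzip2"; } };

// Written once; the compiler generates one function per type it is used with
template <typename T>
compress *compress_construct()
{
  return new T;
}

typedef std::map<COMPRESS_TYPES, compress *(*)()> compress_factory_t;

// Hypothetical helper that fills the map
compress_factory_t make_compress_factory()
{
  compress_factory_t compress_factory;
  compress_factory[COMPRESS_ZIP]   = compress_construct<compress_zip>;
  compress_factory[COMPRESS_GZIP]  = compress_construct<compress_gzip>;
  compress_factory[COMPRESS_BZIP2] = compress_construct<compress_bzip2>; // The new type: one line
  return compress_factory;
}
```

Calling make_compress_factory()[COMPRESS_BZIP2]() would then hand back a freshly allocated compress_bzip2 through a compress pointer, which the caller is responsible for deleting.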

Of course, no tutorial would be complete without a self-contained example:

#include <exception>
#include <iostream>
#include <map>
#include <string>

class base
{
  std::string n;

protected:
  base(const std::string &n) : n(n) {}

public:
  base() : n("base") {}
  virtual ~base() {} //Needed to safely delete derived objects through a base pointer
  std::string name() const { return(n); }
};

struct derived : public base
{
  derived() : base("derived") {}
};

template <typename T>
base *construct()
{
  return new T;
}

int main(int argc, const char *const *const argv)
{
  std::map<std::string, base *(*)()> factory;
  factory["b"] = construct<base>;
  factory["d"] = construct<derived>;

  if (argc == 2) //Make sure an id was actually passed
  {
    try
    {
      //Instantiate based on run-time variables
      base *obj = factory.at(argv[1])();

      //Output
      std::cout << obj->name() << std::endl;

      //Cleanup
      delete obj;
    }
    catch (const std::exception &e) { std::cout << "Error occurred: " << e.what() << std::endl; }
  }

  return 0;
}


Output:

/tmp> g++-4.4 -Wall -o factory_test factory_test.cpp
/tmp> ./factory_test b
base
/tmp> ./factory_test d
derived
/tmp>

Now that this problem has been solved nice and neatly, what about solving it for multiple constructors, also known as the abstract factory problem? What if each class has multiple constructors, and we want a collection of them mapped to a single identifier? Can we do it without repeating a lot of code over and over?

With some minor bootstrapping, the answer is again yes! There are multiple solutions to this problem, but the following is the nicest I've found so far.

First create an abstract class with a pure virtual function to match each constructor you'd like to virtualize. Each should of course return a base pointer.

Imagine we had 3 constructors: one taking no parameters, one taking a C string, and another taking a C++ string. We would set up the following:

struct construct_interface
{
  virtual base *operator()() const = 0;
  virtual base *operator()(const char *) const = 0;
  virtual base *operator()(const std::string &) const = 0;
};


Once we have the interface defined, we'll create a template construct function, similar to the construct function above, which implements that interface in a singleton and returns it:

template <typename T>
const construct_interface *construct()
{
  static struct : public construct_interface
  {
    base *operator()() const { return new T; }
    base *operator()(const char *s) const { return new T(s); }
    base *operator()(const std::string &s) const { return new T(s); }
  } local;
  return &local;
}


It'd be nice to return a reference instead of a pointer, but dealing with references in std::map is kind of icky. We can use macros to clean up any annoying pointer dereferencing issues that could arise.

Now our construct function returns a pointer to an interface which can construct a derived type using any of its constructors. We'll use it with an std::map like so:

std::map<std::string, const construct_interface *> factory;
factory["base"] = construct<base>();
factory["derived"] = construct<derived>();


Notice the map no longer tracks function pointers, but pointers to the interface. Also, when assigning to the map, we're calling the construct function to obtain the pointer. We could modify the above example to track function pointers and leave out the (), and even make the construct function return a reference to an interface instead, but then we'd need to add an extra () when creating objects. That too can be hidden by a macro, or just ignored, as an extra () still looks rather clean. However, it adds the overhead of an extra function call to every object creation, and it is likely you'll initialize your map just once, and use it to create many objects during the lifetime of the program.
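For comparison, the alternative just described could be sketched roughly as follows. The interface is trimmed to a single constructor to keep it short, and the names construct_ref, ref_factory_t, make_ref_factory and create are made up for this illustration.

```cpp
#include <map>
#include <string>

struct base
{
  virtual ~base() {}
  virtual std::string name() const { return "base"; }
};

struct derived : base
{
  std::string name() const { return "derived"; }
};

// Trimmed to the default constructor only
struct construct_interface
{
  virtual base *operator()() const = 0;
};

// Variant: returns a reference to the singleton instead of a pointer
template <typename T>
const construct_interface &construct_ref()
{
  static struct : public construct_interface
  {
    base *operator()() const { return new T; }
  } local;
  return local;
}

// The map stores the construct functions themselves, not their results
typedef std::map<std::string, const construct_interface &(*)()> ref_factory_t;

ref_factory_t make_ref_factory()
{
  ref_factory_t factory;
  factory["base"] = construct_ref<base>;       // No () when registering...
  factory["derived"] = construct_ref<derived>;
  return factory;
}

base *create(const ref_factory_t &factory, const std::string &id)
{
  // ...so an extra () is needed when creating: the first call fetches
  // the interface, the second one constructs the object
  return factory.at(id)()();
}
```

No dereferencing is needed here, but every creation pays for the extra call. The rest of this article sticks with the pointer version.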

Now to use the map to create an object, the following has to be done:

//Use first constructor, the default constructor
base *obj1 = (*(factory).at(id))();

//Use second constructor, the one taking a C string
base *obj2 = (*(factory).at(id))(s);


It works nicely, but as mentioned above, all that pointer dereferencing is rather ugly.

This macro can help simplify things:

#define VIRTUAL_NEW(factory, id) (*(factory).at((id)))


And unlike the interface class and template construction function, which need to be created for each set of classes making use of virtual constructors, the above macro can be reused for every virtual constructor collection that makes use of the above idiom.

Using the macro, the code now looks as follows:

//Use first constructor, the default constructor
base *obj1 = VIRTUAL_NEW(factory, id)();

//Use second constructor, the one taking a C string
base *obj2 = VIRTUAL_NEW(factory, id)(s);
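To illustrate the reuse claim, here is a sketch where the same macro serves an unrelated hierarchy with its own interface. The shape classes, shape_construct_interface, construct_shape and the two helper functions are hypothetical names invented for this example.

```cpp
#include <map>
#include <string>

#define VIRTUAL_NEW(factory, id) (*(factory).at((id)))

// Hypothetical second hierarchy, unrelated to base/derived
struct shape
{
  double size;
  explicit shape(double size) : size(size) {}
  virtual ~shape() {}
  virtual double area() const = 0;
};

struct square : shape
{
  explicit square(double size) : shape(size) {}
  double area() const { return size * size; }
};

struct triangle : shape // Right triangle with both legs of length size
{
  explicit triangle(double size) : shape(size) {}
  double area() const { return size * size / 2; }
};

// This hierarchy gets its own interface and construct function...
struct shape_construct_interface
{
  virtual shape *operator()(double) const = 0;
};

template <typename T>
const shape_construct_interface *construct_shape()
{
  static struct : public shape_construct_interface
  {
    shape *operator()(double size) const { return new T(size); }
  } local;
  return &local;
}

typedef std::map<std::string, const shape_construct_interface *> shape_factory_t;

shape_factory_t make_shape_factory()
{
  shape_factory_t factory;
  factory["square"] = construct_shape<square>();
  factory["triangle"] = construct_shape<triangle>();
  return factory;
}

double area_of(const shape_factory_t &factory, const std::string &id, double size)
{
  // ...but the macro itself is used unchanged
  shape *s = VIRTUAL_NEW(factory, id)(size);
  double result = s->area();
  delete s;
  return result;
}
```

The macro only relies on the map holding pointers to something callable, so it shouldn't care which interface is behind them.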


That's it, problem solved!

Putting it all together, here's a working example:

#include <cstdlib>
#include <exception>
#include <iostream>
#include <map>
#include <string>

class base
{
protected:
  int x, y;

public:
  base() : x(1), y(2) {}
  base(const char *s) : x(std::atoi(s)), y(3) {}
  base(const std::string &s) : x(std::atoi(s.c_str())), y(4) {}
  virtual ~base() {} //Needed to safely delete derived objects through a base pointer

  virtual int operator()() { return x+y; }
};

class derived : public base
{
public:
  derived() {}
  derived(const char *s) : base(s) {}
  derived(const std::string &s) : base(s) {}

  virtual int operator()() { return x*y; }
};

struct construct_interface
{
  virtual base *operator()() const = 0;
  virtual base *operator()(const char *) const = 0;
  virtual base *operator()(const std::string &) const = 0;
};

template <typename T>
const construct_interface *construct()
{
  static struct : public construct_interface
  {
    base *operator()() const { return new T; }
    base *operator()(const char *s) const { return new T(s); }
    base *operator()(const std::string &s) const { return new T(s); }
  } local;
  return &local;
}

#define VIRTUAL_NEW(factory, id) (*(factory).at((id)))

int main(int argc, const char *const *const argv)
{
  std::map<std::string, const construct_interface *> factory;
  factory["b"] = construct<base>();
  factory["d"] = construct<derived>();

  if (argc == 3)
  {
    try
    {
      //Instantiate based on run-time variables
      base *obj1 = VIRTUAL_NEW(factory, argv[1])();
      base *obj2 = VIRTUAL_NEW(factory, argv[1])(argv[2]);
      base *obj3 = VIRTUAL_NEW(factory, argv[1])(std::string(argv[2]));

      //Output
      std::cout << (*obj1)() << '\n'
                << (*obj2)() << '\n'
                << (*obj3)() << '\n'
                << std::flush;

      //Cleanup
      delete obj1;
      delete obj2;
      delete obj3;
    }
    catch (const std::exception &e) { std::cout << "Error occurred: " << e.what() << std::endl; }
  }

  return 0;
}


Output:

/tmp> g++-4.4 -Wall -o abstract_factory_test abstract_factory_test.cpp
/tmp> ./abstract_factory_test b 2
3
5
6
/tmp> ./abstract_factory_test b 3
3
6
7
/tmp> ./abstract_factory_test d 2
2
6
8
/tmp> ./abstract_factory_test d 3
2
9
12
/tmp>

Hopefully you should now be able to take this example and plug it in just about anywhere. Easy to add new types. No need to modify existing classes. Fully dynamic. Easy to use!

This method really shines if you dynamically load derived classes while your program is running. Just make sure your DLL uses the template function internally, so an instance of the function for the new type is created, then dynamically add an id to the map, and presto, you're done.
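As a rough sketch of that idea, a plugin could expose a registration function which the host calls after loading it. The actual library loading (dlopen, LoadLibrary and friends) is left out here, and plugin_derived, plugin_register and factory_t are hypothetical names, not part of any real API. The interface is again trimmed to the default constructor.

```cpp
#include <map>
#include <string>

// --- Shared between the host and the plugin (e.g. in a common header) ---

struct base
{
  virtual ~base() {}
  virtual int operator()() { return 1; }
};

struct construct_interface
{
  virtual base *operator()() const = 0;
};

template <typename T>
const construct_interface *construct()
{
  static struct : public construct_interface
  {
    base *operator()() const { return new T; }
  } local;
  return &local;
}

typedef std::map<std::string, const construct_interface *> factory_t;

// --- Compiled into the plugin only; the host never sees this type ---

struct plugin_derived : base
{
  int operator()() { return 42; }
};

// Hypothetical entry point the host would look up by name after loading.
// Using construct<plugin_derived> here is what causes the function to be
// instantiated inside the plugin.
extern "C" void plugin_register(factory_t &factory)
{
  factory["plugin"] = construct<plugin_derived>();
}
```

One thing to keep in mind with this layout: the singleton and the generated code live inside the plugin, so the map entry and any objects created through it should be gone before the library is unloaded.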

Now go out there and leverage the power of C++!

2 comments:

cottonvibes said...

Nice job and interesting solutions.

The use of template functions to eliminate the need for the static-construct methods is a lot nicer; not only is it less code, but it also avoids having to modify the derived classes themselves.

insane coder said...

Modifying the derived classes with a static construct method was never required. A global function would work just as well.

"We can create function pointers to global functions, or static member functions."

I only demonstrated the latter version earlier as it keeps things in the family.