Showing posts with label parameters. Show all posts
Showing posts with label parameters. Show all posts

Sunday, October 24, 2010


Programming manuals for those who "get it"



Top secret official programming manuals for certain popular hand held devices have now been uploaded to the usual public document repositories. Some have been around for a while to those who get it, some are new.

Saturday, July 24, 2010


Simplifying bootstrapping for virtual constructors



Last week I demonstrated a solution to the virtual constructor problem. My solution avoids many issues with the factory function solution. Yet it did require some bootstrapping to use.

The bootstrapping required a new function to be created for every single derived class that needs to be virtualized. When working with many derived classes, this becomes unacceptable. It's bad enough to solve this problem we need to generate a map, should we have to create additional functions as well? Each time a new derived class is added, should I go out of my way with two steps?

Turns out, making use of templates, we can combine the define and function generation step.

static compress *compress_zip::construct() { return new compress_zip; }
static compress *compress_gzip::construct() { return new compress_gzip; }
static compress *compress_7zip::construct() { return new compress_7zip; }


Instead of creating the above, and using it as follows:

std::map<COMPRESS_TYPES, compress *(*)()> compress_factory;
compress_factory[COMPRESS_ZIP] = compress_zip::construct;
compress_factory[COMPRESS_GZIP] = compress_gzip::construct;
compress_factory[COMPRESS_7ZIP] = compress_7zip::construct;


First create a single construct template function:

template <typename T>
compress *compress_construct()
{
return new T;
}


This has to be done only once.

Now when adding to the map, we can do the following:

std::map<COMPRESS_TYPES, compress *(*)()> compress_factory;
compress_factory[COMPRESS_ZIP] = compress_construct<compress_zip>;
compress_factory[COMPRESS_GZIP] = compress_construct<compress_gzip>;
compress_factory[COMPRESS_7ZIP] = compress_construct<compress_7zip>;


If some new type now comes along, simply add it with a single line. A new function will be generated on use by the template, so you no longer have to. Now we have truly managed to map a type directly to an identifier.

Of course no tutorial would be complete without a self contained example:

#include <iostream>
#include <map>
#include <string>

class base
{
std::string n;

protected:
base(const std::string &n) : n(n) {}

public:
base() : n("base") {}
std::string name() { return(n); }
};

struct derived : public base
{
derived() : base("derived") {}
};

template <typename T>
base *construct()
{
return new T;
}

int main(int argc, const char *const *const argv)
{
std::map<std::string, base *(*)()> factory;
factory["b"] = construct<base>;
factory["d"] = construct<derived>;

try
{
//Instantiate based on run-time variables
base *obj = factory.at(argv[1])();

//Output
std::cout << obj->name() << std::endl;

//Cleanup
delete obj;
}
catch (const std::exception &e) { std::cout << "Error occured: " << e.what() << std::endl; }

return 0;
}


Output:

/tmp> g++-4.4 -Wall -o factory_test factory_test.cpp
/tmp> ./factory_test b
base
/tmp> ./factory_test d
derived
/tmp>

Now that this problem has been solved nice and neatly. What about solving it for multiple constructors? Also known as the abstract factory problem. What if each class has multiple constructors, and we want a collection of them mapped to a single identifier? Can we do it without repeating a lot of code over and over?

With some minor bootstrapping, the answer is again yes! There's multiple solutions to this problem, but the following is what I found to be the nicest at the moment.

First create a pure virtual class with a function to match each constructor you'd like to virtualize. Each should of course return a base pointer.

Imagine we had 3 constructors, one taking no parameters, one taking a C string, and another taking a C++ string, we would setup the following:

struct construct_interface
{
virtual base *operator()() const = 0;
virtual base *operator()(const char *) const = 0;
virtual base *operator()(const std::string &) const = 0;
};


Once we have the interface defined, we'll create a template construct function which implements and returns that interface within a singleton similar to the above construct function:

template <typename T>
const construct_interface *construct()
{
static struct : public construct_interface
{
base *operator()() const { return new T; }
base *operator()(const char *s) const { return new T(s); }
base *operator()(const std::string &s) const { return new T(s); }
} local;
return &local;
}


It'd be nice to return a reference instead of a pointer, but dealing with references in std::map is kind of icky. We can use macros to cleanup any annoying pointer dereferencing issues that could arise.

Now our construct function returns a pointer to an interface which can construct a derived type using any of its constructors. We'll use it with an std::map like so:

std::map<std::string, const construct_interface *> factory;
factory["base"] = construct<base>();
factory["derived"] = construct<derived>();


Notice the map no longer tracks function pointers, but pointers to the interface. Also, when assigning to the map, we're calling the construct function to obtain the pointer. We could modify the above example to track function pointers and leave out the (), and even make the construct function return a reference to an interface instead, but then we'd need to add an extra () when creating objects. While that too can be hidden by a macro, or just ignored, as an extra () still looks rather clean, it does add extra overhead, as it is likely you'll initialize your map just once, and use it to create many objects during the lifetime of the program.

Now to use the map to create an object, the following has to be done:

//Use first constructor, the default constructor
base *obj1 = (*(factory).at(id))();

//Use second constructor, the one taking a C string
base *obj2 = (*(factory).at(id))(s);


It works nicely, but as I explained above that's rather ugly.

This macro can help simplify things:

#define VIRTUAL_NEW(factory, id) (*(factory).at((id)))


And unlike the interface class and template construction function which needs to be created for each set of classes making use of virtual constructors, the above macro can be reused for every virtual constructor collection that makes use of the above idiom.

Using the macro, the code now looks as follows:

//Use first constructor, the default constructor
base *obj1 = VIRTUAL_NEW(factory, id)();

//Use second constructor, the one taking a C string
base *obj2 = VIRTUAL_NEW(factory, id)(s);


That's it, problem solved!

Putting it all together, here's a working example:

#include <iostream>
#include <map>
#include <string>
#include <cstdlib>

class base
{
protected:
int x, y;

public:
base() : x(1), y(2) {}
base(const char *s) : x(std::atoi(s)), y(3) {}
base(const std::string &s) : x(std::atoi(s.c_str())), y(4) {}

virtual int operator()() { return x+y; }
};

class derived : public base
{
public:
derived() {}
derived(const char *s) : base(s) {}
derived(const std::string &s) : base(s) {}

virtual int operator()() { return x*y; }
};

struct construct_interface
{
virtual base *operator()() const = 0;
virtual base *operator()(const char *) const = 0;
virtual base *operator()(const std::string &) const = 0;
};

template <typename T>
const construct_interface *construct()
{
static struct : public construct_interface
{
base *operator()() const { return new T; }
base *operator()(const char *s) const { return new T(s); }
base *operator()(const std::string &s) const { return new T(s); }
} local;
return &local;
}

#define VIRTUAL_NEW(factory, id) (*(factory).at((id)))

int main(int argc, const char *const *const argv)
{
std::map<std::string, const construct_interface *> factory;
factory["b"] = construct<base>();
factory["d"] = construct<derived>();

if (argc == 3)
{
try
{
//Instantiate based on run-time variables
base *obj1 = VIRTUAL_NEW(factory, argv[1])();
base *obj2 = VIRTUAL_NEW(factory, argv[1])(argv[2]);
base *obj3 = VIRTUAL_NEW(factory, argv[1])(std::string(argv[2]));

//Output
std::cout << (*obj1)() << '\n'
<< (*obj2)() << '\n'
<< (*obj3)() << '\n'
<< std::flush;

//Cleanup
delete obj1;
delete obj2;
delete obj3;
}
catch (const std::exception &e) { std::cout << "Error occured: " << e.what() << std::endl; }
}

return 0;
}


Output:

/tmp> g++-4.4 -Wall -o abstract_factory_test abstract_factory_test.cpp
/tmp> ./abstract_factory_test b 2
3
5
6
/tmp> ./abstract_factory_test b 3
3
6
7
/tmp> ./abstract_factory_test d 2
2
6
8
/tmp> ./abstract_factory_test d 3
2
9
12
/tmp>

Hopefully you should now be able to take this example and plug it in just about anywhere. Easy to add new types. No need to modify existing classes. Fully dynamic. Easy to use!

This method really shines if you dynamically load derived classes while your program is running. Just make sure your DLL uses the the template function internally, so an instance of the function for the new type is created, and dynamically add an id to the map, and presto you're done.

Now go out there and leverage the power of C++!

Thursday, June 3, 2010


Undocumented APIs



This topic has been kicked around all over for at least a decade. It's kind of hard to read a computer programming magazine, third party documentation, technology article site, third party developer mailing list or the like without running into it.

Inevitably someone always mentions how some Operating System has undocumented APIs, and the creators of the OS are using those undocumented APIs, but telling third party developers not to use them. There will always be the rant about unfair competition, hypocrisy, and other evils.

Large targets are Microsoft and Apple, for creating undocumented APIs that are used by Internet Explorer, Microsoft Office, Safari, iPhone development, and other in-house applications. To the detriment of Mozilla, OpenOffice.org, third parties, and hobbyist developer teams.

If you wrote one of those articles/rants/complaints, or agreed with them, I'm asking you to turn in your Geek Membership card now. You're too stupid, too idiotic, too dumb, too nearsighted, and too vilifying (add other derogatory adjectives you can think of) to be part of our club. You have no business writing code or even discussing programming.

Operating Systems and other applications that communicate via APIs provide public interfaces that are supposed to be stable. If you find a function in official public documentation, that generally signifies it is safe to use the function in question, and the company will support it for the foreseeable future versions. If a function is undocumented, it's because the company doesn't think they will be supporting the function long term.

Now new versions which provide even the exact same API inevitably break things. It is so much worse for APIs which you're not supposed to be using in the first place. Yet people who complain about undocumented APIs, also complain when the new version removes a function they were using. They blame the company for purposely removing features they knew competition or hobbyists were using. That's where the real hypocrisy is.

As much as you hate a new version breaking your existing software, so does Microsoft, Apple, and others. The popularity of their products is tied to what third party applications exist. Having popular third party applications break on new versions is a bad thing. That's why they don't document APIs they plan on changing.

Now why do they use them then? Simple, you can write better software when you intimately understand the platform you're developing for, and write the most direct code possible. When issues occur because of their use, it's easy for the in-house teams to test their own software for compatibility with upcoming versions and fix them. If Microsoft sees a new version of Windows break Internet Explorer, they can easily patch the Internet Explorer provided with the upcoming version, or provide a new version altogether. But don't expect them to go patching third party applications, especially ones which they don't have the source to. They may have done so in the past, and introduced compatibility hacks, but this really isn't their problem, and you should be grateful when they go out of their way to do so.

So quit your complaining about undocumented APIs, or please go around introducing yourself as "The Dumb One".

Saturday, March 27, 2010


Does anyone understand types and magnitudes?



Regularly I have to work with many popular libraries out there, as well as many libraries written by coworkers and similar.

One thing which I see constantly misused over and over again is the type system used in C and C++.

I wonder if most people understand it, have bothered to contemplate it, or simply don't care if their library is misprogrammed trash.

First of all, C/C++ for integers supports both signed and unsigned types. Many programmers seem to ignore the unsigned types altogether. Maybe some education is in order.

Each base type in C consists of one or more bytes. Each byte on modern main stream systems are 8 bits to a byte. Meaning a single byte can store 28 values, or in other words 256 possibilities. Two bytes can store 216 values, or in other words 65,536 possibilities.

A type designed for integers (whole numbers) which is unsigned will operate with the meaning of its values as ranging from 0 till amount of possibilities - 1. In the case of a one byte type: 0 to 255, for a 2 byte type: 0 to 65,535.

When a value is signed (the usual default if no signed or unsigned description is applied), a single bit is used to describe whether the rest of the bits represent a positive or negative value. This effectively cuts the amount of positive values that can be represented in half. In the case of a single byte, 27 values will be negative, -128 to -1, 27-1 values will be distinctly positive, 1 to 127, and one value will be 0 (for a total of 256 possibilities). This same math extends to as many bytes being used in a given type.

When dealing with something which requires working with both positive and negative values such as debt (monetary balance), direction, difference, relative position, and things of a similar nature, signed values is quite natural and should of course be used.

But if you're working with something which has no meaning for negative values such as "Number of students in this class", "How old are you", "Amount of chickens in the coop", "Amount of rows in database table", or "Amount of elements in this array", or amounts of any nature really, using a signed type is one of the worst things you could possibly do. You are reserving half of your possible values for a situation which will never occur (barring mistakes in programming).

Let's take a look at some real world examples. The most popular type used for at least the past decade would have to be the 32 bit integer. 232 gives us 4,294,967,296 different value possibilities. If that was a storage size, it would be 4GB. Effectively a 32 bit unsigned integer can store the values 0 to 4,294,967,295, or one byte less than 4GB. If it was signed, it would be able to store the values -2,147,483,648 to 2,147,483,6487, topping out at one less than 2GB.

If you have an Operating System or File System which limits you to 2GB as a max size for files, or your Digital Camera only lets you use up to 2GB flash cards, it's because the idiots that programmed it used signed values. They did this because either they were expecting files and flash cards of negative size, or because they were a bunch of idiots. Which of those two possibilities actually happened should be obvious.

The same goes for any programming library you see with a limit of 2GB. As soon as you hear anyone mention a limitation of 2GB, what should immediately register in your head is that idiots worked on this application/library/standard/device.

Now sometimes programmers try to justify their ignorance or their idiocy with remarks such as using negative values to indicate error conditions. Meaning, they write functions which return a positive value in case of success, and -1 in case of error. This is a very bad practice for two reasons. First of all, one shouldn't use a single variable to indicate two different things. It's okay to use a single variable where each bit (or sets of bits) indicate a flag for a set of flags. But it's very wrong to have a single value indicate success/failure and amount, or day of week and favorite color, or any two completely unrelated things like that. Doing so only leads to sloppy code, and should always be avoided. Second of all, you're cutting your amount of usable possibilities in half, and reserving a ton of values just for a single possibility. You can't be more wasteful than that.

If you need a way for your function to return success/failure and amount, there are cleaner and more effective ways to do so. Many times an amount of 0 is invalid. If a function is supposed to return the amount of something, 0 isn't really an amount. If I asked how many students are in my class, and I get 0, that in itself should be indicative of an error. If I asked how many rows did my SQL statement just change, and I get 0, that should be indicative of an error, or at the very least, there were no SQL rows to change. If I really need to know if the 0 means a real error, or there was no data to work with, a separate function can tell me if the last call was a success or failure. In C and in POSIX, it is common to use the global variable errno to indicate errors. If you're making your own library, you can make a function to tell you why the last value was 0. There was no data, there was an error accessing the data, and so on.

Another method would be to pass one parameter by pointer/reference to store the amount or success state, and have the other returned from the function. Another technique would be to return an enum which tells you which error condition happened or everything is okay, and use a different function to retrieve the value of that operation upon success. Lastly, C++ programmers can use std::pair<> for all their multiple return needs, or simply return the amount normally, and throw an exception on any kind of error.

Now that hopefully all my loyal readers are now more educated about the sign of their types, and various function return techniques that better libraries use, let's talk a bit about magnitude.

I see again and again libraries that don't have a clue about magnitude. The various C/C++ standards state the following about the normal built in types.

1) The amount of bytes used for the following types should be in this proportion: short <= int <= long <= long long.
2) short should be at least 2 bytes.
3) long should be at least 4 bytes.
4) long long should be at least 8 bytes;.

Which means all these scenarios are perfectly legal:
sizeof(short) == 2
sizeof(int) == 2
sizeof(long) == 4
sizeof(long long) == 8

sizeof(short) == 2
sizeof(int) == 4
sizeof(long) == 4
sizeof(long long) == 8

sizeof(short) == 2
sizeof(int) == 4
sizeof(long) == 8
sizeof(long long) == 8

sizeof(short) == 2
sizeof(int) == 4
sizeof(long) == 8
sizeof(long long) == 16

sizeof(short) == 4
sizeof(int) == 4
sizeof(long) == 8
sizeof(long long) == 16

sizeof(short) == 4
sizeof(int) == 8
sizeof(long) == 16
sizeof(long long) == 32

And so on.

In order to get types of a particular size, C99 added the header <stdint.h> and C++-201x added <cstdint>, with types such as int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t, and several others. Each of these types are the size in bits that their name indicates, with those without a u at the beginning being signed, and those with being unsigned. Using these types you can get the exact size you need.

But remember to use things appropriately. I see several popular SQL engines for example that support as many rows in a table as a uint64_t is able to contain, but if you ask them how many rows were altered by the last query, they'll report that information in a plain and simple int! You by now should realize there are two problems with such a course of action.

Now lastly, if you're writing some kind of function which takes an array, it is normal convention to pass a pointer to the array, as well as a size variable to specify how many elements or bytes are within the array.

The standard C library uses a type called size_t to deal with such values. A size_t can contain the highest amount of bytes the program is able to address, and is of course unsigned.

If you need to pass an array, your prototype should always be along the lines of:

void func(void *p, size_t size);
void func(uint8_t *p, size_t size);

If you need to return a size, for something which is addressable memory, again always use a size_t. Functions like strlen() and sizeof() return a size_t, and functions like memcpy(), memset(), or malloc() take a size_t as one of their arguments.

Let me show you a quick program to illustrate a point.


#include <cstdio.h>

#define O(x) printf(#x": %zu\n", sizeof(x))
int main()
{
O(signed char);
O(signed short);
O(signed int);
O(signed long);
O(signed long long);
O(signed);
O(unsigned char);
O(unsigned short);
O(unsigned int);
O(unsigned long);
O(unsigned long long);
O(unsigned);
O(size_t);
O(void *);
O(float);
O(double);
O(long double);
return(0);
}


On a 32 bit system of mine, the output is as follows:

signed char: 1
signed short: 2
signed int: 4
signed long: 4
signed long long: 8
signed: 4
unsigned char: 1
unsigned short: 2
unsigned int: 4
unsigned long: 4
unsigned long long: 8
unsigned: 4
size_t: 4
void *: 4
float: 4
double: 8
long double: 12


On a 64 bit system of mine, the output is as follows:

signed char: 1
signed short: 2
signed int: 4
signed long: 8
signed long long: 8
signed: 4
unsigned char: 1
unsigned short: 2
unsigned int: 4
unsigned long: 8
unsigned long long: 8
unsigned: 4
size_t: 8
void *: 8
float: 4
double: 8
long double: 16


Now notice for 32 bit, the void * type is 4 bytes, and on 64 bit, it is 8 bytes. This means that a void * can address any position in memory that the system is able to. In order to specify the amount, you'll notice in each case the sizeof(size_t) matches the sizeof(void *). If I were to get a 128 bit system which had a void * of 16 bytes, the size_t would also be at least 16 bytes.

There are other types which also matched the size of the void * each time around, but if you look at my earlier explanation of the relationship of the various C types, you'll notice those other types are too variable to know if they'll be good for an amount of every system you try to use your application or library on, so always stick with the size_t.

All too often I see programmers use an int for such an amount which is clearly wrong, or an unsigned int, which is better, but also wrong, especially on my 64 bit system. Some programmers I also see use the type of just unsigned by itself. I don't know who thought that up, but it's identical to an unsigned int, which is also clearly wrong. I included unsigned and signed specifically in my test above because some people are under the mistaken notion that those types become as large as possible. Every time I see code which contains one of them, I want to vomit.

If you ever have any doubt as to what size a type may be on your system, or get into an argument with someone, test things yourself with a sizeof() call, it's not hard to do, or point them to this article.

Now go out there and write some good libraries. I have too much vomit and bile on the floor here from all the horrible code I have to work with or review.

Thursday, March 11, 2010


C++-201x Variadic Templates



C++-201x (or C++-0x as near sighted individuals like to call it) introduced a new feature called variadic templates, which allows a template class or function to recieve multiple parameters to it of various types as you would normally achieve with "...".

You could continue to use stdarg if you wish, but a limitation of it is that you have no way of knowing what type each parameter is to use with va_arg(), unless you just hard code it to only work with a specific type, or pass a type list how the printf() family of functions do it, or something similar.

I always wanted a way to do func(integer, string, float, whatever), and interchange that with func(float, whatever, char), and now you can.

If you try to research this topic online, basically everyone is just writing and talking about how to implement a tuple class, or how to make a type safe printf. Lets try something a little more interesting and perhaps instructive, yet simple.


#include <iostream>
using namespace std;

//Output function to output any type without type specifiers like printf() family
template <typename T, typename ...P>
void output(T t, P ...p)
{
cout << t << ' ';
if (sizeof...(p)) { output(p...); }
else { cout << '\n'; }
}

//Since variadic templates are recursive, must have a base case
void output() { cout << '\n'; }


//Compute sum of all parameters
template <typename T, typename ...P>
T sum(T t, P ...p)
{
if (sizeof...(p))
{
t += sum(p...);
}
return(t);
}

//Since variadic templates are recursive, must have a base case
template <typename T>
T sum(T t) { return(t); }

//Test it
int main()
{
output();
output('5');
output('5', 2);
output('5', 2, "cows");
output('5', 2, "cows", -1);
output('5', 2, "cows", -1, 0.5f);
output('5', 2, "cows", -1, 0.5f, 16.3);

cout << endl;

cout << sum(1) << '\n'
<< sum(1, 2) << '\n'
<< sum(1, 2, 3) << '\n'
<< sum(1, 2, 3, 4) << '\n'
<< sum(1, 2, 3, 4, 5) << '\n';

cout << endl;

cout << sum(0.1) << '\n'
<< sum(0.1, 0.2) << '\n'
<< sum(0.1, 0.2, 0.3) << '\n';

cout << endl;

return(0);
}


Here's how to compile and what the output looks like:

/tmp> g++-4.4 -Wall -o test test.cpp -std=gnu++0x
/tmp> ./test

5
5 2
5 2 cows
5 2 cows -1
5 2 cows -1 0.5
5 2 cows -1 0.5 16.3

1
3
6
10
15

0.1
0.3
0.6

/tmp>


As can be seen I achieved my desired goals of being able to pass various types, not know the types to be passed in advance, and not need a type list passed anywhere. All the other examples I've seen seem to be forgetting these important points.

As for how this works "typename ...P" defines a template type of P which will be many parameters packed into a single variable. "P ...p" receives all these parameters in a variable named "p".

However the only things I can really do is see how many parameters are in "p" using "sizeof...(p)", or split "p" up into all its parameters, to then pass to a function by using "p...". It would be nice to be able to simply iterate through the values of "p", but that's not possible at the moment.

Using standard recursion techniques, you can pass the split up parameters from "p" to a function which has enough templated parameters as "p" is currently holding, or a lesser amount, and the last parameter can be a variadic parameter to catch all the others which don't fit into the first few. Then using those first parameters, you can access the values you need, and so on ad infinitum.

Read up on recursion if you're confused.

Now if you're still wondering what this may be useful for, just consider functions which handle data serialization or something similar.

One thing to note, at least with GCC at the moment, you always need to have a separate function as a base case, even if it's never used.

Take "output('5', 2, "cows", -1, 0.5f, 16.3);", on the last leg of iteration, when the double is passed to the first parameter to be received by "t", and "p" is empty, "if (sizeof...(p)) { output(p...); }" which is the line responsible for the recursion seeing an empty "p" won't ask for "output()", but it seems the compiler still wants it to exist:


test.cpp: In function ‘void output(T, P ...) [with T = double, P = ]’:
test.cpp:8: instantiated from ‘void output(T, P ...) [with T = float, P = double]’
test.cpp:8: instantiated from ‘void output(T, P ...) [with T = int, P = float, double]’
test.cpp:8: instantiated from ‘void output(T, P ...) [with T = const char*, P = int, float, double]’
test.cpp:8: instantiated from ‘void output(T, P ...) [with T = int, P = const char*, int, float, double]’
test.cpp:8: instantiated from ‘void output(T, P ...) [with T = char, P = int, const char*, int, float, double]’
test.cpp:33: instantiated from here
test.cpp:8: error: no matching function for call to ‘output()’


I'm not sure though if this is required by the standard, or a bug in GCC.

As for C++-201x support in general, GCC seems to be the furthest ahead, even surpassing Comeau C++.

See a comparison of the various compilers' support of C++-201x.