Sunday, April 12, 2015

Offering needs, matching offers...

What?

Some months ago I was searching for problems! Sounds like sh***** but it's true! Finding a problem is one of the first steps towards finding a solution :). And if the solution is software based => that means money! Hundreds of kilograms of MONEY!
Well, thinking about problems in my own backyard, I realized that the grass was long as hell, the swimming pool dirty as hell (maybe hell is clean, maybe not) and the device used to clean the pool was broken.... Cutting the grass, cleaning the pool with the almost broken device and fighting with daemons were within my range of possibilities, but fixing the device wasn't, so I needed someone who could fix it for me, but who the hell? So the problem appeared suddenly: how to find someone I can trust, who can fix the device correctly and at a "real/honest" price? Well, that is the idea: an app to search for people who can fix swimming pool devices? Awesome!!

Well no, that is not the idea. Generalizing the problem, it is not only about plumbers, gardeners, electricians and babysitters but also about stuff that you may need (furniture, a computer, whatever!), or services, or help, or anything that could be called a need! To satisfy these needs you usually require "something", let's call this an offer. I offer my chair, I offer my service of cutting the grass, fixing electrical / electronic devices, etc. So the idea is simple: search for offers that will satisfy your needs.
There are different sites / methods / ways to solve this: LinkedIn, Amazon, eBay, newspapers, etc, but each of these handles a different sector, nothing is centralized and there is no close relation between the one who offers and the one who needs. Also, there are some needs that are not covered by these sites at all.

There is a well known saying: "If you need something, scream it very loud" (or something like that, maybe it doesn't exist and I'm lying, maybe not). Basically, if you tell a friend or someone else that you need something and that person has what you need, it is highly probable that he/she will help you, even more so if he/she can get a benefit from it.
So imagine now an app, a mobile app, where you can create a user and add whatever you may need and whatever you have to offer. Now think that every time you go to a party, drink a coffee with a friend, go for lunch or attend any meeting, the app is checking for people who are close to you (and also have the app installed). It automatically checks whether these people, or their friends' lists of needs and offers, have whatever you may need, and at the same time whether someone close to you (or some of their friends) needs what you offer. Think how much easier it would be to give and get things automatically without even spending time searching, and from people that you may know directly or have access to, so you can ask them right at that moment.
The key here is how we can categorize the information in a common and intelligent way [sections / filters / etc]. There are also other points, like who we want to share with (public data, just known people, etc), but those are minor details. It doesn't even need to be proximity related, it could just work like a social network...
I think the most important thing here is that you can get something that you need from a person you know or a person that is close to you (you also need to take into account how to filter people that you maybe don't know, like the ones walking in the street / in the same bus / underground / etc... if you have no shame about getting what you need then it doesn't matter :) ).


Well, after writing this I can say I'm not a good writer :(, maybe next time it's better to add more images and less text :p.

Salu, enjoy your life.

Tuesday, March 31, 2015

[C++] chunk_list vs normal lists vs vectors...


What?

Some months (maybe a year) ago I was reading a book about game design patterns (by Bob Nystrom, very nice btw, I recommend reading it), and in one of the chapters (about the State pattern, talking about "Observers") there was a particular problem: how to store the list of observers (the callbacks basically). The normal options are a list or a vector, each one with its pros and cons. A list is """"less memory expensive"""", makes it easy to remove elements at any point (maintaining the order) and also to add them, but it is slower to iterate over (jumping through pointers). A vector has the advantage of being faster to iterate, but it is less efficient in terms of allocation and of removing or inserting (not appending) elements.
So the idea, a very basic idea, was to use a list of chunks (a list of arrays) to try to gain performance when iterating & decrease the memory usage.
So, as usual, after N years (N > 1, it seems to be a rule now), I decided to check if that idea made any kind of sense, was useful, or something, and, since the ticket in the book is still open, I spent a little (really little) time on this. The code can be improved for sure, and the performance as well..

chunk_list

The chunk_list, as mentioned before, is basically a list<chunk> where the chunk is a fixed-size array that provides the normal operations to work with it:



#include <type_traits>

///
/// The class Chunk will be used to store a fixed number of elements that
/// for now we will limit to be trivially constructible
///
template<typename T, unsigned int SIZE = 8>
class Chunk {
    // to avoid some issues when using this class we will make some checks here
    // to ensure the user doesn't try to use this with objects that would create
    // inconsistencies
    static_assert((std::is_pointer<T>::value || std::is_trivial<T>::value),
                  "We can only use objects that have a trivial default constructor "
                  "or are pointers or basic types");

    // support a small number for now
    static_assert(SIZE < 256, "We maybe don't want chunks so big?");
public:

    // default constructor & destructor, note we will not destroy the elements
    // here
    inline Chunk();
    inline ~Chunk();

    // add an element
    inline void
    push_back(T element);

    // remove last element
    inline void
    pop_back(void);

    // get X element
    inline const T&
    operator[](int index) const;
    inline T&
    operator[](int index);

    // check if is full
    inline bool
    isFull(void) const;

    // is empty
    inline bool
    isEmpty(void) const;

    // check size
    inline unsigned int
    size(void) const;

    // clear
    inline void
    clear(void);


private:
    alignas (T) T m_data[SIZE];
    short m_size;
};




Pretty much the same as an array, so maybe it makes no sense to implement it and a plain array (+ a counter) would be more than enough.
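Since only the declarations are shown above, here is a minimal sketch of how the member functions could look. The class name ChunkSketch and all the bodies are assumptions made for illustration (the real implementation lives in the repository and may differ); only the interface follows the declaration above.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical minimal version of the Chunk interface: a fixed-size array
// plus a counter. Names of the methods follow the post, bodies are assumed.
template <typename T, unsigned int SIZE = 8>
class ChunkSketch {
    static_assert(std::is_pointer<T>::value || std::is_trivial<T>::value,
                  "only trivial types or pointers are supported");
public:
    ChunkSketch() : m_size(0) {}

    // append an element, the caller must check isFull() first
    void push_back(T element) {
        assert(!isFull());
        m_data[m_size++] = element;
    }

    // drop the last element (no destructor call needed for trivial types)
    void pop_back() {
        assert(!isEmpty());
        --m_size;
    }

    const T& operator[](int index) const { return m_data[index]; }
    T& operator[](int index) { return m_data[index]; }

    bool isFull() const { return m_size == SIZE; }
    bool isEmpty() const { return m_size == 0; }
    unsigned int size() const { return m_size; }
    void clear() { m_size = 0; }

private:
    T m_data[SIZE];
    unsigned int m_size;
};
```

With something like this, a ChunkSketch<int, 4> accepts four push_back calls before isFull() becomes true, which is the moment the surrounding list would have to allocate a new chunk.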
I also implemented a list, a very nasty list, that's why I called it nasty_list, because it is completely useless. Anyway, the important thing is that it works for these tests: it is a singly linked list (I didn't care about measuring insertion / removal of elements, so always adding at the end was more than fine).





/// define a simple list implementation only for this test
///
template<typename T>
struct Node {
    T data;
    Node* next;

    Node() : next(0) {}
};


template<typename T, typename AllocT>
class nasty_list {
public:
    // iterator, should go outside :(
    //
    template<typename T2>
    class Iterator {
    public:
        /// ...

    private:
        Node<T2>* m_node;
    };
    friend class Iterator<T>;

public:
    nasty_list(void) : m_first(0) {}
    /// .... implementation ...

private:
    Node<T>* m_first;
};




Well, very nasty, but it works, at least for the tests, and it is "correct" since the values it gives are the same as the stl ones (assuming the stl is always right, always....?).

This nasty_list supports some kind of "nasty_allocators" to allocate memory for the nodes, because I wanted to try heap / static memory and also check the difference.





// AAAAAAAAAAAAHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
// wth !
template<typename T>
struct HeapAllocator {
    static T* allocate(void) {return new T;}
    static void deallocate(T* t) {delete t;}
};

template<typename T, unsigned int SIZE>
struct StaticAllocator {
    alignas(T) static T mem[SIZE];
    static std::stack<T*> data;

    static void
    init(void)
    {
        for (unsigned int i = 0; i < SIZE; ++i) {
            data.push(&(mem[i]));
        }
    }

    static T* allocate(void)
    {
        T* r = data.top();
        data.pop();
        return r;
    }
    static void deallocate(T* t) {data.push(t);}

};
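One detail the snippet above leaves out: the static data members of a class template also need a definition outside the class, otherwise the linker will complain. A minimal sketch of the complete allocator could look like this (the name StaticAllocatorSketch and the assert are my additions for illustration, and the alignas is dropped because an array of T is already suitably aligned for T):

```cpp
#include <cassert>
#include <stack>

// Hypothetical complete version of the static allocator: a fixed pool of
// SIZE objects plus a stack holding the addresses of the free slots.
template <typename T, unsigned int SIZE>
struct StaticAllocatorSketch {
    static T mem[SIZE];
    static std::stack<T*> data;

    // must be called once before the first allocate()
    static void init() {
        for (unsigned int i = 0; i < SIZE; ++i) {
            data.push(&mem[i]);
        }
    }

    static T* allocate() {
        assert(!data.empty());   // pool exhausted or init() not called
        T* r = data.top();
        data.pop();
        return r;
    }

    static void deallocate(T* t) { data.push(t); }
};

// out-of-class definitions of the static members (one per instantiation)
template <typename T, unsigned int SIZE>
T StaticAllocatorSketch<T, SIZE>::mem[SIZE];

template <typename T, unsigned int SIZE>
std::stack<T*> StaticAllocatorSketch<T, SIZE>::data;
```

The free list is a LIFO, so a slot that was just released is the next one handed out, which keeps recently used memory hot in the cache.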




Look how nasty they are, and how bad their performance could be! But I don't care, because I didn't measure the allocation / destruction of elements either... Even when my soul doesn't feel fine doing nasty things, sometimes sh*** happens. Now the chunk_list




template<typename T, typename NodeAlloc, unsigned int CSIZE = 8>
class ChunkList {
    // helper typedefs
    typedef Chunk<T, CSIZE>    ChunkNode;
    typedef nasty_list<ChunkNode, NodeAlloc> ListType;

public:
    // iterator
    //
    template<typename T2>
    class Iterator {
    public:
        /// ....
    private:
        Node<ChunkNode>* m_node;
        int m_current;
    };
    friend class Iterator<T>;

public:
    /// implementation

    /// EXTRA METHOD
    template<typename OP>
    void applyToAll(OP& op);
private:
    ListType m_list;
};




There is something important to note here, which I realized while implementing the class and then verified with the times I got: when we implement an iterator for this class to go over all the elements (since that was the original focus of all this), the condition that decides when to jump to the next chunk is not trivial for the compiler to optimize. This causes a big performance issue, since the original idea was to get the performance of iterating an array, only interrupted by a jump from time to time (at least we can cache / fetch more elements at once). So a second solution was to provide the applyToAll(...) method: given a functor (or whatever you want to call it), we apply it to all the elements, and this way it is much easier for the compiler to know how to optimize the loop.




    auto beg = m_list.begin();
    auto end = m_list.end();
    while (beg != end) {
        const ChunkNode& cn = *beg;
        // here the compiler can optimize :)
        for (int i = 0; i < cn.size(); ++i) {
            op(cn[i]);
        }
        ++beg;
    }



Something else that is interesting and worth taking into account: using int instead of unsigned int for the counter of the inner for loop was twice as fast. [The thing is that the compiler handles signed and unsigned integers differently because of overflow: signed overflow is undefined behavior, so the compiler doesn't give a sh** and assumes it will never happen => optimize, while unsigned arithmetic must wrap around. Compiling with -fwrapv makes signed integers wrap around too, so they get the same treatment as unsigned int.]
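To make the functor idea concrete, here is a self-contained sketch of a chunk list with addElement and applyToAll. It is not the code from the repository: ChunkListSketch is a hypothetical name, it uses std::list instead of nasty_list and a plain struct instead of Chunk, just to keep the example short.

```cpp
#include <cassert>
#include <list>

// Hypothetical simplified chunk list: a std::list of fixed-size arrays.
template <typename T, int CSIZE = 8>
class ChunkListSketch {
    struct ChunkNode {
        T data[CSIZE];
        int size = 0;
    };

public:
    // append at the end, opening a new chunk when the last one is full
    void addElement(const T& value) {
        if (m_chunks.empty() || m_chunks.back().size == CSIZE) {
            m_chunks.push_back(ChunkNode());
        }
        ChunkNode& last = m_chunks.back();
        last.data[last.size++] = value;
    }

    // outer loop jumps through the list nodes, inner loop is a plain array
    // loop with a signed counter, which is the part the compiler can optimize
    template <typename OP>
    void applyToAll(OP& op) const {
        for (const ChunkNode& cn : m_chunks) {
            for (int i = 0; i < cn.size; ++i) {
                op(cn.data[i]);
            }
        }
    }

private:
    std::list<ChunkNode> m_chunks;
};
```

The design choice is the same as in the post: the "is this the end of the chunk?" check lives in the loop bounds instead of inside an iterator's operator++, so the inner loop looks like a normal array loop.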

Measurements

Intro


As explained before, these tests focus only on the performance of iterating over the elements of the vector, list and chunk_list, not on inserting or removing them. Obviously, inserting and removing elements in the chunk_list is faster than in a vector, but slower than in a list.

Let's see how the code of the tests begins:





    typedef nasty_list<int,NodeAllocator> ListT;
    typedef ChunkList<int, NodeChunkAllocator, CHUNK_SIZE> ChunkListT;

    ListT normalLists[LIST_COUNT];
    ChunkListT chunkLists[LIST_COUNT];
    std::list<int> stdlist[LIST_COUNT];
    std::vector<int> vector[LIST_COUNT];

    for (int j = 0; j < LIST_COUNT; ++j) {
        for (int i = 0; i < ELEMENTS_PER_LIST; i++) {
            normalLists[j].addElement(i);
            chunkLists[j].addElement(i);
            stdlist[j].push_back(i);
            vector[j].push_back(i);
        }
    }




The types NodeAllocator and NodeChunkAllocator are defined at compile time, depending on whether "heap allocation" is set to true or not, and map to the different nasty_allocators.
What I did was fill the different lists (LIST_COUNT of them, in the end I just used 1) with ELEMENTS_PER_LIST (=N) elements, and then I started measuring the times by running this code:




    double t1 = TimeHelper::currentTime();
    for (unsigned int i = 0; i < ROUNDS; ++i) {
        for (unsigned int j = 0; j < LIST_COUNT; ++j) {
            auto beg = normalLists[j].begin();
            auto end = normalLists[j].end();
            for (; beg != end; ++beg) {
                d1 += *beg;
            }
        }
    }
    double t2 = TimeHelper::currentTime();
    const double normalListTime = t2 - t1;



This particular example is for the normalLists (the nasty_list) defined above, and it saves the total time spent reading each element and adding it to a variable. I ran the same piece of code ROUNDS times (~9999999). I did the same for std::vector, std::list, nasty_list and chunk_list (for the last one, I measured both the iterator version and the functor version).
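TimeHelper::currentTime() is not shown in the post. A minimal sketch of such a helper, assuming it simply returns seconds from a monotonic clock, could be written with std::chrono. The names TimeHelperSketch and measureSeconds are hypothetical, they are not from the original code.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the TimeHelper used in the tests: returns the
// current time in seconds, taken from a monotonic clock.
struct TimeHelperSketch {
    static double currentTime() {
        typedef std::chrono::steady_clock clock;
        return std::chrono::duration<double>(
            clock::now().time_since_epoch()).count();
    }
};

// Runs `body` `rounds` times and returns the elapsed time in seconds,
// the same pattern as the measuring loops in the post.
template <typename F>
double measureSeconds(F body, unsigned int rounds) {
    const double t1 = TimeHelperSketch::currentTime();
    for (unsigned int i = 0; i < rounds; ++i) {
        body();
    }
    const double t2 = TimeHelperSketch::currentTime();
    return t2 - t1;
}
```

A steady clock is used here because it never goes backwards, so t2 - t1 cannot become negative if the system time is adjusted during the run. Note that the measured body must have a visible side effect (like summing into a variable that is used later), otherwise the optimizer may remove the loop.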

For the chunk_list:functor version I used the following piece of code:




    struct OpSum {
        int sum;
        OpSum() : sum(0) {}

        void operator()(int i) {sum+=i;}
    } opSum;
    t1 = TimeHelper::currentTime();
    for (unsigned int i = 0; i < ROUNDS; ++i) {
        for (unsigned int j = 0; j < LIST_COUNT; ++j) {
            chunkLists[j].applyToAll(opSum);
        }
    }
    t2 = TimeHelper::currentTime();
    d4 = opSum.sum;
    const double chunklListFunctorTime = t2 - t1;



Results


I ran all the tests using the following hardware / software / arch:

compiler: gcc version 4.9.2
OS: ubuntu linux - 32bits
Processor: Intel Core2 Duo CPU P8700 @ 2.53GHz × 2  
Mem Ram: 4GB
(yes, prehistoric, but lovely dell still fighting!)

There are a lot of results: I measured the 5 different structures (std::list, nasty_list, std::vector, chunk_list iterator version, chunk_list functor version) with different compilation (optimization) flags, different numbers of elements + chunk sizes, and different allocator types.
I grouped all of them in 6 tables, divided in two groups: (A) using the static allocator, (B) using the dynamic allocator (new). Each group is divided in 3 sub groups: {no optimization flags, O3, O3 + vectorize flags}. For each sub group I measured the following combinations of chunk_size x elements_in_container: {1, 4, 8, 16} x {4, 8, 16, 32, 64}.

Since I didn't care about the insertion / removal of the elements, the iteration times of groups (A) and (B) can be considered the same, so I will only show one of them. Also, the results for O3 + vectorize and plain O3 are very similar, so I decided to show only one of them too (O3 + vectorize).

Let's see the tables:



The table shows the 5 different containers (one per color) for each combination of a specific chunk size and N (the number of elements in the containers).
The Y axis shows the time in seconds it takes to iterate over all the elements of the container and sum them.
Comparing the performance of chunk_list:functor against std::list, nasty_list and std::vector, I got the following best performance gains: 0.80X, 0.83X, 0.59X respectively (std::list/chunk_list:functor, nasty_list/chunk_list:functor, std::vector/chunk_list:functor). And the following worst performance gains: 0.40X, 0.41X, 0.30X.
These "performance gains" are obtained by taking the maximum / minimum times over all the cases.

Next, the optimized version:


In this case the performance gains of the chunk_list:functor compared with std::list, nasty_list and std::vector were:
best: 3.12X, 3.63X, 2.28X.
worst: 0.74X, 0.82X, 0.42X.

Note again that these come from taking the best/worst "times" over all the combinations.


Conclusions


From what I see in the results, I think it is worth making an effort to implement the chunk_list in a cleaner and more "performant" way, since in almost all the cases the chunk_list performs very well in comparison with the list and even with the vector (COMPLETELY SENSELESS!!!!!!!!).
Compiling with the flags "-std=c++11 -O3 -ftree-vectorize" the chunk_list is faster than the std::vector, but after adding the "-march=native" compilation flag the vector becomes almost twice as fast as the chunk_list.
Maybe, and only maybe, I will check this in depth and try to find the root cause of why the chunk_list is faster than a vector (it could be something related to the stl, or maybe using compile-time sizes helps the compiler to optimize even more...)... In the next days / year / life maybe I will check it :p.

Salu



Monday, February 23, 2015

[C++] stack_vector measurements.


Eh???

Some years ago I was working on a game project and I realized that I was using std::vector many times inside functions (for intermediate / temporary operations). Knowing that this wasn't so good for performance (because of the heap allocations / memory fragmentation), I decided to implement a wrapper class "stack_vector" that uses the stack instead of the heap. It is not a magic class nor something complicated, it is just an array with a counter :). The most relevant things here are the time measurements.

There were other solutions to the same problem:


  1. Change the stl for a better optimized lib.
  2. Use a custom allocator for the stl vector.
  3. Use an array.

Changing the whole stl wasn't a fast solution, and using a custom allocator wasn't in my plans :p. No, seriously, the thing was that the main data I was using in these functions were trivial types (int, float, pointers, etc), so an stl vector, even with a custom allocator, would not be as performant as expected (or that was what I thought at that moment). So I decided to implement this class.
Basically it is (3), but with a counter and a simple interface :).

After 2 years (yes, 2 years) I decided to check if this class is any real improvement over std::vector or not, without taking into account the memory fragmentation, which is something that will improve for sure.

The stack_vector

The class is as simple as this (with some helper functions like a normal vector).



#include <type_traits>

///
/// This class is intended to be used only for built-in types that don't
/// require construction / destruction of the elements.
/// Obviously it could be extended to support this, but not now :)
///
template <typename T, unsigned int MAX_SIZE = 64>
class stack_vector
{
    // to avoid some issues when using this class we will make some checks here
    // to ensure the user doesn't try to use this with objects that would create
    // inconsistencies
    static_assert((std::is_pointer<T>::value || std::is_trivial<T>::value),
                  "We can only use objects that have a trivial default constructor "
                  "or are pointers or basic types");

public:
    stack_vector() : m_size(0) {};
    ~stack_vector() {};

    // ....
private:
    T m_data[MAX_SIZE];
    unsigned int m_size;
};




In this case the stack vector has to know a priori the maximum size we will use, like any other normal array. This is not nice, since sometimes we don't know how big it will be, but that can be solved by using the Variable Length Array (VLA) feature (from C99, available in g++ as an extension) and passing the memory address to the class in the constructor (I copied and pasted the same class and renamed it to stack_vector2, lazy bastard!). Note also that this class is intended to be used only with trivial types (I'm not handling construction / destruction of the elements).
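The body of the class is elided above, so here is a minimal sketch of what "an array with a counter" can look like, in both flavors. Everything in it is an assumption for illustration: the names end in _sketch, the asserts are mine, and the second class takes an explicit capacity argument that the original stack_vector2 constructor (which only receives the memory address) does not have.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical fixed-capacity version: the storage is a member array, so it
// lives wherever the object lives (the stack, for a local variable).
template <typename T, unsigned int MAX_SIZE = 64>
class stack_vector_sketch {
    static_assert(std::is_pointer<T>::value || std::is_trivial<T>::value,
                  "only trivial types or pointers are supported");
public:
    stack_vector_sketch() : m_size(0) {}

    void push_back(const T& value) {
        assert(m_size < MAX_SIZE);
        m_data[m_size++] = value;
    }
    void pop_back() { assert(m_size > 0); --m_size; }

    T& operator[](unsigned int i) { assert(i < m_size); return m_data[i]; }
    const T& operator[](unsigned int i) const { assert(i < m_size); return m_data[i]; }

    unsigned int size() const { return m_size; }
    bool empty() const { return m_size == 0; }
    void clear() { m_size = 0; }

private:
    T m_data[MAX_SIZE];
    unsigned int m_size;
};

// Hypothetical external-buffer version: the caller owns the memory (for
// example a VLA or a plain local array) and passes it in.
template <typename T>
class stack_vector2_sketch {
public:
    stack_vector2_sketch(T* mem, unsigned int capacity)
        : m_data(mem), m_capacity(capacity), m_size(0) {}

    void push_back(const T& value) {
        assert(m_size < m_capacity);
        m_data[m_size++] = value;
    }
    T& operator[](unsigned int i) { assert(i < m_size); return m_data[i]; }
    unsigned int size() const { return m_size; }

private:
    T* m_data;
    unsigned int m_capacity;
    unsigned int m_size;
};
```

Neither class ever allocates: push_back is an array store plus an increment, which is where the difference with std::vector in the measurements comes from.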

I also added some other methods to the interface that were useful when working on the project, but they don't matter too much here.

Measurements

I compiled the sample test with 3 optimization levels (just using the -O flag):

  • Without flag.
  • With -O2.
  • With -O3.


And for each case I tested different sizes (number of elements for the vectors): {50, 100, 200, 500, 1000, 5000}

I used g++ 4.9.2, my lovely old dell vostro (intel dual core), linux.
The test routines that use the vector / stack_vector / stack_vector2 (VLA) are:




int
sampleCallHeap(void)
{
    int result = 0;
    std::vector<int> elements;
    elements.reserve(NUM_COUNT);
    
    for (int i = NUM_COUNT-1; i >= 0; --i) {
        elements.push_back(i);
    }

    for (int i = 0; i < NUM_COUNT; ++i) {
        result += elements[i];
    }
    return result;
}

int
sampleCallStack(void)
{
    int result = 0;
    stack_vector<int, NUM_COUNT> elements;
    for (int i = NUM_COUNT-1; i >= 0; --i) {
        elements.push_back(i);
    }

    for (int i = 0; i < NUM_COUNT; ++i) {
        result += elements[i];
    }
    return result;
}

int
sampleCallStack2(int size)
{
    int result = 0;
    int mem[size];
    stack_vector2<int> elements(mem);
    for (int i = size-1; i >= 0; --i) {
        elements.push_back(i);
    }

    for (int i = 0; i < size; ++i) {
        result += elements[i];
    }
    return result;
}



As you can see, the 3 functions are pretty similar and very simple: we just create a vector, push all the elements, and then iterate over each one and add it to a variable.

In the first test I did, the performance of all the versions was very similar, because malloc was always finding the perfect block for the std::vector (printing the address showed it was always the same). So I decided to add some "random allocations" in between the calls (without measuring the time of those) so the vector could or could not get the same address (like it should happen in normal apps).
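The "random allocations" code is not shown in the post. One possible way to do it, as a rough sketch, is a small helper that allocates blocks of random sizes and frees some of them at random between the measured calls. HeapNoise, churn and the size range are hypothetical, they are not from the original test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper that disturbs the heap between measured calls, so
// malloc does not always hand the same block back to std::vector.
class HeapNoise {
public:
    explicit HeapNoise(unsigned int seed) { std::srand(seed); }
    HeapNoise(const HeapNoise&) = delete;
    HeapNoise& operator=(const HeapNoise&) = delete;

    ~HeapNoise() {
        for (void* p : m_blocks) {
            std::free(p);
        }
    }

    // allocate one block of random size and, half of the time, free a
    // randomly chosen older block
    void churn() {
        const std::size_t bytes = 16 + std::rand() % 4096;
        void* p = std::malloc(bytes);
        if (p != nullptr) {
            m_blocks.push_back(p);
        }
        if (m_blocks.size() > 1 && std::rand() % 2 == 0) {
            const std::size_t victim = std::rand() % m_blocks.size();
            std::free(m_blocks[victim]);
            m_blocks[victim] = m_blocks.back();
            m_blocks.pop_back();
        }
    }

    std::size_t liveBlocks() const { return m_blocks.size(); }

private:
    std::vector<void*> m_blocks;
};
```

In a test loop, churn() would be called outside the timed region, right before each call to sampleCallHeap(), so only the vector's own allocation is measured.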

These are the timings I got calling each of the test functions N (=99999) times, for each size (quantity of elements) and each optimization level.
For each case I will show a bar chart with the three implementations: {std::vector, stack_vector, stack_vector2}, where stack_vector2 is the same as stack_vector but using the VLA feature. In the charts the y axis is the time in seconds it took to call each version N times, and the x axis is the size of the vector. There is also a table containing the times for each case, plus 2 more columns: Heap/Stack, showing the performance (X gain) of the stack version compared to the heap one, and Heap/Stack(VLA), comparing the heap and the VLA version (stack_vector2).
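The two gain columns are just the heap time divided by the stack time. As a tiny sketch (the function name gain is hypothetical):

```cpp
#include <cassert>
#include <cmath>

// X gain of the stack version over the heap version: values above 1 mean
// the stack version is faster, e.g. 2.0 means it took half the time.
inline double gain(double heapSeconds, double stackSeconds) {
    return heapSeconds / stackSeconds;
}
```

For example, for size 50 without optimization the heap version took 0.3288 s and the stack version 0.1696 s, so gain(0.3288, 0.1696) is about 1.94.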

Without optimization



Size    Heap (s)    Stack (s)   Stack VLA (s)   Heap/Stack   Heap/Stack(VLA)
50      0.3288      0.1696      0.1727          1.94         1.90
100     0.5142      0.2206      0.2373          2.33         2.17
500     2.0007      0.6031      0.7070          3.32         2.83
1000    3.8021      1.1257      1.2687          3.38         3.00
5000    18.2371     5.1559      5.9057          3.54         3.09

Optimization O2



Size    Heap (s)    Stack (s)   Stack VLA (s)   Heap/Stack   Heap/Stack(VLA)
50      0.1432      0.1238      0.1247          1.16         1.15
100     0.1621      0.1273      0.1277          1.27         1.27
500     0.3385      0.1868      0.1821          1.81         1.86
1000    0.5632      0.2554      0.2516          2.21         2.24
5000    2.2400      0.8119      0.7838          2.76         2.86


Optimization O3



Size    Heap (s)    Stack (s)   Stack VLA (s)   Heap/Stack   Heap/Stack(VLA)
50      0.1411      0.1207      0.1265          1.17         1.12
100     0.1628      0.1275      0.1283          1.28         1.27
500     0.3420      0.1841      0.1821          1.86         1.88
1000    0.5487      0.2525      0.2485          2.17         2.21
5000    2.2088      0.8040      0.7745          2.75         2.85


Conclusions

As we can see in the ugly graphics and nasty tables (sorry), the performance gain is smaller when we use the -O3 optimization level: the compiler does a lot of magic and, for small sizes, the performance is almost the same. The bigger the vector (>~500), the higher the performance gain, but note that using a huge block of memory on the stack is not recommended either.
The main benefit I can see here, apart from the small performance gain (for ~100 elements), is minimizing the memory fragmentation, which in the end will affect the overall running time of the app.

Note that this class was created a long time ago. I just copied the old code into a new file, and it could be even more efficient (aligning the memory, checking if we can vectorize some operations, etc, but that will be done in 2 years :p).

Code

The tests, the code and everything else are on github.


Salu. 



Thursday, February 19, 2015

"Blowmoney" idea!

What is "Blowmoney"

I have no idea, but this idea, if it is possible, will for sure give you A LOT OF MONEY! A LOT!!!! Or at least that is what I think.

The idea

After some months, or maybe years, I still think that this idea is pure gold for sure :). This will sound weird or strange, but suppose you can build a device that lets you detect if a human recently had sex: you just get close to the person, pass the device near him/her and detect whether the person recently had sex or not. Now imagine how many people would buy this device (take a commercial point of viewwwww!). The statistics of infidelity are high, maybe growing with time, and for sure there are a lot of people who will want to buy this at the first doubt in the relationship. Now, if you ask me whether this is moral / ethical / whatever, I think not, and I think that if you are in a relationship then you must trust the other person (I love you Cony :)!)... but the device is not to blame, and it is neither bad nor unethical by itself. Maybe the action of using it in some situations could be not ethical / moral or "nice". Anyway, whatever the case, this device will be sold worldwide!

I talked with one of my best friends, "Gornete" (Lucas Gorné), biologist, genius, currently almost finishing his Ph.D. He did some quick bibliographic research (not so deep, but still research), and it could be possible to detect some hormones that the human body secretes, but there are at least two major things to take into account:
  1. For how long these hormones can be detected after "the act".
  2. What the error probability is when detecting the hormones (or chemicals).
If (1) is too short, let's say 1 hour, then it will be completely useless. If the time is too long, let's say > ~5 days, then it could be a little bit more useful, but still not so much.
Now, what happens if (2) is too high because, I don't know, the human body secretes similar hormones when it eats a burger, or walks, or is just excited? Then this will generate a lot of unnecessary problems (let's say divorces).

Anyway, there are already electronic noses out there that can detect different kinds of "smells" (chemical stuff), and I have the feeling (based on absolutely nothing other than a "good feeling") that (1) and (2) could be completely solved with the right "very sensitive" technology, able to measure different concentrations of these hormones very precisely... Anyway, talking without knowing, it is impossible to be sure.

So well, this is the idea. If someone who has the knowledge / resources to do it starts investigating, creates the device, starts selling it and makes millions and millions of dollars, I will be very happy and Gornete as well (I think :p), but send us at least some of them for free to keep as souvenirs :p.

Salu!






Tuesday, February 17, 2015

Swimming counting software


What is this!? An obsolete idea maybe, an old one, but in the good old times it was a good one...

Some years ago I had a problem in my knee (an injury), so I had to go swimming for rehabilitation... (it is also one of the best sports to avoid new injuries!). After going swimming again and again, I had the problem that I always lost count of how many laps I did!
Usually when I do something repetitive for a long time (>30min) my mind starts thinking about different stuff, and that is where my problem appears: thinking and counting at the same time is complicated.
So I thought: how can I go swimming without torturing myself repeating the count and being afraid of losing it XD! How can I swim peacefully + think (or not) at the same time? (note that not thinking is maybe more complicated...!). Obviously having one person watching me swim for one hour is not an option :).

So the idea was: what happens if you put some cameras on the pool and a lovely machine does the counting for you (and for all the other people in the swimming pool!). Not only the counting but also statistics about velocity, time, distance, etc, per day / week / month / year / afterlife?, sending these reports by email as well, or whatever.

Obviously this kind of software exists for professional sports, and not only for swimming, but what about using some of the free / open tracking systems, creating the software and then just selling it to all the swimming pools so they can provide an additional service to their clients??!!

There are some things to take into account, like how to link a person from the moment he/she logs into the system until he/she asks for the statistics... But from what I remember, when we discussed this with the group of friends that was thinking of working on the idea, everything had a solution :), but nobody had time, especially me :(.

Well, I hope someone wants to do it and sell it to all the swimming pools so next time we can swim without counting :). Or maybe some wristwatch, or an android app on a water resistant phone, already does all this automatically, so this post is completely obsolete and useless.

Salu!

Tuesday, February 10, 2015

C++ Status: Little error handling idea [UI apps mainly]


Overview

After working for some time on UI applications I always ended up with the same issue: how to handle errors efficiently and cleanly. I decided to implement this little wrapper and use it as the base for the different modules of the projects. This is mainly for those who don't use exceptions (because they don't like them or because they cannot).

The wrapper "Status"

The main motivation to create this wrapper was the idea of keeping the (human readable) information about the error together with the error itself, and being able to propagate it efficiently through the functions (like exceptions do). It is like an intermediate solution between "old style error code checking" and exceptions.
To solve this problem I have used and seen two main ways:
1) Use an error code (enum or whatever) and then map it to a predefined list of strings that shows the "information" to the user about what happened.
2) The modules (classes) have an extra method that returns the "detailed" information about the last error (getLastStringError() or similar).

In some cases (1) may not be the best option because the error is not detailed enough, since it comes from a "common" list where the same information is used in different scenarios.
The thing with (2) is that adding an additional method to each class and having a buffer where we save the last error info is neither clean nor nice (at least for me :) ).

So, to solve these two things and do it as efficiently as possible, since we don't want to lose performance, never, never lose performance, I decided to implement this wrapper. The main idea is that the Status wrapper will have the same size as an integer (like an error code) but will contain more information.
To be able to save the "detailed error description" we will not put the string in the wrapper itself, but an index into a table of shared strings, where we will put the real description.
Here is the data we save in the class:


class Status
{
    //     ...


    // this is all the information we will hold to maintain the same efficiency
    // as normal integers.
    // the scheme is as follows (fields in declaration order; the exact bit
    // positions of bit-fields are implementation-defined):
    // 12 bits   => StatusCode (error code)
    // 4 bits    => Severity (none, critical, normal, low)
    // 16 bits   => Internal index referencing the strings table

    struct Data {
        // define the invalid index
        static const unsigned short INV_INDEX = (1 << 15);

        unsigned int code       : 12;
        unsigned int severity   : 4;
        unsigned int index      : 16;

        Data(unsigned int c, unsigned int sev, unsigned short i) :
            code(c), severity(sev), index(i)
        {}
    } data;
};    

As we can see, we are only storing 32 bits in the class (like an integer on some platforms). This way, when we return the status we are just returning a 32 bit structure, which is fast to copy. Also, whenever the message goes up (until it reaches the UI code that shows it to the user), we never duplicate or copy the string itself, and this is where we gain some performance in comparison with other methods (copying the string up and up).
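A small sketch of the idea: a 32 bit record that carries an index into a shared table of strings instead of the string itself. The names StatusDataSketch and StringTableSketch are hypothetical (the real project uses Status and StatusHandler, whose internals are not shown here), and the static_assert holds on the usual compilers but is not guaranteed by the standard, since bit-field layout is implementation-defined.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical 32 bit status record: code + severity + index of the text.
struct StatusDataSketch {
    unsigned int code     : 12;
    unsigned int severity : 4;
    unsigned int index    : 16;
};

// Holds on common compilers (gcc, clang, msvc); treated here as an assumption.
static_assert(sizeof(StatusDataSketch) == sizeof(std::uint32_t),
              "the status should be as cheap to copy as an integer");

// Hypothetical shared table: the strings are stored once, the statuses only
// carry the position of their description. Entries are never released in
// this sketch.
class StringTableSketch {
public:
    unsigned int add(const std::string& text) {
        m_strings.push_back(text);
        return static_cast<unsigned int>(m_strings.size() - 1);
    }
    const std::string& get(unsigned int index) const {
        return m_strings[index];
    }
private:
    std::vector<std::string> m_strings;
};
```

Copying a StatusDataSketch up through several layers only copies 4 bytes, and the description is looked up in the table once, at the point where it is finally shown to the user.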
The other thing I decided to add is the severity of the error. This is useful in some cases when we just need to know if we can continue with the code execution or must return immediately, and this way we can keep the code much simpler and cleaner!

The Status can be constructed in different ways, providing 3 main ways of using it: like a boolean, like old style error codes, or as an error code with a description.

Let's see the following example:



Status
coreMethod1(int arg1, int arg2)
{
    // check arg validity
    if (arg1 < 0) {
        // invalid arg
        return Status(StatusCode::STC_INVALID_ARGS,
                      StatusSeverity::STS_NORMAL,
                      "The arg1 is less than 0 so we cannot proceed!");
    }

    // everything is fine, just return no error (automatically constructed
    // from bool).
    return true;
}

// other methods...

Status
middleLayerMethod(void)
{
    // ...

    // suppose that we don't want to continue if coreMethod1 returns a critical error
    Status st = coreMethod1(v1, v2);
    ST_CHECK_CRITICAL(st);

    // but now suppose we have multiple methods to init that are not
    // so important if they fail, but we still want to report the
    // information to the user:
    st += coreMethod2();
    st += coreMethod3();
    st += coreMethod4();

    return st;
}

void
uiLogicLevel(void)
{
    Status st = middleLayerMethod();
    if (!st) {
        // handle status / show description to the user if it fails
    }

}

Here we can see two of the benefits of this wrapper:

1) If the Status is passed between different functions (returned, copied, whatever), copying it is trivial, "no performance hit at all" (almost, but for sure less than a string being copied up).

2) We can accumulate and return the "information", avoiding ugly branches in certain cases (if the error is not critical of course). This second option could (should) be configured depending on the needs of the project; it is implemented in the operator+= method:



///
/// \brief operator += will be used to merge different statuses: we will
///        concatenate the strings and also keep the highest severity
///        and the error code. In case the error codes are different, we
///        will simply set it to internal error (this is maybe not what
///        you need / want).
/// \param other    The other status we want to merge with
/// \return the merged new status (this)
///
inline Status&
operator+=(const Status& other);
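To show the merging rule described in that comment, here is a simplified sketch. It is not the real Status class: StatusSketch stores the description directly as a std::string instead of a table index, the enum names and their ordering are my own assumptions, and ST_CHECK_CRITICAL_SKETCH is only a guess at what a macro like ST_CHECK_CRITICAL could do.

```cpp
#include <cassert>
#include <string>

// Hypothetical enums; a higher severity value means "more severe" here.
enum SeveritySketch { SEV_NONE = 0, SEV_LOW, SEV_NORMAL, SEV_CRITICAL };
enum CodeSketch { CODE_OK = 0, CODE_INVALID_ARGS, CODE_NOT_FOUND,
                  CODE_INTERNAL_ERROR };

struct StatusSketch {
    CodeSketch code;
    SeveritySketch severity;
    std::string description;   // the real class stores an index instead

    StatusSketch() : code(CODE_OK), severity(SEV_NONE) {}
    StatusSketch(CodeSketch c, SeveritySketch s, const std::string& d)
        : code(c), severity(s), description(d) {}

    bool ok() const { return code == CODE_OK; }

    // merge rule from the doc comment: concatenate the descriptions, keep
    // the highest severity, and fall back to "internal error" when the two
    // error codes differ
    StatusSketch& operator+=(const StatusSketch& other) {
        if (other.ok()) {
            return *this;                 // nothing to merge
        }
        if (ok()) {
            *this = other;                // first error simply replaces "ok"
            return *this;
        }
        if (code != other.code) {
            code = CODE_INTERNAL_ERROR;
        }
        if (other.severity > severity) {
            severity = other.severity;
        }
        description += "\n";
        description += other.description;
        return *this;
    }
};

// Hypothetical early-return helper, only usable inside functions that
// return a StatusSketch.
#define ST_CHECK_CRITICAL_SKETCH(st) \
    do { if ((st).severity == SEV_CRITICAL) return (st); } while (0)
```

With this rule, accumulating three non-critical init calls with += leaves one status that carries all the descriptions, so the UI level only needs a single check.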




Initialization

To be able to use it you must first initialize the StatusHandler, which is the class that contains all the shared strings for the Status instances.




// init the handler
static StatusHandler statusHandler;
// set the global handler to be used by all the Status instances
Status::s_shandler = &statusHandler;

Then you can start using it normally. Note that this is not prepared for multithreading, since all the Status instances share the same strings table.
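If multithreading were needed, one simple (and not the fastest) option would be to guard the shared table with a mutex. This is only a sketch of that idea with a hypothetical LockedStringTableSketch class, it is not how the StatusHandler in the project works.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical thread-safe variant of a shared strings table: every access
// takes the same mutex, and get() returns a copy so the caller never holds
// a reference into the vector while another thread may be growing it.
class LockedStringTableSketch {
public:
    unsigned int add(const std::string& text) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_strings.push_back(text);
        return static_cast<unsigned int>(m_strings.size() - 1);
    }

    std::string get(unsigned int index) const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_strings.at(index);
    }

private:
    mutable std::mutex m_mutex;
    std::vector<std::string> m_strings;
};
```

The cost is one lock per description added or read, which only happens on the error path, so the trivial 32 bit copy of the status itself stays untouched.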


Project Source

This little project and the wrapper are on github, in case you want to take a look.
Hope it helps someone!