ZWARTBERG Research & Development

Software Engineering BLOG

The Karoo Project

Brian Modra's Karoo Project BLOG

This is where I write about interesting things that have to do with The Karoo Project, but less directly than a description of it, and less technically than the on-line documentation.

GPS Tracking using Linux SBC

I've been involved with design of a tracking system since 2006.

During this time, I've seen a lot of development money going into the server and the tracking unit in an effort to keep down the cost of the tracking unit hardware. This makes good business sense because tracking is a volume business, where profits depend on small markups per unit.

However, Linux SBCs (single-board computers) have recently come onto the market at ever-decreasing prices and with ever more features. At the moment, some low-cost SBCs are available at prices that start to make sense for tracking units. The SBC I've started to use is not quite cheap enough for volume production of regular tracking units... but:

Consider the development cost that typically goes into a tracking company start-up. If the tracking unit is easier to program, development costs will be lower. While the company is in start-up mode, development costs are large when divided among the few units that will be sold in the first few years, i.e. the tracking unit won't actually be cheap. It will be cheaper overall to use more expensive hardware and thereby reduce development costs. In a few years, when volumes start to increase, the SBC will probably be cheaper, or you will be able to afford to create a more compact unit (in terms of computing power).

If I've made sense, you will be able to follow my reasoning: don't develop a tracking unit that has a tiny CPU and memory, thinking that this will keep costs down. It will actually make the unit more expensive. Wait until your company has a hold on the market, then redevelop the hardware and shrink the software to make the unit cheaper to manufacture.
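The arithmetic behind this argument can be sketched in a few lines of C++. This is only an illustration: the function name cost_per_unit and every figure used with it are hypothetical, not numbers from a real project.

```cpp
#include <cassert>

// Effective cost of one tracking unit: the hardware cost plus this unit's
// share of the development budget. All amounts are in the same currency.
// (Hypothetical helper, for illustration only.)
double cost_per_unit(double hardware_cost, double development_cost, long units_sold)
{
  assert(units_sold > 0);
  return hardware_cost + development_cost / units_sold;
}
```

With made-up figures: a tiny-CPU unit costing 30 to build and 600000 to develop works out at 330 per unit over the first 2000 units, while an SBC-based unit costing 60 to build and 200000 to develop works out at 160. Only at high volume (say 200000 units) does the cheaper hardware win, at 33 against 61.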

This same train of thought should apply to any start-up company building small devices: use Linux, even if that makes the hardware slightly more expensive. In the short term it will mean a shorter time to market and lower up-front development costs. In the long run, it will result in greater profits due to faster market penetration.

See also the Karoo GPS tracker page

The Karoo Project

Business Applications

Unfortunately, business applications are usually built under so much pressure to "get something out there" that there is rarely a chance to design the architecture properly. Also, as most "start-up" companies are small, the applications they build usually only work for small loads. Later on, they have to either re-engineer the application or buy massive servers.

I got interested in distributed systems, but could not find any current "super computing" designs that really would work for business applications. Business applications must be reliable, and they usually are not complex, but they get heavily loaded. Scientific applications, in contrast, don't have to be quite so reliable, and they are at the absolute high end of the scale with respect to mathematical complexity. They are typically large programs, that you throw at a large server, and let it chug away.

Too many systems are designed for the engineer's convenience rather than the user's convenience. The cheapest way to get a bigger hard disk is to use RAID. Then you need to add in redundancy to compensate for the fact that the probability of failure increases with the number of disks. I think it's better to design the application so that it can use smaller disks: split the database up into multiple databases and implement software "database references" between applications.

The Karoo Project is an architecture to facilitate modularisation of the application so that each (largish) module is a process, a stand-alone program that is integrated so tightly with other such programs, that they as a group perform as one super-application.

History

It's becoming more and more common to use Java for services, because of tools like Tomcat, which handle accepting connections from sockets and thread management. But somewhere along the line, you come across "unexplained bugs" which cause the whole service to crash. Probably the most common cause of crashes is that two threads write the same variable at the same time, leaving it with a nonsense value (a "race condition").

I've been guilty of unjustly blaming Tomcat and then Java (also unjustly) for being the cause of all my problems, and of rewriting everything in C++, to no avail (except to reduce resource problems). I still had not solved the problem: we write bugs, especially race conditions, and these show up when lots of threads are running in one process, therefore bringing down the whole service. (Then if we are overly zealous with semaphores or mutexes to avoid race conditions, we end up with deadlocks, but that's another story.)
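For readers who have not met this kind of bug, here is a minimal C++ sketch of the usual cure: a shared counter guarded by a mutex. The names SafeCounter and count_in_parallel are made up for this example and are not part of The Karoo Project. Without the lock, the increments from different threads could overwrite each other and the total would come out wrong in an unpredictable way.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// A counter that several threads may increment at the same time.
// The mutex makes each read-modify-write step happen as one unit.
class SafeCounter {
public:
  void increment()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    ++value_;
  }
  long value() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return value_;
  }
private:
  mutable std::mutex mutex_;
  long value_ = 0;
};

// Starts `threads` threads that each increment one shared counter
// `per_thread` times, waits for them, and returns the final total.
long count_in_parallel(int threads, int per_thread)
{
  assert(threads >= 0 && per_thread >= 0);
  SafeCounter counter;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&counter, per_thread] {
      for (int i = 0; i < per_thread; ++i) counter.increment();
    });
  }
  for (std::thread& w : workers) w.join();
  return counter.value();
}
```

The catch is that every shared variable in the process needs this care, and one forgotten lock is enough to bring back the crash.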

In a nutshell, I had "all my eggs in one basket", so that when one obscure part of code decided to crash, it brought the whole service down. The solution was simple: split the application into many programs, all communicating with each other.

The Karoo Project is a framework for application development.
It has a manager (which starts all the applications), and a library of tools to make it really simple to create large services.

The basic distinction between this framework and most others is that it allows you to create a number of applications that all work together to make one service (or one "super application").

... and these applications can be running on different host machines, because The Karoo Project also provides a very tidy way to do inter process communication.

This makes it really easy to create scalable applications, and it makes your application much less vulnerable to problems such as memory trashing or memory leaks. If those things happen, they will only take down one application in the system, not the whole system... the manager will automatically restart it, and create a log so you can later attend to the bug.
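The restart idea can be sketched with plain POSIX calls. This is not the Karoo manager, only a small illustration of the principle under my own assumptions: the function supervise and its parameters are hypothetical, and the "log" is just a line on stderr.

```cpp
#include <cassert>
#include <cstdio>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `worker` in a child process and starts it again whenever it dies
// or exits with a non-zero status. `worker` receives the attempt number.
// Returns the number of restarts needed before a clean exit, or -1 if
// fork()/waitpid() failed or more than max_restarts restarts were needed.
int supervise(const std::function<int(int)>& worker, int max_restarts)
{
  assert(max_restarts >= 0);
  for (int attempt = 0; attempt <= max_restarts; ++attempt) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
      // Child process: a crash here cannot corrupt the supervisor's memory.
      _exit(worker(attempt));
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0) return attempt;
    // Record what happened so that the bug can be attended to later.
    if (WIFSIGNALED(status))
      std::fprintf(stderr, "worker killed by signal %d\n", WTERMSIG(status));
    else
      std::fprintf(stderr, "worker exited with status %d\n", WEXITSTATUS(status));
  }
  return -1;
}
```

A worker that crashes on its first two attempts and then behaves would make supervise return 2, while the supervising process itself never notices more than an exit status.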

Web Server

One of the first applications I created using The Karoo Project framework was a web server.

However, this web server was never intended to compete with Apache. It just so happened that I needed a web interface, and when I started to create an Apache module, I realised it would be easier just to create a web server using my own framework!

The complicated part about a web server is really how to manage threads and file handles. The Karoo Project has support for TCP/IP streams and management of threads, so it was very simple to create a web server... and, while on the subject, this highlighted to me one of the reasons why The Karoo Project is a great framework for application development: Apache modules must use Apache's memory allocation. (I'm not criticising this, it's obvious that Apache needs to manage the memory allocation of 3rd party modules that are added into it... because the modules become part of the Apache executable process.) But it's also obvious that it would be better if those 3rd party extension modules didn't have to be part of the web server executable process.

"Extensions" is actually not a term we will use in The Karoo Project, because anything can be a part of "the application". The web server itself is a number of executable processes, each communicating with each other.

The key to this is the queue system (which makes it simple for one executable process to manage multiple threads), and the messaging system (which makes it simple for many processes to "call each other").

Rather than use RPC to do the inter-process communication, I chose to simply use a UDP-based messaging system. UDP is much lighter-weight than stream sockets, especially when you consider how many threads and file handles RPC would actually consume if your application is to be able to scale indefinitely and be able to distribute across thousands of machines...
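To show how little machinery a datagram needs, here is a minimal POSIX sketch that sends one message over UDP on the loopback interface. It is not the Karoo messaging system, and the function names (open_udp_listener, send_message, receive_message) are invented for this example. Note that no connection is set up and no thread or file handle is held per peer.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Opens a UDP socket bound to 127.0.0.1 on a port chosen by the kernel.
// Returns the descriptor (or -1 on error) and stores the port in *port.
int open_udp_listener(unsigned short* port)
{
  assert(port != nullptr);
  int fd = socket(AF_INET, SOCK_DGRAM, 0);
  if (fd < 0) return -1;
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0;  // 0 asks the kernel for any free port
  socklen_t len = sizeof(addr);
  if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
      getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) < 0) {
    close(fd);
    return -1;
  }
  *port = ntohs(addr.sin_port);
  return fd;
}

// Sends `text` as a single datagram to 127.0.0.1:port.
bool send_message(unsigned short port, const std::string& text)
{
  int fd = socket(AF_INET, SOCK_DGRAM, 0);
  if (fd < 0) return false;
  sockaddr_in to{};
  to.sin_family = AF_INET;
  to.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  to.sin_port = htons(port);
  ssize_t sent = sendto(fd, text.data(), text.size(), 0,
                        reinterpret_cast<sockaddr*>(&to), sizeof(to));
  close(fd);
  return sent == static_cast<ssize_t>(text.size());
}

// Blocks until one datagram arrives on fd and returns its payload
// (an empty string on error).
std::string receive_message(int fd)
{
  char buffer[1500];
  ssize_t n = recvfrom(fd, buffer, sizeof(buffer), 0, nullptr, nullptr);
  return n < 0 ? std::string() : std::string(buffer, static_cast<size_t>(n));
}
```

The price of this simplicity is that UDP itself gives no delivery guarantee, so anything built on it has to cope with lost or reordered messages.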

Having brought up the subject of "distributing across thousands of machines", it's obvious that we need a manager process. This is part of The Karoo Project: it has a manager which is configured by an XML file. It polls the file periodically to check for changes, and if the file has changed, it may need to stop and restart some of the processes.
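One simple way to poll a file for changes is to compare its modification time with the one seen on the previous poll. The sketch below does that with std::filesystem (C++17). It is an assumption about how such polling could be done, not a description of the Karoo manager; the class name ConfigWatcher and the ten-second interval are made up.

```cpp
#include <chrono>
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

// How long a manager loop might sleep between polls (arbitrary choice).
constexpr std::chrono::seconds kPollInterval{10};

// Remembers the modification time of one file and reports when it changes.
class ConfigWatcher {
public:
  explicit ConfigWatcher(fs::path path)
    : path_(std::move(path)), last_(read_time()) {}

  // Returns true once for each change seen since the previous call.
  bool changed()
  {
    fs::file_time_type now = read_time();
    if (now == last_) return false;
    last_ = now;
    return true;
  }

  // Reads the whole file, ready to be parsed again after a change.
  std::string load() const
  {
    std::ifstream in(path_);
    std::ostringstream text;
    text << in.rdbuf();
    return text.str();
  }

private:
  // A missing or unreadable file is reported as the minimum time value.
  fs::file_time_type read_time() const
  {
    std::error_code ec;
    fs::file_time_type t = fs::last_write_time(path_, ec);
    return ec ? fs::file_time_type::min() : t;
  }

  fs::path path_;
  fs::file_time_type last_;
};
```

A manager loop would call changed() every kPollInterval and, when it returns true, re-read the file with load() and work out which processes need to be stopped or restarted.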

Each of these applications in The Karoo Project is called a "rock", and each object in an application that runs in a thread is called a "pebble".

Motivation

I started this project because I needed it. To make good applications, you need a good application framework. For example, you need a tidy way to catch coredumps. Also, I was struggling with resource bloat, and I was running out of power and space on servers. (This is all about business applications, not scientific applications.) I was distributing the load by duplicating the service and allocating certain ranges to certain servers... while this worked, it would not have worked for another application.
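Allocating ranges to servers amounts to a lookup like the one sketched below. What the ranges were ranges of is not important here; the sketch assumes numeric IDs, and the names Range and server_for and the server names in the usage note are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One server and the highest ID it is responsible for.
struct Range {
  long last_id;
  std::string server;
};

// Returns the server that owns `id`, or an empty string if no range
// covers it. `ranges` must be sorted by last_id in ascending order.
std::string server_for(const std::vector<Range>& ranges, long id)
{
  assert(id >= 0);
  for (const Range& r : ranges) {
    if (id <= r.last_id) return r.server;
  }
  return std::string();
}
```

With the ranges {999, "alpha"} and {1999, "beta"}, ID 0 goes to "alpha" and ID 1000 goes to "beta". It works only as long as every request carries an ID that can be split this way, which is why it would not suit every application.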

While hunting around for ideas to solve the fact that my servers were getting overloaded, I considered just getting a big server. This has a few problems:

  • the cost of bigger and bigger servers rises disproportionately,
  • when the number of threads in one application increases, so does the likelihood of race conditions,
  • I am not sure how well PostgreSQL works when the database grows into terabytes.

It's also very inconvenient to have one large application program doing everything, because if you want to upgrade just one section of it, you have to shut down the entire service temporarily.

... so it was making more and more sense to make an application framework that allows an "application" to be made up of many programs.

Command Pattern

In Computer Science, it's well accepted that large functions/methods/subroutines are a "bad idea". For a start, they make debugging difficult... they are also likely to have many local variables, which can cause stack overflow... etc.

Similarly, I believe that in real-time business application design, it should be considered a "bad idea" to have any section of code that takes a long time to complete, or that does a lot of things. Blocks of code should be organised so that each has a fairly simple logical function, and completes quickly.

So I have created a framework where it's easy to put most of the program logic into "pebbles" (which use the Command Pattern). E.g. consider the following batch style of programming:

for (int x0 = 0; x0 < max_x; x0++) {
  for (int y0 = 0; y0 < y_max; y0++) {
    int x = 0;
    int y = 0;

    int iteration = 0;
    int max_iteration = 1000;

    // Iterate until the point escapes or the iteration limit is reached.
    while (x*x + y*y <= (2*2)  &&  iteration < max_iteration) {
      int xtemp = x*x - y*y + x0;
      y = 2*x*y + y0;
      x = xtemp;

      iteration++;
    }

    // Points that never escape are black; the rest are coloured by count.
    int color;
    if (iteration == max_iteration) {
      color = black;
    }
    else {
      color = iteration;
    }
    plot(x0,y0,color);
  }
}
	    

... compared to this style, which I am calling "Karoo style" :

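class inner_mand: public pebble
{
private:
  int max_iteration;
  int x0, y0;
  // The loop state lives in members so that it survives between runs.
  int x, y;
  int iteration;
public:
  inner_mand(int x0, int y0, int max_iteration)
  {
    this->x0 = x0;
    this->y0 = y0;
    this->max_iteration = max_iteration;
    x = 0;
    y = 0;
    iteration = 0;
  }
  virtual ~inner_mand() {}
  void run()
  {
    // Do one step of the inner loop, then go back on the queue so that
    // other pebbles get a turn in this thread.
    if (x*x + y*y <= (2*2)  &&  iteration < max_iteration) {
      int xtemp = x*x - y*y + x0;
      y = 2*x*y + y0;
      x = xtemp;
      iteration++;
      q->add(this);
      return;
    }

    // The loop has finished: choose the colour and plot the pixel.
    int color;
    if (iteration == max_iteration) {
      color = black;
    }
    else {
      color = iteration;
    }
    plot(x0,y0,color);
  }
};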

...

for (int x = 0; x < max_x; x++) {
  for (int y = 0; y < y_max; y++) {
    inner_mand* p = new inner_mand(x, y, 1000);
    p->dereferenceAfterRun();
    q.add(p);
  }
}
	    

If "q" is a pool of queues (exepool) rather than a single queue (exeque), then some pixels will not appear in a sequential order.

The main point of interest about this style of programming is that what would have been a long loop, effectively "blocking" a thread for as long as it takes... becomes a series of chunks of execution, in a thread, which can still handle other pebbles added to it.
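The effect is easy to see in a stripped-down model. The sketch below is not the Karoo API: task, task_queue and counter_task are hypothetical stand-ins for a pebble, a single queue and a piece of work, and everything runs in the calling thread. Two tasks of three steps each are queued, and their chunks come out interleaved rather than one task finishing before the other starts.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for the idea of a pebble: a command object with a run() method.
class task {
public:
  virtual ~task() {}
  // Does one short chunk of work. Returns true if it wants to run again.
  virtual bool run() = 0;
};

// Stand-in for a single queue served by one thread.
class task_queue {
public:
  void add(std::shared_ptr<task> t) { pending_.push_back(std::move(t)); }

  // Runs chunks in FIFO order until no task wants to continue.
  void drain()
  {
    while (!pending_.empty()) {
      std::shared_ptr<task> t = pending_.front();
      pending_.pop_front();
      if (t->run()) pending_.push_back(t);  // back of the queue, not the front
    }
  }
private:
  std::deque<std::shared_ptr<task>> pending_;
};

// Performs `limit` steps, one per chunk, recording its id at every step.
class counter_task : public task {
public:
  counter_task(int id, int limit, std::vector<int>* trace)
    : id_(id), limit_(limit), trace_(trace)
  {
    assert(limit > 0 && trace != nullptr);
  }
  bool run() override
  {
    trace_->push_back(id_);
    return ++done_ < limit_;
  }
private:
  int id_, limit_, done_ = 0;
  std::vector<int>* trace_;
};

// Queues two tasks of three steps each and returns the order of the chunks.
std::vector<int> interleave_demo()
{
  std::vector<int> trace;
  task_queue q;
  q.add(std::make_shared<counter_task>(1, 3, &trace));
  q.add(std::make_shared<counter_task>(2, 3, &trace));
  q.drain();
  return trace;
}
```

Here the task tells the queue whether to run it again by returning true, where the pebble in the listing above re-adds itself; the scheduling effect is the same.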

If the queue used is a pool (exepool), then it will define a maximum number of threads.

A flow of logic can be created... e.g. rather than this:

int function_a(int x)
{
  ...
  return x;
}
void function_b(int x)
{
  ...
}

...
x = function_a(x);
function_b(x);
	    

The "karoo style" would be:

class pebble_b: public pebble
{
private:
  int x;
public:
  pebble_b(int x) { this->x = x; }
  virtual ~pebble_b() {}
  void run() {
    ...
  }
};

class pebble_a: public pebble
{
private:
  int x;
public:
  pebble_a(int x) { this->x = x; }
  virtual ~pebble_a() {}
  void run() {
    ...
    // Hand the result on to the next step by queueing a new pebble.
    pebble_b* b = new pebble_b(x);
    b->dereferenceAfterRun();
    q->add(b);
  }
};

...

pebble_a* a = new pebble_a(x);
a->dereferenceAfterRun();
q.add(a);
	    

Notice that the flow of logic is the same in both the batch style and the karoo style, but the execution flow is interrupted and restarted in the karoo style. Anyone can see that the batch style of programming involves less typing than the karoo style... and also that the karoo style has a memory overhead and a CPU overhead... but it would not be used for simple functions. It would only be used for reasonably complex functions, so that the overhead is insignificant.

Karoo GPS tracker.


See also The Karoo Project homepage.


Get the docs.


See it on sourceforge.


Open Source alternatives.


Zwartberg Research & Development is a registered trading name of Open Source Software Consulting CC.
Phone:+27235411462, fax:+27235411379, mobile:+27796977082,
brian@zwartberg.com, P.O. Box 2, Prince Albert, 6930, South Africa.