Thursday, March 12, 2009

How Software gets stable...

Over the past few days I have received a couple of questions asking whether this or that rsyslog feature could be introduced into the stable branch soon. So I thought it was time to blog about what makes software stable, and what does not...

But let me start with something apparently unrelated: let me confess that, from time to time, I like to enjoy some good wine (Californian Merlot and Cabernet especially; ask me for my mailing address if you would like to contribute some! ;)). And on some special occasions, I spend way too much money just to get the "old stuff": those nice wines that have aged in oak barriques. To cut a long story short, those wines are kept in barrels not merely for storage, but because the exposure to the oak, along with some properties of the container itself, interacts with the wine and makes it taste better. Wikipedia has the full story, and also this interesting quote:
The length of time that a wine spends in the barrel is dependent on the varietal and style of wine that the winemaker wishes to make. The majority of oak flavoring is imparted in the first few months that the wine is in contact with oak but a longer term exposure can affect the wine through the light aeration that the barrel allows which helps to precipitate the phenolic compounds and quickens the aging process of the wine.[8] New World Pinot noir may spend less than a year in oak. Premium Cabernet Sauvignon may spend two years. The very tannic Nebbiolo grape may spend four or more years in oak. High end Rioja producers will sometimes age their wines up to ten years in American oak to get a desired earthy, vanilla character.
Read it again: "High end Rioja producers will sometimes age their wines up to ten years in American oak to get a desired earthy, vanilla character."

So what would the Riojan winemaker probably say if you asked him for a great 2008 wine (we are in early 2009 right now, just for the record)? How about "Be patient, my friend. Wait another 9 years, and you can enjoy it!" And what if you begged him, saying you need it now, immediately? "I am sorry, but I can't accelerate time...". And if you told him you really, really need it because otherwise you cannot close an important business deal? Maybe he says "Listen, my friend. Some things simply need time. You can't hurry them. But if you need something that can't really exist, I can get you a bottle of that wine and label it 'Famous Riojan 10-year aged Wine from 2008', but we both know what is in the bottle!". Technically speaking, the winemaker is not even cheating: he claims that the wine is from 2008, so how could it have been aged 10 years? If anyone buys that (today), it is very much the buyer's own fault.

As a side note, all too often our society works that way: someone requests something that is impossible to do, begs long enough until someone else cheats, everybody knows, and we are all happy (at least up to the point where the cheat gets us into real trouble... pick your favorite economic crisis to elaborate).

The moral of the story? Some things need time. And you can't replace time with anything else. If you want the real taste of a wine aged 10 years in oak... you need 10 years.

By now you probably wonder what all of this has to do with software. A lot! Have you ever thought about what makes software stable? In closed source, you hopefully have a large testing department that helps you nail down bugs. In open source, you usually do not have many of these folks, but you have something much better: a community of loyal users eager to break their systems with the latest and greatest of whatever you happen to have thrown together ;)

In either case, you start with a relatively unstable program, and with each bug report (assuming you fix it), the software gets more stable. While fixing bugs, however, you may introduce new instabilities. The larger the fix, the larger the risk. So the more you change, the greater the need to re-test, and the higher the probability that while one issue is fixed, one (or more!) new issues have been created. For very large fixes, you may even end up with a much worse version of the software than you had before.

Thankfully, a patch that fixes a bug is usually much smaller than the code it fixes. Often, it is just a few lines of code, so the risk of making things worse is low. Why is the patch usually just a few lines long? Simply because you are fixing some larger piece of code that, for the most part, already works quite well. You only need to change some details that were not properly thought out and thus resulted in wrong behavior (if you made a design error, that's a different story...).

So the more bug reports you get, and the more of them you fix, the more stable the software gets. You may have seen formal verification methods in computer science, but in practice, for most applications, this is the simple truth of how things work.

Now to new features: a feature is usually the opposite of a bugfix. Introducing a new feature tends to be a larger effort, touching much more code and adding code where no code has been before ;) If you add new features, chances are high that you introduce new bugs. So with each feature added, you should expect the stability of your code to decrease (and, oh boy, it does!). So how do you iron out these newly introduced bugs? Simply wait for bug reports, fix them, wait for more, until you have reached at least a decent level of stability (aka "no new/serious bug reports received for a period of n days", whatever you have defined n to be).
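To make that "quiet period" rule a bit more concrete, here is a minimal Python sketch of it. The names `BugReport`, `serious`, and `quiet_period_reached` are purely hypothetical and do not come from rsyslog's tooling; the sketch simply assumes each report carries the date it was received and a flag saying whether it counts against stability.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class BugReport:
    received: date   # day the report came in
    serious: bool    # hypothetical flag: does this report count against stability?


def quiet_period_reached(reports, today, n_days, serious_only=True):
    """Return True if no (serious) bug report was received in the last n_days.

    A report counts only if it arrived strictly after the start of the window,
    i.e. within the last n_days before `today`.
    """
    window_start = today - timedelta(days=n_days)
    for report in reports:
        if serious_only and not report.serious:
            continue  # minor reports are ignored when only serious ones matter
        if report.received > window_start:
            return False  # a relevant report arrived inside the window
    return True


reports = [
    BugReport(date(2009, 1, 20), serious=True),   # old serious bug, outside the window
    BugReport(date(2009, 3, 1), serious=False),   # recent, but minor
]
print(quiet_period_reached(reports, date(2009, 3, 12), n_days=30))                      # True
print(quiet_period_reached(reports, date(2009, 3, 12), n_days=30, serious_only=False))  # False
```

Note that every new feature restarts this clock in practice, because it tends to produce fresh reports inside the window.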

And what if you then introduce a new feature? I guess by now you know: that'll decrease stability, so you need to iterate through the bugfixing process again... and so on.

But, hey, we are doing open source. I *love* to add features every day! Umm... I guess my program will never reach a decent level of stability. Bad...

What to do? Taking a long vacation (tempting...) is not a real solution. Who will fix bugs while I am away (shame on me for mentioning this...)? But a pattern emerges if you follow this thought: to make a program stable, you need to fix bugs for a period of time but refrain from adding new features!

Thanks to git, this can easily be done: you simply create one code branch for the version that is to become stable, and another branch where you add new features (the development branch). With a bit of git voodoo, you can even import fixes from your stabilizing branch into the development branch. Once you are happy with the stability of your code (in the stabilizing branch), you are ready to declare it stable! For that, you'll probably have a separate branch. Then, you can start the game again: copy the state of your development branch to the stabilizing branch, do not touch that branch except for bug fixes, and continue adding new features to the development branch. Iterate this for as long as you are interested in your project.
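The cycle described above can be modeled in a few lines of Python. This is only a toy sketch of the rules, not how git or rsyslog actually store anything: the class `ReleaseTrain` and its methods are hypothetical names, and each "branch" is just a list of changes. The point is which branch each kind of change is allowed to touch.

```python
class ReleaseTrain:
    """Toy model of the development / stabilizing / stable branch cycle."""

    def __init__(self):
        self.development = []  # every change lands here, newest last
        self.stabilizing = []  # frozen feature set, only bug fixes are added
        self.stable = []       # last state that was declared stable

    def add_feature(self, name):
        # New features only ever go into the development branch.
        self.development.append(("feature", name))

    def fix_bug(self, name):
        # Fixes are made in the stabilizing branch and carried over
        # ("imported") into the development branch as well.
        self.stabilizing.append(("fix", name))
        self.development.append(("fix", name))

    def start_stabilizing(self):
        # Copy the current development state into the stabilizing branch.
        self.stabilizing = list(self.development)

    def declare_stable(self):
        # The stabilizing branch becomes the new stable release,
        # then the game starts again with the current development state.
        self.stable = list(self.stabilizing)
        self.start_stabilizing()


train = ReleaseTrain()
train.add_feature("feature-1")
train.start_stabilizing()       # feature-1 is now being stabilized
train.add_feature("feature-2")  # lands in development only
train.fix_bug("fix-1")          # lands in stabilizing and development
train.declare_stable()
print(train.stable)  # [('feature', 'feature-1'), ('fix', 'fix-1')]
```

Note that `feature-2` never reaches `stable` in this round; it has to sit through its own stabilizing period first.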

This, in short form, is how rsyslog is created. Currently, there are four main branches, plus a number of utility branches that aid the development of specific features (let's ignore them in this context): we have the development (also called "master") branch, which equates to the... yes... development branch from the example above ;). The stabilizing branch is called "beta" in rsyslog terms. Then, we have a v2-stable and a v3-stable branch. Both are actually stable, but v2-stable is probably even more stable, because, except for bug fixes, it has not been touched for many more months. It also has the fewest features, so it is probably the best choice if you are primarily interested in stability and do not need any of the new features. As rsyslog is developed further, we will add more stable branches (e.g. there will probably be a v4- and v5-stable branch, but we may also no longer maintain v2-stable at that point because nobody uses it any longer [just like dinosaurs are no longer maintained ;)]).

Did you read carefully? Did you get the message? So let me ask:
What makes software stable?

Bug fixes? Testing? Money (yes, yes, please throw some at me!)?

REALLY? Let me repeat:

There is only one real ingredient, and that is: TIME! Just like good wine, software needs to age. Thankfully, the age of software is measured in the number of different test cases it has been exposed to. So money can accelerate the aging of software (as some chemistry guru may be able to do for wine, probably with the same side effects...). But for the typical open source project, stability is simply tied to the rate at which the community adopts new releases, tests them AND submits bug reports, so that the authors can work on fixing broken things.

And what is the moral of the story? Finally, I am coming back to the opening questions: nothing but time makes rsyslog stable. So if you ask me to add a feature today, and I do, you cannot expect it to be immediately stable, simply because this is not how things work (thanks, btw, for trusting so much in my programming abilities ;)). The new feature needs to go through all the stages: it must first be applied to the current development build (otherwise we would destabilize the current beta, which is not desirable). Then, over time, it is migrated to the beta build, where it can fully stabilize, and, once the bug rate seems to justify this, it can move on to the stable build. For rsyslog, this typically means that three to four months, sometimes more, are needed before a new feature hits the stable branches. And there is little you can do about that.

"But... hey, I need a stable version of that cool feature now! My manager demands it. Hey, I'll also pay you for it..." Guess what? I can do the same as the winemaker. Of course, if you ask really nicely, I can create a v3-stable-cool version for you, which is a version with the cool feature that I have declared immediately stable (btw, it's mostly the same thing that everyone else just calls "the beta"). If that satisfies your boss, I'll be happy to do it. But we both know what you have gotten... ;)

Of course, I am exaggerating a bit here: in software, we can somewhat increase the speed of stabilization by adding testers. Money (and even more so motivation) can do that. We can also backport single new features to previously stable branches (note the fine print!). This reduces stability a bit, but obviously not as much as in the development version. However, this requires effort (read: time and/or money), and it may be impractical for many features. Some features simply rely on others that were newly introduced in that development version, and if you backport the whole bunch of them, you'll end up with something that has changed as much as the development version, but in an environment where the integration of the components is not as well tested and understood. Of course, some company policies (seem to) force you to do that. If so, the end result is a system that is much less stable than the development version, but carries a seemingly "stable" label. Wow, how cool! As common sense says: "everyone gets what one asks for" ;)

So what is the bottom line? Good software and good wine have something in common: they need time to ripen! Think about this the next time you ask me to offer a new feature as part of a stable branch. It's simply impossible. But, of course, you can bribe me to stick that "stable" label onto a tampered-with version...
