Friday, April 11, 2008

TLS, loosely coupled modules, a runtime and licensing...

Again, I reproduce a mailing list post in the hope of reaching the broadest audience and also keeping a permanent record here in my rsyslog history.

Hi folks,

I am sorry, this will be a long mail. But I would appreciate if you'd
read it in full and comment on it. This mail covers a really important
decision for rsyslog and will probably even influence if the project
succeeds in the long term. Package maintainers and code contributors are
especially requested to *really* read it. Though I try hard to provide
all relevant facts (that's why it is getting long), I will probably miss
some and not properly convey others. Please feel free to ask.

Let me start with explaining that the rsyslog project conceptually
consists of three parts:

- the modules
- "helper" functions
- rsyslog-specific functionality

Modules are actually projects in their own right, just being distributed
with the rsyslog tarball for convenience. A module may be released under
any license. Note that modules call both rsyslog-specific functionality
(e.g. to submit a message) as well as helper functions (e.g. to handle
tcp sessions).

The "helper" functions are a growing set of generic objects. Examples
are the module loader, the queue engine, networking support, the script
engine and virtual machine, ... - you get the idea: Things that are used
inside rsyslog but are not necessarily of use only for rsyslog.
Actually, this could be called a "rsyslog runtime library".

Rsyslog-specific functionality is primarily rsyslogd and everything it
takes to glue together helpers and plugins to build the working syslog
subsystem.

Let's stick with that for a brief while. Now let me explain the idea of
loosely coupled modules. This stems back to JF's effort to convince me
of the Unix philosophy of "small tools that work well together". We had
another good discussion yesterday (on the blog) and it made me change my
mind a bit (well, probably, not 100% convinced yet, but he managed to
seed the thought ;)). While I still think that there are things that
need to be really tightly coupled to the rsyslog core, there are others
which do not necessarily need to be. Let me call the latter "loosely coupled
modules", in contrast to the (tightly coupled) plugins that actually
become part of the rsyslogd process during runtime. The analysis plugins
I have on my mind could become such loosely coupled modules. As an
interface, the usual Unix "send it and forget it" pipe could be used,
and it would probably be acceptable to allow for minor message loss
during shutdown and plugin failure (anything else would require a pipe
application protocol, e.g. relp over pipe, which sounds scary).
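
To make the "send it and forget it" idea concrete, here is a minimal sketch of such a loosely coupled analysis module. It is an illustration only, not rsyslog code: the `analyze` rule, the `run` function and the one-message-per-line framing are all assumptions made for the example.

```python
import io


def analyze(line):
    """Hypothetical analysis rule: flag messages containing 'error'."""
    return "ALERT " + line if "error" in line.lower() else None


def run(source, sink):
    """Read one message per line until EOF and write a line per flagged message.

    Nothing is acknowledged back to the sender, so whatever is still in
    the pipe when either side goes away is simply lost.
    """
    flagged = 0
    for raw in source:
        result = analyze(raw.rstrip("\n"))
        if result is not None:
            sink.write(result + "\n")
            flagged += 1
    return flagged


# In-memory stand-ins for the pipe; a real module would pass sys.stdin
# and sys.stdout so that any producer can feed it through a shell pipe.
demo_in = io.StringIO("all fine\ndisk ERROR on sda\n")
demo_out = io.StringIO()
run(demo_in, demo_out)
```

Because the module only sees lines on a file-like object, it does not care whether rsyslog, syslog-ng or a log file reader is on the other end.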

The plus in doing so would be the ability to use those plugins in
configurations where rsyslog is not present (e.g. driven by syslog-ng or
a detached message generator [fed from a log file]). Done right, one
could even select the (slightly lossy) pipe interface or the full blown
plugin interface as a compile time switch. If you think it out, we may
even end with an abstraction layer where each module can be compiled for
either the plugin interface or the pipe (no promises, though).
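
A rough sketch of what such an abstraction layer might look like: the module core is written once and a thin binding decides how messages reach it. All class and function names here are hypothetical, and the run-time `build` lookup merely stands in for the compile-time switch mentioned above.

```python
import io


class UppercaseModule:
    """Hypothetical module core: knows nothing about how messages arrive."""

    def process(self, msg):
        return msg.upper()


class PluginBinding:
    """Tightly coupled: the host calls the module inside its own process."""

    def __init__(self, module):
        self.module = module

    def on_message(self, msg):
        return self.module.process(msg)


class PipeBinding:
    """Loosely coupled: messages arrive as lines on a file-like object."""

    def __init__(self, module):
        self.module = module

    def serve(self, source, sink):
        for raw in source:
            sink.write(self.module.process(raw.rstrip("\n")) + "\n")


BINDINGS = {"plugin": PluginBinding, "pipe": PipeBinding}


def build(module, interface):
    """Stand-in for the compile-time switch: pick the binding by name."""
    return BINDINGS[interface](module)


# The same core, used through either interface.
in_process = build(UppercaseModule(), "plugin").on_message("hello")
piped = io.StringIO()
build(UppercaseModule(), "pipe").serve(io.StringIO("hello\n"), piped)
```

The point of the split is that `UppercaseModule` never changes; only the binding around it does.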

One problem with this approach is that modules call into the rsyslog
helpers. For example, rsyslog's network support needs to be available for
all those modules that do something over the net. That's not a problem
if I have a tightly coupled plugin as today (the rsyslog core makes the
necessary bindings). It would become more problematic if I move the
module to a pipe interface, because I now need to find a way to use the
rsyslog objects. But that's still doable (though pretty ugly). It
becomes really problematic if the same module, using a pipe interface,
is to be used with e.g. syslog-ng. I don't think that syslog-ng will be
able to provide it with an emulated rsyslog "net" object.

Let's stick with this problem for a moment. Coincidentally, we had
another discussion on the mailing list yesterday - on the TLS support
wrapper for rsyslog and librelp. That discussion centered around
licenses. Technically, there are also a number of issues. I have now
involved myself enough with GnuTLS and a bit of NSS so that I am able to
try to draft a first abstraction layer. I thought hard and the "right
solution" involves encapsulating stream network access. So the right
thing to do is to have one object that handles network streams. That
object then is configured to use either plain tcp, TLS (via whatever
library) or even GSS-API. Nice and clean. It gets dirty if I think about
the details. If I do it that way, it makes rsyslog depend on this object
(so-far codenamed libsci). However, that would mean that any rsyslog
installation would need to pull in libsci. Not a big deal, except,
right, except if the crypto libraries are also pulled in by libsci. So
would every desktop system running rsyslog need to have the crypto
libraries installed? Scary... unacceptable.
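
The "one object that handles network streams" idea can be sketched as follows. This is not the libsci design, only an assumed shape for it: `NetStream`, `PlainDriver`, `TlsDriver` and the `DRIVERS` table are names invented for the example, and Python's standard `ssl` module stands in for "whatever library" provides TLS. A GSS-API driver would be one more entry in the table.

```python
import socket
import ssl


class PlainDriver:
    """Pass-through driver: uses the TCP socket as it is."""

    def wrap(self, sock, server_hostname):
        return sock


class TlsDriver:
    """TLS driver built on the standard ssl module."""

    def __init__(self):
        self.context = ssl.create_default_context()

    def wrap(self, sock, server_hostname):
        return self.context.wrap_socket(sock, server_hostname=server_hostname)


DRIVERS = {"plain": PlainDriver, "tls": TlsDriver}


class NetStream:
    """Hypothetical stream object: callers never see which driver is active."""

    def __init__(self, sock, driver="plain", server_hostname=None):
        self.sock = DRIVERS[driver]().wrap(sock, server_hostname)

    def send(self, data):
        self.sock.sendall(data)

    def recv(self, size):
        return self.sock.recv(size)

    def close(self):
        self.sock.close()


# Local demonstration with a connected socket pair and the plain driver.
left, right = socket.socketpair()
stream = NetStream(left, driver="plain")
stream.send(b"hello")
received = right.recv(5)
stream.close()
right.close()
```

Note that this sketch imports the crypto module unconditionally, which is exactly the dependency question raised above: the abstraction is clean, but something still has to decide when the crypto code gets pulled in.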

In rsyslog, we had the same problem a few months ago (at that time the
mysql client libraries were the problem). The solution was the rsyslog
loader, which dynamically loads other libraries (and their dependencies)
on demand. The loader is what enables rsyslogd to be installed
everywhere, but only with minimal core requirements (and have everything
else in separate packages). So if libsci would be part of rsyslog, we
would not have any problem at all. After all, the necessary plumbing is
ready at hand in form of the rsyslog helper objects.
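
The on-demand loading principle can be illustrated with a small sketch. It is not the rsyslog loader: the `Loader` class and its registry are assumptions for the example, and two standard-library modules stand in for optional components such as a database client.

```python
import importlib


class Loader:
    """Hypothetical on-demand loader: nothing is imported until requested."""

    def __init__(self, registry):
        self.registry = dict(registry)  # logical name -> module to import
        self.loaded = {}                # cache of modules already loaded

    def get(self, name):
        if name not in self.loaded:
            # import_module also pulls in the module's own dependencies
            self.loaded[name] = importlib.import_module(self.registry[name])
        return self.loaded[name]


# Stand-ins: "db" and "codec" play the role of optional components.
loader = Loader({"db": "sqlite3", "codec": "json"})
```

Here only a configuration that actually asks for `loader.get("db")` would ever load the database module; an installation that never requests it does not need it to be usable, which is the property that lets the core ship with minimal requirements.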

This is where we come back to loosely coupled modules. You notice it is
the same problem? Both they and libsci would need to call the
rsyslog helpers.

Now let's come a bit to licensing. In order to understand that, we need
to talk about rsyslog funding first. Obviously, I am spending full time
(and a bit more than that) on rsyslog for quite a while now. I even
intend to do that for some more months as rsyslog is currently maybe 55%
of what I would like it to be. Somehow I must get funding - for the
time, for the hardware and for all the other things ;) What made the
rsyslog project possible, and still funds 99.9% of it, is Adiscon, the
company that provides (of course ;)) the best-ever logging solutions on
Windows. Actually the Windows closed source pays for the rsyslog
project. While we hope to find other sources of funding in the future, I
can not ignore the fact. One thing we would like to do at Adiscon is
include select parts of the technology I am now developing into the
closed source applications, too. The most prominent example is the RELP
protocol. I obviously find this a fair policy - after all, the
alternative would be to do it in closed source only and I was able to
convince my folks at Adiscon that it is far better to contribute to the
open source world.

There is one drawback in this requirement: licensing. Of course, we
could pick a BSD style license and every problem would be solved. But, I
have to admit, we do not like to give everything to our competitors in
the *closed source* space. We have had very bad experiences with folks
building on our technology and even turning it against us. I won't get
agreement from Adiscon to use a BSD license for everything (plus I
personally, too, don't like to see that effect).

We already discussed this on the mailing list here as part of
dual-licensing in the past. The solution was that the technology in
question was created as its own, dual-licensed, project. This led to
the creation of librelp. Rsyslog itself was left under GPLv3 (which I
sincerely believe in because of its anti-patent, anti-drm clauses - even
though the license obviously gives me some trouble).

Dual-licensing librelp led to some duplicate code and made me not use
some features which I could have used if I had access to all rsyslog
helper objects. For librelp, that is not yet a big deal, because it is
quite unique and very rarely needs to access the rsyslog helpers. With TLS,
however, the situation changes and we get the dangling issue of rsyslog
helpers in librelp, too.

LET'S TRY TO WRAP UP

If I put all of this together, I think I have taken a (slightly? ;))
wrong path. The core problem is monolithic design from a very high point
of view. I have to admit I think this is what JF and some others were
pointing out, but I didn't realize it quickly enough. Sure, rsyslog is
quite modular by now. But rsyslog always requires rsyslog to do
everything. It is very hard to do any rsyslog-related work without the
rsyslog core. While rsyslog has a carefully crafted set of helper
objects, these are not exposed to the outside world. And the licensing
issues associated with that design begin to screw up everything in the
long run.

I think we need change. The obvious solution seems to be extracting the
rsyslog helpers out of the rsyslog core project and create a "rsyslog
runtime". That runtime then could individually be installed and be put
under a different license (bear with me, explanation follows below).

Let's consider a complicated case with the runtime. Assume we have a
plugin "NeverBeforeSeenAnalysis". Let's say someone wants to use it with
syslog-ng (!). With the runtime, all needed would be to compile it for
the pipe interface and install rsyslog-runtime and the module onto the
system.

Now let's consider Adiscon's MonitorWare products on Windows. When they
implement RELP, they need librelp and can pull in the rsyslog-runtime
(for network access including TLS).

For rsyslogd itself nothing really happens, the runtime is now just its
own library - the linking to it does not need to be modified. So for rsyslogd,
the change would be transparent.

Technically, this indeed solves the issues. Let me stress the point that
it leads to code reuse, where I currently need to rewrite things (which
increasingly concerns me, especially from a maintenance point of view).

Now, on to the licensing. Obviously, the MonitorWare use case above
would be totally incompatible with GPLv3. So the rsyslog-runtime would
need to be under a different license. It could be dual-licensed, but I
think that would probably do more harm than good. I think I can convince
Adiscon to go with LGPL for the runtime part. Granted, it introduces
risk of closed source competitors pulling it in, but the advantages
should outweigh this risk.

From the ability to put this work under a different license, I think I
am in good shape: most of the helper objects are freshly written and
have only received limited patches (if at all) from contributors. I can
contact them and ask for permission to change the license. Where I don't
get permission, I think I can re-implement the contribution. Again, most
of the code in question has been written in the past 4 months and is
99.999% non-contributed. There may be some few runtime objects which
stem back to sysklogd. There, a license change is impractical. I'll have
to live with the fact that those can not go into a re-licensed runtime.
Depending on how important the functionality is, I either need to
rewrite or drop it (for non-rsyslogd use). In any case, this looks
(pending detail analysis) quite possible.

Big question number one is what you think of this runtime approach? Have
I overlooked something? Do you object to it for some reason? If so, why?

The next question is how to package this inside the source tree.
Remember, currently rsyslog and the plugins (considered separate
projects) are all packed together inside a single tarball. This is very
convenient, both for me as well as for package maintainers and users.
The question is if we split rsyslog into the rsyslogd and the
rsyslog-runtime, will we continue to deliver the runtime as part of the
rsyslog package? Or would it be better to move it to a librsyslog
project? Unlike with the plugins, we actually would have two
different licenses, so it may be confusing to have both of them in the
same project (but I have seen that GnuTLS uses exactly this approach,
with the main library being LGPL and the - included - extras library GPL
only).

So that's the next question, obviously depending on the first: how to
pack projects if we do a runtime split?

I know this is a long and dense mail. My apologies for this. But I think
the discussion is needed. I honestly believe that a number of
discussions in the past weeks actually circled around this theme, we
just didn't actually get down to the point.

Please note that I will hold TLS development until we have reached
consensus on the runtime/licensing topic. The reason is if we don't do a
runtime split, I need to do things considerably differently than when we
do one (much more code, probably yet another external library). So,
obviously, I have a current bias towards the split. However, experience
shows that I (as everyone ;)) tend to overlook or misunderstand things.
Thus your feedback is so important. I don't like the idea of jiggling
back and forth on such an important topic as licensing and high-level
modularization, so I would like to get it now done in "the right way"
and keep it stable for at least the foreseeable future. Given the fact
that the decision somehow affects rsyslog's development as a whole, I
would even appreciate quick feedback.

In this spirit: please let the comments flow ;)

Thanks,
Rainer

2 comments:

David said...

I think you are mixing several different issues in this message.

1. you have input/output plugins

2. you have the queuing/filtering/dispatch mechanisms

3. you have the potential 'analysis plugins'

4. and finally, you have the library of useful functions that you use in the plugins that you have written

it looks like what triggered this discussion is the desire to extend one of the helper routines to support encryption.

you need to think long and hard about what you consider the core of rsyslog.

my thought is that this is #2 above.

just about all of the input and output modules could be completely separate projects, running as separate processes communicating over a pipe with a relp-like api

the analysis plugins could be a combination input/output module (registers to receive the output, analyses it and feeds it to the core as input)

this just leaves your helper routines.

these break down into three categories

A. those that are necessary to communicate with the core (submit messages, retrieve messages)

B. those that are useful to make programming easier

C. those that are necessary in a multi-threaded program to keep the threads from stepping on each other

items in category C don't need to be shared if the plugins aren't run as threads inside of rsyslog

items in category A need to be available to anything you talk to (either as pre-packaged routines in library/runtime files or with a well-defined API, preferably both)

items in category B are conveniences, they are useful to have, but not required.

I think items in categories B and C should be licensed for your convenience, you need to be able to use them for anything you want (including your closed-source projects).

I suspect that if there were alternate ways to

in an ideal world you would define the following APIs

input
pipe lossy read (no confirmation)
pipe confirmed read (relp-like confirmation)
internal thread read (current api)

output
pipe lossy write (no confirmation)
pipe confirmed write (relp-like confirmation)
internal thread write (current api)

(note the pipe versions could be built as threaded plugins under the existing api)
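
To illustrate the difference between the lossy and the confirmed pipe variants, here is a small sketch of a confirmed write. It is only relp-like in spirit and does not follow the RELP wire format: the `ConfirmedWriter` and `ConfirmingReader` classes, the "id, space, message" framing and the second pipe for acknowledgements are all assumptions made for the example. Messages are assumed not to contain newlines.

```python
import os


class ConfirmedWriter:
    """Keeps every message until the other side acknowledges its id."""

    def __init__(self, data_fd, ack_fd):
        self.out = os.fdopen(data_fd, "w")
        self.acks = os.fdopen(ack_fd, "r")
        self.next_id = 1
        self.unacked = {}  # id -> message, candidates for a resend

    def write(self, msg):
        txn = self.next_id
        self.next_id += 1
        self.unacked[txn] = msg
        self.out.write(f"{txn} {msg}\n")
        self.out.flush()
        return txn

    def wait_ack(self):
        txn = int(self.acks.readline())
        del self.unacked[txn]
        return txn


class ConfirmingReader:
    """Reads one message and confirms it on the acknowledgement pipe."""

    def __init__(self, data_fd, ack_fd):
        self.inp = os.fdopen(data_fd, "r")
        self.acks = os.fdopen(ack_fd, "w")

    def read(self):
        txn, msg = self.inp.readline().rstrip("\n").split(" ", 1)
        self.acks.write(txn + "\n")
        self.acks.flush()
        return msg


# Both ends in one process for demonstration; normally they would be
# separate processes sharing the two pipes.
data_r, data_w = os.pipe()
ack_r, ack_w = os.pipe()
writer = ConfirmedWriter(data_w, ack_r)
reader = ConfirmingReader(data_r, ack_w)
```

The lossy variant is the same writer without the `unacked` bookkeeping and without the second pipe, which is what makes it cheaper and also what allows messages to vanish on shutdown.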

the performance of these pipe input/output routines would need to be tested, but I don't see why they would be enough slower than the threaded approach to be noticeable

If APIs like this were defined, I don't think there would be much need to export all your helper routines under more permissive licenses. dual-license them under the gplv3 + a license specifically to grant Adiscon the right to use and resell the code (essentially BSD to Adiscon only)

David Lang

Rainer said...

Hi David,

comment much appreciated. I can not elaborate on all of these things (unfortunately), but at least a few quick thoughts. The system is designed in a way that makes it easy to change things even after some time. So we still can change things. But that of course requires effort, so I think there are more important things at this point in time.

In regard to performance, I may be wrong but I think there will be a notable difference between the threading model and the process model. Sure, context switching is roughly the same (but more with relp-like acks), but we need to shuffle things into the different process spaces. Right now, every thread can access all memory, so we do not need to copy over all these large strings.

Also, I think that parts of the functionality are needed in the rsyslogd process in any case. For example, if the rsyslogd process can not queue by itself, how should it handle messages which it could not yet deliver to the outputs? The same issue exists with the inputs. I think many routines would need to be duplicated (or at least be mapped into several process spaces via the OS loader).

but again... not seriously considering this at this time ;)

Rainer
