Friday, October 25, 2013

A Proposal for Rsyslog State Variables

As was discussed in great lenghth on the rsyslog mailing list in October 2013, global variables as implemented in 7.5.4 and 7.5.5 were no valid solution and have been removed from rsyslog. At least in this post I do not want to summarize all of this - so for the details please read the mailing list archive.

Bottom line: we need some new facility to handle global state and this will be done via state variables. This posting contains a proposal which is meant as basis for discussion. At the time of this writing, nothing has been finalized and the ultimate solution may be totally different. I just keep it as blog posting so that, if necessary and useful, I can update the spec towards the final solution.

Base Assumptions
There are a couple of things that we assume for this design:
  • writing state variables will be a  very infrequent operation during config execution
  • the total number of state variables inside a ruleset is very low
  • reading occurs more often, but still is not a high number (except for read-only ones)
State variables look like the previous global variables (e.g. "$/var" or "$/p!var"), but have different semantics based on the special restrictions given below.

Restrictions provide the correctness predicate under which state vars are to be evaluated. Due to the way the rsyslog engine works (SIMD-like execution, multiple threads), we need to place such restrictions in order to gain good performance and to avoid a prohibitively amount of development work to be done. Note that we could remove some of these restriction, but that would require a lot of development effort and require considerable sponsorship.

  1. state variables are read-only after they have been accessed
    This restriction applies on a source statement level. For example,
    set $/v = $/v + 1;
    is considered as one access, because both the read and the write is done on the same source statement (but it is not atomic, see restriction #3). However,
    if $/v == 10 then
        set $/v = 0;
    is not considered as one access, because there are two statements involved (the if-condition checker and the assignment).
    This rule requires some understanding of the underlying RainerScript grammar and as such may be a bit hard to interpret. As a general rule of thumb, one can set that using a state variable on the left-hand side of a set statement is the only safe time when this is to be considered "one access".
  2. state variables do not support subtree access
    This means you can only access individual variables, not full trees. For example, accessing $/p!var is possible, but accessing $/p will lead to a runtime error.
  3. state variables are racy between diffrent threads except if special update functions are use
    State variable operations are not done atomic by default. Do not mistake the one-time source level access with atomicity:
    set $/v = $/v + 1;
    is one access, but it is NOT done atomically. For example, if two threads are running and $/v has the value 1 before this statement, the outcome after executing both threads can be either 2 or 3. If this is unacceptable, special functions which guarantee atomic updates are required (those are spec'ed under "new functions" below).
  4. Read-only state variables can only be set once during rsyslog lifetimeThey exist as an optimization to support the common  usecase of setting some config values via state vars (like server or email addresses).
Basic Implementation Idea
State variable are implemented by both a global state as well as message-local shadow variables. The global state holds all state variables, whereas the shadow variables contain only those variables that were already accessed for the message in question. Shadow variables, if they exist, always take precedency in variable access and are automatically created on first access.

As a fine detail, note that we do not implement state vars via the "usual" JSON method, as this would both require more complex code and may cause performance problems. This is the reason for the "no-subtree" restriction.

Data Structure
State vars will be held in a hash table. They need some decoration that is not required for the other vars. Roughly, a state var will be described by
  • name
  • value
  • attributes
    • read-only
    • modifyable
    • updated (temporary work flag, may be solved differently)
This is pseudo-code for the rule engine. The engine executes statement by statement.

for each statement:
   for each state var read:
      if is read-only:
         return var from global pool
         if is in shadow:
             return var from shadow pool
             read from global pool
             add to shadow pool, flag as modifiable
             return value
    for each state var written:
        if is read-only or already modified:
           emit error message, done
        if is not in shadow:
            read from global pool
            add to shadow pool, flag as modifiable
        if not modifieable:
           emit error message, done
        modify shadow value, flag as modified

at end of each statement:
   scan shadow var pool for updated ones:
       propagate update to global var space
       reset modified flag, set to non-modifiable
  for all shadow vars:
       reset modifiable flag

Note: the "scanning" can probably done much easier if we assume that the var can only be updated once per statement. But this may not be the case due to new (atomic) functions, so I just kept the generic idea, which may be refined in the actual implementation (probably a single pointer access may be sufficient to do the "scan").

The lifetime of the shadow variable pool equals the message lifetime. It is destrcuted only when the message is destructed.

New Statements
In order to provide some optimizations, new statements are useful. They are not strictly required if the functionality is not to be implemented (initially).
  • setonce $/v = value;
    sets read-only variable; fails if var already exists.
  • eval expr;
    evaluates an expression without the need to assign the result value. This is useful to make atomic functions (see "New Functions") more performant (as they don't necessarily need to store their result).
New Functions
Again, these offer desirable new functionality, but can be left out without breaking the rest of the system. But they are necessary to avoid cross-thread races.
  • atomic_add(var, value)
    atomically adds "value" to "var". Works with state variables only (as it makes no sense for any others).
This concludes the my current state of thinking. Comments, both here or on the rsyslog mailing list, are appreciated. This posting will be updated as need arises and no history kept (except where vitally important).

Special thanks to Pavel Levshin and David Lang for their contributions to this work!

No comments:

Busy at the moment...

Some might have noticed that I am not as active as usual on the rsyslog project . As this seems to turn out to keep at least for the upcomi...