Friday, July 25, 2008

What is an Event? And what an Event Log?

There is an interesting discussion going on on the loganalysis mailing list. It tries to define what some entities we deal with on a daily basis really mean. While most people involved in logging have an intuitive - and sufficiently close - idea of what they mean, it is interesting to see how much difference there is in actual thoughts.

I will reproduce my posts/definitions here in this blog entry. To save me some time, I more or less copy and paste my text to this page. I think it should be sufficiently self-contained to grasp the idea. I reproduce my thoughts for two reasons: first of all, I would like to have a handy reference to my view of the logging world (so that I can refer anyone who asks to this page). Secondly, I think it is useful for all those of you who dig a bit deeper into rsyslog, phpLogCon and the rest of the MonitorWare line. All of these softwares are build around the definitions I am giving here.

Please note that I may update this page when I write new content. This helps to keep things tidy together. For you, unfortunately, this means you need to check this page from time to time...

As always, feedback is highly appreciated. You may either comment on this post or mail me.

So here now comes the "meat":


I actually tend to agree that an event is *always* *associated* with a state change. The state change, however, may be subtle and not always obvious.

Let's take this post excerpt as an example:

##
some logs which don't indicate a state change such as login failures,
##

At first, a login failure seems to have not changed any state. After all, nobody was logged in before the event and nobody is logged in after it. HOWEVER, here our view is just not granular enough.

Let me take the position of the login machinery. That machine obviously has states for "awaiting login", "authentication requested", "auth success" and "auth failure" (among many others). What happens in the failed login case is that the machine transits between these states:

"awaiting login" --> "auth req" --> "auth failure" --> "awaiting login"

Obviously, state in this subsystem has changed three times. An upper level (e.g. a user process creator) may not have seen a state change, because the transitions did not request one. So at the upper level, we have no state change.

Depending on the granularity of logging we would like to have, we need to look at state changes at different levels. If I am interested in security logging, I most probably should look at the login machinery. And there, I see three transitions. As such, I may report three different events.

In a somewhat more abstract form, I even think an event is *always* bound to a state change. If we look closely enough at an entity, and this entity does not change state, so what can we then report? Is it noteworthy to report that entity x remains in the same state a it had a while ago? And even if so, couldn't one say that the state of x has changed in a subtle way - time needs to be considered part of the entities "state set".

With the "state set" I mean anything that makes up and influences the state of the entity. It includes all properties that differentiate between different states. The state set is obviously defined by the entity, but also by the needs of the observer (so the same entity may have different state sets if different observers look at it). For a given observer (use case) and a given entity, there is exactly one state set, consisting of mutually exclusive states and a set of properties that identify these states.

In this point of view, an event is actually another word for a state transition. If there is no state transition, there is no event. And why should there be one? Let's assume the use case requires tracking of (identical) states over time. As such, elapsed time is part of the state set. Now let's assume all other properties of the state set remain identical, but time elapses. Now we still have a state transition, because the state set changes between time "than" and time "now". This justifies an event. If, however, the use case does not require tracking of continuity, elapsed time is not part of the state set. In the same example above, we do now not have a state change (all other properties remain identical) and consequently no event happens. This may sound like loss of information, but we defined that the observer is not interested in elapsed time. As such, from the POV of this observer, nothing happened. So we are correct in not raising an event.

From this I conclude:

a) there is no such thing like an "absolute event" - an event is always relative to the needs of the observer
b) thus different observers may have different perception of what justifies raising an event
c) what the observer is interested in is defined in the "state set"
d) an event is generated when a change in the "state set" is detected

Trying (and probably failing) to get a short grip on this, I would say "An event is a change in observer-relevant properties of an entity". Which, of course leads to the need to define observer, observer-relevant, property and entity...

I haven't looked at CEE enough to know if there are definitions of these (or similar) terms. Actually, when I looked at CEE some very long time ago I have to admit it didn't look very appealing. But I now have subscribed to the mailing list and will see if I can contribute (or use ;)) to/from the work.

An interesting side-note of a) is that one may (try to) define a superset of state sets of all possible observers. Actually, I think this is what auditing people try to define when they design audit points. The knobs that trigger specific events provide the ability to limit the actual (running) state set to the subset that the configuring observer is interested in. And if you ever wanted to audit anything but did not find an audit point in the software for it, auditing designers made a wrong assumption on the required state superset ;)




> -----Original Message-----
> From: loganalysis-bounces@loganalysis.org [mailto:loganalysis-
> bounces@loganalysis.org] On Behalf Of Anton Chuvakin
> Sent: Friday, July 25, 2008 6:14 AM
> To: bschnzl@cotse.net
> Cc: loganalysis@loganalysis.org
> Subject: Re: [logs] How to define Log, Event, and Alert?
>
> Good point. So:
>
> Event = something that happened on a system.
> Log = a TIMED record of the above occurence.

If I may generalize things a bit...

I'd replace TIME by "sequence identifier", with sequence identifier defined as being something that is monotonically increasing. A timestamp is an object that we think to be a natural example of a monotonically increasing function. HOWEVER, if we look at existing technology, this is not always the case. In fact, it is more often NOT the case than it is... If we have two systems a and b and these systems do not have time synchronized, and have a system c which is the event collector (and collects only events from a and b), then c may record time stamps inside its log that do not monotonically increase. For example, it may record:

02:00:00 event a1
01:00:00 event b1
02:01:00 event a2
01:01:00 event b2

Of course, this is still a TIMED record of occurrences. However, in this sense this is used, "TIMED" includes a sense of temporal order (at least to me). In the above log, we may not have the correct temporal order. We may be able to reconstruct it by sorting on the timestamp. That would be a valid approach if the timestamps are indeed correct (compared to universal time). But if a and/or b has incorrect time, we would create a wrong temporal order. Indeed, in this sense the monotonically increasing identity of the log in question would actually not be the timestamp but rather the *sequence of recording*, kind of a meta-property not directly contained in the property set of the individual event record (but rather obtained by its relationship to its predecessor in the log file).

Now let's assume a log without a timestamp. These things happens, e.g. in debug logs (and all too often in others I have seen).

If we define

> Log = a TIMED record of the above occurence.

such a "log" would obviously not be a log, because it does not fulfill the requirement to be timed.

If a log instead is "a record of events with a sequence identifier", that problem does not exist. The sequence identifier in that case would be the derived property I mentioned above.

The question remains if such a definition is actually useful. The sequence identifier is obviously something with very vague semantics. They depend on the observer as well as the correctness of the "sequence identifier generating function" on all systems in question.

Let's get back to the simple case of timestamps: as outlined above, the semantics of timestamps depend on time sync. Even if there is ntp timesync, timestamps (with reasonable precision) are always questionable. They are approximate, even on the same system. With standard syslog timestamps (second precision!) the problem is easy to see: one may receive hundreds of events within the same second. So even if time is correct, an observer is unable to detect any order of events. If looking just at the timestamps, one must conclude that all events happened at once. If looking at the semantics of the messages, one most often also can conclude this is impossible (e.g. how to delete a file before it is created?). Obviously, the timestamp alone is never sufficient to detect order of events, even on a single system. Granted, for practical purposes a high resolution timestamp (with good time synchronization) is most often a sufficiently well approximation of the time an event happened. But do you really trust it? ...always? Have a look at your own correlation engines: do they work on pure timestamps - or do they include some other properties, like the order of event log records inside the log?

Now let me try to define what I think a log actually is:

An EVENT is a set of properties that describe a state change (in the sense I have described state change yesterday). The contents of this set is depending on the entity who's state changes as well as on the observer. [so it may actually be a set of two sets: entity-related properties and observer-related properties]

An event is generated when a state changes.

An EVENT RECORD is the physical representation of an event inside an event log. It is a set of (at least) two sets: the event's set itself as well as a set of meta-properties which describe the event record (e.g. it's size, it's order inside an event log record, it's format, ...). I think this has a tendency to contain observer-related properties, too. Some Meta-properties are obviously event log related (like sequence of recording).

Finally, an (EVENT) LOG is a set of event records. Please note that according to this definition, the stream of syslog messages flowing from system a to system b is also an event log - not just log files or database tables. So an event log does not necessarily need to be persistend.

There is no inherent order inside event logs. In practice we experience a "natural order" (one record begins before another), but that actually is a meta-property of the event record. So we can order event records based on their meta-properties. It just happens that a text log is physically ordered by the sequence meta property.

[side-note: if we look at a UDP based transmission of an event log, we notice that the sender's perception of the event log is different from the receiver's: the physical order of event log records may be different, as well as their absolute number (if UDP discards messages)]

Finally, different log files, database tables, log stores in general are just partitions of the event log set. This also explains why we need to merge different log files if we would like to have a broader view of events that happened inside our system: We need to re-unit the partitions that contain things of interest for our needs so that we build a sufficiently large set for our analysis algorithm (that set being a complete partition of the log set in regard to what we are interested in).

[side-note: if we need to unite such sets, we often experience the problem that we do not have a truly identifying function for event log records, which makes it often really hard to create real supersets, where no two elements are the same]

Again, I see that my definitions are not really one-liners, but I think if we try to use too-simple definitions, we end up incorrectly describing the objects we are talking about. That will most probably lead us to wrong conclusions, and, in a very practical sense, to a) wrong implementations and b) wrong actions taken on the conclusions we think we have drawn (but were invalid).

Rainer

No comments: