Thursday, May 29, 2008

why you can't build a reliable TCP protocol without app-level acks...

Again, there we are in the area of a reliable plain TCP transport. Some weeks ago, I noted down that plain tcp syslog is unreliable by design. Yesterday, Martin Schütte came up with his blog post "Reliable TCP Reconect made Easy". In it, he describes how he thinks one can get around the limitations.

I was more than interested. Should there be a solution I had overlooked? Martin's idea is that he queries the TCP stack if the connection is alive before sending any data. He outlined two approaches with the later being a non-blocking recv() right in front of the send(). The idea is that the recv() should detect a broken connection.

After I thought a bit about this approach I had a concern that it may be a bit racy. But in any case, new thoughts are worth evaluating. And a solution would be most welcome. So I quickly implemented some logic in my existing gtls stream driver. To make matter simple, I just did the recv() and sent the return status and errno to debug output (but did not try any reconnects based on it). And then the ugly happened: I always got the same EAGAIN return state (which is not an error), no matter in what state the connection was. I was even able to pull the receiver's network cable and the sender didn't care.

So, this approach doesn't work. And, if you think a bit more about it, it comes at no surprise.

Consider the case with the pulled network cable. When I plugged it in again a quarter hour later, TCP happily delivered the "in-transit" messages (that were sitting in the client buffer) after a short while. This is how TCP is supposed to work! The whole point is that it is designed to survive even serious network failures. This is why the client buffers messages in the first place.

What should the poor client do in the "pulled network cable" case. Assume the data is lost just because it can not immediately send it? To make it worse, let's assume the data had already left the system and successfully reached the destination machine. Now it is sitting in the destination's receive buffer. What now if the server application (for example due to a bug) does not pull this data but also does not abort or close the file descriptor? How at all should TCP detect these failures? The simple truth is it isn't and it is not supposed to be.

The real problem is the missing application level acknowledgment. The transport level ACK is for use by TCP. It shall not imply anything for the app layer. So if we depend on the TCP ack for an app-level protocol, we abuse TCP IMHO. Of course, abusing something may be OK, but we shouldn't wonder if it doesn't work as we expect.

Back to the proposed solution: the problem here is not that the send call does not fail, even though the stack knows the connection is broken. The real problem is that the TCP stack does not know it is broken! Thus, it permits us to send data on a broken connection - it still assumes the connection is OK (and, as can be seen in the plugged cable case, this assumption often is true).

As such, we can NOT cure the problem by querying the TCP stack if the connection is broke before we send. Because the stack will tell us it is fine in all those cases where the actual problem occurs. So we do not gain anything from the additional system call. It just reflects the same unreliable status information that the send() call is also working on.

And now let's dig a bit deeper. Let's assume we had a magic system call that told us "yes, this connection is fine and alive". Let's call it isalive() for easy reference. So we would be tempted to use this logic:

if(isalive()) then
send()
else
recover()

But, as thread programmers know, this code is racy. What, if the connection breaks after the isalive() call but before the send()? Of course, the same problem happens! And, believe me, this would happen often enough ;).

What we would need is an atomic send_if_alive() call which checks if the connection is alive and only then submits data. Of course, this atomicity must be preserved over the network. This is exactly why databases invented two-phase commits. It requires an awful lot of well thought-out code ... and a network protocol that supports it. To ensure this, you need to shuffle at least two network packets between the peers. To handle it correctly (the case where the client dies), you need four, or a protocol that works with delayed commits (as a side-note, RELP works along those lines).

Coming back to our syslog problem, the only solution to solve our unreliability problem without specifying app-layer acks inside syslog, is to define a whole new protocol that does these acks (quite costly and complex) out of band. Of course, this is not acceptable.

Looking at all of this evidence, I come to the conclusion that my former statement unfortunately is still correct: one can not implement a reliable syslog transport without introducing app-level acks. It simply is impossible. The only cure, in syslog terms, is to use a transport with acks, like RFC 3195 (the unbeloved standard) or RELP (my non-standard logging protocol). There are no other choices.

The discussion, however, was good insofar that we now have generally proven that it is impossible to implement a reliable TCP based protocol without application layer acknowledgment at all. So we do not need to waste any more effort on trying that.

As a funny side-note, I just discovered that I described the problem we currently see in IETF's transport-tls document back than on June, 16, 2006, nearly two years ago:

http://www.ietf.org/mail-archive/web/syslog/current/msg00994.html

Tom Petch also voiced some other concerns, which still exist in the current draft:

http://www.ietf.org/mail-archive/web/syslog/current/msg00989.html

As you can see, he also mentions the problem of TCP failures. The idea of introducing some indication of a successful connection was quickly abandoned, as it was considered too complex for the time being. But as it looks, plain technical fact does not go away by simply ignoring it ;)

UPDATE, half an hour later...

People tend to be lazy and so am I. So I postponed to do "the right thing" until now: read RFC 793, the core TCP RFC that still is not obsoleted (but updated by RFC3168, which I only had a quick look at because it seems not to be relevant to our case). In 793, read at least section 3.4 and 3.7. In there, you will see that the local TCP buffer is not permitted to consider a connection to be broken until it receives a reset from the remote end. This is the ultimate evidence that you can not build a reliable syslog infrastructure just on top of TCP (without app-layer acks).

more reliability for TCP syslog?

Martin Schütte posted an interesting approach to solving the syslog/plain TCP unreliability issue in his blog:

Reliable TCP Reconect made Easy

In short, he tries to do a non-blocking recv() from the connection to see if the remote peer has shut it down. This may work and I will give it a try. However, as of my understanding it will NOT solve the issue of unreliability because of broken connections. I have to admit that I also think there is still a race condition (what if the server closes the connection after the client has done the recv() but before the send()...

I'll report back as soon as I have some real-life data. It's an interesting approach in any case and good to know somebody else is working on the same issues. That will hopefully bring us to a better overall solution :)

Wednesday, May 28, 2008

rsyslog work log

Yesterday's rsyslog work log:
2008-05-27
- client now provides cert even if it is not signed by one of the
server's trusted CAs (gtls)
- implemented wildcards inside certificate name check authentication
- released 3.19.4

Tuesday, May 27, 2008

syslog-transport-tls-12 implementation report

I have finally managed to fully implement IETF's syslog-transport-tls-12 Internet draft plus some new text suggested to go into -13 (which is not yet out) in rsyslog. Please note that I am talking about actual software that you can download, install, run and even look at the source. So this is not a theoretical "what if" type of report but one of real practical experience.

I have roughly worked the past three weeks on the new -12 version of transport-tls. First of all, it is important to keep in mind that I already had implemented the -11 version (minus the then-unclear authentication) in rsyslog 3.19.0. That meant I had just to implement the new authentication stuff. This was obviously a major time-saver.

The current implementation utilizes the GnuTLS library for all TLS operations. I would like to thank the GnuTLS folks for all their help they provided on the mailing list. This was extremely useful. GnuTLS in rsyslog works as a "network stream driver" and can theoretically be replaced with other libraries (support for at least NSS is planned). For obvious reasons, this implementation report includes a number of GnuTLS specifics.

It is not exactly specified whether a syslog message traveling over -transport-tls must strictly be in syslog-protocol format or not. This may lead to interoperability problems. For rsyslog, I have implemented that any message format is accepted. Any message received is simply fed into the general parser-selector, which looks at the message format and selects the most appropriate parser. However, this may not really be desirable from a security point of view. When sending, rsyslog also does not demand anything specific. Due to rsyslog design, message creation and transmission are quite separate parts. So even if the draft would demand -syslog-protocol format, I would not be able to enforce that in rsyslog (it would break too many application layers). Of course, rsyslog supports -syslog-protocol format, but it requires the proper template to be applied to the send rule.

Rsyslog implements even most optional features. However, I have not implemented IP-address-based authentication, which is a MUST in Joe's new proposed text (section 5.1). The reason is that we found out this option is of very limited practical experience. IP addresses are very seldomly found in certificates. Also, there are ample ways to configure rsyslog in client role so that it knows the server's identity. This was also brought up on the IETF syslog mailing list and it looks like this feature will be dropped. Should it be actually survive and go into the final standard, I will implement it, even though I do not see any use in practice. Thus I have deferred implementation until it is actually needed. Rsyslog user feedback may also show if there is a need for this feature in practice.

Each rsyslog instance is expected to have one certificate identifying it. There can be different certificates for different senders and receivers, but this is considered the unusual case. So in general, a single certificate identifies the rsyslog instance both as a client and server.

Rsyslog support the three authentication modes laid out in the draft: anonymous, fingerprints and subject names. Obviously, anonymous authentication is easy to do. This was a quick task without any problems.

Fingerprint authentication was somewhat problematic to implement. The core problem was that GnuTLS, by default, sends only those certificates to the server that are in the server's trusted CA list. With self-signed certs on both the client and the server, this is never the case and GnuTLS does not provide any certificate at all. I used kind of a hack to get around this. There is a function in GnuTLS that permits to provide certificates on an as-needed basis. I used this hook. However, I now no longer have the ability to provide only those certificates a server can verify. When I have multiple certificate stores and the server is in subject name authentication mode, this would be valuable. So far, I have ignored this problem. If practice shows it needs attention, I will further investigate. But here is definitely a potential future trouble spot. A core problem is that a sender does not (should not need to) know if the receiver is using fingerprint or subject name authentication. For the later, the GnuTLS defaults are quite correct and provide a very convenient interface. But I can not select different modes on the client as I do not know which one is right.

Subject name based authentication did not pose any such problems. This comes at no surprise, because this is the the usual mode of operations for the TLS library. One can assume this to be best-tested.

One disappointment with GnuTLS was that during the TLS handshake procedure only basic authentication can be done. Most importantly, there is no hook that enables an application to check the remote peer's certificate and authorize it or deny access during the handshake. Authorization can only be done after the handshake has completed. Form asking around, NSS seems to provide this ability. OpenSSL, on the other hand, seems NOT to provide that hook, too (I could not verify that, though). As such, rsyslog needs to complete the handshake and then verifies fingerprint's or validates the certificate chain, expiration dates and checks the subject name. If these checks show that we are not permitted to talk to the peer, all we can do is close the connection.

If a client is connecting to a server, this is a minor annoyance, as a connection is created and dropped. As we can not communicate the reason why we close the connection, the server is left somewhat clueless and currently logs a diagnostic warning of a freshly created connection being immediately closed. I will probably change that diagnostic in the future.

Quite more problematic is the case when a server fails to authenticate the client. Here, the client received the handshake and already begun to send data when the server closes the connection. As there is no application level acknowledgment in transport-tls, the client does not know when exactly the connection is closed by the server. In the end result the client experiences message loss and may even not notice the failed connection attempt until much later (in most cases, the first message is always successfully sent and only the second message, possible hours later, will see a problem). In the end result, this can lead to massive data loss, even to complete data loss. Note that this is not a direct cause of transport-tls, but of the underlying plain TCP syslog protocol. I have more details in my blog post on the unreliability of TCP syslog.

Please note that -transport-tls does not specify when peer authentication has to happen. It may happen during the handshake but it is also valid to do it after the handshake. As we have seen, doing it after the handshake causes serious problems. It may be good to at least mention that. If the draft is changed to mandate authentication during the handshake, some implementors will probably not implement it, because the library they use does not support it. Of course, one could blame the library, but for existing products/projects, that will probably not help.

The need to authenticate during the handshake is a major problem for anyone implementing -transport-tls. For rsyslog and for now, I have decided to live with the problem, because I do have the unreliability problem in any case. My long-term goal is to switch to RELP to address this issue and provide TLS support for RELP (RELP uses app-level acks, so there is no problem with authenticating after a successful handshake - I can still emit an "authentication failed" type of message). Please note that the transport-tls specific problem only occurs if the remote client fails to authenticate - this is what make it acceptable to me. I expect this situation to be solved quickly (either something is misconfigured or an attack is going on in those cases).

As a side-note, I may see if I can provide a patch for GnuTLS if this turns out to become a major problem.

Besides implementing the required methods, I have also thought about how to create a sufficiently secure system with the least possible effort.

In home environments where the "administrator" has little or no knowledge and uses rsyslog to receive message from a few low-end devices (typically a low-end router), it is hard to think of any good security settings. Most probably, anonymous "authentication" is the best choice here. It doesn't protect against man-in-the-middle attacks, but it at least provides confidentiality for messages in transit. The key point here is that it does not require any configuration except for enabling TLS and specifying the syslog server's address in the device GUI.

Another good alternative for these environments may actually be auto-generating a self-signed cert on first rsyslogd startup. This is based on the assumption that the device GUI provides a way to view and authorize this certificate (after it has talked to the server and obtained the cert9). However, I have to admit that I see only limited advantage in implementing this. After all, if the admin is not able to configure things correctly, do we really expect him to be able to interpret and sufficiently frequently review the system logs? I'd say this is at least doubtful and so I prefer to put my implementation efforts to better uses...

The anticipated common use case for rsyslog is an environment where the administrator is at least knowledgeable enough to carry out some basic configuration steps and create certificates if instructed on which tools to run. We do not assume that a full PKI infrastructure is present. As such, we suggest that each organization creates its own CA for rsyslog use. This involves creating one root CA certificate. That certificate is then used to create certificates for each instance of rsyslog that is to be installed. There is one instance per machine. To keep configuration simple, each machine's DNS name is to be used.

All clients shall forward via a @@hostname action, where hostname must be the actual DNS name (as specified in the certificate) and not an IP address or something else. To prevent DNS failures or unavailability of DNS during startup, this name and its IP address may be set in /etc/hosts. With that configuration, the client can validate the server's identity without any extra configuration.

To achieve a similar automatic identity check on the server side (server authenticating client), subject name wildcards are used. It is suggested that all syslog client are within the same domain. Then, the server can be instructed to accept messages from all of them with a single configuration setting enabling message reception from e.g. "*.example.com". This, together with the fact that the certificate must have been signed with the "rsyslog root CA"'s certificate provides sufficient proof of identification in most cases.

In more complex scenarios, more complex authentication can be used. There will be no specific guidelines within the rsyslog documentation on other policies. It is assumed that those who have need for such complex policies know what they need to have, so there is no point in providing advise. From the engine point of view, rsyslog already provides for many advanced uses (e.g. different certificate stores for different sessions) and can easily extended to provide for others. As of my understanding, the latest text proposed by Joe permits me to do all of this under the umbrella of -transport-tls, so the draft is no limiting factor.

The bottom line is that an enterprise-specific rsyslog root CA provides quite automatic configuration of peer credentials while being easy to implement. Wildcard subject name matches play a vital role, because they are the only way to permit a server with the ability to authorize a wide range of clients in a semi-automatic manner.

IMO, subject name based authentication is easier to setup than fingerprint authentication, at least in a rsyslog-to-rsyslog case. If it is easier to setup in a heterogeneous environment depends on the ability of all peers to either generate certificate requests and accept the certificate and/or import prefabricated .pem files. If that is simple enough, subject name based authentication can be used with very low administrative overhead (but integrating it into a full-blown PKI is still another thing...).

To really prove the implementation, of course, at least one other independent implementation is needed. Currently there is none, but as it looks NetBSD's syslogd will be enhanced as a Google Summer of Code project. I am keeping an eye on that project and will try to do interop testing as soon as such is possible. Having one implementation from the "device camp" (e.g. a router) would be extremely useful, though, as that would provide more insight on how easy it will be to configure things via such an administrative interface (not in theory, but in actual implementation - I expect a difference between the two as there are always constraints that must be observed, like the overall application framework and programming tool set).

To wrap things up, -syslog-transport-tls-12+ is sufficiently easy to implement and deploy. IMHO it also provides sufficient extensibility to implement complex scenarios. Some details could be improved (when to authenticate, message format) and a decision on IP based authentication should be finalized. But I don't see any reason to hold it much longer and look forward to it being finalized.

-transport-tls-12+ text proposal

Joe, the current editor of -transport-tls, provided some suggested new text. I'll call it 12+ and post it here for easy reference (I've too often searched the mail archive to pull it up, so I think it is time to post it at some easier place). Other than an aid to me, you may also be interested to see how things are progressing. All in all, I'd say we are on the right path. Also, rsyslog does now everythig mandated in 12+. I am currently looking into wildcards, this seems to be neat for easy authentication of many senders.

Here comes Joe's text and the message that went along with it:

I reworked some of the text to try to capture the discussions in the
working group. I broke out the mechanical part of the validation from
the policy. There is some redundancy between the security
considerations section and the new policy section. I tried to focus the
requirements language on implementation requirements to enable secure
interoperability vs. deployment options. We are not finished yet, but I
think it is a step in the right direction.

Cheers,

Joe

4.2.1 Certificate-Based Authentication

Both syslog transport sender (TLS Client) and syslog transport receiver
(TLS server) MUST implement certificate-based authentication. This
consists validating proof of possession of the private key corresponding
to the public key in the certificate. To ensure interoperability
between clients and servers, the following methods for certificate
validation are mandatory to implement:

o Certificate path validation: the client is configured with one or
more trust anchors. Additional policy controls needed for authorizing
the syslog transport sender and syslog transport receiver are described
in Section 5. This method is useful where there is a PKI deployment.

o End-Entity Certificate Matching: The transport receiver or
transport sender is configured with information necessary to match the
end-entity certificates of its authorized peers (which can be
self-signed). Implementations MUST support certificate fingerprints in
section 4.2.3 and MAY allow other formats for end-entity certificates
such as a DER encoded certificate. This method provides an alternative
to a PKI that is simpler to deploy and still maintains a reasonable
level of security.

Both transport receiver and transport sender implementations MUST
provide a means to generate a key pair and self-signed certificate in
the case that a key pair and certificate are not available through
another mechanism.

4.2.2 Certificate Fingerprints

Both client and server implementations MUST make the certificate
fingerprint for their certificates available through a management
interface.

The mechanism to generate a fingerprint is to take the hash of the
certificate using a cryptographically strong algorithm and convert the
result into colon separated, hexadecimal bytes, each represented by 2
uppercase ASCII characters. When a fingerprint value is displayed or
configured the fingerprint is prepended with an ASCII label identifying
the hash function followed by a colon. Implementations MUST support
SHA-1 as the hash algorithm and use the ASCII label "SHA1" to identify
the SHA-1 algorithm. The length of a SHA-1 hash is 20 bytes and the
length of the corresponding fingerprint string is 64 characters. An
example certificate fingerprint is:

SHA1:E1:2D:53:2B:7C:6B:8A:29:A2:76:C8:64:36:0B:08:4B:7A:F1:9E:9D


During validation the hash is extracted from the fingerprint and
compared against the hash calculated over the received certificate.

[sections skipped]

5. Security Policies

Different environments have different security requirements and
therefore would deploy different security policies. This section
provides discusses some of the security policies that may be implemented
by syslog transport receivers and syslog transport senders. The
security policies describe the requirements for authentication,
credential validation and authorization. The list of policies in this
section is not exhaustive and other policies may be implemented.

5.1 Recommended Security Policy

The recommended security policy provides protection against the threats
in section 2. This policy requires authentication, certificate
validation and authorization of both the syslog transport sender and
syslog transport receiver. If there is a failure in the
authentication, certificate validation or authorization then the
connection is closed.

Authorization requires the capability to authorize individual hosts as
transport receivers and transport senders. When end-entity certificate
matching is used, authentication and certificate validation are
sufficient to authorize and entity. When certificate path validation
MUST support the following authorization mechanisms:

o Host-name-based authorization where the host name of the
authorized peer is compared against the subject fields in the
certificate. For the purpose of interoperability, implementations MUST
support matching the host name against a SubjectAltName field with a
type of dNSName and SHOULD support checking hostname against the Common
Name portion of the Subject Distinguished Name. Matching for
certificate credentials is performed using the matching rules specified
by [3]. If more than one host name identity is present in the
certificate a match in any one of the set is considered acceptable.
Implementations also MAY support wildcards to match a range of values.
For example, names to be matched against a certificate may contain the
wildcard character * which is considered to match any single domain name
component or component fragment. E.g., *.a.com matches foo.a.com but
not bar.foo.a.com. f*.com matches foo.com but not bar.com. Wildcards
make it possible to deploy trust-root-based authorization where all
credentials issued by a particular CA trust root are authorized.

o IP-address-based authorization where the IP address configured
for the authorized peer is compared against the subject fields in the
certificate. Implementations MUST support matching the IP address
against a SubjectAltName field of type iPAddress and MAY support
checking the configured IP address against the Common Name portion of
the Subject Distinguished Name. Matching for certificate credentials is
performed using the matching rules specified by [3]. If more than one
IP Address identity is present in the certificate a match in any one of
the set is considered acceptable.

Implementations MAY also support authorization based on other
attributes. For example, the authorization of a device Serial Number
against the SerialNumber portion of the Subject Distinguished Name or
restrictions on the depth of a certificate chain.

Implementations MUST support this policy and it is recommended that this
be the default policy.

5.2 Liberal Validation of a Syslog Transport Sender

In some environments, the authenticity of syslog data is not important
or it is verifiable by other means, so transport receivers may accept
data from any transport sender. To achieve this, the transport receiver
performs authentication and certificate consistency checks and forgoes
the validation of the certificate chain and authorization. In this
case, the transport receiver is authorized, however this policy does not
protect against the threat of transport sender masquerade described in
Section 2. The use of this policy is generally not recommended for this
reason. If this policy is used, the transport receiver SHOULD record
the end-entity certificate for the purpose of correlating it with the
sent data.

5.3 Liberal Validation of a Syslog Transport Receiver

In some environments the confidentiality syslog data is not important so
data may be sent to any transport receiver. To achieve this the
transport sender performs authentication certificate consistency checks
and forgoes validation of the certificate chain and authorization.
While this policy does authorize the transport sender, it does not
protect against the threat of transport receiver masquerade described in
Section 2, leaving the data sent vulnerable to disclosure and
modification. The use of this policy is generally not recommended for
this reason.

5.4 Liberal Syslog Transport Receiver and Sender Validation

In environments where security is not a concern at all the transport
receiver and transport sender authenticate each other and perform
certificate consistency checks and may forgo validation of the
certificate chain and authorization. This policy does not protect
against any of the threats described in section 2 and is therefore not
recommended.

6. Security Considerations

6.1 Deployment Issues

Section 5 discusses various security policies that may be deployed. The
only configuration that mitigates the threats described in Section 2 is
the recommended policy defined in section 5.1. This is the recommended
configuration for deployments.

If the transport receiver chooses not to fully authenticate, validate
and authorize the transport sender it may receive data from an attacker.
Unless it has another way of authenticating the source of the data, the
data should not be trusted. This is especially important if the syslog
data is going to be used to detect and react to security incidents. The
transport receiver may also increase its vulnerability to denial of
service, resource consumption and other attacks if it does not
authenticate the transport sender. Because of the increased
vulnerability to attack, this type of configuration is not recommended.

If the transport sender chooses not to fully authenticate, validate and
authorize the syslog transport receiver then it may send data to an
attacker. This may disclose sensitive data within the log information
that is useful to an attacker resulting in further compromises within
the system. If a transport sender operates in this mode it should limit
the data it sends to data that is not valuable to an attacker. In
practice this is very difficult to achieve, so this type of
configuration is not recommended.

Forgoing authentication, validation and/or authorization on both sides
allows for man-in-the-middle, masquerade and other types of attacks that
can completely compromise integrity and confidentiality of the data.
This type of configuration is not recommended.

6.2 Cipher Suites

[I think the mandatory to implement algorithm should be defined in
section 4.2 instead of the security considerations section]

rsyslog work log

Yesterday's rsyslog work log:
2008-05-26
- improved gtls error reporting
- added capability to auto-configure tls auth rule for client
connecting to server: must match hostname in send action
- changed fingerprint gtls auth mode to new format fingerprint
- added gtls name authentication based on common name (inside DN)
- added certificate validity date check (gtls)
- finally protected gtlsStrerror by a mutex

Monday, May 26, 2008

rsyslog work log

Past day's rsyslog work log:
2008-05-22
- added x509/name authentication (so far based on dnsName only)
- change config directive name to reflect different use
$ActionSendStreamDriverCertFingerprint is now
$ActionSendStreamDriverPermittedPeer and can be used both for
fingerprint and name authentication (similar to the input side)
2008-05-23
- updated TLS documentation with HOWTO on certificate generation

Thursday, May 22, 2008

rsyslog work log

Yesterday's rsyslog work log:
2008-05-21
- re-enabled anon mode (failed if client did not provide cert)
- added new transport auth methods to doc set
- bugfix: default syslog port was no longer used if none was
configured. Thanks to varmojfekoj for the patch
- bugfix: missing linker options caused build to fail on some
systems. Thanks to Tiziano Mueller for the patch.
- released 3.19.3 (with fingerprints)
- implemented x509/certvalid "authentication"
- added functionality to display invalid certificates

Wednesday, May 21, 2008

rsyslog work log

I have gone crazy the past days with all the subtleties of the the new -12 revision of IETF's syslog-transport-tls draft. I've not coded much and even forgotten to post here on the blog. After all, I at least now got fingerprint support (mostly) up and running - much more work than I initially though. The good thing is I learned quite a lot :)

So here is the past day's rsyslog work log, as far as I have it:
2008-05-09 to 2005-05-16
Somehow I lost track of the work done - see git log for details... :(
2005-05-17
- made action logic pass optional auth params only if they are
actually configured
- added new authMode and Fingerprint methods to ptcp netstream
driver (keeping them once again generic)
- added diagnostics messages when invalid auth modes were
configured
2008-05-19
- corrected fingerprint string formatting
- improved/added error messages
- first implementation of server-based client fingerprint check
- implemented permittedPeers helper construct to store names
- changed omfwd implementation to use new permittedPeers
2008-05-20
- worked hard on fingerprint auth, but can not seen in code
(lots of mailing list work and spec review)

Friday, May 09, 2008

more on syslog TLS, policies and IETF efforts...

I am still working hard on TLS support, but on the design level. Here, I'd like to reproduce a message I sent to the IETF syslog WG's mailing list. It outlines a number of important points when it comes to practical use of TLS with syslog.

I am quite happy that syslog-transport-tls got new momentum right at the time when I finished my TLS implementation in rsyslog and turned into fine-tuning it. The IETF discussion on authentication and policies actually is touching right those places where it in practice really hurts. For the initial TLS implementation, I decided to let rsyslog work in anonymous mode, only. Iit was clear that -transport-tls section 4 as in version 11 would not survive - just as we have now seen).

The next steps in rsyslog are to enable certificate based access policies and this is exactly what the IETF discussion is focusing on. Of course, I try to finish design and try to affect the standard in a positive way, so that the rsyslog implementation can both be standards-compliant and useful in practice.

And now - have an interesting read with my mailing list post. Feedback is highly appreciated.

Rainer


Hi all,

I agree to Robert, policy decisions need to be separated. I CC Pasi because my comment is directly related to IESG requirements, which IMHO cannot be delivered by *any* syslog TLS document without compromise [comments directly related to IESG are somewhat later, I need to level ground first].

Let me tell the story from my implementor's POV. This is necessarily tied to rsyslog, but I still think there is a lot of general truth in it. So I consider it useful as an example.

I took some time yesterday to include the rules laid out in 4.2 into rsyslog design. I quickly came to the conclusion that 4.2. is talking about at least two things:

a) low-level handshake validation
b) access control

In a) we deal with the session setup. Here, I see certificate exchange and basic certificate validation (for example checking the validity dates). In my current POV, this phase ends when the remote peer can positively be identified.

Once we have positive identification, b) kicks in. In that phase, we need to check (via ACLs) if the remote peer is permitted to talk to us (or we are permitted to talk to it). Please note that from an architectural POV, this can be abstracted to a higher layer (and in rsyslog it probably will). For that layer, it is quite irrelevant if the remote peer's identity was obtained via a certificate (in the case of transport-tls), a simple reverse lookup (UDP syslog), SASL (RFC 3195) or whatever. What matters is that the ACL engine got a trusted identify from the transport layer and verifies that identity [level of trust varies, obviously]. Most policy decisions happen on that level.

There is some grayish between a) and b). For example, I can envision that if there is a syslog.conf rule (forward everything to server.example.net)

*.* @@server.example.net

The certificate name check for server.example.net (using dNSName extension) could probably be part of a) - others may think it is part of b).

Also, even doing a) places some burden onto the system, like the need to have trust anchors configured in order to do the validation. This hints at at least another sub-layer.

I think it would be useful to spell out these different entities in the draft.


Coming back to policy decisions, one must keep in mind that the IESG explicitly asked for those inside the document. This was done based on the -correct- assumption that today's Internet is no longer a friendly place. So the IESG would like to see a default policy implemented that provides at least a minimum acceptable security standard. Unfortunately, this is not easy to do in the world of syslog. For the home users, we cannot rely on any ability to configure something. For the enterprise folks, we need to have defaults that do not get into their way of doing things [aka "can be easily turned off"]. There is obviously much in between these poles, so it largely depends on the use case. I have begun a wiki page with use cases and hope people will contribute to it. It could lead us to a much better understanding of the needs (and the design decisions that need to be made to deliver these). It is available at

http://wiki.rsyslog.com/index.php/TLS_for_syslog_use_cases

After close consideration, I think the draft currently fails on addressing the two use cases define above properly. Partly it fails because it is not possible under the current IESG requirement to be safe by default. We cannot be fully safe by default without configuration, so whatever we specify will fail for the home user.

A compromise may be to provide "good enough" security in the default policy. I see two ways of doing that: one is to NOT address the Masquerade and Modification threats in the default policy, just the Disclosure threat. That leads us to unauthenticated syslog being the default (contrary to what is currently implemented) [Disclosure is addressed in this scenario as long as the client configs are not compromised, which I find sufficiently enough - someone who can compromise the client config can find other ways to get hold of the syslog message content].

An alternative is to use the way HTTPS works: we only authenticate the server. To authenticate, we need to have trusted certificate inside the server. As we can see in HTTPS, this doesn't really require PKI. It is sufficient to have the server cert singed by one of few globally trusted CAs and have this root certificates distributed with all client installations as part of their setup procedure. This is quite doable. In that scenario, a client can verify a server's identity and the above sample (*.* @server.example.net) could be verified with sufficient trust. The client, however, is still not authenticated. However, the threats we intended to address are almost all addressed, except for the access control issue which is defined as part of the Masquerade threat (which I think is even a different beast and deserves its own threat definition now that I think about it). In short we just have an access control issue in that scenario. Nothing else.

The problem, however, is that the server still needs a certificate and now even one that, for a home user, is prohibitively expensive. The end result will be that people turn off TLS, because they neither know how to obtain the certificate nor are willing to trade in a weekend vacation for a certificate ;) In the end result, even that mode will be less useful than anonymous authentication.

The fingerprint idea is probably a smart solution to the problem. It depends on the ability to auto-generate a certificate [I expressed that I don't like that idea yesterday, but my thinking has evolved ;)] OR to ship every device/syslogd with a unique certificate. In this case, only minimal interaction is required. The idea obviously is like with SSH: if the remote peer is unknown, the user is queried if the connection request is permitted and if the certificate should be accepted in the future. If so, it is added permanently to the valid certificate store and used in the future to authenticate requests from the same peer. This limits the security weakness to the first session. HOWEVER, the problem with syslog is that the user typically cannot be prompted when the initial connection happens (everything is background activity). So the request must actually be logged and an interface be developed that provides for user notification and the ability to authorize the request.

This requires some kind of "unapproved certificate store" plus a management interface for it. Well done, this may indeed enable a home user to gain protection from all three threats without even knowing what he really does. It "just" requires some care in approving new fingerprints, but that's a general problem with human nature that we may tackle by good user interface desig but can't solve from a protocol point of view.

The bad thing is that it requires much more change to existing syslogd technology. That, I fear, reduces acceptance rate. Keep in mind that we already have a technically good solution (RFC 3195) which miserably failed in practice due to the fact it required too much change.

If I look at *nix implementations, syslogd implementers are probably tempted to "just" log a message telling "could not accept remote connection due to invalid fingerprint xx:xx:..." and leave it to the user to add it to syslog.conf. However, I fear that for most home setups even that would be too much. So in the end effect, in order to avoid user hassle, most vendors would probably default back to UDP syslog and enable TLS only on user request.

From my practical perspective this sounds even reasonable (given the needs and imperfections of the real world...). If that assessment is true, we would probably be better off by using anonymous TLS as the default policy, with the next priority on fingerprint authentication as laid out above. A single big switch could change between these two in actual implementations. Those users that "just want to get it running" would never find that switch but still be somewhat protected while the (little) more technically aware can turn it to fingerprint authentication and then will hopefully be able to do the remaining few configuration steps. Another policy is the certificate chain based policy, where using public CAs would make sense to me.

To wrap it up:

1. I propose to lower the default level of security
for the reasons given.
My humble view is that lower default security will result in higher
overall security.

2. We should split authentication policies from the protocol itself
... just as suggested by Robert and John. We should define a core
set of policies (I think I described the most relevant simple
cases above, Robert described some complex ones) and leave it
others to define additional policies based on their demand.

Policies should go either into their own section OR into their own documents. I have a strong favor of putting them into their own documents if that enables us to finally finish/publish -transport-tls and the new syslog RFC series. If that is not an option, I'd prefer to spend some more work on -transport-tls, even if it delays things further, instead of producing something that does not meet the needs found in practice.

Rainer


> -----Original Message-----
> From: syslog-bounces@ietf.org [mailto:syslog-bounces@ietf.org] On
> Behalf Of robert.horn@agfa.com
> Sent: Thursday, May 08, 2008 5:53 PM
> To: Joseph Salowey (jsalowey); syslog@ietf.org
> Subject: Re: [Syslog] I-D Action:draft-ietf-syslog-transport-tls-12.txt
>
> Section 4.2 is better, but it still needs work to separate the policy
> decisions from the protocol definition. Policy decisions are driven by
> risk analysis of the assets, threats, and environment (among other
> things). These are not uniform over all uses of syslog. That makes it
> important to separate the policy from the protocol, in both the
> specifications and in the products.
>
> In the healthcare environment we use TLS to protect many of our
> connections. This is both an authentication protection and a
> confidentiality protection. The policy decisions regarding key
> management
> and verification will be very similar for a healthcare use of syslog.
> Some
> healthcare sites would reach the same policy decision as is in 4.2, but
> here are three other policy decisions that are also appropriate:
>
> Policy A:
> The clients are provided with their private keys and the public
> certificates for their authorized servers by means of physical media,
> delivered by hand from the security office to the client machine
> support
> staff. (The media is often CD-R because it's cheap, easy to create,
> easy
> to destroy, and easy to use.) During TLS establishment the clients use
> their assigned private key and the server confirms that the connection
> is
> from a machine with one of the assigned private keys. The client
> confirms
> that the server matches one of the provided public certificates by
> direct
> matching. This is similar to the fingerprint method, but not the same.
> My
> most recent experience was with an installation using this method. We
> had
> two hours to install over 100 systems, including the network
> facilities.
> This can only be done by removing as many installation schedule
> dependencies as possible. The media method removed the certificate
> management dependencies.
>
> Policy B:
> These client systems require safety and functional certification
> before
> they are made operational. This is done by inspection by an acceptance
> team. The acceptance team has a "CA on a laptop". After accepting
> safety
> and function, they establish a direct isolated physical connection
> between
> the client and the laptop. Then using standard key management tools,
> the
> client generates a private key and has the corresponding public
> certificate generated and signed by the laptop. The client is also
> provided with a public certificate for the CA that must sign the certs
> for
> all incoming connections.
>
> During a connection setup the client confirms that the server key has
> been
> signed by that CA. This is similar to a trusted anchor, but not the
> same.
> There is no chain of trust permitted. The key must have been directly
> signed by the CA. During connection setup the server confirms that the
> client cert was signed by the "CA on a laptop". Again, no chain of
> trust
> is permitted. This policy is incorporating the extra aspect of "has
> been
> inspected by the acceptance team" as part of the authentication
> meaning.
> They decided on a policy-risk basis that there was not a need to
> confirm
> re-inspection, but the "CA on a laptop" did have a revocation server
> that
> was kept available to the servers, so that the acceptance team could
> revoke at will.
>
> Policy C:
> This system was for a server that accepted connections from several
> independent organizations. Each organization managed certificates
> differently, but ensured that the organization-CA had signed all certs
> used for external communications by that organization. All of the
> client
> machines were provided with the certs for the shared servers (by a
> method
> similar to the fingerprint method). During TLS connection the clients
> confirmed that the server cert matched one of the certs on their list.
> The
> server confirmed that the client cert had been signed by the CA
> responsible for that IP subnet. The server was configured with a list
> of
> organization CA certs and their corresponding IP subnets.
>
> I do not expect any single policy choice to be appropriate for all
> syslog
> uses. I think it will be better to encourage a separation of function
> in
> products. There is more likely to be a commonality of configuration
> needs
> for all users of TLS on a particular system than to find a commonality
> of
> needs for all users of syslog. The policy decisions implicit in
> section
> 4.2 make good sense for many uses. They are not a complete set. So a
> phrasing that explains the kinds of maintenance and verification needs
> that are likely is more appropriate. The mandatory verifications can
> be
> separated from the key management system and kept as part of the
> protocol
> definition. The policy decisions should be left as important examples.
>
> Kind Regards,
>
> Robert Horn | Agfa HealthCare
> Research Scientist | HE/Technology Office
> T +1 978 897 4860
>
> Agfa HealthCare Corporation, 100 Challenger Road, Ridgefield Park, NJ,
> 07660-2199, United States
> http://www.agfa.com/healthcare/
> Click on link to read important disclaimer:
> http://www.agfa.com/healthcare/maildisclaimer
> _______________________________________________
> Syslog mailing list
> Syslog@ietf.org
> https://www.ietf.org/mailman/listinfo/syslog

The MonitorWare Knowledge Base

There's currently a lot brewing over here. With the release of phpLogCon, I finally got to a stage where we have a decent web front-end that enables us to support admins with troubleshooting right while they look at their log data. Of course, such a system not only deserves careful design, but it requires a knowledge base so that people troubleshooting can find solutions.

If we look at our web sites, one notices there are lots of troubleshooting resources already available. Just have a look at forum.adiscon.com and you see what I mean. This forum not only offers product specific support, but has a number of quite generic discussion forums (covering Windows Events and syslog messages). We also have other troubleshooting databases. Then, each product site (like rsyslog) has its own support forum.

While all of this is great, it does not play well with the idea of the central, one-stop troubleshooting resource we intend to build. So our first step towards this resource is putting the existing resources under a single umbrella and placing them into a single system. That, of course, requires some redesign and won't be perfect from day one.

The initial step is to consolidate all forums into a single one. My friend Andre is right now doing that. He has set up the new site kb.monitorware.com which in the future will provide all troubleshooting resources in an easy to find way. That site will be highly integrated with phpLogCon, which will be able to pull troubleshooting info from the central repository while looking at the local logs. I am very excited in seeing this become a reality (though I have to admit we are several month away from the ultimated goal).

The knowledge base site is currently in experimental operation and being finalized for production use. I hope to be able to go officially online next week.

rsyslog work log

Past day's rsyslog work log:
2008-05-06
- added documentation for TLS
- merge TLS branch into main devel branch
- released 3.19.0 (MILESTONE, we now have TLS! :-D )
- some cleanup (gotten rid of some more plain chars)
- fixed some issues thanks to darix
- fixed problem with man pages thanks to Michael Biebl's help
2008-05-07
- fixed some issues in liblogging (thanks to darix for pointing out)
- limited number of unavoidable compiler warnings when compiling with
GnuTLS
- released 3.19.1
2008-05-08
- bugfix: gtls netstram driver did not specify threading model (can
possbly lead to "interesting effects" ;))
- server's X509 cert fingerprint is obtained by client on connect
- thought hard about syslog-transport-tls-12 (obviously, this does
not manifest in code ;))

Tuesday, May 06, 2008

rsyslog 3.19.0 released - world's first syslog-transport-tls implementation

I am very pleased to announce rsyslog 3.19.0.

It is the first release that natively supports TLS for plain TCP syslog. Actually, it is the world's first implementation of the upcoming syslog-transport-tls IETF standard. As this standard is not yet finished, the implementation is obviously experimental.

Native TLS is a big improvement over existing functionality. For example, rsyslog can now be used without the help of stunnel, which relieves us of some problems from those configurations. To the best of my knowledge, rsyslog is the first open-source syslogd offering TLS support natively.

The current TLS functionality is limited to the bare minimum. During the next few weeks, I will improve it based on my own spec and feedback (hopefully received). My hope is to have a production-grade implementation by summer at latest. Please note rsyslog's premium and ultra-reliable RELP protocol does not yet support TLS (but can be used with stunnel without the real problems legacy tcp had with it). My plan is to let TLS mature with legacy syslog and then move it to RELP. Thus I can limit my development to one major use case, which I think will facilitate things.

There is some documentation on how to use the new TLS mode:

http://www.rsyslog.com/doc-rsyslog_tls.html

Currently, TLS is provided via GnuTLS. As I outlined earlier on the list, GnuTLS offered much more support to getting started (documentation and sample-code wise). I will focus on GnuTLS until I am fully satisfied with the TLS implementation). I'll then see that I can also integrate NSS. Advise in this regard would be highly welcome, so if you have some knowledge in this area, please contribute.

In order to support TLS (and multiple libraries!), a major rewrite of the networking components has been done. Rsyslog now supports a so-called "network streams" (netstreams) driver interface. This interface enables app-level functionality (like the legacy tcp syslog sender and receiver) to work with dynamically selectable netstream drivers (like plain (unencrypted) TCP) and TLS). This interface will enable rsyslog to utilize other TLS drivers (and even other protocols) in the future. Different drivers can even be used concurrently.

Rsyslog now has been split into a runtime system and tools (with currently rsyslogd being the only tool). This design will further strengthen modularization and help make rsyslog functionality available in small parts.

Finally, the RFC 3195 input has been rewritten in the form of an input plugin. It can now be build as part of the normal build procedure.

Please note that there were a couple of major changes. I expect the initial 3.19.0 to be quite Unstable. I recommend it for testing environments, only. Even those parts that were not directly touched may have become a bit destabilized due to the runtime split. So please use it with care. Feedback, however, would be more than welcome, because I need to start somewhere to stabilize this release. That can only be done with your help. So please use it on test systems, try to break it and file bug reports when it fails.

Download:

http://www.rsyslog.com/Downloads-req-viewdownloaddetails-lid-102.phtml

Changelog:

http://www.rsyslog.com/Article221.phtml

File your bug reports here ;) :

http://bugzilla.adiscon.com/rsyslog-bugs.html

I hope this release is useful. Feedback is much appreciated.

rsyslog work log

Yesterday's rsyslog work log:
2008-05-05
- made imgssapi work with new netstrm driver model
there were a couple of things where imgssapi was not compatible
with the new encapsulation. I did a somewhat dirty fix. The real
solution would be to turn gssapi functionality into a netstream
driver, which is too much for now (after all, we want to release
some time AND we need to have the code mature in practice
- added $DefaultNetstreamDriverCAFile config directive
- added $DefaultNetstreamDriverCertFile config directive
- added $DefaultNetstreamDriverKeyFile config directive
- added $ActionSendStreamDriver config directive

Monday, May 05, 2008

rsyslog work log

Past day's rsyslog work log:
2008-04-30
- added server-side TLS capability for plain tcp server
- added $InputTCPServerStreamDriverMode config directive
2008-05-02
- fixed a bug in imklog which lead to startup problems (including
segfault) on some platforms under some circumsances. Thanks to
Vieri for reporting this bug and helping to troubleshoot it.

Busy at the moment...

Some might have noticed that I am not as active as usual on the rsyslog project . As this seems to turn out to keep at least for the upcomi...