Monday, March 30, 2015

Call for Log Samples

There is one big problem in research on better logging methods: no good log sample repositories exist. Well, not even bad ones... I am currently taking some preliminary steps towards a new, better log normalization system. Among other things, it will contain a structure analyzer that will remove much of the manual burden of creating normalization rules. But guess what: while the project looks very promising, the lack of log samples is a really big problem!

To solve that problem, I have set up a public log ingestor that you can simply send logs to. The system is reachable as follows:

hostname: logsubmit.rsyslog.com
port: 514
protocol: any flavor of syslog or other text data

If you run rsyslog, you can add this snippet to /etc/rsyslog.conf (the single @ forwards via UDP, using the default port 514):

*.* @logsubmit.rsyslog.com
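If you do not run rsyslog, a small script can feed the same endpoint. The following is only a minimal sketch: it assumes that the ingestor accepts plain text lines as individual UDP datagrams (which matches the "other text data" note above and the UDP forwarding used by the rsyslog snippet). The function name send_lines is made up for illustration.

```python
import socket

HOST = "logsubmit.rsyslog.com"  # hostname given in the post
PORT = 514                      # port given in the post


def send_lines(lines, host=HOST, port=PORT):
    """Send each non-empty line as one UDP datagram; return the number sent.

    Assumption: the receiver takes raw text lines without syslog framing.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            # Drop the trailing newline; one log line maps to one datagram.
            payload = line.rstrip("\r\n").encode("utf-8", errors="replace")
            if not payload:
                continue  # skip empty lines
            sock.sendto(payload, (host, port))
            sent += 1
    return sent
```

To submit a file, open it in text mode and pass the file object to send_lines. UDP is fire-and-forget, so the return value only counts the datagrams handed to the network, not the ones that actually arrived.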

How did this idea materialize? During my talk at the German Unix User's Group FFG 2015 conference last week in Stuttgart, I mentioned that problem, and Dirk Wetter had the idea to provide a log receiver that makes it very easy for people to contribute. There were some concerns that this may open up my server to DoS attacks, and that of course is true. Nevertheless, I liked the idea, and so we set up a machine today. It may be DDoS'ed and other bad things may happen, but then we will at least have gained some experience. The machine is separated from the main systems, so that shouldn't cause much harm.

For log contributors: please keep in mind that you are sending data to a public service, so it is probably not a great idea to do this for sensitive systems. But if we get enough data from uncritical systems, we can still gain a lot from it. Most importantly, it helps us gain insight into structural log mining methods, which will also lead to the above-mentioned tool. All logs gathered by this method will be placed in the research log repository, which is currently hosted on github. It is licensed under BSD 2-clause in the hope that a sufficiently large and diverse data set is also of great value to other researchers (did I mention it is ultra-hard to find any log sample data sets?). If you are interested in contributing logs but would want to do so under NDA, that is of course also possible. In that case, please just drop me an email to see how best to go forward with that.