Stop logging in lines

UNIX was made in the 1970s, and thus everything was fine. Log files existed before that, but I’m not that old.

Universities had a UNIX machine, or sometimes even two, as did a number of research places, etc etc. Logfiles were wonderful for keeping track of what was happening, and seeing what your system had been up to. You looked at them with more, as less didn’t exist until the early 80s (okay fine… syslog, be like that).

It’s 2018, you have more than one UNIX machine, some of you maybe even have three of them!!!11 Viewing logfiles like it’s 1982 just does not work at scale. Individual lines are often missing context unless you read a number of them together, which is gross, or they just don’t tell you anything useful at scale. We need to let go of 80s- and 90s-style logging: we have better tools now, and should treat logs as data sources, not as the lone machine talking to its one human operator.

What are the issues?

Engage @mipsytipsy mode!

  • If you have, say, a pool of webservers, and you’re ever logging in to them to read their logs, you’re possibly doing it wrong.
  • If you’re not using centralised logging, you’re definitely doing it wrong.
  • If you’re sending unstructured logs to your log thing, and it isn’t from some archaic ancient platform that will never get updated, you’re really doing it wrong (there’s a before/after example just below).
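
To make that last point concrete, here’s the same event twice, once as a classic unstructured line and once as a structured JSON event (the field names are invented for illustration):

    192.0.2.10 - - [25/Jun/2018:10:12:01 +0000] "GET /checkout HTTP/1.1" 500 1024

versus

    {"time": "2018-06-25T10:12:01Z", "remote_addr": "192.0.2.10", "method": "GET", "path": "/checkout", "status": 500, "bytes": 1024}

The second one parses the same way in every language that has a JSON library; the first one needs a regex, and a slightly different regex for every log format you own.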

In the whole treating servers like cattle, not pets (which I don’t have a good vegan version of yet), logs should be treated similarly. To a degree you don’t care whether it was webserver 1 or webserver 2 that emitted the line you’re looking into, just that it happened. The goal should be to get those logs out of apache/nginx and into your logging platform (and sure, disk/S3, whatevs) as quickly and painlessly as possible.
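
To show how small that shipping step can be, here’s a toy forwarder sketch in Python: it tails a JSON-lines log file and POSTs each event to an HTTP collector. The collector URL, host tag, and file path are all made up for illustration; real setups would use filebeat, fluentd, or your platform’s own agent.

    # Toy log shipper: tail a JSON-lines file, forward each event over HTTP.
    # COLLECTOR_URL and the log path are hypothetical, for illustration only.
    import json
    import time
    import urllib.request

    COLLECTOR_URL = "http://logs.internal:8080/ingest"  # hypothetical collector

    def follow(path):
        """Yield lines appended to a file, like `tail -f`."""
        with open(path) as f:
            f.seek(0, 2)  # start at the current end of the file
            while True:
                line = f.readline()
                if not line:
                    time.sleep(0.5)  # nothing new yet, wait a bit
                    continue
                yield line

    def ship(path, host):
        for line in follow(path):
            event = json.loads(line)  # structured logs parse trivially
            event["host"] = host      # tag it, then stop caring which host
            req = urllib.request.Request(
                COLLECTOR_URL,
                data=json.dumps(event).encode(),
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(req)

    if __name__ == "__main__":
        ship("/var/log/nginx/access.json", host="web-1")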

If you have a load balancer, looking at individual hosts’ logs involves a lot of iTerm2 windows, versus one Splunk window. You cannot do comparison and correlation with tmux.

If you’re spending time parsing logs with regexes, you’ve made a mistake. If you control the program that outputs these logs, and the programs that ingest them, you should not be sending them as lines (or, even worse, multiple lines). JSON, sadly, is actually really good for this, and there are fast (read: written in C) JSON libraries for most languages.
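
Emitting structured events from your own application is a few lines of stdlib Python, as a sketch (the field names here are invented for the example; in real life you’d pick a schema and stick to it):

    # Minimal structured logging: one JSON object per line, nothing for
    # a regex to do downstream. Field names are illustrative, not a standard.
    import json
    import sys
    import time

    def log_event(**fields):
        event = {"time": time.time(), **fields}
        sys.stdout.write(json.dumps(event) + "\n")

    log_event(level="error", msg="payment failed", order_id=1234,
              upstream="billing", status=502)
    # -> {"time": 1529921521.0, "level": "error", "msg": "payment failed",
    #     "order_id": 1234, "upstream": "billing", "status": 502}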

I will in fact go as far as to say that, if you are doing enough traffic, your logs do not need to be human readable in the middle. Yup. Have your logs written as BSON, msgpack, protobufs, whatever to disk, and then ingested quickly and easily at the other end (Splunk can have inputs written for it if there’s not one already, logstash can do anything Ruby can). If you need to read them off of disk, then you were using jq anyway, so having another tool that reads whatever and outputs it as a nice readable JSON blob is actually an acceptable thing to do, considering you will spend 99.999999999% of the time not doing this.
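
As a sketch of that round trip, using the third-party msgpack package (pip install msgpack): binary events on disk, plus the “another tool” that turns them back into readable JSON for the rare human look. The events and file name are invented for the example.

    # Writer side: append binary msgpack events to the log file on disk.
    # Requires the third-party `msgpack` package; events are illustrative.
    import json
    import msgpack

    with open("app.log.msgpack", "ab") as f:
        f.write(msgpack.packb({"level": "info", "msg": "cache warm", "ms": 12}))
        f.write(msgpack.packb({"level": "error", "msg": "cache miss storm"}))

    # Reader side: the rarely-used human tool, msgpack in, JSON lines out.
    with open("app.log.msgpack", "rb") as f:
        for event in msgpack.Unpacker(f):
            print(json.dumps(event))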

This isn’t even optimising your logs for speed, it’s optimising them for consumption: easy, uniform consumption. The easier it is to get logs parsed correctly into your platform, the more logs you actually have that you can trust!