puts - the probably simplest ruby method

Hello World V2

puts - the probably simplest ruby method

If you learn a new language, your first program will output "Hello World" to the screen. Maybe something like this:

puts "Hello World"

So it's safe to assume you're pretty familiar with the purpose of the puts method.

I had last time a very "funny" debugging session. I had such much fun during the whole process, I thought I will share it with you.

In our web application, we do some logging. During each request, a json entry is written into a file, for consecutive request we should have several lines in our log file.

something like that:

{"foo": "bar", "x": "y"}
{"bar": "baz", "j": "k"}
{"baz": "foo", "a": "b"}
# and so on...

but in reality the lines are longer and you get files like this one:

{"_id": "55df7573976825f3e30bb473", "index": 0, "guid": "fb79e489-202e-45fb-a2a1-aa4292ed204d", "isActive": true, "balance": "$2,856.84", "picture": "http://placehold.it/32x32", "age": 35, "eyeColor": "green", "name": "Russell Davidson", "gender": "male", "company": "ESCENTA", "email": "russelldavidson@escenta.com", "phone": "+1 (892) 599-2788", "address": "590 Orient Avenue, Sisquoc, Louisiana, 4786", "about": "Mollit velit ullamco officia minim eiusmod mollit. Nulla laboris exercitation anim nisi ad minim tempor velit. Tempor consectetur minim nisi cupidatat nisi.\r\n", "registered": "2015-08-04T02:22:20 -02:00", "latitude": -82.306857, "longitude": 32.112557, "tags": [ "amet", "exercitation", "culpa", "est", "voluptate", "Lorem", "aliquip" ], "friends": [ { "id": 0, "name": "Fry Maldonado" }, { "id": 1, "name": "Hester Fuller" }, { "id": 2, "name": "Scott Baker" } ], "greeting": "Hello, Russell Davidson! You have 8 unread messages.", "favoriteFruit": "apple"}
{"_id": "55df7573976825f3e30bb473", "index": 0, "guid": "fb79e489-202e-45fb-a2a1-aa4292ed204d", "isActive": true, "balance": "$2,856.84", "picture": "http://placehold.it/32x32", "age": 35, "eyeColor": "green", "name": "Russell Davidson", "gender": "male", "company": "ESCENTA", "email": "russelldavidson@escenta.com", "phone": "+1 (892) 599-2788", "address": "590 Orient Avenue, Sisquoc, Louisiana, 4786", "about": "Mollit velit ullamco officia minim eiusmod mollit. Nulla laboris exercitation anim nisi ad minim tempor velit. Tempor consectetur minim nisi cupidatat nisi.\r\n", "registered": "2015-08-04T02:22:20 -02:00", "latitude": -82.306857, "longitude": 32.112557, "tags": [ "amet", "exercitation", "culpa", "est", "voluptate", "Lorem", "aliquip" ], "friends": [ { "id": 0, "name": "Fry Maldonado" }, { "id": 1, "name": "Hester Fuller" }, { "id": 2, "name": "Scott Baker" } ], "greeting": "Hello, Russell Davidson! You have 8 unread messages.", "favoriteFruit": "apple"}

This file is then processed by Logstash and piped to Elasticsearch.

We noticed we are missing certain entries in Elasticsearch from our logfiles.

We run our logfiles trough the uniq | wc -l and noticed we have duplicate lines there, which was an error.

As we tried get the duplicated lines from several GB of files, we got nothing, at least we thought we got nothing, because the duplicate lines were... empty lines.

We removed the empty lines, and tried parse the json entries line by line.

Sometimes we got malformed json errors.

After doing some investigation I've seen that the not all json entries were seperated by a newline character, and that some empty lines are present in our log file.

something like that:

{"foo": "bar", "x": "y"}{"bar": "baz", "j": "k"}

{"foo": "bar", "a": "b"}
{"foo": "bar", "j": "k"}

{"foo": "bar", "x": "y"}{"baz": "bar", "a": "b"}

Due to the length of our json lines this pattern was harder to spot.

The code which generates our logs looks like this:

data = input.to_json
File.open('json.log', 'a') {|f| f.puts(data) }

Hmmm, puts looks suspicious... seems like it writes the data, and the newline character in seperate steps.

A quick look at the sources of puts:

rb_io_write(out, line);
if (RSTRING_LEN(line) == 0 ||
  !str_end_with_asciichar(line, '\n')) {
  rb_io_write(out, rb_default_rs);
}

Right, as you can see here, our data is written to the file, and then if it doesn't end with a newline (which is our case), a newline character is written in a second step.

This would be totally fine in a single threaded environment, but if multiple threads are appending to the same file (Unicorn anyone ?), the above described scenario happens, and we get a malformed log file.

Our fixed version:

data = input.to_json
File.open('json.log', 'a') {|f| f.write(data + "\n") }

Now the writes are atomic, we can't be sure about the order of the incoming lines, but in our case as we have a timestamp in our json for Logstash, this doesn't matter.

The problem was in front of the machine, as usual.

Happy coding!