Recently I was working on a project involving rubyzip. There was a Queue of incoming Tempfiles from several threads, and my code in the main thread was happily adding them to the zipfile. Nothing terribly strange. Except…about 10% of the time I’d get an Errno::ENOENT from rubyzip (i.e. it couldn’t find the file).

queue, num_threads = spawn_threads
num_dead = 0

while num_dead < num_threads
  if (tmpfile = queue.pop)
    filename = File.basename(tmpfile.path)
    zipfile.add(filename, tmpfile.path)
  else
    num_dead += 1
  end
end

The astute reader will recall that Ruby’s Tempfile, depending on how it’s initialized, will automatically delete itself during garbage collection. I’ve been bitten by it before, but the above code seems safe in that regard.

My next suspect was my Very Clever (TM) use of threads and a queue. Was there some weird interaction when a Tempfile was created in one thread and passed to another? Seemed unlikely, but my multi-threading confidence was shaken for a bit. (Not necessarily a bad thing.)

After that wild goose chase, I found myself back at rubyzip. I noticed I was only passing the path into zipfile.add. But that should be fine, since the tmpfile var was still in scope. Nevertheless I traced through the add method, and guess what? It doesn’t add the file to the zip right away. The path gets added to some internal buffer and the file isn’t actually read and zipped until later. Later…in another loop…when tmpfile has been reassigned, nothing references the old Tempfile instance any more, and the garbage collector has potentially deleted the underlying file. Damn it.
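To make the failure mode concrete, here is a minimal sketch that reproduces it without rubyzip. DeferredArchive is a hypothetical stand-in I made up for illustration (it is not rubyzip's implementation); it only mimics the "record the path now, read the file later" behaviour. Tempfile#close! is used to delete the file deterministically, standing in for what garbage collection may do at some unpredictable point.

```ruby
require "tempfile"

# Hypothetical stand-in for an archive that defers reading files.
# It is NOT rubyzip; it only imitates the add-now, read-later pattern.
class DeferredArchive
  def initialize
    @pending = []
  end

  # Remember the name and path; the file is not touched yet.
  def add(name, path)
    @pending << [name, path]
  end

  # Only now are the files actually read from disk.
  def commit
    contents = @pending.map { |name, path| [name, File.read(path)] }.to_h
    @pending.clear
    contents
  end
end

# Returns :ok if the deferred read succeeds, :missing if the file is gone.
def deferred_read_outcome
  archive = DeferredArchive.new
  tmp = Tempfile.new("demo")
  tmp.write("hello")
  tmp.flush

  archive.add("demo.txt", tmp.path)
  tmp.close! # close and delete the file now, simulating a GC-triggered cleanup

  archive.commit
  :ok
rescue Errno::ENOENT
  :missing
end

puts deferred_read_outcome
```

The path is captured while the file still exists, but by the time commit tries to read it the file is gone, which is the same shape as the intermittent Errno::ENOENT above.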

Didn’t take long to find a workaround after that. Calling zipfile.commit will cause anything in the internal buffer to be zipped up right away. The final code was:

queue, num_threads = spawn_threads
num_dead = 0

while num_dead < num_threads
  if (tmpfile = queue.pop)
    filename = File.basename(tmpfile.path)
    zipfile.add(filename, tmpfile.path)
    zipfile.commit # force tmpfile.path to be zipped right now
  else
    num_dead += 1
  end
end
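
A different workaround, which is not what I ended up using, would be to keep every Tempfile referenced until the zip has been written, so the garbage collector can't delete anything early. The following is only a sketch under assumptions: add_all and kept are names invented for this example, and zipfile is assumed to be any object responding to add(name, path) like the one in the loop above.

```ruby
require "tempfile"

# Drains the queue like the loop above, but also holds on to every
# Tempfile so none of them can be garbage-collected (and deleted)
# before the caller has finished writing the archive.
# `zipfile` is assumed to respond to add(name, path).
def add_all(queue, num_threads, zipfile)
  kept = []
  num_dead = 0

  while num_dead < num_threads
    if (tmpfile = queue.pop)
      kept << tmpfile # keep a live reference to the Tempfile
      zipfile.add(File.basename(tmpfile.path), tmpfile.path)
    else
      num_dead += 1 # a nil marks one finished producer thread
    end
  end

  kept # caller should hold this until the zip has been written
end
```

The caller would hang on to the returned array until after the archive is committed or closed. The trade-off is that all the temporary files stay on disk until then, whereas calling commit inside the loop lets each one be cleaned up sooner.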