Mina Naguib

This is a technical article chronicling one of the most interesting bug hunts I’ve had the pleasure of chasing down.

At AdGear Technologies Inc. where I work, ssh is king. We use it for management, monitoring, deployments, log file harvesting, even real-time event streaming. It’s solid, reliable, has all the predictability of a native unix tool, and just works.

Until one day, random cron emails started flowing about it not working.

The machines in our London data center were randomly failing to send their log files to our data machines in our Montreal data center. This job is initiated periodically from cron, and the failure manifested itself as:

- cron emails stating that the ssh was unsuccessful
  - Sometimes hangs
  - Sometimes exits with a timeout error
- nagios warnings down the line for in-house sanity checks detecting the missing data in Montreal

We logged into t...