Backblaze Online Backup has released raw data collected from the more than 41,000 disk drives in its data center, so that drive failures can be correlated with drive model numbers, SMART statistics, and other variables.

The collected data is in two files, one containing the 2013 data and one containing the 2014 data. Backblaze says it will add data for 2015 and later years in a similar fashion. You’ll find links to download the data files at http://tinyurl.com/nqffmvz, along with instructions on how to create your own SQLite database for the data and other information related to the files.

Every day, the software that runs the Backblaze data center takes a snapshot of the state of every drive in the data center, including the drive’s serial number, model number, and all of its SMART data. The SMART data includes the number of hours the drive has been running, the temperature of the drive, and whether sectors have gone bad, among many other values.

Each day, all of the drive “snapshots” are processed and written to a new daily stats file. Each daily stats file has one row for every drive operational in the data center that day. For example, the 2014 data package contains 365 daily stats files, each holding a “snapshot” of every drive that was operational on that day.
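
As a rough illustration of the one-file-per-day layout, the sketch below counts how many drives appear in each daily stats file. The function name drives_per_day, the directory passed to it, and the ".csv" file extension are assumptions made for this example, not names taken from the Backblaze package. It also assumes each file starts with a single line of column names.

```python
import csv
from pathlib import Path


def drives_per_day(data_dir):
    """Return {file name without extension: number of drive rows} for each CSV file.

    data_dir is a hypothetical directory holding the unpacked daily stats files.
    """
    counts = {}
    for path in sorted(Path(data_dir).glob("*.csv")):
        with path.open(newline="") as f:
            reader = csv.reader(f)
            next(reader, None)  # skip the first line (assumed to be column names)
            # One remaining non-empty row per drive operational that day.
            counts[path.stem] = sum(1 for row in reader if row)
    return counts
```

Calling drives_per_day on the folder where you unpacked a year of data would give one count per day, which is a quick way to see how the drive population changed over time.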

Each daily stats file is in CSV (comma-separated values) format. The first line lists the names of the columns. The columns include normalized values for 40 different SMART stats as reported by the given drive; each value is the number reported by the drive.
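
Because the first line of each file names the columns, a single daily stats file can be read with a standard CSV parser. The following is a minimal sketch, not Backblaze's own tooling: the function name summarize_daily_file and the default column name "model" are assumptions, so check the header of an actual file and adjust the column name if it differs.

```python
import csv
from collections import Counter


def summarize_daily_file(path, model_column="model"):
    """Return (column names, Counter of drives per model) for one daily stats file.

    model_column is an assumed column name; pass the real one from the header.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)  # takes the column names from the first line
        # Each remaining row is one drive; tally rows by their model value.
        per_model = Counter(row[model_column] for row in reader)
        return reader.fieldnames, per_model
```

The returned column names show which SMART stats are present in the file, and the per-model tally is a starting point for comparing drive models.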