par2 is a Reed-Solomon error-correcting utility for general files. It does not handle Unicode, and it does not work on directories. Its initial use was to maintain the integrity of Usenet posts on the Usenet server against bit rot. It has since been used by anyone interested in maintaining data integrity for any sort of regular file on the filesystem.
It works by creating “parity files” for a source file. Should the source file suffer damage, up to a certain point, the parity files can rebuild the data and restore the source file, much like parity in a RAID array rebuilds the contents of a missing disk.
Let's see it in action. We create a 100 MB file full of random data and record its SHA256 hash:
$ dd if=/dev/urandom bs=1M count=100 of=test.img
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 7.76976 s, 13.5 MB/s
$ sha256sum test.img > test.img.sha256
$ cat test.img.sha256
986afd8e4f92569275bb4d3e3283f77dd556f8bc8138f80a1331c21c57456d16  test.img
If this file were corrupted in any way, its SHA256 hash would almost certainly change. We can protect the file against some mild corruption with par2:
$ sudo aptitude install par2
$ par2 create test.img.par2 test.img
(...snip...)
Block size: 52440
Source file count: 1
Source block count: 2000
Redundancy: 5%
Recovery block count: 100
Recovery file count: 7
Opening: test.img
Computing Reed Solomon matrix.
Constructing: done.
Wrote 5244000 bytes to disk
Writing recovery packets
Writing verification packets
Done
By default we get a redundancy of 5%, which means that up to 5% of the file's blocks (here 100 of 2000) can be damaged and we can still recover it with the parity files. The redundancy can be increased with the “-r” switch.
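The figures in the par2 output fit together with simple arithmetic. As a minimal sketch (the variable names are mine, and the values are copied from the run above), the recovery block count and the amount of recovery data follow from the block size, the source block count, and the redundancy:

```shell
# Values copied from the "par2 create" output above.
block_size=52440      # bytes per block
source_blocks=2000    # blocks in test.img
redundancy=5          # percent (the default)

# One recovery block can stand in for one damaged or missing source block.
recovery_blocks=$(( source_blocks * redundancy / 100 ))
recovery_bytes=$(( recovery_blocks * block_size ))

echo "recovery blocks: $recovery_blocks"
echo "recovery data:   $recovery_bytes bytes"
```

This gives 100 recovery blocks and 5244000 bytes, matching the "Recovery block count" and "Wrote 5244000 bytes to disk" lines. Because repair works on whole blocks, damage scattered over many blocks uses up recovery blocks faster than one contiguous run of the same size. To ask for 10% redundancy instead, the create command would be "par2 create -r10 test.img.par2 test.img".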
Listing the directory shows the index file and the seven recovery files:
$ ls -l test.img*
-rw-rw-r-- 1 ttyse ttyse 104857600 Apr 1 08:06 test.img
-rw-rw-r-- 1 ttyse ttyse     40400 Apr 1 08:08 test.img.par2
-rw-rw-r-- 1 ttyse ttyse     92908 Apr 1 08:08 test.img.vol000+01.par2
-rw-rw-r-- 1 ttyse ttyse    185716 Apr 1 08:08 test.img.vol001+02.par2
-rw-rw-r-- 1 ttyse ttyse    331032 Apr 1 08:08 test.img.vol003+04.par2
-rw-rw-r-- 1 ttyse ttyse    581364 Apr 1 08:08 test.img.vol007+08.par2
-rw-rw-r-- 1 ttyse ttyse   1041728 Apr 1 08:08 test.img.vol015+16.par2
-rw-rw-r-- 1 ttyse ttyse   1922156 Apr 1 08:08 test.img.vol031+32.par2
-rw-rw-r-- 1 ttyse ttyse   2184696 Apr 1 08:08 test.img.vol063+37.par2
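The volume names encode which recovery blocks each file holds: "volAAA+BB" starts at recovery block AAA and contains BB blocks, so the seven volumes hold 1 + 2 + 4 + 8 + 16 + 32 + 37 = 100 blocks. The loop below is only a sketch that reproduces the names in this listing by doubling the count per file and putting the remainder in the last one. It is not taken from par2's source, and other option settings may produce a different layout.

```shell
total=100   # recovery block count from the par2 output
start=0     # index of the first recovery block in the next volume
count=1     # blocks in the next volume, doubled each round
names=""

while [ "$start" -lt "$total" ]; do
    remaining=$(( total - start ))
    # The last volume takes whatever is left.
    if [ "$count" -gt "$remaining" ]; then
        count=$remaining
    fi
    name=$(printf 'test.img.vol%03d+%02d.par2' "$start" "$count")
    echo "$name"
    names="$names $name"
    start=$(( start + count ))
    count=$(( count * 2 ))
done
```

Spreading the blocks over files of different sizes means a repair only has to read as many recovery files as the damage requires.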
We corrupt the source file with dd by overwriting 512 KiB with zeroes, starting 2.5 MiB into the file:
$ dd seek=5k if=/dev/zero of=test.img conv=notrunc count=1k
1024+0 records in
1024+0 records out
524288 bytes (524 kB, 512 KiB) copied, 0.00442836 s, 118 MB/s
$ sha256sum -c test.img.sha256
test.img: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
The file is corrupted, and we can now recover it using par2:
$ par2 repair test.img.par2 test.img
Loading "test.img.par2".
Loaded 4 new packets
Loading "test.img.vol007+08.par2".
Loaded 8 new packets including 8 recovery blocks
Loading "test.img.vol031+32.par2".
Loaded 32 new packets including 32 recovery blocks
Loading "test.img.vol003+04.par2".
Loaded 4 new packets including 4 recovery blocks
Loading "test.img.vol001+02.par2".
Loaded 2 new packets including 2 recovery blocks
Loading "test.img.vol015+16.par2".
Loaded 16 new packets including 16 recovery blocks
Loading "test.img.vol063+37.par2".
Loaded 37 new packets including 37 recovery blocks
Loading "test.img.vol000+01.par2".
Loaded 1 new packets including 1 recovery blocks
There are 1 recoverable files and 0 other files.
The block size used was 52440 bytes.
There are a total of 2000 data blocks.
The total size of the data files is 104857600 bytes.
[...]
Repair is required.
1 file(s) are missing.
You have 1989 out of 2000 data blocks available.
You have 100 recovery blocks available.
Repair is possible.
You have an excess of 89 recovery blocks.
11 recovery blocks will be used to repair.
Computing Reed Solomon matrix.
Constructing: done.
Solving: done.
Wrote 104857600 bytes to disk
Verifying repaired files:
Target: "test.img" - found.
Repair complete.
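The "1989 out of 2000" figure can be checked by hand. As a small sketch (the variable names are mine, and dd's default block size of 512 bytes is assumed because the command sets no bs), this works out which par2 blocks the zeroed range overlaps:

```shell
par2_block=52440                      # "Block size" reported by par2
dd_block=512                          # dd default when bs is not given
offset=$(( 5 * 1024 * dd_block ))     # seek=5k  -> 2621440 bytes
length=$(( 1024 * dd_block ))         # count=1k -> 524288 bytes

first=$(( offset / par2_block ))                 # first block touched
last=$(( (offset + length - 1) / par2_block ))   # last block touched
damaged=$(( last - first + 1 ))

echo "blocks $first to $last damaged: $damaged of 2000"
```

The zeroed range overlaps blocks 49 through 59, which is 11 blocks. That leaves 1989 intact, and with 100 recovery blocks on hand it accounts for the excess of 89 that par2 reports.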
Does the resulting SHA256 hash now match?
$ sha256sum -c test.img.sha256
test.img: OK
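For routine checks, the verify and repair steps can be combined in a small wrapper. This is a sketch rather than a tested backup procedure: the function name check_and_repair is hypothetical, and it assumes that "par2 verify" exits with a non-zero status when the files need repair.

```shell
# check_and_repair PAR2_INDEX
# Verifies the files described by the index file and attempts a repair
# only when verification fails. Returns 0 when the files are intact,
# otherwise the exit status of "par2 repair".
check_and_repair() {
    index=$1
    if par2 verify "$index"; then
        echo "$index: files are intact"
    else
        echo "$index: verification failed, attempting repair"
        par2 repair "$index"
    fi
}

# Example: check_and_repair test.img.par2
```

Keeping the SHA256 file alongside the parity files still gives an independent check that the repaired file is byte-for-byte identical to the original.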