
Par2 is a Reed-Solomon error-correcting utility for regular files. It does not handle Unicode and does not work on directories. It was originally used to protect the integrity of Usenet posts on Usenet servers against bit rot, and has since been adopted by anyone who wants to safeguard any type of regular file on a filesystem.
It works by creating “parity files” for a source file. If the source file is damaged, the parity files can rebuild the lost data and restore the original, as long as the damage stays within the redundancy they were created with. This is much like a RAID array rebuilding the contents of a missing disk from parity.
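The RAID comparison can be made concrete with a toy example. This sketch is not how par2 encodes data: it uses plain XOR parity (the RAID-5 idea), which can rebuild only a single lost block, whereas Reed-Solomon codes like those par2 uses generalise the idea so that N recovery blocks can in principle rebuild up to N lost blocks. The "blocks" are arbitrary small integers chosen for readability.

```shell
#!/bin/sh
# Toy model of single-block XOR parity (RAID-5 style), NOT par2's Reed-Solomon code.

# Three "data blocks", represented as small integers for readability.
d0=23; d1=142; d2=77

# The parity block is the XOR of all data blocks.
parity=$(( d0 ^ d1 ^ d2 ))

# Pretend block d1 was lost: XOR the surviving blocks with the parity to rebuild it.
rebuilt_d1=$(( d0 ^ d2 ^ parity ))

echo "parity=$parity rebuilt_d1=$rebuilt_d1"   # prints: parity=212 rebuilt_d1=142
```

XOR works here because every value cancels itself out, so XOR-ing the survivors with the parity leaves exactly the missing block.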
Let's see it in action. We create a 100 MiB file full of random data and record its SHA-256 hash:
$ dd if=/dev/urandom bs=1M count=100 of=test.img
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 7.76976 s, 13.5 MB/s
$ sha256sum test.img > test.img.sha256
$ cat test.img.sha256
986afd8e4f92569275bb4d3e3283f77dd556f8bc8138f80a1331c21c57456d16 test.img
If this file were corrupted in any way, its SHA-256 hash would almost certainly change. The hash lets us detect damage but not repair it; with par2 we can protect the file against a limited amount of corruption.
$ sudo aptitude install par2
$ par2 create test.img.par2 test.img
(...snip...)
Block size: 52440
Source file count: 1
Source block count: 2000
Redundancy: 5%
Recovery block count: 100
Recovery file count: 7
Opening: test.img
Computing Reed Solomon matrix.
Constructing: done.
Wrote 5244000 bytes to disk
Writing recovery packets
Writing verification packets
Done
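By default we get 5% redundancy: 100 recovery blocks for 2000 source blocks, so up to 100 damaged or missing blocks (about 5% of the file) can be rebuilt from the parity files. Damage is counted in whole blocks, so errors scattered across many blocks use up recovery blocks faster than their byte count suggests. The redundancy can be increased with the “-r” switch, for example “-r10” for 10%.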
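The figures in the par2 output fit together with simple arithmetic. The sketch below reuses the block size (52440) and source block count (2000) printed by par2 create and estimates the recovery data for a given redundancy percentage. The rounding rule in the hypothetical helper recovery_blocks is an assumption, and it also assumes the block size stays at 52440 bytes when a different “-r” value is used.

```shell
#!/bin/sh
# Back-of-the-envelope maths for the "par2 create" run above.
block_size=52440        # "Block size: 52440"
source_blocks=2000      # "Source block count: 2000"

# Hypothetical helper: approximate recovery blocks for a redundancy percentage.
# Assumption: round-to-nearest; par2's own rounding may differ slightly.
recovery_blocks() {
  echo $(( (source_blocks * $1 + 50) / 100 ))
}

r5=$(recovery_blocks 5)
echo "5%: $r5 blocks, $(( r5 * block_size )) bytes"    # 100 blocks, 5244000 bytes
r10=$(recovery_blocks 10)
echo "10%: $r10 blocks, $(( r10 * block_size )) bytes" # 200 blocks, 10488000 bytes
```

The 5% line reproduces “Recovery block count: 100” and “Wrote 5244000 bytes to disk” from the output above.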
$ ls -l test.img*
-rw-rw-r-- 1 ttyse ttyse 104857600 Apr  1 08:06 test.img
-rw-rw-r-- 1 ttyse ttyse     40400 Apr  1 08:08 test.img.par2
-rw-rw-r-- 1 ttyse ttyse     92908 Apr  1 08:08 test.img.vol000+01.par2
-rw-rw-r-- 1 ttyse ttyse    185716 Apr  1 08:08 test.img.vol001+02.par2
-rw-rw-r-- 1 ttyse ttyse    331032 Apr  1 08:08 test.img.vol003+04.par2
-rw-rw-r-- 1 ttyse ttyse    581364 Apr  1 08:08 test.img.vol007+08.par2
-rw-rw-r-- 1 ttyse ttyse   1041728 Apr  1 08:08 test.img.vol015+16.par2
-rw-rw-r-- 1 ttyse ttyse   1922156 Apr  1 08:08 test.img.vol031+32.par2
-rw-rw-r-- 1 ttyse ttyse   2184696 Apr  1 08:08 test.img.vol063+37.par2
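The recovery file names appear to encode their contents: volXXX+YY holds YY recovery blocks starting at recovery block XXX. The sizes double (1, 2, 4, 8, 16, 32) until the last file takes the remaining 37 of the 100 blocks. The sketch below reproduces the listing under that assumption about par2's default file-sizing scheme; vol_names is a hypothetical helper, not part of par2.

```shell
#!/bin/sh
# Reproduce the recovery-file names from the listing above, assuming a
# doubling scheme: 1, 2, 4, 8, ... blocks per file, with the last file
# holding whatever is left over.
vol_names() {
  base=$1 total=$2
  start=0 size=1
  while [ "$start" -lt "$total" ]; do
    n=$size
    # The final file only gets the remaining blocks.
    [ $(( start + n )) -gt "$total" ] && n=$(( total - start ))
    printf '%s.vol%03d+%02d.par2\n' "$base" "$start" "$n"
    start=$(( start + n ))
    size=$(( size * 2 ))
  done
}

vol_names test.img 100   # test.img.vol000+01.par2 ... test.img.vol063+37.par2
```

This yields the same seven vol files as the ls output, and the counts add up to 1+2+4+8+16+32+37 = 100 recovery blocks.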
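Next we corrupt the source file with dd, overwriting 512 KiB of it with zeroes in place (conv=notrunc keeps the file size unchanged). With dd's default 512-byte block size, seek=5k starts writing 2.5 MiB into the file and count=1k writes 1024 blocks.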
$ dd seek=5k if=/dev/zero of=test.img count=1k conv=notrunc
1024+0 records in
1024+0 records out
524288 bytes (524 kB, 512 KiB) copied, 0.00442836 s, 118 MB/s
$ sha256sum -c test.img.sha256
test.img: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
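How many par2 blocks did that dd command damage? The arithmetic below maps the overwritten byte range onto par2's 52440-byte blocks. It relies only on dd's default 512-byte block size and the block size reported by par2 create; blocks are numbered from 0.

```shell
#!/bin/sh
# Map the byte range overwritten by dd onto par2 blocks.
bs_dd=512                        # dd's default block size when bs= is not given
offset=$(( 5 * 1024 * bs_dd ))   # seek=5k -> 5120 blocks of 512 bytes
length=$(( 1024 * bs_dd ))       # count=1k -> 1024 blocks of 512 bytes
par2_block=52440                 # "Block size: 52440" from par2 create

first=$(( offset / par2_block ))
last=$(( (offset + length - 1) / par2_block ))
damaged=$(( last - first + 1 ))

echo "bytes $offset-$(( offset + length - 1 )) -> par2 blocks $first-$last ($damaged blocks)"
echo "available: $(( 2000 - damaged )) of 2000, spare recovery blocks: $(( 100 - damaged ))"
```

The 11 damaged blocks match the repair output that follows: 1989 of 2000 data blocks available, 11 recovery blocks used and 89 to spare.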
The file is corrupted, but we can now recover it using par2:
$ par2 repair test.img.par2 test.img
Loading "test.img.par2".
Loaded 4 new packets
Loading "test.img.vol007+08.par2".
Loaded 8 new packets including 8 recovery blocks
Loading "test.img.vol031+32.par2".
Loaded 32 new packets including 32 recovery blocks
Loading "test.img.vol003+04.par2".
Loaded 4 new packets including 4 recovery blocks
Loading "test.img.vol001+02.par2".
Loaded 2 new packets including 2 recovery blocks
Loading "test.img.vol015+16.par2".
Loaded 16 new packets including 16 recovery blocks
Loading "test.img.vol063+37.par2".
Loaded 37 new packets including 37 recovery blocks
Loading "test.img.vol000+01.par2".
Loaded 1 new packets including 1 recovery blocks
There are 1 recoverable files and 0 other files.
The block size used was 52440 bytes.
There are a total of 2000 data blocks.
The total size of the data files is 104857600 bytes.
[...]
Repair is required.
1 file(s) are missing.
You have 1989 out of 2000 data blocks available.
You have 100 recovery blocks available.
Repair is possible.
You have an excess of 89 recovery blocks.
11 recovery blocks will be used to repair.
Computing Reed Solomon matrix.
Constructing: done.
Solving: done.
Wrote 104857600 bytes to disk
Verifying repaired files:
Target: "test.img" - found.
Repair complete.
Does the resulting SHA256 hash now match?
$ sha256sum -c test.img.sha256
test.img: OK
Good stuff!