ZAR 8.3 RAID recovery speed test results

ZAR has been discontinued

After about twenty years, I felt ZAR could no longer be updated to meet modern requirements, and I decided to retire it.

ZAR is replaced by Klennet Recovery, my new general-purpose DIY data recovery software.

If you are looking specifically for recovery of image files (like JPEG, CR2, and NEF), take a look at Klennet Carver, a separate program for video and photo recovery.

The problem

As storage systems grew, there was an influx of complaints that ZAR performance was inadequate for the tasks it had to deal with. The most recent example is a report of ZAR taking impractically long and finally running out of memory while attempting to recover a 1TB RAID5 array (exactly following our RAID5 recovery tutorial) holding about three million files.

There are three major aspects to this:

1TB of disk space takes quite a long time to scan.
Processing 3,000,000 files requires massive computational effort.
Memory requirements also become a concern with 3,000,000 files.

So we set to work on it and had made significant progress by version 8.3 build 7. The measurements were mostly done out of curiosity; however, they give an idea of the speed to expect.

Test system setup

The machine is a 3.06GHz hyperthreaded Pentium 4 with 2GB of physical memory. The array is created on two 40GB Seagate U6 drives in a RAID 0 (stripe set). These two 5400RPM drives (made circa 2002) are capable of delivering about 40MB/sec combined throughput in linear reads. Tests were performed with ZAR version 8.3 build 7.

Additional tweaks were:

Cache sizes in ZAR were set to the minimum allowed values.
The /3GB switch was added to the boot.ini entry to allow ZAR to use 3GB of virtual memory.
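
For illustration, here is a minimal Python sketch of what adding the switch amounts to. It works on a sample boot.ini text rather than on the real file. The sample contents and the helper name add_3gb_switch are assumptions made for this example, not something ZAR provides. On a real system the file is normally edited by hand, and the change takes effect after a reboot.

```python
# Sample boot.ini contents (hypothetical; your entries will differ).
SAMPLE_BOOT_INI = r"""[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /fastdetect
"""


def add_3gb_switch(boot_ini_text):
    """Return the text with /3GB appended to each [operating systems] entry."""
    out = []
    in_os_section = False
    for line in boot_ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Track which section we are in; only OS entries get the switch.
            in_os_section = stripped.lower() == "[operating systems]"
        elif in_os_section and "=" in stripped and "/3gb" not in stripped.lower():
            line = line.rstrip() + " /3GB"
        out.append(line)
    return "\n".join(out) + "\n"


patched = add_3gb_switch(SAMPLE_BOOT_INI)
print(patched)
```

The helper skips entries that already carry the switch, so running it twice leaves the text unchanged.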

Test results

Run number		1	2	3	4	5	6
Number of files		5M	6M	7M	8M	9M	11M
Number of directories		33K	133K	143K	153K	163K	183K
Time, minutes	Disk scan⁽¹⁾	15	16	15	16	16	17
	RAID reconstruction	13	13	15	16	19	23
	Filesystem reconstruction	42	44	51	57	104	1105
	Total⁽²⁾	68	71	79	89	138⁽³⁾	1145
Peak memory usage, MB		1200	1440	1660	1900	2120⁽³⁾	2600

Notes:

⁽¹⁾ In a best case scenario, disk scan time for a RAID array is proportional to the member disk size. In a worst case, disk scan time is proportional to the array size.
⁽²⁾ This does not include the time required to actually copy the files.
⁽³⁾ Starting with nine million files, we ran out of available physical memory and massive swapping started. It was impractical to continue testing under these conditions, so just one more run was performed to confirm that ZAR can actually handle 11M files.
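
As a rough illustration of note ⁽¹⁾, the sketch below turns the two cases into numbers for the test setup. It assumes that scan time is simply the amount of data read divided by the linear read rate, that 1GB is 1000MB, and that the 40MB/sec figure applies in both cases. None of this is stated in the text above, so treat the output as a ballpark only. The function and constant names are made up for this example.

```python
def scan_time_minutes(size_gb, throughput_mb_s):
    """Minutes needed to read size_gb of data at throughput_mb_s (1GB = 1000MB)."""
    return size_gb * 1000 / throughput_mb_s / 60


MEMBER_GB = 40        # size of one member disk in the test array
MEMBERS = 2           # number of member disks (RAID 0)
THROUGHPUT_MB_S = 40  # linear read rate quoted for the test setup

# Best case: time proportional to the member disk size.
best = scan_time_minutes(MEMBER_GB, THROUGHPUT_MB_S)
# Worst case: time proportional to the whole array size.
worst = scan_time_minutes(MEMBER_GB * MEMBERS, THROUGHPUT_MB_S)

print(f"best case:  {best:.1f} min")
print(f"worst case: {worst:.1f} min")
```

Under these assumptions the measured disk scan times of 15 to 17 minutes sit close to the best-case estimate.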

Performance and hardware guidelines

Recovering a huge RAID volume requires some prior planning and setup. You do not want the system to run out of memory and start swapping, as the recovery will grind to a halt. So to recover a really big array, you need to:

add the /3GB switch to your boot.ini file
have at least 3GB (preferably 4GB) of physical memory installed
make sure you have somewhere to offload the recovered data

Under these conditions, ZAR will recover an array containing about 12,000,000 objects (files and directories combined). This number is a hard limit: if the volume contains more than 12M objects, ZAR will eventually run out of memory.
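
To see how this limit relates to the test results, one can fit a straight line to the peak memory figures from the table and extrapolate. The Python sketch below does that with ordinary least squares. It assumes that memory use grows linearly with the number of objects and that the /3GB switch gives 3 x 1024 MB of address space. Both are simplifications, and ZAR does not necessarily behave this way internally.

```python
# Figures copied from the test results table.
files_m = [5, 6, 7, 8, 9, 11]               # files, millions
dirs_k = [33, 133, 143, 153, 163, 183]      # directories, thousands
peak_mb = [1200, 1440, 1660, 1900, 2120, 2600]

# Objects = files + directories, in millions.
objects_m = [f + d / 1000 for f, d in zip(files_m, dirs_k)]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(objects_m, peak_mb)

ADDRESS_SPACE_MB = 3 * 1024                 # assumed limit with /3GB
predicted_mb = slope * 12 + intercept       # estimate for 12M objects

print(f"about {slope:.0f} MB per million objects")
print(f"estimated peak for 12M objects: {predicted_mb:.0f} MB "
      f"of {ADDRESS_SPACE_MB} MB")
```

With the table data, the estimate for 12M objects comes out below the 3GB address space but with limited headroom, which is in line with the limit quoted above.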

On the other hand, a typical NTFS volume, the kind you most likely have at home, contains 500,000 files or fewer. For this typical setup, 512MB of physical memory is enough with a low cache size, and 1GB is enough with the cache size at maximum. Most reasonably modern systems will thus perform quite well on a typical volume. In these cases, no additional tweaks are necessary.
