Why not tarballs too large

The recommendation is not to generate tarballs larger than 500GB.
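
For example, rather than one huge archive, the data can be grouped into batches that each stay under the limit and archived batch by batch. Below is a minimal Python sketch of such a greedy grouping; the directory path and the batch-list naming are hypothetical illustrations, not part of any SciNet tool:

 import os
 
 LIMIT = 500 * 10**9  # the 500GB recommendation, in bytes
 
 def batches(root, limit=LIMIT):
     """Yield lists of file paths whose combined size stays under limit."""
     batch, total = [], 0
     for dirpath, _, filenames in os.walk(root):
         for name in filenames:
             path = os.path.join(dirpath, name)
             size = os.path.getsize(path)
             if batch and total + size > limit:
                 yield batch
                 batch, total = [], 0
             batch.append(path)
             total += size
     if batch:
         yield batch
 
 # "/scratch/user/project" is a hypothetical directory; each list file
 # can then be archived on its own, e.g. "tar -cf batch_0.tar -T batch_0.lst".
 for i, group in enumerate(batches("/scratch/user/project")):
     with open("batch_%d.lst" % i, "w") as f:
         f.write("\n".join(group) + "\n")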

The simplest way to understand why is by watching this [https://www.youtube.com/watch?v=_1x99bOX7Yo domino chain reaction video]:

  • The HPSS system has many moving parts (compute nodes, databases, network switches, cables, disks, tape drives, robot arms in the tape library, etc.). A number of minor hiccups can happen in this pipeline as data is transferred from GPFS to HPSS, and later back from tape to GPFS.
  • If one of these hiccups hits a large tarball, the whole transfer is compromised and you have to start again from square one. The same type of hiccup may happen to ONE small tarball as well; however, the probability of it affecting ONE very large tarball is much higher, and so is the time wasted in restarting the process and running it through trouble-free (a back-of-the-envelope calculation follows this list).
  • htar, for instance, has no built-in retry feature: it is not resilient to external problems, and it will not pick up from where a failed transfer left off (see the retry sketch after this list).
  • Besides, our LTO5 tapes hold only 1.5TB each. It is easier to fit several 200GB files onto those tapes without wasting space at the end than, say, an 800GB file (the tape arithmetic after this list makes that concrete). Although it is possible, we prefer not to split a single file over multiple tapes, and by design we do not stripe files over multiple tapes either.
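
To put rough numbers on the second point: suppose every gigabyte transferred has a small independent chance of hitting a hiccup. The chance that a whole tarball gets through then shrinks exponentially with its size, and every failure forces a full restart. A back-of-the-envelope Python sketch (the per-GB failure rate is an invented number, purely for illustration):

 p = 1e-4  # invented chance of a hiccup per GB transferred (illustrative only)
 
 for size_gb in (200, 500, 800):
     ok = (1 - p) ** size_gb  # chance the whole transfer succeeds
     # If every failed attempt costs a full transfer, the expected total
     # data moved before success is size / P(success).
     moved = size_gb / ok
     print("%4d GB tarball: P(failure) = %.1f%%, expected GB moved = %.0f"
           % (size_gb, 100 * (1 - ok), moved))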
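
Because of the third point, any retrying has to be bolted on from the outside, and each retry re-sends everything. A minimal sketch of such an external wrapper, assuming only htar's standard -cvf creation syntax; the archive name and path in the usage comment are hypothetical:

 import subprocess
 
 def htar_with_retries(archive, paths, attempts=3):
     # htar cannot resume, so every retry restarts the transfer from
     # scratch and re-sends ALL the data -- the bigger the tarball,
     # the more expensive each failure becomes.
     for attempt in range(1, attempts + 1):
         result = subprocess.run(["htar", "-cvf", archive] + list(paths))
         if result.returncode == 0:
             return True
         print("attempt %d failed; starting over from square one" % attempt)
     return False
 
 # Hypothetical usage:
 # htar_with_retries("project1.tar", ["/scratch/user/project1"])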
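
Finally, the tape arithmetic behind the last point, using the nominal 1.5TB LTO5 capacity:

 TAPE = 1500  # nominal LTO5 capacity, in GB
 
 for file_gb in (200, 500, 800):
     per_tape = TAPE // file_gb           # whole files that fit on one tape
     wasted = TAPE - per_tape * file_gb   # unusable space at the end
     print("%3d GB tarballs: %d per tape, %3d GB wasted per tape"
           % (file_gb, per_tape, wasted))

Seven 200GB tarballs fill a tape with only 100GB to spare, while a single 800GB tarball leaves 700GB of the tape unusable unless the file is split.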