Thursday, July 11, 2019

filemanager - A 100+ GB tar.gz file does not properly extract




I downloaded a 120 GB dataset as a tar.gz file using Download Accelerator Plus, and used the following command to extract it:



tar -xvzf train_val2018.tar.gz


The dataset, when extracted, should contain a little more than 8,142 folders and exactly 461,939 image files.



When I open up the extracted folder, it does contain parts of the dataset. However, a huge portion of it is missing - it only contains 3,542 folders and 179,689 files.
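Counts like these can be reproduced from the command line rather than read off a file manager. The following is a minimal sketch, not part of the original post: `count_tree` is a hypothetical helper name, and it assumes a `find` that supports `-mindepth` (GNU and BSD versions do).

```shell
#!/bin/sh
# count_tree DIR: print how many folders and regular files live under DIR.
# (count_tree is a hypothetical helper, introduced here for illustration.)
count_tree() {
    dir=$1
    # -mindepth 1 keeps DIR itself out of the folder count.
    folders=$(find "$dir" -mindepth 1 -type d | wc -l | tr -d ' ')
    files=$(find "$dir" -type f | wc -l | tr -d ' ')
    echo "folders: $folders, files: $files"
}

# Usage (the directory name is an assumption):
#   count_tree ./train_val2018
```

File names containing newline characters would be counted more than once; for an image dataset that is unlikely to matter.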



Using the file explorer or the built-in archive GUI on Ubuntu to extract it does even worse. I've also tried various applications on Windows, such as WinZip, WinRAR, and 7-Zip. None of them worked; they all ran into memory issues with such a large dataset.




Could the file be corrupted? The file has the same size as the one listed on the dataset website, and I've downloaded several larger files with DAP without ever encountering a corruption issue. For this reason, I'd like to know whether there's some limitation in the built-in extractor or some other issue.



The dataset I'm referring to is the iNaturalist 2018 Contest Dataset.



Yes, of course it could be.
And if you read the dataset page, it includes a verification step:



Running md5sum train_val2018.tar.gz should produce b1c6952ce38f31868cc50ea72d066cc3



If you do not want to compare the md5sum manually, you can create a file named md5sum-db (the name can be whatever you like) containing the following:




b1c6952ce38f31868cc50ea72d066cc3  train_val2018.tar.gz




Change to the folder that contains the compressed archive and run:



md5sum -c md5sum-db



If the hash matches, you will see this output:



train_val2018.tar.gz: OK


Otherwise you will see:



train_val2018.tar.gz: FAILED

md5sum: WARNING: 1 computed checksum did NOT match
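The same comparison can be scripted without a checksum file. This is a minimal sketch under stated assumptions: `verify_md5` is a hypothetical helper name, and the only external tools used are `md5sum` and `cut`.

```shell
#!/bin/sh
# verify_md5 FILE EXPECTED: compare FILE's MD5 digest with EXPECTED.
# (verify_md5 is a hypothetical helper, introduced here for illustration.)
verify_md5() {
    file=$1
    expected=$2
    # md5sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(md5sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "$file: OK"
    else
        echo "$file: FAILED (got '$actual')"
        return 1
    fi
}

# Usage, with the digest quoted above:
#   verify_md5 train_val2018.tar.gz b1c6952ce38f31868cc50ea72d066cc3
```

On a 120 GB file the digest takes a while to compute, since the whole file has to be read once.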


Another way to check the gz file is to test it:



gunzip -t file.tar.gz


NOTE: this method only tests the integrity of the gzip compression; it does not guarantee the integrity of the data contained in the archive.
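To go one step further than the gzip test, tar can list the archive's members without extracting anything, which forces it to read every header to the end. The following is a rough sketch, not something from the dataset page: `check_tar_gz` is a hypothetical helper name, and it assumes that directory entries are listed with a trailing slash, as GNU tar does.

```shell
#!/bin/sh
# check_tar_gz ARCHIVE: list ARCHIVE without extracting and report counts.
# (check_tar_gz is a hypothetical helper, introduced here for illustration.)
check_tar_gz() {
    archive=$1
    list=$(mktemp) || return 1
    # -t lists members only; a truncated or damaged archive makes tar
    # exit with a nonzero status before the listing is complete.
    if tar -tzf "$archive" > "$list"; then
        entries=$(wc -l < "$list" | tr -d ' ')
        # Directory members end in "/" in the listing.
        dirs=$(grep -c '/$' "$list" || true)
        echo "entries: $entries, directories: $dirs"
        status=0
    else
        echo "$archive: tar could not read the archive to the end" >&2
        status=1
    fi
    rm -f "$list"
    return $status
}

# Usage:
#   check_tar_gz train_val2018.tar.gz
```

If the listing succeeds, the counts it prints can be compared with the folder and file numbers the dataset is supposed to contain.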

