Transferring massive files—such as a 20GB database export—between servers can be a daunting task. Network interruptions, disk space constraints, and compatibility issues often get in the way. While there are many specialized tools available, most DevOps workflows can be handled efficiently using nothing but the built-in utilities found on almost every Linux and macOS system.
In this tutorial, we will walk through a robust workflow to compress, split, download, verify, and reassemble large files safely.
The Scenario
Imagine you have a large database dump named big_dump.sql (e.g., 20GB). Your goal is to move it from a remote server to your local machine (or another server) while ensuring:
- Reduced Transfer Size: Through compression.
- Reliability: By splitting the file into manageable chunks.
- Integrity: Verifying that no data was corrupted during transit.
Step 1: Compress the File
The first step is to reduce the file size. gzip is the most universal tool for this.
gzip big_dump.sql
This will replace big_dump.sql with a compressed version: big_dump.sql.gz.
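If you would rather keep the original dump until the transfer has been verified, a minimal sketch like the one below writes the compressed copy next to it and then tests the archive. The helper name compress_keep is hypothetical; gzip -c and gzip -t are standard gzip options. Keeping both files needs extra disk space.

```shell
# Compress a file without deleting the original, then test the archive.
# compress_keep is a hypothetical helper name used for illustration.
compress_keep() {
    src=$1
    # -c writes the compressed stream to stdout, leaving "$src" untouched.
    gzip -c "$src" > "$src.gz" || return 1
    # -t checks the integrity of the compressed file without extracting it.
    gzip -t "$src.gz"
}

# Example: compress_keep big_dump.sql
```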
Step 2: Generate a Checksum
Before splitting or moving the file, generate a checksum "fingerprint" of it. This lets you verify the file's integrity at the destination.
md5sum big_dump.sql.gz > big_dump.sql.gz.md5
Step 3: Split into Manageable Parts
Large files are prone to transfer failures. By splitting the file into 100MB chunks, you make the download more resilient. If one chunk fails, you only need to re-download that specific part.
mkdir split_parts
# Requires GNU split (coreutils); BSD/macOS and BusyBox split may not support --numeric-suffixes
split -b 100M --numeric-suffixes=1 --suffix-length=3 big_dump.sql.gz split_parts/part_
This creates files like part_001, part_002, etc., in the split_parts directory.
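A single whole-file checksum cannot tell you which chunk is bad, so re-downloading only the failed part requires one checksum per chunk. The sketch below is one way to do that, assuming md5sum from GNU coreutils is available on both machines. The helper names and the manifest name parts.md5 are hypothetical.

```shell
# Record one checksum per chunk so a damaged chunk can be identified later.
# write_part_sums and check_part_sums are hypothetical helper names.
write_part_sums() {
    # Run inside the parts directory so the manifest stores bare file names.
    (cd "$1" && md5sum part_* > parts.md5)
}

check_part_sums() {
    # Prints "part_NNN: OK" or "part_NNN: FAILED" for every chunk and
    # returns non-zero if any chunk does not match its recorded checksum.
    (cd "$1" && md5sum -c parts.md5)
}

# On the server:      write_part_sums split_parts
# After downloading:  check_part_sums split_parts
```

Because the manifest lives inside split_parts, it travels with the chunks when you copy the directory.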
Step 4: Download the Parts
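You can now download the entire directory using scp, rsync, or even a simple HTTP server. Fetch the checksum file from Step 2 as well, since Step 6 needs it next to the reassembled file.
scp -r user@server:/path/to/split_parts .
scp user@server:/path/to/big_dump.sql.gz.md5 split_parts/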
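On an unreliable link, the download command may need several attempts. The following is a minimal sketch of a retry wrapper. The helper name retry, the RETRY_DELAY variable, and the 5-second default pause are illustrative assumptions, not part of any standard tool.

```shell
# Retry a command until it succeeds or the attempt limit is reached.
# retry is a hypothetical helper; the 5-second pause is an arbitrary default.
retry() {
    max_attempts=$1
    shift
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-5}"
    done
}

# Example (same placeholder host and path as the scp command):
# retry 10 rsync -avP user@server:/path/to/split_parts/ split_parts/
```

Pairing the wrapper with rsync -avP works well because each new attempt skips chunks that already arrived and keeps any partially transferred file.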
Step 5: Reassemble the File
Once all parts are on your local machine, use cat to stitch them back together in order.
cd split_parts
cat part_* > big_dump.sql.gz
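cat happily concatenates whatever chunks are present, so a missing chunk only shows up later as a checksum mismatch. A small pre-check, sketched below, can catch gaps first. The helper name missing_parts is hypothetical. It assumes the three-digit suffixes starting at 001 from Step 3 and that you counted the chunks on the server beforehand; the figure of 200 in the example is only illustrative.

```shell
# List any chunk missing from the sequence part_001 .. part_NNN.
# missing_parts is a hypothetical helper; it assumes the three-digit,
# 1-based suffixes produced in Step 3.
missing_parts() {
    dir=$1
    total=$2
    i=1
    status=0
    while [ "$i" -le "$total" ]; do
        name=$(printf 'part_%03d' "$i")
        if [ ! -f "$dir/$name" ]; then
            echo "$name"
            status=1
        fi
        i=$((i + 1))
    done
    return "$status"
}

# Example: with 200 expected chunks in the current directory
# missing_parts . 200 && cat part_* > big_dump.sql.gz
```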
Step 6: Verify Integrity
Compare the MD5 checksum of the reassembled file with the original checksum you generated in Step 2.
# On macOS
md5 big_dump.sql.gz
# On Linux
md5sum big_dump.sql.gz
# Compare with original
cat big_dump.sql.gz.md5
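If the two hashes match, the file arrived intact. If they differ, at least one chunk was corrupted or is missing, and you should download the parts again before continuing.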
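Comparing two 32-character hashes by eye is error-prone. The sketch below automates the comparison and works with either md5sum (Linux) or md5 -q (macOS). The helper names file_md5 and verify_md5 are hypothetical, and the code assumes the .md5 file was written by md5sum as in Step 2.

```shell
# Print only the MD5 hash of a file, using whichever tool is installed.
# file_md5 and verify_md5 are hypothetical helper names.
file_md5() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum "$1" | awk '{print $1}'   # Linux (GNU coreutils)
    else
        md5 -q "$1"                      # macOS / BSD
    fi
}

verify_md5() {
    file=$1
    sum_file=$2
    # The .md5 file written by md5sum starts with the hash, then the name.
    expected=$(awk '{print $1; exit}' "$sum_file")
    actual=$(file_md5 "$file")
    if [ "$expected" = "$actual" ]; then
        echo "OK: checksums match"
    else
        echo "MISMATCH: expected $expected, got $actual" >&2
        return 1
    fi
}

# Example: verify_md5 big_dump.sql.gz big_dump.sql.gz.md5
```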
Step 7: Final Extraction and Restore
Now you can uncompress the file and proceed with your work.
gunzip big_dump.sql.gz
If it's a database dump, you can restore it:
mysql -u root -p dbname < big_dump.sql
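When disk space is tight, you can skip writing the uncompressed dump and stream it straight into the restore command instead. The sketch below shows the idea. The helper name stream_gunzip is hypothetical, and in plain sh the pipeline reports only the exit status of the last command, so a gunzip failure may go unnoticed.

```shell
# Decompress to stdout and feed the stream to another command, so the
# uncompressed dump never has to be written to disk.
# stream_gunzip is a hypothetical helper name.
stream_gunzip() {
    archive=$1
    shift
    gunzip -c "$archive" | "$@"
}

# Example (same user and database name as the mysql command):
# stream_gunzip big_dump.sql.gz mysql -u root -p dbname
```

Unlike plain gunzip, gunzip -c leaves the .gz file in place, so you can retry the restore without downloading again.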
Pro Tips for Large Transfers
| Task | Command | Why? |
|---|---|---|
| Resume Downloads | rsync -avP | Automatically resumes interrupted transfers. |
| View Progress | pv file | Shows a progress bar while processing files. |
| Cleanup | rm part_* | Always clean up chunks after successful reassembly. |
| Check Space | df -h | Ensure you have enough disk space before starting. |
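The disk space check can also be scripted so a transfer aborts early instead of failing halfway. The following is a rough sketch built on df -Pk. The helper name has_space is hypothetical, and the 6 GiB archive size in the example is an assumed figure for illustration only.

```shell
# Succeed only if the filesystem holding "$1" has at least "$2" KiB free.
# has_space is a hypothetical helper name.
has_space() {
    dir=$1
    needed_kb=$2
    # -P forces the portable one-line-per-filesystem format, -k reports
    # sizes in 1024-byte blocks; column 4 is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 {print $4}')
    [ "$avail_kb" -ge "$needed_kb" ]
}

# Example: the chunks and the reassembled archive coexist during Step 5,
# so budget roughly twice the archive size (here a hypothetical 6 GiB
# archive, i.e. 12 GiB = 12582912 KiB).
# has_space . 12582912 || echo "not enough free space" >&2
```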
Why This Method Works
Professional DBAs and DevOps engineers prefer this method because it is:
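- Universal: Works on virtually any Unix-like system (Ubuntu, CentOS, Alpine, macOS, WSL), though some flags and tool names differ between them, such as md5 versus md5sum.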
- Tool-Agnostic: No need to install third-party software.
- Corruption-Safe: Checksums reveal accidental corruption before you rely on the data.
- Cloud-Friendly: Ideal for moving data between cloud providers.
Summary
Moving large files doesn't have to be stressful. By leveraging gzip, split, and md5sum, you create a transfer process that is both resilient to failure and easy to verify. Because it relies only on tools that ship with most Linux and macOS systems, it is a dependable default for moving large data sets between machines.