How To Archive and Compress Files on Linux

As a system administrator, you may have downloaded some archives that you need to extract in order to reveal their files.

You may also be backing up an entire database made of many small files that you want to aggregate into a single archive.

Archiving and compressing files are common operations in the Unix world, done by system administrators on a very regular basis.

Luckily for you, Linux provides a set of commands to archive, compress, decompress and extract files.

In this tutorial, you will learn more about the tar command as well as the different compression methods that can be used in order to save space on Linux.

Ready?

Archive files on Linux using tar

Tar is a very popular command among system administrators.

The name tar stands for “tape archive”, and the archives it produces are often referred to as tarballs. The command was historically used to write data to sequential devices, such as tape drives, that did not have file systems.

The tar command was introduced in 1979 in order to replace the “tp” program that was used back then.

Nowadays, the tar command is widely used to archive files, meaning to put several files together in a single archive.

To archive files on Linux using tar, run “tar” with the “cvf” options.

$ tar -cvf archive.tar file1 file2 directory1 directory2

file1
file2
directory1/
directory2/

In this case, we used three different options :

-c : for create, this option creates a new archive from the selected files;
-v : for verbose, this is why the command displays the files as they are added to the archive;
-f : for file, this option specifies the filename of the archive to create (in this case archive.tar).

Those options are probably the most important options for archiving files on Linux.
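Once an archive exists, it can be useful to check what it contains without extracting anything. The following is a minimal sketch, assuming GNU tar (bsdtar should behave the same way). The “-t” (list) and “-C” (change directory) options are not covered elsewhere in this tutorial, and the throwaway directory and its file names are made up for illustration.

```shell
# Work in a throwaway directory so nothing in your own files is touched.
demo=$(mktemp -d)
touch "$demo/file1" "$demo/file2"
mkdir "$demo/directory1"

# -C switches to the demo directory before the members are added.
tar -C "$demo" -cf "$demo/archive.tar" file1 file2 directory1

# -t lists the members of the archive without extracting them.
listing=$(tar -tf "$demo/archive.tar")
echo "$listing"
```

Adding “-v” to the listing command (tar -tvf) should also show permissions, owners and sizes, much like “ls -l”.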

After running the tar command with the “-f” option, a new archive is created in your current working directory.

$ ls -l
total 20
-rw-rw-r-- 1 schkn schkn 10240 Nov  9 10:41 archive.tar
drwxrwxr-x 2 schkn schkn  4096 Nov  9 10:41 directory1
drwxrwxr-x 2 schkn schkn  4096 Nov  9 10:41 directory2
-rw-rw-r-- 1 schkn schkn     0 Nov  9 10:41 file1
-rw-rw-r-- 1 schkn schkn     0 Nov  9 10:41 file2

As you can see, the size of the archive is bigger than the sum of the files in it.

Why?

Creating a tar archive does not simply put files and directories in a big box. Tar writes a 512-byte header in front of every file and directory, pads the content of each file to a multiple of 512 bytes, and pads the whole archive to a full record (10240 bytes by default with GNU tar).

As a consequence, an archive made of small files is much bigger than the sum of the files in it.

This is an important point because it shows that archiving files does not compress them.

In order to compress files when archiving, you need to provide other options to the tar command.

File compression is explained in the following sections.
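To see the overhead of the archive format for yourself, you can archive two empty files and measure the result. This is a minimal sketch, assuming GNU tar with its default settings; the throwaway directory is an assumption made for the example.

```shell
demo=$(mktemp -d)
: > "$demo/file1"    # two empty files, 0 bytes each
: > "$demo/file2"

tar -C "$demo" -cf "$demo/archive.tar" file1 file2

# The archive size is always a whole number of 512-byte blocks.
size=$(wc -c < "$demo/archive.tar")
echo "archive size: $size bytes"
echo "512-byte blocks: $((size / 512))"
```

With GNU tar and its default record size, the reported size should match the 10240 bytes shown in the “ls -l” output above, even though the files themselves are empty.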

Extract files using tar on Linux

Now that you have created an archive file, you may want to extract the files located in your archive.

To extract files using the tar command, use the “-x” option instead of the “-c” option.

$ tar -xvf archive.tar

file1
file2
directory1/
directory2/

Note that extracting your files does not mean that the archive will be deleted from your current working directory.

$ ls -l

total 28
-rw-rw-r-- 1 schkn schkn 10240 Nov  9 12:01 archive.tar
drwxrwxr-x 2 schkn schkn  4096 Nov  9 10:41 directory1
drwxrwxr-x 2 schkn schkn  4096 Nov  9 10:41 directory2
-rw-rw-r-- 1 schkn schkn     0 Nov  9 12:00 file1
-rw-rw-r-- 1 schkn schkn     0 Nov  9 10:41 file2
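By default, tar extracts into the current working directory. If you would rather keep the extracted files apart from your working copies, one option is to extract into a separate directory. The sketch below assumes the “-C” option of GNU tar or bsdtar; the “restore” directory and the file contents are made up for illustration.

```shell
demo=$(mktemp -d)
echo "archived version" > "$demo/file1"
tar -C "$demo" -cf "$demo/archive.tar" file1

# Change the working copy after the archive was created.
echo "local changes" > "$demo/file1"

# Extract into a separate directory instead of the working directory.
mkdir "$demo/restore"
tar -C "$demo/restore" -xf "$demo/archive.tar"

cat "$demo/file1"            # working copy, left untouched
cat "$demo/restore/file1"    # copy restored from the archive
```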

When extracting files on Linux, there is a little gotcha that you need to be aware of.

If a file in the current working directory has the same name as a file inside the archive, the content of the file in the working directory will be replaced with the one from the archive.

To illustrate this, add some content to one of your files, extract the archive and inspect the content of your file again.

$ echo "Added some content to the file" > file1

$ tar -xvf archive.tar

$ cat file1
<empty>

Comparing local files with archive files

To help you avoid overwriting data during extraction, the tar command can compare files located in your current working directory with files in your archive.

Back to the example we discussed earlier, let’s add some content back to the “file1” file.

$ echo "Added some content to the file" > file1

In order to compare files with tar, use the “-d” option.

$ tar -dvf archive.tar

file1
file1: Mod time differs
file1: Size differs
file2
directory1/
directory2/

As you can see, tar compares the modification time of each local file with the one stored in the archive.

If the two timestamps differ, the tar command displays a notice showing that the modification time differs.

Similarly, tar compares file sizes and highlights size differences between your files.
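The comparison can also be used in a script to decide whether an extraction is safe. The following is a sketch that assumes GNU tar, which returns a non-zero exit status when “-d” finds differences; the directory, the file contents and the “verdict” variable are made up for the example.

```shell
demo=$(mktemp -d)
echo "original" > "$demo/file1"
tar -C "$demo" -cf "$demo/archive.tar" file1

# Modify the local copy so that it no longer matches the archive.
echo "changed locally after archiving" > "$demo/file1"

# GNU tar exits with a non-zero status when -d reports differences.
if tar -C "$demo" -df "$demo/archive.tar" > "$demo/diff.log" 2>&1; then
    verdict="identical"
else
    verdict="differs"
fi

echo "file1: $verdict"
cat "$demo/diff.log"
```

Note that other tar implementations may not support “-d” at all, in which case the command also fails and the check errs on the side of caution.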

In order to avoid erasing your files, you can use the star command which is a great alternative to the existing tar command.

Prevent file overwriting using star

By default, the star utility might not be installed on your system.

In order to install the star utility on a distribution that uses YUM, run the following command.

$ sudo yum install star

Then, in order to archive files with star, simply run “star” with the “-c” option.

$ star -c -f=archive.tar file1 file2

Then, you can use the gzip utility in order to compress your new archive.

$ gzip archive.tar

As a consequence, the initial tar file will be transformed into a tar.gz archive.

Now if you were to modify a file that also exists in the archive, the star utility would not overwrite the newer local file by default.

$ echo "This is some content" > file1

$ gzip -d archive.tar.gz

$ star -x -f=archive.tar
star: current 'file1' newer.
star: current 'file2' newer.
star: 1 blocks + 0 bytes (total of 10240 bytes = 10.00k).

$ cat file1
This is some content

Quite handy when you are afraid of losing your content!
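If star is not available on your system, tar itself can be told to leave existing files alone. The sketch below assumes the “-k” (--keep-old-files) option of GNU tar. Recent GNU tar versions report each skipped file as an error, which is why the exit status is ignored here; the directory and the “skipped.log” file are made up for the example.

```shell
demo=$(mktemp -d)
echo "archived version" > "$demo/file1"
tar -C "$demo" -cf "$demo/archive.tar" file1

echo "This is some content" > "$demo/file1"

# -k keeps files that already exist instead of overwriting them.
# Messages about skipped files go to skipped.log, and the status is ignored.
tar -C "$demo" -xkf "$demo/archive.tar" 2> "$demo/skipped.log" || true

cat "$demo/file1"
```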

Compressing files using gzip on Linux

Now that you have your tar archive ready, the next step is to compress it in order to reduce its size.

For that, we are first going to use the gzip utility.

By default, the gzip utility should be installed. If this is not the case, install it with the package manager of your distribution: apt-get on Debian and Ubuntu, yum on Red Hat and CentOS.

$ sudo apt-get install gzip

$ sudo yum install gzip

Now that gzip is installed, run “gzip” and pass the archive you just created as an argument.

$ gzip archive.tar

Running the gzip command will create a tar.gz file in the current working directory.

Most importantly, the initial tar file is replaced by the tar.gz file, so you won’t have the initial archive anymore.

$ ls -l
total 12
-rw-rw-r-- 1 schkn schkn  184 Nov  9 10:41 archive.tar.gz
drwxrwxr-x 2 schkn schkn 4096 Nov  9 10:41 directory1
drwxrwxr-x 2 schkn schkn 4096 Nov  9 10:41 directory2
-rw-rw-r-- 1 schkn schkn    0 Nov  9 10:41 file1
-rw-rw-r-- 1 schkn schkn    0 Nov  9 10:41 file2

As you can see, the file size was drastically reduced from 10 KB (10240 bytes) to 184 bytes: gzip reduced the file size by over 98%.
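If you want to keep the uncompressed archive next to the compressed one, gzip can write to standard output instead of replacing the file. This is a minimal sketch using the “-c” option of gzip; the directory and the variable names are made up for the example.

```shell
demo=$(mktemp -d)
touch "$demo/file1" "$demo/file2"
tar -C "$demo" -cf "$demo/archive.tar" file1 file2

# -c writes the compressed data to standard output,
# so archive.tar stays in place.
gzip -c "$demo/archive.tar" > "$demo/archive.tar.gz"

before=$(wc -c < "$demo/archive.tar")
after=$(wc -c < "$demo/archive.tar.gz")
echo "before: $before bytes, after: $after bytes"
echo "saved: $(( (before - after) * 100 / before ))%"
```

Recent versions of gzip also accept a “-k” option that keeps the input file, but “-c” works on older versions as well.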

However, if you don’t want to use the gzip utility, you can also compress files using the tar command with options.

Do you think it can improve the compression rate?

Compressing files on Linux using tar

As mentioned in the first section, the tar command can be used in order to archive and compress files in one line.

In order to compress files with tar, simply add the “-z” option to your current set of options.

$ tar -cvzf archive1.tar.gz file1 file2 directory1 directory2

Similarly to the first tar command that you have run, a new compressed archive file will be created in your current working directory.

To inspect files created, simply run the “ls” command again.

$ ls -l
total 28
-rw-rw-r-- 1 schkn schkn   184 Nov  9 10:41 archive.tar.gz
-rw-rw-r-- 1 schkn schkn   172 Nov  9 11:10 archive1.tar.gz
drwxrwxr-x 2 schkn schkn  4096 Nov  9 10:41 directory1
drwxrwxr-x 2 schkn schkn  4096 Nov  9 10:41 directory2
-rw-rw-r-- 1 schkn schkn     0 Nov  9 10:41 file1
-rw-rw-r-- 1 schkn schkn     0 Nov  9 10:41 file2

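Now as you can see, the compressed archive created with tar is 12 bytes lighter than the one created with gzip. Both files are compressed with the same gzip algorithm, so this small gap comes from metadata stored in the compressed file rather than from a better compression rate.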
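One likely source of such a small size gap is the original file name, which gzip records in the compressed file when it is given a file as an argument. The sketch below assumes GNU gzip and its “-n” option, which leaves the name and timestamp out; the directory and the output file names are made up for the example.

```shell
demo=$(mktemp -d)
touch "$demo/file1" "$demo/file2"
tar -C "$demo" -cf "$demo/archive.tar" file1 file2

# Default: gzip records the original file name in the compressed file.
gzip -c "$demo/archive.tar" > "$demo/with_name.gz"

# -n: leave the original name and timestamp out of the compressed file.
gzip -n -c "$demo/archive.tar" > "$demo/without_name.gz"

with_name=$(wc -c < "$demo/with_name.gz")
without_name=$(wc -c < "$demo/without_name.gz")
echo "with name: $with_name bytes, without: $without_name bytes"
echo "difference: $((with_name - without_name)) bytes"
```

With GNU gzip, the difference should correspond to the length of the stored name (“archive.tar”) plus one terminating byte.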

Compressing files using bzip2

Most of the time, the gzip command is used in order to compress files or archives.

However, gzip is not the only compression method available on Linux : you can also use bzip2.

The main difference between gzip and bzip2 is that gzip uses the DEFLATE algorithm (LZ77 combined with Huffman coding) while bzip2 is based on the Burrows-Wheeler transform.

Bzip2 is known to be slower than gzip, but it usually produces smaller files, so it can be handy to know how to compress using bzip2.

To compress files using bzip2, simply run “bzip2” with the filename that you want to compress.

$ bzip2 archive.tar

In order to decompress files compressed using bzip2, simply add the “-d” option to your command.

$ bzip2 -d archive.tar.bz2

Alternatively, you can create bz2 archives using the tar command and by specifying the “-j” option.

$ tar -cjf archive.tar.bz2 file1 file2

Using tar, you have the option to compress using a wide range of compression methods :

-j : uses the bzip2 compression utility;
-J : uses the xz compression utility;
--lzip : uses the lzip compression utility;
--lzma : uses the lzma compression utility;
--lzop : uses the lzop compression utility;
-z : uses the gzip compression utility.
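To get a feel for how these methods compare on your own data, you can create the same archive with several options and look at the sizes. The following is a sketch that assumes GNU tar; the sample file, made of 20000 lines of numbers, is a made-up stand-in for real data, and the results will vary with the content you compress.

```shell
demo=$(mktemp -d)
# Sample data: 20000 lines of numbers, a stand-in for real files.
seq 1 20000 > "$demo/numbers.txt"

tar -C "$demo" -czf "$demo/sample.tar.gz" numbers.txt

# With GNU tar, bzip2 and xz are separate programs; skip them if missing.
if command -v bzip2 > /dev/null; then
    tar -C "$demo" -cjf "$demo/sample.tar.bz2" numbers.txt
fi
if command -v xz > /dev/null; then
    tar -C "$demo" -cJf "$demo/sample.tar.xz" numbers.txt
fi

# Print the size of each archive that was created.
for archive in "$demo"/sample.tar.*; do
    echo "$(basename "$archive"): $(wc -c < "$archive") bytes"
done
```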

Conclusion

In this tutorial, you learnt how you can archive and compress files using the tar utility on Linux.

You also learnt about the different compression methods available and how they can be used in order to reduce the size of your files and directories.

If you are curious about Linux system administration, we have a complete section dedicated to it on the website, so make sure to have a look.