Writing Scripts on Linux using Bash

This tutorial discusses how you can easily write your own Bash scripts on Linux.

As a system administrator, it is quite likely that you are performing repetitive tasks that could be automated.

Luckily for you, there is a programming language that can be used on Linux in order to write scripts : the Bash programming language.

Using Bash, you can schedule entire system backups by specifying as many commands as you want in Bash scripts.

You can also have custom scripts that create and delete users and remove their associated files.

With Bash, you can also have scripts running and dumping useful performance metrics to a file or database.

Bash is a very powerful programming language for system administrators.

In today’s tutorial, we are going to learn all the basics there are to know about Bash : how to create and run scripts, and how to use variables and shell built-ins effectively.

What You Will Learn

If you read this tutorial until the end, you are going to learn about the following concepts regarding Bash:

  • How to create and run Bash scripts using the command line;
  • What shebang is and how it is used by Linux for scripting;
  • What shell built-ins are and how they differ from regular system programs;
  • How to use Bash variables and what special variables are;
  • How to use Bash command substitution;
  • How to use simple IF statements on Bash;

As you can see, this is quite a long program, so without further ado, let’s start by seeing how you can create and run Bash scripts.

Getting Started with Bash

Before issuing any commands, let’s have a quick word on Bash and Shell common histories.

History of Bash

The first version of the Bash shell was released in 1989 by Brian Fox as an open-source implementation of the Unix shell.

Back then, as Unix systems started to spread, such systems were shipping with the standard Unix shell, named the Bourne shell.


In the early days of Unix, systems developed by organizations such as MIT or Bell Labs were not free and they were not open-source.

Even if documentation was provided for those tools, it became a priority for the GNU initiative (led by Richard Stallman) to have its own version of the Unix Shell.

Six years after announcing the GNU project, the Bash (Bourne-Again Shell) shell was born with even more features than the original Bourne shell.

Bash programming language

When working with a Unix-like system such as Linux, Bash usually has two meanings :

  • Bash is a command-line interpreter or in other words a Unix shell. It means that whenever you are opening a terminal, you will be facing a Unix shell that is most of the time a Bash shell.

When typing commands directly in the terminal, commands are interpreted by the shell, executed using system calls and return values are given back to the end-user.

If you are not sure which interpreter you are currently using, the SHELL environment variable indicates the default shell for your user.

$ printenv SHELL

/bin/bash

As you can see, in this case, we are indeed using the Bash command interpreter.

It is important to note that even if terms like “Bash scripting” and “Shell scripting” are used interchangeably, they might not actually describe the same thing depending on your distribution.

Some recent distributions (such as Debian 10) have symbolic links from the original Bourne shell (named sh) to their own shell implementation (in this case Dash, the Debian Almquist shell).


  • Bash also describes a command-line language, often referred to as the Bash language. Bash exposes a set of operators and operands that can be used for basic features such as piping or executing multiple commands at once.

When doing some basic piping, you are used to working with the “|” symbol. This symbol is part of the Bash command-line language.

The same logic goes for the “&&” symbol that executes the second command if, and only if, the first command succeeded.

$ command1 && command2
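
As a quick illustration, here is an example combining both operators, using the “/etc/passwd” file (present on virtually every Linux host): the message on the right is only printed if the “grep” command found a match.

$ cat /etc/passwd | grep "root" && echo "The root user is declared on this host"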

Create and Run Bash Scripts

Now that you have some background about the Bash shell and the Bash command-line language, let’s start by creating and running simple Bash scripts.

In order to create your first Bash script, simply create a file named “script.sh”.

As you probably already noticed, we are still using the “sh” extension referring to the original Bourne shell (also denoted as sh).

$ touch script.sh

Now creating a file ending with the “sh” extension is not enough for your script to be considered as a Shell script.

You can actually see that your file is not yet considered as a shell script by running the file command.

$ file script.sh


As you can see here, your file is only described as a simple empty file.

In order for your file to be described as a shell script file, you need to specify the shebang line at the top of your file.

Specifying shell using shebang

If you have been using Linux for quite some time, it is very likely that you have already encountered the shebang line at the beginning of your file.

Shebang, short for “hash” + “bang”, is a one-line directive set at the beginning of shell scripts in order to specify which shell should be used to interpret the script.

#!/bin/<shell>

In our case, we want to work with Bash scripts. In other words, we want our scripts to be interpreted by a Bash interpreter.

In order to determine the path to the interpreter, you can use the “which” command.

$ which bash
/bin/bash

Now that you know the path to your interpreter, edit your script file and add the shebang line at the top of your file.

#!/bin/bash

Now that you have added this line at the beginning of your file, re-execute the “file” command in order to see the difference.


As you can see, the output is slightly different : this time, your script is seen as a “Bourne-Again shell script” and more importantly as an executable.

So what would happen if you didn’t specify the shebang line at the top of the script?

When the shebang line is not specified, the script is run by the shell you are currently using when you execute it.

Now that you know how to create Bash scripts, let’s see how you can execute them.

Execute Bash Scripts

In order to execute Bash scripts on Linux, you essentially have two options :

  • By specifying the shell interpreter that you want to use and the script file;
  • By using the path to the script file

Specifying the shell interpreter

The first method is pretty straightforward.

In order to execute your Bash script, you specify the interpreter that you want to use yourself.

$ bash <script>

$ /bin/bash <script>

Using the example that we used before, this would give us the following output.


As you can see, this method does not even require execute permissions on the file; you just need to be able to use the bash executable.


As you can see, when logged in as another user, without execute permissions, I am still able to execute this script.

This is an important remark because you might want to store your script files in protected directories (that only you can access) in order to prevent other users from executing your files.

Specifying the path to the script

The other way to execute bash scripts is to specify the path to the file.

In order to use this method, the file needs to have execute permissions.

First, use the “chmod” command in order to set execute permissions for the current user.

$ chmod u+x <script>


As you can see, the file color is quite different : your current terminal highlights executable files using specific colors, in this case the green color.

Now that your script is executable, you can execute it by specifying the relative or absolute path to the script.

Using a file named “script.sh” located in my current working directory, the script can be executed by running

$ ./script.sh

If you are in another directory, you will have to specify the absolute path to the script file.

$ /home/user/script.sh


As you probably realized by now, this method is not very convenient if you have to specify the path to the script every single time.

Luckily for you, there is a way to execute your script by simply typing the filename in the command-line.

Adding the script to PATH

The “.sh” extension is not needed for a script to be considered a script file.

For the sake of simplicity, we are going to rename the existing “script.sh” file to “script”.

To rename files on Linux, simply use the “mv” command and specify the source and destination targets.

$ mv script.sh script

Now, what if you wanted to execute your script by typing “script”?

In order to do that, you have to add the path to your script to your PATH environment variable.

To print the current value of your PATH environment variable, use “printenv” with the “PATH” argument.

$ printenv PATH

To update the PATH in your current working environment, edit the PATH environment variable using the following syntax.

$ export PATH="<path_to_script>:$PATH"
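
For example, if your script was stored in a hypothetical “/home/user/scripts” directory, the command would look like this.

$ export PATH="/home/user/scripts:$PATH"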

Now, the “script” command you just defined is directly available without specifying any path : you can launch it like any other command.


Note : if you want to make your changes permanent, follow those steps to update your PATH variable properly.

Shell built-ins explained

Before declaring any variables in your shell script, it is important for you to know about shell built-ins.

When you are working with the Bash shell, you are most of the time executing “programs”.

Examples of programs are “ls”, “fdisk” or “mkdir”. Help for those commands can be found by using the “man” command short for “manual”.

However, have you ever tried to read the documentation for the “source” command?


You would not be able to read the documentation using “man” because the source command is a shell built-in.

In order to read the documentation for shell built-ins, you have to use the “help” command.

$ help <command>

The list of shell built-ins is quite extensive: it includes commands such as “cd”, “echo”, “export”, “alias” or “source”. You can display the complete list on your system by running the “help” command without any argument.

Using Bash Variables

Now that you know about Bash built-ins, it is time for you to start writing your own Bash scripts.

As a reminder, the commands typed in your terminal can be used in a Bash script in the exact same way.

For example, if you want a script that simply executes the “ls -l” command, simply edit your script, add the shebang line and the command.

#!/bin/bash

# This simple script executes the ls command

ls -l


Now, what if you wanted to have Bash variables?

Bash variables are simple program variables that can store a wide variety of different inputs.

To declare a Bash variable, simply specify the name of the variable and its value separated by an equal sign, with no spaces around it.

VAR=value

In order to use the content of your Bash variable in your script, prefix the name of your variable with “$”.

echo $VAR


Even if you can use this syntax in order to get the variable value, you can also use the curly-braces notation.

echo ${VAR}

Using this syntax, variables can be combined together.

If you have two Bash variables named VAR1 and VAR2 for example, you can have them both printed using the following syntax

echo "${VAR1}${VAR2}"


Executing commands within scripts

In order to execute commands inside Bash scripts, you have to use command substitution.

Command substitution is a technique used in Bash shells in order to store the result of a command in a variable.

To substitute a command in Bash, use the dollar sign and enclose your command in parentheses.

VAR=$(command)

For example, in order to get the number of files in your current directory, you would write

#!/bin/bash

NUMBER=$(ls | wc -l)

echo "${NUMBER} files in this directory!"


As you can see, command substitution is pretty handy because it can be used to dynamically execute commands in a shell script and return the value back to the user.

Speaking about returning results to the end-user, how do you handle scripts not terminated correctly?

What if a command inside the script did not execute properly?

Understanding Exit Statuses

When you are executing a script, even if you are not returning a value, the script always returns what we call “an exit status”.

An exit status in Bash scripting indicates whether the script execution was successful or not.

If the status code is zero, your script execution was successful. However, if the value is any different from zero (say one, two or more), it indicates that the script execution was not successful.

To demonstrate the exit status, run any valid command in your bash shell.

echo "This is a simple working command"

Now, use this command in order to inspect the exit status of the last command run.

echo ${?}


As you can see, the output of this command is “0”, which is the exit status of the last command executed.

This syntax (“${?}”) can be used in scripts in order to make sure that commands executed properly.

The exit status can be used in scripts in order to exit the script with a specific status code.

For example, if you want to exit the script with an error, you can use the following command in your script.

exit 1

Similarly, you can use the “zero” exit code in order to specify that the script executed successfully.

exit 0
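
As a minimal sketch, here is a short script that inspects the exit status of a command and then exits with an explicit status code; the directory name is made up and most likely does not exist on your host.

#!/bin/bash

# This command should fail because the directory does not exist
ls /tmp/this-directory-does-not-exist

# ${?} holds the exit status of the ls command above (non-zero here)
echo "The ls command exited with status ${?}"

# Exit the script itself with a successful status code
exit 0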

In order to verify if the status code was correct, you are going to need basic conditional statements such as the IF statement.

Manipulating conditions in Bash

Sometimes, executing bash scripts is not only about having multiple commands next to each other : you want to have conditional actions.

In some cases, it might be handy to have a condition checking whether the current user is the root user (or just a specific user on your system).

One simple way to have conditions in Bash is to use the if statement.

“if” is a shell built-in; as a consequence, its manual is available via the “help” command.

$ help if


The help page describes the syntax for the if command using semi-colons, but we are going to use the following, equivalent syntax.

if [[ condition ]]
then
  <commands>
else
  <command>
fi

Practice case : checking if the user is root

In order to showcase what the if statement can be used for, we are going to write a simple script checking whether the current user is the root user or not.

As a reminder, the root user always has the UID set to zero on any Unix system.

Knowing this information, we are going to check whether the UID is set to zero: if this is the case, we will execute the rest of the script; otherwise, we will exit the script.

As explained in other tutorials (about user administration), you can get the current user ID by using the “id” command.

$ id -u
1000

We are going to use this command in order to check whether the user executing the script is root or not.

Create a new script and add the shebang line to it.

#!/bin/bash

Right below it, add the “id” command and store the result in a variable named “USERID” using command substitution.

USERID=$(id -u)

Now that the “USERID” variable contains the current user ID, use an IF statement in order to check whether the user ID is zero or not.

If this is the case, print a simple informational message; if not, exit the script with an exit status of 1.

if [[ "${USERID}" -eq 0 ]]
then
  echo "This is root"
else
  exit 1
fi

Now if you execute the script as your current user, the script will simply exit with an exit status of one.


Now, try to execute the script as the root user (with the sudo command)


As you can see, your informational message was displayed and the script exited with a status code of zero.

Conclusion

In this tutorial, you learnt about the Bash programming language and how it can be used in order to create Bash scripts on your system.

You also learnt about exit statuses and conditional statements that are key in order to have custom logic set into your scripts.

Now that you have more knowledge about Bash, you should start writing your own scripts for your needs : you can start with a tutorial on creating archive backup files for example.

If you are interested in Linux System administration, we have a complete section dedicated to it on the website, so make sure to check it out!

How To Check If File or Directory Exists in Bash

When working with Bash and shell scripting, you might need to check whether a directory or a file exists or not on your filesystem.

Based on this condition, you can exit the script or display a warning message for the end user for example.

In order to check whether a file or a directory exists with Bash, you are going to use “Bash tests”.

In this tutorial, you are going to learn how to check if a file or directory exists in a Bash script.

Check If File Exists

In order to check if a file exists in Bash, you have to use the “-f” option (for file) and specify the file that you want to check.

if [[ -f <file> ]]
then
    echo "<file> exists on your filesystem."
fi

For example, let’s say that you want to check if the file “/etc/passwd” exists on your filesystem or not.

In a script, you would write the following if statement.

#!/bin/bash

if [[ -f "/etc/passwd" ]]
then
    echo "This file exists on your filesystem."
fi


Check File Existence using shorter forms

In some cases, you may be interested in checking if a file exists or not directly in your Bash shell.

In order to check if a file exists in Bash using shorter forms, specify the “-f” option in brackets and append the command that you want to run if it succeeds.

[[ -f <file> ]] && echo "This file exists!"

[ -f <file> ] && echo "This file exists!"

Using the example used before, if you want to check if the “/etc/passwd” file exists using shorter forms, you write the following command

[[ -f /etc/passwd ]] && echo "This file exists!"


So how does this command work?

Shorter forms are closely related to exit statuses.

When you run a command in Bash, it always exits with a status: 0 for success and numbers greater than zero for errors (1, 2, and so on).

In this case, the “&&” syntax will check if the exit status of the command on the left is equal to zero : if this is the case, it will execute the command on the right, otherwise it won’t execute it.

Pro tip : you can use “echo ${?}” in order to see the exit status of the last command run.

Checking multiple files

In some cases, you may want to check if multiple files exist on your filesystem or not.

In order to check if multiple files exist in Bash, use the “-f” flag and specify the files to be checked separated by the “&&” operator.

if [[ -f <file1> ]] && [[ -f <file2> ]]
then
  echo "They exist!"
fi

Check If File Does Not Exist

On the other hand, you may want to check if a file does not exist on your filesystem.

In order to check if a file does not exist using Bash, you have to use the “!” symbol followed by the “-f” option and the file that you want to check.

if [[ ! -f <file> ]]
then
    echo "<file> does not exist on your filesystem."
fi

Similarly, you can use shorter forms if you want to quickly check if a file does not exist directly in your terminal.

[[ ! -f <file> ]] && echo "This file does not exist!"

[ ! -f <file> ] && echo "This file does not exist!"


Note that it is also possible to check if a file does not exist using the “||” operator.

The “||” operator will execute the command on the right if, and only if, the command on the left fails (i.e. exits with a status greater than zero).

To test if a file does not exist using the “||” operator, simply check if it exists using the “-f” flag and specify the command to run if it fails.

[[ -f <file> ]] || echo "This file does not exist!"
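
For example, using a file name that most likely does not exist on your host:

[[ -f /etc/passwd_backup ]] || echo "This file does not exist!"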

Check If Directory Exists

In order to check if a directory exists in Bash, you have to use the “-d” option and specify the directory name to be checked.

if [[ -d "$DIRECTORY" ]]
then
    echo "$DIRECTORY exists on your filesystem."
fi

As an example, let’s say that you want to check with Bash if the directory /etc exists on your system.

In order to check its existence, you would write the following Bash script

#!/bin/bash

if [[ -d /etc ]]
then
    echo "/etc exists on your filesystem."
fi

When executing this script, you would get the following output

Output

/etc exists on your filesystem.

Check Directory Existence using shorter forms

In some cases, you may be interested in checking if a directory exists or not directly in your Bash shell.

In order to check if a directory exists in Bash using shorter forms, specify the “-d” option in brackets and append the command that you want to run if it succeeds.

[[ -d <directory> ]] && echo "This directory exists!"

[ -d <directory> ] && echo "This directory exists!"

Let’s say that you want to check if the “/etc” directory exists for example.

Using the shorter syntax, you would write the following command.

[ -d /etc ] && echo "This directory exists!"


Creating a complete Bash script

If you find yourself checking multiple times per day whether a file (or multiple) exists or not on your filesystem, it might be handy to have a script that can automate this task.

In this section, you are going to create a Bash script that can take multiple filenames and return if they exist or not.

If they don’t, a simple notification message will be displayed on the standard output.

Create a new Bash script and make it executable using chmod.

$ mkdir -p ~/bin 

$ cd ~/bin && touch check_file && chmod u+x check_file && vi check_file

Here is the content of the script to be used to dynamically check if files exist.

#!/bin/bash

# Using argument expansion to capture all files provided as arguments.

for FILE in "${@}"
do
  if [[ ! -f "${FILE}" ]]
  then
    echo "The file ${FILE} does not exist!"
  fi
done

Save your script and add the “bin” folder you just created to your PATH environment variable.

$ export PATH="$HOME/bin:$PATH"

$ printenv PATH

/home/user/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

Now that your script is accessible wherever you are on the system, you can call your script and start checking if files exist or not.

$ check_file /etc/passwd /etc/pass /etc/file

The file /etc/pass does not exist!
The file /etc/file does not exist!

Awesome!

You created a custom script to check whether files exist on your filesystem or not.

Conclusion

In this tutorial, you learnt how you can check if a file exists or not using Bash tests and Bash short syntax.

Similarly, you learnt how it is possible to verify if a directory exists.

Finally, you have written a complete Bash script that accepts dynamic arguments in order to check if multiple files exist or not.

If you are interested in Bash programming or in Linux System administration, we have a complete section dedicated to it on the website, so make sure to check it out!

Screen Command on Linux Explained

The screen command is a very common command used on Linux to launch and arrange multiple terminal shells within one single shell.

Screen is most of the time used for two purposes.

It can either be used in order to organize multiple shells and navigate between them easily.

It can also be used in order to have long running commands on remote servers.

In fact, screen is launched in order to ensure that you don’t lose any distant work in case of a sudden network outage.

In this tutorial, we are going to have a complete overview of what the screen command on Linux is and how it can be used effectively.

Ready?

Prerequisites

In order to install new packages, you will have to be a sudo user.

If you need to add a user to sudoers on Debian, there is a tutorial for it.

There is also one for Red Hat based distributions.

When you are ready, type this command to make sure that you have sudo rights.

$ sudo -l

User user may run the following commands on localhost:
   (ALL) ALL

Installing Screen on Linux

In order to install screen, you will have to run one of the following commands.

On Debian-based distributions, you can run

$ sudo apt-get install screen

For Red Hat based distributions, you will have to run

$ sudo yum install -y screen

When you are done, you can run this command in order to check your current screen version.

$ screen -v
Screen version 4.06.02 (GNU) 23-Oct-17

Interacting with Screen on Linux

As described previously, screen is used in order to start an interactive shell where you can rearrange multiple shells in it.

To start your first screen session, simply type the following command.

$ screen

As you can see, there are no differences with your previous shell except for the fact that the header writes

screen 0: antoine@localhost:~

This means that a bash interpreter is now running inside your original shell, through the screen utility.

To run a command, simply type a command like you would normally do.


Getting help with screen

When interacting with the screen command, you will have to run shortcuts on your keyboard to execute actions within the screen session.

By default, the “Ctrl + A” keystrokes are designed to interact with screen.

As an example, type “Ctrl + A” then “?” in order to get the screen help.


As you can see, you have many different options to interact with screen.

The most popular ones are probably the following ones :

  • “Ctrl + A” then “d” : detach mode. You can use this option in order to detach (meaning going back to your original shell) and let screen run in the background. This is particularly handy when you are running long tasks on your host.
  • “Ctrl + A” then “x” : lock screen. This is used in order to prevent your screen session from being used by another user. As a consequence, your user password will be required to unlock the session.
  • “Ctrl + A” then “c” : screen command. One of the most used commands by far, those bindings are used in order to create a new window within your screen instance.
  • “Ctrl + A” then “|” : vertical split. By default, this command will split your current window into two different areas that you can interact with.
  • “Ctrl + A” then “S” : horizontal split.
  • “Ctrl + A” then “n” : used to navigate between your different windows, from the one with the lowest index to the one with the highest index.
  • “Ctrl + A” then “Tab” : used in order to move your input cursor to one of your different areas within screen.

In most of the cases, those are the commands that you are going to use with screen.

Splitting regions with the screen command

As an example, we are going to create the following layout with the screen command on Linux.


As you can see, we are going to split the terminal both vertically and horizontally.

One window will be used in order to execute a command (like a long running command for example).

A second window will be used in order to monitor the system performance using the top command.

Finally, another window can be used in order to edit a file via the nano editor.

Screen window creation

First, create a screen session by executing the following command.

$ screen -S user-screen

The -S option will create a named session for your screen environment.

When multiple administrators are working on the same system, it might be a good idea to have named sessions to distinguish your sessions from others.

Next, as you are going to manipulate three shell sessions, you are going to need two additional screens.

Execute the “create” command two times (“Ctrl + A” then “c”). You should end up with the following screen.

Notice the header of your terminal shell displaying “screen 2”, because three screens (numbered 0 to 2) are currently active for your session.

Splitting screen windows vertically and horizontally

The next step will be to create your different regions on your current window.

To achieve that, first split your current region vertically by pressing “Ctrl + A” then “|”.

You should end up with the following output.

Next, split your layout horizontally by pressing “Ctrl + A” then “S”.

This is what you should get.

Navigate to your second region by pressing “Ctrl + A” then “Tab”. By default, your new region is empty, so you can press “Ctrl + A” then “n” in order to navigate to your screen 0 session.

From there, execute the command you want to run.

Repeat the previous steps in order to execute commands in the other regions.

Remember that you have to navigate between your screen windows when you first enter a split region.


Awesome! You created your complete custom screen session.

Detaching from your screen session

In order to detach from your screen session, simply press the following keystrokes.

“Ctrl + A” then “d”

Your screen session is still executing in the background.

This is one of the main aspects of the screen command.

It can be used in order to connect to a remote host via SSH, perform some actions, detach, and come back to them later on.

This way, you don’t have to juggle with background and foreground jobs that you might lose by closing your current terminal.

To verify that your screen session is still running, run the following command

$ pstree | grep -A 2 -E "screen-"


Reattaching to your screen session

First of all, you can list your screen windows by executing the following command.

$ screen -ls


If you named your screen session, you can simply run the following command

$ screen -r <session_name>

In the example above, I would have to run this command in order to go back to my session.

$ screen -r user-screen

Note that you can also get back to your session by using the screen ID at the very left of the ls command.

$ screen -r 3600

Unfortunately, you lose all your visual changes and you will have to split your windows again.

However, with enough practice, you might learn about the shortcuts very easily.

Locking your screen session

Sometimes, you might want to prevent other users from interacting with your screen session.

In order to lock your screen, hit “Ctrl + A” then “x”.


In order to unlock it, you have to enter the password of the user owning the screen session.

Conclusion

In today’s tutorial, you learnt how you can easily manipulate the screen command on Linux in order to create custom shell environments within a common shell.

You learnt that it can be used on remote servers in order to make sure that you can exit the session and still save your work (if you have long running commands in the foreground for example).

If you are interested about Linux System Administration, we have a complete section dedicated to it on the website.

How To Rename a Directory on Linux

If you have been working with Linux systems for quite some time, you already know how important it is to keep your filesystem structured.

In some cases, you may need to create temporary directories with random names that need to be renamed later on.

Renaming directories on Linux is not done with a dedicated renaming command but with a command that serves multiple purposes : the “mv” command.

The “mv” command is used on Linux in order to be able to move files but also to rename directories.

In this tutorial, we are going to learn how you can rename directories on Linux.

Rename Directories on Linux using mv

To rename a directory on Linux, use the “mv” command and specify the directory to be renamed as well as the destination for your directory.

$ mv <source_directory> <target_directory>

For example, let’s say that you want to rename a specific directory on your filesystem named “temp” (located in your home directory) to “directory” (also in your home directory)

To rename this directory, you would use the “mv” command and specify the two directory names.

$ mv /home/user/temp /home/user/directory

Note : using the mv command will not delete the content stored inside your directories; you won’t lose any files by renaming your directories on Linux.

Now if you take a look at all the directories stored in your home directory, you will see a new entry for your “directory” folder.

$ ls -l /home/user

drwxr--r-x   2 user user 4096 Nov  9 16:41 Desktop/
drwxr-xr-x   2 user user 4096 Nov  9 16:41 Documents/
drwxr-xr-x   2 user user 4096 Nov  9 16:41 Downloads/
drwxr-xr-x   2 user user 4096 Nov  9 16:41 Music/
drwxrwxr-x   2 user user 4096 Dec 20 10:53 directory/

Awesome, you just renamed a directory on Linux.

Rename Directories using find

In some cases, you may not know directly where your directories are located on your system.

Luckily for you, there is a command that helps you find and locate directories on a Linux system : the find command.

In order to find and rename directories on Linux, use the “find” command with the “-type d” option in order to look for directories. You can then rename your directories by executing the “mv” command with the “-execdir” option.

$ find . -depth -type d -name <source_directory> -execdir mv {} <target_directory> \;

For this example, let’s pretend that you want to rename a directory beginning with “temp” on your filesystem to “directory”.

The first part of the command will locate where your directory is located.

$ find . -depth -type d -name "temp"

./temp

Now that you know where your directory is, you can rename it by using the “execdir” option and the “mv” command.

$ find . -depth -type d -name temp -execdir mv {} directory \;

Rename Multiple Directories using Bash

As described in our previous tutorials, the Bash scripting language can also be used in order to rename multiple directories on your filesystem.

To rename multiple directories on Linux, create a new script file and use the “mv” command in a “for” loop to iterate over directories.

#!/bin/bash

# Takes directory entries specified and renames them using the pattern provided.

for directory in *
do
    if [ -d "$directory" ]
    then
      mv "${directory}" "${directory}_temp" || echo 'Could not rename '"$directory"''
    fi
done

Save this script as “change_name” and add it to your PATH environment variable if you want to use it on your entire system.

In this script, we are listing all the files and directories that are located in the current working folder (where the script is located).

We are testing if the entry is a directory and if the directory exists using the “-d” option.

Then, if the directory exists, it is renamed to have a “_temp” extension at the end. Feel free to customize this line in order to rename the directories however you want them to be renamed.
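
For instance, to prepend a “backup_” prefix instead of appending a “_temp” suffix, you could replace the “mv” line with the following one (the prefix is just an example).

mv "${directory}" "backup_${directory}" || echo "Could not rename ${directory}"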

$ ls

folder1/  folder2/

$ change_name

$ ls 

folder1_temp/  folder2_temp

Congratulations, you just renamed directories using a Bash script on Linux.

Rename Directories using rename

Instead of using the “mv” command, you can use a dedicated command called “rename”; however, this command may not be directly available on your distribution.

In order to rename directories on Linux, use “rename” with an expression describing how you want the files to be renamed as well as the target directories.

$ rename <expression> <directory>

As an example, let’s say that you want to rename all your directories written in uppercase to directory names in lowercase letters.

In order to rename those directories, you would run the following command

$ rename 'y/A-Z/a-z/' *

$ ls -l 

drwxrwxr-x 2 user user 4096 Dec 21 02:26 a_temp
drwxrwxr-x 2 user user 4096 Dec 21 02:26 b_temp

Filtering directories to be renamed

In some cases, you may want to rename only a few directories using the rename command.

In order to achieve that, you essentially have two options :

  • Use wildcards in order to filter directories to be renamed.

For example, if you want to rename directories ending with a given string, you would run the following command.

$ rename 's/_html$/_temp/' *

The syntax used by the rename command is similar to the one used by the sed command : you can use this reference to have more information about this syntax.

  • Use input redirection in order to filter directories to be renamed.

$ ls -d *_html | rename 's/_html$/_temp/'

When using one of those two options, your folders will be renamed to have a “_temp” extension.

$ ls -l

drwxrwxr-x 2 user user 4096 Dec 21 02:42 a_temp
drwxrwxr-x 2 user user 4096 Dec 21 02:42 b_temp

Awesome, you successfully renamed your directories using the rename command!

Conclusion

In this tutorial, you learnt all the ways of renaming directories on Linux, the most common way being the “mv” command.

You also learnt that it is possible to rename directories using the “find” command in order to locate your directories or by using the rename command (that may not be directly available on your system by default).

If you are interested in Linux System Administration, we have a complete section dedicated to it on the website, so make sure to check it out!

Working Remotely with Linux Systems

As a Linux system administrator, you are responsible for many machines that may be located locally or on distant sites.

In some cases, you will need to connect to them in order to fix an issue with disk space for example.

If you are working with users, some of them may get stuck in an application and you will have to kill the application for them.

In order to perform all those operations, you will need to remotely access those instances from your own computer.

Working with remote Linux instances can be done with a handful of different solutions depending on your needs.

Sometimes, you simply need to have a remote shell bound to a remote host.

However, what if you are working with an application that requires a graphical interface?

In this tutorial, we are going to explore all the ways of working with remote Linux systems.

From X11 to SSH and XRDP, you will learn everything there is to know about remote Linux system administration.

Remote communication basics

Before listing and detailing the different ways of connecting to remote Linux hosts, you have to understand a few basics related to the way computers communicate.

Client-server model

Nowadays, computers are rarely standalone computers, they actually communicate with other computers all the time.

When you are browsing the Internet in order to retrieve a web page, you are communicating with another computer in order to perform this task.

When you are playing your favourite game, you are also communicating with another computer (or many different computers) in order to know if you won or not.

All those tasks are based on the same model : the client-server model.

The client-server model is one model that defines and organizes how communication takes place between two computers.

In this scenario, one computer is acting as the client (the one that asks for resources or computation) and another computer is acting as the server (the one that does the work, that performs the computation).


As you probably understand it, the server is responsible for the actual work and clients are only responsible for delivering the end information to you.

Note : is it the only way for computers to communicate? Not at all, peer-to-peer is another way for example. Ever tried sharing files online via tools like Bittorrent?

Client-server model examples

Now that you know a bit more about the client-server architecture, it is time to unveil what protocols are based on this model to exchange information.

The most famous protocol based on the client-server architecture is the HTTP protocol.

As a reminder, the HTTP protocol is used in order to fetch HTTP resources (usually web pages) from a HTTP server.

In this scenario, you are contacting a remote HTTP server and you are asking for a specific page.

If you are allowed to communicate with this server, it will answer with the resource that you asked for : usually a web page.


Note : HTTP servers can be used in order to deliver web pages, but they can be used to download files remotely, or access any kind of information located on a HTTP server

Great!

But why were we talking about the client-server model in the first place?

Because most of the ways of working remotely with Linux systems are based on the client-server model.

Secure and unsecure protocols

In our case, we are not interested in requesting web pages from a distant server.

We are interested in being able to execute commands remotely from one machine (the client) to another (the server).

In order to execute commands remotely from one computer to another, you can choose between two protocols : telnet and SSH.

Telnet protocol

Developed in 1969, telnet is a protocol enabling bi-directional communication between two hosts in order to be able to send commands remotely.

The telnet protocol is based on the client-server architecture we saw before : a client connects to a TCP server (usually located on port 23 of the remote machine) and starts writing down commands to be executed.

The TCP server understands those commands, performs those operations on the server and returns the output of the command.

If you install a telnet server on your machine, you would be able to connect to it and run commands remotely.

$ sudo apt-get install telnetd -y

$ sudo systemctl status inetd


From there, you can use a simple Telnet client in order to connect to your remote server.

$ telnet 127.0.0.1 23

In this scenario, my telnet client is connecting to my remote host and sending commands through the network.

Awesome, so what’s the catch?

The Telnet protocol is not encrypted.

In short, it means that if someone were to spy on the network traffic, they would be able to see all the commands that you are sending, as well as the results of your commands.

This is a big issue because most of the time you are accessing distant servers that are located outside your premises, on distant sites where you might not be in complete control of the security.

If you were for example to connect to a SQL database, your username as well as your password would be sent over the network in plain text.

Luckily for you, there is another protocol, safer than Telnet, that was developed in order to execute commands remotely : the SSH protocol.

SSH protocol

On the other hand, traffic sent using the SSH protocol is encrypted.

SSH stands for Secure Shell and it is widely used in order to perform commands remotely in a safe and secure way.

In short, SSH is built on common cryptographic techniques: symmetrical encryption or asymmetrical encryption for the most part.

Those two techniques are in a way verifying the identity of the two hosts as well as encrypting the traffic between those two hosts.

The SSH protocol is also based on the client-server model we saw earlier.

When connecting through SSH to distant servers, you are essentially talking with a SSH server, located remotely, with a SSH client located on your machine.

Note : the default SSH port number can be customized in order to reduce the exposure of your SSH server to brute force attacks.


In this case, as the traffic is encrypted, if somebody spies on the traffic, they won’t be able to read the content sent.

In this tutorial on remote Linux system administration, we are going to focus on the SSH protocol as the Telnet one is considered obsolete in terms of security.

Execute shell commands remotely using SSH

In order to execute commands remotely, we are going to install a SSH server on our host.

There are plenty of different SSH servers but we are going to use an open-source alternative called OpenSSH.

OpenSSH was first released in 1999 and it provides a suite of secure networking tools in order to ensure communications between different hosts over SSH.

OpenSSH brings a SSH server but it also brings many different utilities that are using the power of SSH such as sftp, scp or ssh-keygen.

OpenSSH Server Installation

First of all, you need to install the OpenSSH SSH server on your host.

Depending on your distribution, OpenSSH may already be installed by default, but you will have to make sure that this is the case.

First, for safety purposes, make sure to update the packages on your system.

$ sudo apt-get update

To install OpenSSH, run one of those commands depending on your distribution (APT or YUM)

$ sudo apt-get install openssh-server

$ sudo yum install openssh-server openssh-clients

Running those commands, the OpenSSH server will be installed on your computer.

$ sudo systemctl status sshd


By default, the SSH server is listening to connections on port 22 as described in the previous sections.

In order to verify that this is the case, run the “netstat” command to list open ports.

$ sudo netstat -tulpn | grep 22


Great!

Your SSH server is now listening for incoming connections on port 22.

Updating firewall rules

As you probably know, the Linux operating system ships with a built-in firewall that blocks unauthorized requests.

This built-in firewall can be manipulated with the iptables utility which is one of the tools of the Netfilter framework.

In our case, on recent Debian systems, we are using the UFW front-end in order to manage rules.

On CentOS or RHEL systems, you will have to update the FirewallD rules.

To allow SSH traffic on your host, run one of the following commands.

$ sudo ufw allow ssh         (on Debian/Ubuntu systems)

$ sudo firewall-cmd --permanent --zone=public --add-service=ssh    (on RHEL/CentOS systems)
$ sudo firewall-cmd --reload

Next, make sure that your settings were correctly taken into account

$ sudo iptables -L | grep ssh


Connecting to your SSH server

Now that your SSH server is up and running and that traffic is allowed on port 22, it is time for your client to connect.

On your client machine, make sure that you have the SSH utility.

$ ssh -V


If you get a version number as output, it means that the SSH client utility is currently installed on your machine.

In order to connect to your SSH server, simply use “ssh” followed by your username and the IP address (or hostname if it is configured) of your SSH server.

$ ssh <username>@<ip|host>
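
For example, assuming a user named “antoine” and a server reachable at the made-up address 192.168.1.10, you would connect this way; note that you can also append a command to be executed directly on the remote host.

$ ssh antoine@192.168.1.10

$ ssh antoine@192.168.1.10 "ls -l"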


Right off the bat, you are asked to verify the identity of the SSH server you are communicating with.

Known hosts

When first connecting to your remote SSH server, you are asked to verify the authenticity of the host.

The SSH utility displays a ECDSA key-fingerprint that can be used in order to verify the server authenticity.

Once you have checked this fingerprint (or if you simply trust the host), enter “yes”.

As a consequence, the server identity will be added to your known_hosts file located in the .ssh directory.

$ cat ~/.ssh/known_hosts

Executing commands

Now that you are connected to your SSH server, you can start executing remote tasks on your server.

$ ls


Your commands are not executed on your local machine but they are executed on the remote machine.

To verify it, from your SSH session, run a command that simply sleeps for an extended period of time.

$ sleep 100

Back on the server, you can verify that the command is actually running there.

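A simple way to do so is to list the running processes on the server and filter on the command name.

$ ps -ef | grep sleep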

You can even see which user started it and the terminal it used in order to start it.

Great!

But what if you wanted to execute a graphical application, like Firefox for example?

$ firefox


When trying to execute a graphical application remotely, you will get an error message stating that you have no DISPLAY environment variable specified.

So what happened here?

In order to execute graphical applications remotely over SSH, we are going to dig deeper into another protocol that is used to draw applications remotely : the X11 protocol.

Execute graphical applications remotely using X11 forwarding

Before executing your first commands with X11, you will have to understand how applications are being displayed on Linux hosts.

Understanding the X protocol

Back in the early days on Linux, users did not have any graphical applications or desktop environments installed on their computers.

Instead, they were dealing with plain terminals in order to execute commands (also called tty terminals)

Later on, graphical applications, relying on window systems, became popular and were used as a way to democratize administration for users.

In order to be able to display windows on the screen, the X protocol was invented.

The X protocol, also known as the X Window System Core Protocol, is a client-server protocol where applications (known as X clients) are connecting to X servers.

X servers, also known as display servers, are responsible for transmitting user input to the client applications, receiving the responses from client applications and communicating with drivers in order to render your application.

In short, an X server is responsible for making applications appear on your screen.


As a consequence, every Linux system that is running graphical applications has a display server running on them.

This is an important concept to grasp because it essentially means that the graphical interface is decoupled from the process itself.

As a consequence, programs can be run on distant machines but they can be presented on other machines : your local machine for example!

In this case, this is the architecture that you would have.


In this example, you have two separate computers, but each computer has a display server running on it.

This is also an important point to grasp because even if the X protocol is based on the client-server model, the client AND the server might be located on the same computer.

Now that you know more about the X protocol, let’s see how applications are being displayed on your physical computer screen.

X Protocol & Displays

As detailed before, the display server (also called X server) located on your computer will be responsible for drawing graphical interfaces on your screen.

As a consequence, display servers need to be connected to display devices that are most of the time materialized as computer screens.

In order to know where the graphical interface needs to be drawn, your session has a DISPLAY environment variable that details the output device.

$ echo $DISPLAY


The syntax for the DISPLAY environment variable is as follows :

hostname:D.S

Where :

  • hostname : the name of the computer where the display server runs. In this case, it is omitted, meaning that it runs on localhost;
  • D : the display number, identifying a running X server instance on that host. The first display is numbered 0; a forwarded display, for example, will get a higher number;
  • S : the screen number, in case your display has multiple screens.

In this case, we are in the standard case, meaning that the display server displays graphical interfaces on the localhost machine, on the first screen.
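
On a typical single-seat machine, the DISPLAY variable will therefore often look like the following (the exact value depends on your setup).

$ echo $DISPLAY
:0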

The xrandr command can be used in order to see screens connected to a Linux system.

$ xrandr --query


When running X clients applications on your instance, graphical instructions will be redirected to your main screen, but what if you wanted to have applications running on your server and the graphical interface on your local machine?

For this, we are going to use SSH forwarding.

Using SSH and X11 forwarding

When using the SSH client, you can append the “-X” option in order to forward X11 traffic to your local machine.

As a consequence, the application will run on the server but it will be displayed on your client.

$ ssh -X <user>@<host>


As you can see, when displaying the DISPLAY environment variable, the output is quite different.

This time, we have the hostname (localhost), the display number (10) as well as the screen number (0).

In this case, the display number does not map to a real piece of hardware on the server : it is mapped to your remote SSH connection.

To illustrate that, try listing the listening ports on your server whose numbers start with 60.

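Assuming the netstat utility used earlier is available on your server, you can do this with the following command.

$ sudo netstat -tulpn | grep ":60"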

As you can see, there is one line stating that one application is listening on port 6010 : this is actually the end point used in order to access remote displays on the server.

With that in mind, you are now able to forward the X11 traffic to your client, meaning that you can launch graphical applications from the server to your client.

On the client, while being connected to a SSH session, try running Firefox.

$ firefox


Awesome!

You are now able to run graphical programs remotely.

The Firefox process is running on the server, but the graphical interface is displayed on the client.

Now displaying graphical applications is fantastic, but what if you wanted to display whole desktop environments?

For this, you need to deal with what we call the RDP protocol.

Execute remote desktop using the RDP protocol

Developed by Microsoft and embedded in the early versions of Microsoft Windows servers, the RDP protocol is a proprietary protocol used in order to provide remote desktop access.

The RDP is well established on Windows operating systems and you might have already dealt with the RDP client on your Windows machine : mstsc.


Luckily for you, Linux has a wide variety of open-source alternatives in order to be able to access your Linux system remotely.

The RDP protocol is also based on the client-server model : on one hand you will have a RDP server sitting and waiting for connections on port 3389.

A popular implementation of the RDP protocol on Linux is the xRDP project that is essentially an open-source RDP server.

On the other hand, you will have a RDP client making connections to this remote server and forwarding the display to your RDP client program.

Popular remote desktop clients on Linux include Remmina, TigerVNC or RealVNC.

You can even use the mstsc Windows client if you are on a Windows machine.

Here is an example of a RDP connection to a Linux host using a Microsoft host computer.

Conclusion

In this tutorial, you learnt a bit more about the different ways of working with Linux systems remotely.

You learnt about at least four different ways of doing so : using the Telnet protocol (which is not used anymore today), the SSH protocol (which is far more secure), the X protocol (used in order to display graphical applications remotely) and the RDP protocol (in order to display remote desktop environments).

If there are tools that you use and that you want to share with the community, make sure to leave a comment below with your implementation, it always helps.

Also, if you are curious about Linux system administration, make sure to have a read at our other tutorials on the subject.

APT Package Manager on Linux Explained

As a system administrator, knowing how to install, update or delete Linux packages is crucial in order to maintain your Linux hosts and servers.

You may have to update your current packages in order to get the latest security patches for your servers.

On the other hand, you may have to setup an entire HTTP Web Server in order to deploy a brand new website your application team developed.

Sometimes, you might just want to test new software in order to see if it fits your needs, just to uninstall the packages later on.

As a consequence, knowing how to manage Linux packages is crucial.

In this tutorial, we are going to focus on Linux package management using the APT package manager.

First, we are going to go through a bit of history on the origins of Open Source Software in order to grasp the fundamentals of Linux packages.

Later on, we will be focusing a bit more on APT (Advanced Package Tool) and we are going to see how you can compile your own programs in order to have custom installations.

Ready?

What You Will Learn

By reading this tutorial, you are going to learn about the following subjects:

  • The GNU/Linux project and the origins of free software;
  • How program installation is designed on Linux and how it differs from Windows hosts;
  • What software repositories are and the different repository types that you may encounter;
  • APT cache complete guide : how to update your cache and how to look for programs on your system;
  • APT get detailed commands : how to install and remove Linux packages properly;

That’s a long program, so without further ado, let’s jump right into it.

GNU/Linux Origins explained

Before detailing how you can install and uninstall packages on Linux, it is quite important to have some basics about what GNU and Linux are.

When discussing Linux, it is quite common to refer to Linux as an operating system, but that is not entirely true.

When we refer to Linux operating systems, we actually refer to GNU/Linux operating systems.

What is GNU?

GNU (a recursive acronym for “GNU's Not Unix”) is an operating system designed to provide free software that is easy to acquire, distribute and modify by yourself.

Today, it is quite common to think of Unix (or at least GNU) as a free operating system, but that has not always been the case.

For example, if you were to run the “uname” command on your system, with the “-o” option (for operating system), you would see that you are most probably running a GNU/Linux OS.

$ uname -o

uname-o

Interesting!

Back in the early days of Unix, the Unix operating system was not free open source software at all : it was developed by companies such as IBM, HP or AT&T for business needs.

However, given the rise of copyrighted software and licensed distributions, Richard Stallman, a researcher at MIT, decided to develop a Unix alternative : GNU.

GNU was designed to be an open source operating system that is backwards compatible with Unix operating systems.

On a GNU operating system, commands (such as “ls”, “cat” or “gcc”) are completely open source.

Moreover, GNU is defined as “free software”, but are we talking about money?

Free Software Explained

As we explained before, GNU is self-proclaimed to be free software.

But “free” here does not literally refer to money : it refers to fundamental freedoms that are core to GNU :

  • Freedom to run programs as you wish, for whatever purpose you are trying to achieve;
  • Freedom to modify the program, which requires complete access to the program's source code;
  • Freedom to redistribute copies in order to help your neighbours;
  • Freedom to share your modifications with the world : this way you are not limited to your own changes and you can help thousands of developers around the world.

Those fundamental freedoms are tied to the concept of copyleft.

You can have specific rules added to this set of fundamental rules as long as they don’t interfere with the core rules themselves.

For example, it would be illegal to take open-source code, modify it and sell it without providing end users with the source code.

This is an important statement : selling code is not forbidden per se, what is forbidden is depriving end users of those fundamental rights.

As a consequence, GNU not only defines those fundamental freedom rules, it also guarantees that those rights carry over to all subsequent versions of the software.

In this context and in order to deal with copyrighted content, the GPL license (GNU General Public License) was created.

The GPL license aggregates the different rules inherent to free software and defines what constitutes a license breach. A complete GPL FAQ can be found on this page.

Accessing GNU packages

As you probably already understood, GNU packages are packages designed to be shared, modified and run wherever you want, for whatever purpose.

In this context, GNU packages can be accessed directly on the GNU website.

If you head over to the software page of the official GNU website, you will see the entire list of GNU packages available to you.

gnu-packages-2

As you can see, this is quite a long list of packages.

One of the most popular packages is coreutils, which provides essential Linux commands such as “ls”, “cat” or “find”.

And those commands can be modified and run at will!

All those packages come with your GNU/Linux operating system installation, but what if you wanted to install third-party software?

For that, Linux is designed to communicate with what we call software repositories.

What are Linux software repositories?

Before describing Linux software repositories, it is quite important to have some background about how the Linux packaging system is designed.

If you come from a Windows environment, you are used to downloading executable files (.exe) from the Internet, opening an installation wizard and clicking “Next” a couple of times in order to have your program installed.

On Linux, this is not the case.

On Linux, packages are downloaded and installed from online repositories by a package manager.

linux-packaging-system

APT Package Manager on Linux

On Debian-based distributions, packages are downloaded via the APT package manager.

When we refer to packages, we are essentially dealing with deb files : archive files containing a program's files and metadata, which are used by the dpkg utility to install programs.
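
If you are curious about what such an archive contains, the dpkg utility can list its content. A quick sketch, using a hypothetical package file named some_package.deb :

$ dpkg -c some_package.deb

This prints the list of files that the package would install on your system.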

So why would we need APT?

APT, standing for Advanced Package Tool, is responsible for downloading, installing, updating and removing packages from your system.

But why should we need a package manager for that?

Couldn’t we install the programs by ourselves?

It would be hard because packages are tied together through package dependencies.

Concept of packages dependencies

Let’s take for example the VIM Editor on Linux.

If you are using a Debian distribution, the VIM page for Debian repositories is available here.

This is where you would be able to find information about this specific package : what functionalities it provides, who created it but most importantly what packages it depends on.

On Linux, most packages don’t come as “pure” packages : they depend on a wide variety of different packages in order to provide third-party features to the actual program.

As you can see by scrolling the page, the VIM package depends on a lot of other packages.

dependencies

As you can see here, dependencies are split into four different categories :

  • depends : as the name states, this dependency is needed in order for the program to run. Without it, the program cannot start at all;
  • recommends : without this dependency, the program would be able to run but it would not provide all the features that this tool is designed to provide;
  • suggests : without this dependency, the program would be able to run and provide its core functionality, but the suggested packages may enhance its usefulness;
  • enhances : this dependency can enhance the actual tool by improving performance or display but it is not needed at all.

Now you know why we use the APT tool : package dependencies are resolved for you at the installation time.
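
If you want to inspect those dependency relationships yourself from the command line, the apt-cache utility can print them. A quick example with the VIM package discussed above :

$ apt-cache depends vim

The output lists the Depends, Recommends, Suggests and other relationship entries declared by the package.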

Now that you know more about packages, let’s have a look at what software repositories are.

Linux Software Distribution Architecture

On Linux, software is distributed through software repositories.

Software repositories are used in order to aggregate free software provided by the community.

Repositories may be tied to a specific distribution.

Ubuntu, Debian, CentOS or RHEL have their own repositories that are updated daily.

As a consequence, when you want to install a new program, you are querying those base repositories in order to retrieve packages from them.

If you wanted to install packages that are not located on distribution based repositories, you would add your own trusted repositories to your system in order to install new packages.

This is essentially why Linux is said to be safer than Windows when it comes to installing new programs.

Unless you are installing shady packages, you are most of the time communicating with trusted repositories where hundreds of different developers review the code you are executing on your machine.

It does not prevent viruses or malware from spreading, but it makes them unlikely because multiple independent reviewers have inspected the package code.

Note that repositories are split into different categories and you may have to pick the correct one to guarantee that you are running safe versions.

Such categories include “stable” repositories, “testing” repositories and “unstable” repositories.

Note : in other distributions, those repositories may have a different name (Ubuntu has multiverse, universe, main and so on)

By default, your distribution is most likely linked to the stable repository (and its “main” component) via the sources.list configuration file.

$ cat /etc/apt/sources.list

sources-list
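
For reference, a typical line in this file looks like the following (the exact mirror and distribution codename depend on your system, so treat this line as an illustrative sketch) :

deb http://deb.debian.org/debian buster main

The first field defines the repository type (deb for binary packages, deb-src for source packages), followed by the repository URL, the distribution codename and the components to use.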

Now that you know more about software repositories and how your package manager interacts with it, let’s see what your package manager cache is.

APT Cache Explained

As we discussed before, the APT package manager is responsible for downloading packages from the Internet (or local package repositories) in order to install them.

However, operations done by the APT package manager are not done online all the time.

APT manages an internal database called a cache.

The APT cache is used in order to provide offline information about the current packages installed on your system. It essentially guarantees that you are able to access package information without having to be connected to the Internet.

Searching packages in the cache

For example, you can search for all Python related packages that may be stored in your APT cache.

For that, you need to run the “apt-cache” command with the “search” option.

$ apt-cache search <search string>

$ apt-cache search python
Tip : you can pipe your cache search with “less” in order to see all the results provided.

$ apt-cache search python | less

cache-search

Besides searching for specific packages, the APT cache can also show complete details about a package.

Showing package information

In order to show package information, you have to execute the “apt-cache” command with the “show” option.

$ apt-cache show <package_name>

For example, in order to have more information about the gcc package (which is a GNU compiler), you would type

$ apt-cache show gcc

apt-cache-show

So if we don’t need an Internet connection to show some details, it means that they are stored somewhere on our instance.

Let’s say for example that I want to look at the cached information for the nano package.

By default, package information is stored in “/var/lib/apt/lists”.

$ ls /var/lib/apt/lists

var-lib-apt

In this directory, you have access to a bunch of different files that store information about packages stored in your system.

To prove it, let’s look for the file providing information for the nano command.

$ sudo grep -r -A 3 -B 3 -E "Package: nano$" . | less

cache-info

As you can see, many files contain a reference to the nano package, but the one I am interested in is the “main amd64” file (as a reminder, “main” refers to the main repository component and “amd64” to my processor architecture).

If you have any doubts about your CPU architecture, you can run the “lscpu” command.

$ lscpu

processor-architecture

Updating the APT cache

As you already probably understood, the APT cache works offline.

As a consequence, the APT cache has to be updated periodically in order to make sure that :

  • You are pointing to the correct repositories;
  • You are getting the latest updates for the software installed on your computer.

In order to fetch updates for your cache, you need to run the “apt-get update” command.

$ sudo apt-get update
Note : you need sudo privileges in order to update your system cache.

update-packages

As you can see, the command executes a couple of GET calls to distant repositories in order to fetch new information.

When you update your repositories, it is important to note that no software is actually updated on your computer.

Running the “apt-get update” command only updates the cache in order to have the latest information about available software : it does not directly update your programs.

In order to update your programs, you need to execute the “upgrade” command.

Updating (upgrading) your local packages

In order to update your local programs, you need to run “apt-get” with the “upgrade” option.

$ sudo apt-get upgrade

At some point during the command, you will be asked if you want to install the updates.

want-to-continue

Hit “Y” and press Enter.

When confirming, your programs will be upgraded to their latest stable version.

upgrade-packages

Nice!

You have successfully upgraded your packages using the upgrade command.
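
Note : if you are running upgrades from a script and want to skip the confirmation prompt, you can pass the “-y” option to answer yes automatically.

$ sudo apt-get upgrade -y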

Updating your cache and upgrading your current packages is really nice, but what if you wanted to install new packages on your system?

In the next section, you are going to see how you can install new programs on your system using APT.

Installing new packages with APT

When installing new packages with APT, you essentially have two options :

  • The package is already located in one of the repositories you are linked to;
  • The package is in a distant repository and you need to add it.

In order to see if a package is already located into your APT cache, you have to search your cache.

In this example, we are going to pretend that we want to install the “gcc” package.

First, let’s see if GCC is already located into our APT cache.

$ apt-cache search --names-only ^gcc$

search-cache

As you can see, the GCC package is directly available with my default local repositories.

Installing software found in the cache

When the package is directly available in default repositories, you can install it by running the “apt-get” command with the “install” option.

Note : you will need sudo privileges in order to install new packages on your system.
$ sudo apt-get install gcc

You may also be asked if you accept to install this package on your system.

Hit “Y” whenever you are prompted with this question.

Installing software found in the cache

Shortly after, your program should be installed and usable.

In order to verify that the command was correctly installed, you can run the “whereis” command.

$ whereis gcc

whereis-gcc

Awesome, you have correctly installed the GCC compiler on your system!

Installing software unavailable in the cache

In some cases, you may want to install software that is not directly stored into your APT cache.

For this section, we are going to install Grafana, an open-source dashboarding tool that is not directly available in the default repositories.

Note : this is not a tutorial dedicated to Grafana, this part will only cover commands related to APT.

First, we need to make sure that the package is not already contained in our APT cache.

$ apt-cache search --names-only ^grafana$
<empty>

In this case, no Grafana package is available in our default repositories : we need to add the corresponding repository ourselves.

Adding custom APT repositories

In order to add custom APT repositories, you need to understand the APT sources directory structure.

By default, repositories are stored into the “/etc/apt” folder.

$ ls -l /etc/apt

Adding custom APT repositories etc-apt

In this directory, you have multiple entries :

  • apt.conf.d : the APT configuration folders containing configuration files in order to configure proxies for example;
  • auth.conf.d : can be used in order to store authentication details related to proxies;
  • preferences.d : can be used in order to set priorities to specific repositories or packages;
  • sources.list : contains the default set of repositories for your distribution;
  • sources.list.d : a directory that may contain custom APT repositories;
  • trusted.gpg.d : a set of GPG keys that you trust in order to certify download authenticity.

The “sources.list” file is already filled with default repositories, so we may not want to modify this file.

sources-list-2

Instead, we are going to add a custom repository to the sources.list.d directory.

$ sudo touch /etc/apt/sources.list.d/custom.list

In this newly created file, add one entry for the custom repository you want to install.

$ sudo nano /etc/apt/sources.list.d/custom.list

deb https://packages.grafana.com/oss/deb stable main

In order to install packages securely, you may have to import GPG keys.

$ wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
OK
Note : you may not have to add a GPG key for every package but it guarantees that the package is installed securely.

Now that your repository was correctly added, update your APT cache in order for the changes to be applied.

$ sudo apt-get update

apt-get-update (1)

Now, if you search for your package, you should be able to find it.

$ sudo apt-cache show grafana

As a consequence, you are now ready to install your package.

To install your package, simply run the “apt-get” command with the “install” option.

$ sudo apt-get install grafana

Awesome!

Now your package is successfully installed.

As you can see, installing custom software is quite different from installing software available in the cache : you have to add custom repositories and possibly import GPG keys.

Now that you know how to install packages, let’s see how you can uninstall them.

Uninstalling packages with APT

When uninstalling packages using the APT package manager, you essentially have two options : remove or purge.

Removing packages using apt-get remove

The first way to uninstall a package is to use the apt-get remove command.

$ sudo apt-get remove <package>

Using the grafana package we installed earlier, this would give

$ sudo apt-get remove grafana

So why would we need a purge function?

When using apt-get remove, the package is removed but its configuration files are left intact.

To see it, try listing the files associated to a package on your system using the dpkg command.

$ dpkg -L <package>

dpkg-list

As you can see, configuration files are still there, which is why you need the purge command.

Purging packages using apt-get purge

In order to use the purge command, simply execute “apt-get” with the “purge” option.

$ sudo apt-get purge <package>
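
Using the grafana package installed earlier as an example, this would give

$ sudo apt-get purge grafana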

Now if you try to see files associated with your package, you won’t be able to find any of them.

$ dpkg -L <package>
<empty>

Removing dependencies with autoremove

As you have probably understood, the point of using APT is that it resolves package dependencies for you and installs them correctly.

As a consequence, whenever you install and later remove packages, you run the risk of leaving dangling dependencies on your system.

In order to remove dangling dependencies (i.e. dependencies that are not used anymore), you have to use the autoremove option.

You can either do it while uninstalling a package

$ sudo apt-get autoremove <package>

Or you can do it after by executing the autoremove option with no arguments at all.

$ sudo apt-get autoremove
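
If you also want the configuration files of those dangling dependencies to be removed, you can combine both behaviours in one command.

$ sudo apt-get autoremove --purge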

Conclusion

In this long tutorial, you learnt how you can install and uninstall packages using the APT package manager.

You also learnt more about the origins of Open Source Software, where it comes from and how the GNU/Linux operating system emerged from standard Unix operating systems.

If you are interested about Linux system administration, we have a complete section dedicated to it on the website, so make sure to check it out.

Find Text in Files on Linux using grep

This tutorial focuses on finding text in files using the grep command and regular expressions.

When working on a Linux system, finding text in files is a very common task done by system administrators every day.

You may want to search for specific lines in a log file in order to troubleshoot servers issues.

In some cases, you are interested in finding actions done by specific users or you want to restrict lines of a big file to a couple of lines.

Luckily for you, there are multiple ways of finding text into files on Linux but the most popular command is the grep command.

Developed by Ken Thompson in the early days of Unix, grep (globally search for a regular expression and print) has been used for more than 45 years by system administrators all over the world.

In this tutorial, we will focus on the grep command and how it can help us effectively find text in files all over our system.

Ready?

Grep Syntax on Linux

As specified above, in order to find text in files on Linux, you have to use the grep command with the following syntax

$ grep <option> <expression> <path>

Note that the options and the path are optional.

grep-version

Before listing and detailing all the options provided by grep, let’s have a quick way to memorize the syntax of the grep command.

In order to remember the syntax of the grep command, just remember that grep can be written as grEP which means that the expression comes before the path.

This is a great way to remember the grep syntax and the find syntax at the same time but the find syntax is the exact opposite : path first and expression after.

Quick grep examples

There are complex options that can be used with grep, but let’s start with a set of very quick examples.

Listing users using grep

On Linux, as you probably already know, user accounts are listed in a specific file called the passwd file.

In order to find the root account, simply provide your search text and the file you want to search in.

$ grep root /etc/passwd
root:x:0:0:root:/root:/bin/bash

Another very popular way of using the grep command is to look for a specific process on your Linux system.

Filtering Processes using grep

As explained in one of our previous tutorials, you have to use the “ps” command in order to list all the processes currently running on your system.

You can pipe the “ps” command with the “grep” command in order to filter the processes you are interested in.

$ ps aux | grep <process>

If you are interested in bash processes for example, you can type the following command

$ ps aux | grep bash

root      1230  0.0  0.0  23068  1640 tty1     S+   Jan11   0:00 -bash
user      2353  0.0  0.1  23340  5156 pts/0    Ss   03:32   0:00 -bash
user      2473  0.0  0.0  14856  1056 pts/0    S+   03:45   0:00 grep --color=auto bash
user      6685  0.0  0.0  23140  1688 pts/2    Ss+  Nov09   0:00 bash
Note : if you are not sure about how to use pipes on Linux, here’s a complete guide on input and output redirection.

Inspecting Linux Kernel logs with grep

Another great usage of the grep command is to inspect the Linux kernel ring buffer.

This is heavily used when performing troubleshooting operations on Linux systems because the kernel writes to its ring buffer when booting up and when hardware events occur.

Let’s say for example that you introduced a new disk into your system and you are not sure about the name given to this new disk.

In order to find out this information, you can use the “dmesg” command and pipe it to the grep command.

$ dmesg | grep -E "sd.{1}"

list-disks-grep

Grep Command Options

The grep command is very useful by itself but it is even more useful when used with options.

The grep command literally has a ton of different options.

The following sections will serve as a guide in order to use those options properly and examples will be given along the way.

Search specific string using grep

In some cases, you may be interested in finding a very specific string or text in a file.

In order to restrict the text search to a specific string, you have to use quotes before and after your search term.

$ grep "This is a specific text" .

To illustrate this option, let’s pretend that you are looking for a specific username on your system.

As your search term may contain spaces or other special characters, it is safer to enclose it in quotes.

$ grep "devconnected" /etc/passwd

grep-specific-text

Search text using regular expressions

One of the greatest features of the grep command is the ability to search for text using regular expressions.

Regular expressions are definitely a great tool to master : they allow users to search for text based on patterns, like text starting with specific letters or text that matches the shape of an email address.

Grep supports two kinds of regular expressions : basic and extended regular expressions.

Basic Regular Expressions (BRE)

The main difference between basic and extended regular expressions is that, with BRE (basic regular expressions), the extended operators (such as “?”, “+”, “{}”, “|” and “()”) have to be preceded by a backslash in order to keep their special meaning.

Most common regular expression patterns are detailed below with examples :

  • ^ symbol : also called the caret symbol, this little hat symbol is used in order to define the beginning of a line. As a consequence, any text after the caret symbol will be matched with lines starting by this text.

For example, in order to find all drives starting with “sd” (also called SCSI disks), you can use the caret symbol with grep.

$ lsblk | grep "^sb"

caret-symbol-linux

  • $ symbol : the dollar sign is the opposite of the caret symbol, it is used in order to define the end of the line. As a consequence, the pattern matching will stop right before the dollar sign. This is particularly useful when you want to target a specific term.

In order to see all users having bash shell on your system, you could type the following command

$ cat /etc/passwd | grep "bash$"

dollar

  • . (dot symbol) : the dot symbol is used to match one single character in a regular expression. This can be particularly handy when search terms contain the same letters at the beginning and at the end but not in the middle.

If for example, you have two users on your system, one named “bob” and one named “bab”, you could find both users by using the dot symbol.

$ cat /etc/passwd | grep "b.b"

dot-regex

  • [ ] (brackets symbol) : this symbol is used to match only a subset of characters. If you want to only match “a”, or “o”, or “e” characters, you would enclose them in brackets.

Back to the “bob” example, if you want to limit your search to “bob” and “bab”, you could type the following command

$ cat /etc/passwd | grep "b[ao]b"

brackets-regex

Using the symbols described before, it is possible to isolate single words in a file by combining the caret symbol with the dollar symbol.

$ grep "^word$" <file|path>

word-grep

Luckily for you, you don’t have to type those characters every time that you want to search for single word entries.

You can use the “-w” option instead.

$ grep -w <expression> <file|path>

search-word
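
For example, going back to the user examples from earlier, the following command would match a user named “bob” but not a user named “bobby” (assuming both exist on your system) :

$ grep -w "bob" /etc/passwd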

Extended Regular Expressions (ERE)

Extended regular expressions, as the name states, are regular expressions that use more complex operators in order to match strings.

You can use extended regular expressions in order to build an expression that is going to match an email address for example.

In order to find text in files using extended regular expressions, you have to use the “-E” option.

$ grep -E <expression> <path>

One great usage of the extended regular expressions is the ability to search for multiple search terms for example.

Searching multiple strings in a file

In order to search for multiple strings in a file, use the “-E” option and put your different search terms separated by pipe characters (“|”, which stands for the OR operator in regular expressions).

$ grep -E "text1|text2|text3" <path>

Back to our previous grep example, you could find the root account and the bob account using extended regular expressions.

$ grep -E "root|bob" /etc/passwd

regular-expression-or

Search for IP addresses using grep

In some cases, you may want to isolate IP addresses in a single file : using extended regular expressions is a great way of finding IP addresses easily.

Plenty of different websites provide ready-to-use regular expressions : we are going to use this one for IP addresses.

"\b([0-9]{1,3}\.){3}[0-9]{1,3}\b"

How would you read this regular expression?

An IP address is made of four numbers (each between one and three digits) separated by dots, and this is exactly what this regular expression describes.

([0-9]{1,3}\.){3}      = the first three numbers, each followed by a dot

[0-9]{1,3}             = the last number ending the IP address

Here is how you would search for IP addresses using the grep command

grep -E "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b" <file|path>

ip-address-regex

Search for URL addresses using grep

Similarly, it is entirely possible to search for URL addresses in a file if you are working with website administration on a daily basis.

Again, many websites provide regular expressions for URLs, but we are going to use this one.

grep -E '(http|https)://[^/"]+' <file|path>

url-regex (1)

Search for email addresses using grep

Finally, it is possible to search for email addresses using extended regular expressions.

To find email addresses, you are going to use the following regular expression

grep -E "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" <file|path>

grep-email

Now that you have seen how to use extended regular expressions with grep, let’s see how you can recursively find text in a file using options.

Find Text Recursively with grep

In order to find text recursively (meaning exploring every directory and its children) on Linux, you have to use “grep” with the “-r” option (for recursive); the capital “-R” variant does the same but also follows symbolic links.

$ grep -R <expression> <path>

For example, to search for all files containing the word “log” in the /var/log directory, you would type

$ grep -R "log$" /var/log

Using this command, it is very likely that you will see a lot of entries with permission denied.

In order to ignore those permission denied entries, redirect the error output of your command to /dev/null.

$ grep -R "log$" /var/log 2> /dev/null

recursive

In order to find text recursively, you can also use the “-d” option with the “recurse” action.

$ grep -d recurse "log$" /var/log

Searching text recursively can be pretty handy when you are trying to find specific configuration files on your system.

Printing line numbers with grep

As you can see in our previous examples, we were able to isolate lines matching the pattern specified.

However, if files contain thousands of lines, it would be painful to identify the file but not the line number of the match.

Luckily for you, the grep command has an option in order to print line numbers along with the different matches.

To display line numbers using grep, simply use the “-n” option.

$ grep -n <expression> <path>

Going back to our user list example, if we want to know on which line those entries are, we would type

$ grep -n -E "root|bob" /etc/passwd

line-numbers

Find text with grep using the case insensitive option

In some cases, you may not be sure if the text is written with uppercase or lowercase letters.

Luckily for you, the grep command has an option in order to search for text in files using a case insensitive option.

To search for text using the case insensitive option, simply use the “-i” option.

$ grep -i <expression> <path>

case-insensitive
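
For example, reusing the passwd file from the previous sections, the following command matches “root”, “Root” or “ROOT” :

$ grep -i "ROOT" /etc/passwd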

Exclude patterns from grep

Grep is used as a way to identify lines containing a specific string, but what if you wanted to do the exact opposite?

What if you wanted to find files not containing a specific string on your Linux system?

This is the whole purpose of the invert search option of grep.

To print the lines that do not contain a specific string, use “grep” with the “-v” option.

$ grep -v <expression> <file|path>

As a little example, let’s say that you have three files but two of them contain the word “log”.

In order to filter out the matching lines, you would have to perform an invert match with the “-v” option.

invert-grep
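
Note : if what you really want is the list of files that contain no match at all, the “-L” option (capital L) prints exactly that. A quick sketch, assuming the three files are named file1, file2 and file3 :

$ grep -L "log" file1 file2 file3

This would print only the name of the file that does not contain the word.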

Show filenames using grep

In some cases, you are not interested in finding text inside files, but only in their filenames.

In order to print only the names of the files containing a match, and not the matching lines themselves, use the “-l” option.

$ grep -l <expression> <path>

Using our previous example, we would not get the content of the file, but only the filename.

grep-l

Conclusion

In this tutorial, you learnt how you can easily find text in files on Linux.

You learnt that you can use many different options : basic regular expressions or more advanced (extended) regular expressions if you are looking to match IP addresses or email addresses for example.

You also discovered that you can perform inverse lookups in order to find files not matching a specific pattern on your system.

If you are interested in Linux system administration, we have a complete section dedicated to it on the website, so make sure to have a look!

How To Archive and Compress Files on Linux

As a system administrator, you may have downloaded some archives that you need to extract in order to reveal their files.

You may be also backing up an entire database made of a wide variety of small files that you want to aggregate in one single archive.

Archiving and compressing files are common operations in the Unix world, done by system administrators on a very regular basis.

Luckily for you, Linux exposes a set of different commands in order to archive, compress, uncompress and extract files from an archive.

In this tutorial, you will learn more about the tar command as well as the different compression methods that can be used in order to save space on Linux.

Ready?

Archive files on Linux using tar

Tar is a very popular command among system administrators.

Its archives, often referred to as tarballs, were historically used to write data to tape devices that did not have file systems at the time.

As a consequence, the tar command was introduced in 1979 in order to replace the “tp” program that was used back then.

Nowadays, the tar command is widely used in order to archive files (meaning putting files together in a single archive).

To archive files on Linux using tar, run “tar” with the “cvf” options.

$ tar -cvf archive.tar file1 file2 directory1 directory2

file1
file2
directory1/
directory2/

In this case, we used three different options :

  • -c : for create archive, a pretty self-explanatory option if you want to create a new archive made from the files selected;
  • -v : for verbose, this is the reason why the command displays the files added to the archive when executing it;
  • -f : for file, this option is used in order to specify the filename of the archive we want to create (in this case archive.tar)

Those options are probably the most important options for archiving files on Linux.

After running the tar command, a new archive is created in your current working directory.

$ ls -l
total 20
-rw-rw-r-- 1 schkn schkn 10240 Nov  9 10:41 archive.tar
drwxrwxr-x 2 schkn schkn  4096 Nov  9 10:41 directory1
drwxrwxr-x 2 schkn schkn  4096 Nov  9 10:41 directory2
-rw-rw-r-- 1 schkn schkn     0 Nov  9 10:41 file1
-rw-rw-r-- 1 schkn schkn     0 Nov  9 10:41 file2

As you can see, the size of the archive is bigger than the sum of the files in it.

Why?

Creating a tar archive does not simply put files and directories in a big box : an archive also contains file headers and padding blocks that may take a substantial amount of space.

As a consequence, your archive is way bigger than the sum of the files in it.

This is a very important fact because we are able to understand that archiving files does not mean that your files are compressed in it.

In order to compress files when archiving, you need to provide other options to the tar command.

File compression will be explained in the next chapters.

Extract files using tar on Linux

Now that you have created an archive file, you may want to extract the files located in your archive.

To extract files using the tar command, append the “-x” option instead of the initial “-c” option.

$ tar -xvf archive.tar

file1
file2
directory1/
directory2/

Note that extracting your files does not mean that the archive will be deleted from your current working directory.

$ ls -l

total 28
-rw-rw-r-- 1 schkn schkn 10240 Nov  9 12:01 archive.tar
drwxrwxr-x 2 schkn schkn  4096 Nov  9 10:41 directory1
drwxrwxr-x 2 schkn schkn  4096 Nov  9 10:41 directory2
-rw-rw-r-- 1 schkn schkn     0 Nov  9 12:00 file1
-rw-rw-r-- 1 schkn schkn     0 Nov  9 10:41 file2

When extracting files on Linux, there is a little gotcha that you need to be aware of.

If a file on the current working directory has the same name as a file inside the archive, the content of the file in the working directory will be replaced with the one from the archive.

In order to illustrate it, add some content to one of your files, extract the archive and inspect the content of your file again.

$ echo "Added some content to the file" > file1

$ tar -xvf archive.tar

$ cat file1
<empty>

Comparing local files with archive files

In order to prevent data from being erased during the process, the tar command can compare files located in your current working directory with files in your archive.

Back to the example we discussed earlier, let’s add some content back to the “file1” file.

$ echo "Added some content to the file" > file1

In order to compare files with tar, use the “-d” option.

$ tar -dvf archive.tar

file1
file1: Mod time differs
file1: Size differs
file2
directory1/
directory2/

As you can see, tar will compare timestamps and more specifically the latest modification date of the file.

If the modification date of the local file is more recent than the one from the archive file, the tar command will display a notice showing that the modification time differs.

Similarly, tar can inspect file sizes and highlight size differences between your files.

In order to avoid erasing your files, you can use the star command which is a great alternative to the existing tar command.

Prevent file overwriting using star

By default, the star utility might not be installed on your system.

In order to install the star utility, run your package manager (on Red Hat based distributions, that would be YUM).

$ sudo yum install star

Then, in order to archive files with star, simply run “star” with the “-c” option.

$ star -c -f=archive.tar file1 file2

Then, you can use the gzip utility in order to compress your new archive.

$ gzip archive.tar

As a consequence, the initial tar file will be transformed into a tar.gz archive.

Now if you were to create a file with the exact same name, the star utility would not overwrite it by default.

$ echo "This is some content" > file1

$ gzip -d archive.tar.gz

$ star -x -f=archive.tar
star: current 'file1' newer.
star: current 'file2' newer.
star: 1 blocks + 0 bytes (total of 10240 bytes = 10.00k).

$ cat file1
This is some content

Quite handy when you are afraid of losing your content!

Compressing files using gzip on Linux

Now that you have your tar archive ready, the next step is to compress it in order to reduce its size.

For that, we are first going to use the gzip utility.

By default, the gzip utility should be installed, but if this is not the case, make sure to install it depending on your distribution.

$ sudo apt-get install gzip

$ sudo yum install gzip

Now that gzip is installed, run “gzip” and pass the archive you just created as an argument.

$ gzip archive.tar

Running the gzip command will create a tar.gz file in the current working directory.

Most importantly, the initial tar file is replaced by the tar.gz file, so you won’t have the initial archive anymore.

$ ls -l
total 12
-rw-rw-r-- 1 schkn schkn  184 Nov  9 10:41 archive.tar.gz
drwxrwxr-x 2 schkn schkn 4096 Nov  9 10:41 directory1
drwxrwxr-x 2 schkn schkn 4096 Nov  9 10:41 directory2
-rw-rw-r-- 1 schkn schkn    0 Nov  9 10:41 file1
-rw-rw-r-- 1 schkn schkn    0 Nov  9 10:41 file2

As you can see, the file size was dramatically reduced from 10 KB to just 184 bytes : gzip reduced the file size by over 98%.

However, if you don’t want to use the gzip utility, you can also compress files using the tar command with options.

Do you think it can improve the compression rate?

Compressing files on Linux using tar

As mentioned in the first section, the tar command can be used in order to archive and compress files in one line.

In order to compress files with tar, simply add the “-z” option to your current set of options.

$ tar -cvzf archive1.tar.gz file1 file2 directory1 directory2

Similarly to the first tar command that you have run, a new compressed archive file will be created in your current working directory.

To inspect files created, simply run the “ls” command again.

$ ls -l
total 28
-rw-rw-r-- 1 schkn schkn   184 Nov  9 10:41 archive.tar.gz
-rw-rw-r-- 1 schkn schkn   172 Nov  9 11:10 archive1.tar.gz
drwxrwxr-x 2 schkn schkn  4096 Nov  9 10:41 directory1
drwxrwxr-x 2 schkn schkn  4096 Nov  9 10:41 directory2
-rw-rw-r-- 1 schkn schkn     0 Nov  9 10:41 file1
-rw-rw-r-- 1 schkn schkn     0 Nov  9 10:41 file2

Now as you can see, the compressed archive created is slightly lighter than the one created with gzip.

Compressing files using bzip2

Most of the time, the gzip command is used in order to compress files or archives.

However, this is not historically the only compression method available in software engineering : you can also use bzip2.

The main difference between gzip and bzip2 is in the fact the gzip uses the LZ77 compression algorithm while bzip2 uses the Burrows-Wheeler algorithm.

Bzip2 is known to be quite a bit slower than gzip, but it usually achieves better compression ratios, so it can be handy in some cases to know how to compress using bzip2.

To compress files using bzip2, simply run “bzip2” with the filename that you want to compress.

$ bzip2 archive.tar

In order to decompress files compressed using bzip2, simply append the “-d” option to your command.

$ bzip2 -d archive.tar.bz2

Alternatively, you can create bz2 archives using the tar command and by specifying the “-j” option.

$ tar -cjf archive.tar.bz2 file1 file2

Using tar, you have the option to compress using a wide range of different compression methods :

  • -j : compress a file using the bz2 compression method;
  • -J : uses the xz compression utility (see the example below);
  • –lzip : uses the lzip compression utility;
  • –lzma : uses the lzma compression utility;
  • –lzop : uses lzop to compress files;
  • -z : equivalent to using the gzip utility.
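
For example, to create an xz-compressed archive with the “-J” option, you could run the following (a quick sketch reusing the files from earlier) :

$ tar -cJf archive.tar.xz file1 file2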

Conclusion

In this tutorial, you learnt how you can archive and compress files using the tar utility on Linux.

You also learnt about the different compression methods available and how they can be used in order to reduce the size of your files and directories.

If you are curious about Linux system administration, we have a complete section dedicated to it on the website, so make sure to have a look.