Avoiding security holes when developing an application - Part 1

ArticleCategory:

Software Development

AuthorImage:

TranslationInfo:

Original in fr Frédéric Raynal, Christophe Blaess, Christophe Grenier

AboutTheAuthor:

Christophe Blaess is an independent aeronautics engineer. He is a Linux fan and does much of his work on this system. He is in charge of the coordination for the man pages translation published by the Linux Documentation Project.

Christophe Grenier is a 5th year student at the ESIEA, where he works as a sysadmin too. He has a passion for computer security.

Frédéric Raynal uses Linux, certified without (software or other) patents. Apart from that, you must see Dancer in the Dark : besides Björk who is great, this movie can't leave you unconcerned (I can't say more without unveiling the end, both tragic and splendid).

Abstract

This article is the first one in a series about the main security holes that can usually to appear within an application. Along these articles, we'll show the ways to avoid them by changing a little the development habits.

ArticleIllustration

ArticleBody

Introduction

It doesn't take more than two weeks before a major application, part of most Linux distributions, presents a security hole, allowing, for instance, a local user to become root. Despite the great quality of most of this software, ensuring the security of a program is a hard job : it must not allow a bad guy to benefit illegally from system resources. The availability of application source code is a good thing, much appreciated by programmers, but the smallest defect in a software becomes visible to everyone. Furthermore, the detection of such defects comes at random and people doing that sort of things do not always act with good intentions.

From the sysadmin side, daily work consists of reading the lists concerning security problems and updating immediately the involved packages. For a programmer it can be a good lesson to try out such security problems. Avoiding security holes from the beginning is preferred. We'll try to define some "classic" dangerous behaviors and provide solutions to reduce the risks. We won't talk about network security problems since they often come from configuration mistakes (dangerous cgi-bin scripts, ...) or from system bugs allowing DOS (Denial Of Service) type attacks to prevent a machine from listening to its own clients. These problems concern the sysadmin or the kernel developers, but the application programmer too, as soon as he takes into account external data. For instance, pine, acroread, netscape, access,... on some versions under some conditions allowed remote access or information leaks. As a matter of fact secure programming is everyone's concern.

This set of articles shows methods which can be used to damage an Unix system. We could only have mentioned them or said a few words about them, but we prefer open explanations to make people understand the risks. Thus, when debugging a program or developing your own, you'll be able to avoid or correct these mistakes. For each discussed hole, we will take the same approach. We'll start detailing the way it works. Next, we will show how to avoid it. For every example we will use security holes still presents in wide spread software.

This first article talks about the needed basics for the understanding of security holes, that is the privilege notion and the Set-UID or Set-GID bit. Next, we analyse the holes based on the system() function, since they are easier to understand.

We will often use small C programs to illustrate what we will be talking about. However, the approaches mentioned in these articles are applicable to other programming languages : perl, java, shell scripts... Some security holes depend on a language, but this is not true for all of them as we will see it with system().

Privileges

On a Unix system, users are not equals, neither are the applications. The access to the file system nodes - and accordingly the machine peripherals - relies on a strict identity control. Some users are allowed to do sensitive operations to maintain the system in good condition. A number called UID (User Identifier) allows the identification. To make things easier, a user name corresponds to this number, the association is done in the /etc/passwd file.

The root user, with default UID of 0, can access everything in the system. He can create, modify, remove every system node, but he can as well manage the physical configuration of the machine, mounting partitions, activating network interfaces and changing their configuration (IP address), or using system calls such as mlock() to act on physical memory, or sched_setscheduler() to change the order mechanism. In a future article we will study the Posix.1e features which allow to limit a bit the privileges of an application executed as root, but for now, let's assume the super-user can do everything on a machine.

The attacks we will mention are internal ones, that is an authorized user on a machine tries to get privileges he doesn't have. On the other hand, the network attacks are external ones, coming from people trying to connect to a machine where they are not allowed to. To get the privileges of another users means everything will be done under the name, the UID of that user, and not under the proper username. Of course, a cracker tries to get the root ID, but many others user accounts are of interest, either because they give access to system information (news, mail, lp...) or because they allow reading private data (mail, personal files, etc) or they can be used to hide illegal activities such as attacks towards other sites.

To use privileges reserved for another user, without being able to log under his identity, one must at least have the opportunity to talk to an application running under the victim's UID. When an application - a process - runs under Linux, it has a well defined identity. First, a program has an attribute called RUID (Real UID) corresponding to the user ID who launched it. This data is managed by the kernel and can usually not change. A second attribute completes this information : the EUID field (Effective UID) corresponding to the identity the kernel takes into account when managing the access rights (opening files, reserved system-calls).

To run an application with an Effective UID (its privileges) different from its Real UID (the user who launched it), it's executable file must have a specific bit called Set-UID. This bit is found in the file permission attribute (like user's execute, read, write bits, group members or others) and has the octal value of 4000. The Set-UID bit is represented with an s when displaying the rights with the ls command :

>> ls -l /bin/su
-rwsr-xr-x  1 root  root  14124 Aug 18  1999 /bin/su
>>

The command "find / -type f -perm +4000" displays a list of the system applications having their Set-UID bit set to 1. When the kernel runs an application with the Set-UID bit set to 1, it uses the owner's identity as EUID for the process. On the other hand, the RUID doesn't change and corresponds to the user who launched the program. Talking about /bin/su for instance, every user can have access to this command, but it runs under its owner's identity (root), accordingly having every privileges on the system. Needless to say one must be very careful when writing a program with this attribute.

Each process also has an Effective group ID, EGID, and a real identifier RGID. The Set-GID bit (2000 in octal) in the access rights of an executable file, asks the kernel to take the owner's group of the file as EGID and not the one of the group having launched the program. A curious combination sometimes appears, with the Set-GID set to 1 but without the group execute bit. As a matter of fact, it's a convention having nothing to do with privileges related to applications, but indicating the file can be blocked with the function fcntl(fd, F_SETLK, lock). Usually an application doesn't use the Set-GID bit, but it does happen sometimes, some games, for instance, use it to save the best scores into a system directory.

Type of attacks and potential targets

There are various types of attacks against a system. Today we study the mechanisms to execute a external command from within and application. This is usually a shell running under the identity of the owner of the application. A second type of attack relies on buffer overflow giving the attacker the possibility zo run personal code instructions. Last, the third main type of attack is based on race condition, lapse of time between two instructions in which a system component is changed (usually a file) while the application considers it unchanging.

The two first types of attacks often try to execute a shell with the application's owner privileges, while the third one is rather targeted to get write access to protected system files. The read access is sometimes considered as a system security weakness (personal files, emails, password file /etc/shadow, and pseudo kernel configuration files in /proc.

The targets of security attacks are mostly the programs having a Set-UID (or Set-GID) bit on. However, this also concerns every application running under a different ID than the one of its user. The system daemons represent a big part of these programs. A daemon is an application usually started at boot time, running in the background without any control terminal, and doing privileged work for any user. For instance, the lpd daemon allows every user to send documents to the printer, sendmail receives and redirects electronic mail, or apmd asks the Bios for the battery status of a laptop. Some daemons are in charge of communication with external users through the network (Ftp, Http, Telnet... services). A server called inetd manages the connection.

We can then conclude that a program can be attacked as soon as it talks - even briefly - to a user different from the one who started it. If the design of an application owns such a feature, you must be careful while developing and keep in mind the risks presented by the functions we will study here.

Changing privilege levels

When an application runs with an EUID different from its RUID, it's to provide its user with privileges he doesn't have (file access, reserved system calls...). However this is only needed punctually, for instance when opening a file, otherwise the application is able to cope with its user's privileges. It's possible to temporarily change an application EUID with the system-call :

  int seteuid (uid_t uid);

A process can always change its EUID value giving it the one of its RUID. In that case, the old UID is kept in a saved field called SUID (Saved UID) different from SID (Session ID) used for control terminal management. It's always possible to get the SUID back to use it as EUID. Of course, a program having a null EUID (root) can change at will both its EUID and RUID (it's the way /bin/su works).

To reduce the risks of attacks, it's suggested to change the EUID and use the RUID of the users instead. When a portion of code needs privileges corresponding to those of the file's owner, it's possible to put the Saved UID into the EUID. Here is an example :

  
  uid_t e_uid_initial;
  uid_t r_uid;
    
  int
  main (int argc, char * argv [])
  {
    /* Saves the different UIDs */
    e_uid_initial = geteuid ();
    r_uid = getuid ();

    /* limits access rights to the ones of the 
     * user launching the program */
    seteuid (r_uid);
    ...
    privileged_function ();
    ...
  }
  
  void
  privileged_function (void)
  {
    /* Gets initial privileges back */
    seteuid (e_uid_initial);
    ...
    /* Portion needing privileges */
    ...
    /* Back to the rights of the runner */
    seteuid (r_uid);
  }

This way to work is much more secure than the opposite one, too often seen, consisting in using the initial EUID and then temporarily reducing the privileges just before doing a "risky" operation. However this privilege reduction is useless against buffer-overflow attacks. As we'll see in a next article, these attacks intend to ask the application to execute personal instructions and can contain the system-calls needed to make the privilege level higher. Nevertheless, this approach protects from external commands and from most of the race conditions.

Running external commands

An application often needs to call an external system service. A well known example concerns the mail command to manage an electonic mail (running report, alarm, statistics, etc) without requiring a complex dialog with the mail system. The easiest solution is to use the library function :

  int system (const char * command)

Dangers of the system() function

This function is rather dangerous : it calls the shell to execute the command sent as an argument. The shell behavior depends on the choice of its user. A typical example comes from the PATH environment variable. Let's suppose an application calling the mail function. For instance, the following program sends its source code to the user who launched it :

/* system1.c */

#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
  if (system ("mail $USER < system1.c") != 0)
    perror ("system");
  return (0);
}

Let's say this program is Set-UID root :

>> cc system1.c -o system1
>> su
Password:
[root] chown root.root system1
[root] chmod +s system1
[root] exit
>> ls -l system1
-rwsrwsr-x  1 root  root  11831  Oct 16  17:25 system1
>>

To execute this program, the system runs a shell (with /bin/sh) and with the -c option, it tells it the instruction to invoke. Then the shell goes through the directory hierarchy according to the PATH environment variable to find an executable called mail. Then, the user only has to change this variable content before running the main application. For example :

  >> export PATH=.
  >> ./system1

tries to find the mail command within the current directory. Enough then, to create there an executable file (for instance, a script running a new shell) and to call it mail and the program is then executed with the main application owner's EUID! Here, our script runs /bin/sh. However, since it's executed with a redirected standard input (like the initial mail command), we must get it back in the terminal. We then create the script :

#! /bin/sh
# "mail" script running a shell
# getting its standard input back.
/bin/sh < /dev/tty

Here is the result :

>> export PATH="."
>> ./system1
bash# /usr/bin/whoami
  root
bash#

Of course, the first solution consists in giving the full path of the program, for instance /bin/mail. Then a new problem appears : the application relies on the system installation. If /bin/mail is usually available on every system, where is GhostScript, for instance? (is it in /usr/bin, /usr/share/bin, /usr/local/bin ?). On the other hand, another type of attack becomes possible with some old shells : the use of the environment variable IFS. The shell uses it to parse the words in the command line. This variable holds the separators. The defaults are the space, the tab and the return. If the user adds the slash /, the command "/bin/mail" is understood by the shell as "bin mail". An executable file called bin in the current directory can be executed just by setting PATH, as we have seen before, and allows to run this program with the application EUID.

Under Linux, the IFS environment variable is not a problem anymore since bash completes it with the default characters on startup (so does pdksh). But, with application portability in mind, you must be aware that some systems can be less secure with regards to this variable.

Some others environment variables may cause unexpected problems. For instance, the mail application allows the user to run a command while composing a message using an escape sequence "~!". If the user writes the string "~!command" at the beginning of the line, the command is run. The program /usr/bin/suidperl used to make perl scripts working Set-UID, when detecting a problem, calls /bin/mail to send a message to root. The application being Set-UID root, the call to /bin/mail is done under this identity. In the message sent to root, the name of the faulty file is present. An user can then create a file where the filename contains a carriage return followed by a ~!command sequence and another carriage return. If a perl script calling suidperl fails on a low-level problem related to this file, a message is sent under the root identity, containing the escape sequence from the mail application.

This problem shouldn't exist since the mail program is not supposed to accept escape sequences when run automatically (not from a terminal). Unfortunately, an undocumented feature of this application (probably left from debugging), allows the escape sequences as soon as the environment variable interactive is set. The result? A security hole easily exploitable (and widely exploited) in an application supposed to improve system security. The mistake is shared. First, /bin/mail holds an undocumented option especially dangerous since it allows code execution only checking the data sent, what should be a priori suspicious for a mail utility. Second, even if the /usr/bin/suidperl developers were not aware of the interactive variable, they shouldn't have left the execution environment as it was when calling an external command, especially when writing this program Set-UID root.

As a matter of fact, Linux ignores the Set-UID and Set-GID bit when executing scripts (read /usr/src/linux/fs/binfmt_script.c and /usr/src/linux/fs/exec.c). Some tricks allow to bypass this rule, like Perl does with its own scripts using /usr/bin/suidperl to take these bit into account.

Solutions

It isn't always so easy to find a replacement for the system() function. The first variant is to use system-calls such as execl() or execle(). However, it'll be quite different since the external program is not anymore called as a subroutine, but the invoked command replaces the current process. You must add a process duplication and parse the command line arguments. Thus the program :

  if (system ("/bin/lpr -Plisting stats.txt") != 0) {
    perror ("Printing");
    return (-1);
  }

becomes :

pid_t pid;
int   status;
  
if ((pid = fork()) < 0) {
  perror("fork");
  return (-1);
}
if (pid == 0) {
  /* child process */
  execl ("/bin/lpr", "lpr", "-Plisting", "stats.txt", NULL);
  perror ("execl");
  exit (-1);
}
/* father process */
waitpid (pid, & status, 0);
if ((! WIFEXITED (status)) || (WEXITSTATUS (status) != 0)) {
  perror ("Printing");
  return (-1);
}

Obviously, the code gets heavier! In some situations, it becomes quite complex, for instance, when you must redirect the application standard input such as in :

system ("mail root < stat.txt");

That is, the redirection defined by < is done from the shell. You can do the same, using a complex work with sequences such as fork(), open(), dup2(), execl(), etc. In that case, an acceptable solution would be using the system() function, but configuring the whole environment.

Under Linux, the environment variables are stored in the form of a pointer to a table of characters : char ** environ. This table ends with NULL. The strings are of the form "NAME=value".

We start removing the environment using the Gnu extension :

    int clearenv (void);

or forcing the pointer

    extern char ** environ;

to take the NULL value. Next the important environment variables are initialized, using controlled values, with the functions :

    int setenv (const char * name, const char * value, int remove)
    int putenv(const char *string)

before calling the system() function. For example :

    clearenv ();
    setenv ("PATH", "/bin:/usr/bin:/usr/local/bin", 1);
    setenv ("IFS", " \t\n", 1);
    system ("mail root < /tmp/msg.txt");

If needed, you can get the content of some useful variables back before removing the environment (HOME, LANG, TERM, TZ,etc.). The content, the form, the size of these variables must be strictly checked. It is important that you remove the whole environment before redefining the needed variables. The suidperl security hole wouldn't have appeared if the environment would have been properly removed.

Analogus, protecting a machine on a network first implies denying every connection. Next, the required or useful services are activated. In the same way, when programming a Set-UID application, the environment must be cleared and then filled with required variables.

Verifying a paramater format is done by comparing the expected value to the allowed formats. If the comparison succeeds the parameter is validated. Otherwise, it is rejected. If you run the test using a list of invalid expressions of the format, the risk of leaving a malformed value increases and that can be a disaster for the system.

We must understand what is dangerous with system() is as well dangerous too for some derived functions such as popen(), or with system-calls such as execlp() or execvp() taking into account the PATH variable.

Commands indirect execution

To improve a programs ergonomy, it's easy to leave the user the ability of configuring most of the software behavior, using macros for instance. To manage variables or generic patterns as the shell does, there is a powerful function called wordexp(). You must be very careful with it, since sending a string like $(commande) allows executing the mentioned external command. Enough to give it the string "$(/bin/sh)" to get a Set-UID shell. To avoid such a thing, wordexp() has an attribute called WRDE_NOCMD deactivating the $( ) sequences interpretation.

When invoking external commands you must be careful with not calling an utility providing an escape mechanism towards a shell (like the vi :!command sequences for instance). It's difficult to list them all, some applications are obvious (text editors, file managers...) others are harder to detect (as we have seen with /bin/mail) or have dangerous debugging modes.

Conclusion

This article illustrates various aspects :

Everything external to a Set-UID root program must be validated! This concerns as well the environment variables as the parameters given to the program (command line, configuration file...);
Privileges have to be reduced as soon as the program starts and should only be increased very briefly when there are no other means;
The "depth of security" is essential : every protection decision reduces the number of people who to break them.

The next article will talk about memory, its organization, function calls... before reaching the buffer overflows. We also will see how to buid a shellcode.