-------------------------------------------------
Strings without NUL and Avoiding buffer overflows
-------------------------------------------------
-Ian! D. Allen - idallen@idallen.ca

Background:

Read the news: Every week some Internet client or server software is
compromised by a "buffer overflow", where data is written off into
unallocated memory and the resulting program fault lets the attacker take
over the machine.

Internet-facing programs have to be robust and well-written. An
Internet-visible server hands some amount of control of your machine to
anyone anywhere on the planet who wants to connect to it. The slightest
programming error on your part will be used to take down your server or
compromise it so that it can be used to attack others.

My goal is to help you to write small but solid Internet client/server
programs that cannot be exploited by crackers. That means zero tolerance
for memory errors and buffer overflows.

Handling strings that don't end in NUL '\0'
-------------------------------------------

If you use the low-level Unix system routines read() or recv() (or their
cover functions) the buffers you get back don't have NUL ('\0') bytes on
the end. Instead, you get back the length of the data in the buffer.
This is good, since it means your programs can handle binary data that
might contain any byte, including NUL, safely. You can ensure that your
buffers are large enough to handle the number of bytes read.

Data without NUL bytes or that might contain embedded NUL bytes cannot be
used by any of the string handling library functions that expect NUL bytes
on the end of strings: strcpy, strcmp, index, strchr, printf, etc.

This affects all of the string handling routines, meaning you can *not*
use such functions as strlen(), strcpy(), or printf() on binary data or
on data that might be missing the trailing NUL byte.
The string routines will keep looking through memory until they find a
NUL byte; so, if you are handling data that may not terminate with a NUL
byte, your program may read garbage past the end of the buffer, or fault
and die due to a memory violation. This may give a remote attacker
control of your program and your machine.

If you aren't handling binary data (no embedded NUL bytes), but your data
is missing the NUL byte at the end, here are some things you can do:

* Specify a printf/sprintf format string length parameter to pick off a
  length of bytes where the NUL might be missing:
      printf("%.*s",len,buf);  /* the "*" picks up the current value of the int "len" */
* Use strncpy() or memcpy() or memmove() instead of strcpy().
  Remember that strncpy() does not add a NUL when the source is at least
  as long as the count you give it.
  (Never use the old deprecated bcopy() or bzero() functions.)
* Use memchr() instead of index() or strchr().

Handling buffers that contain binary data (embedded NULs)
---------------------------------------------------------

If you are processing non-ASCII or binary data, the data may contain
embedded NUL bytes that make it impossible to use any of the string
functions, such as strlen(), or the standard I/O formatted output
functions, such as printf(). All these functions rely on strings ending
at the first NUL byte, which is not true for binary data.

Binary data can be read/written safely using the low-level Unix read()
and write() system calls, and by the buffered standard I/O functions
fread() and fwrite(). You cannot use printf() or fprintf() to write
binary data.

Binary data must be copied using memcpy() or memmove(). You cannot use
strncpy() or strncat().

Limit the number of bytes copied
--------------------------------

A big problem with functions such as sprintf() or strcpy() is that they
don't have any way for you to specify the size of the *output* buffer,
and thus they aren't safe to use unless the format string, input data,
and all the rest of your surrounding code is safe (and that's risky to
assume).

This kind of sprintf() programming is wrong:

    char buf1[256];
    char buf2[256];
    ...
    len = sprintf(buf2, "%s: %s", somestring, buf1);  /* OVERFLOW */

In the above incorrect code, you must make buf2 large enough to hold all
of buf1 plus the length of whatever stuff might be in "somestring", plus
the few bytes between the strings. How big should that be? You can't
easily make *sure* that the following holds, now and through all future
code and format modifications:

    strlen(somestring)+strlen(buf1)+format < sizeof(buf2)

That makes sprintf() awkward to use safely - it doesn't know when to stop.
If "somestring" gets longer, or if you make the format string a bit
longer, it will overflow buf2. Your code is waiting for a buffer overflow
to happen, now or in future. You cannot risk this in Internet-facing code.

You might tie the size of buf2 to the size of buf1, making it larger, to
preclude future maintenance problems (a very good idea, though not
sufficient):

    char buf1[256];
    char buf2[sizeof(buf1)+80];

...but how do you know that "+80" will always be enough? You can't know.

For buffer safety (not overflowing the output buffer), you must replace
sprintf() with snprintf(), and check to make sure that all the data fit
in the given output buffer size. (Read the man page on how you will know
if snprintf() truncated the output!)

See also the C FAQ: http://c-faq.com/
or http://www.faqs.org/faqs/C-faq/faq/
Section 12.21 deals with the sprintf() problem and snprintf() solution.

To limit the amount of data moved by printf/sprintf, you can also replace
all your %s formats with %.*s formats, so you can limit how much data
they pick up. You will still have to find a way to detect that %.*s
didn't read all the data, and you still have to make sure the output
buffer will hold the sum of all the data copied.

Programs that operate over the Internet *MUST NOT* allow buffer overflows.
Internet-facing programs must not be written so that simple modifications
to the code (software maintenance) will trigger buffer overflows.
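
The snprintf() truncation check described above can be sketched as
follows. This is only a minimal illustration, not a finished routine:
the helper name format_pair() and its return convention are made up for
this example. It assumes a C99-style snprintf() that returns the number
of bytes it wanted to write; some older C libraries returned a negative
value on truncation instead, and the check below treats both cases as
failure.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from any library): format "label: text"
 * into out[], which holds outsize bytes.
 * Returns 0 if everything fit, -1 if the output was truncated or
 * snprintf() reported an error.
 */
static int format_pair(char *out, size_t outsize,
                       const char *label, const char *text)
{
    int n;

    assert(out != NULL && outsize > 0);

    /* snprintf() never writes more than outsize bytes, NUL included */
    n = snprintf(out, outsize, "%s: %s", label, text);

    /* n < 0: error (or truncation on some old libraries)
     * n >= outsize: the full result did not fit and was cut short */
    if (n < 0 || (size_t)n >= outsize)
        return -1;
    return 0;
}
```

A caller would pass the real buffer size, for example
format_pair(buf2, sizeof(buf2), somestring, buf1), and reject the input
when the helper returns -1 instead of carrying on with truncated data.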

Summary
-------

Never use strcpy(), strcat(), sprintf(), index(), or strchr() on buffers
that might not contain a trailing NUL byte, or that might contain embedded
NUL bytes (binary data). For Internet-facing programs, avoid these
functions completely.

* Use strncpy() or memcpy() or memmove() instead of strcpy().
* Use memchr() instead of index() or strchr().
* Use snprintf() and never sprintf().
* Use fgets() and never use gets().
* Stop using the deprecated bzero() and bcopy().
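
As a rough sketch of the length-based replacements listed in this
summary, the helpers below work on a buffer plus a byte count, the way
read() or recv() hands data back, and never assume a trailing NUL. The
helper names first_line_len(), copy_counted() and print_counted() are
invented for this example. The printing helper is only suitable for text
without embedded NUL bytes, since %.*s still stops at the first NUL.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: length of the first line in buf[0..len),
 * not counting the newline; returns len if no newline is present.
 * memchr() is given the length, so it never looks past the data. */
static size_t first_line_len(const char *buf, size_t len)
{
    const char *nl = memchr(buf, '\n', len);
    return nl ? (size_t)(nl - buf) : len;
}

/* Hypothetical helper: copy len bytes from buf into out[] and add
 * the missing NUL. Returns 0 on success, -1 if len bytes plus the
 * NUL would not fit in outsize bytes. */
static int copy_counted(char *out, size_t outsize,
                        const char *buf, size_t len)
{
    assert(out != NULL && buf != NULL);
    if (len >= outsize)
        return -1;
    memcpy(out, buf, len);
    out[len] = '\0';
    return 0;
}

/* Hypothetical helper: print at most len bytes of buf and a newline.
 * The precision for %.*s must be an int, so oversize lengths are
 * refused. Returns the fprintf() result, or -1 if len is too large. */
static int print_counted(FILE *fp, const char *buf, size_t len)
{
    if (len > INT_MAX)
        return -1;
    return fprintf(fp, "%.*s\n", (int)len, buf);
}
```

With data that arrived from read() as "buf" and "nread" bytes, you might
call first_line_len(buf, nread) to find the end of the first line and
then copy_counted() to turn that line into an ordinary NUL-terminated
string, checking the return value before using the result.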