-------------------------------------------------
Strings without NUL and Avoiding buffer overflows
-------------------------------------------------
-Ian! D. Allen - idallen@idallen.ca

Background:

Read the news.  Every week some Internet client or server software is
compromised by a "buffer overflow", where data is written past the end of
a buffer into adjacent memory and the resulting fault lets the attacker
take over the machine.

Internet-facing programs have to be robust and well-written.  An
Internet-visible server hands some amount of control of your machine to
anyone anywhere on the planet who wants to connect to it.  The slightest
programming error on your part will be used to take down your server or
compromise it so that it can be used to attack others.

My goal is to help you to write small but solid Internet client/server
programs that cannot be exploited by crackers.  That means zero tolerance
for memory errors and buffer overflows.

Handling strings that don't end in NUL '\0'
-------------------------------------------

If you use the low-level Unix system routines read() or recv() (or their
cover functions) the buffers you get back don't have NUL ('\0') bytes on
the end.  Instead, you get back the length of the data in the buffer.
This is good, since it means your programs can handle binary data that
might contain any byte, including NUL, safely.

Buffers without NUL bytes or that might contain embedded NUL bytes cannot
be used by any of the string handling library functions unless the
functions also take a "length" parameter.  This affects all of the string
handling routines, meaning you can *not* use such functions as strlen(),
strcpy(), or printf() with a plain %s format.  Most of the string routines
will keep looking through memory until they find a NUL byte, even if that
means causing your program to fault and die.
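
As a small illustration of working from the length instead of from a NUL,
here is a sketch of a helper that scans a length-counted buffer.  The name
count_byte() is hypothetical (it is not a library routine); it assumes the
caller passes the byte count that read() or recv() returned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count how many times "byte" occurs in the first
 * "len" bytes of "buf".  It uses memchr(), which takes a length, so it
 * never reads past buf+len and never needs a trailing NUL.  Embedded NUL
 * bytes are treated like any other byte. */
size_t count_byte(const char *buf, size_t len, int byte)
{
    size_t count = 0;
    const char *p = buf;
    const char *end = buf + len;

    assert(buf != NULL);            /* debug-build sanity check only */

    while (p < end) {
        /* Search only the bytes that remain between p and end. */
        const char *hit = memchr(p, byte, (size_t)(end - p));
        if (hit == NULL)
            break;                  /* no more matches in the buffer */
        count++;
        p = hit + 1;                /* resume just after the match */
    }
    return count;
}
```

A caller might do n = read(fd, buf, sizeof(buf)) and, only if n > 0, call
count_byte(buf, (size_t)n, '\n') to count the newlines that arrived.  The
buffer is never handed to strlen() or strchr().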

You can specify a printf/sprintf format string to pick off a length of
bytes where the NUL might be missing:

    printf("%.*s",len,buf);  /* the "*" picks up the current value of "len" */

The "len" argument picked up by the "*" must be of type int.

* Use strncpy() or memcpy() or memmove() instead of strcpy().
  (If the source has no NUL in its first n bytes, strncpy() does not
  add one; you must terminate the destination yourself.)
* Use memchr() instead of index() or strchr().

Handling buffers that contain binary data (embedded NULs)
---------------------------------------------------------

If you are processing non-ASCII or binary data, the data may contain
embedded NUL bytes that make it impossible to use any of the string
functions or standard I/O formatted output functions such as printf().
All these functions rely on strings ending at the first NUL byte, which is
not true for binary data.

Binary data can be read/written safely using the buffered standard I/O
functions fread() and fwrite().  You cannot use printf() or fprintf().

Binary data must be copied using memcpy() or memmove().  You cannot use
strcpy() or strcat().

Limit the number of bytes copied
--------------------------------

A big problem with sprintf() is that it doesn't have any way for you to
specify the size of the *output* buffer, and thus it isn't safe to use
unless the format string and all the rest of your surrounding code is safe
(and that's risky to assume).

This kind of sprintf() programming is wrong:

    char buf1[256];
    char buf2[256];
    ...
    len = sprintf(buf2, "%s: %s", somestring, buf1);  /* OVERFLOW */

You must make buf2 large enough to hold all of buf1 plus the length of
whatever stuff might be in "somestring", plus the few bytes between the
strings.  How big should that be?  You can't easily make *sure* that, now
and through all future code and format modifications:

    strlen(somestring)+strlen(buf1)+format < sizeof(buf2)

That makes sprintf() awkward to use safely - it doesn't know when to stop.
If "somestring" gets longer, or if you make the format string a bit
longer, it will overflow buf2.  Your code is waiting for a buffer overflow
to happen, now or in future.
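
To make the size arithmetic above concrete, here is a sketch that spells
it out for the "%s: %s" format.  The names needed_size(), would_fit(), and
SEP_LEN are hypothetical, and the sketch assumes both inputs really are
NUL-terminated strings.  Note how the constant is tied to the format
string: change the format and the check is silently wrong, which is
exactly the maintenance hazard described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SEP_LEN 2   /* the two literal bytes ": " in the "%s: %s" format */

/* Hypothetical helper: bytes that sprintf(buf2, "%s: %s", somestring, buf1)
 * would store, including the trailing NUL that sprintf() appends. */
size_t needed_size(const char *somestring, const char *buf1)
{
    assert(somestring != NULL && buf1 != NULL);  /* debug-build check only */
    return strlen(somestring) + SEP_LEN + strlen(buf1) + 1;
}

/* Hypothetical helper: 1 if the formatted result fits in an output buffer
 * of "bufsize" bytes, 0 if sprintf() would overflow it. */
int would_fit(size_t bufsize, const char *somestring, const char *buf1)
{
    return needed_size(somestring, buf1) <= bufsize;
}
```

With a 256-byte buf2, a full buf1 (255 characters plus NUL) already fails
this check for any "somestring", even an empty one, because of the two
separator bytes.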

You might tie the size of buf2 to the size of buf1, making it larger, to
preclude future maintenance problems (a very good idea, though not
sufficient):

    char buf1[256];
    char buf2[sizeof(buf1)+80];

...but how do you know that "+80" will always be enough?  You can't know.

For buffer safety (not overflowing the output buffer), you must replace
sprintf() with snprintf(), and check to make sure that all the data fit in
the given output buffer size.  (Read the man page on how you will know if
snprintf() truncated the output!  In C99, a return value greater than or
equal to the buffer size means the output was truncated.)

See also the C FAQ: http://c-faq.com/
or http://www.faqs.org/faqs/C-faq/faq/
Section 12.21 deals with the sprintf() problem and snprintf() solution.

To limit the amount of data moved by printf/sprintf, you can also replace
all your %s formats with %.*s formats, so you can limit how much data they
pick up.  You will still have to find a way to detect that %.*s didn't
read all the data, and you still have to make sure the output buffer will
hold the sum of all the data copied.

Programs that operate over the Internet *MUST NOT* allow buffer overflows.
They must not be written so that simple modifications to the code
(software maintenance) will trigger buffer overflows.

Summary
-------

Never use strcpy(), strcat(), sprintf(), index(), or strchr() on buffers
that might not contain a trailing NUL byte, or that might contain embedded
NUL bytes (binary data).  For Internet-facing programs, avoid these
functions completely.

* Use strncpy() or memcpy() or memmove() instead of strcpy().
* Use memchr() instead of index() or strchr().
* Use snprintf() and never sprintf().
* Use fgets() and never use gets().
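
As one possible way to apply the snprintf() rule, here is a sketch that
formats the same "%s: %s" pair and reports truncation.  The function name
format_pair() and its 0/-1 return convention are assumptions made for this
illustration; the check relies on the C99 behaviour where snprintf()
returns the length the full output would have had.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format "somestring: buf1" into out[outsize].
 * Returns 0 if everything fit, -1 on an output error or if the result
 * was truncated.  snprintf() never writes more than outsize bytes. */
int format_pair(char *out, size_t outsize,
                const char *somestring, const char *buf1)
{
    int n;

    assert(out != NULL);            /* debug-build sanity check only */

    n = snprintf(out, outsize, "%s: %s", somestring, buf1);
    if (n < 0)
        return -1;                  /* snprintf() reported an error */
    if ((size_t)n >= outsize)
        return -1;                  /* output did not fit: truncated */
    return 0;
}
```

A caller would write something like
"if (format_pair(buf2, sizeof(buf2), somestring, buf1) != 0)" and then
reject or log the request instead of using a truncated result.  Passing
sizeof(buf2) keeps the limit correct even if the array size changes later.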