Searching for and finding files by name, size, userid, modify time, etc.

Ian! D. Allen – www.idallen.com

Fall 2016 - September to December 2016 - Updated 2018-10-10 00:34 EDT

1 Searching for and finding files

How can we look for a file name?
What if we don’t know which directory that file is located in?
Can we start an fgrep at the root and ask the fgrep command to look in all subdirectories?

People confuse fgrep, which looks for text inside files (and doesn’t look at the file names), with find, which finds files by name (and doesn’t look inside the files).

The fgrep (and grep and egrep) commands look for patterns or text inside files whose names you already know. They aren’t directly useful for finding and generating the names of those pathnames. (To use fgrep to find the full name of a pathname, you would first have to use some other command to generate a list of all pathnames and feed that to fgrep in a pipeline as standard input – see the example below.)

You can tell fgrep to search the contents of an entire directory tree of files by turning on its “recursive” option (-r), but that won’t help you find the name of a pathname in that directory. The fgrep command searches content inside files; it doesn’t find the names of files.
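Here is a minimal sketch of the difference. It assumes GNU grep and GNU find and a throwaway directory from mktemp. The file names todo.txt and passwd.txt are made up for the demo. fgrep is the same as grep -F, so the sketch spells it grep -F.

- A recursive content search reports the file whose text mentions passwd.
- find -name and the find | grep -F pipeline report the file whose name contains passwd.

```shell
#!/bin/sh
# Throwaway directory tree; the file names are hypothetical, for illustration only.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/notes"
printf 'remember the passwd file\n' > "$dir/notes/todo.txt"  # CONTENT mentions passwd
printf 'nothing here\n' > "$dir/notes/passwd.txt"           # NAME contains passwd

# Recursive content search (fgrep -r is grep -F -r): lists files whose
# contents contain the text "passwd".
by_content=$(grep -F -r -l 'passwd' "$dir")

# find searches by basename, never looking inside the files.
by_name=$(find "$dir" -name '*passwd*')

# Same idea with a pipeline: generate every pathname, then filter the names.
by_pipe=$(find "$dir" | grep -F 'passwd')

echo "content match: $by_content"
echo "name match:    $by_name"
echo "pipe match:    $by_pipe"
```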

To find pathnames by name you can use the find command. If the pathname has existed for some time and is saved in the right database, you might be able to use the faster locate or slocate commands to search by name.

To find pathnames by anything other than name, e.g. size, or owner, or modify date (etc.), use the find command with the right expression. See below.

2 Five common ways to use the find command

The Usage line for find given below is abbreviated from man find:

Usage: find [options...] [startdir...] [expression]

The [startdir...] is an optional list of starting directories in which find will do the search, instead of using the current directory.

You rarely need to use any options, so the first thing following the find command name is usually the one directory or list of directories in which to look. The current directory is the default.

The find command has its own huge set of expressions for finding pathnames precisely and efficiently. The expressions follow the starting directories on the command line.

Below are five important uses of find, each explained in detail. You can find:

  1. all pathnames under a list of starting directories
  2. only pathnames containing a particular basename pattern
  3. only pathnames owned by a particular userid
  4. only pathnames modified within some number of days
  5. only pathnames with a size greater than some number
1.  find  [startdir...]                    -print
2.  find  [startdir...]  -name 'basename'  -print
3.  find  [startdir...]  -user 'userid'    -print
4.  find  [startdir...]  -mtime -30        -print
5.  find  [startdir...]  -size +100M       -print

The optional startdir... list in which to search comes first, followed by the optional expression that says what to find. If you don’t specify any startdir, find uses . (the current directory).

The optional expression must follow the starting directories. The expression limits the pathnames that are found or changes the output format. It consists of keywords, each preceded by a dash and usually followed by some argument, e.g. -name 'basename', -size +100M, -print, or -ls.

  1. Without any expression, find finds all pathnames. With modern versions of find, you can omit the -print keyword; it’s the default behaviour.

  2. The -name expression allows you to give a matching pattern that is the basename, found in any directory, starting from each of the starting directories. The basename patterns can include shell-GLOB-style path metacharacters such as * and ?, and the patterns must be quoted to protect them from GLOB expansion by the shell, e.g. find -name '*.txt'

  3. The -user expression allows you to give a userid that must be the owner of the pathnames found, e.g. find -user 'root'

  4. The -mtime expression allows you to give a modify time expression. Using +10 means “older than 10 days” and -5 means “younger than 5 days”, etc., e.g. find -mtime +365

  5. The -size expression allows you to give a size number, to match pathnames based on their rounded-up size. The size can have various size multipliers such as M for “MegaByte”, and the actual size of the file is rounded up to that multiplier before comparing. A leading minus on the number means “less than” and a leading plus on the number means “greater than” the given rounded size. The expression -size -100k means “rounded-up size less than 100 kilobytes”. Using -size 0 is a useful expression, to find pathnames that are empty (zero size), and -size +0 finds pathnames that are not empty (size greater than zero).

    Note that the size rounding means that -size -1M only matches zero-size files, because even a one-byte file rounds up to 1M and doesn’t match! If you really want to see all files smaller than 1M, you have to avoid the rounding by giving the size in bytes (the c suffix), which is 1024x1024 = 1048576 bytes: -size -1048576c
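You can check the rounding rules above in a scratch directory. This is a sketch that assumes GNU find. The file names empty and one-byte are hypothetical. The sketch uses touch -t to back-date one file to January 1, 2000, so that -mtime has something old to match.

```shell
#!/bin/sh
# Scratch directory; the file names are hypothetical.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
: > "$dir/empty"                        # zero bytes
printf 'x' > "$dir/one-byte"            # exactly one byte
touch -t 200001010000 "$dir/one-byte"   # back-date its modify time to 2000-01-01

# One byte rounds UP to 1M, and 1M is not "less than 1M": only the empty file matches.
small_rounded=$(find "$dir" -type f -size -1M)
# The c suffix counts bytes, so there is no rounding surprise: both files match.
small_bytes=$(find "$dir" -type f -size -1048576c | LC_ALL=C sort)
# Empty versus non-empty.
empty=$(find "$dir" -type f -size 0)
nonempty=$(find "$dir" -type f -size +0)
# Modify time: older than 365 days versus younger than 5 days.
old=$(find "$dir" -type f -mtime +365)
recent=$(find "$dir" -type f -mtime -5)

printf 'size -1M:         %s\n' "$small_rounded"
printf 'size -1048576c:\n%s\n' "$small_bytes"
printf 'size 0: %s   size +0: %s\n' "$empty" "$nonempty"
printf 'mtime +365: %s   mtime -5: %s\n' "$old" "$recent"
```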

See the man page for more help, and search the net for examples. For example, the -type f and -type d expressions are useful for finding only regular files or only directories:

$ find . -type f
$ find . -type d

The find command has expressions that can find pathnames based on any combination of any of the attributes you see in the output of ls -dils.

3 Using multiple expressions

You can use multiple expressions, and the pathnames found must meet all the conditions of the expressions used, e.g.

$ find /bin /etc/ -name '*word' -user 'root' -size +1k

With more syntax, you can also have find show pathnames that match one expression or another expression, or any Boolean combination of expressions. See the man page.
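Here is a small sketch of combining expressions. It assumes GNU find and a scratch directory with made-up file names. The rules it shows are:

- Adjacent expressions are joined by an implicit AND.
- -o means OR.
- ! means NOT.
- Parentheses must be escaped (or quoted) so that the shell passes them to find unchanged.

```shell
#!/bin/sh
# Scratch directory; all file names are hypothetical.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir "$dir/sub"
: > "$dir/a.conf"                # empty
: > "$dir/b.cfg"                 # empty
printf 'data' > "$dir/c.conf"    # not empty
: > "$dir/sub/d.txt"             # empty

# Implicit AND: name ends in .conf AND size is zero.
and_result=$(find "$dir" -name '*.conf' -size 0 | LC_ALL=C sort)

# OR inside escaped parentheses, ANDed with -type f.
or_result=$(find "$dir" -type f \( -name '*.conf' -o -name '*.cfg' \) | LC_ALL=C sort)

# NOT: regular files whose names do not end in .conf.
not_result=$(find "$dir" -type f ! -name '*.conf' | LC_ALL=C sort)

printf 'AND:\n%s\nOR:\n%s\nNOT:\n%s\n' "$and_result" "$or_result" "$not_result"
```

AND binds more tightly than -o. Without the parentheses, -type f -name '*.conf' -o -name '*.cfg' would be read as (-type f -name '*.conf') -o -name '*.cfg', and the -type f test would then apply only to the first name.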

4 Showing detailed output using -ls

The find command can output detailed attribute information about the pathnames it displays using the -ls expression instead of using the default -print expression:

$ find . -ls

The detailed output is similar to the output you would get if you typed ls -dils for the displayed names:

$ find /etc/passwd -ls
2101779 4 -rw-r--r-- 1 root root 2879 Oct 4 10:59 /etc/passwd

$ ls -dils /etc/passwd
2101779 4 -rw-r--r-- 1 root root 2879 Oct 4 10:59 /etc/passwd

This is the expression to use if you want to display attribute information about the pathnames as well as the names.
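One way to convince yourself that -ls and ls -dils agree is to compare a few columns for the same file. This sketch assumes GNU find, whose -ls output puts the inode number in column 1 and the size in bytes in column 7, as ls -dils does. It uses a hypothetical scratch file named sample.

```shell
#!/bin/sh
# Scratch file; the name "sample" is hypothetical.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'hello\n' > "$dir/sample"     # 6 bytes

# Column 1 is the inode number and column 7 the size in bytes in both listings.
find_inode=$(find "$dir/sample" -ls | awk '{print $1}')
ls_inode=$(ls -dils "$dir/sample" | awk '{print $1}')
find_size=$(find "$dir/sample" -ls | awk '{print $7}')
ls_size=$(ls -dils "$dir/sample" | awk '{print $7}')
# The last column is the pathname.
find_name=$(find "$dir/sample" -ls | awk '{print $NF}')

echo "inode: find=$find_inode ls=$ls_inode"
echo "size:  find=$find_size ls=$ls_size"
echo "name:  $find_name"
```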

5 Examples of uses of find, including World-Writable

You can try these examples. Some will produce error messages as well as pathnames, since you don’t have permission to search all the system directories. Ignore the errors (or redirect standard error to /dev/null); look at the results:

$ find /bin -name '*sh'
/bin/bash
/bin/dash
/bin/static-sh
/bin/sh
/bin/rbas

$ find /bin -type f -size +500k
/bin/bash
/bin/busybox

$ find /tmp -maxdepth 1 -user root -type d
/tmp
/tmp/.X11-unix
/tmp/.ICE-unix
/tmp/ssh-mZgPJ11302
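One common way to find world-writable pathnames is the -perm expression with a leading minus. The leading minus means every listed permission bit must be set, and 0002 is the "other: write" bit. This is a sketch that assumes GNU find and uses hypothetical scratch files named private and shared.

```shell
#!/bin/sh
# Scratch files; the names are hypothetical.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
: > "$dir/private"
: > "$dir/shared"
chmod 644 "$dir/private"     # only the owner may write
chmod 666 "$dir/shared"      # world-writable on purpose

# -perm -0002: match only if the other-write bit is set.
ww=$(find "$dir" -type f -perm -0002)
echo "world-writable: $ww"
```

On a real system, the same test applied to directories lists the directories that anyone may write into. For example, find /tmp -maxdepth 1 -type d -perm -0002 prints /tmp itself.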

In all the examples above, the default -print action has been left off; modern versions of find supply it for you.

If you use a pattern in the -name expression, remember to quote the pattern to protect any GLOB pattern characters from expansion by the shell! The pattern must reach find unexpanded, so that find, not the shell, does the GLOB matching.
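The quoting rule above is easiest to see when a matching name exists in the current directory. This sketch assumes a scratch directory with the made-up files a.txt and sub/b.txt. It runs the same search with and without quotes.

```shell
#!/bin/sh
# Scratch tree; the file names are hypothetical.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir "$dir/sub"
: > "$dir/a.txt"
: > "$dir/sub/b.txt"
cd "$dir" || exit 1

# Quoted: find receives the pattern *.txt and matches it in every directory.
quoted=$(find . -name '*.txt' | LC_ALL=C sort)

# Unquoted: the shell expands *.txt to a.txt (a name in the CURRENT directory)
# before find runs, so find only looks for the literal basename a.txt.
unquoted=$(find . -name *.txt | LC_ALL=C sort)

printf 'quoted:\n%s\nunquoted:\n%s\n' "$quoted" "$unquoted"
```

If two or more .txt files were in the current directory, the unquoted pattern would expand to several words. find would then usually reject the command line with an error, instead of silently finding too little.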

6 Using fgrep on the output of find

While find has powerful pattern matching expressions, some people prefer to pipe the pathname output of find into one of the grep family of text searching programs, because they know grep regular expression pattern matching better.

You can generate a list of all pathnames under a given directory using the find command and then use grep on that piped output, e.g.

$ find
... all pathnames under the current directory list here ...

$ find "$HOME"
... all pathnames under your HOME directory list here ...

$ find /bin | wc -l                    # count pathnames under /bin
107

$ find /bin | fgrep 'sh'               # only pathnames containing 'sh'
/bin/bash
/bin/dash
/bin/static-sh
/bin/sh
/bin/rbash
/bin/sh.distrib

$ find /bin | grep 'sh$'               # only pathnames ending in 'sh'
/bin/bash
/bin/dash
/bin/static-sh
/bin/sh
/bin/rbash

The grep program uses a pattern matching language called Regular Expressions, which is similar to GLOB patterns but more powerful. Don’t use grep to look for text until you become familiar with this pattern matching language; use the safer fgrep command instead.
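Here is a sketch of why fgrep is the safer choice. It assumes GNU grep and GNU find and two hypothetical scratch files, a.sh and axsh. In a Regular Expression the dot matches any single character, so grep finds both names. fgrep (spelled grep -F here) treats the dot literally.

```shell
#!/bin/sh
# Scratch files; the names are hypothetical.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
: > "$dir/a.sh"
: > "$dir/axsh"

# Regular Expression: "." matches ANY character, "$" anchors the end of the line.
regex=$(find "$dir" -type f | grep 'a.sh$' | LC_ALL=C sort)

# Fixed string (fgrep): "." is just a dot.
literal=$(find "$dir" -type f | grep -F '/a.sh')

printf 'grep:\n%s\nfgrep:\n%s\n' "$regex" "$literal"
```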

6.1 NOT running find on the ROOT directory

Yes, you can do find / (find, starting at the top-most ROOT directory) and it will generate a list of all the pathnames on the whole machine that you have permissions to see – tens of thousands of them. This will take a long time. (You will also see many error messages about permissions, since your userid does not have permissions to look in every directory on the whole system.) Don’t run find / on a shared computer unless you really have to. (But feel free to try it on your own machine!)
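When you do search the whole machine, the usual form is find / -name 'passwd' 2>/dev/null, which throws away the permission errors. This sketch shows the same redirection on a small scratch tree instead. It assumes a directory made unreadable with chmod 000, and the names open, locked, and target.txt are hypothetical. When run as root, the permission error does not appear at all.

```shell
#!/bin/sh
# Scratch tree; the directory and file names are hypothetical.
dir=$(mktemp -d)
# Restore permissions before cleanup so rm can remove the locked directory.
trap 'chmod 755 "$dir/locked" 2>/dev/null; rm -rf "$dir"' EXIT
mkdir "$dir/open" "$dir/locked"
: > "$dir/open/target.txt"
chmod 000 "$dir/locked"     # non-root users now get "Permission denied" here

# Standard error (the permission complaints) goes to /dev/null;
# standard output (the pathnames found) is kept.
found=$(find "$dir" -name 'target.txt' 2>/dev/null)
echo "found: $found"
```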

7 Finding files using the locate or slocate commands

Many Unix systems run a nightly or weekly find / late at night and save the results in a small database. The locate or slocate commands can search that saved database much faster than find can search the file system, even using a file GLOB pattern, e.g.

$ locate passwd | less
... see all the names containing the string "passwd" here ...

$ locate '/etc/*passwd*'
/etc/passwd
/etc/passwd-
/etc/cron.daily/passwd
/etc/dovecot/conf.d/auth-passwdfile.conf.ext
/etc/init/passwd.conf
/etc/init.d/passwd
/etc/pam.d/chpasswd
/etc/pam.d/passwd
/etc/security/opasswd

Note the use of quotes to stop the shell from interpreting the GLOB pattern. (We want the locate command to process the GLOB pattern against the pathnames in the database; we do not want the shell to process the GLOB pattern against current pathnames in the file system before it calls the locate command.)

If you are looking for a pathname that has been around for a while and is entered into the database, the locate database lookup is much, much faster than a huge ROOT find /. If you are looking for a new pathname that isn’t in the locate database yet, only find will find it for you.

Author: 
| Ian! D. Allen, BA, MMath  -  idallen@idallen.ca  -  Ottawa, Ontario, Canada
| Home Page: http://idallen.com/   Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom:  http://eff.org/  and have fun:  http://fools.ca/


Creative Commons by-nc-sa 3.0