% Using find -exec or xargs to process pathnames with other commands
% Ian! D. Allen -- [www.idallen.com]
% Winter 2016 - January to April 2016 - Updated 2019-01-06 04:26 EST

- [Course Home Page]
- [Course Outline]
- [All Weeks]
- [Plain Text]

Using the pathnames found by `find`
===================================

-----------------------------------------------------------------------------

**This is optional material for CST8207**

-----------------------------------------------------------------------------

The Problem:

> The `find` command is showing me pathnames. I could use the mouse to
> copy-and-paste these pathnames into many `cp` commands, but surely there
> must be a way to automate this? Can the `cp` command select file names the
> same way that `find` can?

The idea of Unix/Linux is that every command does one thing well, so they
don't put features of `find` into `cp`. You use `find` to generate the names
and you use `cp` to copy the names. The trick is getting the names generated
by `find` to be used by `cp`.

For an introductory assignment, I don't expect more knowledge than copy and
paste using your mouse, but that's not how a real sysadmin would do it. Here
are some optional hints on how a real sysadmin would get the pathnames copied
without using a mouse or copy-and-paste.

Method One -- `find -exec`
--------------------------

The designers of the `find` command built in a mechanism to run a command
using the pathnames that `find` finds. It's the `-exec` option. Go read
`man find` and look at how `-exec` works. The `EXAMPLES` section of the man
page has one such example (along with lots of other uses of `find`), and you
can use it to run `file` on a whole bunch of files:

    find . -type f -exec file '{}' \;

You can append the above `-exec` and following arguments to any
already-working `find` command you have, replacing the `.` starting point and
`-type f` expression in the example with your own starting point and
expression to find the pathnames you want.

With the added `-exec` expression, `find` runs `file` once for each pathname
it finds, one at a time. The pathname generated by `find` is inserted into the
`-exec` command line where the quoted set of braces is. You might be able to
see it better if you insert an `echo` in front of the command line being run
by `find`, to echo on your screen the command that is being built and
executed:

    find . -type f -exec echo file '{}' \;

(Make sure you get this simple `-exec echo file` example working on your own
set of pathnames before you try to modify it to do something more complicated
such as a file copy.)

But of course you don't want to simply run `file` on each pathname; you want
to copy each pathname into a single destination directory. I'll leave most of
this as an "exercise for the student", with the following hint:

- The quoted braces give you the location where `find` will put the source
  pathname argument to `cp`; what is missing in the above line that uses
  `file` is the destination directory needed by `cp`. You will have to add
  the destination directory name in the right place and also change the
  command name `file` to be the command name `cp` in the above line.

Leave the `echo` ahead of the command line you are building until you see
`find` generate on your screen the `cp` command lines that you know will work,
then take out the `echo` and let `find` run the multiple `cp` commands for
you.

The above is just one way to automate the copy by having `find` do the work
for you.
It has the disadvantage that it runs a separate `cp` command for every
pathname `find` finds, which is no problem if there are only three pathnames
but is a huge problem if there are a million pathnames because `find` will
have to run `cp` a million times (and that takes time).

Modern versions of `find` have a modified `-exec` statement ending in `+`
instead of `;` that can pack multiple file names into the same command
execution, reducing the number of times the command has to be executed by
increasing the number of pathnames passed to each execution:

    find . -type f -exec file '{}' +

This works similarly to `xargs`, which is described next:

Method Two -- `xargs`
---------------------

If you have a million files to copy, using `find` with the traditional version
of `-exec` is not the way to do it, since you will have to call and run the
`cp` command program once per pathname, and that means running `cp` a million
times. Even if `cp` did nothing, it would take a long time to re-execute `cp`
a million times. We can do this more efficiently.

The `cp` command is designed to allow multiple source pathnames if they are
all being copied into the same destination directory. We could reduce the
number of `cp` commands run if we could put multiple source pathnames into
each `cp` command line.

If we could fit a million source pathnames on one `cp` command line, we would
only need one single `cp` command to do the work. This is a huge savings
compared to running `cp` a million times.

Alas, most Unix systems have a limit on the total length of a command line.
You can't fit a million pathnames on one single `cp` command line. This is why
the `xargs` program was written.

The `xargs` program reads a (usually large) list of pathnames from standard
input.
It will read those pathnames and pack a command line with as many of those
pathnames as can possibly fit, then call the command, then repeat with another
large number of pathnames, and repeat again until all the pathnames are
processed. By packing each command line as full of pathnames as it possibly
can, it uses the minimum number of commands needed to get the job done.

See `man xargs` and look at the EXAMPLES section for examples using `find` to
generate pathnames that get sent into `xargs`. Sysadmins always use the
`-print0` option to `find` and the `-0` option to `xargs` so that blanks,
quotes, and newlines in pathnames don't cause problems. (See the man pages.)

Since `xargs` can only add lists of pathnames to the *end* of a command line
(where most commands expect them), this poses a problem for a file copy that
expects all the source filenames to *precede* the destination directory name.
The maintainers of GNU `cp` invented the `-t` option to `cp` so that you could
specify the destination directory *first* on the command line, allowing all
the source pathnames to be stacked at the end just the way `xargs` generates
them:

    $ cp -t /tmp file1 file2 file3  # file4 file5 etc...

You need to use the `-t` option when you use `cp` inside `xargs` so that the
list of source pathnames can appear at the end of the command line.

Again, insert `echo` at the start of your `xargs` command lines (and start
with only a few pathnames on standard input, not hundreds) until you see
echoing on your screen the command lines you know will work. Then take out the
`echo` and feed the full list of pathnames.

> As described in the previous section, modern versions of `find` have a
> modified `-exec` statement ending in `+` instead of `;` that can pack
> multiple file names into the same command execution, reducing the number of
> times the command has to be executed by increasing the number of pathnames
> passed to each execution.
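
As a minimal sketch of the `find` into `xargs` pipeline described in this section, the script below builds a small throwaway directory tree and copies every file in it into one destination directory. The `work` variable, the directory names and the file names are all hypothetical, invented only for this illustration, and the sketch assumes GNU `cp` (for `-t`) and a `find` and `xargs` that support `-print0` and `-0`.

```shell
#!/bin/sh
# Hypothetical scratch area, created only for this demonstration.
work=$(mktemp -d)
mkdir "$work/src" "$work/src/sub" "$work/dest"

# Empty test files; two of the names deliberately contain blanks.
: > "$work/src/plain.txt"
: > "$work/src/name with blanks.txt"
: > "$work/src/sub/deep file.txt"

# Dry run: echo shows the cp command line(s) that xargs would build.
find "$work/src" -type f -print0 | xargs -0 echo cp -t "$work/dest"

# Real run: -print0 and -0 pass each pathname intact, blanks and all,
# and cp -t puts the destination first so the sources can come last.
find "$work/src" -type f -print0 | xargs -0 cp -t "$work/dest"

# List what was copied into the destination directory.
ls "$work/dest"
```

Running the dry-run line first follows the same advice given above: look at the echoed `cp` command lines, and only remove the `echo` once they look right.
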

Method Three -- Shell Command Substitution: `$(command)`
--------------------------------------------------------

The shells have a command substitution feature that lets you take the standard
output of any command and insert it into a command line. (See the heading
**Command Substitution** in `man bash`, and also previous class notes such as
[CST8207 Command Substitution] or [CST8129 Command Substitution].)

You might think of using this handy feature to take the standard output of
`find` (a list of pathnames) and insert it into a `cp` command line. This
command substitution might work, but it has serious limitations:

1.  *None of the pathnames can contain any blanks, asterisks, or other shell
    meta-characters that the shell will expand*. This may be true for the
    pathnames you substitute today, but it won't always be true!
2.  *The total list of pathnames can't exceed the system limit on the length
    of a command line.* The list might fit today, but it won't always fit!

In other words, command substitution only works sometimes, where the other two
solutions presented earlier work every time (provided that, when you use
`xargs`, you pair `-print0` in your `find` command with `xargs -0`!).

Since sysadmins want solutions that always work and won't mysteriously start
failing in the future, avoid using command substitution to naïvely generate
pathnames needed by other commands if those pathnames might *ever* contain
blanks or other shell meta-characters, or if the list of pathnames might be
very large. The embedded blanks and shell meta-characters in the pathnames, or
the sheer number of pathnames, will some day cause errors if you rely on
command substitution.

(With correct use of shell options to turn off file GLOBbing and suppress the
splitting of words on blanks, you can *almost* safely write a shell script
that does use command substitution and pathnames, but it isn't pretty, doesn't
work for file names with newlines in them, and the options used are unsuitable
for interactive shell use.
It can still stop working if the list of pathnames is longer than is allowed
on a command line. Don't do it!)

--

| Ian! D. Allen, BA, MMath - idallen@idallen.ca - Ottawa, Ontario, Canada
| Home Page: http://idallen.com/ Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom: http://eff.org/ and have fun: http://fools.ca/

[Plain Text] - plain text version of this page in [Pandoc Markdown] format

  [www.idallen.com]: http://www.idallen.com/
  [Course Home Page]: ..
  [Course Outline]: course_outline.pdf
  [All Weeks]: indexcgi.cgi
  [Plain Text]: 185_find_and_xargs.txt
  [CST8207 Command Substitution]: ../../../cst8207/18f/notes/710_command_substitution.html
  [CST8129 Command Substitution]: ../../../cst8129/05f/notes/command_substitution.txt
  [Pandoc Markdown]: http://johnmacfarlane.net/pandoc/