Winter 2019 - January to April 2019 - Updated 2019-03-08 04:21 EST
gzip and gunzip
You can compress a file using the gzip command, and the result is a new binary compressed file with a .gz suffix added on the end:
$ cp -p /etc/passwd foo
$ gzip foo
$ ls -ls /etc/passwd foo.gz
96 -rw-r--r-- 1 root root 97450 Feb 10 13:08 /etc/passwd
28 -rw-r--r-- 1 idallen idallen 26884 Feb 10 13:08 foo.gz
$ file foo.gz
foo.gz: gzip compressed data, was "foo", from Unix, last modified: Wed Feb 10 13:08:27 2016
The original file is removed after being compressed. The modify time of the original file is preserved.
You can decompress/uncompress the file with gunzip, which restores the original file contents and removes the suffix from the name:
$ gunzip foo.gz # "gunzip foo" works too
$ ls -ls foo
96 -rw-r--r-- 1 idallen idallen 97450 Feb 10 13:08 foo
The compressed file is removed after being uncompressed. The modify time of the file is preserved.
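The round trip above can be sketched as a small script. This is only an illustration: the scratch directory from mktemp and the names sample.txt and reference.txt are assumptions, not part of the original example.

```shell
#!/bin/sh
# Sketch: gzip/gunzip round trip in a scratch directory.
# workdir, sample.txt and reference.txt are illustrative names.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Generate some repetitive (highly compressible) text.
i=0
while [ "$i" -lt 1000 ]; do
    echo "line $i of repetitive sample text"
    i=$((i + 1))
done > sample.txt
cp -p sample.txt reference.txt      # untouched copy for later comparison

gzip sample.txt                     # replaces sample.txt with sample.txt.gz
[ ! -e sample.txt ] && echo "original removed by gzip"
ls -l reference.txt sample.txt.gz   # compare the two sizes

gunzip sample.txt.gz                # replaces sample.txt.gz with sample.txt
cmp sample.txt reference.txt && echo "contents identical after round trip"
```

Keeping a separate reference copy is a choice made for the sketch so that the restored file can be compared byte for byte with cmp.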
The gunzip command will not uncompress a file by name unless the file name ends in a suffix it recognizes, such as .gz:
$ gzip </etc/passwd >foo
$ file foo
foo: gzip compressed data, last modified: Wed Mar 6 21:13:03 2019, from Unix
$ gunzip foo
gzip: foo: unknown suffix -- ignored
$ mv foo foo.gz
$ gunzip foo.gz
$ ls -l /etc/passwd foo
-rw-r--r-- 1 root root 168835 Mar 6 16:13 /etc/passwd
-rw-rw-r-- 1 idallen idallen 168835 Mar 8 04:03 foo
You can use either command as a filter (reading standard input and writing standard output) if you don’t give it a file name:
$ fgrep 'refused connect' /var/log/auth.log | gzip >bad.txt.gz
$ gunzip <bad.txt.gz | wc
$ gunzip <bad.txt.gz | less
When used as a filter (no file name), the commands cannot actually compress or decompress the original file and remove it because there is no file name. Filter commands simply compress or decompress the data in the input stream; the file is not changed.
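A minimal sketch of the filter form, assuming a scratch directory and made-up file names (notes.txt, packed, unpacked.txt). It also shows that reading from standard input avoids the suffix check, because gunzip never sees a file name.

```shell
#!/bin/sh
# Sketch: gzip and gunzip as filters; input files are never modified.
# workdir, notes.txt, packed and unpacked.txt are illustrative names.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

printf 'alpha\nbeta\ngamma\n' > notes.txt

gzip < notes.txt > packed           # output name deliberately has no .gz suffix
[ -f notes.txt ] && echo "notes.txt still exists"

# Reading from standard input sidesteps the suffix check entirely.
gunzip < packed > unpacked.txt
cmp notes.txt unpacked.txt && echo "filter round trip OK"

gunzip < packed | wc -l             # count lines without creating a file
```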
zless zfgrep zcat zdiff zgrep
Some helpful z-commands have been created to directly access compressed files and save typing gunzip in a pipe all the time:
$ gunzip <bad.txt.gz | less # hard way to paginate contents
$ zless bad.txt.gz # easy way
$ gunzip <bad.txt.gz | fgrep '.cn' # hard way to fgrep contents
$ zfgrep '.cn' bad.txt.gz # easy way
Since all of the z-commands are filters (they are small shell scripts), none of the z-commands affects the given file. The file is not decompressed and then removed. Only the file contents are decompressed and sent to standard output.
See also: zcat zdiff zgrep
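As a rough illustration of the z-commands, the sketch below builds a tiny compressed log and reads it with zcat and zgrep. The log lines, host names, and file names are invented for the example; zgrep -F is used here as the fixed-string form of the search, on the assumption that zgrep passes the option through to grep.

```shell
#!/bin/sh
# Sketch: reading a compressed file with zcat and zgrep.
# workdir and auth.txt are illustrative; the log lines are made up.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

cat > auth.txt <<'EOF'
refused connect from host1.example.cn
accepted login for user alice
refused connect from host2.example.com
refused connect from host3.example.cn
EOF
gzip auth.txt                            # leaves only auth.txt.gz

zcat auth.txt.gz | wc -l                 # total number of lines
zgrep -c 'refused connect' auth.txt.gz   # count of matching lines
zgrep -F '.cn' auth.txt.gz               # fixed-string search, like zfgrep

[ -f auth.txt.gz ] && echo "compressed file is untouched"
```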
bzip2 and bunzip2
The commands bzip2 and bunzip2 are similar to gzip and gunzip but they use a different, often better, compression algorithm. The default file extension is .bz2 instead of .gz:
$ cp /etc/passwd foo
$ bzip2 foo
$ ls -ls /etc/passwd foo.bz2 foo.gz
96 -rw-r--r-- 1 root root 97450 Feb 10 13:08 /etc/passwd
24 -rw-r--r-- 1 idallen idallen 22235 Feb 10 13:08 foo.bz2
28 -rw-r--r-- 1 idallen idallen 26884 Feb 10 13:08 foo.gz
$ file foo.bz2
foo.bz2: bzip2 compressed data, block size = 900k
As with gzip, the original file is removed after being compressed, unless the command is used as a filter (without a file name). The modify time of the original file is preserved.
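To compare the two compressors on the same input, something like the following sketch can be used. It assumes bzip2 is installed, and the generated data and file names are illustrative only; the relative sizes depend on the input, so no particular result is claimed.

```shell
#!/bin/sh
# Sketch: compress the same data with gzip and bzip2 and compare sizes.
# Assumes bzip2 is installed; workdir and data.txt are illustrative names.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

i=0
while [ "$i" -lt 2000 ]; do
    echo "record $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > data.txt

# Filter form keeps data.txt so both compressors see identical input.
gzip  < data.txt > data.txt.gz
bzip2 < data.txt > data.txt.bz2

wc -c data.txt data.txt.gz data.txt.bz2   # byte counts; results vary by input

# Verify that the bzip2 copy decompresses to the original contents.
bunzip2 < data.txt.bz2 | cmp - data.txt && echo "bzip2 round trip OK"
```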
If you give bunzip2 a file name that does not end in .bz2, it decompresses the file into the same file name with .out appended:
$ bzip2 </etc/passwd >foo
$ file foo
foo: bzip2 compressed data, block size = 900k
$ bunzip2 foo
bunzip2: Can't guess original name for foo -- using foo.out
$ ls -l /etc/passwd foo.out
-rw-r--r-- 1 root root 168835 Mar 6 16:13 /etc/passwd
-rw-rw-r-- 1 idallen idallen 168835 Mar 8 04:05 foo.out
bzless bzfgrep bzcat bzdiff bzgrep
Some helpful bz-commands (bzcat, bzdiff, bzfgrep, bzgrep, bzless) have been created to directly access compressed files and save typing bunzip2 in a pipe all the time:
$ bunzip2 <bad.txt.bz2 | less # hard way to paginate contents
$ bzless bad.txt.bz2 # easy way
These helpers have similar names and work the same way as the gzip helper z-commands. See the man pages for the other helpers.
tar file (tarball)
Read the mouse-over text in the above tar-related comic from the XKCD webcomic.
Long before software package managers such as YUM, RPM, and APT, there were tar archives. Originally written as a magnetic Tape ARchiver, the command is common to every Unix/Linux system. A tar archive file is the Unix version of a zip file. It is one file that contains many other files inside it. You can download and extract a tar format archive file on almost any Unix/Linux system; the command dates back to Seventh Edition Unix in 1979.
A tar archive, also called a “tarball”, is a single file that contains multiple uncompressed files and directories. Unix/Linux software source is often distributed as a “tarball”.
The syntax of the tar command is irregular: you don’t have to put dashes in front of the operation letters (but you can if you like):
Syntax: tar <operation> [options] -f <archive_file> [<pathnames>]
$ tar cf /tmp/my.tar . # create archive of current directory
$ tar -cf stuff.tar *.c # archive all the .c files
$ tar -xvf my.tar # extract everything into current dir
$ tar xvf my.tar mydir # only extract mydir from the archive
The name of the tar archive can be anything; the suffixes are there simply for human readers to better know what the files contain. The archive name must always directly follow the -f option with no other option letters in between:
$ tar -tvf my.tar # correct use of -f
$ tar -vft my.tar # WRONG use of -f
$ tar -fvt my.tar # WRONG use of -f
You must always use one of three major operation letters:
-t: list the pathnames in the archive (a table of contents)
-x: extract (all or some) pathnames from the archive
-c: create a new tar archive (erases existing contents!)
You may optionally use some other relevant options:
-f: select the archive pathname (almost always used; must be last option)
-p: preserve permissions when extracting
-v: verbose (more messages about what is happening, or more detail)
-z: the entire archive is gzip compressed (or uncompressed if extracting)
-j: the entire archive is bzip2 compressed (or uncompressed if extracting)
The -f archive pathname option is almost always used, unless you happen to own a tape drive! Always use -f and an archive file name. The archive file name must immediately follow the -f option with no other option letters in between, e.g. tar -tvf my.tar
The -v “verbose” option above lists all the file names as they are put into an archive file, or as they are extracted. This is useful for debugging, but isn’t usually used for a production system where you know exactly what is going into the archive; leave it out for normal use.
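The three operation letters can be tried out safely with a sketch like this one. The directory layout (project, restore) and file contents are hypothetical, chosen only to give tar something to archive.

```shell
#!/bin/sh
# Sketch: create, list, and extract an uncompressed tar archive.
# workdir, project, and restore are illustrative names.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

mkdir -p project/src
echo 'int main(void) { return 0; }' > project/src/main.c
echo 'demo project' > project/README

tar -cf project.tar project         # c: create; -f names the archive
tar -tf project.tar                 # t: table of contents
tar -tvf project.tar                # add v for a more detailed listing

mkdir restore
( cd restore && tar -xf ../project.tar )   # x: extract into restore/

diff -r project restore/project && echo "extracted copy matches"
```

In every tar command line the archive name comes directly after the f, as the text requires.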
If an uncompressed tarball file is damaged, the damage may affect only some of the files in the tarball and the other files, even files stored after the damage point, may still be recoverable.
tarball.tar.gz and tarball.tar.bz2
A compressed tarball is simply a single tarball file that has been compressed with either gzip or bzip2. The compression compresses the entire tarball, not the individual files inside the tarball.
A tarball file may be first created and then compressed as a whole using either the gzip or bzip2 file compression commands:
$ tar -cf tarball.tar *.c # create archive named tarball.tar
$ gzip tarball.tar # compress into tarball.tar.gz
$ tar -cf tarball.tar *.c # create archive named tarball.tar
$ bzip2 tarball.tar # compress into tarball.tar.bz2
Modern versions of tar have an option letter that does this compression for you (less typing). A compressed tar archive can be created and compressed in one step by an option to the tar command itself:
$ tar -czf tarball.tar.gz *.c # create and gzip compress into tarball.tar.gz
$ tar -cjf tarball.tar.bz2 *.c # create and bzip2 compress into tarball.tar.bz2
You generate a table of contents, or extract all the files, using the appropriate de-compression option depending on if and how the tarball file was compressed:
$ tar -tf tarball.tar # table of contents if uncompressed
$ tar -tzf tarball.tar.gz # table of contents if gzip compressed
$ tar -tjf tarball.tar.bz2 # table of contents if bzip2 compressed
$ tar -xf tarball.tar # extract contents (uncompressed)
$ tar -xzf tarball.tar.gz # extract contents (gzip compressed)
$ tar -xjf tarball.tar.bz2 # extract contents (bzip2 compressed)
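The sketch below illustrates that a gzip compressed tarball is the same thing whether tar does the compression or gzip is run afterwards. The file names are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: a .tar.gz is an ordinary tar archive passed through gzip.
# workdir and the file names are illustrative assumptions.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

echo 'int a;' > a.c
echo 'int b;' > b.c

tar -czf onestep.tar.gz a.c b.c      # create and compress in one step

tar -cf twostep.tar a.c b.c          # create first ...
gzip twostep.tar                     # ... then compress: twostep.tar.gz

# Either archive can be listed with -z or by decompressing in a pipe.
tar -tzf onestep.tar.gz
gunzip < twostep.tar.gz | tar -tf -
```

Both listings should show the same two path names, since the only difference between the archives is which program ran the compression.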
The tar command doesn’t care what you name your archive file. The gzip compressed tarballs usually have names ending with *.tar.gz or *.tgz and bzip2 compressed tarballs usually have names ending with *.tar.bz2 or *.tb2.
Modern versions of the tar command automatically recognize existing compressed archives and thus don’t require the extra z or j option letters to read compressed archives. You still need the appropriate letter to create a new compressed archive file.
If a compressed tarball file is damaged, all the files following the damage point cannot be decompressed and are usually unrecoverable.
tar to archive or restore a directory
The tar command will automatically recursively archive entire directories into a tarball if you give it directories. Software is often distributed as a tarball file.
$ cd # go to my home directory
$ tar czf /tmp/homedir.tar.gz . # archive current directory into a file
Do not place the output tarball file in any of the directories being used as input to tar!
When you have a tarball, you can then extract it into the current directory:
$ mkdir /some/backupdir
$ cd /some/backupdir
$ tar xzpf /tmp/homedir.tar.gz # extract the whole archive into current directory
The p option preserves the modes (permissions) of the files as they are extracted.
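A small sketch of the archive-and-restore cycle, using scratch directories in place of a real home directory. The names home, backupdir, and shared.txt are hypothetical, and mode 666 is chosen only because a umask of 022 would normally strip those write bits on extraction without p.

```shell
#!/bin/sh
# Sketch: archive a directory, restore it elsewhere, and check permissions.
# workdir, home, backupdir, and the file names are illustrative assumptions.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

mkdir home backupdir
echo 'shared notes' > home/shared.txt
chmod 666 home/shared.txt            # group/other write bits, normally masked

# The archive is written outside home/, the directory being archived.
( cd home && tar czf ../homedir.tar.gz . )

umask 022
( cd backupdir && tar xzpf ../homedir.tar.gz )

ls -l home/shared.txt backupdir/shared.txt   # modes should match
```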
tar to copy a directory
This legacy use of tar to copy an entire directory has been replaced by cp -a or the rsync command.
You can do a directory copy with tar using a pipe instead of an output file by using the special file name - that stands for either standard output (when creating) or standard input (when extracting):
$ cd
$ tar cf - . | ( cd /some/backupdir && tar xpf - ) # local copy
$ tar cf - . | ( ssh otherhost 'cd /some/dir && tar xpf -' ) # remote host copy
The above uses of tar to copy a directory have been largely supplanted by the -a (archive) option to cp or by the rsync command.
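For completeness, the local pipe copy can be tried on throwaway directories with a sketch such as this; source and copy are hypothetical directory names.

```shell
#!/bin/sh
# Sketch: copy a directory tree through a tar pipe, no archive file needed.
# workdir, source, and copy are illustrative names.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

mkdir -p source/sub copy
echo 'top level' > source/top.txt
echo 'nested'    > source/sub/nested.txt

# Left side writes the archive to stdout; right side reads it from stdin.
( cd source && tar cf - . ) | ( cd copy && tar xpf - )

diff -r source copy && echo "copy matches source"
```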
zip and unzip
A ZIP file is a single file containing individually compressed files. (This is not the same format as a compressed tarball, which is a single compressed file containing individual uncompressed files.)
Unix/Linux can also manipulate ZIP format file archives (often used on Microsoft systems) using zip and unzip:
$ touch file1 file2 file3
$ zip foo file1 file2 file3 # create foo.zip with three files
adding: file1 (stored 0%)
adding: file2 (stored 0%)
adding: file3 (stored 0%)
$ ls -l foo.zip
-rw-rw-r-- 1 idallen idallen 436 Mar 9 03:44 foo.zip
$ unzip -l foo.zip # list the contents (do not extract)
Archive:  foo.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2016-03-09 03:44   file1
        0  2016-03-09 03:44   file2
        0  2016-03-09 03:44   file3
---------                     -------
        0                     3 files
$ rm file?
$ unzip foo.zip # extract all the files
Archive: foo.zip
extracting: file1
extracting: file2
extracting: file3
Other options can preserve directory hierarchy and do other things. See the man page.
If a ZIP file is damaged, the damage usually affects only some of the files in the ZIP file and the other files, even files stored after the damage point, may still be recoverable.
[...] zip file), or does tar archive together all the files first (uncompressed) and then compress the whole archive?
[...] zip file or a compressed tar file, and why? (Hint: Consider archiving 1000 copies of the same file.)
[...] zip file or a compressed tar file, and why?
diff
The diff command compares two files: diff file1 file2
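A minimal sketch of diff in use, with two made-up three-line files. It also shows the exit status, which is 0 when the files are identical and 1 when they differ.

```shell
#!/bin/sh
# Sketch: diff prints the differing lines and reports the result in its
# exit status. workdir, file1, and file2 are illustrative names.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

printf 'one\ntwo\nthree\n' > file1
printf 'one\n2\nthree\n'   > file2

status=0
diff file1 file2 || status=$?        # shows the changed line
echo "diff exit status: $status"     # 0 = identical, 1 = different, 2 = trouble

diff file1 file1 && echo "a file never differs from itself"
```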
vimdiff and gvimdiff
diff3
meld
Student Tammy Rediger (17F) tells me that “the program 7zip does work with .gz, .bzip2 and .tar files” under Microsoft Windows.