Updated: 2017-01-20 00:48 EST
In Unix/Linux, a file is a sequence of bytes without structure. Any necessary structure (e.g. for a database) is added by the programs that manipulate the data in the file. Linux itself doesn’t know about the internal structure of a database file – all it does is return bytes.
Unix/Linux tries its best to treat every device attached to it as if it were a list of bytes. Therefore everything, including network cards, hard drives, partitions, keyboards, printers, and plain files, is treated as a file-like object, and each has a name in the file system.
/dev/mem
/dev/sda
/dev/tty1
$ ls -li /dev/mem /dev/sda /dev/tty1
5792 crw-r----- 1 root kmem 1, 1 Oct 13 02:30 /dev/mem
888 brw-rw---- 1 root disk 8, 0 Oct 13 02:30 /dev/sda
5808 crw-rw---- 1 root tty 4, 1 Oct 13 02:31 /dev/tty1
Most input and output devices and directories are treated as files in Linux. If you have sufficient permissions, you can directly read all these devices using their file system names. Recent versions of Unix/Linux have evolved directories into non-readable (non-file) objects.
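As a small sketch of reading a device through its file system name, the command below reads four bytes from a device exactly as it would from a plain file. It uses /dev/zero, which is an assumption for this example (it is not one of the devices listed above), chosen because ordinary users can usually read it:

```shell
# Read 4 bytes from the character device /dev/zero as if it were a
# plain file, then dump them as hexadecimal bytes.
# (/dev/zero is an assumption for this sketch; any readable device works.)
dd if=/dev/zero bs=4 count=1 2>/dev/null | od -An -tx1
```

The same dd command pointed at a disk such as /dev/sda would read the raw bytes of the disk, provided you have permission to read that device.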
As with most things computer-related, things in the file system are stored by number, not by name. Linux stores the data and information about each disk object (e.g. a file or a directory) in a numbered data structure called an “index node” or inode.
Each inode is identified by a unique inode number that can be shown using the -i option to the ls command:
$ ls -l -i /usr/bin/perl*
266327 -rwxr-xr-x 2 root root 10376 Mar 18 2013 /usr/bin/perl
266327 -rwxr-xr-x 2 root root 10376 Mar 18 2013 /usr/bin/perl5.14.2
266331 -rwxr-xr-x 2 root root 45183 Mar 18 2013 /usr/bin/perlbug
266328 -rwxr-xr-x 1 root root 224 Mar 18 2013 /usr/bin/perldoc
266329 -rwxr-xr-x 1 root root 125 Mar 18 2013 /usr/bin/perldoc.stub
266330 -rwxr-xr-x 1 root root 12318 Mar 18 2013 /usr/bin/perlivp
266331 -rwxr-xr-x 2 root root 45183 Mar 18 2013 /usr/bin/perlthanks
The program /usr/bin/perl, above, is not stored on disk with its name perl; it is stored somewhere else, under inode number 266327. Unix/Linux directories are what map file system names (e.g. perl) to inode numbers (e.g. 266327). In the example above, you can see that file /usr/bin/perl is really inode number 266327 (and that another name perl5.14.2 leads to the same inode!). When you access the perl program, the system finds the perl name in a directory, paired with the inode number 266327 that holds the actual data, and then the system has to go elsewhere on disk to that inode number to access the data for the perl program. File data is stored under inode numbers, not under names.
Every file has its name entered in a directory and is assigned a unique inode number. Each file name can be mapped to only one inode number, but one inode number may have many names (as is the case with perl, above).
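The many-names-to-one-inode rule can be tried out safely in a scratch directory. This is a minimal sketch; the file names first and second are made up for the illustration, and it assumes mktemp, ln, and awk are available:

```shell
# Work in a scratch directory so nothing real is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

echo "hello" > first      # the directory now pairs the name "first" with an inode
ln first second           # add a second name for that same inode

# Show the inode number paired with each name.
ls -i first second

# Pick out the two inode numbers and compare them.
ino_first=$(ls -i first | awk '{print $1}')
ino_second=$(ls -i second | awk '{print $1}')
[ "$ino_first" = "$ino_second" ] && echo "both names lead to inode $ino_first"
```

Each name appears only once in the listing and carries exactly one inode number, but the same number shows up beside both names.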
Inode numbers are specific to a file system inside a disk partition. Every file on a file system (in that partition) has a unique inode number. Numbering is done separately for each file system, so different disk partitions may have file system objects with the same inode numbers.
Every Linux file system is created new with a large set of available inodes. You can list the free inodes using df -i. Older types of file systems can never make more inodes, even if there is lots of disk space available; when all the inodes are used up, the file system can create no more files until some files are deleted to free some inodes.
Most diagrams showing file systems and links in Unix texts are wrong and range from confusing to seriously misleading. Here’s the truth, complete with an ASCII-art file system diagram below.
Names for inodes (names for files, directories, devices, etc.) are stored on disk in directories. Only the names and the associated inode numbers are stored in the directory; the actual disk space for whatever data is being named is stored in the inode, not in the directory. The names and numbers are kept in the directory; the names are not kept with the data.
In the directory, beside each name, is the index number (inode number) indicating where to find the disk space used to actually store the thing being named. You can see this name-inode pairing using ls -i:
$ ls -i /usr/bin/perl*
266327 /usr/bin/perl 266329 /usr/bin/perldoc.stub
266327 /usr/bin/perl5.14.2 266330 /usr/bin/perlivp
266331 /usr/bin/perlbug 266331 /usr/bin/perlthanks
266328 /usr/bin/perldoc
The crucial thing to know is that the names and the actual storage for the things being named are in separate places. Most texts make the error of writing Unix file system diagrams that put the names right on the things that are being named. That is misleading and the cause of many misunderstandings about Unix/Linux files and directories. Names exist one level above (separate from) the items that they name:
  WRONG - names on things       RIGHT - names above things
  =======================       ==========================

       R O O T            --->         [etc,bin,home]   <-- ROOT directory
      /   |   \                       /      |      \
   etc   bin   home       --->  [passwd]  [ls,rm]  [abcd0001]
    |   /   \     \                |       /   \       |
    |  ls   rm  abcd0001  --->     |   <data> <data> [.bashrc]
    |               |              |                     |
  passwd         .bashrc  --->  <data>                <data>
Directories are lists of names and numbers, as shown by the square-bracketed lists in the diagram on the right, above. (The actual inode numbers are omitted from this small diagram.) The name of each thing (file, directory, special file, etc.) is kept in a directory, separate from the storage space for the thing it names. This allows inodes to have multiple names and names in multiple directories; all the names can refer to the same storage space by using the same inode number.
In the correct diagram on the right, the directories give names to the objects below them in the tree. The top directory on the right is the ROOT directory inode, containing the list of names etc, bin, and home (and others). Because there is no name level above the ROOT directory to give it a name, the ROOT directory has no name!
The line leading downwards from the name bin in the ROOT directory indicates that the name bin is paired with an inode number that is another directory inode containing the list of names in the bin directory, including names ls and rm (and others). The line leading down from ls in the bin directory inode leads to the data inode for the file /bin/ls. There is no name kept with the data inode; the name is up in the directory above it.
The ROOT inode has no name because there is no directory above it to give it one! Every other directory has a name because there is a directory inode above it that contains its name.
The actual data for each Unix file or directory stored on disk is managed by numbered on-disk data structures called “inodes” (index nodes). One inode is allocated for each file and each directory. Unix inodes have unique numbers, not names, and it is these numbers that are kept in directories alongside the names. The -i option to ls shows these inode numbers.
A Unix inode manages the disk storage space for a file or a directory. The inode contains a list of pointers to the disk blocks that belong to that file or directory. The larger the file or directory, the more disk block pointers it needs in the inode. Also stored in the inode are the attributes of the file or directory (permissions, owner, group, size, access/modify times, etc.), but not the name of the file or directory. Inodes have only numbers, attributes, and disk blocks, not names. The names are kept separately, in directories.
Everything in a Unix file system has a unique inode number that manages the storage for that thing: every file, directory, special file, etc. Files and directories are both managed with inodes.
File system names are stored in directory inodes. The names are not kept in the same inodes with the things that they name. The name of a file or directory is not kept in the inode with the file attributes or pointers to disk blocks; the name is kept in a directory somewhere else.
Directories are what give names to inodes on Unix. Directories can be thought of as “files containing lists of names and inode numbers”. Files have disk blocks containing file data; directories also have disk blocks, but those blocks contain lists of names and inode numbers.
Like most other inodes, directory inodes contain attribute information about the inode (permissions, owner, etc.) and one or more disk block pointers in which to store data; what is stored in the disk blocks of a directory, however, is not file data but directory data (names and inode numbers).
A Unix directory is simply a list of pairs of names and associated inode numbers. That is all – the disk blocks of Unix directories contain only names and inode numbers. The rest of the attribute information about an item named in a directory (the type, permissions, owner, etc.) is kept with the inode associated with the name. You must use the inode number from the directory to find the inode on disk to read its attribute information; reading the directory only tells you the name and inode number. (Some modern Unix/Linux file systems also cache a second copy of the inode type in the directory to speed up common file system browsing operations.)
Reading a Unix directory tells you only some names and inode numbers; you know nothing about the types, sizes, owners, or modify times of those inodes unless you actually go out to the separate inode on disk and access them to read the attributes. Without actually accessing the inode, you can’t know most of the attributes of the file system object; you can’t even know if the inode is a file inode or a directory inode.
To find out attribute information of some file system object, which is stored with the inode, not in the directory, you must first use the inode number associated with the object to find the inode of the item and look at the item’s attributes. This is why ls or ls -i are much faster than ls -l:
ls or ls -i only need to read the names and inode numbers from the directory; no additional inode access is needed because no other attributes are being queried. Reading the one directory inode is sufficient.
ls -l has to display attribute information, so it has to do a separate inode lookup to find out the inode attribute information for every inode in the directory. A directory with 100 names in it requires 100 separate inode lookups to fetch the attributes.
No attribute information about the things named in the directory is kept in the directory (except on those modern file systems where caching is enabled). The directory only contains pairs of names and inode numbers.
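The difference can be sketched by hand. In this illustrative script (the scratch directory and the names a, b, and c are made up), the first listing needs only the directory, while the loop performs one attribute lookup per name, which is roughly the work ls -l has to do:

```shell
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c"

# Names and inode numbers: both come straight from the directory.
ls -i "$dir"

# Attributes live in each inode, so every name needs its own lookup.
lookups=0
for name in "$dir"/*; do
    ls -ld "$name"                  # fetch the attributes of this one inode
    lookups=$((lookups + 1))
done
echo "attribute lookups: $lookups"
```

With three names in the directory the loop runs three times; with 100 names it would run 100 times.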
To find a thing by name, the system goes to a directory inode, looks up the name in the disk space allocated to that directory, finds the inode number associated with the name, then goes out to the disk a second time and finds that inode on the disk. If that inode is another directory, the process repeats from left-to-right along the pathname until the inode of the last pathname component (on the far right in the pathname) is found. Then the disk block pointers of that last inode can be used to find the data contents of the last pathname component.
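The left-to-right walk can be imitated in the shell. The function below is a rough sketch, not how the kernel does it: trace_path is a hypothetical helper name, it handles only absolute pathnames, and it asks ls -id for the inode number reached after each component:

```shell
# trace_path PATH: print the inode number reached after each
# component of an absolute pathname, walking left to right.
# (Hypothetical helper for illustration only.)
trace_path() {
    path=$1
    current=""

    # The nameless ROOT directory comes first.
    printf '%s\t%s\n' "$(ls -id / | awk '{print $1}')" "(ROOT)"

    # Split the pathname on slashes into the positional parameters.
    old_ifs=$IFS
    IFS=/
    set -f                          # no file name expansion while splitting
    set -- $path
    set +f
    IFS=$old_ifs

    for component in "$@"; do
        # Skip the empty name in front of the first slash.
        [ -n "$component" ] || continue
        current="$current/$component"
        printf '%s\t%s\n' "$(ls -id "$current" | awk '{print $1}')" "$component"
    done
}

trace_path /usr/bin
```

Each output line pairs an inode number with the name that led to it; the last line is the inode of the final pathname component.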
(The storage for each directory is itself managed by an inode, so the inode for the directory itself contains attribute information about the directory, not about the things named in the directory. Use ls -ld to see the attributes of the directory inode itself.)
The name and inode number pairing in a Unix directory is the only connection between a name and the thing it names on disk. The name is kept separate from the data belonging to the thing it names (the actual inode on disk). If a disk error damages a directory inode or the directory disk blocks, file data is not usually lost, since the actual data for the things named in the directory is stored in inodes separate from the directory itself. If a directory is damaged, only the names of the things are lost and the inodes become “orphan” inodes without names. The storage used for the things themselves is elsewhere on disk and may be undamaged. You can run a file system recovery program such as fsck to recover the data (but not the names).
The name of an item (file, directory, etc.) and its inode number are kept in a directory. The directory storage for that name and number is managed by its own inode that is separate from the inode of each thing in the directory. The name and number are stored in the directory inode; the data for the item named is stored in its own inode somewhere else.
Because (1) a file is managed by an inode with a unique number, (2) the name of the file is not kept in that inode, and (3) directories pair names with inode numbers, a Unix file (inode) can be given multiple names by having multiple name-and-inode pairs in one or more directories.
Inode 123 may be paired with the name cat in one directory and the same 123 may be paired with the name dog in the same or a different directory. Either name leads to the same 123 file inode and the same data and attributes. Though there appear to be two different files cat and dog in the directory, the only thing different between the two is the name; both names lead to the same inode and therefore to the same data and attributes (permissions, owner, etc.).
ln creates, rm removes only a name
Multiple names for the same inode are called “hard links”. The ln command can create a new name (a new hard link) in a directory for an existing inode. The system keeps a “link count” in each inode that counts the number of names each inode has been given. The rm command removes a name (a hard link) from a directory, decreasing the link count. When the link count for an inode goes to zero, the inode has no names and the inode is recycled and all the storage and data used by the item is released.
The rm command does not remove files; it removes names for files. When all the names are gone, the system removes the file and releases the space.
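A short experiment shows the link count following the number of names. This is a sketch in a scratch directory; the helper function links is made up for the example and simply picks the link count out of ls -l output:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

# links NAME: print the link count (second field of a long listing).
# (Hypothetical helper for this sketch.)
links() { ls -l "$1" | awk '{print $2}'; }

echo "some data" > cat
before=$(links cat)       # one name for the inode

ln cat dog
during=$(links cat)       # two names for the same inode

rm cat                    # removes only the name "cat"
after=$(links dog)        # one name left; the inode and its data remain

echo "link counts: $before $during $after"
cat dog                   # the cat command, reading the file still named "dog"
```

Removing the name cat did not remove the data, because the name dog still leads to the inode. Only removing dog as well would drop the count to zero and release the storage.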
When you look at a Unix pathname, remember that the slashes separate names of pathname components. All the components to the left of the rightmost slash must be directories, including the “empty” ROOT directory name to the left of the leftmost slash. For example:
/home/alex/foobar
In the above example, there are three slashes and therefore four pathname components. The “empty” name in front of the first slash is the name of the ROOT directory. The ROOT directory doesn’t have a name. (Some books get around this by calling the ROOT directory “slash” or /. That is wrong. ROOT doesn’t have a name; slashes separate names.)
Inside the ROOT directory is the name of the home directory.
Inside the home directory is the name of the alex directory.
Inside the alex directory is the name of the foobar file.
The last (rightmost) component of a pathname can be a file or a directory (or other); for this example, let’s assume foobar is a file name.
Below is a file system diagram written correctly, with the names for things shown one level above the things to which the names actually refer. Each box represents an inode; the inode numbers for the box are given beside the box, on the left. Inside the directory inodes you can see the pairing of names and inode numbers. (These inode numbers are made up – see your actual Unix system for the real inode numbers.) One of the inodes, #12, is not a directory; it is an inode for a file and contains the file data. The downward arrows trace two paths (hard links) to the same #12 file data, /home/alex/foobar and /home/alex/literature/barfoo.
We will trace the inodes for two pathnames in the diagram below:
/home/alex/foobar
/home/alex/literature/barfoo
Follow the downward-pointing arrows:
    +----+-----+----------------------------------------+
 #2 |. 2 |.. 2 | home 5 | usr 9 | tmp 11 | etc 23 | ... |
    +----+-----+----------------------------------------+
                  |   The inode #2 above is the ROOT directory.  It has the
                  |   name "home" in it.  The *directory* "home" is not
                  |   here; only the *name* is here.  The ROOT directory
                  |   itself does not have a name!
                  V
    +----+-----+--------------------------------------------------+
 #5 |. 5 |.. 2 | alex 31 | leslie 36 | pat 39 | abcd0001 21 | ... |
    +----+-----+--------------------------------------------------+
                  |   The inode #5 above is the "home" directory.  The name
                  |   "home" isn't here; it's up in the ROOT directory,
                  |   above.  This directory has the name "alex" in it.
                  V
    +----+-----+---------------------------------------------------+
#31 |. 31|.. 5 | foobar 12 | temp 15 | literature 7 | demo 6 | ... |
    +----+-----+---------------------------------------------------+
                  | The inode #31 above is     |
                  | the "alex" directory. The  |
                  | name "alex" isn't here;    |
                  | it's up in the "home"      |
                  | directory, above. This     |
                  | directory has the names    |
                  | "foobar" and "literature"  |
                  | in it.                     |
                  |                            V
    +----+-----+--|------------------------------------------+
 #7 |. 7 |.. 31|  | barfoo 12 | morestuf 123 | junk 99 | ... |
    +----+-----+--|------------------------------------------+
                  |     |   The inode #7 above is the "literature" directory.
                  |     |   The name "literature" isn't here; it's up
                  |     |   in the "alex" directory.  This directory has
                  |     |   the name "barfoo" in it.
                  |     |
                  V     V
               *-----------*   This inode #12 on the left is a file inode.
               | file data |   It contains the data blocks for the file.
           #12 | file data |   This file happens to have two names, "foobar"
               | file data |   and "barfoo", but those names are not here.
               *-----------*   The names of this file are up in the two
                               directories that point to this file, above.
The pathname /home/alex/foobar starts at the nameless ROOT directory, inode #2. It travels through two more directory inodes and stops at file inode #12. Using all four inode numbers, /home/alex/foobar could be written as #2->#5->#31->#12.
The pathname /home/alex/literature/barfoo starts at the ROOT inode and travels through three more directory inodes. It stops at the same #12 file inode as /home/alex/foobar. Using all five inode numbers, /home/alex/literature/barfoo could be written as #2->#5->#31->#7->#12.
Thus, /home/alex/foobar and /home/alex/literature/barfoo are two pathnames leading to the same inode #12 file data. The names foobar and barfoo are two names for the same file and are called “hard links”.
/home/alex/foobar
Let’s examine each of the above inodes.
The box below represents the layout of names and inode numbers inside the actual disk space given to the nameless ROOT directory, inode #2:
    +----+-----+----------------------------------------+
 #2 |. 2 |.. 2 | home 5 | usr 9 | tmp 11 | etc 23 | ... |
    +----+-----+----------------------------------------+
The above ROOT directory has the name home in it, paired with inode #5. The actual disk space of the directory home is not here; only the name home is here, alongside its own inode number #5. To read the actual contents of the home directory, you have to find the disk space managed by inode #5 somewhere else on disk and look there.
The above ROOT directory pairing of home with inode #5 is what gives the home directory its name. The name home is separate from the disk space for home. The ROOT directory itself does not have a name, because it has no parent directory to give it a name!
The ROOT directory is the only directory that is its own parent. If you look at the ROOT directory above, you will see that both the name . and the name .. in this ROOT directory are paired with inode #2, the inode number of the ROOT directory. Following either name . or .. will lead to inode #2 and right back to this same ROOT inode.
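You can check this on a real system. The sketch below compares the inode numbers reached through /, /. and /..; the actual number depends on your file system and need not be 2:

```shell
# Inode numbers of the ROOT directory reached three different ways.
root_ino=$(ls -id / | awk '{print $1}')
dot_ino=$(ls -id /. | awk '{print $1}')
dotdot_ino=$(ls -id /.. | awk '{print $1}')

echo "/ = $root_ino   /. = $dot_ino   /.. = $dotdot_ino"
```

All three numbers should match, because in the ROOT directory both . and .. are paired with the ROOT directory's own inode.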
Let us move to the storage space for the home directory at inode #5.
The box below represents the layout of names and inode numbers inside the actual disk space given to the home directory, inode #5:
    +----+-----+--------------------------------------------------+
 #5 |. 5 |.. 2 | alex 31 | leslie 36 | pat 39 | abcd0001 21 | ... |
    +----+-----+--------------------------------------------------+
The name home for this inode isn’t in this inode; the name home is up in the ROOT directory. This home directory has the name alex in it, paired with inode #31. The directory alex is not here; only the name alex is here. To read the alex directory, you have to find inode #31 on disk and look there. (In fact, until you look up inode #31 and find out that it is a directory, you have no way of even knowing that the name alex is a name of a directory!)
Let us move to the storage space for the alex directory at inode #31.
The box below represents the layout of names and inode numbers inside the actual disk space given to the alex directory, inode #31:
    +----+-----+---------------------------------------------------+
#31 |. 31|.. 5 | foobar 12 | temp 15 | literature 7 | demo 6 | ... |
    +----+-----+---------------------------------------------------+
The name alex for this inode isn’t in this inode; the name alex is up in the home directory. This alex directory has the name foobar in it, paired with inode #12. The file foobar is not here; only the name foobar is here. To read the data from file foobar, you have to find inode #12 on disk and look there. (In fact, until you look up inode #12 and find out that it is a plain file, you have no way of even knowing that the name foobar is a name of a plain file!)
Let us move to the storage space for the foobar file at inode #12.
The box below represents the actual disk space given to the foobar file, inode #12:
    *-----------*
#12 | file data |
    *-----------*
The name foobar for this inode isn’t in this inode; the name foobar is up in the alex directory. This foobar inode is a file inode, not a directory inode, and the attributes of this inode will indicate that.
The inode for a file contains pointers to disk blocks that contain file data, not directory data. There are no special directory names . and .. in files. There are no names here at all; the disk block pointers in this inode point to just file data (whatever is in the file).
This completes the inode trace for /home/alex/foobar: #2->#5->#31->#12
/home/alex/literature/barfoo
Let’s now trace the inode path for the name /home/alex/literature/barfoo. This pathname is a “hard link” to /home/alex/foobar; both the foobar and barfoo names point to the same inode number. Let’s see how:
The trace from ROOT through /home/alex is the same as before. Things change in our second trace because of /home/alex/literature. If we look at the alex directory inode #31 we see that the name literature is paired with inode #7:
    +----+-----+---------------------------------------------------+
#31 |. 31|.. 5 | foobar 12 | temp 15 | literature 7 | demo 6 | ... |
    +----+-----+---------------------------------------------------+
The alex directory inode #31 above says that to follow the trail to the literature name we must go to inode #7. (We won’t know whether the #7 inode for literature is a file or a directory until we get there!)
The box below represents the layout of names and inode numbers inside the actual disk space given to the literature directory, inode #7, which turns out to be a directory:
    +----+-----+------------------------------------------+
 #7 |. 7 |.. 31| barfoo 12 | morestuf 123 | junk 99 | ... |
    +----+-----+------------------------------------------+
The name literature for this inode isn’t in this inode; the name literature is up in the alex directory inode #31. This literature directory inode #7 has the name barfoo in it, paired with inode #12. The actual data for the thing that is barfoo is not here; only the name barfoo is here. You will recall that we have seen inode #12 in the previous trace.
Above, in the alex directory (inode #31), inode #12 was also paired with the name foobar. In the literature directory (inode #7), inode #12 is paired with the name barfoo. Inode #12 has two different names; names foobar and barfoo are both hard links to the same inode #12:
$ ls -i /home/alex/foobar /home/alex/literature/barfoo
12 /home/alex/foobar 12 /home/alex/literature/barfoo
Two names means the “link count” of inode #12 is set to “two”. Both names lead to the same #12 inode and thus to the same data and same attributes. This is one single file with two names. A change to the file data using the name foobar changes the data in inode #12. That changes file data for the name barfoo too, because foobar and barfoo are two names that point to the same #12 inode storage.
Everything about data inode #12 except its name is kept with the inode. The only thing different in a long listing of foobar and barfoo will be the names; everything else (file type, permissions, owner, group, link count, size, modification times, etc.) is part of inode #12 and must therefore be identical for the two names. Neither name is more “original” than the other; both names have equal status. To release the #12 inode storage, you have to delete both names (so the link count drops to zero).
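The shared data and attributes can be demonstrated with a scratch copy of this layout. This is a sketch only: the directories are created under a temporary directory rather than the real /home, and the real inode numbers will differ from the made-up ones above:

```shell
dir=$(mktemp -d)
mkdir -p "$dir/alex/literature"
cd "$dir/alex" || exit 1

echo "first line" > foobar
ln foobar literature/barfoo           # second name for the same inode

# Append through one name; count the lines through the other.
echo "second line" >> literature/barfoo
lines=$(wc -l < foobar | tr -d ' ')

# Change permissions through one name; inspect through the other.
chmod 600 foobar
mode_foobar=$(ls -l foobar | awk '{print $1}')
mode_barfoo=$(ls -l literature/barfoo | awk '{print $1}')

echo "$lines lines; modes: $mode_foobar and $mode_barfoo"
```

The line appended through barfoo is visible through foobar, and the chmod applied through foobar shows up in the listing of barfoo, because both operations act on the one inode.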
Let’s use the above inode data to follow a valid path such as:
/home/alex/literature/barfoo
Start on the left and walk the tree to the right. To be a valid Unix path, everything to the left of the rightmost slash must be a directory. (Thus, ROOT, home, alex, and literature must be directories, if this is a valid pathname.)
Start with the nameless ROOT directory in front of the first slash (ROOT doesn’t have a name, since it does not appear in any parent directory) and look for the first pathname component (home) inside that directory (inside inode #2).
Let’s trace the pathname:
Look in the ROOT directory (located in inode #2) for the name of the first pathname component: home. We find the name home inside the ROOT directory, paired with inode #5. Go back out to the disk to find inode #5 that is the actual home directory.
Note how the names are separate from the things they name. The actual directory inode #5 of the home directory is not the same as the inode #2 of the ROOT directory that contains the directory name home. The name is stored in a different place (#2) than the thing it names (#5).
In inode #5, the directory that has the name home, look for the name alex. We find alex paired with inode #31. Go back out to the disk to find inode #31 that is the actual alex directory. Again, the name alex is contained in directory inode #5 (home) and that name is stored separately from inode #31 that is the actual alex directory.
In inode #31, the directory that has the name alex, look for the name literature. We find literature paired with inode #7. Go back out to the disk to find inode #7 that is the actual literature directory. Again, the name literature is contained in directory inode #31 (alex) and that name is stored separately from the inode #7 that is the actual literature directory.
In inode #7, the directory that has the name literature, look for the name barfoo. We find it paired with inode #12. Go back out to the disk to find inode #12 that is the actual data of the file barfoo. Again, the name barfoo is contained in directory inode #7 (literature) and that name is stored separately from the inode #12 that is the actual data of the file. The name of a file is not part of the inode that makes up the actual file data.
You now have found the disk node (inode) that is your file data: inode #12. The name of this file, barfoo, is stored up in inode #7 that is the literature directory. The name is separate from the data it names.
If file data inode #12 has appropriate permission attributes, you can read or write the data in the file. It is the permission attributes on the inode containing the file data that govern what you can do with the data. The permissions on the inode of the directory containing the name of the file (directory inode #7) don’t control what you can do with the data of the file.
If any of the inodes of the directories leading down to the file inode #12 don’t give you search permission, you won’t be able to reach the file’s data inode that way and won’t be able to access the file’s data using those directories; but perhaps some other directories may lead you to the same inode #12, if the file has another name.
To access and read the data in a file path such as:
/home/alex/literature/barfoo
you need appropriate search permissions on the ROOT directory inode, the home directory inode, the alex directory inode, the literature directory inode, and finally read permissions on the barfoo file data inode #12.
It is the barfoo file data inode #12 permissions that determine whether or not you can read or change the data of the file. Reading or changing the data in the file requires permissions on the inode #12 that contains the data blocks of the file itself.
It is the literature directory inode permissions (inode #7) that determine what you can do with the name of the file, because the literature directory (inode #7) is where the name barfoo is kept. Changing, linking to, or removing the name of a file operates on the inode of the directory in which the file name appears; altering the name has nothing to do with reading or changing the inode that contains the data blocks of the file itself.
You can have no permissions on the inode that contains the data blocks of the file itself (it may even be owned by some other user) and still you may be able to rename or remove the name of the file from a directory on whose inode you do have permissions. The names of a file are stored in separate inodes from the data blocks of the file.
Names are separate from the things that they name. The permissions of the names are also separate from the permissions of the data.
Changing a name only requires write/execute permissions on a directory. No permissions are needed on the inode of the thing being renamed. Changing the content of a file only requires write permissions on the data inode of the file itself, not on the directory that holds the name of the file.
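The separation of name permissions from data permissions can be sketched as follows. The file names locked and renamed are made up; note that the superuser bypasses these permission checks, so the final read only fails for an ordinary user:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

echo "secret" > locked
chmod 000 locked          # no permissions at all on the file's own inode

# Renaming changes only the directory, which we can still write.
mv locked renamed
ls -l renamed

# Reading needs permission on the file inode itself, so this is
# expected to fail for an ordinary user (root bypasses the check).
cat renamed 2>/dev/null || echo "cannot read the data"
```

The rename succeeds even though the file's inode grants no permissions, because mv only rewrites the name-and-inode pair stored in the directory.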
Normally when you do ls -l dir you see the permissions of the contents of the directory, not the directory itself. What command and options are needed to see the access permissions and link count of a directory, instead of the contents of a directory? (RTFM)
When you are inside a directory, what is the name you use to refer to the directory itself? (This name works inside any directory.) What name always refers to the unique parent directory?
How many links (names) does a brand new, empty directory have? Why isn’t it just one link, as it is for a new file? (In other words, why does a new file have one link and a new directory have more than that?)
Why does creating a sub-directory in a directory cause the directory’s link (name) count to increase by one for every sub-directory created? (Recall that a link count is a count of names.)
Why doesn’t the link (name) count of the directory increase when you create files in the directory?
Give the Unix command and its output that shows the inode number and owners of the following directories:
/home
/root
Note: Show only one line of output for each single directory; do not show the contents of the directory. Use a command (and options) that will show only the directory itself, not its contents. (RTFM)