Learn some command line: using du, df, file, find to make your life easier

I love the command line. If the command line were a dog, it would be a hard-headed labrador: big and somewhat intimidating, but really kind of even-tempered and friendly once she gets to know you.

I just compared the command line to my dog Roscoe. I love them both, and they both frustrate me.

I can't do much with Roscoe, but I can help out a bit with the command line. And so allow me to introduce four of my favorite utilities: df, du, file, and find.

Filesystem sizes with df

This one is easy. According to the man page, df stands for "report file system disk space usage." I say it stands for "disk free." But what do I know?

$ df -h

The -h tells df to report in human-readable numbers. Here, "human-readable" means "human-readable if you know the difference between G and M and K." You can also use -k (report in kilobytes) or -m (report in megabytes) if you desire. It's all up to you.
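df also accepts a path and reports only the filesystem that holds it, which is handy when you care about a single disk. Here is a quick sketch comparing the units. It uses / only because every system has one, so substitute /home or any other path you like. The -P flag (POSIX output format) is not mentioned above. It keeps each filesystem on a single line even when the device name is long, which makes the output easier to feed to other programs.

```shell
# Report only the filesystem that contains / (any path works here).
df -h /      # sizes with G, M and K suffixes
df -k /      # the same numbers in 1K blocks

# POSIX output format: exactly one line per filesystem, after the header.
df -P -k /
```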

df -h gives something like this:

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2             7.4G  4.6G  2.4G  66% /
varrun               1014M  128K 1014M   1% /var/run
varlock              1014M     0 1014M   0% /var/lock
procbususb           1014M  108K 1014M   1% /proc/bus/usb
udev                 1014M  108K 1014M   1% /dev
devshm               1014M     0 1014M   0% /dev/shm
/dev/sda4              61G  7.3G   51G  13% /home
/dev/sda1              40G   17G   23G  43% /media/sda1
/dev/scd0             7.8G  7.8G     0 100% /media/cdrom0

The first column is the device. For disks, this will be something like /dev/sdaN or /dev/hdaN, where N is a small number. The other filesystems, with names like udev, devshm, or varrun, are virtual filesystems that are not stored on a disk partition, and which ones you see depends on your OS and distribution. This output was taken from a GNU/Linux box running a 2.6.20 kernel.

The middle three columns show the total size, the amount used, and the amount available, just like the title says. The Use% column indicates the total percentage used. Generally, you don't want that to read 100%, except for CDs and DVDs, which will always show 100%. The final column tells you where in your directory hierarchy the filesystem is mounted.
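Because every column has a fixed meaning, it is easy to pick out the filesystems that are filling up. This is a minimal sketch assuming a POSIX shell and awk. full_filesystems is a made-up helper name, and it assumes mount points contain no spaces. It reads df -P output because some versions of df, without -P, put a long device name on a line of its own, which shifts the columns.

```shell
# full_filesystems LIMIT  (hypothetical helper)
# Reads `df -P` output on stdin and prints the mount point and Use% of
# every filesystem that is at least LIMIT percent full.
full_filesystems() {
  # NR > 1 skips the header line. Column 5 holds the percentage ("66%");
  # adding 0 makes awk use its leading digits as a number.
  # Column 6 is the mount point.
  awk -v limit="$1" 'NR > 1 && $5 + 0 >= limit + 0 { print $6, $5 }'
}

# Which filesystems are 90% full or worse?
df -P | full_filesystems 90
```

On the machine in the listing above, only /media/cdrom0 would show up, and that one is a CD.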

That's it for the Very Short Tour of df.

Directory sizes with du

Suppose df reports a filesystem is full, and you need to find the culprit fast. Let's say for illustrative purposes the filesystem is /home. Here's one of my favorite commands of all time:

$ du -k /home | sort -n

Now, technically that's two commands. du stands for "estimate file space usage," though I hate the word "usage," because "use" will almost always work instead. I like to call it "disk use," for hopefully obvious reasons. The -k specifies reporting in kilobytes, rather than filesystem blocks. You can also use -m, which specifies megabytes, if you like smaller numbers. Do not use the -h option. -h means "print in human-readable form," and it will break our nifty sort operation: sort -n looks only at the number and ignores the G or M after it, so a 2G directory would sort below a 900M one.

The '|' (official name: "bar thingy") means "pipe." "Pipe" means, "take the output of this command, and pass it to the next command." In even simpler terms, this means "route STDOUT (standard out) of the first program to STDIN (standard in) of the next program."
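To make the pipe concrete, here is the same job done the long way, with a scratch file standing in for the pipe. printf stands in for du so the example runs anywhere. It assumes mktemp, which GNU and BSD systems provide.

```shell
# The long way: the first program's STDOUT goes into a file...
scratch=$(mktemp)                 # mktemp creates an empty temporary file
printf '3\n1\n2\n' > "$scratch"
# ...and then the file becomes the second program's STDIN.
sort -n < "$scratch"
rm -f "$scratch"                  # and afterwards you have to clean up

# The pipe does both steps at once, with no file to name or delete.
printf '3\n1\n2\n' | sort -n
```

Both versions print 1, 2 and 3 on separate lines.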

sort sorts lines of data, just as the name implies. It isn't short for "somehow order random text" or anything like that. It just means "sort." The -n option tells sort to treat the first word of each line as a number, rather than to sort it ASCIIbetically. For fun, try the sort without the -n. You'll quickly observe that "1" sorts before "101" which sorts before "2." For our purposes, the -n is quite important.

On my machine, that command gives this output:

4       /home/tony/.config/xfce4/orage
4       /home/tony/.config/xfce4/xfwm4
4       /home/tony/docs/fsm
4       /home/tony/docs/stories/speleology
4       /home/tony/.gimp-2.2/brushes
     .
     .
     .
512564  /home/tony/src
685672  /home/tony/tmp/zips
714508  /home/tony/tmp/iso
789240  /home/tony/tmp/tony
813236  /home/tony/video/roscoe
881512  /home/tony/video/family
1694756 /home/tony/video
3835596 /home/tony/tmp
7442492 /home/tony
7442496 /home

As you can see, I have a lot of stuff in /home/tony/tmp. I would look there for things to remove to free up space.
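You can see the difference between sort -n and plain sort on a small scale. Setting LC_ALL=C pins plain sort to byte order, so the result does not depend on your locale settings. The last line is a small variation on the command above and was not part of the original recipe. It uses tail to keep only the ten largest entries, since the interesting lines are at the bottom anyway.

```shell
# Plain sort compares text character by character ("ASCIIbetically").
printf '2\n101\n1\n' | LC_ALL=C sort    # 1, 101, 2
# sort -n compares the leading numbers instead.
printf '2\n101\n1\n' | sort -n          # 1, 2, 101

# Only the ten biggest entries from the du report:
du -k /home | sort -n | tail -n 10
```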

What kind of file is it?

Unlike some operating systems, GNU/Linux and other Unix-like operating systems don't use filename extensions to determine the type of a file. So a text file does not have to end in .txt, and a JPEG-encoded image file does not have to end in .jpg. Instead, there is a nifty utility called file that examines a file's contents and reports its type for you.

It's really pretty easy to use:

$ file blah.c
blah.c: ASCII C program text

It's really that simple.

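Of course, it uses magic. /etc/magic. Really. I'm not kidding. The magic file is a database of telltale byte patterns (so-called magic numbers) that appear at known offsets in common file formats, and file compares each file against it.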
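To get a feel for what file does, here is a toy imitation. sniff is a made-up name, and its four signatures are a tiny hand-picked sample. The real magic database knows far more formats and checks many offsets, not just the first few bytes. The sketch uses od's -N option to read only the first four bytes.

```shell
# sniff FILE  (hypothetical toy, for illustration only)
# Ignores the filename entirely and looks only at the first four bytes.
sniff() {
  # od prints the bytes as hex pairs (" 25 50 44 46"); tr deletes the
  # spaces and the newline, leaving "25504446".
  magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
  case "$magic" in
    25504446) echo "$1: PDF document" ;;   # starts with "%PDF"
    89504e47) echo "$1: PNG image" ;;      # byte 0x89, then "PNG"
    7f454c46) echo "$1: ELF binary" ;;     # byte 0x7f, then "ELF"
    ffd8ff*)  echo "$1: JPEG image" ;;     # JPEG start-of-image marker
    *)        echo "$1: unknown" ;;
  esac
}

# A PDF by any other name is still a PDF:
dir=$(mktemp -d)
printf '%%PDF-1.4\n' > "$dir/vacation.jpg"
sniff "$dir/vacation.jpg"    # reports "PDF document" despite the .jpg name
```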

Finding files

Find is one of the unsung heroes of the Free software world. Many do not appreciate the functional finesse, the streamlined beauty of this perfect utility. Find can search for files based on name, on size, on ownership, on permissions, on modification time, on access time, on... well, just about anything. Combined with other utilities, it can also search on content or file type.
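Here is a quick look at the size and ownership tests mentioned above. This is a sketch on a scratch directory from mktemp, so the paths will differ on your machine, and the file names are made up. The c after the size means bytes. A bare number would count 512-byte blocks instead.

```shell
# A scratch tree to search.
dir=$(mktemp -d)
printf 'tiny\n' > "$dir/small.txt"                              # 5 bytes
dd if=/dev/zero of="$dir/big.bin" bs=1024 count=64 2>/dev/null  # 64 KiB

# By size: regular files bigger than 10000 bytes.
find "$dir" -type f -size +10000c -print
# By ownership: files belonging to you (id -u prints your numeric user ID).
find "$dir" -type f -user "$(id -u)" -print
# Tests can be combined; listing two means "both must be true".
find "$dir" -type f -user "$(id -u)" -size +10000c -print
```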

For instance, to find all files ending in .c:

$ find /home -name \*.c -print

The /home tells find to start the search in the /home directory. The -name \*.c specifies the pattern to search for. The * means "anything," followed by .c, which means just that: search for anything ending in .c. The backslash in front of the * stops the shell from expanding the pattern itself, so find receives *.c intact. Quoting it as '*.c' works too. The -print is the "predicate": the action we wish to perform on the things we find. We can do more than just print out filenames.

This gives the following output:

/home/tony/src/gnome/gnome-columns/src/jewel.c
/home/tony/src/gnome/gnome-columns/src/texture.c
/home/tony/src/gnome/gnome-columns/src/renderable.c
/home/tony/src/gnome/gnome-columns/src/rectangle.c
/home/tony/src/gnome/gnome-columns/src/gnome-columns.c
/home/tony/src/gnome/gnome-columns/src/gameboard.c

(There was really a lot more output, but I wanted to keep the display simple.)
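You can see why the backslash matters by trying both spellings in a scratch directory. The file names here are made up for the demonstration. Without the backslash, the shell finds main.c in the current directory and substitutes it for *.c before find ever runs. With two or more .c files in the current directory, find would typically stop with an error about an unexpected extra argument instead.

```shell
dir=$(mktemp -d)
mkdir "$dir/sub"
touch "$dir/main.c" "$dir/sub/util.c"
cd "$dir"

# Escaped: find receives the pattern *.c and matches at any depth.
find . -name \*.c -print     # ./main.c and ./sub/util.c (order may vary)

# Unescaped: the shell has already turned *.c into main.c,
# so find only looks for files named exactly main.c.
find . -name *.c -print      # ./main.c
```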

Search for files that have been recently changed:

$ find . -ctime -1 -print

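This time I specified the start directory as ".", which is the current directory. I've specified the search criteria as -ctime -1, which means "status changed less than one day ago." A file's status changes when its contents change, but also when its permissions or owner change. Use -mtime instead if you care only about the contents. Again, I specified -print. Here is the output: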

.
./blah

Apart from "." itself (a directory counts as changed whenever a file is created or deleted in it), it returned only one file, blah. How boring.
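The difference between -ctime and -mtime is easy to demonstrate. This sketch uses touch -t to back-date a file's modification time. Doing that is itself a status change, so -ctime still counts the file as recently changed while -mtime does not. The file names are made up, and -type f keeps "." out of the results.

```shell
dir=$(mktemp -d)
cd "$dir"
touch fresh.txt
touch -t 200001010000 old.txt    # claim old.txt was last modified on 1 Jan 2000

# Status change time: touch -t changed old.txt's status just now.
find . -type f -ctime -1 -print  # ./fresh.txt and ./old.txt
# Modification time: old.txt now looks as if it is from 2000.
find . -type f -mtime -1 -print  # ./fresh.txt
```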

Let's do something a little more interesting. Let's look for all PDFs in my home directory:

$ find ~ -exec file {} \; | grep PDF

I commanded find to start in my home directory by using the squiggly, '~'. (Actually, it's called a "tilde.") Then I told it to execute a command, using -exec file {} \;. The -exec is a predicate that causes find to execute a command, in this case file. The '{}' bit means "substitute the filename here." When find generates the command, it'll be something like "file /home/tony/stupidname.ext", because the shell has already expanded ~ to /home/tony. The '\;' bit marks the end of the command. The backslash keeps the shell from treating the semicolon as its own command separator. Then I pipe the output to grep, which prints only the lines containing "PDF".

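There are better ways of doing this, especially using a command called xargs, which hands file a whole batch of filenames at once instead of starting a new file process for every single file. But I don't cotton to those new-fangled methods. Well, I do, but you must first learn to crawl before you can fly on the space shuttle.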

Here's the output:

/home/tony/src/beerhacker/Documentation/BeerXML_v2_01.pdf: PDF document, version 1.3
/home/tony/src/e17/docs/ewlbook/pre-rendered/ewlbook.pdf: PDF document, version 1.3
/home/tony/src/e17/docs/ewlbook/pre-rendered/ewlbook.es.pdf: PDF document, version 1.3
/home/tony/src/e17/docs/cookbook/pre-rendered/eflcookbook.pdf: PDF document, version 1.3
/home/tony/src/e17/docs/cookbook/pre-rendered/eflcookbook.fr.pdf: PDF document, version 1.3
/home/tony/src/e17/docs/cookbook/pre-rendered/eflcookbook.es.pdf: PDF document, version 1.3
/home/tony/src/e17/docs/cookbook/pre-rendered/eflcookbook.pt-BR.pdf: PDF document, version 1.3
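Here is a sketch of the xargs approach mentioned above, next to the original form, on a scratch directory. The file names are made up, and the "PDF" is just a PDF header, which is the part file looks at. -print0 and xargs -0 are GNU and BSD extensions that separate names with NUL bytes, so filenames containing spaces survive. Ending -exec with + instead of \; is the POSIX way to get the same batching from find alone.

```shell
dir=$(mktemp -d)
printf '%%PDF-1.4\n' > "$dir/report"    # starts with a PDF header
printf 'hello\n' > "$dir/notes"         # plain text

# The original form: one `file` process per file found.
find "$dir" -type f -exec file {} \; | grep PDF

# xargs reads names on STDIN and packs as many as fit into each `file` run.
find "$dir" -type f -print0 | xargs -0 file | grep PDF

# find can batch by itself: + instead of \; passes many names per command.
find "$dir" -type f -exec file {} + | grep PDF
```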

Finally, let's use find to delete all our old Emacs backup files. WARNING! DANGER, WILL ROBINSON! THIS IS VERY DANGEROUS! Be very careful when using find to do file manipulation. Always print out the results of find before executing a dangerous command.

First, do this:

$ find . -name \*~ -print

This prints all the files that end in ~, starting in the current directory. Once you are sure you won't miss these files, do this:

$ find . -name \*~ -exec rm {} \;
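Here is the print-first, delete-second routine as a sketch you can rehearse in a scratch directory. The file names are made up. Adding -type f (not part of the original commands) keeps rm away from any directory whose name happens to end in ~. If you want find to ask before each deletion, POSIX find also offers -ok, which works like -exec but prompts for a yes or no first.

```shell
dir=$(mktemp -d)
cd "$dir"
touch notes.txt notes.txt~ draft.org~

# Step 1: look before you leap. Both backups show up; notes.txt does not.
find . -type f -name \*~ -print

# Step 2: the very same search, with -print swapped for the rm.
find . -type f -name \*~ -exec rm {} \;

ls    # only notes.txt is left
```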

That's it! You are now wise in the ways of a couple of minor file utilities. As always, enjoy playing around with them. Be safe. Don't run with scissors, or shave with a rusty razor. Remember that cats have five pointy ends, and that with powerful knowledge comes powerful responsibility. Don't abuse these tools, and they will treat you right until the end of your days.

License

Verbatim copying and distribution of this entire article are permitted worldwide, without royalty, in any medium, provided this notice is preserved.