6.9 Finding the Size of a File
The size of a file on disk can be obtained with the -s operator. It is a file test operator: it returns a true value if the file exists and has nonzero size, and that true value is the size of the file in bytes. It returns a false value for an empty file and undef for a file that does not exist.
The following program takes a file name as a command-line argument and, if the file exists, prints its size.
Program 6.12
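#!/usr/bin/perl
use strict;
my ($file) = @ARGV;
# -s alone is false for an empty file, so existence is tested with -e.
if (-e $file){
    # For an empty file -s returns a false value; print it as 0.
    print "Size of file $file = " . ((-s $file) || 0) . "\n";
}
else{
    print "File $file does not exist.\n";
}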
If there is a file called copy.pl in the current directory, and the program given above is stored in a file called size.pl, the call
%size.pl copy.pl
produces the following output.
Size of file copy.pl = 192
If the file name given is actually a directory, the program reports the size of the directory entry itself. On the system the author is working with, the size of every directory is 1024 bytes. The program does not report the cumulative size of the files contained inside the directory.
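The three cases that -s distinguishes (a missing file, an empty file, and a file with content) can be seen in a minimal sketch. The helper name describe_size and the temporary file names are assumptions made for this illustration; they do not appear in the programs of this section.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# describe_size is a hypothetical helper used only in this sketch.
sub describe_size {
    my ($path) = @_;
    my $size = -s $path;                   # undef if the file cannot be found
    return "missing" unless defined $size;
    return "empty"   unless $size;         # false value for a zero-length file
    return "$size bytes";
}

# Build two scratch files in a temporary directory that is removed on exit.
my $dir   = tempdir(CLEANUP => 1);
my $empty = "$dir/empty.txt";
my $full  = "$dir/full.txt";

open(my $fh, '>', $empty) or die "Cannot create $empty: $!";
close $fh;
open($fh, '>', $full) or die "Cannot create $full: $!";
print {$fh} "hello";                       # five bytes, no newline
close $fh;

print describe_size("$dir/no-such-file"), "\n";
print describe_size($empty), "\n";
print describe_size($full), "\n";
```

The sketch prints missing, empty, and 5 bytes, one per line. Testing the result with defined is what separates a missing file from an empty one.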
We now write a program that takes a list of files as command-line arguments and prints the names of the files sorted by size, from the largest to the smallest.
Program 6.13
#!/usr/bin/perl
use strict;
my (@files) = @ARGV;
my ($file, %size);
foreach $file (@files){
    $size{$file} = (-s $file);
}
foreach $file (sort bySize (keys %size)){
    printf "%-20s%10d\n", $file, $size{$file};
}
sub bySize{
    $size{$b} <=> $size{$a};
}
The program stores the sizes of files in bytes in a hash table %size. The key is the name of a file and the corresponding value is the size of the file. It prints the files sorted in descending order of size. It does so by sorting the keys in %size such that keys of files of larger size occur before keys of files of smaller size. The expression that does this sorting is given below.
sort bySize (keys %size)
Here bySize is a comparison subroutine. It takes no explicit arguments; instead, sort sets the predefined variables $a and $b to the two elements being compared. The subroutine returns -1, 0, or 1 depending on whether $a comes before, is tied with, or comes after $b in sort order. The value returned by bySize is determined by the following statement.
$size{$b} <=> $size{$a};
Here, the spaceship operator <=> returns -1 if the value corresponding to $b in %size is less than the value corresponding to $a, 0 if the two values are equal, and 1 otherwise. Because $b appears on the left, the key (that is, the name of a file) with the smaller size is pushed toward the back of the sort order, and the file name with the bigger size is pushed toward the front. If two files have the same size, their relative order is arbitrary.
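The behavior of <=> and of a descending sort on hash values can be checked with a small sketch. The sizes are taken from the sample output in this section. The subroutine name by_size_then_name and the tie-break on the file name are additions made for this illustration, so that files of equal size appear in a predictable order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A few sizes copied from the sample output in the text.
my %size = (
    'copy.pl'  => 192,
    'copy0.pl' => 192,
    'size.pl'  => 171,
    'tls'      => 680,
);

# <=> yields -1, 0, or 1 for less than, equal, and greater than.
print join(" ", 171 <=> 192, 192 <=> 192, 680 <=> 192), "\n";

# Descending by size. When the sizes are equal, <=> returns 0 (false),
# so || falls through to an ascending comparison of the names.
sub by_size_then_name {
    return $size{$b} <=> $size{$a} || $a cmp $b;
}

my @sorted = sort by_size_then_name keys %size;
printf "%-20s%10d\n", $_, $size{$_} for @sorted;
```

The first line printed is -1 0 1. The files then appear in the order tls, copy.pl, copy0.pl, size.pl.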
Assuming the program is stored in the file size1.pl, the output of the program when called as
%size1.pl *
looks something like what is given below.
readdirR2.pl              1509
readdirR3.pl              1121
file-read1.pl             1066
a                         1024
b                         1024
c                         1024
remove                    1024
readdirR.pl                882
file-age.pl                861
readdirR1.pl               776
include                    732
readdirR0.pl               717
readdirR4.pl               717
tls                        680
oldest.pl                  483
copy1.pl                   438
argv.pl                    403
readdir1.pl                383
tls2                       362
read-file.pl               314
file-read.pl               313
size1.pl                   271
readdir.pl                 215
copy0.pl                   192
copy.pl                    192
size.pl                    171
printfile2                 162
fileread.pl                160
printfile.pl               152
printfile2.1               127
printfile2.2                51
printfile1.pl               39
junk                         0
Note that, as mentioned earlier, every directory has a size of 1024 bytes on the version of the Linux operating system that the author uses.
Next, we extend the program to examine directories recursively. The program given below takes a list of files and directories, and returns the list of files and directories sorted by their size. Directories are opened recursively. The program given here is a modification of one of the recursive programs given earlier.
Program 6.14
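#!/usr/bin/perl
use strict;
$" = "\n";
my %size;
sub sizeR{
    my @FDList = @_;
    my $first = shift @FDList;
    # An empty list ends the recursion. Testing with defined, rather than
    # for truth, keeps a file named "0" from ending it early.
    if (!defined $first){
        return ();
    }
    elsif (-f $first){
        $size{$first} = (-s $first);
        sizeR (@FDList);
    }
    elsif (-d $first){
        # "or" is needed here: "||" binds more tightly than the comma,
        # so with "||" the warning could never be issued.
        opendir (DIR, $first) or warn "Cannot open directory $first: $!";
        my @files = readdir DIR;
        closedir DIR;
        # Drop the entries . and .. and prefix the rest with the directory name.
        @files = grep {$_ !~ /^[.]{1,2}$/} @files;
        @files = map {"$first/$_"} @files;
        $size{$first} = (-s $first);
        sizeR (@files);
        sizeR (@FDList);
    }
    else{
        # Neither a plain file nor a directory (for example, a name that
        # does not exist): skip it and continue with the rest of the list.
        sizeR (@FDList);
    }
}
sub bySize{
    $size{$b} <=> $size{$a};
}
####main program###########
my @FDToRead = @ARGV;
if (!@FDToRead)
{@FDToRead = ".";}
sizeR (@FDToRead);
my $file;
foreach $file (sort bySize keys %size){
    printf "%-30s%10d\n", $file, $size{$file};
}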
This program opens directories recursively by using the subroutine sizeR. Whenever it looks at a file or a directory, it stores the size of that file or directory in the hash %size.
The subroutine sizeR determines the action to take based on the first element of the argument list passed to it. If the argument list is empty, the subroutine terminates. If the first element is a file, it stores its size in the hash table %size. If the first argument is a directory, it stores its size in the hash as well. In addition, it calls the subroutine sizeR recursively twice: once with the list of files in the directory, and then with the list of all but the first element of the list with which sizeR is originally called. The effect of these two recursive calls is to find the size of all files and directories included recursively.
The program prints the contents of the hash in descending size order at the end. The output of this program for the call
%sizeR.pl *
is something like what is given below.
readdirR2.pl                        1509
readdirR3.pl                        1121
file-read1.pl                       1066
a/ad1/ad1d1                         1024
a/ad2                               1024
c                                   1024
a/ad3                               1024
remove                              1024
a                                   1024
a/ad1                               1024
b                                   1024
readdirR.pl                          882
file-age.pl                          861
sizeR.pl                             843
readdirR1.pl                         776
include                              732
sizeR.pl~                            717
readdirR0.pl                         717
readdirR4.pl                         717
tls                                  680
oldest.pl                            483
copy1.pl                             438
argv.pl                              403
readdir1.pl                          383
tls2                                 362
remove/rmTeXfiles1.pl                326
remove/rmTeXfiles2.pl                316
read-file.pl                         314
file-read.pl                         313
remove/rmTeXfiles.pl                 274
size1.pl                             271
readdir.pl                           215
copy.pl                              192
copy0.pl                             192
size.pl                              171
printfile2                           162
fileread.pl                          160
printfile.pl                         152
printfile2.1                         127
printfile2.2                          51
printfile1.pl                         39
a/a2                                   0
a/a3                                   0
junk                                   0
a/ad1/ad1a                             0
a/ad1/ad1c                             0
a/ad1/ad1b                             0
a/a1                                   0
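The same traversal can also be written without recursion, by keeping the names still to be examined in an explicit list. The following is only a sketch of that alternative, not the program given above. The subroutine name collect_sizes, the small temporary directory tree it is run on, and the decision not to descend into symbolic links are all assumptions made for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# collect_sizes is a hypothetical name. It returns a reference to a hash
# that maps each file or directory found to its size in bytes.
sub collect_sizes {
    my @todo = @_;
    my %size;
    while (@todo) {
        my $path = shift @todo;
        next unless -f $path || -d $path;   # skip anything else
        $size{$path} = (-s $path) || 0;
        next unless -d $path;
        next if -l $path;                   # assumption: do not follow links
        my $dh;
        if (!opendir($dh, $path)) {
            warn "Cannot open directory $path: $!";
            next;
        }
        my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
        closedir $dh;
        # Put the directory's entries at the front of the list.
        unshift @todo, map { "$path/$_" } @entries;
    }
    return \%size;
}

# Build a small scratch tree: one file at the top, one in a subdirectory.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "Cannot create $root/sub: $!";
for my $spec (["$root/a.txt", "abc"], ["$root/sub/b.txt", "hello"]) {
    my ($name, $text) = @$spec;
    open(my $fh, '>', $name) or die "Cannot create $name: $!";
    print {$fh} $text;
    close $fh;
}

my $size = collect_sizes($root);
foreach my $file (sort { $size->{$b} <=> $size->{$a} } keys %$size) {
    printf "%-30s%10d\n", $file, $size->{$file};
}
```

The sizes reported for the two directories depend on the file system, so the order of the output lines can differ from one machine to another. The two files always have sizes 3 and 5.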