6.9 Finding the Size of a File

  The size of a file can be obtained with the -s operator. It is a file test operator, but instead of a simple true or false it returns the size of the file in bytes when the file exists and is not empty. For an empty file it returns a false value, and for a file that does not exist it returns undef.
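A minimal sketch of how -s behaves for a non-empty file, an empty file, and a name that does not exist. The helper name describe_size and the file names are made up for this illustration; File::Temp is a core module and is used only to create scratch files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: describes what -s reports for a path.
sub describe_size {
    my ($path) = @_;
    my $size = -s $path;
    return "missing" if !defined $size;   # no such file: undef
    return "empty"   if !$size;           # file exists but has zero bytes
    return "$size bytes";                 # file exists and is not empty
}

my $dir = tempdir(CLEANUP => 1);

# A file holding five bytes.
open my $out, '>', "$dir/data.txt" or die "Cannot create file: $!";
print {$out} "hello";
close $out;

# A file holding nothing.
open $out, '>', "$dir/empty.txt" or die "Cannot create file: $!";
close $out;

print describe_size("$dir/data.txt"),  "\n";   # prints "5 bytes"
print describe_size("$dir/empty.txt"), "\n";   # prints "empty"
print describe_size("$dir/nothere"),   "\n";   # prints "missing"
```

Because an empty file makes -s false, a test such as if (-s $file) cannot tell an empty file from a missing one; -e is the operator that tests existence alone.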

The following program takes a file name as a command line argument and, if the file exists, prints its size.

 Program 6.12

#!/usr/bin/perl

use strict;

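my ($file) = @ARGV;
# Test existence with -e: -s alone is false for an empty file,
# which would wrongly be reported as missing.
if (-e $file){
   print "Size of file $file = " . ((-s $file) || 0) . "\n";
}
else{
    print "File $file does not exist.\n";
}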

If there is a file called copy.pl in the current directory, and the program given above is stored in a file called size.pl, the call

% size.pl copy.pl

produces the following output.

Size of file copy.pl = 192

If the name given is actually a directory, -s returns the size of the directory entry itself. On the system the author is working with, the size of every directory is 1024 bytes. The program does not return the cumulative size of the files contained inside the directory.
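To make the distinction concrete, the following sketch prints the size that -s reports for a directory next to the total size of the plain files directly inside it. The helper name files_total is hypothetical, and the sketch deliberately does not descend into subdirectories.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sum of the sizes of the plain files directly
# inside $dir. Subdirectories are not entered.
sub files_total {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @entries = readdir $dh;
    closedir $dh;

    my $total = 0;
    foreach my $entry (@entries) {
        my $path = "$dir/$entry";
        # -s is false for an empty file, so fall back to 0.
        $total += (-s $path) || 0 if -f $path;
    }
    return $total;
}

# Use the directory named on the command line, or the current one.
my $dir = @ARGV ? $ARGV[0] : ".";
printf "%-30s%10d\n", "$dir (directory itself)", (-s $dir) || 0;
printf "%-30s%10d\n", "$dir (files inside)",     files_total($dir);
```

The two numbers are unrelated: the first depends on how the file system stores directories, the second on the contents of the files.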

We now write a program that takes a list of files as command line arguments and prints the names and sizes of the files, sorted from the largest to the smallest.

 Program 6.13

#!/usr/bin/perl

use strict;

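my (@files) = @ARGV;
my ($file, %size);

# Record the size in bytes of every file named on the command line.
foreach $file (@files){
    $size{$file} = (-s $file);
}

# Print one line per file, largest first.
foreach $file (sort bySize (keys %size)){
    printf "%-20s%10d\n", $file, $size{$file};
}

# Comparison routine for sort: descending order of size.
sub bySize{
    $size{$b} <=> $size{$a};
}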

The program stores the sizes of files in bytes in a hash table %size. The key is the name of a file and the corresponding value is the size of the file. It prints the files sorted in descending order of size. It does so by sorting the keys in %size such that keys of files of larger size occur before keys of files of smaller size. The expression that does this sorting is given below.

sort bySize (keys %size)

Here bySize is a comparison subroutine. For each comparison, sort places the two elements being compared in the predefined variables $a and $b, and the subroutine returns a negative value, zero, or a positive value depending on whether $a should come before, is equal to, or should come after $b in the sort order. The value returned by bySize is determined by the following statement.

$size{$b} <=> $size{$a};

Here, the spaceship operator <=> returns -1 if the value corresponding to $b in %size is less than the value corresponding to $a, 0 if the two values are equal, and 1 otherwise. Because the operands appear in the order $b, $a, a result of -1 tells sort that $a, the larger file, comes first. In other words, the name of a file with a smaller size is pushed to the back in the sort order, and the name of a file with a bigger size is pushed to the front. If two files have the same size, their relative order is arbitrary.
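The behaviour of <=> and of a descending sort can be tried out in isolation. The sketch below uses a few sizes taken from the listings in this section. The helper name names_by_size is hypothetical, and the alphabetical tie-break with cmp is an addition that is not part of Program 6.13; it only makes the order of equal-sized files predictable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: names in %$size ordered from largest to smallest.
# Files of equal size are ordered alphabetically by name.
sub names_by_size {
    my ($size) = @_;
    return sort { $size->{$b} <=> $size->{$a} or $a cmp $b } keys %$size;
}

# The three possible results of the spaceship operator.
printf "%d\n", 171 <=> 192;    # prints -1
printf "%d\n", 192 <=> 192;    # prints 0
printf "%d\n", 192 <=> 171;    # prints 1

my %size = (
    'readdirR2.pl' => 1509,
    'copy0.pl'     => 192,
    'copy.pl'      => 192,
    'junk'         => 0,
);

print join(" ", names_by_size(\%size)), "\n";
# prints: readdirR2.pl copy.pl copy0.pl junk
```

Swapping $a and $b in the numeric comparison would give ascending order instead.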

Assuming the program is stored in the file size1.pl, the output of the program when called as

% size1.pl *

looks something like what is given below.

readdirR2.pl              1509
readdirR3.pl              1121
file-read1.pl             1066
a                         1024
b                         1024
c                         1024
remove                    1024
readdirR.pl                882
file-age.pl                861
readdirR1.pl               776
include                    732
readdirR0.pl               717
readdirR4.pl               717
tls                        680
oldest.pl                  483
copy1.pl                   438
argv.pl                    403
readdir1.pl                383
tls2                       362
read-file.pl               314
file-read.pl               313
size1.pl                   271
readdir.pl                 215
copy0.pl                   192
copy.pl                    192
size.pl                    171
printfile2                 162
fileread.pl                160
printfile.pl               152
printfile2.1               127
printfile2.2                51
printfile1.pl               39
junk                         0

Note that all the directories, in the version of the Linux operating system that the author uses, have a size of 1024 bytes, as mentioned earlier.

Next, we extend the program to examine directories recursively. The program given below takes a list of files and directories, and returns the list of files and directories sorted by their size. Directories are opened recursively. The program given here is a modification of one of the recursive programs given earlier.

 Program 6.14

#!/usr/bin/perl
use strict;
$" = "\n";
my %size;

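sub sizeR{
    my @FDList = @_;

    # An empty argument list ends the recursion.
    if (!@FDList){
        return ();
    }
    my $first = shift @FDList;

    if (-f $first){
        $size{$first} = (-s $first);
        sizeR (@FDList);
    }
    elsif (-d $first){
        my @files;
        # Test the result of opendir itself; with "$first || warn ..."
        # the warning could never be issued.
        if (opendir DIR, $first){
            @files = readdir DIR;
            closedir DIR;
        }
        else{
            warn "Cannot open directory $first: $!";
        }
        @files = grep {$_ !~ /^[.]{1,2}$/} @files;
        @files = map {"$first/$_"} @files;
        $size{$first} = (-s $first);
        sizeR (@files);
        sizeR (@FDList);
    }
    else{
        # Neither a plain file nor a directory: skip it and go on.
        sizeR (@FDList);
    }
}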

sub bySize{
    $size{$b} <=> $size{$a};
}

####main program###########
my @FDToRead = @ARGV;
if (!@FDToRead)
   {@FDToRead = ".";}

sizeR (@FDToRead);

my $file;
foreach $file (sort bySize keys %size){
    printf "%-30s%10d\n", $file, $size{$file};
}

This program opens directories recursively by using the subroutine sizeR. Whenever it looks at a file or a directory, it stores the size of the file in the hash %size.

The subroutine sizeR determines the action to take based on the first element of the argument list passed to it. If the argument list is empty, the subroutine terminates. If the first element is a file, it stores its size in the hash table %size and calls itself with the rest of the list. If the first element is a directory, it stores its size in the hash as well. In addition, it calls the subroutine sizeR recursively twice: once with the list of files in the directory, and then with the list of all but the first element of the list with which sizeR was originally called. The effect of these two recursive calls is to find the size of all files and directories included recursively.

The program prints the contents of the hash in descending size order at the end. The output of this program for the call

% sizeR.pl *

is something like what is given below.

readdirR2.pl                        1509
readdirR3.pl                        1121
file-read1.pl                       1066
a/ad1/ad1d1                         1024
a/ad2                               1024
c                                   1024
a/ad3                               1024
remove                              1024
a                                   1024
a/ad1                               1024
b                                   1024
readdirR.pl                          882
file-age.pl                          861
sizeR.pl                             843
readdirR1.pl                         776
include                              732
sizeR.pl~                            717
readdirR0.pl                         717
readdirR4.pl                         717
tls                                  680
oldest.pl                            483
copy1.pl                             438
argv.pl                              403
readdir1.pl                          383
tls2                                 362
remove/rmTeXfiles1.pl                326
remove/rmTeXfiles2.pl                316
read-file.pl                         314
file-read.pl                         313
remove/rmTeXfiles.pl                 274
size1.pl                             271
readdir.pl                           215
copy.pl                              192
copy0.pl                             192
size.pl                              171
printfile2                           162
fileread.pl                          160
printfile.pl                         152
printfile2.1                         127
printfile2.2                          51
printfile1.pl                         39
a/a2                                   0
a/a3                                   0
junk                                   0
a/ad1/ad1a                             0
a/ad1/ad1c                             0
a/ad1/ad1b                             0
a/a1                                   0
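A listing of this kind can also be produced without hand-written recursion. The sketch below swaps in the core module File::Find for the subroutine sizeR; it is an alternative technique, not the method used in Program 6.14. The helper name sizes_under is hypothetical, and the alphabetical tie-break in the sort is an addition.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: returns a hash of path => size in bytes for every
# plain file and directory found under the given starting points.
sub sizes_under {
    my @start = @_;
    my %size;
    find(
        {
            no_chdir => 1,    # do not change directory while walking
            wanted   => sub {
                my $path = $File::Find::name;
                # -s is false for an empty file, so fall back to 0.
                $size{$path} = (-s $path) || 0 if -f $path || -d $path;
            },
        },
        @start
    );
    return %size;
}

# Start from the command line arguments, or from the current directory.
my @start = @ARGV ? @ARGV : (".");
my %size  = sizes_under(@start);

foreach my $path (sort { $size{$b} <=> $size{$a} or $a cmp $b } keys %size) {
    printf "%-30s%10d\n", $path, $size{$path};
}
```

File::Find leaves out the entries . and .. by itself, so the grep used in sizeR is not needed here.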