Hi.
I may be comparing apples (MS systems) to oranges (*nix systems), but here is a timing comparison of stat and cmp on a GNU/Linux box, with 2 identical files:
#!/usr/bin/env bash
# @(#) s1 Demonstrate compare timings for stat and cmp.
# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f "$C" ] && "$C" stat cmp
N=${1-10000}
pl " Input data file f1 f2:"
specimen -3 -n f1 f2 | cut -c1-78
pl " Results, time for $N stat calls:"
rm -f f3
time for ((i=1;i<=$N;i++))
do
  s1=$(stat -c%s f1)
  s2=$(stat -c%s f2)
  if [ "$s1" != "$s2" ]
  then
    pe "f1" >> f3
  fi
done
if [ -e f3 ]
then
  pe " Lines in f3: $(wc -l <f3)"
fi
pl " Results, time for $N cmp calls:"
rm -f f3
time for ((i=1;i<=$N;i++))
do
  if ! cmp f1 f2
  then
    pe "f1" >> f3
  fi
done
if [ -e f3 ]
then
  pe " Lines in f3: $(wc -l <f3)"
fi
exit 0
./s1
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution : Debian GNU/Linux 5.0.8 (lenny)
bash GNU bash 3.2.39
stat (GNU coreutils) 6.10
cmp (GNU diffutils) 2.8.1
-----
Input data file f1 f2:
Edges: 3:0:3 of 17777 lines in file "f1"
1 Preliminary Matter.
2
3 This text of Melville's Moby-Dick is based on the Hendricks House editi
---
17775 THEY GLIDED BY AS IF WITH PADLOCKS ON THEIR MOUTHS; THE SAVAGE SEA-HAWK
17776 D WITH SHEATHED BEAKS. +ON THE SECOND DAY, A SAIL DREW NEAR, NEARER, AN
17777 KED ME UP AT LAST. +IT WAS THE DEVIOUS-CRUISING +RACHEL, THAT IN HER RE
Edges: 3:0:3 of 17777 lines in file "f2"
1 Preliminary Matter.
2
3 This text of Melville's Moby-Dick is based on the Hendricks House editi
---
17775 THEY GLIDED BY AS IF WITH PADLOCKS ON THEIR MOUTHS; THE SAVAGE SEA-HAWK
17776 D WITH SHEATHED BEAKS. +ON THE SECOND DAY, A SAIL DREW NEAR, NEARER, AN
17777 KED ME UP AT LAST. +IT WAS THE DEVIOUS-CRUISING +RACHEL, THAT IN HER RE
-----
Results, time for 10000 stat calls:
real 0m39.595s
user 0m10.397s
sys 0m27.694s
-----
Results, time for 10000 cmp calls:
real 0m55.188s
user 0m27.122s
sys 0m25.958s
So perhaps MS systems require a lot more work to get the size.
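One caveat when reading these shell numbers: every iteration of the stat loop starts two external stat processes (through command substitution), and every iteration of the cmp loop starts one cmp process, so process startup is probably a large share of both timings. Here is a rough sketch for estimating that overhead on its own, by timing the same number of runs of a command that does almost nothing. The function name spawn_only is made up for illustration, and I have not run this on the box above:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of s1): run a do-nothing external command
# N times through command substitution, mimicking the fork/exec pattern
# of  s1=$(stat -c%s f1)  without doing any real work.
spawn_only() {
  local n=${1-1000} i out count=0
  for ((i = 1; i <= n; i++)); do
    out=$(env true)          # "env" forces an external process, not the builtin
    count=$((count + 1))
  done
  # Report how many processes were started.
  printf '%s\n' "$count"
}
```

Running "time spawn_only 20000" would give a figure to set against the stat loop (20000 stat processes for N=10000); whatever is left over is roughly the cost of the stat work itself.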
For the case of perl, that same amount of work for stat can be done in under 0.1 seconds real time:
#!/usr/bin/env perl
# @(#) p1 Demonstrate stat on open (and un-opened) files.
use strict;
use warnings;
my ($debug);
$debug = 1;
$debug = 0;
my ( $f1, $f2, $f3 );
my ( $s1, $s2, $i, $j, $N );
# Make sure files are around, then close them.
open( $f1, "<", "f1" ) || die " Cannot open file f1\n";
open( $f2, "<", "f2" ) || die " Cannot open file f2\n";
open( $f3, ">", "f3" ) || die " Cannot open file f3 for write\n";
$s1 = ( stat("f1") )[7];
$s2 = ( stat("f2") )[7];
print " Length of f1, f2: $s1, $s2\n";
close $f1;
close $f2;
$j = 0;
$N = 10;
$N = 10000;
for ( $i = 1; $i <= $N; $i++ ) {
  $s1 = rand();
  $s2 = rand();
  $s1 = ( stat("f1") )[7];
  $s2 = ( stat("f2") )[7];
  if ( $s1 != $s2 ) {
    print $f3 " Found mismatch at iteration $i\n";
    $j++;
  }
  print " Length of f1, f2: $s1, $s2\n" if $debug;
}
print STDERR " Called stat $i (-1) times on each file, compared sizes.\n";
if ( $j != 0 ) {
print STDERR " File f3 was written to $j times.\n";
}
exit(0);
producing:
time ./p1
Length of f1, f2: 1205404, 1205404
Called stat 10001 (-1) times on each file, compared sizes.
real 0m0.091s
user 0m0.036s
sys 0m0.040s
However, as I mentioned, I don't know about MS systems. It does seem odd that obtaining the length of a file (in *nix, just a matter of pulling the length from the inode) and reading every byte of two files and comparing them would differ so much in cost (and in the wrong direction, it seems to me).
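To make the size-versus-content distinction concrete, here is a minimal sketch that uses the size as a cheap first test and falls back to cmp only when the sizes agree. The function name same_content is invented for this example, and it assumes GNU stat (for -c %s), as in the script above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 only if the two files have identical content.
# Returns 1 if they differ, 2 if a size could not be obtained.
# Assumes GNU stat for the -c %s format.
same_content() {
  local a=$1 b=$2 sa sb
  sa=$(stat -c %s -- "$a") || return 2
  sb=$(stat -c %s -- "$b") || return 2
  # Different sizes: the contents cannot match, so no data needs to be read.
  [ "$sa" = "$sb" ] || return 1
  # Same size: only a byte-by-byte comparison can decide.
  cmp -s -- "$a" "$b"
}
```

It could be used as "if same_content f1 f2; then pe same; else pe differ; fi". With identical files, like the two above, the size test never short-circuits, so cmp still has to read everything.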
Best wishes ... cheers, drl