Hi, I am running nawk scripts on a Solaris system to get the records of file1 that are not in file2, and to find duplicate records in a file. The duplicate check is:
nawk '{a[$0]++}END{for(i in a){if(a[i]-1)print i,a[i]}}' file1
In the middle of the script I get the error message "nawk: out of space in tostring on record 971360". The file has 2 million records. Please suggest a fix; this is very important.
I searched and found that gawk can solve this, but it won't run on Solaris.
Thanks Jim. But I want to avoid sort, since it would reorder my file and hence the records displayed. Is there no other solution besides gawk? I don't have much control over my machine.
#!/usr/bin/perl
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
    if $running_under_some_shell;
        # this emulates #! processing on NIH machines.
        # (remove #! line above if indigestible)

eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;
        # process any FOO=bar switches

$, = ' ';       # set output field separator
$\ = "\n";      # set output record separator

line: while (<>) {
    chomp;      # strip record separator
    if ($. == ($.-$FNRbase)) {
        $a{$_}++;
        next line;
    }
    if (!$a{$_}) {
        print 'line' . ($.-$FNRbase) . $_;
    }
}
continue {
    $FNRbase = $. if eof;
}
then try running it on a sample of your data to be sure it seems to do the right thing. Supply file names as you did for nawk (or possibly with the order reversed).
The Solaris box I have seems to have omitted a2p, the processor that automates the work of converting awk to perl. This was done on:
Have you tried using /usr/xpg4/bin/awk instead of nawk? I don't remember whether those two versions of awk on Solaris differ much in how they manage memory, but it might be worth a try if the perl script doesn't work for you.
Correct your code; your double quotes are mismatched too.
To compare files:
nawk 'NR==FNR{a[$0];next;} !($0 in a){print "line:" FNR $0}' file1 file2
For duplicates, try this:
nawk '{A[$0]++}END{for(i in A)if(A[i]>1)print i,A[i]}' file
!a[$0] --> referencing a[$0] creates an extra empty array element for every $0 that does not already exist in array a while reading the second file, so it is better to use !($0 in a).
I really don't trust a[$0] for comparisons; that's my personal experience.
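The difference between the two tests can be checked with a small experiment. This is only a sketch: the sample data and the names tmp, by_ref and by_in are made up for illustration, and it calls plain awk for portability (on Solaris substitute nawk or /usr/xpg4/bin/awk). Both runs load file1 into the array and then count the array elements after scanning file2.

```shell
# Hypothetical sample data: file1 is the reference, file2 is checked against it.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/file1"
printf 'a\nc\nd\n' > "$tmp/file2"

# Test by reference: !a[$0] creates a[$0] for every line of file2 not yet in a.
by_ref=$(awk 'NR==FNR { a[$0] = 1; next }
              !a[$0]  { miss++ }
              END     { n = 0; for (k in a) n++; print n }' "$tmp/file1" "$tmp/file2")

# Test by membership: ($0 in a) only looks the key up and adds nothing.
by_in=$(awk 'NR==FNR    { a[$0] = 1; next }
             !($0 in a) { miss++ }
             END        { n = 0; for (k in a) n++; print n }' "$tmp/file1" "$tmp/file2")

echo "array size after !a[\$0]:      $by_ref"
echo "array size after !(\$0 in a):  $by_in"
rm -rf "$tmp"
```

With this sample the first run ends with 4 array elements (a, b plus the unmatched c and d) and the second with 2, so on a 2 million record file the membership test avoids growing the array with every unmatched line.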
Just add the length and the original line number to each record, sort, and then use the last two fields to rebuild the original record and restore the original order.
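One possible reading of that suggestion, as a rough sketch rather than the exact method meant above: it decorates each record with its line number only (not its length), sorts so that equal records sit next to each other, keeps the repeats, and then sorts back by line number. The sample data and the names tmp, tab and dups are assumptions for illustration, and awk stands in for nawk.

```shell
# Hypothetical sample file with repeated records.
tmp=$(mktemp -d)
printf 'x\ny\nx\nz\ny\nx\n' > "$tmp/file"
tab=$(printf '\t')

# 1. Decorate: prefix each record with its line number and a tab.
# 2. Sort on the record text (field 2 to end of line), ties by line number.
# 3. Keep each record that equals the one before it, i.e. a repeat.
# 4. Sort back into original order and strip the line number.
dups=$(awk '{ print NR "\t" $0 }' "$tmp/file" |
  LC_ALL=C sort -t "$tab" -k2 -k1,1n |
  awk '{ rec = substr($0, index($0, "\t") + 1) }
       NR > 1 && rec == prev { print }
       { prev = rec }' |
  sort -n |
  cut -f2-)

printf '%s\n' "$dups"
rm -rf "$tmp"
```

The awk stages here only ever hold one previous record, so no large array is built; the heavy lifting is left to sort, which typically works through temporary files. The output lists every repeat occurrence in the order it appeared in the file.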
That's a good point. The lengths are certainly different here:
$ ls -lig /usr/bin/awk /usr/bin/nawk /usr/xpg4/bin/awk /usr/bin/oawk
7456 -r-xr-xr-x 2 bin 80184 Jan 8 2007 /usr/bin/awk
7500 -r-xr-xr-x 1 bin 110100 Jan 8 2007 /usr/bin/nawk
7456 -r-xr-xr-x 2 bin 80184 Jan 8 2007 /usr/bin/oawk
35654 -r-xr-xr-x 1 bin 66816 Oct 10 2007 /usr/xpg4/bin/awk
but I have no idea about the internals. Note that oawk ("old awk") and awk are the same binary.
This system:
OS, ker|rel, machine: SunOS, 5.10, i86pc
Distribution : Solaris 10 10/08 s10x_u6wos_07b X86