I have two files in this format. Both files contain table statistics, as seen below. I need to compare the two files and, if there is a mismatch, display the contents between the break lines from both files for the corresponding table.
awk '
/^Table Name/ { # Select lines starting with 'Table Name'
table = $3; # Memorize table name into variable
tables[table]++ # and array
} #
! NF || /^-+/ { # Select empty and delimiter lines
next # Proceed next line (skip selected lines)
} #
NR == FNR { # Select lines from first input file
stats1[table] = stats1[table] $0 ORS; # Memorize the table's stats
next # Proceed next line
}
{ # Lines come from second input file
stats2[table] = stats2[table] $0 ORS; # Memorize the table's stats
next # Proceed next line
} #
END { # All files have been read
for (t in tables) { # For all memorized tables
if (stats1[t] != stats2[t]) { # If stats mismatch
out = "==========================================" ORS;
out = out stats1[t] ORS stats2[t]; #
print out # Output sep line and stats
} #
} #
} #
' stats1.dat stats2.dat
Do stats1 and stats2 refer to the two input files? Will the code work if I give path names like /home/frk/ragav/stats1 and /home/frk/ragav/stats2 instead of the file names? If not, how should the code be modified?
If I assign a path like /home/frk/ragav/stats1 to a variable, how can I use that variable in the code?
When I assigned the file names to variables like
a=stats1.txt
b=stats2.txt
and changed the code to
nawk '
/^Table Name/ { table = $3 ; tables[table]++ }
! NF || /^-+/ { next }
NR == FNR { $e[table] = $e[table] $0 ORS ; next }
{ $f[table] = $f[table] $0 ORS ; next }
END {
for (t in tables) {
if ($e[t] != $f[t]) {
out = "-----------------------------------------------------------------" ORS
out = out $e[t] ORS $f[t]
print out
}
}
}
' $e $f >> result.out
I am getting this error:
nawk: illegal field $()
input record number 1, file startendcut1.txt
source line number 4
Can you please help with the above two ways of modifying the code?
In my script, stats1 and stats2 inside the awk code are arrays.
stats1.dat and stats2.dat are the input files.
The input files can be specified with their full paths.
Don't use $e and $f as arrays in your awk code; use fixed names (stats1 and stats2, or whatever you want, like first_array and second_array).
a=stats1.txt
b=stats2.txt
nawk '
/^Table Name/ { table = $3 ; tables[table]++ }
! NF || /^-+/ { next }
NR == FNR { stats1[table] = stats1[table] $0 ORS ; next }
{ stats2[table] = stats2[table] $0 ORS ; next }
END {
for (t in tables) {
if (stats1[t] != stats2[t]) {
out = "-----------------------------------------------------------------" ORS
out = out stats1[t] ORS stats2[t]
print out
}
}
}
' "$a" "$b" >> result.out
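As a minimal sketch of the variable-based call: the temporary directory, the two sample files, and their contents are made up for illustration, and only the "Table Name:" layout is assumed from the script's patterns. The shell expands "$a" and "$b" outside the single-quoted program; inside the quotes, $ is awk's field operator, which is why the arrays need fixed names.

```shell
# Hypothetical sample data in a temporary directory (not the poster's real files).
dir=$(mktemp -d)
printf 'Table Name: T1\nRow Count: 5\n\nTable Name: T2\nRow Count: 7\n' > "$dir/stats1"
printf 'Table Name: T1\nRow Count: 5\n\nTable Name: T2\nRow Count: 8\n' > "$dir/stats2"

# Full paths are kept in shell variables...
a=$dir/stats1
b=$dir/stats2

# ...and expanded by the shell after the closing quote of the awk program.
# Inside the quotes, stats1 and stats2 are awk arrays, unrelated to $a and $b.
result=$(awk '
/^Table Name/ { table = $3; tables[table]++ }
! NF || /^-+/ { next }
NR == FNR     { stats1[table] = stats1[table] $0 ORS; next }
              { stats2[table] = stats2[table] $0 ORS }
END {
    for (t in tables)
        if (stats1[t] != stats2[t])
            printf "%s%s", stats1[t], stats2[t]
}
' "$a" "$b")

# Only table T2 differs, so only its two blocks are printed.
printf '%s\n' "$result"
rm -rf "$dir"
```

Quoting the variables ("$a" rather than $a) keeps the call working if a path ever contains spaces.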
The spacing in the input files I received is not uniform, and it differs between the two files, so the output was not correct. How can this script be modified to ignore spacing differences? Please help. I have to deliver it within the next six hours.
nawk '
/^Table Name/ { table = $3 ; tables[table]++ }
! NF || /^-+/ { next }
NR == FNR { stats1[table] = stats1[table] $0 ORS ; next }
{stats2[table] = stats2[table] $0 ORS ; next }
END {
for (t in tables) {
if (stats1[t] != stats2[t]) {
out = "-----------------------------------------------------------------" ORS
out = out stats1[t] ORS stats2[t]
print out
}
}
}
' $e $f >> result.out
In the input files I received, there are spaces between the fields. Consider, for example:
SUM(F1): 3739. There may be spacing between these two words, but the amount of spacing is not the same in both files, and it also varies between a field and its value. Can you help me bring the output into one consistent format in both files, ignoring the spacing differences, and then compare them and print only the actual mismatches? I am not able to show the spacing differences here.
But in the other file this one line is broken into two.
Row Count:96 SUM(F1): 3739 MAX(F1):77 MIN(F1): 0 AVG(F1): 38.9479167
LENGTH(LINE): 2260
Hence the script is showing this as a mismatch.
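For the spacing differences alone (not the folded lines), one possible approach is to normalize each line before it is stored or compared. The sketch below is only an illustration: the helper name norm and the two sample lines are invented, and it assumes that a colon only ever separates a field name from its value.

```shell
# Hypothetical helper: bring every line into one canonical spacing.
norm() {
    awk '{
        gsub(/[ \t]+/, " ")    # collapse runs of blanks and tabs to one space
        gsub(/ ?: ?/, ": ")    # exactly one space after each colon, none before
        sub(/^ /, "")          # drop a leading space
        sub(/ $/, "")          # drop a trailing space
        print
    }'
}

# Two made-up lines that differ only in spacing.
line1=$(printf 'Row Count:96   SUM(F1):    3739\n' | norm)
line2=$(printf 'Row Count: 96 SUM(F1): 3739  \n' | norm)

printf '%s\n%s\n' "$line1" "$line2"
```

The same four substitutions could be placed in a rule of their own ahead of the rules that fill stats1 and stats2, so that both arrays hold normalized text.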
This is a single line in the file2.txt you posted. However, there are other examples where a long line has been folded (e.g. AQ$_FT_Q_BECMD_I in file1.txt)
I would propose that you normalize file1.txt and file2.txt so that these differences are removed. Change runs of more than one space to a single space, and remove any newline that comes just before a tilde (optionally with spaces around it). Then compare the resulting files instead.
When I ran that code before running my comparison code
perl -0777 -pi~ -e 's/ *\n *~/ ~/g; s/ {2,}/ /g' file1.txt file2.txt
All lines starting with the tilde symbol "~~" were removed for tables having a huge number of columns. I don't want this to happen. They were not joined into a single line. I think it is because of the line length limit of the file.
They are not removed; they are merged with the previous line. However, it seems that you have DOS carriage returns in there too, so you need to change \n to \r?\n in the script. Sorry for missing that.
The file format is kind of messy, so you might still need additional normalizations. Possibly add something like s/:\s*(\d+)\s*(~|$)/: $1 $2/g to make all numbers between a colon and a tilde (or end of line) separated by whitespace on both sides.