I need to compare two flat files using awk, but the 4th field contains substrings that are not in the same order in both files. How can I do that?
file1.txt
john|0.0|4|**:25;JP:50;UY:25
file2.txt
andy|0.0|4|JP:50;**:25;UY:25
An approach using gawk:
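gawk -F\| '
# First file: remember the 4th field, keyed on fields 1-3
NR == FNR {
    A[$1,$2,$3] = $4
    next
}
# Second file: only look at records whose key was seen in the first file
($1,$2,$3) in A {
    i = split(A[$1,$2,$3], I, ";")
    j = split($4, J, ";")
    asort(I)    # sort both lists so the order of the substrings does not matter
    asort(J)
    if (i == j) {
        F = 1   # reset the flag for every record
        for (k = 1; k <= j; k++)
            if (I[k] != J[k]) {
                F = 0
                break
            }
        if (F)
            print
    }
}
' file1 file2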
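Where gawk's asort() is not available, the same idea can be sketched in plain awk with a small hand-written sort. This is only a sketch under assumptions: canon() is a hypothetical helper name, the temporary file names are placeholders, and the sample rows use the same name in both files so that the key on fields 1-3 actually matches.

```shell
# Sample data based on the thread, written to temporary placeholder files
tmp=$(mktemp -d)
printf '%s\n' 'john|0.0|4|**:25;JP:50;UY:25' > "$tmp/file1.txt"
printf '%s\n' 'john|0.0|4|JP:50;**:25;UY:25' > "$tmp/file2.txt"

out=$(awk -F'|' '
# canon: hypothetical helper, returns the ";"-separated items in sorted order
function canon(s,    n, a, i, j, t, r) {
    n = split(s, a, ";")
    for (i = 2; i <= n; i++) {          # insertion sort, forced string comparison
        t = a[i]
        for (j = i - 1; j >= 1 && (a[j] "") > (t ""); j--)
            a[j + 1] = a[j]
        a[j + 1] = t
    }
    r = ""
    for (i = 1; i <= n; i++)
        r = r (i > 1 ? ";" : "") a[i]
    return r
}
NR == FNR { A[$1,$2,$3] = canon($4); next }      # first file: store sorted field 4
(($1,$2,$3) in A) && A[$1,$2,$3] == canon($4)    # second file: print if field 4 matches
' "$tmp/file1.txt" "$tmp/file2.txt")

echo "$out"
rm -rf "$tmp"
```

With this sample the record from the second file is printed, because both 4th fields reduce to the same sorted string.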
After the comparison, I need to print the mismatched field.
Example:
file1: john|0.0|4|:25;JP:50;UY:25
file2: andy|0.0|4|JP:50;:25;ZY:25
In file2, ZY is present instead of UY, so the output should report an error in the 4th field.
I don't understand! I see that it is not just the 4th field that has a mismatch, but also the 1st field (john/andy).
So how do you match records in both files and identify a mismatch in the 4th field?
Tell us what criterion should be used to match records between the two files.
I suggest you post sample input and the desired output in code tags. Explain your requirement clearly and show us what you have tried.
Example:
file1: john|0.0|4|:25;JP:50;UY:25
file2: john|0.0|4|JP:50;:25;UY:25
In the two files above I need to compare the fourth field, but its substrings are interchanged.
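One possible way to report a mismatch in the 4th field regardless of order is sketched below. It assumes, as the earlier gawk approach does, that records are matched on fields 1-3. The output wording and the temporary file names are made up for illustration, and the sample rows combine the matching key from the example above with the UY/ZY difference from the earlier post.

```shell
# Sample data: same key in both files, but UY:25 vs ZY:25 in field 4
tmp=$(mktemp -d)
printf '%s\n' 'john|0.0|4|:25;JP:50;UY:25' > "$tmp/file1"
printf '%s\n' 'john|0.0|4|JP:50;:25;ZY:25' > "$tmp/file2"

report=$(awk -F'|' '
NR == FNR { A[$1,$2,$3] = $4; next }        # first file: store field 4 by key
($1,$2,$3) in A {
    split("", seen)                          # clear the per-record tally
    n = split(A[$1,$2,$3], I, ";")
    m = split($4, J, ";")
    for (k = 1; k <= n; k++) seen[I[k]]++    # count items from file1
    for (k = 1; k <= m; k++) seen[J[k]]--    # cancel items also present in file2
    miss = extra = ""
    for (k = 1; k <= n; k++)                 # positive count: item only in file1
        if (seen[I[k]] > 0) { miss = miss " " I[k]; seen[I[k]]-- }
    for (k = 1; k <= m; k++)                 # negative count: item only in file2
        if (seen[J[k]] < 0) { extra = extra " " J[k]; seen[J[k]]++ }
    if (miss != "" || extra != "")
        print $1 FS $2 FS $3 ": field 4 mismatch, only in file1:" miss ", only in file2:" extra
}
' "$tmp/file1" "$tmp/file2")

echo "$report"
rm -rf "$tmp"
```

Records whose 4th fields contain the same items in a different order produce no output. Counting the items rather than sorting them also copes with an item that appears more than once.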