Compare two flat files using awk

veeruasu · May 24, 2013, 5:32am

I want to compare two flat files using awk, but the 4th field contains substrings in no fixed order. How can I do that?
file1.txt

john|0.0|4|**:25;JP:50;UY:25

file2.txt

andy|0.0|4|JP:50;**:25;UY:25

Yoda · May 24, 2013, 5:43am

An approach using gawk:

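gawk -F\| '
        # file1.txt: remember field 4, keyed on fields 1-3
        NR == FNR {
                A[$1,$2,$3] = $4
                next
        }
        # file2.txt: only records whose fields 1-3 also occur in file1.txt
        ($1,$2,$3) in A {
                # Split both 4th fields on ";" and sort the pieces (asort is gawk-only)
                i = split(A[$1,$2,$3], I, ";")
                j = split($4, J, ";")
                asort(I)
                asort(J)
                # A different number of pieces can never match
                if (i != j)
                        next
                # Assume a match (reset for every record) until a pair of sorted pieces differs
                F = 1
                for (k = 1; k <= j; k++)
                {
                        if (I[k] != J[k])
                        {
                                F = 0
                                break
                        }
                }
                # Print the file2.txt record when both 4th fields hold the same pieces
                if (F)
                        print
        }
' file1.txt file2.txt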

veeruasu · May 24, 2013, 6:11am

After the comparison, I also need to print the mismatched field.


Example:
file1: john|0.0|4|**:25;JP:50;UY:25
file2: andy|0.0|4|JP:50;**:25;ZY:25
In file2, ZY is present instead of UY, so the output needs to report an error in the 4th field.
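One way to print the mismatch, sketched under assumptions the thread does not state: records are paired by line number (FNR), because field 1 differs in this example (john/andy); the function name `report_field4_mismatch` is hypothetical; and it uses plain POSIX awk rather than gawk. It counts the `;`-separated pieces of each 4th field as a multiset (+1 for file1, -1 for file2), so order is ignored and only the pieces left over on one side are reported.

```shell
# Sketch: report the ";"-separated pieces of field 4 that differ between two
# "|"-delimited files. Assumption: records are paired by line number (FNR).
# report_field4_mismatch is a hypothetical name. Plain POSIX awk, no gawk extensions.
report_field4_mismatch() {
        awk -F'|' '
                # First file: remember field 4, indexed by line number
                NR == FNR { A[FNR] = $4; next }
                # Second file: compare with the same line of the first file
                FNR in A {
                        split("", C)    # empty the counter array (portable idiom)
                        # Multiset count: +1 per piece in file1, -1 per piece in file2
                        n = split(A[FNR], P, ";")
                        for (k = 1; k <= n; k++) C[P[k]]++
                        n = split($4, P, ";")
                        for (k = 1; k <= n; k++) C[P[k]]--
                        only1 = only2 = ""
                        for (p in C) {
                                if (C[p] > 0) only1 = only1 " " p
                                if (C[p] < 0) only2 = only2 " " p
                        }
                        # Order is ignored; only pieces left over on one side are printed
                        if (only1 != "" || only2 != "")
                                printf "line %d: 4th field mismatch; only in file1:%s; only in file2:%s\n", FNR, only1, only2
                }
        ' "$1" "$2"
}

# Usage with the sample records from the post, written to a scratch directory
dir=$(mktemp -d)
printf '%s\n' 'john|0.0|4|**:25;JP:50;UY:25' > "$dir/file1.txt"
printf '%s\n' 'andy|0.0|4|JP:50;**:25;ZY:25' > "$dir/file2.txt"
report_field4_mismatch "$dir/file1.txt" "$dir/file2.txt"
# prints: line 1: 4th field mismatch; only in file1: UY:25; only in file2: ZY:25
rm -rf "$dir"
```

Counting pieces in one associative array avoids sorting altogether; the trade-off is that `for (p in C)` visits keys in an unspecified order, so when several pieces differ they may be listed in any order.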

Yoda · May 24, 2013, 1:27pm

I don't understand! I do see that it is not just the 4th field that has a mismatch, but also the 1st field (john/andy).

So how do you match records across the two files and identify a mismatch in the 4th field?

Tell us: what criterion should be used to match records in the two files?

I suggest you post sample input and the desired output in code tags. Explain your requirement clearly, and also show us what you have tried.

veeruasu · July 9, 2013, 3:09am

Example:
file1: john|0.0|4|**:25;JP:50;UY:25
file2: john|0.0|4|JP:50;**:25;UY:25
In the two files above, I need to compare the fourth field, but its substrings are interchanged.
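A minimal sketch for this case, assuming records are matched on fields 1-3 as in the gawk answer above. Instead of comparing sorted pieces one by one, it turns each 4th field into a canonical string (pieces sorted, then re-joined with `;`), so a single string comparison decides the result. It uses a small insertion sort so it runs in plain POSIX awk without gawk's `asort`; the names `compare_field4` and `canon` are hypothetical.

```shell
# Sketch: compare field 4 of records sharing fields 1-3, ignoring piece order.
# compare_field4 and canon are hypothetical names; plain POSIX awk.
compare_field4() {
        awk -F'|' '
                # Return the ";"-separated pieces of s, sorted and re-joined with ";"
                function canon(s,    P, n, i, j, t, out) {
                        n = split(s, P, ";")
                        # Insertion sort on P[1..n], forcing string comparison
                        for (i = 2; i <= n; i++) {
                                t = P[i] ""
                                for (j = i - 1; j >= 1 && (P[j] "") > t; j--)
                                        P[j + 1] = P[j]
                                P[j + 1] = t
                        }
                        out = P[1]
                        for (i = 2; i <= n; i++) out = out ";" P[i]
                        return out
                }
                # First file: store the canonical field 4 under key fields 1-3
                NR == FNR { A[$1, $2, $3] = canon($4); next }
                # Second file: report each record whose key also occurs in the first file
                ($1, $2, $3) in A {
                        print $1 ": 4th field " (canon($4) == A[$1, $2, $3] ? "matches" : "differs")
                }
        ' "$1" "$2"
}

# Usage with the sample records from this post, written to a scratch directory
dir=$(mktemp -d)
printf '%s\n' 'john|0.0|4|**:25;JP:50;UY:25' > "$dir/file1.txt"
printf '%s\n' 'john|0.0|4|JP:50;**:25;UY:25' > "$dir/file2.txt"
compare_field4 "$dir/file1.txt" "$dir/file2.txt"
# prints: john: 4th field matches
rm -rf "$dir"
```

Records whose fields 1-3 do not occur in the first file are skipped silently, as in the gawk answer.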