Compare 2 files and write the result

Akshay_Hegde · February 26, 2013, 1:20am

Hi friends, I have 2 files, file1.txt and reference.txt.

I was able to find the differences using diff and the following command:

awk 'NR == FNR { A[$0]=1; next } !A[$0]' reference.txt file1.txt

The above command lists the data in file1.txt that is not in reference.txt:

12.12    87
11.95    88
11.83    89
12.55    84
12.42    85
12.27    86
11.82    89
12.55    84.03
12.38    85.15
12.1    87.03
11.97    88
12.25    86.03
12.07    87.42
11.9    88.35
11.8    89.07

My intention is to list all data from file1.txt and mark each line as "available" or "not available", depending on whether it is present in reference.txt.

The output should look like this:
9        72.92    available
9.75     72.77    available
10.57    72.75    available
10.53    73       available
10.43    73.5     available
10.33    74       available
9.97     76       available
11.95    88       not available
11.83    89       not available
12.55    84       not available
12.42    85       not available
12.27    86       not available
11.82    89       not available

pamu · February 26, 2013, 1:26am

awk 'NR==FNR{A[$0]=$0; next}{print A[$0]?A[$0] FS "available" : $0 FS "not available" }' reference.txt file1.txt

RudiC · February 26, 2013, 1:37am

$ awk 'NR==FNR {Av[$0]++; next} {print $0 (!Av[$0]?" not":_) " available"}' reference.txt file1.txt
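For anyone who wants to try the idea on a small input, here is a minimal sketch. The sample rows are only a made-up subset of the data in this thread, and the array name seen is my own choice. It uses the in operator to test whether a line exists as a key, so the result does not depend on the stored value; a truth test on the stored value, such as A[$0]?..., would report an empty reference line (and, as far as I know, a line that is just 0) as not available.

```shell
#!/bin/sh
# Sample data is illustrative only (a few rows in the style of the thread).
tmp=$(mktemp -d)
printf '%s\n' '9 72.92' '9.75 72.77' '10.57 72.75' > "$tmp/reference.txt"
printf '%s\n' '9 72.92' '11.95 88' '10.57 72.75'   > "$tmp/file1.txt"

# First file (NR==FNR): remember every reference line as an array key.
# Second file: "in" checks key existence without creating new entries.
result=$(awk 'NR==FNR { seen[$0]; next }
              { print $0, (($0 in seen) ? "available" : "not available") }' \
         "$tmp/reference.txt" "$tmp/file1.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Note that the NR==FNR idiom assumes reference.txt is not empty; with an empty first file, the lines of file1.txt would be treated as reference data.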

Akshay_Hegde · February 26, 2013, 1:39am

Thanks pamu, it's working. Could you please explain it? Suppose the files contain different numbers of columns, say file1.txt has 5 columns and reference.txt has 10 columns; then I think A[$0]=$0 won't help.

pamu · February 26, 2013, 1:52am

akshay hegde:

Thanks pamu, it's working. Could you please explain it? Suppose the files contain different numbers of columns, say file1.txt has 5 columns and reference.txt has 10 columns; then I think A[$0]=$0 won't help.

Yes, it won't work in that case, because matching on the whole line ($0) fails when the two files have different columns.

Instead, we can build the array index from whichever columns should match. Assume we want to compare columns 1 and 2 of file1.txt with columns 2 and 5 of reference.txt, and print the lines of file1.txt.
Then we can use something like this:

awk 'NR==FNR{A[$2,$5]=$0; next}{print A[$1,$2]?$0 FS "available" : $0 FS "not available" }' reference.txt file1.txt
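A small sketch of the same column-keyed lookup, in case it helps to see it run. The 5-column layout of reference.txt and the sample rows are hypothetical, and ref is just an illustrative array name. It tests the key with ($1,$2) in ref instead of the stored value.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Hypothetical layout: the values to match sit in columns 2 and 5 of reference.txt.
printf '%s\n' 'a 9 x y 72.92' 'b 10.57 x y 72.75' > "$tmp/reference.txt"
printf '%s\n' '9 72.92' '12.55 84'                > "$tmp/file1.txt"

# Key the array on reference columns 2 and 5, then look up
# columns 1 and 2 of each file1.txt line.
result=$(awk 'NR==FNR { ref[$2,$5]; next }
              { print $0, ((($1,$2) in ref) ? "available" : "not available") }' \
         "$tmp/reference.txt" "$tmp/file1.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The keys are compared as strings, so 9 and 9.0 would count as different values here.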

Hope this helps you

pamu

RudiC · February 26, 2013, 1:56am

awk     'NR==FNR {Av[++max]=$0; next}
                 {NT = "not "
                  for (i=1; i<=max; i++) if (Av[i] ~ $0) {NT=""; break}
                  print $0 "\t" NT "available"}
        '  reference.txt file1.txt

This works if the columns occur in the same order. If they don't, please! be way more specific!
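One thing to watch with the loop above: the ~ operator treats each file1.txt line as a regular expression, so a dot in 12.1 matches any character. A possible variation, swapping in index() for a literal substring test, is sketched below. The sample rows and the extra id columns in reference.txt are made up, and ref, n and status are illustrative names.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Hypothetical reference lines that contain the file1.txt columns
# contiguously, in the same order and with the same separator.
printf '%s\n' 'A 9 72.92 x' 'B 10.57 72.75 y'       > "$tmp/reference.txt"
printf '%s\n' '9 72.92' '12.55 84' '10.57 72.75'    > "$tmp/file1.txt"

# index(s, t) returns the position of t in s, or 0 if t does not occur,
# so no character of the file1.txt line is interpreted as a regex operator.
result=$(awk 'NR==FNR { ref[++n] = $0; next }
              { status = "not available"
                for (i = 1; i <= n; i++)
                    if (index(ref[i], $0) > 0) { status = "available"; break }
                print $0 "\t" status }' \
         "$tmp/reference.txt" "$tmp/file1.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

A plain substring test can still give false hits (9 72.92 is also contained in 19 72.921), so comparing specific fields is safer when the column positions are known.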

Akshay_Hegde · February 27, 2013, 12:16am

Thank you RudiC and pamu, both scripts are working fine.


I want to include one more thing here.

Along with the available and not available data in file1.txt, I also want to print the reference file data that has no match.

print A[$1,$2]?$0 FS "available" : $0 FS "not available"

How can I print it?

I expect a result like this:

9        72.92    available
9.75     72.77    available
10.57    72.75    available
10.53    73       available
10.43    73.5     available
10.33    74       available
9.97     76       available
11.95    88       not available
11.83    89       not available
12.55    84       unmatched reference file
12.42    85       unmatched reference file

RudiC · February 27, 2013, 2:50am

Are there duplicate lines in file1?
If not, you can delete Av[i] just before the break, and in an END section print out the remaining Av entries.
If yes, you need a second array indexed like Av to hold the fact of a match, and then, in the END section, print out the Av elements that do not have a "match" entry.
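A rough sketch of the duplicate-safe variant, under some assumptions of mine: it matches on the whole line rather than with the regex loop, the second array (here called matched) is indexed by the line text instead of by number, and the sample rows are made up. Nothing is deleted, so a line that occurs twice in file1.txt is reported as available both times.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' '9 72.92' '12.55 84' '12.42 85' > "$tmp/reference.txt"
# file1.txt deliberately contains a duplicate line.
printf '%s\n' '9 72.92' '11.95 88' '9 72.92'  > "$tmp/file1.txt"

# ref[]     keeps the reference lines in input order,
# inref[]   allows a fast membership test,
# matched[] records which reference lines were seen in file1.txt.
result=$(awk 'NR==FNR { ref[++n] = $0; inref[$0]; next }
              { if ($0 in inref) { matched[$0] = 1; print $0, "available" }
                else print $0, "not available" }
              END { for (i = 1; i <= n; i++)
                        if (!(ref[i] in matched))
                            print ref[i], "unmatched reference file" }' \
         "$tmp/reference.txt" "$tmp/file1.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Looping over 1..n in the END section keeps the unmatched records in reference-file order; a for (key in array) loop visits the keys in an unspecified order.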

Akshay_Hegde · February 27, 2013, 5:45am

Yes, there are some records in file1.txt that are the same as those in the reference file.
The script works fine for my requirement, but for another purpose I also want to print the unmatched records of the reference file along with file1.txt.

Thank you for your reply, I will try it.


Pamu, could you please help me implement what I asked for in #7 in your script?

awk 'NR==FNR{
A[$1,$2]=$0; next}
{
if(A[$1,$2])
print A[$1,$2]?$0 FS "available" : $0 FS "not available"
}' FS="\t" reference.txt file1.txt

pamu · February 27, 2013, 6:38am

Is this what you want....?

awk 'NR==FNR{A[$0]=$0; next}{print A[$0]?A[$0] FS "available" : $0 FS "not available";delete A[$0] }
    END{for (i in A){if(A[i]){print A[i],"unmatched reference file"}}}' reference.txt file1.txt

Akshay_Hegde · February 27, 2013, 6:46am

Yes, thank you so much.
Please explain it, pamu; I will try to learn from you.

pamu · February 27, 2013, 6:55am

awk 'NR==FNR{A[$0]=$0; next}    # Read reference.txt and store each line in array A, indexed by the line itself ($0).

{print A[$0]?A[$0] FS "available" : $0 FS "not available";    # For each line of file1.txt, check whether $0 is in array A: if it is, print it as available, otherwise as not available.

delete A[$0] }    # Delete this line's entry, so that only reference records never seen in file1.txt remain in the array.

END{for (i in A){if(A[i]){print A[i],"unmatched reference file"}}}'    # Print the remaining records, i.e. the reference records that are not present in file1.txt.

Hope this helps you

pamu

Akshay_Hegde · February 27, 2013, 6:58am

Thanks a lot for the nice explanation.