Need the line number of failed records

Hi awk Gurus,

I have file as below :

file1.txt

7000,2,1,6
7001,2,1,7
7002,2,1,6
7003,1,2,1

file2.txt

7000,john,2,0,0,1,6
7000,john,2,0,0,1,7
7000,john,2,0,0,1,8
7000,john,2,0,0,1,9
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
7003,mike,1,0,0,2,2
7003,mike,1,0,0,2,3
7003,mike,1,0,0,2,4
8001,nike,1,2,4,1,8
8002,paul,2,0,0,2,7

I needed the output such that, for the ones in file2 which is not present in file1 is displayed. I have the awk as below which performs the operation :

awk -F',' -v file1="$1" -v file2="$2" 'NR == FNR {a[$1,$2,$3,$4]++;next} !(a[$1,$2,$3,$4])' OFS="," file1.txt file2.txt

the Output is as below

7000,john,2,0,0,1,7
7000,john,2,0,0,1,8
7000,john,2,0,0,1,9
7003,mike,1,0,0,2,2
7003,mike,1,0,0,2,3
7003,mike,1,0,0,2,4
8001,nike,1,2,4,1,8
8002,paul,2,0,0,2,7

Now, i need the output such that along with above, the line numbers of non matching lines of file2.txt is also displayed. I am struck here. Can someone please help ?

Thanks,
Arun

try

awk -F',' -v file1="$1" -v file2="$2" 'NR == FNR {a[$1,$2,$3,$4]++;next} !(a[$1,$2,$3,$4]); END{for(X in a) N++; print N }' OFS="," file1.txt file2.txt

Can't you just add FNR to your output?

awk -F',' -v file1="$1" -v file2="$2" 'NR == FNR {a[$1,$2,$3,$4]++;next} !(a[$1,$2,$3,$4]) {print FNR OFS $0}' OFS="," file1.txt file2.txt
1 Like
=> awk -F',' -v file1="$1" -v file2="$2" 'NR == FNR {a[$1,$2,$3,$4]++;next} !(a[$1,$2,$3,$4]); END{for(X in a) N++; print N }' OFS=","  1.txt 2.txt
7000,john,2,0,0,1,6
7000,john,2,0,0,1,7
7000,john,2,0,0,1,8
7000,john,2,0,0,1,9
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
7003,mike,1,0,0,2,2
7003,mike,1,0,0,2,3
7003,mike,1,0,0,2,4
8001,nike,1,2,4,1,8
8002,paul,2,0,0,2,7
10

Above is the output, i.e. the file2.txt is getting displayed. I actually needed the line number of failed lines. Please suggest

---------- Post updated at 08:55 PM ---------- Previous update was at 08:44 PM ----------

This will print all the lines :

=> awk -F',' -v file1="$1" -v file2="$2" 'NR == FNR {a[$1,$2,$3,$4]++;next} !(a[$1,$2,$3,$4]) {print FNR FS $0}' OFS="," 1.txt 2.txt
1,7000,john,2,0,0,1,6
2,7000,john,2,0,0,1,7
3,7000,john,2,0,0,1,8
4,7000,john,2,0,0,1,9
5,7001,elen,2,0,0,1,7
6,7002,sami,2,0,0,1,6
7,7003,mike,1,0,0,2,1
8,7003,mike,1,0,0,2,2
9,7003,mike,1,0,0,2,3
10,7003,mike,1,0,0,2,4
11,8001,nike,1,2,4,1,8
12,8002,paul,2,0,0,2,7

I only added FNR, the rest is your original code - which you said 'performs the operation', so I assumed it was correct :p.

EDIT: Shouldn't !(a[$1,$2,$3,$4]) be !(a[$1,$3,$6,$7) ?

EDIT2: Minor bug in my addition - FS should really be OFS (although they're the same in this case).

1 Like

I am sorry, my mistake.

=> awk -F',' -v file1="$1" -v file2="$2" 'NR == FNR {a[$1,$2,$3,$4]++;count++;next} !(a[$1,$3,$6,$7]){print FNR FS $0}' OFS="," 1.txt 2.txt
2,7000,john,2,0,0,1,7
3,7000,john,2,0,0,1,8
4,7000,john,2,0,0,1,9
8,7003,mike,1,0,0,2,2
9,7003,mike,1,0,0,2,3
10,7003,mike,1,0,0,2,4
11,8001,nike,1,2,4,1,8
12,8002,paul,2,0,0,2,7

It works great :slight_smile: Thank you.

Why not try a useful unix code use which could open up many options..
Play with this...

Cat $file2 | awk-vfile1="$file1" -vFS="," '{ sprintf(cmd,"grep %s %s",$1,file1); result=system(cmd); if (result != 0){print $0;}}'

Also please suggest if the error'd string can also added in the output with appropriate message as below (I did not request for this previously):

column 4 in file1.txt having value 6 and column 7 in file2.txt does not match

=>      awk -F',' -v file1="$1" -v file2="$2" 'NR == FNR {a[$1,$2,$3,$4]++;count++;next} !(a[$1,$3,$6,$7]){print FNR FS $0}' OFS="," 1.txt 2.txt
2,7000,john,2,0,0,1,7
3,7000,john,2,0,0,1,8
4,7000,john,2,0,0,1,9
8,7003,mike,1,0,0,2,2
9,7003,mike,1,0,0,2,3
10,7003,mike,1,0,0,2,4
11,8001,nike,1,2,4,1,8
12,8002,paul,2,0,0,2,7

Your requirement isn't very clear, I'm afraid. Isn't that the same output as before?

Currently, only the non matching values in file2 with file1 is getting displayed without any information about the file1 values. I actually requested for help on the output such that the exact failure place during a match between a value in file1 with file2 is displayed for user. I mean more information regarding the failure.

What kind of information? You already have all the file1 values in the output (since they matched the file2 columns).

Give some example input/output...

[carun@fattony] /u/carun/MTNN>
=> cat file2.txt
7000,2,1,6
7001,2,1,7
7002,2,1,6
7003,1,2,1

[carun@fattony] /u/carun/MTNN>
=> cat file1.txt
7000,2,1,6
7001,2,1,7
7002,2,1,6
7003,1,2,5

=> awk -F',' -v file1="$1" -v file2="$2" 'NR == FNR {a[$1,$2,$3,$4]++;count++;next} !(a[$1,$2,$3,$4]){print FNR FS $0}' OFS="," file2.txt file1.txt
4,7003,1,2,5

Now currently the output shows the lines in file2.txt which does not match with file1.txt

So, the expected output is as below :

Comparing 5 in file1.txt in column 4 in line 4 does not match with 1 in file2.txt with column 4 in line 4

---------- Post updated at 07:52 PM ---------- Previous update was at 04:47 PM ----------

Hi Gurus,

Can someone please help me on this ? I am kind of struck here and unable to proceed.

Thank you.

Well, line 4 of file 1 doesn't match line 1 column 1 of file2, line 1 column 2 of file2, line 1 column 3 of file2, etc. Do you really want all that output?

Otherwise you'll need to define which file2 line you think should have been the match (e.g. because column 1 matches).

=> cat file2.txt
7000,2,1,6
7001,2,1,7
7002,2,1,6
7003,1,2,1

[carun@fattony] /u/carun/MTNN>
=> cat file1.txt
7000,2,1,6
7001,2,1,7
7002,2,1,6
7003,1,2,5

awk -F',' -v file1="$1" -v file2="$2" 'NR == FNR {a[$1,$2,$3,$4]++;count++;next} !(a[$1,$2,$3,$4]){print FNR FS $0}' OFS="," file2.txt file1.txt

Output displayed :
4,7003,1,2,5

My understanding in the above awk is that it checks the line 1 column 1,2,3,4 of file2.txt with line1 column 1,2,3,4 of file1.txt
initially. This continues till 4th line and then the awk finds that line4 of file2.txt does not match with line4 of
file1.txt

So, in the output, the 4th line of file1.txt is displayed since there is a mismatch (the ones have marked as magenta). Please correct if my understanding is incorrect

Now, the output that I wanted is to state to user that :

Comparing column 4 of line 4 in file2.txt having value 1 does not match with column 4 of line 4 in file1.txt having value 5