awk pattern matching

I have two files. I want to compare the values in file1 with the second column of file2 and print the lines of file2 that do not match. I need help with the matching, because the numbers in the second column of file2 can have leading zeros (0, 00 or 000).
Example:

file1

1
2
3

file2

a,0001
b,02
c,000
d,01
e,2
f,0005

Expected output:

c,000
f,0005

The awk statement I am using is:

awk -F ',' 'FNR==NR{a[$0];next}!($2 in a)'   file1 file2

But this does not handle the leading zeros. Please advise. Thanks!

Hi vegasluxor,

The following may help.

awk -F, 'FNR==NR{A[$1]=$1;next} {match($2,/.*0[1-9]/);V=$2;gsub(/0/,X,V);gsub(/^$/,0,V);} !(V in A) {print $0}' file1 file2

The output will be as follows.

c,000
f,0005

EDIT: My code works as long as file1 does not contain any numbers with leading zeros.

Thanks,
R. Singh
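
One caveat with the command above: gsub(/0/,X,V) removes every zero in the field, not only the leading ones, so a value such as 10 would be reduced to 1 and then treated as a match for 1 in file1. Here is a minimal sketch of the difference. The value 0100 is made up for illustration and is not part of the sample data.

```shell
# Hypothetical value, not from the thread: 0100 should normalise to 100.
all_zeros=$(echo "x,0100" | awk -F, '{ v = $2; gsub(/0/, "", v); print v }')
leading_only=$(echo "x,0100" | awk -F, '{ v = $2; sub(/^0+/, "", v); print v }')
echo "gsub(/0/):  $all_zeros"      # every zero removed
echo "sub(/^0+/): $leading_only"   # only the leading zeros removed
```

Anchoring the pattern with ^ and using sub instead of gsub keeps the zeros inside the number intact.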

It can be simplified by just adding zero to the field; there is no need for match and gsub.

[akshay@nio tmp]$ cat f1
1
2
3

[akshay@nio tmp]$ cat f2
a,0001
b,02
c,000
d,01
e,2
f,0005

[akshay@nio tmp]$ awk 'FNR==NR{a[$1];next}!($2+0 in a)' f1 FS=',' f2
c,000
f,0005
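
Adding 0 forces awk to treat the field as a number, so 0001, 01 and 1 all become 1 before the array lookup. Here is a small sketch that prints each raw field next to its coerced value for the sample data. The variable name out is only for illustration.

```shell
# Print the second column as text, then as a number ($2 + 0).
out=$(printf 'a,0001\nb,02\nc,000\nd,01\ne,2\nf,0005\n' |
  awk -F, '{ print $2 " -> " ($2 + 0) }')
echo "$out"
```

This relies on awk turning the number back into a string for the array subscript, which is only safe when the value is rendered as a plain integer.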

Thanks for your input!
I am not sure why it is not working for the numbers below.

my data:

f1:
971507520787

f2:

1,2,0971507520787
2,2,333

-------

awk 'FNR==NR{a[$1];next}!($3+0 in a)' f1 FS=',' f2

output:

1,2,0971507520787
2,2,333

Please help!

I get the output below. Run dos2unix on the input files before feeding them to awk.

[akshay@nio tmp]$ awk 'FNR==NR{a[$1];next}!($3+0 in a)' f1 FS=',' f2
2,2,333

---------- Post updated at 03:59 PM ---------- Previous update was at 03:53 PM ----------

Or try this:

awk '{ sub("\r$", "") } FNR==NR{a[$1];next}!($3+0 in a)' f1 FS=',' f2
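
To see why DOS line endings matter, here is a hedged sketch that writes f1 with \r\n on purpose. It stores $0 as the key (as in the first post) so that the result does not depend on how a particular awk splits whitespace. The temporary directory and the variable names are assumptions made for this example.

```shell
tmp=$(mktemp -d)
# f1 is written with DOS line endings (\r\n) on purpose; f2 uses plain \n.
printf '1\r\n2\r\n3\r\n' > "$tmp/f1"
printf 'a,0001\nc,000\n' > "$tmp/f2"

# Without stripping \r the keys stored from f1 end in a carriage return, so nothing matches.
without_fix=$(awk 'FNR==NR{a[$0];next} !($2+0 in a)' "$tmp/f1" FS=',' "$tmp/f2")
# Stripping the trailing \r from every record restores the expected result.
with_fix=$(awk '{ sub(/\r$/, "") } FNR==NR{a[$0];next} !($2+0 in a)' "$tmp/f1" FS=',' "$tmp/f2")

printf 'without fix:\n%s\nwith fix:\n%s\n' "$without_fix" "$with_fix"
rm -rf "$tmp"
```

Without the fix every line of f2 is reported as unmatched; with it only c,000 is printed.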

Still no luck :(
I don't know why it is not working on AIX.

It works with the sample files. Please post a sample of your real input using code tags.

It could be because of the size of int. Try stripping the leading zeros as text instead:

awk 'NR == FNR{a[$1]; next} {t=$3; sub(/^0*/, X, t)} !(t in a)' f1 FS=',' f2
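
The command above compares the field as text, which sidesteps any number-to-string conversion. One edge case: sub(/^0*/, X, t) turns 000 into an empty string, so it would not match a literal 0 in f1. Here is a hedged variant that keeps a single 0 in that case and normalises both files the same way. The function name strip0 and the extra rows (0 in f1, 3,2,000 in f2) are made up for this sketch.

```shell
tmp=$(mktemp -d)
printf '971507520787\n0\n' > "$tmp/f1"                        # the 0 line is an added test value
printf '1,2,0971507520787\n2,2,333\n3,2,000\n' > "$tmp/f2"    # 3,2,000 is also added

out=$(awk '
  # Remove leading zeros as text; an all-zero value collapses to "0".
  function strip0(s) { sub(/^0+/, "", s); return (s == "" ? "0" : s) }
  FNR == NR { a[strip0($1)]; next }      # first file: remember normalised keys
  !(strip0($3) in a)                     # second file: print rows with no match
' "$tmp/f1" FS=',' "$tmp/f2")
echo "$out"
rm -rf "$tmp"
```

Because no arithmetic is involved, the length of the number does not matter.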

Hey, there is a typo :)

Code fixed after the exception was caught but before it was reported ;)

I don't know what's wrong with your input. Try perl as well:

#!/usr/bin/perl

use strict;
use warnings;

die "Usage: $0 File1 File2\n" if @ARGV != 2;

my $file2 = pop;

# Read first file
my %seen;
while (<>) { my @F = split; $seen{$F[0]+0} = 1;}

# Compare 2nd file with first file hash
local @ARGV = $file2;
while (<>) { my @F = split(",",$_); print  if !$seen{$F[2]+0};}

Usage: script.pl file1 file2

Ok..

f1

971507520787

f2

1,2,0971507520787
2,2,333

Code

awk 'FNR==NR{a[$1];next}!($3+0 in a)' f1 FS=',' f2

Output

1,2,0971507520787
2,2,333

awk '{ sub("\r$", "") } FNR==NR{a[$1];next}!($3+0 in a)' f1 FS=',' f2

output:

1,2,0971507520787
2,2,333

Am I missing anything here? Please help.

I don't see any special characters in your sample file.

Can you post the output of this command? I get this:

akshay@nio:/tmp$ awk 'FNR==1{++i}{print i==1 ? $1 : $3+0}' f1 FS=',' f2
971507520787
971507520787
333

I am getting a syntax error.

 awk 'FNR==1{++i}{print i==1 ? $1 : $3+0}' f1 FS=',' f2

Syntax Error The source line is 1.
The error context is
FNR==1{++i}{print >>> i== <<<
awk: 0602-502 The statement cannot be correctly parsed. The source line is 1.

OK, try it with parentheses:

awk 'FNR==1{++i}{print ( ( i == 1 ) ? $1 : $3+0 ) }' f1 FS=',' f2

Also post the result of:

awk --version

awk 'FNR==1{++i}{print ( ( i == 1 ) ? $1 : $3+0 ) }' f1 FS=',' f2
971507520787
9.71508e+11
333

cat f1

971507520787

cat f2

1,2,0971507520787
2,2,333

I am using awk on AIX. It does not report any version information.

It's clear that your file2 is different from the one we are seeing here.

Here is the screenshot. Is there any other awk solution I can try? Please advise, and thanks for your support!

[test@test]$ cat f1
971507520787
[test@test]$ cat f2
1,2,0971507520787
2,2,333
[test@test]$ awk 'FNR==1{++i}{print ( ( i == 1 ) ? $1 : $3+0 ) }' f1 FS=',' f2
971507520787
9.71508e+11
333
[test@test]$
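
The 9.71508e+11 line is the clue. It suggests that this awk formats the result of $3+0 with the default %.6g conversion (OFMT for print, CONVFMT for array subscripts), so the value loses digits and can no longer equal the key stored from f1. That is an assumption about the AIX awk based on the output shown; many other awk implementations print large integral values in full. Here is a minimal sketch that reproduces the formatting with an explicit printf, so it behaves the same on any awk:

```shell
key=971507520787
# Format the number the way a "%.6g" OFMT/CONVFMT would.
formatted=$(awk -v n="$key" 'BEGIN { printf "%.6g\n", n }')
echo "$formatted"
# Compare the formatted text with the key as it appears in f1.
if [ "$formatted" = "$key" ]; then
  echo "still matches f1"
else
  echo "no longer matches f1"
fi
```

If that is what happens, comparing the field as text rather than adding 0 avoids the problem.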

Did you try the perl script I gave you?

Can you post the output of od -c f2?

Yes! The perl solution worked! Thanks :) Please let me know: if I want to compare a different column of the second file, where do I make the change in the perl code? Sorry for the dumb question.

[test@test]$ cat script.pl
#!/usr/bin/perl

use strict;
use warnings;

die "Usage: $0 File1 File2\n" if @ARGV != 2;

my $file2 = pop;

# Read first file
my %seen;
while (<>) { my @F = split; $seen{$F[0]+0} = 1;}

# Compare 2nd file with first file hash
local @ARGV = $file2;
while (<>) { my @F = split(",",$_); print  if !$seen{$F[2]+0};}

[test@test]$
[test@test]$
[test@test]$ script.pl f1 f2
2,2,333
[test@test]$
[test@test]$ cat f1
971507520787
[test@test]$ cat f2
1,2,0971507520787
2,2,333
[test@test]$

Output of od:

[test@test]$ od -c f2
0000000    1   ,   2   ,   0   9   7   1   5   0   7   5   2   0   7   8
0000020    7  \n   2   ,   2   ,   3   3   3  \n
0000032
[test@test]$

Thanks a lot, Akshay! :)