awk pattern matching

vegasluxor · October 8, 2014, 2:54am

I have two files. I want to compare the data in file1 with the second column of file2 and print the lines that do not match. I need help with the matching: the number in file2's second column can have leading zeros (0, 00, or 000).
Example:

file1

1
2
3

file2

a,0001
b,02
c,000
d,01
e,2
f,0005

Expected output:

c,000
f,0005

The awk statement I am using is:

awk -F ',' 'FNR==NR{a[$0];next}!($2 in a)'   file1 file2

But this does not handle the leading zeros. Please advise. Thanks!
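Why the original command misses these lines: awk array subscripts are always strings, so the key stored from file1 ("1") and the field read from file2 ("0001") are different keys even though they are the same number. A minimal check (the function name key_demo is only for illustration):

```shell
#!/bin/sh
# Hypothetical helper: shows that awk array keys are compared as strings.
key_demo() {
    awk 'BEGIN {
        a["1"]                      # key as it would be stored from file1
        m1 = ("0001" in a)          # 0: "0001" is a different string
        m2 = ("1" in a)             # 1: exact string match
        print m1, m2
    }'
}

key_demo    # prints: 0 1
```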

RavinderSingh13 · October 8, 2014, 3:31am

Hi vegasluxor,

The following may help.

awk -F, 'FNR==NR{A[$1]=$1;next} {match($2,/.*0[1-9]/);V=$2;gsub(/0/,X,V);gsub(/^$/,0,V);} !(V in A) {print $0}' file1 file2

The output will be as follows:

c,000
f,0005

EDIT: My code also assumes that the numbers in file1 do not start with 0.
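Note that gsub(/0/,X,V) deletes every 0 in the field, not only the leading ones, so a file2 value such as 010 would be compared as 1. The sample data has no inner zeros, so the output above is unaffected. Anchoring the pattern with sub(/^0+/, ...) removes only leading zeros. A small comparison on made-up values (strip_demo is a hypothetical name):

```shell
#!/bin/sh
# Compare two ways of dropping zeros from field 2 (demo values are made up).
strip_demo() {
    printf 'x,010\ny,000\nz,0005\n' | awk -F, '{
        all = $2;  gsub(/0/, "", all)      # removes EVERY zero: 010 -> 1
        lead = $2; sub(/^0+/, "", lead)    # removes only leading zeros: 010 -> 10
        if (all == "")  all = "0"          # an all-zero field such as 000 becomes 0
        if (lead == "") lead = "0"
        print $2, "gsub:" all, "sub:" lead
    }'
}

strip_demo
```

The first line shows the difference: 010 becomes 1 with gsub but 10 with the anchored sub.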

Thanks,
R. Singh

Akshay_Hegde · October 8, 2014, 3:51am

It can be simplified by just adding zero; there is no need for match and gsub.

[akshay@nio tmp]$ cat f1
1
2
3

[akshay@nio tmp]$ cat f2
a,0001
b,02
c,000
d,01
e,2
f,0005

[akshay@nio tmp]$ awk 'FNR==NR{a[$1];next}!($2+0 in a)' f1 FS=',' f2
c,000
f,0005
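Adding zero forces a numeric conversion, so 0005 and 5 both become the number 5. When that number is then used as an array subscript, awk converts it back to a string, and POSIX specifies that a value equal to an integer is converted as if with "%d", giving "5". A sketch with made-up values, with the precedence made explicit by parentheses (the files live in a temporary directory for the demo):

```shell
#!/bin/sh
# Sketch: ($2 + 0) turns "0005" into 5, which matches the key "5" from f1.
plus_zero_demo() {
    dir=$(mktemp -d)
    printf '5\n' > "$dir/f1"
    printf 'a,0005\nb,007\n' > "$dir/f2"
    # Print f2 lines whose numeric second field is not a key from f1.
    awk 'FNR == NR { a[$1]; next } !(($2 + 0) in a)' "$dir/f1" FS=',' "$dir/f2"
    rm -r "$dir"
}

plus_zero_demo    # prints: b,007
```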

vegasluxor · October 8, 2014, 5:19am

Thanks for your inputs!
I am not sure why it's not working for the numbers below.

my data:

f1:
971507520787

f2:

1,2,0971507520787
2,2,333

-------

awk 'FNR==NR{a[$1];next}!($3+0 in a)' f1 FS=',' f2

output:

1,2,0971507520787
2,2,333

Please help!

Akshay_Hegde · October 8, 2014, 5:29am

I get this output. Run dos2unix on the input files first.

[akshay@nio tmp]$ awk 'FNR==NR{a[$1];next}!($3+0 in a)' f1 FS=',' f2
2,2,333

---------- Post updated at 03:59 PM ---------- Previous update was at 03:53 PM ----------

Or try this:

awk '{ sub("\r$", "") } FNR==NR{a[$1];next}!($3+0 in a)' f1 FS=',' f2
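The reason for dos2unix or the sub() call: a line with DOS line endings ends in \r\n, and the \r stays attached to the last field, so "0005\r" never matches "5". A sketch showing the field length before and after stripping it, plus a tr-based cleanup that has a similar effect to dos2unix on this input (cr_demo is a hypothetical name):

```shell
#!/bin/sh
cr_demo() {
    # awk alone: strip a trailing carriage return; fields are re-split after sub()
    printf '1,2,0005\r\n' | awk -F, '{ n = length($3); sub(/\r$/, ""); print n, length($3) }'
    # or delete carriage returns from the stream before awk sees it
    printf '1,2,0005\r\n' | tr -d '\r' | awk -F, '{ print length($3) }'
}

cr_demo    # prints "5 4", then "4"
```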

vegasluxor · October 8, 2014, 7:53am

Still no luck.
I don't know why it's not working on AIX.

Akshay_Hegde · October 8, 2014, 7:56am

It works with the sample files. Please post your real input sample using code tags.

SriniShoo · October 8, 2014, 7:56am

This could be because of the size of int: 971507520787 is larger than the largest 32-bit integer (2147483647).

awk 'NR == FNR{a[$1]; next} {t=$3; sub(/^0*/, X, t)} !(t in a)' f1 FS=',' f2
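This version works on the text of the field, so it does not depend on how a given awk converts large numbers back to strings. One detail: with sub(/^0*/, X, t) an all-zero value such as 000 becomes the empty string rather than 0. The sketch below uses the same idea, handles that edge case, and runs on the data from this thread plus one made-up all-zero line (compare_demo and the temporary directory are only for the demo):

```shell
#!/bin/sh
# Sketch: compare as strings after stripping leading zeros; no numeric
# conversion is involved, so large values such as 0971507520787 are safe.
compare_demo() {
    dir=$(mktemp -d)
    printf '971507520787\n' > "$dir/f1"
    printf '1,2,0971507520787\n2,2,333\n3,2,000\n' > "$dir/f2"
    awk 'NR == FNR { a[$1]; next }
         { t = $3; sub(/^0+/, "", t); if (t == "") t = "0" }   # "000" -> "0"
         !(t in a)' "$dir/f1" FS=',' "$dir/f2"
    rm -r "$dir"
}

compare_demo    # prints 2,2,333 and 3,2,000
```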

Akshay_Hegde · October 8, 2014, 7:59am

Hey, there is a typo.

SriniShoo · October 8, 2014, 8:02am

Code fixed after the exception was caught but before it was reported.

Akshay_Hegde · October 8, 2014, 8:13am

I don't know what's wrong with your input. Try Perl as well:

#!/usr/bin/perl

use strict;
use warnings;

die "Usage: $0 File1 File2\n" if @ARGV != 2;

my $file2 = pop;

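# Read first file; adding 0 keys each value numerically, so leading zeros are ignored
my %seen;
while (<>) { my @F = split; $seen{$F[0]+0} = 1;}

# Compare 2nd file with first file hash.
# $F[2] is the third comma-separated field (array indexes start at 0).
local @ARGV = $file2;
while (<>) { my @F = split(",",$_); print  if !$seen{$F[2]+0};}

Usage: script.pl file1 file2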

vegasluxor · October 8, 2014, 8:39am

Ok..

f1

971507520787

f2

1,2,0971507520787
2,2,333

Code

awk 'FNR==NR{a[$1];next}!($3+0 in a)' f1 FS=',' f2

Output

1,2,0971507520787
2,2,333

awk '{ sub("\r$", "") } FNR==NR{a[$1];next}!($3+0 in a)' f1 FS=',' f2

output:

1,2,0971507520787
2,2,333

Am I missing anything here? Please help.

Akshay_Hegde · October 8, 2014, 10:30am

I don't see any special characters in your sample file.

Can you post the output of this command? I get this:

akshay@nio:/tmp$ awk 'FNR==1{++i}{print i==1 ? $1 : $3+0}' f1 FS=',' f2
971507520787
971507520787
333

vegasluxor · October 9, 2014, 2:26am

I am getting a syntax error:

 awk 'FNR==1{++i}{print i==1 ? $1 : $3+0}' f1 FS=',' f2

Syntax Error The source line is 1.
The error context is
FNR==1{++i}{print >>> i== <<<
awk: 0602-502 The statement cannot be correctly parsed. The source line is 1.

Akshay_Hegde · October 9, 2014, 2:31am

OK, try with parentheses:

awk 'FNR==1{++i}{print ( ( i == 1 ) ? $1 : $3+0 ) }' f1 FS=',' f2
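This AIX awk rejected the unparenthesized conditional after print but, as the next post shows, accepted the parenthesized form. Wrapping the whole expression as print ((cond) ? a : b) is the portable habit. A minimal sketch (ternary_demo is a hypothetical name):

```shell
#!/bin/sh
# Sketch: parenthesize the ?: expression so every awk parses the print statement.
ternary_demo() {
    printf 'first\nsecond\n' | awk '{ print ((NR == 1) ? $1 : "line " NR) }'
}

ternary_demo    # prints "first", then "line 2"
```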

Also post the result of:

awk --version

vegasluxor · October 9, 2014, 2:54am

 awk 'FNR==1{++i}{print ( ( i == 1 ) ? $1 : $3+0 ) }' f1 FS=',' f2

971507520787
9.71508e+11
333

cat f1

971507520787

cat f2

1,2,0971507520787
2,2,333

I am using awk on AIX. It's not giving any awk version information.
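The 9.71508e+11 above is how this number looks in awk's default number-to-string format "%.6g" (the POSIX default for both CONVFMT and OFMT). POSIX asks for integer values to be converted as integers, but if this awk only does that for values that fit in a 32-bit int, it would fall back to "%.6g" for 971507520787, and the key "9.71508e+11" could never match "971507520787" from file1. That is an inference from the output shown, not something confirmed in the thread. A sketch that prints both renderings explicitly with printf (fmt_demo is a hypothetical name):

```shell
#!/bin/sh
fmt_demo() {
    awk 'BEGIN {
        n = "0971507520787" + 0    # numeric value of the file2 field
        printf "%.6g\n", n         # default CONVFMT/OFMT style: 6 significant digits
        printf "%.0f\n", n         # full integer digits
    }'
}

fmt_demo    # prints "9.71508e+11", then "971507520787"
```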

Akshay_Hegde · October 9, 2014, 2:59am

It's clear that your file2 is different from the one we are seeing here.

vegasluxor · October 9, 2014, 3:06am

Here is the screenshot. Is there any other awk solution I can try? Please advise, and thanks for your support!

[test@test]$ cat f1
971507520787
[test@test]$ cat f2
1,2,0971507520787
2,2,333
[test@test]$ awk 'FNR==1{++i}{print ( ( i == 1 ) ? $1 : $3+0 ) }' f1 FS=',' f2
971507520787
9.71508e+11
333
[test@test]$

Akshay_Hegde · October 9, 2014, 3:17am

Did you try perl script I had given ?

Can you post the output of od -c f2?
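In od -c output, a DOS line ending shows up as \r immediately before \n, while a Unix-style file shows only \n. A minimal sketch of both cases for comparison (dos_line and unix_line are hypothetical names):

```shell
#!/bin/sh
# Sketch: how od -c displays the two kinds of line ending.
dos_line()  { printf '2,2,333\r\n' | od -c; }    # DOS ending: \r before \n
unix_line() { printf '2,2,333\n' | od -c; }      # Unix ending: \n only

dos_line
unix_line
```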

vegasluxor · October 9, 2014, 3:34am

Yes! The Perl solution worked, thanks! If I want to change which column of the second file is compared, where do I make the change in the Perl code? Sorry for the basic question.

[test@test]$ cat script.pl
#!/usr/bin/perl

use strict;
use warnings;

die "Usage: $0 File1 File2\n" if @ARGV != 2;

my $file2 = pop;

# Read first file
my %seen;
while (<>) { my @F = split; $seen{$F[0]+0} = 1;}

# Compare 2nd file with first file hash
local @ARGV = $file2;
while (<>) { my @F = split(",",$_); print  if !$seen{$F[2]+0};}

[test@test]$
[test@test]$
[test@test]$ script.pl f1 f2
2,2,333
[test@test]$
[test@test]$ cat f1
971507520787
[test@test]$ cat f2
1,2,0971507520787
2,2,333
[test@test]$

Output of od:

[test@test]$ od -c f2
0000000    1   ,   2   ,   0   9   7   1   5   0   7   5   2   0   7   8
0000020    7  \n   2   ,   2   ,   3   3   3  \n
0000032
[test@test]$

Thanks a lot, Akshay!
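On the column question: in the Perl script, split returns a list indexed from 0, so $F[2] in the second loop is the third comma-separated field; changing that 2 selects a different column of file2. For the awk approach, one option is to pass the column number in with -v. The variable name col and the demo data below are assumptions for illustration, not from the thread:

```shell
#!/bin/sh
# Sketch: choose the file2 column at run time with -v col=N (col is a hypothetical name).
col_demo() {
    dir=$(mktemp -d)
    printf '971507520787\n' > "$dir/f1"
    printf '0971507520787,x,1\n333,y,2\n' > "$dir/f2"
    awk -v col="$1" 'NR == FNR { a[$1]; next }
        { t = $col; sub(/^0+/, "", t); if (t == "") t = "0" }   # strip leading zeros
        !(t in a)' "$dir/f1" FS=',' "$dir/f2"
    rm -r "$dir"
}

col_demo 1    # compares the first column of f2; prints 333,y,2
```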