AWK help required

Sathy153 · April 11, 2008, 1:03am

Hi,

I have the requirement below, which I need to implement using AWK.

Files

File1: data file that has thousands of records
File2: Lookup file

I need to compare positions 21-31 of each record in File1 with positions 1-11 of File2.

If they match, write the record to a new output file (outfile1); otherwise write it to another output file (outfile2).

Please let me know how to do this in AWK.

Thanks.

era · April 11, 2008, 3:11am

I would not use awk for this.

cut -c21-31 file1 | sed -e 's/[][\\.*^$]/\\&/g' -e 's/^/^/' | grep -f - file2

Add a -v to the last grep to get the non-matches.

The middle part takes care to escape any regular expression special characters so you can pass them to grep. This is off the cuff so I probably forgot about one or two which would need escaping.
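
To see what the sed stage produces, here is a small sketch on a made-up key (the sample string a.b*c is an assumption for illustration, not data from the files above). It only shows the escaping and anchoring steps, not the grep.

```shell
# Hypothetical sample key containing regex metacharacters.
key='a.b*c'

# Same two sed expressions as in the pipeline above:
#   1) put a backslash in front of each of ] [ \ . * ^ $
#   2) anchor the resulting pattern at the start of the line
pattern=$(printf '%s\n' "$key" | sed -e 's/[][\\.*^$]/\\&/g' -e 's/^/^/')

printf '%s\n' "$pattern"    # prints ^a\.b\*c
```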

Sathy153 · April 11, 2008, 5:08am

I am getting the error below:

>cut -c21-31 file1 | sed -e 's/[][\\.*^$]/\\&/g' -e 's/^/^/' | grep -f - file2
grep: illegal option -- f
Usage: grep -hblcnsviw pattern file . .

Also, I need the entire record from file1 when it matches.
(Note: File2 has just the lookup data.)

Thanks.

era · April 11, 2008, 6:33am

The search can be turned around to get the full lines from file1. I'm surprised your grep doesn't support the -f option, but that can be worked around with sed, too.

sed -e 's/[][\\.*^$]/\\&/g' -e 's/^/\\%^..................../' -e 's/$/%p/' file2 | sed -nf - file1

You could create an awk script just as well as a sed script, but I already had the first half in sed, so it was easier to build on that.

Franklin52 · April 11, 2008, 8:10am

Give awk a try:

awk '
NR=FNR{a[substr($0,1,11)];next}
substr($0,21,31) in a{print}
' file2 file1

Regards

Sathy153 · April 22, 2008, 10:40am

awk '
> NR=FNR{a[substr($0,1,11)];next}
> substr($0,21,31) in a{print}
> ' file2 file1
awk: syntax error near line 2
awk: bailing out near line 2

awk 'NR=FNR{a[substr($0,1,11)];next} substr($0,21,31) in a{print}' file2 file1
awk: syntax error near line 1
awk: bailing out near line 1

Please advise on the above error.

Thanks in advance.

matrixmadhan · April 22, 2008, 10:51am

Please search the forums.

"How to correct this error?" has been answered many times.

Franklin52 · April 22, 2008, 12:10pm

Use nawk or /usr/xpg4/bin/awk on Solaris.
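
A minimal sketch of picking an awk at run time, assuming that on older Solaris releases the plain /usr/bin/awk is the legacy implementation that rejects this script. The candidate list and the variable name AWK are illustrative choices, not something from this thread.

```shell
# Pick the first awk from a preference list; reorder for the local system.
# The final "awk" entry is only a fallback and may be the legacy one on Solaris.
AWK=
for candidate in /usr/xpg4/bin/awk nawk gawk awk; do
    if command -v "$candidate" >/dev/null 2>&1; then
        AWK=$candidate
        break
    fi
done

echo "using: $AWK"
```

The script can then be run as "$AWK" '...' file2 file1.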

Regards

Sathy153 · April 23, 2008, 4:53am

Hi,

Still not able to resolve ;-(

Below are the file details

>cat file2
12345678901
98765432101

>cat file1
gsfgfgfgfgfgfgfgfgfg12345678901fgfgfgfgfgfgfgffgfgfsgfdg
gsfgfgfgfgfgfgfgfgfg12345678901fgfgfgfgfgfgfgffgfgfsgfdg
gsfgfgfgfgfgfgfgfgfg34578910204fgfgfgffgfgfggffgfgfsgfdg

nawk isn't returning anything while comparing file1 & file2

>/usr/bin/nawk 'NR=FNR{a[substr($0,1,11)];next} substr($0,21,31) in a{print}' file2 file1

>awk 'NR=FNR{a[substr($0,1,11)];next} substr($0,21,31) in a{print}' file2 file1
awk: syntax error near line 1
awk: bailing out near line 1

whereis nawk
nawk: /usr/bin/nawk /usr/man/man1/nawk.1

>whereis awk
awk: /usr/bin/awk /usr/man/man1/awk.1

Please Help!

Franklin52 · April 23, 2008, 1:02pm

Try this; the second substr call was wrong, because the third argument of substr is a length, not an end position:

nawk '
NR=FNR{a[substr($0,1,11)];next}
substr($0,21,11) in a{print}
' file2 file1

Regards
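
A quick way to check the substr arguments is to run them on one record from file1. This is only a sketch; the variable names are made up for the example.

```shell
# One record taken from the sample file1 posted above.
line='gsfgfgfgfgfgfgfgfgfg12345678901fgfgfgfgfgfgfgffgfgfsgfdg'

# substr(s, start, length): this asks for 31 characters starting at position 21.
too_long=$(printf '%s\n' "$line" | awk '{ print substr($0, 21, 31) }')

# 11 characters starting at position 21, i.e. positions 21-31.
key=$(printf '%s\n' "$line" | awk '{ print substr($0, 21, 11) }')

printf '%s\n%s\n' "$too_long" "$key"
```

The first value is 31 characters long and so can never equal an 11-character key from file2; the second is exactly 12345678901.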

Sathy153 · April 24, 2008, 1:44am

Hi,

I am still not able to get any matching rows when using 'nawk'.
Please help me to solve this! Also, please let me know how to debug this.

>cat file1
gsfgfgfgfgfgfgfgfgfg12345678901fgfgfgfgfgfgfgffgfgfsgfdg
gsfgfgfgfgfgfgfgfgfg12345678901fgfgfgfgfgfgfgffgfgfsgfdg
gsfgfgfgfgfgfgfgfgfg34578910204fgfgfgffgfgfggffgfgfsgfdg

>cat file2
12345678901
98765432101

The script below isn't returning any rows:

>nawk '
> NR=FNR{a[substr($0,1,11)];next}
> substr($0,21,11) in a{print}
> ' file2 file1

Thanks

Franklin52 · April 24, 2008, 3:28am

Oh my, I must be more accurate: the comparison operator is ==, not =.

awk '
NR==FNR{a[substr($0,1,11)];next}
substr($0,21,11) in a{print}
' file2 file1

Regards
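
Why the earlier version printed nothing: NR=FNR is an assignment, and its value is FNR, which is never zero. The first rule therefore fires for every record of both files, and "next" keeps the second rule from ever running. A small sketch that shows the difference, using a temporary directory and a shortened copy of the sample data (the temp paths are assumptions for the demo):

```shell
tmp=$(mktemp -d)
printf '%s\n' 12345678901 98765432101 > "$tmp/file2"
printf '%s\n' \
    'gsfgfgfgfgfgfgfgfgfg12345678901fgfgfgfgfgfgfgffgfgfsgfdg' \
    'gsfgfgfgfgfgfgfgfgfg34578910204fgfgfgffgfgfggffgfgfsgfdg' > "$tmp/file1"

# Assignment: always true, so every line is stored as a key and then skipped.
wrong=$(awk 'NR=FNR{a[substr($0,1,11)];next} substr($0,21,11) in a{print}' \
    "$tmp/file2" "$tmp/file1")

# Comparison: true only while reading the first file (file2).
right=$(awk 'NR==FNR{a[substr($0,1,11)];next} substr($0,21,11) in a{print}' \
    "$tmp/file2" "$tmp/file1")

echo "with = : [$wrong]"
echo "with ==: [$right]"
```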

Sathy153 · April 25, 2008, 1:48am

Thanks a lot!!!
That works fine. I need some tips on debugging these scripts; please send me some.

Franklin52 · April 25, 2008, 2:23am

awk '
NR==FNR{a[substr($0,1,11)];next}    # Create an array of the key (position 1-11) in file2
substr($0,21,11) in a{print}        # If position 21-31 of file1 exist in array print the record
' file2 file1
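
The original question also asked for the non-matching records in a second file. One possible extension, as a sketch only: the output names outfile1 and outfile2 come from the first post, while the temporary directory and the awk variable names hit and miss are assumptions made for this example.

```shell
tmp=$(mktemp -d)
printf '%s\n' 12345678901 98765432101 > "$tmp/file2"
printf '%s\n' \
    'gsfgfgfgfgfgfgfgfgfg12345678901fgfgfgfgfgfgfgffgfgfsgfdg' \
    'gsfgfgfgfgfgfgfgfgfg12345678901fgfgfgfgfgfgfgffgfgfsgfdg' \
    'gsfgfgfgfgfgfgfgfgfg34578910204fgfgfgffgfgfggffgfgfsgfdg' > "$tmp/file1"

awk -v hit="$tmp/outfile1" -v miss="$tmp/outfile2" '
NR==FNR { a[substr($0,1,11)]; next }            # load the lookup keys from file2
substr($0,21,11) in a { print > hit; next }     # matching records go to outfile1
{ print > miss }                                # all other records go to outfile2
' "$tmp/file2" "$tmp/file1"

wc -l "$tmp/outfile1" "$tmp/outfile2"
```

Note that an output file is only created once a record is written to it, so outfile1 will not exist if nothing matches.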

This link explains how to deal with errors in awk:

Debugging (sed & awk, Second Edition)
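
Independently of that chapter, one simple habit is to print the values a rule works with before relying on them. A hedged sketch, using one record from file1; the square brackets are only there to make stray spaces or an off-by-one offset visible.

```shell
line='gsfgfgfgfgfgfgfgfgfg12345678901fgfgfgfgfgfgfgffgfgfsgfdg'

# Show the record number, the record length and the extracted key.
debug=$(printf '%s\n' "$line" |
    awk '{ printf "FNR=%d len=%d key=[%s]\n", FNR, length($0), substr($0, 21, 11) }')

printf '%s\n' "$debug"
```

If the text between the brackets is not exactly what file2 contains, the lookup cannot match.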

Regards

era · April 25, 2008, 2:29am

Another point of cleverness to note is "NR==FNR", which is true only while processing the first file on the command line. Both counters are incremented when a new line is read, but FNR restarts from 1 when the file changes, while NR keeps growing across files. The "next" on that line means the following line of the script doesn't get executed while the condition is true; it causes awk to read the next input line and start over from the first rule.
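
The two counters can be watched side by side with a throwaway pair of files. The file names first and second and their contents are made up for this sketch.

```shell
tmp=$(mktemp -d)
printf '%s\n' a b   > "$tmp/first"
printf '%s\n' c d e > "$tmp/second"

# Print each line together with the total and the per-file record number.
trace=$(awk '{ print $0, "NR=" NR, "FNR=" FNR }' "$tmp/first" "$tmp/second")

printf '%s\n' "$trace"
# a NR=1 FNR=1
# b NR=2 FNR=2
# c NR=3 FNR=1
# d NR=4 FNR=2
# e NR=5 FNR=3
```

NR and FNR agree only on the lines of the first file, which is what the NR==FNR test relies on.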

Sathy153 · June 19, 2008, 3:24am

Hi,

If I want to add one more condition to the comparison below, can I modify the code as follows?

Old code

awk '
NR==FNR{a[substr($0,1,11)];next} # Create an array of the key (position 1-11) in file2
substr($0,21,11) in a{print} # If position 21-31 of file1 exist in array print the record
' file2 file1

New code

awk '
NR==FNR{a[substr($0,1,11)];next} # Create an array of the key (position 1-11) in file2
substr($0,21,11) | substr($0,41,11)in a{print} # If position (21-31) or (41 -51) of file1 exist in array print the record
' file2 file1

Many thanks in advance.

era · June 19, 2008, 3:30am

The syntax for specifying multiple conditions is a bit more explicit and cumbersome.

(substr($0,21,11) in a || substr($0,41,11) in a) { print }

Note also that the logical "or" operator is a double pipe, not single.
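
Put together with the lookup rule, the whole script might look like the sketch below. The test records are generated for the example (filler characters plus a key at the wanted offset) and are not the poster's real data.

```shell
tmp=$(mktemp -d)
pad20=$(printf '%020d' 0 | tr 0 x)    # 20 filler characters
pad40=$pad20$pad20                    # 40 filler characters

printf '%s\n' 12345678901 98765432101 > "$tmp/file2"
{
    printf '%s%s%s\n' "$pad20" 12345678901 tail    # key at positions 21-31
    printf '%s%s%s\n' "$pad40" 98765432101 tail    # key at positions 41-51
    printf '%s%s%s\n' "$pad40" 00000000000 tail    # no key at either position
} > "$tmp/file1"

both=$(awk '
NR==FNR { a[substr($0,1,11)]; next }                         # keys from file2
(substr($0,21,11) in a || substr($0,41,11) in a) { print }   # either position matches
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$both"
```

Only the first two generated records are printed.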
