AWK help required

Hi,

I have the requirement below, which I need to implement using AWK.

Files

File1: data file that has thousands of records
File2: Lookup file

I need to compare positions 21-31 of each File1 record with positions 1-11 of File2.

If they match, write the record to a new output file (outfile1); otherwise write it to another output file (outfile2).

Please let me know how to do this in AWK.

Thanks.

I would not use awk for this.

cut -c21-31 file1 | sed -e 's/[][\\.*^$]/\\&/g' -e 's/^/^/' | grep -f - file2

Add a -v to the last grep to get the non-matches.

The middle part takes care to escape any regular expression special characters so you can pass them to grep. This is off the cuff so I probably forgot about one or two which would need escaping.
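
To see what the middle stage hands to grep, here is a small sketch using a made-up key that contains special characters (the sample string is hypothetical, not taken from the files in this thread):

```shell
# Escape regex metacharacters, then anchor the pattern at the start of the line.
escaped=$(printf '%s\n' 'a.b*c[1]' | sed -e 's/[][\\.*^$]/\\&/g' -e 's/^/^/')
printf '%s\n' "$escaped"    # prints ^a\.b\*c\[1\]
```

Each line produced this way is a pattern that only matches at the beginning of a line in the file being searched.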

I am getting the error below =>

>cut -c21-31 file1 | sed -e 's/[][\\.*^$]/\\&/g' -e 's/^/^/' | grep -f - file2
grep: illegal option -- f
Usage: grep -hblcnsviw pattern file . .

Also, I need the entire record from file1 when it matches.
(Note: file2 contains only lookup data.)

Thanks.

The search can be turned around to get the full lines from file1. I'm surprised your grep doesn't support the -f option, but that can be worked around with sed, too.

sed -e 's/[][\\.*^$]/\\&/g' -e 's%^%m^....................%' -e 's/$/%p' file2 | sed -nf - file1

You could create an awk script just as well as a sed script, but I already had the first half in sed, so it was easier to build on that.

Give awk a try:

awk '
NR=FNR{a[substr($0,1,11)];next}
substr($0,21,31) in a{print}
' file2 file1

Regards

awk '
> NR=FNR{a[substr($0,1,11)];next}
> substr($0,21,31) in a{print}
> ' file2 file1
awk: syntax error near line 2
awk: bailing out near line 2

awk 'NR=FNR{a[substr($0,1,11)];next} substr($0,21,31) in a{print}' file2 file1
awk: syntax error near line 1
awk: bailing out near line 1

Please advise on the above error.

Thanks in advance.

Please search the forums; how to correct this error has been answered many times.

Use nawk or /usr/xpg4/bin/awk on Solaris.
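
If the same script has to run on more than one system, something along these lines could pick a suitable awk. This is only a sketch: the AWK variable name and the search order are my own assumptions, and the smoke test just checks the array membership test used in this thread.

```shell
# Fall back to plain awk, but prefer the Solaris alternatives when they exist.
AWK=awk
for candidate in /usr/xpg4/bin/awk nawk; do
    if command -v "$candidate" >/dev/null 2>&1; then
        AWK=$candidate
        break
    fi
done

# Smoke test: store one key, then check it with the "in" operator.
result=$(echo key | "$AWK" '{ seen[$1] } END { if ("key" in seen) print "ok" }')
echo "$AWK: $result"
```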

Regards

Hi,

Still not able to resolve ;-(

Below are the file details

>cat file2
12345678901
98765432101

>cat file1
gsfgfgfgfgfgfgfgfgfg12345678901fgfgfgfgfgfgfgffgfgfsgfdg
gsfgfgfgfgfgfgfgfgfg12345678901fgfgfgfgfgfgfgffgfgfsgfdg
gsfgfgfgfgfgfgfgfgfg34578910204fgfgfgffgfgfggffgfgfsgfdg

nawk isn't returning anything while comparing file1 & file2

>/usr/bin/nawk 'NR=FNR{a[substr($0,1,11)];next} substr($0,21,31) in a{print}' file2 file1

>awk 'NR=FNR{a[substr($0,1,11)];next} substr($0,21,31) in a{print}' file2 file1
awk: syntax error near line 1
awk: bailing out near line 1

whereis nawk
nawk: /usr/bin/nawk /usr/man/man1/nawk.1

>whereis awk
awk: /usr/bin/awk /usr/man/man1/awk.1

Please Help!

Try this; the third argument of substr is a length, not an end position, so the second substr call was wrong:

nawk '
NR=FNR{a[substr($0,1,11)];next}
substr($0,21,11) in a{print}
' file2 file1
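
A quick way to check the substr arguments in isolation, using a made-up 36-character line (not from the real data) so the positions are easy to read off:

```shell
line='abcdefghijklmnopqrstuvwxyz0123456789'

# substr(string, start, length): 11 characters starting at position 21.
wanted=$(printf '%s\n' "$line" | awk '{ print substr($0, 21, 11) }')

# A length of 31 runs past the end of this line, so everything from 21 on is returned.
too_long=$(printf '%s\n' "$line" | awk '{ print substr($0, 21, 31) }')

echo "$wanted"      # prints uvwxyz01234
echo "$too_long"    # prints uvwxyz0123456789
```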

Regards

Hi,

I am still not able to get any matching rows when using 'nawk'.
Please help me to solve this! Also, please let me know how to debug this.

>cat file1
gsfgfgfgfgfgfgfgfgfg12345678901fgfgfgfgfgfgfgffgfgfsgfdg
gsfgfgfgfgfgfgfgfgfg12345678901fgfgfgfgfgfgfgffgfgfsgfdg
gsfgfgfgfgfgfgfgfgfg34578910204fgfgfgffgfgfggffgfgfsgfdg

>cat file2
12345678901
98765432101

Below script isn't returning any rows =>

>nawk '
> NR=FNR{a[substr($0,1,11)];next}
> substr($0,21,11) in a{print}
> ' file2 file1

Thanks

Oh my, I must be more accurate; the comparison needs ==, because a single = is an assignment:

awk '
NR==FNR{a[substr($0,1,11)];next}
substr($0,21,11) in a{print}
' file2 file1
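
The difference can be made visible by counting how many lines each rule sees. A minimal sketch with throwaway files (the file names and contents are invented for the demonstration):

```shell
tmp=$(mktemp -d)
printf 'k1\n' > "$tmp/lookup"
printf 'k1\nk2\n' > "$tmp/data"

# NR=FNR assigns FNR to NR; the result is never zero, so the first rule takes every line.
with_assign=$(awk 'NR=FNR { first++; next } { second++ } END { print first+0, second+0 }' "$tmp/lookup" "$tmp/data")

# NR==FNR compares; it only holds while the first file is being read.
with_compare=$(awk 'NR==FNR { first++; next } { second++ } END { print first+0, second+0 }' "$tmp/lookup" "$tmp/data")

echo "NR=FNR:  $with_assign"     # 3 0
echo "NR==FNR: $with_compare"    # 1 2
rm -rf "$tmp"
```

With the assignment, the second rule never runs, which would explain why nawk printed nothing.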

Regards

Thanks a lot!!!
That works fine. I would also appreciate some tips on debugging these.

awk '
NR==FNR{a[substr($0,1,11)];next}    # Create an array of the key (position 1-11) in file2
substr($0,21,11) in a{print}        # If position 21-31 of file1 exist in array print the record
' file2 file1
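
The original request also asked for the non-matching records in a second file. One possible way to do that, as a sketch only: the records below are hypothetical (built with printf so the key lands in columns 21-31), and the variable names keys and out are mine.

```shell
tmp=$(mktemp -d)
printf '%s\n' 12345678901 98765432101 > "$tmp/file2"
printf '%-20s%s%s\n' rec1 12345678901 tail rec2 34578910204 tail rec3 98765432101 tail > "$tmp/file1"

(
    cd "$tmp" || exit 1
    awk '
    NR==FNR { keys[substr($0, 1, 11)]; next }    # first file: remember each key
    {
        # Second file: choose the output file for this record.
        out = (substr($0, 21, 11) in keys) ? "outfile1" : "outfile2"
        print > out
    }
    ' file2 file1
)

matched=$(awk 'END { print NR }' "$tmp/outfile1")
unmatched=$(awk 'END { print NR }' "$tmp/outfile2")
echo "matched=$matched unmatched=$unmatched"    # matched=2 unmatched=1
rm -rf "$tmp"
```

Note that an output file is only created once a record is written to it, so outfile2 will not exist if every record matches.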

This link explains how to deal with errors in awk:

Debugging (sed & awk, Second Edition)

Regards

Another point of cleverness to note is "NR==FNR", which is true only while awk is processing the first file on the command line. Both counters are incremented each time a line is read, but FNR restarts at 1 whenever a new file begins, while NR keeps growing. And the "next" on that line means the following line of the script doesn't get executed while the condition is true; it causes awk to read the next input line and start over from the first rule.
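
As a debugging aid, it can help to print the counters for every input line and watch where the condition flips. A small sketch with two throwaway files (names and contents are made up):

```shell
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/first"
printf 'c\nd\n' > "$tmp/second"

# Show the file name, both counters, and which pass the line belongs to.
trace=$(cd "$tmp" && awk '{ print FILENAME, "NR=" NR, "FNR=" FNR, (NR==FNR ? "lookup pass" : "data pass") }' first second)
printf '%s\n' "$trace"
rm -rf "$tmp"
```

This should print:

```
first NR=1 FNR=1 lookup pass
first NR=2 FNR=2 lookup pass
second NR=3 FNR=1 data pass
second NR=4 FNR=2 data pass
```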

Hi,

If I want to add one more condition to the comparison, can I modify the code as shown below?

Old code

awk '
NR==FNR{a[substr($0,1,11)];next} # Create an array of the key (position 1-11) in file2
substr($0,21,11) in a{print} # If position 21-31 of file1 exist in array print the record
' file2 file1

New code

awk '
NR==FNR{a[substr($0,1,11)];next} # Create an array of the key (position 1-11) in file2
substr($0,21,11) | substr($0,41,11)in a{print} # If position (21-31) or (41 -51) of file1 exist in array print the record
' file2 file1

Many thanks in advance.

The syntax for specifying multiple conditions is a bit more explicit and cumbersome.

(substr($0,21,11) in a || substr($0,41,11) in a) { print }

Note also that the logical "or" operator is a double pipe, not single.
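
Put together with the lookup rule, the two-position check might look like the sketch below. The test records are hypothetical: the key sits at 21-31 in the first record, at 41-51 in the second, and nowhere in the third. Only the first four characters of each hit are printed to keep the output short.

```shell
tmp=$(mktemp -d)
printf '%s\n' 12345678901 > "$tmp/file2"
printf '%-20s%-20s%s\n' \
    rec1 12345678901 00000000000 \
    rec2 00000000000 12345678901 \
    rec3 00000000000 00000000000 > "$tmp/file1"

hits=$(awk '
NR==FNR { a[substr($0, 1, 11)]; next }
(substr($0, 21, 11) in a) || (substr($0, 41, 11) in a) { print substr($0, 1, 4) }
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$hits"    # prints rec1 and rec2, one per line
rm -rf "$tmp"
```

To print the whole record instead, replace the action with a plain print.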