filter record from a file reading another file

gpaulose · April 14, 2010, 11:18am

Hi,
I want to filter record from a file if the records in the second column matches the data in another file. I tried the below awk command but it filters the records in the filter file. I want the opposite, to include only the records in the filter file.

I tried this:

awk -F'|' 'NR==FNR{_[$1];next}!($2 in _)' filter.txt record.txt

Record.txt:

00000000048717|0003|20100408|0|001|000000.084|000000.093|000000.107|
00000000048717|0004|20100408|0|001|000000.084|000000.093|000000.107|
00000000048717|0005|20100408|0|001|000000.084|000000.093|000000.107|
00000000048717|0006|20100408|0|001|000000.084|000000.093|000000.107|

Filter file:

Output should be:

00000000048717|0003|000000.084|

Note that the output needs only first, second and sixth colum from input record file.

Any help will be much appreciated.

vgersh99 · April 14, 2010, 11:27am

awk -F'|' 'NR==FNR{_[$1];next}$2 in _ {print $1,$2,$6}' OFS='|' filter.txt record.txt

kcoder24 · April 14, 2010, 11:56am

awk -F\| 'NR==FNR{f[$1];next}($2 in f){print $1"|"$2"|"$6}' filter.txt record.txt

---------- Post updated at 10:56 AM ---------- Previous update was at 10:31 AM ----------

vgersh99 I've learned something new today ;), I'm talking about

print $1,$2,$6

gpaulose · April 14, 2010, 12:08pm

Thanks guys

---------- Post updated at 12:08 PM ---------- Previous update was at 12:06 PM ----------

How about if I had to check for 2 columns in the Record file and I have two values in filter file.
for eg: Filter file:

0003|20100408

I should get the output only if the second column in the record file is 0003 and third column is 20100408.

vgersh99 · April 14, 2010, 12:16pm

gpaulose:

Thanks guys

---------- Post updated at 12:08 PM ---------- Previous update was at 12:06 PM ----------

How about if I had to check for 2 columns in the Record file and I have two values in filter file.
for eg: Filter file:
0003|20100408
I should get the output only if the second column in the record file is 0003 and third column is 20100408.

awk -F'|' 'NR==FNR{_[$1,$2];next}($2 SUBSEP $3) in _ {print $1,$2,$6}' OFS='|' filter.txt record.txt

kcoder24 · April 14, 2010, 12:17pm

Then you must modify like this

awk -F\| 'NR==FNR{f[$1,$2];next}($2SUBSEP$3 in f){print $1"|"$2"|"$6}' filter.txt record.txt

I never made the try with $2,$3 instead $2SUBSEP$3 but you can make the try

gpaulose · April 15, 2010, 12:06pm

This worked. Thanks to all of you who helped me

kcoder24:

Then you must modify like this
awk -F\| 'NR==FNR{f[$1,$2];next}($2SUBSEP$3 in f){print $1"|"$2"|"$6}' filter.txt record.txt
I never made the try with $2,$3 instead $2SUBSEP$3 but you can make the try

Jairaj · April 15, 2010, 12:58pm

Hi Kcoder,

Please explain this command process:

awk -F\| 'NR==FNR{f[$1,$2];next}($2SUBSEP$3 in f){print $1"|"$2"|"$6}' filter.txt record.txt

gpaulose · April 20, 2010, 10:53am

I found a probelm with this. Can somebody help please?

When the match is done, if the filter records position is less than the record files position this will not work.

For eg: Consider the below files
Record file:

0003@00000000000190@20100401@201004012010040120100401@003@
0003@00000000000200@20100329@201003292010032920100329@003@
0003@00000000000550@20100401@201004012010040120100401@048@
0003@00000000000910@20100401@201004012010040120100401@017@
0004@00000000001401@20100328@201003282010032820100328@300@
0004@00000000001401@20100401@201004012010040120100401@010@
0004@00000000001431@20100401@201004012010040120100401@010@
0005@00000000001441@20100329@201003302010033020100330@300@
0005@00000000001830@20100328@201004012010040120100401@300@
0005@00000000002080@20100329@201003292010032920100329@003@

Filter File:

0004@300@20100328@20100329
0005@300@20100328@20100329

awk -F\@ 'NR==FNR{f[$1,$3,$5];next}($1SUBSEP$3SUBSEP$2 in f) {print $0}' $Record $Filter

This will not work since we are comparing 1st position to 1st, 3rd to 3rd and 5th to 2nd.

But if the filter file was like below this code would have worked.

 awk -F\@ 'NR==FNR{f[$1,$3,$5];next}($1SUBSEP$3SUBSEP$5 in f) {print $0}' $Record $Filter

0004@300@20100328@20100329@300
0005@300@20100328@20100329@300

Could somebody help me to figure out how we can make the first process work?