I use conditional operators alot in AWK to print rows from large text files based on values in a certain column. For example:
awk -F '\t' '{ if ($1 == "A" || $1 == "C" ) print $0}' OFS="\t" file1.txt > file2.txt
In this case every row is printed from file1 to file2 for which the column 1 value in file1 is either A or C.
This is fine if I'm working with a few conditional values in column 1 (2 in the above example), however, if there are many values to look up, say 40, it becomes cumbersome to write out 40 conditional values. In this case it would be more efficient to place all those conditional values in a separate file say file3. For example, file 3 could look like:
File3
BY1038
BY1039
BY2222
BY30142
BY30150
BY30161
BY30209
BY30211
CTS10416
CTS10700
CTS10879
Basically I would like to print all file1 rows to file2 for which the column1 value in file1 is one of those listed in file3
File1
BY1038 23497067 C->G
BY1039 14032743 A->T
BY2222 2720236 A->G
CTS493 8122429 A->G
CTS5035 14708870 G->C
CTS5268 15018138 A->G
CTS5336 17367361 T->C
BY30211 7625503 C->A
CTS10416 16513268 T->C
CTS10700 16651541 G->T
CTS10879 14409483 A->C
CTS11569 17841608 A->G
CTS12657 15373303 C->T
CTS150 2913202 G->A
CTS3585 21225818 A->C
CTS3837 14043980 T->C
CTS4014 7064140 G->C
CTS4652 8891051 C->T
CTS482 15658416 G->A
Desired output (file2) should look like:
BY1038 23497067 C->G
BY1039 14032743 A->T
BY2222 2720236 A->G
CTS10416 16513268 T->C
CTS10700 16651541 G->T
CTS10879 14409483 A->C
Thanks in advance...
------ Post updated at 09:59 AM ------
I modified a code that I was using for something else, and it seems to work fine for printing columns1 and column2 of file1 to file2, however, I need the column3 value also printed.
awk 'NR == FNR {REP[$1] = $2; next} $1 in REP {$2 = REP[$1]} 1' OFS="\t" file1 file3 > file2
Here is what I get (only 1st 2 columns are printed)
file2
BY1038
BY1039
BY2222
BY30142
BY30150
BY30161
BY30209
BY30211
CTS10416 19443432
CTS10700 22712606
CTS10879 22815955