I was wondering whether anyone has any idea what is happening here. I'm using simple code to compare 2 tab delimited files based on column 1 values. If the column1 value of file1 exists in file2, then I'm to print the column4 value in file2 in column3 of file1. Here is my code:
1st I have to produce file1 by concatenating columns 4&5 of the input file:
INPUT FILE:
1 52828739 rs12044739 C T
1 52835713 rs72899818 T C
1 52836736 rs10157619 G A
1 52844478 rs6684941 A G
Using this simple code, I reorder columns in the INPUT file and concatenate columns 4 &5:
awk -F '\t' ' {print $3,$1,$2,$4$5}' OFS="\t" INPUT > file1
This produces the following file1 which looks fine:
rs12044739 1 52828739 CT
rs72899818 1 52835713 TC
rs10157619 1 52836736 GA
rs6684941 1 52844478 AG
This is file2:
rs12044739 1 52828739 CC
rs72899818 1 52835713 TC
rs10157619 1 52836736 GG
rs6684941 1 52844478 AA
Using the code:
awk 'NR == FNR {REP [$1] = $4; next} $1 in REP {$3 = REP[$1]} 1' OFS="\t" file2 file1 > results
Yields the following output file, RESULTS, which so far seems to look fine:
rs12044739 1 CC CT
rs72899818 1 TC TC
rs10157619 1 GG GA
rs6684941 1 AA AG
For final processing we need to print all rows for which column3 = column4. So I used this simple code:
awk -F '\t' '{ if ($3 = $4) print $0}' OFS="\t" RESULTS > RESULTS2
Thus RESULTS2 should look like this:
rs72899818 1 TC TC
Instead what I get is this:
rs72899818 1 TC
TC
Any ideas as to what is causing the column4 value to print to the following row?