I have two files.
The first one (file1) looks like this:
novelMiR_892 novelMiR_891,
novelMiR_852
novelMiR_893
novelMiR_1661
novelMiR_854
novelMiR_1210
novelMiR_1251
novelMiR_855
novelMiR_1252
novelMiR_897 novelMiR_2336,novelMiR_2335,
The second one (file2) looks like this:
>novelMiR_891
AAAABBBCCCDDD
>novelMiR_892
BBBCCCDDDEEEF
>novelMiR_852
HHHHGGGFFFDD
Now I want to rename every ">header" in file2 that appears in file1 on the same line as another name, replacing it with the first name on that line. This is what I want (file3):
>novelMiR_892 (renamed)
AAAABBBCCCDDD
>novelMiR_892
BBBCCCDDDEEEF
>novelMiR_852
HHHHGGGFFFDD
The first header is renamed because, according to file1, novelMiR_891 is the same as novelMiR_892.
My attempt (which DOES NOT WORK) is:
awk 'NR==FNR{n[$1]=$1","$2;next} { $1 ~ ">" ;
name=substr($1,2,length($1)-1); getline seq;
{for (i in n) if(n ~ /'"$name"'/) names=i} print names "\n" seq > "file3" }' file1 file2
Explanation:
First I create an array that holds all names from a line of file1, joined by ",", and indexed by the name I want to keep:
n[novelMiR_892] = novelMiR_892,novelMiR_891,
Then I read file2 record by record, taking each name (without the ">") together with its sequence, and check whether the name occurs in one of the values of the n array. If it does, the index of that entry should be kept and printed instead of the original name.
But I always get only the first name for all sequences:
>novelMiR_892
AAAABBBCCCDDD
>novelMiR_892
BBBCCCDDDEEEF
>novelMiR_892
HHHHGGGFFFDD
Where is my mistake?
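Two things in the attempt above look like likely culprits. First, '"$name"' closes the single-quoted awk program and lets the shell expand its own variable name, not the awk variable. If that shell variable is unset, the regex becomes // and matches everything. Second, n is tested without a subscript, where n[i] was probably intended. The following is only a minimal sketch of the described logic, assuming that the first field of file1 is the name to keep and the second field is a comma-separated list of aliases. The array name canon and the abbreviated sample data are illustrative choices, not part of the original script. It builds a direct alias-to-name lookup instead of regex matching:

```shell
# Abbreviated sample data taken from the question.
cat > file1 <<'EOF'
novelMiR_892 novelMiR_891,
novelMiR_852
novelMiR_897 novelMiR_2336,novelMiR_2335,
EOF

cat > file2 <<'EOF'
>novelMiR_891
AAAABBBCCCDDD
>novelMiR_892
BBBCCCDDDEEEF
>novelMiR_852
HHHHGGGFFFDD
EOF

awk '
NR == FNR {
    # file1: map the kept name and each of its aliases to the kept name.
    canon[$1] = $1
    k = split($2, alias, ",")
    for (j = 1; j <= k; j++)
        if (alias[j] != "")          # skip the empty piece after a trailing comma
            canon[alias[j]] = $1
    next
}
/^>/ {
    # file2 header: strip ">", look the name up, fall back to the original name.
    name = substr($1, 2)
    if (name in canon)
        name = canon[name]
    print ">" name
    next
}
{ print }                            # sequence lines are passed through unchanged
' file1 file2 > file3
```

With the sample data, file3 should contain the headers >novelMiR_892, >novelMiR_892 and >novelMiR_852, each followed by its unchanged sequence. Headers that do not occur anywhere in file1 are left as they are.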