Matching 2 files based on one column

Hi,

On a similar subject, the following. I have two files:

file1.txt

dbSNP_rsID,Chromosome,Position,Gene
rs10399749,chr. 01,45162,?
rs4030303,chr. 01,72434,?
rs4030300,chr. 01,72515,?
rs940550,chr. 01,78032,?
rs13328714,chr. 01,81468,?
rs11490937,chr. 01,222077,?
rs6683466,chr. 01,524446,"OR4F29, OR4F16, OR4F3"
rs12025928,chr. 01,536560,"OR4F29, OR4F16, OR4F3"

file2.txt

dbSNP_rsID	R_square
rs6650104	0.000
rs9629043	0.000
rs11497407	0.000
rs12565286	0.332
rs11804171	0.338
rs2977670	0.352
rs2977656	0.000

And now I would like to match the files (file1 to file2) on dbSNP_rsID so that I get a new file:

file_new.txt
dbSNP_rsID,R_square,Chromosome,Position,Gene
rs12565286,0.332,chr. 01,711153,"OR4F29, OR4F16, OR4F3"

I've really tried to understand the code you guys have supplied, but I really don't understand. :confused:Could you please help me with a code? :cool:

Thanks!

Sander

Hi Sander,

Welcome to the forum.. In future it is best to create a new thread and then refer to this thread for example..
Your case is a little bit simpler, except for the fact that your input files use different field separators.
You could give this a try:

awk 'NR==FNR{A[$1]=$2;next}A[$1]{$2=A[$1] FS $2;print}' file2 FS=, OFS=, file1

Which will print only those records that have a match in file2 and it will use the value in the second field from that file...

1 Like

Hi,

Thanks a bunch! It work! Got now a new file with this code:

awk 'NR==FNR{A[$1]=$2;next}A[$1]{$2=A[$1] FS $2;print}' file2.txt FS=, OFS=, file1.txt > file3.txt

Thank you also for pointing out how the forum works. I just have some questions about the code though: can you explain the parts? I don't fully understand what each part does, and than if I'd understand I could learn maybe new commands to work my files.:cool:

Thanks again.

Sander

---------- Post updated at 10:17 AM ---------- Previous update was at 10:13 AM ----------

Also I'd like to have the original file2 to be added upon with those columns from file1 that match. If there's no match, I just want the position in file2 to remain blank for those columns. Is that possible too?

Can you post the desired output, based on the given 2 files?

Please use code tags.

Hi,

Here's the file I'd like to get:

d

For instance the first three lines are not in file1, but are in file2. I'd like to keep them in the final file3. So basically I'd like to add the information from file1 to file2, if there's information, if not, just leave it blank.

Thanks.

Sander