Combine information from 2 files

Nika · April 3, 2018, 7:53am

Hi there, I'm a newbie in Linux (Ubuntu) working with several files, some of them containing hundreds of thousands of lines. I have started to extract information from 2 files by combining them on 1 column: I need a VLOOKUP-like command that reads the sampleID (column 2) in file 1 line by line, starting at line 2, looks up this sampleID in file 2, and writes all readIDs found for this sampleID (e.g. 250000 of them) to a new file, one per line (new file: "sampleID"_readIDs.file). This should be done for all sampleIDs in the summary file 1. In the end I should have as many .file files as there are sampleID lines in file 1.
file 1:

name    sampleID    nr 
Sample1    123    250000 
Sample2    345    200000 
Sample3    456    180000

file 2:

readID    read_value    sampleID
15sj10n3-9372-9d73-i3i2-64b40faa330b    6000    123
19pe26j3-9372-9g22-i3i2-81f59a56d939    5900    123
93os17k5-9372-6k63-i3i2-b8b765b1a729    6050    456
49kk23o2-9372-9d73-i3i2-b09f4b1f0557    6080    123
09iy02p8-9372-9d73-i3i2-0d479e6fb751    5990    345

123_readIDs.file (output in new file):

15sj10n3-9372-9d73-i3i2-64b40faa330b
19pe26j3-9372-9g22-i3i2-81f59a56d939
49kk23o2-9372-9d73-i3i2-b09f4b1f0557

I am thankful for any help or suggestions. Nika

RudiC · April 3, 2018, 9:11am

Welcome to the forum.

It usually helps to post your OS and shell versions as well as, e.g., preferred tools, and to show your own attempts at a solution.

Anyhow, how far would this get you:

awk 'NR==FNR {SID[$2]; next} $3 in SID {FN = $3 "_readIDs.file"; print $1 >> FN; close (FN)}' file1 file2
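To see how the one-liner works, here is a minimal sketch that runs a commented variant of it on the sample data from the question. It is not a definitive solution: it assumes whitespace-separated columns and a header line in both files, it works in a scratch directory made with mktemp, and it adds a header check (FNR > 1) that is not in the one-liner above. Without that check the header word "sampleID" is stored as a key as well, and a stray sampleID_readIDs.file containing "readID" gets written.

```shell
# Work in a scratch directory so nothing existing is touched (assumption for this demo).
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample data copied from the question.
cat > file1 <<'EOF'
name    sampleID    nr
Sample1    123    250000
Sample2    345    200000
Sample3    456    180000
EOF

cat > file2 <<'EOF'
readID    read_value    sampleID
15sj10n3-9372-9d73-i3i2-64b40faa330b    6000    123
19pe26j3-9372-9g22-i3i2-81f59a56d939    5900    123
93os17k5-9372-6k63-i3i2-b8b765b1a729    6050    456
49kk23o2-9372-9d73-i3i2-b09f4b1f0557    6080    123
09iy02p8-9372-9d73-i3i2-0d479e6fb751    5990    345
EOF

awk '
    # NR == FNR is only true while the first file (file1) is being read.
    NR == FNR {
        if (FNR > 1) SID[$2]    # remember each sampleID, skipping the header line
        next                    # do not apply the file2 rule to file1 lines
    }
    # From here on we are in file2: act only if column 3 is a known sampleID.
    $3 in SID {
        FN = $3 "_readIDs.file" # one output file per sampleID
        print $1 >> FN          # append the readID
        close(FN)               # keep only one output file open at a time
    }
' file1 file2

cat 123_readIDs.file
# 15sj10n3-9372-9d73-i3i2-64b40faa330b
# 19pe26j3-9372-9g22-i3i2-81f59a56d939
# 49kk23o2-9372-9d73-i3i2-b09f4b1f0557
```

Because the output is appended with >>, running the command a second time in the same directory adds every readID again, so remove old *_readIDs.file files before a rerun. Closing the file after every line avoids running into the limit on open files when there are many sampleIDs, at the cost of some speed on very large inputs.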