For loop question

awc228 · July 20, 2012, 10:33am

I have two files. File 1 has many columns, but only two of them interest me. Column 1 contains each individual's ID number, and column 10 contains that individual's diagnosis (I am a physician). Altogether the file has 3000 lines, one line per individual.

Column 1   Column 10
PT1234     DiseaseA
PT5678     DiseaseB
PT2345     DiseaseA
PT4567     DiseaseA

File 2 has two columns and, as before, 3000 lines, one line per individual. The contents of these columns don't matter, except that each line contains the individual's ID somewhere (in column 1 on some lines, in column 2 on others).
For example, the row in file 2 that corresponds to the first row in file 1 might be

sldbris-%*PT1234xthb SW-efgs

What I would like to do is create a third file that is identical to file 2 except for an added third column holding the diagnosis for the ID (i.e., the PT#### string) that appears on that line. For the example line above, that would be:

sldbris-%*PT1234xthb SW-efgs DiseaseA

Please help, thank you.

bartus11 · July 20, 2012, 10:53am

Try:

awk 'NR==FNR{a[$1]=$10;next}{for (i in a) if ($0~i) print $0,a[i]}' file1 file2 > file3

balajesuri · July 20, 2012, 10:53am

perl -lane 'BEGIN{open O, "> file3"}; open F1, "< file1"; for $l (<F1>) { chomp $l; @x = split /\s+/, $l; /\Q$x[0]\E/ && print O "@F $x[9]" }; close F1; END{close O}' file2
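A variation on the awk approach above: instead of testing every ID against every line of file 2 (about 3000 × 3000 regex checks here), pull the ID out of each line with awk's `match()` and look it up directly. This is a minimal sketch, assuming that IDs always look like "PT" followed by exactly four digits, that no other "PT####" string appears on a line, and that file 1 is whitespace-separated. The function name `add_diagnosis` is just an illustrative name, not anything from the posts above.

```sh
#!/bin/sh
# Hypothetical helper: add_diagnosis FILE1 FILE2
# Prints every line of FILE2, followed by the diagnosis (column 10 of FILE1)
# for the first PT#### ID found anywhere on that line.
# Assumptions: IDs are "PT" plus exactly four digits; FILE1 is
# whitespace-separated.
add_diagnosis() {
    awk '
        # First file: remember the diagnosis (column 10) for each ID (column 1).
        NR == FNR { diag[$1] = $10; next }
        # Second file: extract the ID from the line and look it up.
        # [0-9] is repeated instead of {4} for compatibility with older awks.
        match($0, /PT[0-9][0-9][0-9][0-9]/) {
            id = substr($0, RSTART, RLENGTH)
            print $0, diag[id]
            next
        }
        # No ID found on this line: pass it through unchanged.
        { print }
    ' "$1" "$2"
}
```

Usage: `add_diagnosis file1 file2 > file3`. Each file is read only once, so the run time grows with the number of lines rather than with the product of the two file sizes. An ID that appears in file 2 but not in file 1 gets an empty diagnosis column.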