Making a composite file of transposed columns

Hello,

I have a directory with allot of tab delimited text files that have data that look like,

filePath	distance
(1,4-dioxan-2-ylmethyl)methylamine	0.0
4-methylmorpholine	0.0755473632594
1-propyl-4-piperidone	0.157792911954
heptaminol	0.158142893249
N-acetylputrescine	0.158689628956
spermidine	0.170417125303

for simplicity, I have include the first 7 rows of the file, but there are 25.

I need to sort through these and extract the string at column 1 row 2 and column 2 rows 3 through n. I then need to transpose this data so that there is one row for each input file.

For the data above, the row would look like,

(1,4-dioxan-2-ylmethyl)methylamine	0.075547363	0.157792912	0.158142893	0.158689629	0.170417125

There are many files, so if the second input file was,

filePath	distance
(1-methyl(4-piperidyl))(3-pyridylmethyl)amine	0.0
lidocaine	0.0971033747257
methoxyphenamine	0.106031307815
meperidine	0.107826404718
fenspiride	0.118603492524
tetracaine	0.122268535847

The output for both files would look like,

(1,4-dioxan-2-ylmethyl)methylamine	0.075547363	0.157792912	0.158142893	0.158689629	0.170417125
(1-methyl(4-piperidyl))(3-pyridylmethyl)amine	0.097103375	0.106031308	0.107826405	0.118603493	0.122268536

I'm guessing that it could be done with awk, but I have never transposed columns before. My alternative is excel, so suggestions would be appreciated. I have about 1500 files to sort through, so this will take a while if I can't get something automated.

thanks,

LMHmedchem

something along these lines as a starter:

awk 'FNR==1 && FNR!=NR{print x;next} FNR==2{printf $1;next} FNR>2{printf("%s%s", OFS, $2)}END {print}' file1 file2 fileN
1 Like

This seems to work, but I still need to do some checking,

#!/bin/bash

FILELIST=$(ls './'*'.out.txt')

awk 'FNR==1 && FNR!=NR{print x;next} FNR==2{printf $1;next} FNR>2{printf("%s%s", OFS, $2)}END {print}' $FILELIST > _out.txt

The last row seems to have a problem. This is the input file for that row,

filePath	distance
{[3-(dimethylamino)propyl]amino}-N-naphthylcarboxamide	0.0
cyheptamide	0.0653914864686
oxyphenbutazone	0.0757147160492
melatonin	0.0870885855315
doxylamine	0.0925023853048
pyrilamine	0.0959876565376
tripelennamine	0.0978033137399
fentanyl	0.102771505703
R-apocodeine	0.103639227882
deacetyldiltiazem	0.104552836744
acepromazine	0.107673347298
diphenhydramine	0.110669230413
nefopam	0.110982685756
propiomazine	0.120416954487
desipramine	0.122117385721
benzydamine	0.124971244941
isoxsuprine	0.127177834512
propranolol	0.131049138037
promazine	0.132307894169
cinchonine	0.133675134701
dimefline	0.134699085898
nortriptyline	0.136270819322
benzphetamine	0.137019529216
imipramine	0.140966900695
ractopamine	0.14360602345
apomorphine	0.146103096076

and this is the output row,
{[3-(dimethylamino)propyl]amino}-N-naphthylcarboxamide 0.065391486 0.075714716 0.087088586 0.092502385 0.095987657 0.097803314 0.102771506 0.103639228 0.104552837 0.107673347 0.11066923 0.110982686 0.120416954 0.122117386 0.124971245 0.127177835 0.131049138 0.132307894 0.133675135 0.134699086 0.136270819 0.137019529 0.140966901 0.143606023 0.146103096076apomorphine

You can see that the text from column 1 has gotten into the value somehow.

LMHmedchem

sorry about. Change END {print} to END {print x}