loop through lines and save into separate files

yifangt · November 1, 2011, 7:05pm

I have two files:
file-gene_families.txt that contains 30,000 rows of 30 columns. Column 1 is the ID column and contains the

Col1                  Col2  Col3 ...
One gene-encoded CBPs ABC  111 ...
One gene-encoded CBPs ABC  222 ...
One gene-encoded CBPs ABC  212 ...
Two gene encoded CBPs EFC   223 ...
Two gene encoded CBPs EFC   133 ...
Two gene encoded CBPs EFC   103 ...
Two gene encoded CBPs EFC   323 ...
Three gene(encoded) CBPs CGC  20 ...
Four gene/encoded (CBPs) GGH NULL ...
Four gene/encoded (CBPs) GGH 0 ...
Four gene/encoded (CBPs) GGH 1 ... 
Four gene/encoded (CBPs) GGH 2 ...
Four gene/encoded (CBPs) GGH 3 ...
Four gene/encoded (CBPs) GGH 56 ...

and
file-group.list.

One gene-encoded CBPs
Two gene encoded CBPs
Three gene(encoded) CBPs
Four gene/encoded (CBPs)

I want to split file-gene_families.txt into separate files based on file-group.list, using each line of file-group.list as the output file name, with the brackets, spaces, and slashes replaced by a hyphen "-":

One-gene-encoded-CBPs.tmp
Two-gene-encoded-CBPs.tmp
Three-gene-encoded-CBPs.tmp
Four-gene-encoded-CBPs.tmp

For example, One-gene-encoded-CBPs.tmp should contain:

One gene-encoded CBPs ABC  111 ...
One gene-encoded CBPs ABC  222 ...
One gene-encoded CBPs ABC  212 ...

Could not get my script working. Can someone help me out?

#!/usr/bin/bash
IFS=$'\n'
for line in $(cat At-GeneFamily-Unique-group.list)
do
name=$(sed 's/\ |\//-/g' $line)
grep $line gene_families.txt >> $name.tmp
done

Thanks a lot!
YF

jayan_jay · November 2, 2011, 3:05am

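Use ksh to run this script:

while read -r i
do
        # Turn each run of spaces, slashes, and brackets into one hyphen; drop a trailing hyphen
        name=$(printf '%s\n' "$i" | sed 's|[ /()][ /()]*|-|g;s|-$||')
        # -F treats the group name as a fixed string, not a regular expression
        grep -F "$i" file-gene_families.txt > "${name}.tmp"
done < file-group.list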

itkamaraj · November 2, 2011, 5:05am

nawk '{print $0 >> $1}' inputfile

yifangt · November 2, 2011, 9:34am

Thanks to you both! It works like a charm!
Raj, could you explain the script in a little more detail? It is so simple and it worked, except for some errors: rows whose first column contains "/" did not go through.

$0 for the whole row,
$1 for the first column

What's the trick with ">>"? I am trying to combine it with the "sed" part to get rid of those special characters in the output file name.
Thanks a lot again!
YF
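
On the ">>" question: in awk, `print >> "file"` sends the row to the named file instead of standard output, and the file name can be any string expression, which is why `$1` works as a target. A name containing "/" is treated as a path, so the print fails unless that directory already exists. Below is a minimal sketch of folding the sed step into awk with gsub(). It assumes the columns are TAB separated, uses plain `awk` rather than `nawk`, and runs on a made-up three-row sample in a scratch directory (the names `demo_dir` and `sample.txt` are illustrative only).

```shell
#!/bin/sh
# Work in a scratch directory so the demo does not touch real data.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Hypothetical TAB-separated sample modelled on the rows in the question.
printf 'One gene-encoded CBPs\tABC\t111\n'   >  sample.txt
printf 'One gene-encoded CBPs\tABC\t222\n'   >> sample.txt
printf 'Four gene/encoded (CBPs)\tGGH\t56\n' >> sample.txt

awk -F'\t' '{
    name = $1
    gsub("[ /()]+", "-", name)   # each run of space, slash, or bracket becomes one hyphen
    sub("-$", "", name)          # drop the trailing hyphen left by a closing bracket
    out = name ".tmp"
    print >> out                 # append the whole row ($0) to that file
    close(out)                   # avoid running out of open file handles
}' sample.txt

ls
```

On this sample, One-gene-encoded-CBPs.tmp receives two rows and Four-gene-encoded-CBPs.tmp receives one.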

vgersh99 · November 2, 2011, 9:40am

Are columns TAB separated?
Based on your sample file, you have embedded spaces in what you call the 'first column' of file-gene_families.txt.

yifangt · November 2, 2011, 9:50am

Yes, they are TAB separated.
That's why I want the embedded spaces replaced with hyphens in the output file names. I don't fully understand Raj's script; Jayan's works well, though.
Thanks. YF

vgersh99 · November 2, 2011, 10:02am

not tested...

nawk 'FNR==NR{f2[$0];next} $1 in f2 {a=$1;gsub("[ /()]+","-",a);sub("-$","",a);out=(a ".tmp");print >> out;close(out)}' FS='\t' file-group.list file-gene_families.txt

itkamaraj · November 2, 2011, 11:51am

@yifangt

Check the part on explicit file output in:

Awk - A Tutorial and Introduction - by Bruce Barnett
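
As a small illustration of the two output redirections in awk (a sketch only; the file names and the `out_dir` scratch directory are made up for the demo): within one awk run, `>` truncates the file when it is first opened and later prints to the same name keep adding lines, while `>>` never truncates, so rows written by an earlier run stay in the file.

```shell
#!/bin/sh
# Scratch directory for the demo files.
out_dir=$(mktemp -d)
cd "$out_dir" || exit 1

# Run the same two-line print twice with each operator.
for run in 1 2
do
    printf 'a\nb\n' | awk '{ print > "truncated.txt" }'
    printf 'a\nb\n' | awk '{ print >> "appended.txt" }'
done

wc -l truncated.txt appended.txt
```

truncated.txt ends up with 2 lines and appended.txt with 4, so a script that uses `>>` should remove old .tmp files before it is run again.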