Hi all,
I have a directory with many subdirectories, each named like so: KOG0001, KOG0002, ..., KOG9999.
Each of these subdirectories contains a variable number of two kinds of files (nuc and prot), named like so: Capitella_sp_nuc_hits.fasta (nuc) and Capitella_sp_prot_hits.fasta (prot). The Capitella_sp part is the species name and varies from file to file.
I'm trying to write a script that will go through each subdirectory and concatenate the contents of all the _prot_hits.fasta files into one file in the main directory named like KOG0001.fasta, KOG0002.fasta, and so on. I think I have it figured out except how to reference the source files that I want. Can anyone help me out?
Here is what I have so far (**original_source_files** marks the part I can't work out):
find . -maxdepth 2 -type f -name "KOG[0-9][0-9][0-9][0-9]*_prot_hits.fasta" -printf "%p\n" | awk -F"_" '{print $1}' | sed 's/$/&.fasta/g' | awk -F"/" 'BEGIN {filename=$3; while((getline line < **original_source_files** ) > 0) {print line >> filename} close(filename)}'
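For comparison, the same goal can probably be reached without find/awk at all by letting the shell glob do the matching. This is only a minimal sketch under a few assumptions: the function name concat_prot_hits is made up for illustration, the subdirectories are named exactly KOG plus four digits, and the output file KOGnnnn.fasta is overwritten (not appended to) on every run.

```shell
# Hypothetical helper: concatenate every *_prot_hits.fasta file in each
# KOGnnnn subdirectory into KOGnnnn.fasta in the main directory.
# Usage: concat_prot_hits [main_dir]   (defaults to the current directory)
concat_prot_hits() {
    main_dir=${1:-.}
    for dir in "$main_dir"/KOG[0-9][0-9][0-9][0-9]; do
        # Skip anything that is not a directory (also covers an unmatched glob).
        [ -d "$dir" ] || continue
        # Load the matching prot files into the positional parameters.
        set -- "$dir"/*_prot_hits.fasta
        # If the glob matched nothing, "$1" is the literal pattern; skip it.
        [ -e "$1" ] || continue
        # "$dir.fasta" is e.g. ./KOG0001.fasta, next to the KOG0001 directory.
        cat "$@" > "$dir.fasta"
    done
}
```

Running concat_prot_hits . from the main directory should then leave one KOGnnnn.fasta per subdirectory that has at least one prot file, with the files concatenated in the shell's sorted glob order.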
Thanks!
Kevin