How to loop the script for directories and sub-directories?

I need to replace each fasta header with the name of its fasta file, so I have used the following script:

awk '/^>/ {gsub(/.fa(sta)?$/,"",FILENAME);printf(">%s\n",FILENAME);next;} {print}' input.fasta

This script serves my purpose, but I need to run it across numerous sub-directories. For example, I have a directory named genome that has multiple sub-directories, and each of those contains multiple fasta files, as shown below:

./genome/enzyme/ocean.fasta, river.fasta
./genome/protein/1.fasta, 2.fasta  

I need to apply the above script to all the fasta files that lie in the genome sub-directories. I tried something like this but ended up with an error:

awk '/^>/ {gsub(/.fa(sta)?$/,"",FILENAME);printf(">%s\n",FILENAME);next;} {print}' */*

Please help me make the script suitable for this task.

Why not use find (searching for '*.fasta' files) in conjunction with xargs and your awk (which can be streamlined)?
You'll have to consider some changes to your awk code:

  1. Derive the leaf file name from the full pathname, i.e. strip the leading directories (a/b/c/foo.fasta -> foo.fasta). An example of this derivation was given in one of your previous threads.

  2. Decide how to write the changes back to each file rather than simply printing to stdout. If your version of gawk is 4.1.0 or later (I think), you can use the inplace extension described under "Enabling In-Place File Editing" in the gawk manual. Or you can come up with a different editing scheme, e.g. here's a sed example: { rm FILE; sed -e '...' > FILE; } < FILE. There may be other options as well.
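To make the two points above concrete, here is a rough sketch rather than a definitive solution. It swaps xargs for a plain while-read loop and uses a temporary file plus mv instead of gawk's in-place editing. The function name rename_fasta_headers and the temporary file suffix are made up for illustration, and it assumes that no file name contains a newline.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): for every *.fasta or *.fa file
# below the directory given as $1, replace each header line (starting with ">")
# with ">" followed by the file's leaf name without its extension.
# Assumption: file names contain no newline characters.
rename_fasta_headers() {
    find "$1" -type f \( -name '*.fasta' -o -name '*.fa' \) |
    while IFS= read -r f; do
        name=${f##*/}       # strip leading directories: a/b/c/foo.fasta -> foo.fasta
        name=${name%.*}     # strip the extension:       foo.fasta       -> foo
        tmp=$f.tmp.$$       # temporary file next to the original (assumed unused)
        # Rewrite header lines, pass sequence lines through unchanged,
        # and only replace the original if awk succeeded.
        awk -v name="$name" '/^>/ { print ">" name; next } { print }' "$f" > "$tmp" &&
            mv "$tmp" "$f"
    done
}
```

It would be called as rename_fasta_headers ./genome. Since it overwrites the files, try it on a copy of the directory first.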

Others might have different ideas.
Give it a whirl.

Thank you for your suggestion. My script above nearly works, but one problem is that it adds the folder name along with the file name.

As mentioned, we already dealt with path/filename derivation.
Look at your previous thread.
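
For anyone without that thread at hand, here is a minimal sketch of the derivation done inside awk (not necessarily the code from the earlier thread). The folder name shows up because FILENAME holds the path exactly as it was passed on the command line, so the directory part has to be stripped as well as the extension. The wrapper name fasta_leaf_headers is made up for illustration. It works on a copy of FILENAME rather than modifying FILENAME itself, and it only prints to stdout.

```shell
#!/bin/sh
# Hypothetical wrapper (name is illustrative): print the given fasta files to
# stdout with every header replaced by the leaf file name, minus .fa/.fasta.
fasta_leaf_headers() {
    awk '
        FNR == 1 {                          # first line of each input file
            name = FILENAME                 # work on a copy, leave FILENAME alone
            sub(/.*\//, "", name)           # drop everything up to the last slash
            sub(/\.fa(sta)?$/, "", name)    # drop a trailing .fa or .fasta
        }
        /^>/ { print ">" name; next }       # header line: print the derived name
        { print }                           # sequence line: print unchanged
    ' "$@"
}
```

Run from inside genome, fasta_leaf_headers */*.fasta would then print headers such as >ocean rather than >enzyme/ocean.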

@dineshkumarsrk, just curious - any progress on this post?