How to loop a script over directories and sub-directories?

dineshkumarsrk · October 14, 2020, 10:57am

I need to replace each fasta header with the fasta file name, so I have used the following script:

awk '/^>/ {gsub(/\.fa(sta)?$/,"",FILENAME);printf(">%s\n",FILENAME);next;} {print}' input.fasta
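Running the one-liner on a small, made-up file shows one behaviour worth checking before looping it over many files: every header in a multi-record file becomes the same name, and the original header text is discarded. This sketch copies FILENAME into a variable (name) instead of editing the built-in directly; the file name and sequences are invented for illustration.

```shell
#!/bin/sh
# Demo on a throwaway file; ocean.fasta and its sequences are made up for illustration.
tmp=$(mktemp -d)
printf '>seq1 some description\nACGT\n>seq2\nTTGA\n' > "$tmp/ocean.fasta"

# Copy FILENAME into "name" so the built-in is left untouched,
# then strip a literal ".fa" or ".fasta" suffix (the dot is escaped).
result=$(cd "$tmp" && awk '/^>/ { name = FILENAME; sub(/\.fa(sta)?$/, "", name); printf(">%s\n", name); next } { print }' ocean.fasta)
printf '%s\n' "$result"
rm -rf "$tmp"
```

Both headers come out as ">ocean", while the sequence lines pass through unchanged.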

This script serves my purpose, but I need to run it across numerous sub-directories. For example, I have a directory named genome which has multiple sub-directories, each containing multiple fasta files, as shown below:

./genome/enzyme/ocean.fasta, river.fasta
./genome/protein/1.fasta, 2.fasta

I need to apply the above script to all the fasta files in the genome sub-directories. I tried something like this but ended up with an error:

awk '/^>/ {gsub(/\.fa(sta)?$/,"",FILENAME);printf(">%s\n",FILENAME);next;} {print}' */*

Please help me adapt the script for this task.

vgersh99 · October 14, 2020, 12:29pm

Why not use find (searching for '*.fasta' files) in conjunction with xargs and your awk (which can be streamlined)?
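A minimal sketch of the find + xargs idea, assuming the directory layout from the question (the file contents are invented). It only prints to stdout and does not modify the files. Note that FILENAME here holds the relative path, so the headers keep the directory part (e.g. ">genome/enzyme/ocean"); that is the folder-name problem, and stripping it needs the leaf-name derivation.

```shell
#!/bin/sh
# Sketch: feed every *.fasta file under genome/ to awk via find | xargs.
# The tree mirrors the question's example; the file contents are made up.
tmp=$(mktemp -d)
mkdir -p "$tmp/genome/enzyme" "$tmp/genome/protein"
for f in enzyme/ocean enzyme/river protein/1 protein/2; do
    printf '>old_header\nACGT\n' > "$tmp/genome/$f.fasta"
done

# -print0 / xargs -0 keep file names containing spaces intact (GNU and BSD find/xargs support them).
# Output goes to stdout only; the order of files depends on find.
result=$(cd "$tmp" && find genome -type f -name '*.fasta' -print0 |
    xargs -0 awk '/^>/ { name = FILENAME; sub(/\.fa(sta)?$/, "", name); print ">" name; next } { print }')
printf '%s\n' "$result"
rm -rf "$tmp"
```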
You'll have to consider some changes to your awk code:

Derive the leaf file name from the full pathname (e.g. a/b/c/foo.fasta -> foo.fasta). An example of this derivation was given in one of your previous threads.
How do you make changes to a file, rather than simply printing to stdout? If your gawk version is 4.1 or later (I think), you can use its in-place editing (see "Enabling In-Place File Editing" in the gawk manual). Or you can come up with a different editing scheme, e.g. here's a sed example: { rm FILE; sed -e '...' > FILE; } < FILE. There might be other alternatives.
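One way to apply the "{ rm FILE; ... > FILE; } < FILE" idea with awk instead of sed, sketched under a few assumptions: awk reads the old contents from stdin here, so FILENAME would not hold the real name and the header is passed in with -v instead; the directory part and suffix are stripped with shell parameter expansion; rewrite_headers is a hypothetical helper name. If awk fails partway, the original file is already removed, so try it on a copy of the data first.

```shell
#!/bin/sh
# Hypothetical helper: rewrite one fasta file in place, replacing every header with the leaf name.
rewrite_headers() {
    file=$1
    name=${file##*/}         # drop directories: genome/enzyme/ocean.fasta -> ocean.fasta
    name=${name%.fasta}      # drop a trailing ".fasta"
    name=${name%.fa}         # or a trailing ".fa"
    # stdin is opened on the old file before rm, so awk still reads the old contents
    # while its output goes to a freshly created file with the same name.
    { rm -- "$file" && awk -v hdr="$name" '/^>/ { print ">" hdr; next } { print }' > "$file"; } < "$file"
}

# Demo tree mirroring the question; contents are made up.
tmp=$(mktemp -d)
mkdir -p "$tmp/genome/enzyme" "$tmp/genome/protein"
printf '>old1\nACGT\n>old2\nTTGA\n' > "$tmp/genome/enzyme/ocean.fasta"
printf '>old\nGGCC\n' > "$tmp/genome/protein/1.fasta"

# One file name per line; names containing newlines are not supported by this loop.
(cd "$tmp" && find genome -type f -name '*.fasta' | while IFS= read -r f; do
    rewrite_headers "$f"
done)

ocean=$(cat "$tmp/genome/enzyme/ocean.fasta")
one=$(cat "$tmp/genome/protein/1.fasta")
printf '%s\n%s\n' "$ocean" "$one"
rm -rf "$tmp"
```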

Others might have different ideas.
Give it a whirl.

dineshkumarsrk · October 14, 2020, 1:32pm

Thank you for your suggestion. My script above also works almost correctly, but the problem is that it adds the folder name along with the file name.

vgersh99 · October 14, 2020, 1:40pm

As mentioned, we already dealt with path/filename derivation.
Look at your previous thread.
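For reference, the leaf name can also be derived inside awk itself. This is a sketch using split() on "/" (not necessarily the approach from the earlier thread); leaf_name is a hypothetical function name, and the a/b/c/foo.fasta path is the illustrative one used above.

```shell
#!/bin/sh
# awk program with a hypothetical leaf_name() helper that strips directories and the suffix.
leaf_awk='
function leaf_name(path,    parts, n, name) {
    n = split(path, parts, "/")      # a/b/c/foo.fasta -> parts[n] = "foo.fasta"
    name = parts[n]
    sub(/\.fa(sta)?$/, "", name)     # foo.fasta -> foo
    return name
}
/^>/ { print ">" leaf_name(FILENAME); next }
{ print }
'
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b/c"
printf '>old\nACGT\n' > "$tmp/a/b/c/foo.fasta"
result=$(cd "$tmp" && awk "$leaf_awk" a/b/c/foo.fasta)
printf '%s\n' "$result"
rm -rf "$tmp"
```

The header becomes ">foo" even though awk was given the path a/b/c/foo.fasta; the same program can be dropped into the find | xargs pipeline.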

vgersh99 · October 15, 2020, 2:38pm

@dineshkumarsrk, just curious - any progress on this post?