Renaming a file according to the name of another for multiple instances

Hi there,

I recently completed an analysis using an array job in SLURM that left me with 279 files that bear the following name: sample_sorted.bam.

These files are all in separate folders, each of which also contains the original BAM file aligned to the reference, which bears the actual sample name. What I would like to do, if possible, is to take the name of that original BAM file and put it in front of _sorted.bam in place of the string "sample".

See the example below; this is what the tree -h output looks like for one of the folders:

sgdp_001
├── [  89G]  abh100.bam
├── [  69G]  sample_sorted.bam
├── [  36G]  R1.fastq.gz
└── [  36G]  R2.fastq.gz

and what I'm trying to do is to have the file sample_sorted.bam renamed as abh100_sorted.bam. Is there a script that can do that iteratively over the 279 folders?

I'm quite new to working with such large amounts of data and am still experimenting with different options in other contexts, but this one is not trivial for me. Let me know, thanks in advance!

P. S. the original BAMs in each folder have different names, although I'm not sure whether it helps

@overcraft , having read this a number of times .... :confused:

I presume you want this:
in each of the 279 directories there are two .bam files,
xxxxx.bam and sample_sorted.bam,
and you want to rename sample_sorted.bam to xxxxx_sorted.bam, where xxxxx is the varying component of the filename.

Does the xxxxx have a structure, like AAA999.bam,
where AAA is any 3 letters and 999 is any 3 digits?
btw, is this related to your previous post -
Moving files from folder to folders efficiently
?

working on the basis of your last post regarding moving files effectively

something along the following ....

NB: remove the echo in front of the mv ... if/when satisfied this fits your needs

#!/bin/bash

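for dir in sgdp_???;   # folders are named sgdp_001, sgdp_002, ... as in the tree above
do
        cd "$dir" || continue   # skip the entry if we cannot enter it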
        bambam=$(echo ??????.bam) #assumes filename in  AAA999.bam format
        echo mv sample_sorted.bam "${bambam%\.bam}_sorted.bam";
        cd ..
done

Another one, with some checks:

#!/bin/bash
samplefile=sample_sorted.bam
for bamfile in ??*/?????*.bam
do
    dir=${bamfile%/*}
    if [ "$bamfile" != "$dir/$samplefile" ] && [ -f "$bamfile" ] && [ -f "$dir/$samplefile" ]
    then
        echo mv "$dir/$samplefile" "${bamfile%.bam}_sorted.bam"
    fi
done

@munkeHoller yes indeed, it is related to my previous post. However, I already generated the output I needed but now I'm facing the issue of having to rename them.

This is unfortunate and only a problem on the cluster where I work: they have a strict policy of deleting files from /scratch after 40 days, which might not leave me enough time to finish the analyses, and I cannot move 279 files with the same name to the work_area.

I will try the script out and let you know. There is no specific format for the filenames: some are longer, some are shorter, and there is no apparent pattern... I understand this makes things more complicated :sweat_smile:

@MadeInGermany ,

Does the script you shared account for different lengths of the original BAM filename? For instance, some might have 6 characters before the .bam, others 2, and some others 10.

Let me know, thanks in advance!

if there are only 2 files in the directory ending in .bam - one being sample_sorted.bam - then it's relatively straightforward ...

change the line
bambam=$(echo ??????.bam)
to
bambam=$(ls *.bam|grep -v sample_sorted.bam)

this will work for variable length filenames
as usual - you need to test and be happy the code works for you.
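
For reference, the same idea can be sketched without parsing the output of ls. This is only a dry-run sketch under the assumption stated above (each folder holds sample_sorted.bam plus exactly one other .bam); the function name plan_renames is made up for this example. It prints the mv commands rather than running them, and skips any folder that does not fit the assumption.

```bash
#!/bin/bash
# Dry run: prints the mv commands instead of executing them.
# Assumption: each folder holds sample_sorted.bam plus exactly one other .bam.
# plan_renames is a hypothetical helper name, not part of the scripts above.
plan_renames() {
    local root=${1:-.} dir f others
    for dir in "$root"/*/; do
        dir=${dir%/}                                   # drop the trailing slash
        [ -f "$dir/sample_sorted.bam" ] || continue    # nothing to rename here
        others=()
        for f in "$dir"/*.bam; do
            [ -f "$f" ] || continue
            [ "${f##*/}" = "sample_sorted.bam" ] && continue
            others+=("$f")                             # candidate original BAM
        done
        if [ "${#others[@]}" -eq 1 ]; then
            echo mv "$dir/sample_sorted.bam" "${others[0]%.bam}_sorted.bam"
        else
            echo "skipping $dir: found ${#others[@]} candidate BAM files" >&2
        fi
    done
}

plan_renames .
```

As with the scripts above, remove the echo in front of mv only once the printed commands look right.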


Each ? stands for one character.
To allow two or more characters for the directories and the bam files:

for bamfile in ??*/??*.bam
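
A quick way to see what ??*.bam accepts is to try the pattern in a case statement, which uses the same ? and * wildcards. This is just an illustration; matches is a throwaway helper name and the filenames are made up.

```bash
#!/bin/bash
# matches is a hypothetical helper: prints yes if the name fits ??*.bam
matches() {
    case $1 in
        ??*.bam) echo yes ;;
        *)       echo no ;;
    esac
}

matches ab.bam              # two characters before .bam
matches a.bam               # only one character before .bam
matches abh100.bam          # six characters before .bam
matches sample_sorted.bam   # also fits, hence the != check in the script
```

Because sample_sorted.bam fits the pattern too, the script has to exclude it explicitly, which is what the first test in the if statement does.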

Both @munkeHoller and @MadeInGermany you have been extremely helpful with your solutions to the past issues I had.

I got all your points and noted the ideas behind them. I was aware of the meaning of the "?"; however, something you both used in your answers is %\ or %/. What do they do/stand for?

I hope some other ppl working with big data might find this useful.

Cheers!

check out this section of the bash manual for details of this and other bash goodies !

recommend you experiment using these to see their behaviours


${var%pattern} gives the value of $var with the shortest match of the pattern stripped from the end.
The pattern can be .bam, or /* where the latter leaves the directory name.
\.bam is the same as .bam; the \ quotes the . which is not needed. (But it would be needed for \%bam because %% would be another operator, or \*bam if a literal * is meant.)
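
A small sketch to experiment with, using a made-up path modelled on the example tree in the question. The ## form (strip the longest match from the start) is not used in the scripts above and is shown only for comparison.

```bash
#!/bin/bash
# Hypothetical path, modelled on the example tree in the question.
bamfile="sgdp_001/abh100.bam"

dir=${bamfile%/*}        # strip shortest match of /* from the end
stem=${bamfile%.bam}     # strip .bam from the end
name=${bamfile##*/}      # strip longest match of */ from the start
target="${stem}_sorted.bam"

echo "$dir"      # sgdp_001
echo "$stem"     # sgdp_001/abh100
echo "$name"     # abh100.bam
echo "$target"   # sgdp_001/abh100_sorted.bam
```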

Thanks again. I will need to go through these concepts calmly to understand them better, possibly together with the resource @munkeHoller shared, so that I can do some experiments.