Bash to trim folder and files within a path that share a common file extension

The first script below trims the folder name. Each folder (there may be more than one, and the format is always the same) contains several .bam files with matching .bam.bai files (see the file structure below). The second script trims the .bam files as expected, but it repeats the .bam.bai extension after trimming those files, producing names like xxxx_0113_xxx_xxx.bam.bai.bam.bai (also visible in the set -xv output). I think the .bam extension common to both may be causing the repeat, but I am not sure. Removing the .bam.bai from the mv did not fix the repeats. The end goal is to trim the folders and the files within each folder, and I am not sure that nested loops are the best way (probably not). Thank you :).

bash to trim folder

for folder in /home/cmccabe/rename/*/ ; do  ## start loop in subdirectory
     mv  "$folder"  "${folder%%-v5.6*}"  ## trim folder name
done  ## close loop

folder in /home/cmccabe/rename

R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions

trim folder

R_2019_01_30_14_24_53_user_S5-0271-95

file structure in each /home/cmccabe/rename/<folder>

xxxx_0111_xxx_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam
xxxx_0111_xxx_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam.bai
xxxx_0113_xxx_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam
xxxx_0113_xxx_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam.bai

bash to trim files

for d in /home/cmccabe/rename/* ; do  ## start loop in parentdir
  if [ -d "$d" ]; then  ##  grab subdir and store in parentdir/subdir in $d
     subdir="$(basename $d)"  ## define sub-directory
  fi  ## end if
 for bam in "${d}"/*.bam ; do ## iterate through each file in parentdir and read into bam
   for bai in "${d}"/*.bam.bai ; do ## iterate through each file in parentdir/subdir and read into bai
     bam_path_removed=$(echo $bam| awk -F/ '{print $NF}') ## cut text before last /
     bai_path_removed=$(echo $bai| awk -F/ '{print $NF}') ## cut text before last /
     bam_trim=$(echo "$bam_path_removed"|cut -f1,2,3,4 -d'_')
     bai_trim=$(echo "$bai_path_removed"|cut -f1,2,3,4 -d'_')
       mv "${bam}" "${d}/${bam_trim}".bam ## rename all bam
       mv "${bai}" "${d}/${bai_trim}".bam.bai ## rename all bai
  done ## close loop
 done ## close loop
done ## close loop

desired trim files

xxxx_0111_xxx_xxx.bam
xxxx_0111_xxx_xxx.bam.bai
xxxx_0113_xxx_xxx.bam
xxxx_0113_xxx_xxx.bam.bai

set -xv w/ echo mv

+ echo mv /home/cmccabe/rename/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxxx_0111_xxx_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam.bai /home/cmccabe/rename/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxxx_0111_xxx_xxx.bam.bai

set -xv w/o echo mv

mv /home/cmccabe/rename/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxxx_0111_xxx_xxx.bam.bai /home/cmccabe/rename/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxxx_0111_xxx_xxx.bam.bai.bam.bai
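
A possible explanation for the doubled extension, offered as a sketch and not a confirmed diagnosis: the inner *.bam.bai glob is expanded again on every pass of the outer loop, so on the second pass it picks up .bam.bai files that were already renamed. Such a name has only four underscore-separated fields, so cut returns it unchanged, extension included, and the script then appends .bam.bai a second time. With echo in front of mv nothing is renamed, which would explain why the echo trace looks correct. The file names and the trim helper below are made up for illustration.

```shell
#!/bin/bash
# trim is a hypothetical helper that mirrors the cut call in the script above.
trim() {
    printf '%s\n' "$1" | cut -f1,2,3,4 -d'_'
}

# Illustrative names only: one not yet renamed, one already renamed.
long="xxxx_0111_xxx_xxx_R_2019_01_30_user.bam.bai"
short="xxxx_0111_xxx_xxx.bam.bai"

echo "$(trim "$long").bam.bai"    # first pass: xxxx_0111_xxx_xxx.bam.bai
echo "$(trim "$short").bam.bai"   # second pass: xxxx_0111_xxx_xxx.bam.bai.bam.bai
```

Looping once over the .bam files and deriving the matching .bam.bai name from each one would avoid the second pass.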

See if this helps:

#!/usr/bin/bash
# we are running this from ROOTDIR, or abort.
ROOTDIR=/home/cmccabe/rename
cd $ROOTDIR || exit 1
# matching name "R_2019*" to operate on desired directory names, expand this to be precise.
MDIR="R_2019*"

# GNU find feature here, since we do not have information about subdirectories under 'rename'
# If you have subdirectories and want posix more effort will be required, you did not specify operating system.

find . -mindepth 1 -maxdepth 1 -type d -name "${MDIR}" | while read RDIR
do
	TRIMSTR="${RDIR%%-v5.6*}"
	for FLN in $RDIR/*.bam $RDIR/*.bam.bai
		do
		FLNSUB="_${RDIR/\.\//}"
		echo "mv ${FLN} ${FLN/$FLNSUB}"
	done
	# Now we shall rename the folder, after files inside have been renamed.
echo "mv $RDIR ${TRIMSTR}"      
done

Remove the echo in front of the mv commands to execute them against the files; otherwise the script will just print the commands on the terminal.
Probably could use some more error handling and stuff.
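
One way to avoid editing the script between the trial run and the real run is a small wrapper that either prints or executes its arguments. This is only a sketch: run and DRY_RUN are hypothetical names, not part of the script above, and the demo works on a scratch directory created with mktemp.

```shell
#!/bin/bash
# run: print the command when DRY_RUN is 1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf 'would run:'
        printf ' %q' "$@"   # %q shows each word quoted as the shell would need it
        printf '\n'
    else
        "$@"
    fi
}

# Scratch directory with one file whose name contains a space.
demo=$(mktemp -d)
touch "$demo/old name.bam"

DRY_RUN=1 run mv "$demo/old name.bam" "$demo/new.bam"   # prints the command only
DRY_RUN=0 run mv "$demo/old name.bam" "$demo/new.bam"   # actually renames
ls "$demo"
```

Because the mv line is the same in both modes, the quoting that was checked in the trial run is the quoting that gets executed.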

Hope that helps
Regards
Peasant.

What you have shown us above makes it look like you may have moved a bunch of your *.bam.bai files to *.bam.bai.bam.bai files and possibly moved *.bam files to *.bam.bam files. The purpose of the echo commands was to make sure that the mv commands that would be executed looked good before actually moving files. The fact that the set -xv output with echo did not show the same filenames as the set -xv output without echo seems to imply that when the echo was removed from the echo mv lines in your script, something more was changed than just removing the echo in front of the two mv commands.

Since the output from the set -xv trace showed that it was going to rename files in the wrong directory, why did you remove the echo and run it again? The purpose of having the echo in there is so that you can verify that the command being echoed is the command that you want the script to actually perform when run a second time with the echo commands removed.

Please show us the output from the command:

find /home/cmccabe/rename/ \( -type d -o -name '*.bam*' \) -exec ls -ld {} +

so we can see how things stand now. Please also tell us what operating system you're using. (PLEASE always tell us what operating system and shell you're using when you start a new thread.)

I don't know if you have tried running the code Peasant suggested in post #2 in this thread. I'm afraid the code Peasant suggested might only work with the file hierarchy you described before any files were moved. (Note that I haven't tried to figure out what his code will do if it starts with your modified file hierarchy instead of what may be the current file hierarchy.) Do you have backups so you can restore that original state? If not, I'm hoping that with the output from the find command above we'll be able to find a way to get to where you want to be without losing any data.

I am using Ubuntu 14.04 as my OS.
Each .bam and .bam.bai file is inside an R_2019 directory, but it looks like the files cannot be found. Thank you :).

set -xv

ROOTDIR=/home/cmccabe/rename
ROOTDIR=/home/cmccabe/rename
+ ROOTDIR=/home/cmccabe/rename
cmccabe@Satellite-M645:~$ cd $ROOTDIR || exit 1
cd $ROOTDIR || exit 1
+ cd /home/cmccabe/rename
cmccabe@Satellite-M645:~/rename$ # matching name "R_2019*" to operate on desired directory names, expand this to be precise.
# matching name "R_2019*" to operate on desired directory names, expand this to be precise.
cmccabe@Satellite-M645:~/rename$ MDIR="R_2019*"
MDIR="R_2019*"
+ MDIR='R_2019*'
cmccabe@Satellite-M645:~/rename$ find . -mindepth 1 -maxdepth 1 -type d -name "${MDIR}" | while read RDIR
find . -mindepth 1 -maxdepth 1 -type d -name "${MDIR}" | while read RDIR
> do
do
>   TRIMSTR="${RDIR%%-v5.6*}"
  TRIMSTR="${RDIR%%-v5.6*}"
>      for FLN in $RDIR/*.bam $RDIR/*.bam.bai
     for FLN in $RDIR/*.bam $RDIR/*.bam.bai
>      do
     do
> FLNSUB="_${RDIR/\.\//}"
FLNSUB="_${RDIR/\.\//}"
>     "mv ${FLN} ${FLN/$FLNSUB}"
    "mv ${FLN} ${FLN/$FLNSUB}"
>      done
     done
> # Now we shall rename the folder, after files inside have been renamed.
# Now we shall rename the folder, after files inside have been renamed.
>     "mv $RDIR ${TRIMSTR}"      
    "mv $RDIR ${TRIMSTR}"      
> done
done
+ read RDIR
+ find . -mindepth 1 -maxdepth 1 -type d -name 'R_2019*'
+ TRIMSTR=./R_2019_01_30_14_24_53_user_S5-0271-95
+ for FLN in '$RDIR/*.bam' '$RDIR/*.bam.bai'
+ FLNSUB=_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
+ 'mv ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_011_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_011_xx00_xxx.bam'
bash: mv ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_011_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_011_xx00_xxx.bam: No such file or directory
+ for FLN in '$RDIR/*.bam' '$RDIR/*.bam.bai'
+ FLNSUB=_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
+ 'mv ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_013_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_013_xx00_xxx.bam'
bash: mv ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_013_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_013_xx00_xxx.bam: No such file or directory
+ for FLN in '$RDIR/*.bam' '$RDIR/*.bam.bai'
+ FLNSUB=_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
+ 'mv ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_011_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam.bai ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_011_xx00_xxx.bam.bai'
bash: mv ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_011_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam.bai ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_011_xx00_xxx.bam.bai: No such file or directory
+ for FLN in '$RDIR/*.bam' '$RDIR/*.bam.bai'
+ FLNSUB=_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
+ 'mv ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_013_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam.bai ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_013_xx00_xxx.bam.bai'
bash: mv ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_013_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam.bai ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_013_xx00_xxx.bam.bai: No such file or directory
+ 'mv ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions ./R_2019_01_30_14_24_53_user_S5-0271-95'
bash: mv ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions ./R_2019_01_30_14_24_53_user_S5-0271-95: No such file or directory
+ read RDIR

--- Post updated at 08:01 AM ---

@Don Cragun I just tried the script by @Peasant and posted the results. The files to trim could not be found. I am currently using Ubuntu 14.04 and may be migrating to CentOS 7 in the near future.

I do have backups of the data. I removed the echo because the output looked correct, and since I have backups I performed the mv. Looking back, it was not correct: the .bam files were trimmed as expected, but the .bam.bai files were not. Thank you :).

find /home/cmccabe/rename/ \( -type d -o -name '*.bam*' \) -exec ls -ld {} +

drwxrwxr-x 3 cmccabe cmccabe 4096 Mar 17 07:28 /home/cmccabe/rename/
drwxrwxr-x 2 cmccabe cmccabe 4096 Mar 17 07:34 /home/cmccabe/rename/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
-rw-rw-r-- 1 cmccabe cmccabe    0 Feb 28 14:59 /home/cmccabe/rename/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_011_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam
-rw-rw-r-- 1 cmccabe cmccabe    0 Feb 28 14:59 /home/cmccabe/rename/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_011_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam.bai
-rw-rw-r-- 1 cmccabe cmccabe    0 Feb 28 14:59 /home/cmccabe/rename/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_013_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam
-rw-rw-r-- 1 cmccabe cmccabe    0 Feb 28 14:59 /home/cmccabe/rename/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/xxx_013_xx00_xxx_R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.bam.bai

OK. We're dealing with empty .bam and .bam.bai files, so we won't be destroying your data no matter what we do here.

When I recreated your file hierarchy on my system, changed the first line of Peasant's code from:

#!/usr/bin/bash

to:

#!/bin/bash

and ran his code, the results looked promising. So, I changed the lines:

		echo "mv ${FLN} ${FLN/$FLNSUB}"
 ... ... ...
echo "mv $RDIR ${TRIMSTR}"

in his code to:

		mv "${FLN}" "${FLN/$FLNSUB}"
	done
	# Now we shall rename the folder, after files inside have been renamed.
	mv "$RDIR" "${TRIMSTR}"

and reran the script. Afterwards, I ran an ls on the resulting file hierarchy and got:

$ ls -lR home
total 0
drwxr-xr-x  3 dwc  staff  96 Mar 17 08:51 cmccabe

home/cmccabe:
total 0
drwxr-xr-x  3 dwc  staff  96 Mar 17 08:53 rename

home/cmccabe/rename:
total 0
drwxr-xr-x  6 dwc  staff  192 Mar 17 08:53 R_2019_01_30_14_24_53_user_S5-0271-95

home/cmccabe/rename/R_2019_01_30_14_24_53_user_S5-0271-95:
total 0
-rw-r--r--  1 dwc  staff  0 Mar 17 08:51 xxx_011_xx00_xxx.bam
-rw-r--r--  1 dwc  staff  0 Mar 17 08:51 xxx_011_xx00_xxx.bam.bai
-rw-r--r--  1 dwc  staff  0 Mar 17 08:51 xxx_013_xx00_xxx.bam
-rw-r--r--  1 dwc  staff  0 Mar 17 08:51 xxx_013_xx00_xxx.bam.bai
$ 

which looks exactly like what I thought you were trying to produce.

I guess Peasant and I both thought that you knew how to rearrange the quotes on his echo commands to get working mv commands.

If you try his code again with the mv commands changed to match what I have suggested above, do you get what you want?

Hi Peasant,
Here is a slightly modified version of your script that just uses options and variable expansions defined by the POSIX standards. But, of course, it still depends on us knowing the pathname of a shell that provides those standard variable expansions. (Note that find isn't needed for this; we can get what we need just using shell pathname expansions.)

#!/bin/bash
set -xv
# we are running this from ROOTDIR, or abort.
ROOTDIR=/home/cmccabe/rename
cd $ROOTDIR || exit 1
# matching name "R_2019*" to operate on desired directory names, expand this to be precise.

for RDIR in R_2019*/
do
	TRIMSTR=${RDIR%%-v5.6*}
	for FLN in $RDIR*.bam	# Note that RDIR contains a trailing /.
	do
		FLNSUB=${FLN%_R_2019_*}
		mv "${FLN}" "$FLNSUB.bam"
		# Use the fact that .bam and .bam.bai files are paired.
		mv "${FLN}.bai" "$FLNSUB.bam.bai"
	done
	# Now we shall rename the folder, after files inside have been renamed.
	mv "$RDIR" "${TRIMSTR}"
done

Hi cmccabe,
If Peasant's script (modified as suggested in post #5) worked for you, the script above should also work and should even run a little bit faster since it doesn't need to invoke find to get the job done.
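
The two suffix-removal expansions the script relies on can be tried on their own. In this sketch the directory and file names are copied from the listing earlier in the thread; the variable names dir, file and stem are only for illustration.

```shell
#!/bin/bash
dir='R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/'
file="${dir}xxx_011_xx00_xxx_${dir%/}.bam"

# %% removes the longest suffix matching the pattern -v5.6*
echo "${dir%%-v5.6*}"

# % removes the shortest suffix matching _R_2019_*; the leading underscore
# keeps the pattern from matching the directory part, which starts with R_2019.
stem=${file%_R_2019_*}
echo "$stem.bam"
echo "$stem.bam.bai"
```

Since the .bam.bai name is built from the same stem as the .bam name, one loop over the .bam files is enough to cover both.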

I hope this helps,
Don

Thank you both :).... both scripts work great. I guess I don't understand double quotes (""). I have noticed they make a difference in the output, but I have to read more about them. I always thought they were for escaping whitespace in a filename or variable. Is that not true? Thanks again :).

Yes, double-quotes are used to keep field separators in pathnames from being recognized as field separators. But, in an mv command you still need the shell to separate the mv command name from the two pathname operands that you want to pass to it. When you issue the command:

"mv $RDIR ${TRIMSTR}"

you're telling the shell that you want to execute a command named something like:

mv ./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions ./R_2019_01_30_14_24_53_user_S5-0271-95

with no operands instead of executing the mv command with the two operands

./R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions

and

./R_2019_01_30_14_24_53_user_S5-0271-95

that you get with the command:

mv "$RDIR" "${TRIMSTR}"
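
The difference can be made visible by counting the words the shell passes along. This is a small sketch; count_args and the two directory names (chosen to contain spaces) are hypothetical.

```shell
#!/bin/bash
# count_args prints how many separate words it received.
count_args() { echo "$#"; }

src="old dir"
dst="new dir"

count_args "mv $src $dst"     # 1: the whole string is a single word
count_args mv "$src" "$dst"   # 3: command name plus two operands
count_args mv $src $dst       # 5: unquoted expansions split at the spaces
```

Only the second form gives mv exactly the two pathnames it needs, whatever characters they contain.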

Thank you very much for the great explanation, very helpful :).

One should protect $RDIR from further expansion here:

    for FLN in "$RDIR"*.bam    # Note that RDIR contains a trailing /.
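
A short sketch of what goes wrong without the quotes, assuming a directory name that contains a space (the names here are invented, and the demo runs in a scratch directory from mktemp):

```shell
#!/bin/bash
# Scratch directory holding one subdirectory whose name contains a space.
demo=$(mktemp -d)
mkdir "$demo/R 2019"
touch "$demo/R 2019/a.bam"
cd "$demo" || exit 1

RDIR="R 2019/"

list_unquoted() { for f in $RDIR*.bam; do printf '<%s>\n' "$f"; done; }
list_quoted()   { for f in "$RDIR"*.bam; do printf '<%s>\n' "$f"; done; }

list_unquoted   # <R> and <2019/*.bam>: split into two words, neither a real file
list_quoted     # <R 2019/a.bam>
```

The quotes cover only the variable; the *.bam part stays outside them so the shell still expands it as a pattern.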