Below, I am trying to rename the contents of each data subfolder in a specific run, based on a partial match of the IonCode_0000_ prefix of each file in the data subdirectory to $1 in f1. There will be multiple runs in f1, but each run in $uniq is unique and will be found in f1, with the rename values stored in $string. The code below is commented with what I think is going on. Thank you :).
/path/to/run/R_2019_00_00_00_00_00_xxxx_xx1-127
data --- sub-folder ---
IonCode_0402_xxx.xxx_xxx.bam
IonCode_0402_xxx.xxx_xxx.bam.bai
IonCode_0404_xxx.xxx_xxx.bam
IonCode_0404_xxx.xxx_xxx.bam.bai
dir=/path/to/run/
for run in "$dir"/R_2019* ; do ## # matching "R_2019*" to operate on desired directory and expand
  uniq=${run##*/} ## store run with no path as s5
  cd "$dir"/"$uniq"/data ## change directory to subfolder
  string=$(awk -F '\n' -v RS="" -v ref="$uniq" '$0 ~ ref {d=split($0, val, " "); for(i=2;i<d;i+=2) printf "%s ",val[i]; printf "\n"}' "$dir"/f1) ## loop through f1 for unique run and store $2 in string
  for $f in "$dir"/"$s5"/data/*.bam* ; do sample_basename=$(basename "${f}") ;
    rename_file_path="$string" ## define rename string
    cmd=$(sed -n "/$f/,/IonCode_[0-9][0-9][0-9][0-9]_*/{s/\(.*\.bam\) \(.*\)/mv \1 \2/g}" $rename_file_path) ## rename file in data subfolder matching IonCode_ to f1 and replacing with $2 of f1
  done
done
You were pretty close on this. I used a here-string ( <<< ) to feed a while read block. Because we want to change directories inside the loop, running the cd in a subshell ensures the working directory is reset after each rename loop.
Change echo mv below to mv once you are happy with what it is doing.
dir=/path/to/run/
for run in "$dir"/R_2019* ; do ## match "R_2019*" to operate on each desired run directory
  uniq=${run##*/} ## store run name with no path as uniq
  while read from to
  do
    (
      cd "$dir"/"$uniq"/data
      for file in *.bam*
      do
        newname=${file/$from/$to}
        [ -f "$file" ] && [ "$newname" != "$file" ] && echo mv "$file" "$newname"
      done
    )
  done <<<$(
    awk -F '\n' -v RS="" -v ref="$uniq" '
      $0 ~ ref {
        d=split($0, val);
        for(i=1;i<d;i++) print val[i];
      }' "$dir"/f1
  ) ## loop through f1 for unique run and populate from and to
done
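The subshell point can be checked in isolation. This is a minimal sketch, assuming a throwaway directory created with mktemp (the run1 and run2 names are made up for the demonstration and are not part of the real layout). It shows that a cd inside ( ... ) does not move the enclosing loop's working directory:

```shell
#!/bin/bash
# Hypothetical scratch layout, used only for this demonstration.
scratch=$(mktemp -d)
mkdir -p "$scratch/run1/data" "$scratch/run2/data"

start=$PWD
for run in "$scratch"/run*; do
    (
        # This cd only affects the subshell opened by the parentheses.
        cd "$run/data" || exit 1
        echo "inside:  $PWD"
    )
    # Back in the parent shell, the working directory has not moved.
    echo "outside: $PWD"
done
after=$PWD
rm -rf "$scratch"
```

Without the parentheses, a relative path used on the next iteration would be resolved from the previous data directory, which is why the subshell keeps the loop simpler.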
It looks like the last file in the directory is being renamed with both matching values rather than only its own. Is another loop needed, or should all the values be on one line instead of on separate lines? Thank you :).
11-1111-yy-yy-yyy_test.bam ---- this is IonCode_0402_xxx.xxx_xxx ---
11-1111-yy-yy-yyy_test.bam.bai ---- this is IonCode_0402_xxx.xxx_xxx ---
00-0000-xxx-xxx-xxx_test.bam ---- this is IonCode_0404_xxx.xxx_xxx ---
00-0000-xxx-xxx-xxx_test.bam.bai ---- this is IonCode_0404_xxx.xxx_xxx ---
It looks a bit complicated.
Perhaps you want to do the following?
dir=/path/to/run
ind=0
while read a b c
do
  if [ -n "$b" ]      # pair line: IonCode and new name
  then
    fsearch[ind]=$a
    mvto[ind]=$b
    ((ind++))
  elif [ -z "$a" ]    # blank line: start of the next block
  then
    ind=0
  else                # single field: the run name that closes the block
    while [ $ind -gt 0 ]
    do
      ((ind--))
      echo "In $dir/$a/data/ rename ${fsearch[ind]}*.bam* to ${mvto[ind]}_test.bam*"
    done
  fi
done < "$dir/f1"
What do you want to happen here? f1 requires that IonCode_0404 in directory *127* be renamed to both 10-0000-aa-aa-aa and 00-0000-xxx-xxx-xxx. Is this a mistake in the data file, or how should the script handle this?
Change the newname assignment to this: newname=${file/$from*.bam/${to}_test.bam}
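To see what that expansion does, here is a small sketch run against the sample filenames from the first post. The rename_target function is a hypothetical helper written for this illustration only; the pattern $from*.bam matches from the IonCode prefix through .bam, so the _xxx.xxx_xxx tail is replaced by _test while any trailing .bai is kept:

```shell
#!/bin/bash
# rename_target is a hypothetical helper, not part of the script in this thread.
rename_target() {
    local file=$1 from=$2 to=$3
    # Replace everything from $from up to and including ".bam".
    printf '%s\n' "${file/$from*.bam/${to}_test.bam}"
}

for file in IonCode_0402_xxx.xxx_xxx.bam IonCode_0402_xxx.xxx_xxx.bam.bai IonCode_0404_xxx.xxx_xxx.bam
do
    echo "$file -> $(rename_target "$file" IonCode_0402 11-1111-yy-yy-yyy)"
done
```

The two IonCode_0402 files map to 11-1111-yy-yy-yyy_test.bam and 11-1111-yy-yy-yyy_test.bam.bai, and the IonCode_0404 file comes back unchanged, which is what the [ "$newname" != "$file" ] test in the loop relies on.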
11-1111-yy-yy-yyy_test.bam ---- this is IonCode_0402_xxx.xxx_xxx ---
11-1111-yy-yy-yyy_test.bam.bai ---- this is IonCode_0402_xxx.xxx_xxx ---
00-0000-xxx-xxx-xxx_test.bam ---- this is IonCode_0404_xxx.xxx_xxx ---
00-0000-xxx-xxx-xxx_test.bam.bai ---- this is IonCode_0404_xxx.xxx_xxx ---
I further qualified my question (see #5 above) - this appears to be a problem with the data file.
If the actual renames were done instead of echo, only the first match would apply, because the file would then have a different name and the second rename would not be attempted. The lines shown in red will not occur, as the file has already been renamed on lines 1 and 2.
I'm not sure I understand completely, but in f1 the same IonCode may appear multiple times. However, the value in uniq is always unique, and each IonCode listed above a uniq value, up to the blank line in f1, will be found in data as a pair. That is, f1 has IonCode_0404, while data has IonCode_0404.bam and IonCode_0404.bam.bai; f1 has IonCode_0402, while data has IonCode_0402.bam and IonCode_0402.bam.bai.
Both IonCode pairs are renamed with the $2 value of the matching IonCode above uniq, with _test appended. Thank you very much :).
My apologies, I have corrected the typo in post 1 and here as well. All 3 uniq values in f1 will always be different; I just transcribed them wrong. Line 3 (the duplicate) will never be there (computers make fewer mistakes). Thank you :).
I added an echo "These are the files:" $file statement; the files in data before the script executes are:
These are the files: IonCode_0402_xxx.xxx_xxx.bam
These are the files: IonCode_0402_xxx.xxx_xxx.bam.bai
These are the files: IonCode_0404_xxx.xxx_xxx.bam
These are the files: IonCode_0404_xxx.xxx_xxx.bam.bai
I don't know what that "after the script executes" listing is showing. Are your filenames ending up with spaces in them, as shown above?
Here is the script I'm using:
dir=/path/to/run/
for run in "$dir"/R_2019* ; do ## match "R_2019*" to operate on each desired run directory
  uniq=${run##*/} ## store run name with no path as uniq
  while read from to
  do
    (
      cd "$dir"/"$uniq"/data
      for file in *.bam*
      do
        newname=${file/$from*.bam/${to}_test.bam}
        [ -f "$file" ] && [ "$newname" != "$file" ] && mv "$file" "$newname"
      done
    )
  done <<<$(
    awk -F '\n' -v RS="" -v ref="$uniq" '
      $0 ~ ref {
        d=split($0, val);
        for(i=1;i<d;i++) print val[i];
      }' "$dir"/f1
  ) ## loop through f1 for unique run and populate from and to
done
After the rename script runs, only one pair of files is renamed, with both values in the new name and a space in between. This is shown above, but I'm not sure why. Your output looks good. Thank you :).
Are you using bash shell? Can you post output with this additional debugging:
#!/bin/bash
dir=/path/to/run/
for run in "$dir"/R_2019* ; do ## match "R_2019*" to operate on each desired run directory
  uniq=${run##*/} ## store run name with no path as uniq
  while read from to
  do
    (
      cd "$dir"/"$uniq"/data
      echo "Rename from:$from to:$to"
      for file in *.bam*
      do
        newname=${file/$from*.bam/${to}_test.bam}
        [ -f "$file" ] && [ "$newname" != "$file" ] && echo mv "$file" "$newname"
      done
    )
  done <<<$(
    awk -F '\n' -v RS="" -v ref="$uniq" '
      $0 ~ ref {
        d=split($0, val);
        for(i=1;i<d;i++) print val[i];
      }' "$dir"/f1
  ) ## loop through f1 for unique run and populate from and to
done
Here is the output of od -c /path/to/f1 . Thank you :).
0000000   I   o   n   C   o   d   e   _   0   4   0   4       0   0   -
0000020   0   0   0   0   -   x   x   x   -   x   x   x   -   x   x   x
0000040  \n   I   o   n   C   o   d   e   _   0   4   0   2       1   1
0000060   -   1   1   1   1   -   y   y   -   y   y   -   y   y   y  \n
0000100   R   _   2   0   1   9   _   0   0   _   0   0   _   0   0   _
0000120   0   0   _   0   0   _   x   x   x   x   _   x   x   1   -   1
0000140   2   7   -   x   x   x   _   x   x   x   _   x   x   x   _   x
0000160   x   x   _   x   x   _   x   x   _   x   x  \n  \n   I   o   n
0000200   C   o   d   e   _   0   4   0   2       2   2   -   2   2   2
0000220   2   -   z   z   -   z   z   z   z   -   z   z   z  \n   R   _
0000240   2   0   1   9   _   0   0   _   0   0   _   0   0   _   0   0
0000260   _   0   0   _   x   x   x   x   _   x   x   1   -   1   2   6
0000300   -   x   x   x   _   x   x   x   _   x   x   x   _   x   x   x
0000320   _   x   x   _   x   x   _   x   x  \n  \n   I   o   n   C   o
0000340   d   e   _   0   4   0   4       1   0   -   0   0   0   0   -
0000360   a   a   -   a   a   -   a   a  \n   I   o   n   C   o   d   e
0000400   _   0   4   1   2       5   5   -   1   1   1   1   -   b   b
0000420   -   b   b   b   -   b   b   b  \n   R   _   2   0   1   9   _
0000440   0   0   _   0   0   _   0   0   _   0   0   _   0   0   _   x
0000460   x   x   x   _   x   x   1   -   1   2   0   -   x   x   x   _
0000500   x   x   x   _   x   x   x   _   x   x   x   _   x   x   _   x
0000520   x   _   x   x  \n
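Read back as text, the dump describes three blank-line-separated blocks, each a list of IonCode/name pairs followed by the run name. The sketch below rebuilds that file in a temporary location and runs the same kind of awk extraction for the ...127 run. The single space between the two columns is an inference from the 16-byte rows of the dump, and the explicit "\n" passed to split is a small variation on the script above, not the original call:

```shell
#!/bin/bash
# Scratch copy of f1 rebuilt from the od -c dump; the single space between
# the two columns is an assumption read off the 16-byte rows.
f1=$(mktemp)
cat > "$f1" <<'EOF'
IonCode_0404 00-0000-xxx-xxx-xxx
IonCode_0402 11-1111-yy-yy-yyy
R_2019_00_00_00_00_00_xxxx_xx1-127-xxx_xxx_xxx_xxx_xx_xx_xx

IonCode_0402 22-2222-zz-zzzz-zzz
R_2019_00_00_00_00_00_xxxx_xx1-126-xxx_xxx_xxx_xxx_xx_xx_xx

IonCode_0404 10-0000-aa-aa-aa
IonCode_0412 55-1111-bb-bbb-bbb
R_2019_00_00_00_00_00_xxxx_xx1-120-xxx_xxx_xxx_xxx_xx_xx_xx
EOF

uniq=R_2019_00_00_00_00_00_xxxx_xx1-127

# RS="" is paragraph mode: each blank-line-separated block is one record.
# Splitting on newlines gives one array element per line; the last element
# is the run name, so the loop stops before it.
pairs=$(awk -v RS="" -v ref="$uniq" '
    $0 ~ ref {
        d = split($0, val, "\n")
        for (i = 1; i < d; i++) print val[i]
    }' "$f1")
printf '%s\n' "$pairs"
rm -f "$f1"
```

For this run the extraction yields two lines, one for IonCode_0404 and one for IonCode_0402, so the file contents themselves look fine and the awk step is producing one pair per line.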
I pass run_dir as an argument instead of hardcoding dir . I think that is the only difference. I made that change to make it easier for others. Thank you
run_dir=$1
for run in "$run_dir" ; do ## grab run to operate on desired directory
  uniq=${run_dir##*/} ## store run name with no path as uniq
  while read from to
  do
    (
      cd "$run_dir"/bam
      echo "Rename from:$from to:$to"
      for file in *.bam*
      do
        newname=${file/$from*.bam/${to}_RNA.bam}
        [ -f "$file" ] && [ "$newname" != "$file" ] && mv "$file" "$newname"
      done
    )
  done <<<$(
    awk -F '\n' -v RS="" -v ref="$uniq" '
      $0 ~ ref {
        d=split($0, val);
        for(i=1;i<d;i++) print val[i];
      }' "$run_dir"/f1
  ) ## loop through f1 for unique run and populate from and to
done
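One possible explanation for both names ending up in a single filename, offered as a guess rather than a confirmed diagnosis: the here-string in these scripts is unquoted ( <<<$( ... ) ). As far as I know, bash releases before 4.4 word-split an unquoted here-string, which would join the awk output lines with spaces so that read sees a single line. The sketch below imitates that joining with echo on an unquoted expansion and contrasts it with process substitution, which keeps the lines separate regardless of bash version. The sample pairs are the ones for the ...127 run in the f1 dump above:

```shell
#!/bin/bash
pairs='IonCode_0404 00-0000-xxx-xxx-xxx
IonCode_0402 11-1111-yy-yy-yyy'

# Imitation of the suspected failure: an unquoted expansion is word-split
# and re-joined with single spaces, so only one line reaches read.
joined=$(echo $pairs)
while read -r from to; do
    echo "joined   from:$from to:$to"
    joined_to=$to
done <<<"$joined"

# Process substitution feeds the lines unchanged, one pair per read.
count=0
while read -r from to; do
    echo "separate from:$from to:$to"
    count=$((count + 1))
done < <(printf '%s\n' "$pairs")
```

In the joined case, to becomes "00-0000-xxx-xxx-xxx IonCode_0402 11-1111-yy-yy-yyy", which matches the symptom of one pair of files being renamed with both values and a space in between. If that is the cause, quoting the here-string as done <<<"$( awk ... )" or switching to done < <( awk ... ) should give one from/to pair per iteration; bash --version shows which release is in use.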