I am trying to mv each of the .vcf files in the variants folder to the folder in /home/cmccabe/f2 that the .vcf id is associated with in file. $2 in file will always have the id of a .vcf in the variants folder. The line starting with R_2019 in file, up to the -v5.6, will always be an exact match to a folder in /home/cmccabe/f2. There may be multiple folders in /home/cmccabe/f2, but each will have only one match in file. There may also be multiple ids, but always only one .vcf per id in /home/cmccabe/f1/variants.
When a match is found between the folder in /home/cmccabe/f2 and the R_ line in file, the id(s) in $2 will be found in /home/cmccabe/f1/variants as a .vcf. Each .vcf is then moved to the matching folder in /home/cmccabe/f2, into the sub-folder variants. This is the last step of a procedure that I am stuck on. I have included an attempt in bash with comments, but I'm sure there is a better way. Thank you :).
file in /home/cmccabe/f1
IonCode_0007 19-0004-La-Fi
IonCode_0009 19-0005-Last-Firs
IonCode_0011 19-0008-LastN-FirstN
IonCode_0013 190320-Control
R_2019_03_12_13_59_54_user_S5-0271-100-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
IonCode_0005 19-0000-LastName-FirstName
IonCode_0001 19-0001-Las-Fir
IonCode_0003 190319-Control
R_2019_03_12_11_10_20_user_S5-0271-99-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
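The rule described above (the R_ line, up to the -v5.6, names the folder) can be sketched with a shell parameter expansion. This is only an illustration on one line copied from the sample file; the variable names line and run are made up for the example.

```shell
#!/usr/bin/env bash
# One R_ line copied from the sample file above.
line='R_2019_03_12_11_10_20_user_S5-0271-99-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions'

# ${var%%pattern} removes the longest suffix matching the pattern,
# here everything from "-v5.6" to the end of the line.
run=${line%%-v5.6*}

printf '%s\n' "$run"
```

With the sample data, run should then hold the folder name R_2019_03_12_11_10_20_user_S5-0271-99.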
variants folder in /home/cmccabe/f1
19-0000-LastName-FirstName.vcf
19-0001-Las-Fir.vcf
190319-Control.vcf
19-0004-La-Fi.vcf
19-0005-Last-Firs.vcf
19-0008-LastN-FirstN.vcf
190320-Control.vcf
current structure of /home/cmccabe/f2
R_2019_03_12_11_10_20_user_S5-0271-99 ---parent directory ---
  - bam --- sub-folder ---
  - qc --- sub-folder ---
  - 19-0000-LastName-FirstName
    - variants
  - 19-0001-Last-Firs
    - variants
  - 190319-Control
    - variants
R_2019_03_12_13_59_54_user_S5-0271-100 ---parent directory ---
  - bam --- sub-folder ---
  - qc --- sub-folder ---
  - 19-0004-La-Fi
    - variants
  - 19-0005-Last-Firs
    - variants
  - 19-0008-LastN-FirstN
    - variants
  - 190320-Control.vcf
    - variants
desired structure of /home/cmccabe/f2
R_2019_03_12_11_10_20_user_S5-0271-99 ---parent directory ---
  - bam --- sub-folder ---
  - qc --- sub-folder ---
  - 19-0000-LastName-FirstName
    - variants
      19-0000-LastName-FirstName.vcf
  - 19-0001-Last-Firs
    - variants
      19-0001-Last-Firs.vcf
  - 190319-Control
    - variants
      190319-Control.vcf
R_2019_03_12_13_59_54_user_S5-0271-100 ---parent directory ---
  - bam --- sub-folder ---
  - qc --- sub-folder ---
  - 19-0004-La-Fi
    - variants
      19-0004-La-Fi.vcf
  - 19-0005-Last-Firs
    - variants
      19-0005-Last-Firs.vcf
  - 19-0008-LastN-FirstN
    - variants
      19-0008-LastN-FirstN.vcf
  - 190320-Control.vcf
    - variants
      190320-Control.vcf
possible bash
for file in /home/cmccabe/f1/variants/*.vcf ; do
    bname=$(basename $file) # strip off path
    VCF="$(echo $bname|cut -d. -f1)" # remove .vcf extension
    f=$(printf '%s' /home/cmccabe/f1/file/${VCF}) ## # Find matching id
    FILE2=$(awk '{print $2}' $f') # set VCF lookup to column
    for RDIR in "$DIR"/R_2019* ; do FOLDER=${RDIR%%-v5.6*}; done ## trim folder match in RDIR from -v5.6 and store in FOLDER
    if [[ $VCF = $FILE2 ]] # only execute file on match
    then
        mkdir -p /home/cmccabe/f2/$FOLDER/variants ## create variants sub-folder
        mv /home/cmccabe/f1/file/$VCF /home/cmccabe/f2/$FOLDER/$VCF/variants ## move vcf to folder/id/variants
    fi ## end if
done ## close loop
What operating system are you using for this exercise?
It seems that the text description of your problem says that everything you need to find the files to be moved and the locations to which they should be moved is found in a file named /home/cmccabe/f1/file, but your script is treating that regular file as a directory. What am I missing?
Furthermore, you go to a lot of work to create a variable named VCF which contains the name of a file after stripping off the .vcf filename extension. But when you start moving the .vcf files, you use $VCF as the name of those files without reinstating the filename extension?
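A minimal sketch of that point, assuming the id is simply the file name without its .vcf extension: parameter expansion can strip the path and the extension, and the extension is added back when naming the file to move. The variable names id and vcf_name are made up for the example.

```shell
#!/usr/bin/env bash
file=/home/cmccabe/f1/variants/19-0004-La-Fi.vcf

bname=${file##*/}     # strip the path: 19-0004-La-Fi.vcf
id=${bname%.vcf}      # strip only the trailing .vcf: 19-0004-La-Fi
vcf_name=$id.vcf      # put the extension back when naming the file to move

printf '%s\n%s\n%s\n' "$bname" "$id" "$vcf_name"
```

Unlike cut -d. -f1, the %.vcf expansion removes only the final extension, so an id that happened to contain a dot would survive intact.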
I then got completely lost when you started a loop on all of the R_2019* files in $DIR. Note that the DIR variable is never defined in your script and is never mentioned in your description of what you are trying to do.
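One way to catch an undefined variable such as DIR early is the ${VAR:?message} expansion, sketched below. The helper name require_dir is made up for this example, and the value assigned to DIR is only a placeholder; any non-empty value passes the check.

```shell
#!/usr/bin/env bash
# require_dir is a made-up helper name for this example.
# ${DIR:?message} makes the subshell exit with an error when DIR is unset
# or empty, so "$DIR"/R_2019* can never silently expand to /R_2019*.
require_dir() (
    : "${DIR:?DIR must be set before it is used}"
)

unset DIR
if require_dir 2>/dev/null; then before=ok; else before=missing; fi

DIR=/home/cmccabe/f1   # placeholder value for the example
if require_dir 2>/dev/null; then after=ok; else after=missing; fi

printf 'before: %s, after: %s\n' "$before" "$after"
```

Running the script with set -u would give a similar safeguard for every variable at once.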
I'm having a hard time guessing at what files are being processed by the code:
FILE2=$(awk '{print $2}' $f')
(which should have "$f" instead of $f). I'm guessing that this will set FILE2 to a list of filenames that you are then treating as a single filename, but since I don't know the contents of the file that has been selected by $f, I'm lost.
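To illustrate handling the second column one id at a time rather than as a single string, here is a small sketch. It runs on a temporary file standing in for /home/cmccabe/f1/file (three lines copied from the sample data, with the R_ line shortened); the variable names barcode, id, and count are made up for the example.

```shell
#!/usr/bin/env bash
# Temporary stand-in for /home/cmccabe/f1/file.
f=$(mktemp)
printf '%s\n' \
    'IonCode_0005 19-0000-LastName-FirstName' \
    'IonCode_0001 19-0001-Las-Fir' \
    'R_2019_03_12_11_10_20_user_S5-0271-99-v5.6_Oncomine' > "$f"

count=0
while read -r barcode id; do
    # R_ lines have no second field, so id is empty there.
    [ -n "$id" ] || continue
    count=$((count + 1))
    printf 'id %d: %s\n' "$count" "$id"
done < "$f"    # "$f" is quoted so a path with spaces would still work

rm -f "$f"
```

Each pass through the loop sees exactly one id, so it can be compared with a single .vcf name without any word-splitting surprises.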
I'm assuming that you have tried running your script and it is failing to work. What diagnostics is it printing, or if there aren't any, in what way is it failing to do what you want it to do?
Please indent your code to show its structure. Then comments like "end if" and "end loop" won't be needed and we won't have to wonder where the start of the "if" and "loop" are located. I know the shell doesn't care about indentation, but you are a human and you're asking humans on this forum to read your code. Lack of indentation makes it more difficult for humans (including you) to understand what your code is trying to do.
I am using ubuntu 14.04 as my os. /home/cmccabe/f1/file is the path to file, which has all the necessary information for the move (folder name, ids). The for loop on RDIR was for trimming the R_2019 line in file to match the folder name in /home/cmccabe/f2, but DIR is undefined and maybe should be /home/cmccabe/f1/file. The FILE2=$(awk '{print $2}' $f') was then intended to read each id from file into FILE2. The code executes but nothing is moved, and set -x shows the variables not being populated correctly, as you already knew :). I indented the code above but added comments to help me learn and help me in my logic. Thank you for your help :).
I rewrote the script (well, a portion) and most of the variables seem good. $STRING is the same as FILE2; I just changed the name to hopefully be more clear, as I am looking for a string. However, the loop is not working, so only the first id is retained in $STRING. I think I am on the right track, but is there a better way? Thank you :).
set -x
DIR=/home/cmccabe/f1
DEST=/home/cmccabe/f2
for file in "$DIR"/variants/*.vcf ; do
    bname=$(basename $file) # strip off path
    VCF="$(echo $bname|cut -d. -f1)" # remove .vcf extension
    for f in "$DIR"/file; do STRING=( $(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file) ); echo "This is the string" "$STRING"; done
done
set -x
cmccabe@Satellite-M645:~$ set -x
cmccabe@Satellite-M645:~$ DIR=/home/cmccabe/f1
+ DIR=/home/cmccabe/f1
cmccabe@Satellite-M645:~$ DEST=/home/cmccabe/f2
+ DEST=/home/cmccabe/f2
cmccabe@Satellite-M645:~$ for file in "$DIR"/variants/*.vcf ; do
> bname=$(basename $file) # strip of path
> VCF="$(echo $bname|cut -d. -f1)" # remove .vcf extension
> for f in "$DIR"/file; do STRING=( $(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file) ); echo "This is the string" "$STRING"; done
> done
+ for file in '"$DIR"/variants/*.vcf'
++ basename /home/cmccabe/f1/variants/19-0000-LastName-FirstName.vcf
+ bname=19-0000-LastName-FirstName.vcf
++ echo 19-0000-LastName-FirstName.vcf
++ cut -d. -f1
+ VCF=19-0000-LastName-FirstName
+ for f in '"$DIR"/file'
+ STRING=($(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file))
++ awk '{for(i=2; i<=NF; i++) print $i}' /home/cmccabe/f1/file
+ echo 'This is the string' 19-0000-LastName-FirstName
This is the string 19-0000-LastName-FirstName
+ for file in '"$DIR"/variants/*.vcf'
++ basename /home/cmccabe/f1/variants/19-0002-L-F.vcf
+ bname=19-0002-L-F.vcf
++ echo 19-0002-L-F.vcf
++ cut -d. -f1
+ VCF=19-0002-L-F
+ for f in '"$DIR"/file'
+ STRING=($(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file))
++ awk '{for(i=2; i<=NF; i++) print $i}' /home/cmccabe/f1/file
+ echo 'This is the string' 19-0000-LastName-FirstName
This is the string 19-0000-LastName-FirstName
+ for file in '"$DIR"/variants/*.vcf'
++ basename /home/cmccabe/f1/variants/19-0004-La-Fi.vcf
+ bname=19-0004-La-Fi.vcf
++ echo 19-0004-La-Fi.vcf
++ cut -d. -f1
+ VCF=19-0004-La-Fi
+ for f in '"$DIR"/file'
+ STRING=($(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file))
++ awk '{for(i=2; i<=NF; i++) print $i}' /home/cmccabe/f1/file
+ echo 'This is the string' 19-0000-LastName-FirstName
This is the string 19-0000-LastName-FirstName
+ for file in '"$DIR"/variants/*.vcf'
++ basename /home/cmccabe/f1/variants/19-0020-Las-Fir.vcf
+ bname=19-0020-Las-Fir.vcf
++ echo 19-0020-Las-Fir.vcf
++ cut -d. -f1
+ VCF=19-0020-Las-Fir
+ for f in '"$DIR"/file'
+ STRING=($(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file))
++ awk '{for(i=2; i<=NF; i++) print $i}' /home/cmccabe/f1/file
+ echo 'This is the string' 19-0000-LastName-FirstName
This is the string 19-0000-LastName-FirstName
+ for file in '"$DIR"/variants/*.vcf'
++ basename /home/cmccabe/f1/variants/190319-Control.vcf
+ bname=190319-Control.vcf
++ echo 190319-Control.vcf
++ cut -d. -f1
+ VCF=190319-Control
+ for f in '"$DIR"/file'
+ STRING=($(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file))
++ awk '{for(i=2; i<=NF; i++) print $i}' /home/cmccabe/f1/file
+ echo 'This is the string' 19-0000-LastName-FirstName
This is the string 19-0000-LastName-FirstName
+ for file in '"$DIR"/variants/*.vcf'
++ basename /home/cmccabe/f1/variants/190320-Control.vcf
+ bname=190320-Control.vcf
++ echo 190320-Control.vcf
++ cut -d. -f1
+ VCF=190320-Control
+ for f in '"$DIR"/file'
+ STRING=($(awk '{for(i=2; i<=NF; i++) print $i}' "$DIR"/file))
++ awk '{for(i=2; i<=NF; i++) print $i}' /home/cmccabe/f1/file
+ echo 'This is the string' 19-0000-LastName-FirstName
This is the string 19-0000-LastName-FirstName
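A note on the trace above: STRING is assigned as an array, and in bash "$STRING" on its own expands to element 0 only, which would explain why the same first id is echoed on every pass even though the array may hold all of them. A small sketch, with three ids from the sample data hard-coded as an assumption:

```shell
#!/usr/bin/env bash
# Three ids from the sample data, hard-coded for the example.
STRING=(19-0000-LastName-FirstName 19-0001-Las-Fir 190319-Control)

first=$STRING           # same as ${STRING[0]}: only the first element
count=${#STRING[@]}     # number of elements actually stored

printf 'first element: %s\n' "$first"
printf 'element count: %s\n' "$count"

# "${STRING[@]}" expands to every element, one word per id.
for id in "${STRING[@]}"; do
    printf 'id: %s\n' "$id"
done
```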
I apologize for not getting back to you sooner. (I was distracted for a few days by other activities.)
Have you made any progress on resolving this problem?
I have been able to get a working solution that produces my desired results, using set -x and the modifications below:
if [[ $VCF = ${STRING[*]} ]] # only execute file on match
then
    RSTRING=$(awk '/R_2019/' "$DIR"/run) ## search for lines matching R_2019 pattern
    VCFRUN=$(awk -F '\n' -v RS="" -v ref="$VCF" '$0 ~ ref {print $NF}' "$DIR"/file) ## paragraph mode (RS=""): print the last line of the block that mentions $VCF
    RUN="$(echo $RSTRING|cut -d- -f1,2,3)" ## keep the first three "-"-separated fields, dropping -v5.6 and everything after it
    mv "$DIR"/variants/${VCF}.vcf "$DEST"/"$RUN"/"$VCF"/variants ## move vcf to folder in destination
fi
This matched each .vcf and moved the match to the correct run folder. Maybe this will help others as well.
Thank you very much for your help :).
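For readers with the same problem, here is an alternative sketch of the whole move driven directly by file. It is not the solution posted above. It assumes, as in the sample data, that every id line precedes the R_ line of its run, that ids contain no whitespace or glob characters, and that the run folders already exist under the destination. The names move_vcfs, src, and dest are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: see the assumptions stated in the text above.
move_vcfs() {
    src=$1      # folder holding "file" and "variants", e.g. /home/cmccabe/f1
    dest=$2     # folder holding the run directories, e.g. /home/cmccabe/f2
    ids=""
    while read -r first id || [ -n "$first" ]; do
        case $first in
            R_*)
                run=${first%%-v5.6*}    # run folder name under $dest
                for i in $ids; do       # unquoted on purpose: one word per id
                    vcf=$src/variants/$i.vcf
                    target=$dest/$run/$i/variants
                    if [ -f "$vcf" ] && [ -d "$dest/$run" ]; then
                        mkdir -p "$target" && mv "$vcf" "$target/"
                    else
                        printf 'skipped %s (run %s)\n' "$i" "$run" >&2
                    fi
                done
                ids=""                  # start collecting ids for the next run
                ;;
            *)
                # An id line: remember its second field until the R_ line arrives.
                if [ -n "$id" ]; then ids="$ids $id"; fi
                ;;
        esac
    done < "$src/file"
}

# Usage (commented out so nothing moves by accident):
# move_vcfs /home/cmccabe/f1 /home/cmccabe/f2
```

Reading file once and acting on each R_ line avoids looping over the .vcf files and re-running awk for every one of them; ids without a matching .vcf or run folder are reported on stderr instead of being moved.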