Move matching file to folder in same directory

The bash script below runs, but nothing is copied. I am trying to cp or mv each matching .pdf into its corresponding folder. There will always be a matching .pdf, but the number of folders may vary. The portion in red is what is used to match the .pdf (variable $pdf) to the folder (variable $fmatch). If I echo the variables, I can see that they are populated correctly. I am using cp to test before switching to mv, to make sure it works. Thank you :).

folders in $dir --- this is $fmatch

R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
R_2019_01_30_14_24_53_user_S5-0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions

pdf in $dir --- this is $pdf

Auto_user_S5_0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.pdf
Auto_user_S5_0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.pdf

desired output

Auto_user_S5_0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.pdf  ---> R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
Auto_user_S5_0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.pdf ---> R_2019_01_30_14_24_53_user_S5-0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
dir=/home/cmccabe/folder
for file1 in $dir/*.pdf; do
# Grab file prefix
  bname=`basename $file1` # strip of path
  pre="$(echo $bname|cut -d_ -f1,2,3,4)" # remove after fourth underscore
  pdf="$(echo $pre|awk -F- '{print $2"-"$3}')" # split on - and print 2 and third field
# Find matching folder
  folder=$(find /home/cmccabe/folder/*/ -type d)
  fpre="$(echo "$folder"|cut -d_ -f1,2,3,4,5,6,7,8,9)" # remove after ninth underscore
  fmatch="$(echo "$fpre"|awk -F- '{print $3"-"$4}')"  # split on - and print third and fourth field
     if [[ $pdf = $fmatch ]] # only execute file---folder value match
      then
   cp $dir/$pdf.pdf $dir/$fmatch
fi
done

And what tracing output do you get when you run your script with set -xv enabled?

cmccabe@Satellite-M645:~$ set -xv
cmccabe@Satellite-M645:~$ dir=/home/cmccabe/folder
dir=/home/cmccabe/folder
+ dir=/home/cmccabe/folder
cmccabe@Satellite-M645:~$ for file1 in $dir/*.pdf; do
for file1 in $dir/*.pdf; do
> # Grab file prefix
# Grab file prefix
>   bname=`basename $file1` # strip of path
  bname=`basename $file1` # strip of path
>   pre="$(echo $bname|cut -d_ -f1,2,3,4)" # remove after fourth underscore
  pre="$(echo $bname|cut -d_ -f1,2,3,4)" # remove after fourth underscore
>   pdf="$(echo $pre|awk -F- '{print $2"-"$3}')" # split on - and print 2 and third field
  pdf="$(echo $pre|awk -F- '{print $2"-"$3}')" # split on - and print 2 and third field
> # Find matching folder
# Find matching folder
>   folder=$(find /home/cmccabe/folder/*/ -type d)
  folder=$(find /home/cmccabe/folder/*/ -type d)
>   fpre="$(echo "$folder"|cut -d_ -f1,2,3,4,5,6,7,8,9)" # remove after ninth underscore
  fpre="$(echo "$folder"|cut -d_ -f1,2,3,4,5,6,7,8,9)" # remove after ninth underscore
>   fmatch="$(echo "$fpre"|awk -F- '{print $3"-"$4}')"  # split on - and print third and fourth field
  fmatch="$(echo "$fpre"|awk -F- '{print $3"-"$4}')"  # split on - and print third and fourth field
>      if [[ $pdf = $fmatch ]] # only execute file---folder value match
     if [[ $pdf = $fmatch ]] # only execute file---folder value match
>       then
      then
>    cp $dir/$pdf.pdf $dir/$fmatch
   cp $dir/$pdf.pdf $dir/$fmatch
> fi
fi
> done
done
+ for file1 in '$dir/*.pdf'
basename $file1
++ basename /home/cmccabe/folder/Auto_user_S5_0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.pdf
+ bname=Auto_user_S5_0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.pdf
echo $bname|cut -d_ -f1,2,3,4
++ echo Auto_user_S5_0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.pdf
++ cut -d_ -f1,2,3,4
+ pre=Auto_user_S5_0271-95-v5.6
echo $pre|awk -F- '{print $2"-"$3}'
++ echo Auto_user_S5_0271-95-v5.6
++ awk -F- '{print $2"-"$3}'
+ pdf=95-v5.6
find /home/cmccabe/folder/*/ -type d
++ find /home/cmccabe/folder/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/ /home/cmccabe/folder/R_2019_01_30_14_24_53_user_S5-0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/ -type d
+ folder='/home/cmccabe/folder/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/
/home/cmccabe/folder/R_2019_01_30_14_24_53_user_S5-0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/'
echo "$folder"|cut -d_ -f1,2,3,4,5,6,7,8,9
++ echo '/home/cmccabe/folder/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/
++ cut -d_ -f1,2,3,4,5,6,7,8,9
/home/cmccabe/folder/R_2019_01_30_14_24_53_user_S5-0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/'
+ fpre='/home/cmccabe/folder/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6
/home/cmccabe/folder/R_2019_01_30_14_24_53_user_S5-0271-96-v5.6'
echo "$fpre"|awk -F- '{print $3"-"$4}'
++ echo '/home/cmccabe/folder/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6
/home/cmccabe/folder/R_2019_01_30_14_24_53_user_S5-0271-96-v5.6'
++ awk -F- '{print $3"-"$4}'
+ fmatch='95-v5.6
96-v5.6'
+ [[ 95-v5.6 = 95-v5.6
96-v5.6 ]]
+ for file1 in '$dir/*.pdf'
basename $file1
++ basename /home/cmccabe/folder/Auto_user_S5_0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.pdf
+ bname=Auto_user_S5_0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.pdf
echo $bname|cut -d_ -f1,2,3,4
++ echo Auto_user_S5_0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.pdf
++ cut -d_ -f1,2,3,4
+ pre=Auto_user_S5_0271-96-v5.6
echo $pre|awk -F- '{print $2"-"$3}'
++ echo Auto_user_S5_0271-96-v5.6
++ awk -F- '{print $2"-"$3}'
+ pdf=96-v5.6
find /home/cmccabe/folder/*/ -type d
++ find /home/cmccabe/folder/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/ /home/cmccabe/folder/R_2019_01_30_14_24_53_user_S5-0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/ -type d
+ folder='/home/cmccabe/folder/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/
/home/cmccabe/folder/R_2019_01_30_14_24_53_user_S5-0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/'
echo "$folder"|cut -d_ -f1,2,3,4,5,6,7,8,9
++ echo '/home/cmccabe/folder/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/
/home/cmccabe/folder/R_2019_01_30_14_24_53_user_S5-0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions/'
++ cut -d_ -f1,2,3,4,5,6,7,8,9
+ fpre='/home/cmccabe/folder/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6
/home/cmccabe/folder/R_2019_01_30_14_24_53_user_S5-0271-96-v5.6'
echo "$fpre"|awk -F- '{print $3"-"$4}'
++ echo '/home/cmccabe/folder/R_2019_01_30_14_24_53_user_S5-0271-95-v5.6
/home/cmccabe/folder/R_2019_01_30_14_24_53_user_S5-0271-96-v5.6'
++ awk -F- '{print $3"-"$4}'
+ fmatch='95-v5.6
96-v5.6'
+ [[ 96-v5.6 = 95-v5.6
96-v5.6 ]]

And, in that trace output, do you see that the if test always fails (shown in red above) because fmatch is set to two values (shown in orange) and is then compared to $pdf, which contains only a single value?

And, note that your copy command:

   cp $dir/$pdf.pdf $dir/$fmatch

can't do what you want it to do if $fmatch expands to two directory names.

Does this tell you what you need to do next?
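
One way to act on this, as a minimal sketch: test the directories one at a time instead of collecting all of them into a single variable. It assumes the key is the fourth underscore-separated field of the PDF name (for example 0271-95-v5.6). The temporary directory and the sample names are there only so the example is self-contained; they are not the real /home/cmccabe/folder.

```bash
# Sketch only: build a throwaway copy of the layout described in this thread.
dir=$(mktemp -d)
suffix=Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
for n in 95 96; do
    mkdir "$dir/R_2019_01_30_14_24_53_user_S5-0271-$n-v5.6_$suffix"
    : > "$dir/Auto_user_S5_0271-$n-v5.6_$suffix.pdf"
done

for file1 in "$dir"/*.pdf; do
    bname=${file1##*/}                            # strip the path
    key=$(printf '%s\n' "$bname" | cut -d_ -f4)   # e.g. 0271-95-v5.6
    for folder in "$dir"/*/; do                   # one directory per pass
        case $folder in
            *"$key"*) cp "$file1" "$folder" ;;    # one key against one name
        esac
    done
done
```

Because each comparison now involves a single directory name, the copy only runs for the directory that actually contains the key.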

The script below identifies the oldest folder in the directory and stores its name as $filename; the parsed value is stored as $folder (user_S5-0271-96-v5.6 is an example).

The awk extracts the matching pdf based on the $filename variable. That pdf is parsed and the user_S5-0271-96-v5.6 is stored in $pdf . The set -xv shows that nothing populates in $pdf . I am not sure why though?

I will then match $folder against $pdf and cp the pdf to the matching $folder in $dir . Is there a better way, or am I getting closer?

I also removed the gsub(/^0+/,"", FNUM) as that is not needed to match the pdf .

set -xv  # add error checking
dir=/home/cmccabe/folder  # define directory as dir
# Find oldest directory
find "$dir" -maxdepth 1 -mindepth 1 -type d -printf '%T+\t%P\0' | sort -z |  # search dir for only folders by time and sort
while read -r -d $'\t' time && read -r -d '' filename  # read each folder into $filename and grab oldest
do  # start loop
 printf "The oldest folder is $filename, created on $time\n"  # print message with oldest folder
folder="$(echo $filename|cut -d'_' -f8-)" # split on _ and print  # create $folder variable with parse output
echo "The folder is" "$folder"  # confirm parse message --- user_S5-0271-96-v5.6 --- is an example
# Find matching pdf
pdf=$(awk -v FL="$filename" '  # store oldest folder and $FL
         FNR == 1 {filenum++}  # start loop
         filenum==1 && index($0, FL) { # look at only 1 folder and index
              match($0, "_0*([0-9]+)/") # match substring _user in each folder name
              FNUM=substr($0,RSTART+1,RLENGTH-2) # extract contents and store as $FNUM --- user_S5-0271-96-v5.6 --- is an example  
          }filenum==2 && $0 ~ FNUM".pdf$"') # print $FNUM for pdf
   break  # end loop
done  # end processing
echo "The matching pdf is" $pdf  # confirm pdf match
# copy pdf to folder
if [[ "$pdf" = "$folder" ]] # only execute file--->folder value match
      then  # perform action on match
   cp $dir/*$pdf.pdf $dir/*$folder  # copy pdf to matching folder
fi  # end 
done # processing complete

set -xv output

cmccabe@Satellite-M645:~$ set -xv
cmccabe@Satellite-M645:~$ dir=/home/cmccabe/folder
dir=/home/cmccabe/folder
+ dir=/home/cmccabe/folder
cmccabe@Satellite-M645:~$ # Find oldest directory
# Find oldest directory
cmccabe@Satellite-M645:~$ find "$dir" -maxdepth 1 -mindepth 1 -type d -printf '%T+\t%P\0' | sort -z |
find "$dir" -maxdepth 1 -mindepth 1 -type d -printf '%T+\t%P\0' | sort -z |
> while read -r -d $'\t' time && read -r -d '' filename
while read -r -d $'\t' time && read -r -d '' filename
> do
do
>  printf "The oldest folder is $filename, created on $time\n"
 printf "The oldest folder is $filename, created on $time\n"
> folder="$(echo $filename|cut -d'_' -f8-)" # split on _ and print
folder="$(echo $filename|cut -d'_' -f8-)" # split on _ and print
> echo "The folder is" "$folder"
echo "The folder is" "$folder"
> # Find matching pdf
# Find matching pdf
> pdf=$(awk -v FL="$filename" '
pdf=$(awk -v FL="$filename" '
>          FNR == 1 {filenum++}
         FNR == 1 {filenum++}
>          filenum==1 && index($0, FL) { 
         filenum==1 && index($0, FL) { 
>               match($0, "_0*([0-9]+)/")
              match($0, "_0*([0-9]+)/")
>               FNUM=substr($0,RSTART+1,RLENGTH-2)
              FNUM=substr($0,RSTART+1,RLENGTH-2)
>               gsub(/^0+/,"", FNUM)
              gsub(/^0+/,"", FNUM)
>           }filenum==2 && $0 ~ FNUM".pdf$"')
          }filenum==2 && $0 ~ FNUM".pdf$"')
>    break
   break
> done
done
+ sort -z
+ find /home/cmccabe/folder -maxdepth 1 -mindepth 1 -type d -printf '%T+\t%P\0'
+ read -r -d '	' time
+ read -r -d '' filename
+ printf 'The oldest folder is R_2019_01_30_14_24_53_user_S5-0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions, created on 2019-03-01+11:32:47.3690364740\n'
The oldest folder is R_2019_01_30_14_24_53_user_S5-0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions, created on 2019-03-01+11:32:47.3690364740
echo $filename|cut -d'_' -f8-
++ echo R_2019_01_30_14_24_53_user_S5-0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
++ cut -d_ -f8-
+ folder=user_S5-0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
+ echo 'The folder is' user_S5-0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
The folder is user_S5-0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
awk -v FL="$filename" '
         FNR == 1 {filenum++}
         filenum==1 && index($0, FL) { 
              match($0, "_0*([0-9]+)/")
              FNUM=substr($0,RSTART+1,RLENGTH-2)
              gsub(/^0+/,"", FNUM)
          }filenum==2 && $0 ~ FNUM".pdf$"'
++ awk -v FL=R_2019_01_30_14_24_53_user_S5-0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions '
         FNR == 1 {filenum++}
         filenum==1 && index($0, FL) { 
              match($0, "_0*([0-9]+)/")
              FNUM=substr($0,RSTART+1,RLENGTH-2)
              gsub(/^0+/,"", FNUM)
          }filenum==2 && $0 ~ FNUM".pdf$"'
+ pdf=
+ break
cmccabe@Satellite-M645:~$ echo "The matching pdf is" $pdf
echo "The matching pdf is" $pdf
+ echo 'The matching pdf is'
The matching pdf is

Hi mccabe,
I am completely confused by your description of your code. I don't understand how most of what you have described could happen. (Or, since your code isn't working, maybe I'm not confused.)

OK. I think I understand the first part of this. You are afraid that some of your directory names contain <newline> characters, so instead of using <newline>s to separate the timestamp and directory name output records from find , you're using a null byte to separate records. You hadn't said anything about <newline>s in filenames (including directories) being a problem, so why add this complexity? It is clearly confusing me and your awk script.

I'm further confused by your choice of variable names. The value assigned to the variable named filename is the pathname of a directory (which I believe is a synonym for folder in your code). The value assigned to the variable named folder is a substring of the value assigned to filename and, if I understand it correctly is also a substring of exactly one PDF file in this directory as well. Is there also supposed to be a directory with a name that exactly matches that substring? Are you trying to copy the PDF file to a regular file with that common substring as its new filename? Are you trying to copy the PDF file into a directory whose name is that common substring?

I don't know what most of that means, but I see absolutely no possibility that this code could do that. The awk code seems to be playing some games with the name of the directory being processed on this time through the loop. The awk code is not given any filenames to process, so it will be reading all of the remaining output from sort as a single partial line of text that contains a NULL byte (meaning it is not text) for each directory that find found. This also means that we will only go through this loop once. The only way that this awk script could ever print anything would be if a line of input in the second input file it processes ends with the string .pdf . But since there is only one partial line of input in the output from sort that awk is reading and there is no second input file for awk to process, that can never happen. Therefore, as we see in the trace output, awk does not print anything and the pdf variable in your script is set to an empty string.
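
The point about file operands can be checked with a small sketch. The two list files here ( dirs.txt and pdfs.txt ) are made up for the illustration; the only thing carried over from your script is the FNR == 1 {filenum++} idiom.

```bash
tmp=$(mktemp -d)
printf '%s\n' 'R_2019_dir_one' > "$tmp/dirs.txt"
printf '%s\n' 'one.pdf' 'two.pdf' > "$tmp/pdfs.txt"

# All input arrives on standard input: FNR is 1 only once, so filenum
# never reaches 2 and nothing is printed.
from_stdin=$(cat "$tmp/dirs.txt" "$tmp/pdfs.txt" |
    awk 'FNR == 1 {filenum++} filenum == 2')

# Two file operands: FNR restarts at 1 for the second file, so filenum
# becomes 2 and the lines of pdfs.txt are printed.
from_files=$(awk 'FNR == 1 {filenum++} filenum == 2' \
    "$tmp/dirs.txt" "$tmp/pdfs.txt")

printf 'stdin: [%s]\n' "$from_stdin"
printf 'files: [%s]\n' "$from_files"
rm -r "$tmp"
```

The first command prints nothing for the same reason your pdf variable ends up empty: without a second input file, the filenum==2 condition is never true.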

The awk utility is not capable of parsing a PDF file. The awk utility is intended to read lines from text files. PDF files are not text files. The output from sort -z is not a text file either.

Huh? Do you really have directories that have names that end with .pdf ?

I'm sorry, but I don't know if you're getting closer. I haven't figured out what directories are involved in what you're trying to do and I haven't figured out what your goals are. I don't even know if you're just trying to move one PDF file or move one PDF file for each directory that is in the directory named /home/cmccabe/folder .

Since you aren't checking the return codes from the functions you call in awk , you don't know whether or not anything matched. You assume that it did and at least sometimes it doesn't.
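
As a sketch of that check: match() returns 0 when nothing matches, so its result can be tested before RSTART and RLENGTH are used. The input here is the directory name from your trace, which contains no / character, so the pattern from your script cannot match it. The result variable name is only for this illustration.

```bash
name=R_2019_01_30_14_24_53_user_S5-0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions

result=$(printf '%s\n' "$name" | awk '{
    # match() returns the start position, or 0 if the pattern is not found.
    if (match($0, "_0*([0-9]+)/"))
        print "matched: " substr($0, RSTART + 1, RLENGTH - 2)
    else
        print "no match"
}')

printf '%s\n' "$result"
```

With the unchecked version, a failed match leaves RSTART at 0 and RLENGTH at -1, and the substr() call quietly produces an empty string.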

In post #1 you said you wanted to produce the output:

Auto_user_S5_0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.pdf  ---> R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
Auto_user_S5_0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.pdf ---> R_2019_01_30_14_24_53_user_S5-0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions

but I don't see how that relates to making a copy of the oldest file, why there would be two lines of output related to making a copy of the oldest file, nor why the spacing is different on the desired two lines of output.

Auto_user_S5_0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.pdf  ---> R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
Auto_user_S5_0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.pdf ---> R_2019_01_30_14_24_53_user_S5-0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions

Yes I am just trying to move or copy the .pdf with a partial match (portion in green) to the folder with the partial match (other portion in green).

I will use more descriptive variable names in my code. $filename is a path with the folder name in it. The path is removed using basename, and the $folder substring is cut from it (user_S5-0271-96-v5.6). There is exactly 1 pdf in the directory with this substring in it (there may be multiple folders and pdfs with different names). I am trying to copy or move the pdf with the common substring into the folder.

If there are 10 folders in /home/cmccabe/folder , there will be 10 partially matching pdfs in there as well. Each substring will be unique (that is, it will partially match only 1 folder in the directory). Moving or copying all 10 to their folders is the ultimate goal.

The find was because I thought I needed to perform the action (move or copy) one at a time and wanted to make sure all the names were sorted and trimmed . Thank you very much :).

But the portions shown in green don't match. The green portion in the PDF files contain two underscores and two hyphens. The green portion in the directories contain one underscore and three hyphens.

I missed that, should be:

pdf ---> folder
0271-96-v5.6 0271-96-v5.6
0271-95-v5.6 0271-95-v5.6

Thank you :).

Maybe something a little faster and easier (i.e., no invocations of awk , basename , cut , find , or sort ), like:

#!/bin/bash
set -xv
cd /home/cmccabe/folder || exit 1
for pdf in *.pdf
do	# Extract common component from the name...
	key=${pdf#*_*_*_}	# Get rid of 1st 3 underscores and everything
				# before them.
	key=${key%%_*}		# Get rid of the next underscore and everything
				# after it.

	# Copy PDF file to directory with a name containing the key.
	cp "$pdf" *"$key"*/
done

This was tested with both bash and ksh on macOS Mojave, version 10.14.3 and seems to do what you want with the two sample PDF files and the two sample directory names provided in this thread. The PDF files were copied to the expected directories.
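
If a key ever matched no directory, or more than one, cp would either fail on the unexpanded pattern or treat the extra directory names as additional sources. A hedged variant that checks the number of matches first might look like the sketch below. The copy_to_match function name, the shortened sample names, and the temporary test layout are made up for the illustration and are not part of the script above.

```bash
# Hypothetical helper: copy one PDF into the single directory whose
# name contains the key; refuse to guess if there are 0 or 2+ matches.
copy_to_match() {
    pdf=$1
    key=${pdf#*_*_*_}       # drop everything through the 3rd underscore
    key=${key%%_*}          # drop the next underscore and what follows
    set -- *"$key"*/        # positional parameters = matching directories
    if [ "$#" -ne 1 ] || [ ! -d "$1" ]; then
        echo "skipping $pdf: expected exactly one directory for $key" >&2
        return 1
    fi
    cp "$pdf" "$1"
}

# Throwaway test layout (assumption: names follow the pattern in this thread).
cd "$(mktemp -d)" || exit 1
mkdir R_2019_01_30_14_24_53_user_S5-0271-95-v5.6_Oncomine
: > Auto_user_S5_0271-95-v5.6_Oncomine.pdf
: > Auto_user_S5_0271-97-v5.6_Oncomine.pdf     # no matching directory

for pdf in *.pdf
do
    copy_to_match "$pdf" || echo "not copied: $pdf"
done
```

The trailing / in the pattern restricts the matches to directories, just as in the original cp command, so the PDF files themselves are never counted.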

works perfect.... thank you :).

What does pdf# in the key do? Thank you :).

In the code:

for pdf in *.pdf
do	# Extract common component from the name...
	key=${pdf#*_*_*_}	# Get rid of 1st 3 underscores and everything
				# before them.

$pdf expands to the name of the PDF file being processed on this iteration of the for loop. The first value assigned to key is the value of pdf with the shortest leading string that matches the filename matching pattern *_*_*_ removed. If that pattern does not match the value assigned to pdf , the expansion is the entire value assigned to pdf . So, with your sample PDF filename:

Auto_user_S5_0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.pdf

used on the first time through the loop, key is initially assigned the value:

0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.pdf

without the text shown in red that is matched by the pattern *_*_*_ . On the 2nd time through the loop, with pdf assigned the 2nd PDF filename:

Auto_user_S5_0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.pdf

key will initially be assigned the value:

0271-96-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.pdf

without the text marked in red that matches the same filename matching pattern.
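
To see the expansions side by side, here is a small sketch using the first sample filename. The shortest and longest variable names are only for the illustration; ## is shown for contrast because it removes the longest matching prefix instead of the shortest.

```bash
pdf=Auto_user_S5_0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.pdf

shortest=${pdf#*_*_*_}    # shortest prefix matching *_*_*_ removed
longest=${pdf##*_*_*_}    # longest prefix matching *_*_*_ removed
key=${shortest%%_*}       # longest suffix matching _* removed

printf '%s\n' "$shortest" # 0271-95-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions.pdf
printf '%s\n' "$longest"  # Fusions.pdf
printf '%s\n' "$key"      # 0271-95-v5.6
```

The last line is the value the loop uses to find the directory: everything from the fourth underscore onward has been stripped, leaving only the part the PDF and directory names have in common.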