Locate the files in the first column and copy them to the folders in the second column

kenshinhimura · March 1, 2018, 12:14pm

#cat data.txt

file1   folder1
file2   thisforfile2
file3   thisfolderforfile3
lada4   folder4

step 1: first create the folders listed in column 2

for i in `awk '{print $2}' data.txt`
do
mkdir /home/data/$i
done

step 2: locate the files in column 1 and store the results in a file

for i in `awk '{print $1}' data.txt`
do
locate $i >> locate.txt
done

step 3:

HOW TO COPY NOW?

Expected Output:

output:

/home/data/folder1/file1
/home/data/thisforfile2/file2
/home/data/thisfolderforfile3/file3
/home/data/folder4/lada4

Aia · March 1, 2018, 1:19pm

Maybe, all in just one script?

Note: Not tested

set -x # to output to screen the execution of the script

while read f d
do
    if [[ -n $d ]] && [[ -n $f ]]
    then
        filepath=$(locate $f)
        if [[ -e $filepath ]]
        then
            mkdir -p "/home/data/$d" && cp "$filepath" "/home/data/$d"
        fi
    fi
done < data.txt

kenshinhimura · March 1, 2018, 1:58pm

Not working.
locate/find will locate the files anywhere in the system; file1 and file2 could be in different directories.

Aia · March 1, 2018, 2:06pm

Replace
filepath=$(location $f)
with
filepath=$(locate $f)

I am correcting a misspelling of the command in my original post.

abdulbadii · March 2, 2018, 4:04am

while read -r a b
do
 unset bd af
 [[ $b ]] && bd=$(locate "$b")
 [[ $bd ]] && { mkdir -p "/home/data/$b"; : ; } || { echo "Not exist $b"; exit; }
 [[ $a ]] && af=$(locate "$a" | head -n 1)   # first match only; locate may print several paths
 [[ $af ]] && cp -bfp "$af" "/home/data/$b"
done < data.txt

kenshinhimura · March 5, 2018, 12:41pm

It works now, but it only copies the last file to the last folder.

I was expecting something like this:

output:

/home/data/folder1/file1
/home/data/thisforfile2/file2
/home/data/thisfolderforfile3/file3
/home/data/folder4/lada4

But with your script I only get this:

/home/data/folder4/lada4

Aia · March 5, 2018, 1:04pm

Please, post the script that you are running.
Please, post the output from your screen of running the script.
I presume you have set -x in it; in fact, change that line to set -xv to be even more verbose, and post the result to help troubleshoot.

kenshinhimura · March 5, 2018, 1:54pm

+ read f d
+ [[ -n folder1 ]]
+ [[ -n file1 ]]
++ locate file1
+ filepath='/home/aaa/a/file1
/home/do/file1
/home/files/file1
/var/lib/mysql/ib_logfile1'
+ [[ -e /home/aaa/a/file1
/home/do/file1
/home/files/file1
/var/lib/mysql/ib_logfile1 ]]
+ read f d
+ [[ -n thisforfile2 ]]
+ [[ -n file2 ]]
++ locate file2
+ filepath='/home/bbb/11/22/file2
/home/data/thisforfile2
/home/files/file2'
+ [[ -e /home/bbb/11/22/file2
/home/data/thisforfile2
/home/files/file2 ]]
+ read f d
+ [[ -n thisfolderforfile3 ]]
+ [[ -n file3 ]]
++ locate file3
+ filepath='/home/data/thisfolderforfile3
/home/ttt/file3
/home/files/file3'
+ [[ -e /home/data/thisfolderforfile3
/home/ttt/file3
/home/files/file3 ]]
+ read f d
+ [[ -n folder4 ]]
+ [[ -n lada4 ]]
++ locate lada4
+ filepath=/home/vv/lada4
+ [[ -e /home/vv/lada4 ]]
+ mkdir -p /home/data/folder4
+ cp /home/vv/lada4 /home/data/folder4
+ read f d

Aia · March 5, 2018, 2:08pm

kenshinhimura:

+ read f d
+ [[ -n folder1 ]]
+ [[ -n file1 ]]
++ locate file1
+ filepath='/home/aaa/a/file1
/home/do/file1
/home/files/file1
/var/lib/mysql/ib_logfile1'
+ [[ -e /home/aaa/a/file1
/home/do/file1
/home/files/file1
/var/lib/mysql/ib_logfile1 ]] # This tests whether the whole multi-line string is an existing path; it is not, so the condition fails.

That shows the risk of using locate for what you want. It returns every path that contains the file name (captured here as a single string with embedded newlines). On purpose, I do not accept that result blindly; I check whether it is an actual existing path, and only /home/vv/lada4 passes the [[ -e ... ]] test.

kenshinhimura · March 5, 2018, 2:12pm

Yup, there are duplicate files in multiple locations. That is intentional, so they should be copied to the specified folder.

Aia · March 5, 2018, 2:28pm

You can always loop through the results of locate if you want to accept them as duplicates, but if they share the same file name you will just overwrite them at the destination. I suspect that is not even the case here, since a result could be a partial match or not a file at all:

/home/files/file1
/var/lib/mysql/ib_logfile1

/home/bbb/11/22/file2
/home/data/thisforfile2

/home/data/thisfolderforfile3
/home/ttt/file3

The point is that you do not have any guarantee that what locate is giving is what you expect.
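
One way to get some of that guarantee back is to filter the candidate paths before copying anything. The following is only a minimal sketch: the function name exact_matches is made up for illustration, and it assumes that paths arrive one per line on standard input (for instance from locate) and contain no newlines.

```shell
# exact_matches NAME: read candidate paths on stdin (one per line) and print
# only those that are regular files whose base name is exactly NAME.
# The function name is illustrative, not part of any tool.
exact_matches() {
    name=$1
    while IFS= read -r path; do
        [ "${path##*/}" = "$name" ] || continue   # drop partial matches such as ib_logfile1
        [ -f "$path" ] || continue                # drop directories and stale database entries
        printf '%s\n' "$path"
    done
}

# Possible usage (assumes locate is installed and its database is current):
#   locate file1 | exact_matches file1
```

Whatever survives the filter still has to be checked for duplicates, since several files with the same name would end up in the same destination.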

kenshinhimura · March 6, 2018, 1:51pm

It's really hard, right? That is just an example; the real data is 100 files.

What I have done in the past is run the two for loops and copy the files manually to the specific folders.

RudiC · March 6, 2018, 2:12pm

Your statement / request is not quite clear. You use locate yourself in post #1. Do you want all files located (including the ones whose file names are supersets of the search term) to be copied to the target directory given in your data file? If several files with identical file names exist, they will overwrite each other; which one should survive?
To give you a starting point, you might want to consider / analyse this:

while read FN DN
    do  [ -d $DN ] || echo mkdir $DN
        for LN in $(locate $FN)
          do    [ ! $FN = ${LN##*/} ] && continue
                echo cp $LN $DN
          done
    done < datafile

It will check the resp. file name against the one in datafile and copy only if identical, but will not check for overwriting. Directories are checked for existence and created if non-existent. It will of course depend on the locate-DB to be up to date, and on those names not containing white space as these would confuse the for loop.
Give it a try and comment back.

Aia · March 6, 2018, 3:00pm

Not truly hard. The hard part is for you to recognize the way that you discern what files need to be copied when you do it manually and communicate it in a way that can be translated into an automation script without getting unexpected results. I hope I have clearly pointed out that accepting the result from locate is not it.

For example, the snippet below might be all right if you can be sure that locate never returns a matching directory, never returns a partial match on a file or directory name, and never returns a path containing spaces. Otherwise, you MUST accommodate those conditions.

rudic:

To give you a starting point, you might want to consider / analyse this:

while read FN DN
    do  [ -d $DN ] || echo mkdir $DN
        for LN in $(locate $FN)
          do    [ ! $FN = ${LN##*/} ] && continue
                echo cp $LN $DN
          done
    done < datafile

Would this work?


homepath="/home/data"
while read f d; do
    if [[ -n $f ]] && [[ -n $d ]]; then
        paths=$(locate $f)
        while read -r p; do
            if [[ -f $p ]] && [[ ! -f ${homepath}/${d}/${p##*/} ]]; then
                 mkdir -p "${homepath}/${d}" && cp "$p" "${homepath}/${d}"
            fi
        done <<< "$paths"
    fi
done < data.txt

kenshinhimura · March 7, 2018, 11:25am

Instead of locate, I changed it to "find /specific_directory -name".
Hi Rudi, care to share why it worked? Do you mind explaining it line by line, please? Also, why while? Why not another for loop?
Thanks
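
The switch from locate to find mentioned above might look roughly like the sketch below. It is not the script that was actually run: the function name copy_listed and its three parameters are assumptions made for illustration, the destination root is assumed to lie outside the search root, and names are assumed to contain no white space or glob characters.

```shell
# copy_listed DATAFILE SEARCH_ROOT DEST_ROOT
# For each "filename folder" line in DATAFILE, copy every regular file under
# SEARCH_ROOT whose name is exactly filename into DEST_ROOT/folder.
# Function and parameter names are illustrative.
copy_listed() {
    datafile=$1 search_root=$2 dest_root=$3
    while read -r f d; do
        [ -n "$f" ] && [ -n "$d" ] || continue      # skip blank or incomplete lines
        mkdir -p "$dest_root/$d" || return 1
        # -type f excludes directories; -name matches the whole base name,
        # so ib_logfile1 is not picked up when looking for file1.
        find "$search_root" -type f -name "$f" -exec cp {} "$dest_root/$d/" \;
    done < "$datafile"
}

# Possible usage:
#   copy_listed data.txt /specific_directory /home/data
```

Unlike locate, find walks the live file system, so it does not depend on an up-to-date database. If several files share a name, later copies still overwrite earlier ones.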

---------- Post updated at 12:21 PM ---------- Previous update was at 12:18 PM ----------

This code confuses me. Is this:

while read f d

the same as this?

awk '{print $1, $2}'

---------- Post updated at 12:25 PM ---------- Previous update was at 12:21 PM ----------

Please explain why

while read f d

corresponds to column 1 and column 2 of a file. If you can explain that to me, I will not rely on for loops anymore; I'll start writing scripts with while. I can do if statements and for loops, simple scripting for a Linux admin, but that while is really helpful and I need to understand why it works.

Aia · March 7, 2018, 12:14pm

A for loop is _almost_ never a good choice to iterate over lines in a file, nor for dealing with file/directory paths since the default separator is white space. That means that if a filename or path contains spaces it might give you an undesirable result.

The read built-in command is good at reading lines and even better at splitting them into tokens.
If you give it one variable, it reads the whole line into it.
If you give it two variables, it puts the first whitespace-separated word into the first and the rest of the line into the second.
If you give it three variables, the first two words go into the first two variables and the rest of the line into the third.
And so on.
read doesn't loop; it reads one line per call. Hence the while loop, which calls read repeatedly until every line has been consumed.
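
The splitting rules can be tried out with a small demonstration. The helper name show_split is made up for this sketch, and the sample line is only an example.

```shell
# show_split LINE: print how read tokenizes LINE with one, two and three variables.
# The function name is illustrative.
show_split() {
    printf '%s\n' "$1" | { read -r whole;     echo "1 var : [$whole]"; }
    printf '%s\n' "$1" | { read -r f d;       echo "2 vars: [$f] [$d]"; }
    printf '%s\n' "$1" | { read -r f d rest;  echo "3 vars: [$f] [$d] [$rest]"; }
}

show_split 'file1   folder1 extra words'
# 1 var : [file1   folder1 extra words]
# 2 vars: [file1] [folder1 extra words]
# 3 vars: [file1] [folder1] [extra words]
```

Note that the last variable always receives whatever is left of the line, which is why a two-column file maps cleanly onto two variables.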

Summary:

Iterate over every line in datafile, splitting it into two tokens: f (filename) and d (directoryname). The names f and d are just placeholders; any variable names would do.

while read f d; do ... done < datafile

---------- Post updated at 10:14 AM ---------- Previous update was at 09:54 AM ----------

As for awk '{print $1, $2}': understanding it requires knowing a bit about how AWK works. In this example each record (line) is split into fields using white space as the field separator, and the command just outputs the values of fields one and two.
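
To compare the awk one-liner with the while read loop side by side, here is a small sketch. The two function names are hypothetical, and the file argument is assumed to be a two-column file like data.txt.

```shell
# Both functions print the first two whitespace-separated fields of each line.
# The function names are illustrative.
print_with_awk() {
    awk '{print $1, $2}' "$1"
}

print_with_read() {
    # rest swallows any extra columns so that d holds only the second field
    while read -r f d rest; do
        printf '%s %s\n' "$f" "$d"
    done < "$1"
}

# Possible usage:
#   print_with_awk data.txt
#   print_with_read data.txt
```

The difference is that awk only prints the fields, while the loop leaves them in shell variables that later commands such as mkdir and cp can use.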

RudiC · March 7, 2018, 4:06pm

Little to add to Aia's excellent explanation. man bash and man awk help, as almost always. My proposal was a "proof of concept", not a full blown solution to your (not quite understood) problem.

while read FN DN                                        # read TWO fields from stdin (here redirected from datafile: file name - directory name)
    do  [ -d $DN ] || echo mkdir $DN                    # test for existence of target directory; create if missing (remove echo)
        for LN in $(locate $FN)                         # loop through matching names from locate-DB
          do    [ ! $FN = ${LN##*/} ] && continue       # test for non-exact match; skip cp if not identical
                echo cp $LN $DN                         # copy located name to target directory (echo for testing purposes; remove if OK)
          done                                          # end of for loop
    done < datafile                                     # end of while loop incl. redirection from datafile

Chubler_XL · March 7, 2018, 4:56pm

Not sure how you want the script to react if locate identifies more than one exact match for a file. RudiC's current script will silently overwrite the file with each subsequent identified match.

If you would like to keep the first match found and then move on to the next file replace:

echo cp $LN $DN                    # copy located name to target directory (echo for testing purposes; remove if OK)

with

echo cp $LN $DN && break           # copy located name to target directory (echo for testing purposes; remove if OK) and move onto next file

You could also detect the file was already there and report some sort of warning like Warning: Not replacing existing file xxx with /path/to/xxx
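
The warning idea could be sketched as a small helper that replaces the plain cp call. The function name copy_no_clobber is an assumption for illustration, and the message follows the wording suggested above.

```shell
# copy_no_clobber SRC DESTDIR: copy SRC into DESTDIR unless a file with the
# same name is already there, in which case print a warning on stderr.
# The function name is illustrative.
copy_no_clobber() {
    src=$1 destdir=$2
    target=$destdir/${src##*/}
    if [ -e "$target" ]; then
        echo "Warning: Not replacing existing file ${src##*/} with $src" >&2
        return 1
    fi
    cp "$src" "$destdir/"
}

# Possible usage inside the for loop, in place of the cp line:
#   copy_no_clobber "$LN" "$DN"
```

With this helper the first match wins and every later duplicate is reported instead of silently overwriting it.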