Copy files based on specific word in a file name & its extension and putting it in required location

prajaktaraut · October 20, 2016, 7:40pm

Hello All,
I'm relatively new to shell scripting and need your guidance.
At the moment I copy files manually, based on a specific word in the file name and its extension, into a destination folder.
If the filename contains the word hyr and has a .md or .db extension, it goes to the TUM/HYR folder; if the extension is .txt, it should be converted to EBCDIC with the dd command and then moved to TUM/HYR.
If the filename contains the word par and has a .md or .db extension, it goes to the TUM/PAR folder; if the extension is .txt, it should be converted to EBCDIC with the dd command and then moved to TUM/PAR.
If the filename contains the word mar and has a .md or .db extension, it goes to the TUM/MAR folder; if the extension is .txt, it should be converted to EBCDIC with the dd command and then moved to TUM/MAR.

I have a folder which contains multiple subfolders.
The main folder is named MUM.
The subfolders are listed below (there are many more, but for testing I have listed only a few; the script should work for any number of folders):

HYR
PAR
MAR
SAR

Each of the above subfolders holds a few files. The file names are:

ttt_hyr_20162010.md
ttt_hyr_20162010.db
ttt_hyr_20162010.txt

ttt_par_20162010.md
ttt_par_20162010.db
ttt_par_20162010.txt

ttt_mar_20162010.md
ttt_mar_20162010.db
ttt_mar_20162010.txt

ttt_sar_20162010.md
ttt_sar_20162010.db
ttt_sar_20162010.txt

So, for example, the folder structure and the files in it look like this:

MUM/HYR/ttt_hyr_20162010.md
MUM/HYR/ttt_hyr_20162010.db
MUM/HYR/ttt_hyr_20162010.txt
MUM/PAR/ttt_par_20162010.md
MUM/PAR/ttt_par_20162010.db
MUM/PAR/ttt_par_20162010.txt

and so on..

I simply want to create a shell script that copies files from MUM/HYR to TUM/HYR, MUM/PAR to TUM/PAR, MUM/MAR to TUM/MAR and MUM/SAR to TUM/SAR, based on a word in the filename and its extension.

The script should read the MUM folder and each of its subfolders (e.g. HYR), pick the word hyr out of the file name, and copy the files that contain hyr and have a .md or .db extension into the output folder (TUM/HYR).
The same should happen for the other subfolders: for PAR, the script copies the files that contain par and have a .md or .db extension into the output folder TUM/PAR.
The .txt files in each subfolder should be converted from ASCII to EBCDIC and then moved to the matching output subfolder, based on the word in the file name: hyr files go to the TUM/HYR folder and par files go to the TUM/PAR folder.

After every file copy, an error or success log entry should be created.

After the Script is executed the output folder will look like below

TUM/HYR/ttt_hyr_20162010.md
TUM/HYR/ttt_hyr_20162010.db
TUM/HYR/ttt_hyr_20162010.dat
TUM/PAR/ttt_par_20162010.md
TUM/PAR/ttt_par_20162010.db
TUM/PAR/ttt_par_20162010.dat

and so on..

Below is the kind of code I need:

#!/bin/bash
DATESTMP="`date +%m%d%y%H%M`"
OGIT="/usr/local/testing/log.${DATESTMP}"
touch $OGIT
echo " SCRIPT STARTED AT `date` " >$OGIT
chmod 666 "$OGIT"
MRPT=/usr/local/resting




echo " ############# START OF AUTOMATION SCRIPT FOR MOVING FILES at `date` ################# "

cd MUM/HYR
if [filename contains hyr and has extension .md and db]; then
cp MUM/HYR/*.md TUM/HYR 
cp MUM/HYR/*.db TUM/HYR

if [filename contains hyr and has extension .txt]; then
dd  if=text.ascii of=text.ebcdic conv=ebcdic
cp MUM/HYR/*.txt TUM/HYR 

cd MUM/PAR
if [filename contains hyr and has extension .md and db]; then
cp MUM/PAR/*.md TUM/HYR 
cp MUM/PAR/*.db TUM/HYR


if [filename contains par and has extension .txt]; then
dd  if=text.ascii of=text.ebcdic conv=ebcdic
cp TUM/PAR/*.txt TUM/PAR


echo " ############# END OF AUTOMATION SCRIPT FOR MOVING FILES at `date` ################# "

The script should look in all the subfolders, copy the files that match on the word in the file name and on the extension, and put them into the output (destination) folder.

Chubler_XL · October 20, 2016, 8:28pm

This might get you started:

find MUM/HYR -type f -name "*hyr*" \( -name "*.db" -o -name "*.md" \) -print0 |
  xargs -r -0 cp -t TUM/HYR

cd MUM/HYR
for file in *hyr*.txt
do
   [ -f "$file" ] || continue
   dd if="$file" of="${file}.tmp" conv=ebcdic
   mv "${file}.tmp" ../../TUM/HYR/"$file"
done
cd ../..

The [ -f "$file" ] || continue line above handles the case where no *hyr*.txt files exist in the source folder (the loop would otherwise run once with the unexpanded pattern).
The -r option of xargs prevents the cp command from being run when the input is empty (no files found by the find command).
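
The two guards can be seen in isolation with a small experiment. This is only a sketch: it uses an empty scratch directory created with mktemp (not your real folders), and -r is an extension found in GNU xargs, so it may behave differently elsewhere.

```shell
#!/bin/bash
# Scratch layout for the demo only; the directory is deliberately empty.
work=$(mktemp -d)
mkdir -p "$work/MUM/HYR"
cd "$work/MUM/HYR" || exit 1

count=0
for file in *hyr*.txt
do
   # With no match the shell leaves the pattern itself in $file,
   # which is not a regular file, so the iteration is skipped.
   [ -f "$file" ] || continue
   count=$((count + 1))
done
echo "files processed: $count"

# With empty input, xargs -r does not run the command at all.
out=$(printf '' | xargs -r -0 echo "cp would have run")
echo "xargs output: '$out'"
```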

prajaktaraut · October 20, 2016, 9:04pm

Thanks Chubler for the quick response.
But the script above only looks in the MUM/HYR folder. I'm looking for a script that starts at the main folder MUM, goes through all the subfolders, and copies each file to its destination folder based on the word in the filename and the extension.
Please refer to the file names, their extensions and the related subfolders in my first post.
The script should be fully automated: whatever subfolders and files exist, the files are copied to the destination folders using the logic above.
Also, as each file is copied, an error or success log entry should be created.

Chubler_XL · October 20, 2016, 10:20pm

So what we should do is take those two conversion scripts that work for HYR and extend them to work on your 4 various types.

Using a for loop should get us there:

for typ in hyr par mar sar
do
  TYP=$(echo $typ | tr '[:lower:]' '[:upper:]')

  find MUM -type f -name "*${typ}*" \( -name "*.db" -o -name "*.md" \) -print0 |
  xargs -r -0 cp -t TUM/${TYP}

  find MUM -type f -name "*${typ}*.txt" -print0 | while IFS= read -r -d '' file
  do
      dd if="$file" of="${file}.tmp" conv=ebcdic
      dest=TUM/${TYP}/${file##*/}
      mv "${file}.tmp" "$dest"
  done
done 2>> ${OGIT}

Here we are also appending stderr to your OGIT logfile. If you are after a log of what was done (not just the failed stuff) consider adding the -v (verbose) option to the cp and mv commands above.
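
If one log line per file is wanted (success as well as failure), a small wrapper around cp is another option. This is a minimal sketch, not part of the loop above: copy_logged and LOG are made-up names (LOG plays the role of your OGIT file), and the demo runs in a scratch directory.

```shell
#!/bin/bash
# LOG is a hypothetical log file for the demo; the thread's script uses $OGIT.
LOG=$(mktemp)

# Copy one file and append an OK or FAILED line to $LOG.
# cp's own error message (if any) goes to the same log.
copy_logged() {
    local src dest
    src=$1
    dest=$2
    if cp -- "$src" "$dest" 2>>"$LOG"; then
        echo "$(date '+%F %T') OK     $src -> $dest" >>"$LOG"
    else
        echo "$(date '+%F %T') FAILED $src -> $dest" >>"$LOG"
        return 1
    fi
}

# Demo: one copy that works and one that fails (source does not exist).
work=$(mktemp -d)
mkdir -p "$work/MUM/HYR" "$work/TUM/HYR"
touch "$work/MUM/HYR/ttt_hyr_20162010.md"
copy_logged "$work/MUM/HYR/ttt_hyr_20162010.md" "$work/TUM/HYR/"
copy_logged "$work/MUM/HYR/missing.db" "$work/TUM/HYR/"
cat "$LOG"
```

The same pattern works for the dd conversion step by replacing cp with the dd command line.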

prajaktaraut · October 21, 2016, 7:35am

The challenge is that I don't have just 4 of them; I have more than 35:
HYR PAR MAR SAR and so on.
So the script needs to go through every subfolder: for example, search for the word hyr in the HYR folder, check for the .db and .md extensions, and copy the files to the destination folder.

RudiC · October 21, 2016, 8:19am

Are there ONLY those files that are to be transferred, or others as well that should remain? Is the file name structure always ccc_TYP_DATE.EXT ?
And, what bash version do you run?

Does the (uppercase) directory name always coincide with the filename's three char (lower case) TYP fragment?

RudiC · October 21, 2016, 3:20pm

Based on a few assumptions as answers to above questions, try

for i in MUM/*/*
  do    EXT=${i#*.}
        FP=${i%.*}
        [ $EXT == "txt" ] && { CV="conv=ebcdic"; EXT="dat"; } || CV=""
        echo dd if=$i of=${FP/MUM/TUM}.$EXT $CV
  done
dd if=MUM/HYR/ttt_hyr_20162010.db of=TUM/HYR/ttt_hyr_20162010.db
dd if=MUM/HYR/ttt_hyr_20162010.md of=TUM/HYR/ttt_hyr_20162010.md
dd if=MUM/HYR/ttt_hyr_20162010.txt of=TUM/HYR/ttt_hyr_20162010.dat conv=ebcdic
dd if=MUM/MAR/ttt_mar_20162010.db of=TUM/MAR/ttt_mar_20162010.db
dd if=MUM/MAR/ttt_mar_20162010.md of=TUM/MAR/ttt_mar_20162010.md
dd if=MUM/MAR/ttt_mar_20162010.txt of=TUM/MAR/ttt_mar_20162010.dat conv=ebcdic
dd if=MUM/PAR/ttt_par_20162010.db of=TUM/PAR/ttt_par_20162010.db
dd if=MUM/PAR/ttt_par_20162010.md of=TUM/PAR/ttt_par_20162010.md
dd if=MUM/PAR/ttt_par_20162010.txt of=TUM/PAR/ttt_par_20162010.dat conv=ebcdic
dd if=MUM/SAR/ttt_sar_20162010.db of=TUM/SAR/ttt_sar_20162010.db
dd if=MUM/SAR/ttt_sar_20162010.md of=TUM/SAR/ttt_sar_20162010.md
dd if=MUM/SAR/ttt_sar_20162010.txt of=TUM/SAR/ttt_sar_20162010.dat conv=ebcdic

You may want to play with dd 's status=xxx operand to influence the amount of info printed to stderr...
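
For illustration, a minimal sketch of that operand, assuming GNU dd (status=none suppresses the record counts and transfer statistics; older or non-GNU versions may not accept it). The input is a throwaway one-byte file, not one of your data files.

```shell
#!/bin/bash
# Scratch input for the demo only: a single ASCII "A".
work=$(mktemp -d)
printf 'A' > "$work/in.txt"

# Convert ASCII to EBCDIC; status=none keeps stderr quiet on success.
err=$(dd if="$work/in.txt" of="$work/out.dat" conv=ebcdic status=none 2>&1)
echo "dd stderr: '$err'"

# Show the converted byte in hex ("A" is 0xC1 in EBCDIC).
od -An -tx1 "$work/out.dat"
```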

prajaktaraut · October 21, 2016, 8:17pm

There can be multiple files, and the structure of the filename can differ, but it always has hyr in it. So if the filename contains the word hyr and has a .md or .db extension, it goes to the TUM/HYR folder; if the extension is .txt, it should be converted to EBCDIC with the dd command and then moved to TUM/HYR.
If the filename contains the word par and has a .md or .db extension, it goes to the TUM/PAR folder; if the extension is .txt, it should be converted to EBCDIC with the dd command and then moved to TUM/PAR.
If the filename contains the word mar and has a .md or .db extension, it goes to the TUM/MAR folder; if the extension is .txt, it should be converted to EBCDIC with the dd command and then moved to TUM/MAR.
Can anyone help me with the entire script so that I can test it?

One important thing I missed is; all these files and folders are in Hadoop HDFS.

Main folder is MUM
SUBFOLDERS ARE HYR PAR MAR... there can be multiple such subfolders
The subfolder names match the file names in them: if HYR is a subfolder, it will have files named like hyr_20162001.md and .db in it, so the script can pick the word hyr from the filename.

So the script will read the main MUM folder and all the subfolders one by one, and copy the files into the output directory, which is:
TUM/HYR for hyr files with .md .db
TUM/PAR for par files with .md .db
TUM/MAR for mar files with .md .db
TUM/SAR for sar files with .md .db
Remember that there can be other subfolders as well, but they follow the same standard: say I have another subfolder called ABC; the files in it will be abc_20162019.db and .md, and they will be copied into the TUM/ABC folder.

Note that all these files and folders are in Hadoop, so Hadoop commands will be needed. I'm not sure whether Unix commands such as find will work.

RudiC · October 22, 2016, 4:11am

Just repeating the info given in post#1 doesn't really help. Info on the contents of directories might be extremely beneficial, like will there be "hyr" files in the "PAR" subdir, and e.g. a (partial) directory listing.
Adding lately that *nix commands probably don't work is, hmmm, debatable.
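
If the files really are reachable only through the HDFS client, something along these lines might be a starting point. It is a dry run in the same spirit as an echo dd test: it only prints the commands it would issue. Assumptions: the hdfs dfs client is installed, its -cp, -cat and -put - (read from stdin) subcommands behave as in the Hadoop FileSystem shell documentation, and the path list would in practice come from a listing such as hdfs dfs -ls. plan_hdfs_copy is a made-up name, and the check that the file name contains the directory's word is left out for brevity.

```shell
#!/bin/bash
# Read HDFS paths (one per line) on stdin and print the hdfs commands
# that would copy .db/.md files and convert .txt files to EBCDIC .dat.
# Nothing is executed; this is a dry run.
plan_hdfs_copy() {
    local path dir base dest
    while IFS= read -r path
    do
        dir=${path%/*}          # e.g. MUM/HYR
        base=${path##*/}        # e.g. ttt_hyr_20162010.txt
        dest=TUM/${dir#MUM/}    # e.g. TUM/HYR
        case $base in
            *.db|*.md)
                echo "hdfs dfs -cp $path $dest/$base" ;;
            *.txt)
                echo "hdfs dfs -cat $path | dd conv=ebcdic | hdfs dfs -put - $dest/${base%.txt}.dat" ;;
        esac                    # anything else is ignored
    done
}

# Demo with a hand-written path list instead of a real HDFS listing.
plan=$(printf '%s\n' \
    MUM/HYR/ttt_hyr_20162010.db \
    MUM/HYR/ttt_hyr_20162010.txt \
    MUM/HYR/notes.log | plan_hdfs_copy)
echo "$plan"
```

Once the printed commands look right, they can be run for real (for example by piping the output to a shell), with stderr appended to a log file.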

prajaktaraut · October 22, 2016, 6:09am

Apologies RudiC; to answer your question:
hyr files will be in the HYR subfolder, par files will be in the PAR subfolder, and so on. I'm not sure whether Unix commands will work or not. The main folder MUM, its subfolders and the files are all in Hadoop, and I want to copy them to the destination folders mentioned in my earlier post.
Hope that clarifies things.

prajaktaraut · October 23, 2016, 8:23am

Hello all,
A quick solution will let me move ahead...
Do share a script for further analysis

Don_Cragun · October 23, 2016, 7:15pm

The UNIX & Linux Forums is here to help you learn how to write your own code; the volunteers here are not your unpaid programming staff.

RudiC has given you code that you should be able to mold into something that will meet your requirements. RudiC also asked questions that you have not answered that would provide information that would greatly simplify code that I would write if I were trying to solve your problem. You were asked to show us the output of ls (which would clearly demonstrate the format of your filenames). You have shown us one relative pathname ( TUM/ABC/abc_20162019.db ), if all of the filenames you want to process follow this filename format (i.e., 3 lowercase letters; <underscore>; 8 decimal digits; and one of the three filename extensions: .db , .md , and .txt ), then code could be significantly simplified. If we knew that there are no files in those directories that do not conform to the above filename format, code could be simpler still.

If you would like our help to complete your project, please answer the questions that have been asked, show us the code you have tried, show us what is working, and explain clearly what still needs to be done.

prajaktaraut · October 23, 2016, 7:55pm

Apologies, Don Cragun and RudiC, but I did reply to the questions raised by RudiC. The file names look like project_abc_20162012.db, with .md and .txt versions as well.
To reiterate: the main folder is MUM and the subfolder is ABC. This subfolder will have 3 or more files, and these are copied to the output folder TUM/ABC by searching for the word abc in the filename and for the .db and .md extensions.
The folder can contain other files as well, but they stay where they are; only the files that match the above criteria move to the output folders.
As of now I'm copying these files manually.
I search the MUM/ABC folder for files that contain the word abc in the filename and have a .md or .db extension, and copy those to the TUM/ABC folder.
I repeat the same process for the other subfolders: PAR, SAR, MAR.
Key point: each folder holds files that follow the matching naming convention, e.g. the PAR subfolder has project_par_20162010.db, .md and .txt files in it, which have to be moved to the TUM/PAR destination folder.
I hope this clarifies things.

Most important thing- all these files and folders are in Hadoop HDFS.

Sincere apology for the trouble...

RudiC · October 24, 2016, 7:43am

Try to replace for i in MUM/*/* in post#7 with

for i in MUM/*/*.{db,md,txt}

and post where this approach doesn't meet your needs. This is all I can (and will) do until you stop repeating the well known info and give the necessary ones.
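
For what it's worth, here is one way the pieces discussed in this thread might be combined for a local (non-HDFS) directory tree. It is a sketch under assumptions, not a finished solution: the lowercase word is derived from each subfolder name, converted .txt files get a .dat extension, and transfer_tree plus the log format are made up for illustration. dd's record counts also land in the log because its stderr is appended there.

```shell
#!/bin/bash
# transfer_tree SRC DST LOG
# For every SRC/<DIR>/ copy files whose name contains <dir> (lowercase)
# and ends in .db or .md to DST/<DIR>/; convert matching .txt files to
# EBCDIC and store them as DST/<DIR>/<name>.dat. Other files are left alone.
transfer_tree() {
    local src dst log dir name typ f base out rc
    src=$1; dst=$2; log=$3
    for dir in "$src"/*/
    do
        [ -d "$dir" ] || continue
        dir=${dir%/}
        name=${dir##*/}                                   # e.g. HYR
        typ=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
        mkdir -p "$dst/$name"
        for f in "$dir"/*"$typ"*.db "$dir"/*"$typ"*.md "$dir"/*"$typ"*.txt
        do
            [ -f "$f" ] || continue                       # unmatched pattern
            base=${f##*/}
            if [ "${base##*.}" = txt ]; then
                out=$dst/$name/${base%.txt}.dat
                dd if="$f" of="$out" conv=ebcdic 2>>"$log"
                rc=$?
            else
                out=$dst/$name/$base
                cp -- "$f" "$out" 2>>"$log"
                rc=$?
            fi
            if [ "$rc" -eq 0 ]; then
                echo "OK $f -> $out" >>"$log"
            else
                echo "FAILED $f" >>"$log"
            fi
        done
    done
}

# Demo on a scratch tree (made-up test data).
work=$(mktemp -d)
mkdir -p "$work/MUM/HYR" "$work/MUM/PAR"
printf 'A' > "$work/MUM/HYR/ttt_hyr_20162010.txt"
touch "$work/MUM/HYR/ttt_hyr_20162010.md" "$work/MUM/HYR/ttt_hyr_20162010.db"
touch "$work/MUM/HYR/other.md" "$work/MUM/PAR/project_par_20162010.db"
transfer_tree "$work/MUM" "$work/TUM" "$work/transfer.log"
ls -R "$work/TUM"
```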