Is there a better way to copy a list of multiple files, rename and gzip them?

dotran · January 31, 2015, 7:20pm

Is there a better way to write this script? It loops through the "Old_File_1:" and "New_File_1:" entries one by one to copy 100 files to the /staging/test folder, then renames and gzips all of those files. I wrote the code below and don't like it much. Thanks.
I have a control file under the /tmp/test folder listing 100 files, like the one below, where {DATE_FILE} = YYYYMMDD.

/tmp/test: cat controlfile
Old_File_1: AB_P_Cdf_{DATE_FILE}.txt 
Old_File-2: DD_P_DAdf_{DATE_FILE}.txt
Old_File-3: dsf_P_DEMO_{DATE_FILE}.txt
Old_File-4: sdfd_P_Pd_{DATE_FILE}.txt
bla bla until Old_File-100
 
New_File_1: test1_sd_WW_{DATE_FILE}.txt
New_File_2: test2vd_WW_new_{DATE_FILE}.txt
New_File_3: test3cfd_dfP_dff_{DATE_FILE}.txt
New_File_4: test4gdd_WW_P_OdfsDUCT_{DATE_FILE}.txt
bla bla until New_File-100

 
#!/bin/ksh
DATE="$1"
Old_File=`cat /tmp/test/controlfile | grep Old_File | awk '{print $2}' | sed "s/{DATE_FILE}/${DATE}/g"`
#Old_File=`cat /tmp/test/controlfile | grep Old_File | awk '{print $2}' | sed "s/{DATE_FILE}/${1}/"`
New_File=`cat /tmp/test/controlfile | grep New_File | awk '{print $2}' | sed "s/{DATE_FILE}/${DATE}/g"`
#New_File=`cat /tmp/test/controlfile | grep New_File | awk '{print $2}' | sed "s/{DATE_FILE}/${1}/"`
 
###################################
# Copy file to /staging/test folder
###################################
cd /tmp/test
#Old_File=`cat /tmp/test/controlfile | grep Old_File | awk '{print $2}' | sed "s/{DATE_FILE}/${DATE}/g`
for i in $Old_File;
do
cp $i /staging/test;
done
 
###################################
# Rename all Old File to New File
###################################
cd /staging/test
cat /tmp/test/controlfile | grep Old_File | awk '{print $2}' | sed "s/{DATE_FILE}/${DATE}/g" > /staging/test/file1.txt
cat /tmp/test/controlfile | grep New_File | awk '{print $2}' | sed "s/{DATE_FILE}/${DATE}/g" > /staging/test/file2.txt
 
paste -d" " file1.txt file2.txt > file3.txt
sed 's/^/mv /' file3.txt > file4.txt
chmod 775 file4.txt
./file4.txt
rm -f file*.txt
###################################
# Gzip all New File
###################################
for i in $New_File;
do
gzip $i
done

Don_Cragun · February 1, 2015, 1:48am

This seems to run a little bit faster than your script and should produce the same results:

#!/bin/ksh
date="${1:-$(date '+%Y%m%d')}"	# Date to process ($1 or today if no operands specified)
from='/tmp/test'		# Source directory
to='/staging/test'		# Target directory

cd "$to"
awk -v date="$date" -v from="$from" -v to="$to" '
{	sub(/[{]DATE_FILE[}]/, date)	# Replace "{DATE_FILE}" with desired date
}
/^Old_File/ {
	o[++oc] = $2	# Accumulate old file names.
	next
}
/^New_File/ {
	# Process new file names...
	++nc	# Increment # of new file names seen
	printf("cp %s/%s %s\n", from, o[nc], $2)	# Print cp command
	printf("gzip %s\n", $2)	# Print gzip command
}' "$from/controlfile" | ksh

I would suggest that you remove the | ksh at the end of the script to see the cp and gzip commands that the script will produce. Then, if the commands look right, put the pipe through the shell back in to actually execute the commands instead of just printing them.
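The preview-then-run advice can also be built into the script as a switch. This is only a minimal sketch: DRY_RUN and gen_cmds are hypothetical names that do not appear in the script above, and the one-pair control file is created in a temporary file purely so the example runs on its own.

```shell
#!/bin/sh
# Sketch only: DRY_RUN and gen_cmds are hypothetical names, not from the thread.
DRY_RUN="${DRY_RUN:-1}"          # 1 = only print the commands, 0 = execute them

# Read a control file on stdin and print one cp and one gzip command per pair.
gen_cmds() {
    awk -v date="$1" -v from="$2" '
    { sub(/[{]DATE_FILE[}]/, date) }          # fill in the date placeholder
    /^Old_File/ { o[++oc] = $2; next }        # remember old names in order
    /^New_File/ { ++nc
                  printf("cp %s/%s %s\n", from, o[nc], $2)
                  printf("gzip %s\n", $2) }'
}

# Tiny sample control file so the sketch is self-contained.
ctl=$(mktemp)
printf '%s\n' 'Old_File_1: AB_P_Cdf_{DATE_FILE}.txt' \
              'New_File_1: test1_sd_WW_{DATE_FILE}.txt' > "$ctl"

if [ "$DRY_RUN" = 1 ]; then
    gen_cmds 20150109 /tmp/test < "$ctl"          # preview only
else
    gen_cmds 20150109 /tmp/test < "$ctl" | sh     # really run cp and gzip
fi
rm -f "$ctl"
```

With the default DRY_RUN=1 this prints the cp line and the gzip line for the sample pair and touches no files. Setting DRY_RUN=0 pipes the same text into the shell.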

If you want to run this on a Solaris/SunOS system, change awk in the script to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
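If the same script has to run on more than one system, the awk choice can be probed at run time instead of edited by hand. The following is a rough sketch, assuming that an awk which accepts a -v assignment is good enough for the script above. AWK is a hypothetical variable name, and the candidate list simply repeats the paths mentioned in this post.

```shell
#!/bin/sh
# Sketch only: AWK is a hypothetical variable name.
AWK=
for cand in /usr/xpg4/bin/awk /usr/xpg6/bin/awk nawk awk; do
    # Probe with a -v assignment; a missing binary or an awk that rejects -v
    # produces no "ok" and is skipped.
    probe=$( { "$cand" -v x=ok 'BEGIN { print x }'; } 2>/dev/null )
    if [ "$probe" = ok ]; then
        AWK=$cand
        break
    fi
done
[ -n "$AWK" ] || { echo "no suitable awk found" >&2; exit 1; }
echo "using $AWK"
```

The script would then call "$AWK" -v date="$date" ... wherever it currently calls awk.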

dotran · February 1, 2015, 2:49am

Thank you for your new code, Mr. Don, but I could not make it work. I tried all the options below and still can't get past that awk command.

 
#!/bin/ksh
#date="${1:-$(date '+%Y%m%d')}" # Date to process ($1 or today if no operands specified)
date="${1}"                     # Date to process (taken from $1; no default)
from='/tmp/test'  # Source directory
to='/staging/test'  # Target directory
cd "$to"
#nawk -v date="$date" -v from="$from" -v to="$to" '
#/usr/bin/awk -v date="$date" -v from="$from" -v to="$to" '
#/iusr/xpg4/bin/awk -v date="$date" -v from="$from" -v to="$to" '
#/usr/xpg6/bin/awk -v date="$date" -v from="$from" -v to="$to" '
/usr/bin/awk -v date="$date" -v from="$from" -v to="$to" '
{       sub(/[{]DATE_FILE[}]/, date)    # Replace "{DATE_FILE}" with desired date
}
/^Old_File/ {
        o[++oc] = $2    # Accumulate old file names.
        next
}
/^New_File/ {
        # Process new file names...
        ++nc    # Increment # of new file names seen
        printf("cp %s/%s %s\n", from, o[nc], $2)        # Print cp command
        printf("gzip %s\n", $2) # Print gzip command
}' "$from/controlfile" | ksh

./test4.ksh 20150109
awk: syntax error near line 1
awk: bailing out near line 1

 
uname -a
SunOS test 5.10 Generic_147147-26 sun4v sparc sun4v

Don_Cragun · February 1, 2015, 2:54am

Sorry,
I had a typo; it should have been /usr/xpg4/bin/awk instead of /iusr/xpg4/bin/awk. But I'm very surprised that nawk didn't work for you. (What diagnostics did you get when you tried nawk and /usr/xpg6/bin/awk?)

dotran · February 1, 2015, 3:14am

Thanks, Mr. Don. I did try nawk and it worked great. Thanks very much!

/usr/bin/nawk -v date="$date" -v from="$from" -v to="$to"

RudiC · February 1, 2015, 8:19am

In lieu of relying on the correct sequence of filenames in the control file, using file numbering might be more reliable (an assumption that is not true looking at the control file in post#1). Try

awk -vDATE="20150201" -vfrom="/tmp/" -vto="/staging/" \
        '       {sub(/[{]DATE_FILE[}]/, DATE)
                 ONW=("Old"==substr ($1,1,3))
                 FIX=substr($1,5)
                 CPARR[FIX] = CPARR[FIX] (ONW?"cp " from:" "to) $2 
                }
         END    {for (c in CPARR) print CPARR[c]}
        ' file
cp /tmp/AB_P_Cdf_20150201.txt /staging/test1_sd_WW_20150201.txt
cp /tmp/DD_P_DAdf_20150201.txt /staging/test2vd_WW_new_20150201.txt
cp /tmp/dsf_P_DEMO_20150201.txt /staging/test3cfd_dfP_dff_20150201.txt
cp /tmp/sdfd_P_Pd_20150201.txt /staging/test4gdd_WW_P_OdfsDUCT_20150201.txt
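The control file in post #1 mixes labels such as Old_File_1: and Old_File-2:, so a key built from the label text would not match Old_File-2 with New_File_2. A possible variation on the same idea keys on the digits only and prints the pairs in numeric order. This is a sketch, not a drop-in replacement: pair_by_number, src and dst are hypothetical names, and it joins directory and file name with "/" rather than relying on a trailing slash in from and to.

```shell
#!/bin/sh
# Sketch only: pair_by_number is a hypothetical helper name.
# Reads a control file on stdin; arguments are date, source dir, target dir.
pair_by_number() {
    awk -v date="$1" -v from="$2" -v to="$3" '
    { sub(/[{]DATE_FILE[}]/, date) }     # fill in the date placeholder
    {
        n = $1
        gsub(/[^0-9]/, "", n)            # keep only the digits of the label
        n += 0                           # use them as a numeric key
        if (n > max) max = n
    }
    /^Old_File/ { src[n] = $2 }
    /^New_File/ { dst[n] = $2 }
    END {
        # Walk the keys in numeric order; skip numbers missing on either side.
        for (i = 1; i <= max; i++)
            if ((i in src) && (i in dst))
                printf("cp %s/%s %s/%s\n", from, src[i], to, dst[i])
    }'
}

# Usage with the mixed "_" and "-" labels, deliberately out of order:
printf '%s\n' 'New_File_2: test2vd_WW_new_{DATE_FILE}.txt' \
              'Old_File_1: AB_P_Cdf_{DATE_FILE}.txt' \
              'Old_File-2: DD_P_DAdf_{DATE_FILE}.txt' \
              'New_File_1: test1_sd_WW_{DATE_FILE}.txt' |
    pair_by_number 20150201 /tmp /staging
```

Because the pairs are looked up by number, the order of the lines in the control file no longer matters.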

dotran · February 2, 2015, 11:26pm

Please help with this code. I have spent a couple of days trying all kinds of things and can't figure out how to change it to copy, rename and pkzip all the files instead of copying, renaming and gzipping them. The code below works really well, but I am required to pkzip the files instead of gzipping them, like this:

pkzip -add test1_sd_WW_{DATE_FILE}.zip AB_P_Cdf_{DATE_FILE}.txt

 
#!/bin/ksh
date="${1}"                     # Date to process (taken from $1; no default)
from='/tmp/test'  # Source directory
to='/staging/test'  # Target directory
cd "$to"
/usr/bin/nawk -v date="$date" -v from="$from" -v to="$to" '
{       sub(/[{]DATE_FILE[}]/, date)    # Replace "{DATE_FILE}" with desired date
}
/^Old_File/ {
        o[++oc] = $2    # Accumulate old file names.
        next
}
/^New_File/ {
        # Process new file names...
        ++nc    # Increment # of new file names seen
        printf("cp %s/%s %s\n", from, o[nc], $2)        # Print cp command
        printf("gzip %s\n", $2) # Print gzip command
}' "$from/controlfile" | ksh

Output:

 
test1_sd_WW_{DATE_FILE}.zip
test2vd_WW_new_{DATE_FILE}.zip
test3cfd_dfP_dff_{DATE_FILE}.zip
test4gdd_WW_P_OdfsDUCT_{DATE_FILE}.zip
bla bla until New_File-100

Don_Cragun · February 3, 2015, 8:14am

I don't understand what you're trying to do.

Do you mean that you want to use pkzip to create a zip file for each input file using its OLD filename with its filename extension replaced by .zip as the name of the zip file and the contents of that zip file will be the compressed NEW filename?

Do you mean that you want to use pkzip to create a zip file for all input files for a given date using the name of the first OLD filename with its filename extension replaced by .zip as the name of the zip file and the contents of that zip file will be all of the compressed NEW filenames for the given date?

If you're going to the bother of renaming your files, why should the zip file be named to an untranslated OLD name instead of the corresponding NEW name? If you're creating one zip file for all of the files for a given date, why not just use YYYYMMDD.zip (where YYYYMMDD corresponds to the date of the files in that zip file) as the zip file name?

dotran · February 3, 2015, 11:07am

Thanks very much, Mr. Don, for replying on this subject, because I can't figure out the best way to use pkzip instead of gzip. The output should look like the listing below.

You asked: "Do you mean that you want to use pkzip to create a zip file for each input file using its OLD filename with its filename extension replaced by .zip as the name of the zip file and the contents of that zip file will be the compressed NEW filename?"
Yes: pkzip the files one by one, with each output named after the new name and without the .txt extension. Thanks.

/tmp/test: cat controlfile
Original_File_1: AB_P_Cdf_{OLD_DATE}.txt 
Original_File-2: DD_P_DAdf_{OLD_DATE}.txt
Original_File-3: dsf_P_DEMO_{OLD_DATE}.txt
Original_File-4: sdfd_P_Pd_{OLD_DATE}.txt
bla bla until Old_File-100
 
New_File_1: test1_sd_WW_{NEW_DATE}.txt
New_File_2: test2vd_WW_new_{NEW_DATE}.txt
New_File_3: test3cfd_dfP_dff_{NEW_DATE}.txt
New_File_4: test4gdd_WW_P_OdfsDUCT_{NEW_DATE}.txt
bla bla until New_File-100

 
#!/bin/ksh
TMP_DIR="$1"
CONTROL_FILE="$2"
ORIGINAL_DATE="$3"
TARGET_DATE="$4"
WORK_DIR="$5"
 
#TMP_DIR='/tmp/test'
#WORK_DIR='/staging/test'
 
cd "$TMP_DIR"
/usr/bin/nawk -v TMP_DIR="$1" -v CONTROL_FILE="controlfile" -v ORIGINAL_DATE="$3" -v TARGET_DATE="$4" -v WORK_DIR="$5" '
{       sub(/[{]OLD_DATE[}]/, ORIGINAL_DATE)
}
{       sub(/[{]NEW_DATE[}]/, TARGET_DATE)
}
/^Original_File/ {
        o[++oc] = $2
        next
}
/^New_File/ {
        # Process new file names...
        ++nc    
        printf("cp %s/%s %s\n", WORK_DIR, o[nc], $2)
        #printf("gzip %s\n", $2) 
        # TODO: replace the gzip line above with a pkzip command for each file, with the output named after the new name
}' "$WORK_DIR/$CONTROL_FILE" | ksh

./test.ksh /tmp/test controlfile 20150109 20150230 /staging/test

 
Example pkzip one by one....
pkzip -add test1_sd_WW_{NEW_DATE}.zip AB_P_Cdf_{OLD_DATE}.txt
pkzip -add test2vd_WW_new_{NEW_DATE}.zip DD_P_DAdf_{OLD_DATE}.txt
pkzip -add test3cfd_dfP_dff_{NEW_DATE}.zip dsf_P_DEMO_{OLD_DATE}.txt
pkzip -add test4gdd_WW_P_OdfsDUCT_{NEW_DATE}.zip sdfd_P_Pd_{OLD_DATE}.txt
bla bla until New_File-100 or 200 files

Output:

test1_sd_WW_{NEW_DATE}.zip
test2vd_WW_new_{NEW_DATE}.zip
test3cfd_dfP_dff_{NEW_DATE}.zip
test4gdd_WW_P_OdfsDUCT_{NEW_DATE}.zip
bla bla until New_File-100

sea · February 3, 2015, 12:20pm

This works for me locally:

#!/bin/bash
WORK_DIR="." #"$1"
CONTROL_FILE=controlfile #"$2"
ORIGINAL_DATE=2014.06.13 #"$3"
TARGET_DATE=2015.02.03 #"$4"
declare -i C	# Counter

cd "$WORK_DIR"
C=1

grep "^Original_File[_-]" "$CONTROL_FILE" | \
	while read -r id entry ; do
		# Match up to the colon so that New_File_1 does not also match New_File_10 or New_File_100.
		newFile=$(grep "^New_File[_-]${C}:" "$CONTROL_FILE" | awk '{print $2}')
		C=$(( C + 1 ))
		
		# Change to false if there are no STRINGS OLD_DATE or NEW_DATE in the CONTROL_FILE.
		# (Use the true/false commands here; "[ false ]" would still succeed.)
		if true
		then	entry="${entry/\{OLD_DATE\}/$ORIGINAL_DATE}"
			newFile="${newFile/\{NEW_DATE\}/$TARGET_DATE}"
		fi
		
		echo cp "${entry}" "${newFile}"
		# Replace only a trailing .txt with .zip.
		echo pkzip -add "${newFile%.txt}.zip" "$entry"
		echo
	done

exit 0

Outputs as:

$ sh dotran-sea

cp AB_P_Cdf_2014.06.13.txt test1_sd_WW_2015.02.03.txt
pkzip -add test1_sd_WW_2015.02.03.zip AB_P_Cdf_2014.06.13.txt

cp DD_P_DAdf_2014.06.13.txt test2vd_WW_new_2015.02.03.txt
pkzip -add test2vd_WW_new_2015.02.03.zip DD_P_DAdf_2014.06.13.txt

cp dsf_P_DEMO_2014.06.13.txt test3cfd_dfP_dff_2015.02.03.txt
pkzip -add test3cfd_dfP_dff_2015.02.03.zip dsf_P_DEMO_2014.06.13.txt

cp sdfd_P_Pd_2014.06.13.txt test4gdd_WW_P_OdfsDUCT_2015.02.03.txt
pkzip -add test4gdd_WW_P_OdfsDUCT_2015.02.03.zip sdfd_P_Pd_2014.06.13.txt

Hope this helps
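For anyone who prefers to stay with the nawk approach from earlier in the thread, the same commands can be generated in a single pass over the control file. This is a sketch under assumptions: gen_zip_cmds is a hypothetical helper name, the "pkzip -add archive file" form is copied from the examples in this thread rather than checked against pkzip itself, and the function only prints the commands. On Solaris, awk would be swapped for nawk as discussed above.

```shell
#!/bin/sh
# Sketch only: gen_zip_cmds is a hypothetical helper name.
# Reads a control file on stdin; arguments are the old date and the new date.
gen_zip_cmds() {
    awk -v old_date="$1" -v new_date="$2" '
    { sub(/[{]OLD_DATE[}]/, old_date)      # fill in both date placeholders
      sub(/[{]NEW_DATE[}]/, new_date) }
    /^Original_File/ { o[++oc] = $2; next } # remember original names in order
    /^New_File/ {
        ++nc
        zip = $2
        sub(/[.]txt$/, ".zip", zip)         # swap only a trailing .txt
        printf("pkzip -add %s %s\n", zip, o[nc])
    }'
}

# Usage with one pair from the control file in this thread:
printf '%s\n' 'Original_File_1: AB_P_Cdf_{OLD_DATE}.txt' \
              'New_File_1: test1_sd_WW_{NEW_DATE}.txt' |
    gen_zip_cmds 20150109 20150203
```

Like the first awk script in the thread, this pairs entries by position, so the Original_File and New_File lines must appear in the same order. Once the printed commands look right, the output can be piped into ksh.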

dotran · February 3, 2015, 12:49pm

Thanks very much, Sea, Don and RudiC. Your code worked really well. I spent 2 days on this and couldn't figure out how to accomplish the task. You guys are really helpful and have shown me a new way to write the code. Again, thanks all!