Hi Chubler_XL,
Thanks, I will try this now and let you know how it works. I will write it as a script that I can call from my DataStage process, and then post the results here.
In the meantime, if others have any other suggestions, please keep posting; I will try everything. This is really great.
Thanks again for everyone's help.
---------- Post updated 11-23-14 at 02:06 AM ---------- Previous update was 11-22-14 at 07:38 PM ----------
Hi, this is what I have done.
I'm using a 3 GB file to test my process, but the script hangs after this line: echo "Checking ${2} file size now:"
I'm not sure what to do; please correct me if I've done something wrong.
Here is the script:
#!/bin/bash
# usage:
# check for input:
if [ ! $# == 3 ]; then
echo "Input Parameter missing."
fi
#Main Logic Begins:
clear
#echo Input Parameters:
echo "**********************************************************************************************"
echo "Main Source file is located in: $1 \n"
echo "Currently processing file: $2 \n"
echo "All the split files will be located at: $3 \n"
echo "**********************************************************************************************"
#Check if Split file directory exists:
if [ -d "${3}" ];
then
echo "Split file directory Exist, So deleting Directory and its contents \n"
rm -rf ${3};
else
echo "No Split file directory present \n";
fi
# Create New directory to place split files
echo
echo "Create New directory to place split files. \n"
mkdir ${3}
chmod 777 ${3}
if [ -d "${3}" ];
then
echo "Split file directory created successfully \n"
echo "Split file directory Permission set to 777 \n"
else
echo "Split File Directory creation failed \n";
fi
# Check input file size:
echo "Checking ${2} file size now:"
for ifile in ${2}
do
ipsize=$(istat "$ifile" | awk '/Length/ {print $(NF-1)}')
echo "Total file size in Byetes: $ipsize \n"
if [ $ipsize -gt 1000000000 ]
then
lines=$(wc -l < "$iflie")
let avg=ipsize/lines
let splitcount=5000000000/avg
split -l $splitcount -a1 -verbose "$ifile" "${3}/TT_$2"
fi
done
echo "Total Row Count in ${2}: $lines \n"
echo "Average Row lenght in ${2}: $avg \n"
echo "Row count per split file is: $splitcount \n"
echo "Total split files and row counts \n"
wc -l ${3}/TT_$2*
---------- Post updated at 02:43 AM ---------- Previous update was at 02:06 AM ----------
Hi,
I made some changes to the script because the split command didn't work properly. Now it's working fine:
#!/bin/bash
# usage:
# sh ./[script] [source dir] [input file] [split dir]
# check for input:
if [ $# -ne 3 ]; then
echo "Input Parameter missing."
# Stop here; the rest of the script needs all three parameters
exit 1
fi
#Main Logic Begins:
clear
#echo Input Parameters:
echo "**********************************************************************************************"
echo "Main Source file is located in: $1 \n"
echo "Currently processing file: $2 \n"
echo "All the split files will be located at: $3 \n"
echo "**********************************************************************************************"
#Check if Split file directory exists:
if [ -d "${3}" ];
then
echo "Split file directory Exist, So deleting Directory and its contents \n"
rm -rf "${3}";
else
echo "No Split file directory present \n";
fi
# Create New directory to place split files
echo
echo "Create New directory to place split files. \n"
mkdir "${3}"
chmod 777 "${3}"
if [ -d "${3}" ];
then
echo "Split file directory created successfully \n"
echo "Split file directory Permission set to 777 \n"
else
echo "Split File Directory creation failed \n";
fi
# Check input file size:
echo "Checking ${2} file size now:"
for ifile in ${2}
do
ipsize=$(istat "$ifile" | awk '/Length/ {print $(NF-1)}')
echo "Total file size in Byetes: $ipsize \n"
if [ "$ipsize" -gt 1000000000 ]
then
lines=$(wc -l < "$ifile")
echo "Total Row Count in ${2}: $lines \n"
let avg=ipsize/lines
echo "Average Row lenght in ${2}: $avg \n"
let splitcount=1000000000/avg
echo "Row count per split file is: $splitcount \n"
split -l $splitcount "$ifile" "${3}/TT_$2"
#-a1 --verbose
echo "Total split files and row counts \n"
wc -l "${3}/TT_$2"*
fi
done
And then I get the following results:
**********************************************************************************************
Main Source file is located in: /some/dir/path
Currently processing file: inputfile.dat
All the split files will be located at: /some/dir/path/splitdir
**********************************************************************************************
Split file directory Exist, So deleting Directory and its contents
Create New directory to place split files.
Split file directory created successfully
Split file directory Permission set to 777
Checking inputfile.dat file size now:
Total file size in Byetes: 3329056768
Total Row Count in inputfile.dat: 2684723
Average Row lenght in inputfile.dat: 1240
Row count per split file is: 806451
Total split files and row counts
806451 /some/dir/path/splitdir/TT_inputfile.dataa
806451 /some/dir/path/splitdir/TT_inputfile.datab
806451 /some/dir/path/splitdir/TT_inputfile.datac
265370 /some/dir/path/splitdir/TT_inputfile.datad
2684723 total
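The figures in this run can be checked by hand. Here is a minimal sketch of the same integer arithmetic, using the values printed above. The function name lines_per_split is made up for illustration, and the 1000000000-byte target is simply the threshold used in the script:

```shell
#!/bin/bash
# Hypothetical helper (not part of the script above).
# $1 = file size in bytes, $2 = row count, $3 = target bytes per split file
lines_per_split() {
    avg_len=$(( $1 / $2 ))      # integer average row length in bytes
    echo $(( $3 / avg_len ))    # rows that fit into one split file
}

# Values taken from the output above
splitcount=$(lines_per_split 3329056768 2684723 1000000000)
echo "Rows per split file: $splitcount"

# Number of split files, rounding up
files=$(( (2684723 + splitcount - 1) / splitcount ))
echo "Expected split files: $files"
```

With these inputs it gives 806451 rows per file and 4 files, which matches the wc -l listing. Because the average row length is rounded down, a split file can end up slightly above or below the byte target when row lengths vary.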
Can somebody help me add some additional features? I would like to log all the messages or steps. If the file size is less than 1 GB, I want to send a note saying so and exit. Also, whenever the script fails, I want to capture all the steps that were executed and send them in an email.
Thanks
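One possible shape for the features asked about above, as a rough sketch rather than a finished solution. Everything named here (LOG_FILE, MAIL_TO, MIN_SIZE, log_msg, big_enough, report_failure) is an assumption made for illustration. It uses wc -c instead of istat to get the size, and it assumes mailx is available for the email step:

```shell
#!/bin/bash
# Sketch only: variable and function names below are hypothetical.
LOG_FILE=${LOG_FILE:-/tmp/split_file.$$.log}
MAIL_TO=${MAIL_TO:-}                 # leave empty to skip the email step
MIN_SIZE=${MIN_SIZE:-1000000000}     # threshold in bytes

# Print a timestamped message and append it to the log file.
log_msg() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*" | tee -a "$LOG_FILE"
}

# Return 0 if the file is larger than MIN_SIZE bytes; otherwise log a note
# and return 1.
big_enough() {
    fsize=$(( $(wc -c < "$1") ))
    if [ "$fsize" -gt "$MIN_SIZE" ]; then
        return 0
    fi
    log_msg "$1 is $fsize bytes, not larger than $MIN_SIZE bytes; nothing to split."
    return 1
}

# Given an exit status, log a failure and mail the whole log if it is non-zero.
report_failure() {
    status=$1
    if [ "$status" -ne 0 ]; then
        log_msg "Script failed with exit status $status"
        if [ -n "$MAIL_TO" ] && command -v mailx >/dev/null 2>&1; then
            mailx -s "Split script failed" "$MAIL_TO" < "$LOG_FILE"
        fi
    fi
}

# In the main script these could be wired in roughly like this:
#   trap 'report_failure $?' EXIT
#   log_msg "Checking $2 file size now"
#   big_enough "$2" || exit 0
```

Replacing the plain echo calls with log_msg would put every step in the log, so the email sent on failure carries the steps that ran before the error.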