Hi All,
This is my first post here. I'm hoping to share and gain knowledge from this great forum!
I've scanned this forum before posting my problem here, but I'm afraid I couldn't find any thread that addresses this exact problem.
I'm trying to split a large XML file (with multiple tag sets) into smaller files of a fixed maximum size, in such a way that the split never happens in the middle of a tag set, i.e. every tag set should end up complete within a single file. The size limit of the smaller files is specified in a parameter file. For example, if the size limit is 100 KB and the large file is 440 KB, I should get five smaller files of roughly 100 KB, 100 KB, 100 KB, 100 KB and 40 KB.
My initial approach was to create the large file with each complete tag set on a single line, and then to use the split command based on the size limit. However, the complete tag sets do not fit on single lines, since the XML documents themselves are huge. So I was thinking of splitting the large file on tag boundaries while staying within the size limit.
Below is what I have tried so far:
#!/bin/bash
export ORACLE_HOME=.........
export ORACLE_SID=...........
export PATH=........
. ./params # contains the parameter sizelimit
FILE="datafile.txt"
sqlplus -s userid/password@DB <<EOF
SET HEADING OFF
SET PAGESIZE 0
SET LINESIZE 32000
SET LONG 32000
SET NEWPAGE NONE
SET FEEDBACK OFF
SET TRIMSPOOL ON
SET DEFINE ON
SET VERIFY OFF
SET SERVEROUTPUT OFF
SPOOL $FILE
[....query to create the master file...]
SPOOL OFF
EXIT
EOF
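# Size of the spooled file in bytes (no space is allowed after "=")
filesize=$(ls -l "$FILE" | awk '{print $5}')
#echo $filesize
#echo $sizelimit
# Split only when the file is larger than the limit (sizelimit is assumed to be in bytes)
if [ "$filesize" -gt "$sizelimit" ]
then split -b "$sizelimit" "$FILE" part
else echo "less than the limit"
fi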
This was my first attempt at using the split command. However, I don't think it can be used given my criterion, since split -b cuts at fixed byte offsets regardless of where the tags are. Assuming the tag sets look like <URL>...</URL>, can anyone suggest another way to do this?
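For what it's worth, the tag-aware splitting I have in mind would look roughly like the sketch below. It is only a sketch under several assumptions, not something I have run against the real data: the function name split_on_tag, the part001/part002/... naming and the default URL tag are all made up for illustration; sizelimit is assumed to be a plain number of bytes; and every closing tag is assumed to sit at the end of a line, so that nothing belonging to the next tag set follows it on the same line. The pieces will not be exactly equal in size; each one is just filled with whole tag sets up to the limit, and a single tag set larger than the limit ends up alone in an oversized piece.

```shell
#!/bin/bash
# Sketch: split FILE into pieces of at most LIMIT bytes, cutting only after a
# line that contains the closing tag. All names here are hypothetical.
split_on_tag() {
    local file=$1 limit=$2 prefix=${3:-part} tag=${4:-URL}
    # LC_ALL=C so that length() counts bytes rather than characters
    LC_ALL=C awk -v limit="$limit" -v prefix="$prefix" -v endtag="</$tag>" '
        # Write the buffered tag set, starting a new part first if it would
        # push the current part over the limit.
        function emit(    len) {
            len = length(buf)
            if (size > 0 && size + len > limit) {
                close(out); n++; size = 0
            }
            out = sprintf("%s%03d", prefix, n)
            printf "%s", buf > out
            size += len
            buf = ""
        }
        BEGIN { n = 1; size = 0; buf = "" }
        {
            buf = buf $0 "\n"                 # collect lines of the current tag set
            if (index($0, endtag)) emit()     # closing tag seen: the set is complete
        }
        END { if (buf != "") emit() }         # trailing lines after the last tag
    ' "$file"
}
```

With the script above, this would be called as split_on_tag "$FILE" "$sizelimit" in place of the split -b line.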
Thanks a lot,
- Avik