- The problem statement, all variables and given/known data:
Hi,
The problem statement is: I am trying to read a flat file line by line with a while loop. The flat file contains about 100k records, and each record has 25 columns. For each line, I have to take the field names held in an array and map each name to the corresponding field extracted from that line. I tried a for loop inside the while loop, but that is killing performance. Is there an alternative approach that avoids the nested loops? Any help would be greatly appreciated.
- Relevant commands, code, scripts, algorithms:
Command to run the script:
Create_Index.ksh <config_file> "ABC" 1
Indexfields_1 contains the comma-separated field names for which the mapping needs to be created.
E.g.: "A","B","C","D", ... and so on, 25 fields in total.
#!/usr/bin/ksh
if [[ $# != 3 ]];then
echo "Incorrect number of arguments sent to script"
echo "Usage: Create_Index.ksh <config_file_name> <table_identifier> <segment_number>"
echo "Insufficient parameters to continue execution. Exiting the $(basename ${0}) script with 1 at $(date)"
exit 1
fi
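config_file=${1}
if [ -s "${config_file}" ]
then
. "${config_file}"
else
# log() and the log file path are not defined yet, so report on stderr and stop
echo "Config file not found: ${config_file}" >&2
exit 1
fi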
#-------------------------------------
# function to log message to log file
#-------------------------------------
function log
{
msg="$1"
echo "== $(date '+%m/%d/%Y %H:%M:%S') :${msg}" >>${IndexCreation_DAILY_LOG}
}
#-------------------------------------
# function ends
#------------------------------------
base_dir="${BASE_DIR}"
afp_dir="${AFP_DIR}"
index_dir="${INDEX_DIR}/$2/$2$3"
log_dir="${LOG_DIR}/$2/$2$3"
trigger_dir="${TRIGGER_DIR}/$2/$2$3"
log_filename_suffix="${LOG_FILENAME_SUFFIX}"
output_file_path="${OUTPUT_FILE_PATH}/$2$3"
IndexCreation_DAILY_LOG=${log_dir}/${log_filename_suffix}.$(date +%m%d%y_%H%M%S)
metadata_file_name="${METADATA_FILENAME}"
trigger_file_prefix=`basename ${metadata_file_name%.dat}`
trigger_file_name="${trigger_file_prefix}.indexing"
if [[ ! -d "${log_dir}" ]];then
mkdir -p "${log_dir}"
fi
#rm -rf ${index_dir}/*
if [[ ! -d "${index_dir}" ]];then
mkdir -p "${index_dir}"
fi
if [[ ! -d "${afp_dir}" ]];then
mkdir -p "${afp_dir}"
fi
log "**********************************************************************************"
log "********Script**started**at***$(date '+%m/%d/%Y %H:%M:%S')************************"
log "**********************************************************************************"
rm -rf "${index_dir:?}"/*
if [ $? != 0 ]
then
log "Unable to delete the old index files. Indexing failed, so creating failed trigger"
> ${trigger_dir}/${trigger_file_prefix}.indexfailed
exit 1
else
log "Successfully deleted the old index files from the directory ${index_dir}"
fi
identifier=$2
declare -i i=1
declare -i outfilecount=0
#Fetches the index values for the identifier passed in the argument
grep "$identifier" Indexfields_1 > tempfile1
indexfieldsnumber=`awk 'BEGIN {FS=","} ; END{print NF}' tempfile1`
log "fields to be present in index file are $indexfieldsnumber"
cat tempfile1
#Populates the fetched index values from previous step in an array.
declare -i j=1
declare -i k=0
while [[ $j -le $indexfieldsnumber ]] ; do
indexfieldname=`cut -d "," -f${j} tempfile1`
array[${k}]="$indexfieldname"
j=$j+1
k=$k+1
done
#Finished populating the index fields values for an identifier in the array.
declare -i outfilecount=0
declare -i numberoflinesread=0
declare -i linenumber=0 #debug purpose
while read line #read the metadata file
do
record="$line"
#record=$(echo "${record}" | tr -d '[[:space:]]')
declare -i mdfieldcount=0
declare -i arrayfieldnum=0
for fieldposition in "${array[@]}" #read the field name
do
# groupfieldvalue=`echo ${line} | cut -d , -f${mdfieldcount}`
#echo "fieldposition is $fieldposition and value is $groupfieldvalue"
if [[ ${fieldposition} != ${2} ]]
then
groupfieldvalue=`echo "${line}" | cut -d , -f${mdfieldcount}`
groupfieldvalue=$(echo "${groupfieldvalue}" | tr -d '[[:space:]]')
# if [[ $? != 0 ]]
# then
# log "unable to find the group field value for ${fieldposition}"
# mv ${trigger_file_name} ${trigger_file_prefix}.failed
# fi
if [[ ${fieldposition} != "${DOCUMENT_NAME}" && ${fieldposition} != "${DOCUMENT_OFFSET}" && ${fieldposition} != "${DOCUMENT_LENGTH}" && ${fieldposition} != "${COMP_OFFSET}" && ${fieldposition} != "${COMP_LENGTH}" ]]
then
echo "GROUP_FIELD_NAME:${fieldposition}" >> ${index_dir}/afp${i}.ind
echo "GROUP_FIELD_VALUE:${groupfieldvalue}" >> ${index_dir}/afp${i}.ind
fi
fi
if [[ ${fieldposition} == "${DOCUMENT_NAME}" ]]
then
docname=${groupfieldvalue}
docname="$(echo "$docname" | tr -d ' ')"
fi
if [[ ${fieldposition} == "${DOCUMENT_OFFSET}" ]]
then
docoff=${groupfieldvalue}
fi
if [[ ${fieldposition} == "${DOCUMENT_LENGTH}" ]]
then
doclen=${groupfieldvalue}
fi
if [[ ${fieldposition} == "${COMP_LENGTH}" ]]
then
complength=${groupfieldvalue}
fi
if [[ ${fieldposition} == "${COMP_OFFSET}" ]]
then
compoffset=${groupfieldvalue}
fi
filename="Decomp_${docname}_${compoffset}_${complength}.out"
indexfilename="Decomp_${docname}_${compoffset}_${complength}.ind"
filename=$(echo "${filename}" | tr -d '[[:space:]]')
indexfilename=$(echo "${indexfilename}" | tr -d '[[:space:]]')
currentfilename=$filename
if [[ $previousfilename != $currentfilename ]]
then
newcompoffset=true
fi
mdfieldcount=${mdfieldcount}+1 #Increment the metadata field count to fetch the next value from the metadata file
done
echo "GROUP_OFFSET:${docoff}" >> ${index_dir}/afp${i}.ind
echo "GROUP_LENGTH:${doclen}" >> ${index_dir}/afp${i}.ind
echo "GROUP_FILENAME:${output_file_path}/${filename}" >> ${index_dir}/afp${i}.ind
#debug purpose only
if [[ $linenumber == 5000 ]]; then
i=i+1
linenumber=0
echo "CODEPAGE:850" >> ${index_dir}/afp${i}.ind
fi
#debug purpose only
echo "finished processing for $linenumber"
linenumber=linenumber+1
done < ${metadata_file_name}
log "removing the temp file containing the indexed fields"
rm -f tempfile1
rm -rf ${index_dir}/afp*.ind
mv "${trigger_dir}/${trigger_file_prefix}.indexinprogress" "${trigger_dir}/${trigger_file_prefix}.indexed"
log "*************************************************************************************************"
log "********Script***completed**at***$(date '+%m/%d/%Y %H:%M:%S')*************************************"
log "*************************************************************************************************"
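For reference, the per-field `echo | cut` step in the script above could probably be replaced by splitting each record once with IFS. This is only a minimal sketch, assuming plain comma-separated records with no quoted commas; `split_record` and the sample record are hypothetical and not part of the script.

```shell
#!/bin/sh
# Sketch only: split each record once with IFS instead of running cut per field.

# split_record RECORD
#   Hypothetical helper: prints "position=value" for every comma-separated field.
split_record() {
    old_ifs=$IFS
    IFS=,
    set -f          # disable globbing so a field such as * is not expanded
    set -- $1       # one split: the fields become $1, $2, ...
    set +f
    IFS=$old_ifs
    pos=1
    for value in "$@"; do
        printf '%s=%s\n' "$pos" "$value"
        pos=$((pos + 1))
    done
}

# Demo with a made-up record.
split_record 'doc1,10,200'
```

No external command is started per field here, so the cost per record stays inside the shell. A trailing empty field (a record ending in a comma) is dropped by this kind of splitting.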
- The attempts at a solution (include all code and scripts):
Included.
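One direction that may avoid the slowdown: most of the time is probably spent starting `echo`, `cut`, and `tr` for every field of every line, rather than in the nested loop itself, so a single awk process could do the pairing. The following is a minimal sketch under assumptions: `map_records`, the field names, and the sample records are made up for illustration, and the special handling of the document name, offset, and length fields is left out.

```shell
#!/bin/sh
# Sketch only: pair field names with the fields of each record in a single
# awk process, instead of forking echo/cut/tr for every field of every line.

# map_records NAMES
#   Hypothetical helper. NAMES is a comma-separated list of field names.
#   Records are read from standard input, one comma-separated record per line.
map_records() {
    awk -F',' -v names="$1" '
        BEGIN { n = split(names, name, ",") }   # name[1..n] holds the field names
        {
            for (f = 1; f <= n; f++) {
                value = $f
                gsub(/[ \t\r]+/, "", value)     # strip blanks, as tr -d does in the script
                print "GROUP_FIELD_NAME:" name[f]
                print "GROUP_FIELD_VALUE:" value
            }
        }'
}

# Demo with made-up data: two records, three fields each.
printf '%s\n' 'doc1, 10 ,200' 'doc2, 30 ,400' | map_records 'NAME,OFFSET,LENGTH'
```

The loop over fields still exists, but it runs inside awk, so the whole file is handled by one process. In the real script the output would be redirected to the .ind file and the names would come from Indexfields_1.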
- Complete Name of School (University), City (State), Country, Name of Professor, and Course Number (Link to Course):
Utkal University, IND.