Hello Unix experts.
I'm looking for input on how to improve the performance of the following awk command. It runs well on smaller files, but performance drops off sharply on larger files (>1GB in size).
The awk command searches a compressed file for particular segment types (based on the input parameters) and writes each matching segment out to its own compressed file. With the example arguments below, the script generates two compressed files (AM04.dat.gz and AM0G.dat.gz).
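To make the matching concrete, here is the extraction-plus-lookup step on its own (a minimal sketch with made-up fixed-width records; the segment code here starts at column 13 and is 4 characters wide, while in the real script the column and length come in as parameters 3 and 4):

# Hypothetical records: segment code at column 13, 4 characters wide
printf '%s\n' '000000000001AM04 payload-1' '000000000001XX99 payload-2' |
awk -v pos=13 -v len=4 -v list="AM04,AM0G" '
  { seg = substr($0, pos, len) }   # pull the segment code
  index(list, seg) > 0             # keep only records whose code is in the list
'
# prints only: 000000000001AM04 payload-1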
Snippet of the script "test.sh":
Usage: test.sh B8957ETD AM04,AM0G 13 5
# Accept input parms
source_filename="$1"
search_string_list="$2"
search_col_pos="$3"
search_str_len="$4"
#
# The search list stays comma separated; awk's index() test below
# treats it as a single lookup string
search_col_pos=$((search_col_pos + 0))
search_str_len=$((search_str_len + 0))
search_elements="$search_string_list"
zcat "$source_filename" | awk -v search_col_pos="$search_col_pos" \
    -v search_str_len="$search_str_len" -v search_elements="$search_elements" \
    -v tgt_path="${CREDIT_CARD_SOURCE_DIR}/" '
  BEGIN { RS = "\r?\n" }   # tolerate CRLF line endings
  { src_substring = substr($0, search_col_pos, search_str_len) }
  index(search_elements, src_substring) > 0 {
    # build the pipe command only for matching records; awk reuses the
    # same gzip pipe for every record with an identical command string
    cmd = "gzip > " tgt_path src_substring ".dat.gz"
    print $0 | cmd
  }'
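For reference on the output side: `print $0 | cmd` in awk opens one pipe per distinct command string and keeps it (and its gzip process) open until close() or exit, so each segment type ends up with a single long-running gzip writer. A stripped-down sketch of that pattern, with placeholder /tmp paths:

printf 'a 1\nb 2\na 3\n' | awk '{
  cmd = "gzip > /tmp/" $1 ".dat.gz"   # same string -> same open pipe
  print $0 | cmd
  seen[cmd] = 1                       # remember every pipe we opened
}
END { for (c in seen) close(c) }'     # flush and close each gzip
# /tmp/a.dat.gz holds "a 1" and "a 3"; /tmp/b.dat.gz holds "b 2"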