Thank you for taking the time to look at this and provide input.
To start, I am not a Linux/Unix expert, but I muddle through as best I can.
I am also in no way, shape, or form a programmer. Please keep that in mind as you read this script.
This script is designed to find all files in a given directory whose names begin with "asalog", find the lines containing a specific word, and then trim those lines down and output just the needed information. These files are currently zipped. The files are stored on remote ZFS storage. Copying all of the files down to the local system at once and then unzipping them is not feasible due to storage limitations. The script works as designed, but it is very slow.
Please look over the code and suggest ways that I could improve its speed. The last run took 238 minutes to complete.
Due to access limitations, I have to work within Bash; I do not have the option (or the knowledge) to use Perl, Python, etc.
Any help is welcome, as are comments on the script as it stands. It was cobbled together from the programming structure I remember from a Turbo Pascal class in high school (many years ago) and lots of Google searches.
echo Search started at:
date +"%m/%d/%Y %T"
# Displays the start up information and the start time
find /var/network_logs/gc/archive/asalog* -mtime -7 -exec zcat {} \; | awk '/Built/&& !/10.10.120.145/{print $10, $11, $15, $18;}' | sed -e 's!/! !g' -e 's!:! !g' | awk '{if ($1 == "inbound") print $1, $2, $3, $4, $6, $7, $8; else if ($1 == "outbound") print $1, $2, $6, $7, $3, $4, $5;}' | awk '!seen[$0]++ {print}' >> /home/kenneth.cramer/asa/GC_ports.txt
# Finds all files whose names begin with asalog and that were modified in the last 7 days. It then reads the files line by line, looking
# for any lines containing the word Built but not the 10.10.120.145 IP address, and prints out the 10th, 11th, 15th and 18th words in the line.
# It then looks for any "/" slashes or ":" colons in the four words and replaces them with spaces.
# The script then prints out the needed words from the line, in an order that depends on inbound or outbound, and writes only unique lines to the output file.
echo
echo
echo
echo Sorting data into proper files.
# Displays that the script is now sorting the information
awk '{if ($1 == "inbound" && $2 == "TCP") print $2, $3, $4, $5, $6, $7 >> "/home/kenneth.cramer/asa/GC_tcpinbound.txt"; else if ($1 == "inbound" && $2 == "UDP") print $2, $3, $4, $5, $6, $7 >> "/home/kenneth.cramer/asa/GC_udpinbound.txt"; else if ($1 == "outbound" && $2 == "TCP") print $2, $3, $4, $5, $6, $7 >> "/home/kenneth.cramer/asa/GC_tcpoutbound.txt"; else if ($1 == "outbound" && $2 == "UDP") print $2, $3, $4, $5, $6, $7 >> "/home/kenneth.cramer/asa/GC_udpoutbound.txt";}' /home/kenneth.cramer/asa/GC_ports.txt
# The script now reads the file GC_ports.txt and sorts the data into 4 files based on finding "inbound" or "outbound" and "TCP" or "UDP" in the line.
echo
echo
echo
echo Compressing files for transport
tar -czvf /home/kenneth.cramer/asa/GC_ports.tgz /home/kenneth.cramer/asa/GC_*.txt
# Compresses the output files into a single file for transport off the machine.
echo Process completed for Gold Camp at:
date +"%m/%d/%Y %T"
echo
echo
times
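
For reference, here is a minimal sketch of how the awk, sed, awk, awk chain in the search step might be folded into one awk process, in case the extra pipeline stages are part of the slowdown. It is only a sketch under stated assumptions, not a measured fix: the function name extract_ports is made up for illustration, the field positions ($10, $11, $15, $18) are taken from the script above, and it assumes the 15th and 18th words always have the form name:address/port. One deliberate difference from the script is that the dots in the excluded IP address are escaped so they match only literal dots.

```shell
#!/bin/bash
# Hypothetical helper (the name is an assumption, not part of the original script).
# Reads decompressed log lines on stdin and writes unique, reordered records to stdout.
extract_ports() {
    awk '
        # Keep lines containing Built, skipping the excluded IP address
        /Built/ && !/10\.10\.120\.145/ {
            # Split name:address/port into three pieces instead of using sed
            split($15, a, "[/:]")
            split($18, b, "[/:]")
            if ($10 == "inbound")
                line = $10 " " $11 " " a[1] " " a[2] " " b[1] " " b[2] " " b[3]
            else if ($10 == "outbound")
                line = $10 " " $11 " " b[1] " " b[2] " " a[1] " " a[2] " " a[3]
            else
                next
            # Print each distinct record only the first time it is seen
            if (!seen[line]++) print line
        }'
}
```

The function would take the place of everything between zcat and the output redirect, so the find output would be piped into extract_ports and appended to GC_ports.txt as before. Ending the find command with "-exec zcat {} +" instead of "-exec zcat {} \;" would also hand many files to each zcat call rather than starting one zcat per file. Whether either change helps noticeably is something that would have to be timed, since the decompression itself or the remote storage may be the real bottleneck.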