I saw a couple of posts here referencing how to handle more than one input field separator in awk. I figured I would share how I (just!) figured out how to turn this line in a logfile:
90000000000000000000010001 name D0.90000000000103787900010001QF840840916070000007085814Y216254@D1111111111111111=1107xxxxxxxxxxxxxxxx919MENCHIES
into this format:
90000000000000000000010001,name,840840916070000007085814Y216654,1111111111111111,1107,919MENCHIES
I have an entire script since this is just one step in a process of turning logs into useful information, but heres the relevant portion.
#Author: kinksville
#Date: April 24, 2008
#Revised: April 24, 2008
#Revision: Revision 1.00
#Other files: cclookup.s, cclookup.rep
#Changelog:
#April 24, 2008: Initial creation of the script.
#
#End changelog.
BEGIN {
FS="[ \. QF \@D = x]+"
OFS = ","
}
#First iteration of the @D search, stripping out the . character and inserting a OFS.
/\@D/ { #Search for any line containing the string @D
report2="cclookup.rep2"; #Define report2 variable.
report="cclookup.rep"; #Define report variable.
num_cclookup++; #Get number of auth requests.
print $1, $2, $5, $6, $7, $8 > report;
print $0 > report2;
} #End of the @D search.
The key is the fact that awk will accept a regular expression as file separator. This regexp FS="[ \. QF \@D = x]+" matches spaces, the . the string QF, the string @D, the =, and the character x. The + after the trailing bracket is the key, since that allows for 1 or more instances of any of the characters matched by the regexp.
That means that x and xxxxxx are both treated as a single field separator.
I still need to work on the output, since now I need to trim the name off the end of the last field. Unfortunately the number in the last field can range anywhere from 9999999 to 1 and that is the part that I want to preserve. Maybe a [^0-9]+ expression?