My attempts to demonstrate what I thought was happening with simpler examples on the command line were flawed and introduced additional errors (I did duplicate the issue using awk -F, but I did not post those examples). I don't believe my "real" code has the same flaws.
I believe that I am quoting the shell variables correctly (or else the getline reads in my code below would fail), and because I specify an alternate delimiter in my awk statement, the issues with whitespace in array indices are not coming from either splitting or my understanding of splitting.
First I am reading in a CSV file that may have non-separating commas, whitespace, and other potentially problematic characters inside double-quoted fields. So I start off with a double-quote counting parser and an alternative delimiter (the old ASCII unit separator from the punch card/paper tape days).
delim=$'\037' # ASCII Unit Separator (US)
awk '{ quote=0; for(i=1;i<=length;i++)
{ ch=substr($0, i, 1)
if ( ch == "\"" ) quote=( ++quote % 2)
else if ( quote == 0 && ch == ",") ch="'"$delim"'"
printf "%s", ch } # explicit format so a "%" in the data is not read as a format specifier
print ""
}' "$scratchDir""My_Input_File" |
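As a minimal, self-contained sketch of the quote-counting idea above: the function name requote_csv and the sample line are made up for illustration, and the delimiter is passed with -v rather than spliced into the script. A visible "|" stands in for the unit separator so the result can be read on a terminal.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): replace every comma
# that sits outside double quotes with the delimiter given as $1.
requote_csv() {
    awk -v delim="$1" '{
        quote = 0; out = ""
        for (i = 1; i <= length($0); i++) {
            ch = substr($0, i, 1)
            if (ch == "\"") quote = !quote            # toggle state on every double quote
            else if (!quote && ch == ",") ch = delim  # a comma separates only outside quotes
            out = out ch
        }
        print out
    }'
}

# Made-up sample line; the quoted comma must survive.
printf '%s\n' 'B12,"Pump, north side",3.5 Hz' | requote_csv '|'
# prints: B12|"Pump, north side"|3.5 Hz
```

Building the output line in a variable and printing it once also sidesteps the printf format-string problem with "%" characters in the data.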
Next I have a rather cryptic and very long awk routine that transposes the data and then stacks 21 variable columns into two columns (a variable name column and a variable value column), and adds some other values from shell variables to the pipe. It is very long and very complicated (as well as cryptic), so I will omit that part (it is working exactly as expected).
The next part of the code manipulates the data based on values in three different configuration files. I think you will find that I am quoting the external file names from bash correctly. If I don't get the nesting of single and double quotes exactly right, the getline reads fail.
The limit file contains a list of spec limits for different parameters (named in the first column), with separate columns for areas considered sensitive and areas considered insensitive. Then there is a file with a list of sensitive areas. Finally, there is a file with a list of known issues that I wish to substitute.
Edit: In response to the question in your edit, the config files I am reading in are comma-separated values. They do not have the issue of commas inside quotes, but one of them does have spaces inside the values of a few columns.
# Look up and apply limits.
awk -F "$delim" 'BEGIN { OFS=FS
# Read in Vibration Limits
getline < "'"$limitFile"'" # Header Row
while (getline < "'"$limitFile"'") { split( $0, a, ","); vLim[a[1]]=a[2]; vLim[a[1]"Sen"]=a[3] }
close("'"$limitFile"'")
# Read in Sensitive Bay List
getline < "'"$bayListFile"'" # Header Row
while (getline < "'"$bayListFile"'") { split( $0, a, ","); sBL[a[1]a[2]]="Sen" }
close("'"$bayListFile"'")
# Read in Known Fail List
getline < "'"$failListFile"'"; getline < "'"$failListFile"'"; getline < "'"$failListFile"'" # Header Rows
while (getline < "'"$failListFile"'") { split( $0, a, ","); i=a[1]a[2]a[3]a[4]a[5]; gsub ( " ", "", i ); failMessage[i]=a[8]
fs=a[6]; sub ( "^$", "0000 01 01 00 00 00", fs ); failStart[i]=fs # keyed by i so each known failure keeps its own time window
fe=a[7]; sub ( "^$", "9999 12 31 23 59 59", fe ); failEnd[i]=fe }
close("'"$failListFile"'") }
NR == 1 { print $0 }
NR > 1 { if ( $9 ~ /Hz/ ) { limit=vLim[$9 sBL[$2 $3]]; $11=limit # works great since $9, $2, and $3 never have whitespace
if ( $10 > limit) { $12 = "Fail"; i=$2$3$4$5$9; gsub ( " ", "", i ) # sometimes $3, $4, or $5 have spaces. My code now works with the gsub removing spaces from the index, but my purpose in posting was to better understand how awk handles whitespace in indexes ("help me understand more", not "help me write a script")
if ( i in failMessage ) { now = mktime(year" "month" "day" "hour" "min" "sec) # I am still writing this time-aware part and it is not part of my question
now = mktime("2015 01 01 00 00 00") # debug; mktime takes one "YYYY MM DD HH MM SS" string
if ( now >= mktime(failStart[i]) && now <= mktime(failEnd[i]) ) {
$12 = "Known Fail"; if ($7 == "\"\"") gsub ( "\"$", "Known Fail: " failMessage[i] "\"", $7 ) # column 7 is a double-quoted string. When "null" it is actually ""
else gsub ( "\"$", "|Known Fail: " failMessage[i] "\"", $7 ) } } } # I convert "|" into DOS-style new lines at a later time in my code. When there is already a comment in $7 I want a new line before my known failure mode message
else $12 = "Pass"; print $0 }
}' |
# Remove alternative delimiter
sed -e 's/'"$delim"'/,/g' > "$scratchDir""My Output File"
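For comparison, the config-file reads in the BEGIN block above could also be written by handing the file name to awk with -v, which avoids the nested single/double quote splicing. This is only a sketch under assumptions: the function name load_limits, the lookup key argument, and the sample limit values are all made up. It also tests the getline return value, because getline returns -1 on an unreadable file and a bare while (getline < file) would then loop forever.

```shell
#!/bin/sh
# Hypothetical helper: load a limits CSV (one header row, then
# name,limit,sensitiveLimit) and print both limits for the key in $2.
load_limits() {
    awk -v limitFile="$1" -v key="$2" 'BEGIN {
        # Header row; a result <= 0 means end of file or a read error.
        if ((getline line < limitFile) <= 0) { print "cannot read " limitFile; exit 1 }
        while ((getline line < limitFile) > 0) {
            split(line, a, ",")
            vLim[a[1]] = a[2]          # limit for ordinary areas
            vLim[a[1] "Sen"] = a[3]    # limit for sensitive areas
        }
        close(limitFile)
        print vLim[key], vLim[key "Sen"]
    }'
}

# Made-up sample data written to a temporary file.
tmp=$(mktemp)
printf '%s\n' 'Param,Limit,SenLimit' '10Hz,5.0,2.5' '20Hz,4.0,2.0' > "$tmp"
load_limits "$tmp" 10Hz
# prints: 5.0 2.5
rm -f "$tmp"
```

One caveat with -v: awk processes backslash escape sequences in the assigned value, so a path containing backslashes would need extra care.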
Of course, a ton of code exists before and after what I included, but it is outside the context of my question.
Just as a reminder of the context of my question, the code works now that I am removing spaces from the parts highlighted in pink using the gsub commands. My question is educational: what does awk do with whitespace in an index assignment?
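On the whitespace question itself, a small sketch may help. Awk array subscripts are always strings, and whitespace in a subscript is kept verbatim, so keys that differ by a single space are different elements. With a single-character separator such as "," neither field splitting nor split() trims padding, so a padded piece from one file will not match an unpadded key built from another. The function name index_demo and the sample keys below are made up for illustration.

```shell
#!/bin/sh
# Hypothetical demo: whitespace is a significant part of an awk array subscript.
index_demo() {
    awk 'BEGIN {
        # Two keys that differ only by one space are two separate elements.
        msg["B12 Pump"] = "with space"
        msg["B12Pump"]  = "without space"
        n = 0
        for (k in msg) n++
        print n

        # split() on "," leaves the padding inside each piece untouched.
        split("B12, Pump ,3", f, ",")
        print "[" f[2] "]"

        # The padded key "B12 Pump " (trailing space) matches neither element.
        key = f[1] f[2]
        result = (key in msg) ? "found" : "missing"
        print result

        # After stripping spaces, the key matches the unpadded element.
        gsub(" ", "", key)
        print msg[key]
    }'
}

index_demo
# prints:
# 2
# [ Pump ]
# missing
# without space
```

This is consistent with the gsub workaround: normalizing both the stored key and the lookup key the same way is what makes them compare equal.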
Also, in general my questions on UNIX.com are not about getting something to work but about getting something to work efficiently when parsing huge files. In this particular project I am dealing with about 10,000 files that end up in a ~200k line database, but in some other projects I am dealing with more than 50 million lines of data. In the case of this particular project I got the running time down from 70 min for a year of data to less than 5 min for 2 years of data. This involved moving a lot of bash code to awk, eliminating utility calls (using only built-ins in the interest of speed, even if more complex) and file interactions in any kind of loop or iterative part, and timing alternate versions of portions of my code in awk, Perl, sed, bash, etc. and picking the best performing one. I apologize if my questions about understanding how something works, or whether there are alternatives, are being misinterpreted as "this is broken, show me an implementation" types of questions. Generally when I ask these questions I have something working, but I have a suspicion that there is a better/faster way.
Mike