Quick question about ksh pattern-matching expressions (shell patterns, not Unix regular expressions).
I am trying to craft a pattern for a case statement that matches only lines with the following CSV layout:
filename.txt, filename.txt, alpha, nnnn, nnnn, nnnn, Free form text
Originally I simply used an expression like *,*,*,*,*,*,* in the following case statement:
case ${LINE} in
    # Expressions 1..n are informational and specific enough that
    # they work well
    expression 1..n)
        ... match expressions 1..n logic ... ;;
    # Valid CSV lines contain 7 fields and 6 commas
    *,*,*,*,*,*,*)
        ... match valid CSV line logic ... ;;
    # Malformed CSV lines, or any other line not matching the expressions above
    *)
        ... malformed CSV line or other mismatch ... ;;
esac
Problem:
I found that the *,*,*,*,*,*,* CSV expression matches cases such as these:
field1, field2, field3, field4, field5, field6, field7, field8, field9
field1, field2, field3, field4, field5, field6
field1, field2, field3, field4, field5, field6, field7,,,,,,,
,field1, field2, field3, field4, field5, field6, field7
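The behaviour can be reproduced with a minimal sketch like the one below. The function name classify_original is hypothetical and exists only for this illustration. It relies on the fact that * in a shell pattern also matches commas, so *,*,*,*,*,*,* accepts any line containing six or more commas. Note that in this sketch the six-field line, which has only five commas, falls through to *) rather than matching; if it is accepted in the real script, something else (an earlier branch, or different input) is probably responsible.

```shell
# Hypothetical helper: classify one line using the original pattern only.
classify_original() {
    case $1 in
        # Six literal commas; every * may itself contain commas.
        *,*,*,*,*,*,*) printf '%s\n' "csv" ;;
        *)             printf '%s\n' "other" ;;
    esac
}

# Nine fields (eight commas): prints "csv"
classify_original "field1, field2, field3, field4, field5, field6, field7, field8, field9"
# Six fields (five commas): prints "other"
classify_original "field1, field2, field3, field4, field5, field6"
# Seven fields plus trailing commas: prints "csv"
classify_original "field1, field2, field3, field4, field5, field6, field7,,,,,,,"
# Leading comma (seven commas): prints "csv"
classify_original ",field1, field2, field3, field4, field5, field6, field7"
```

Only the comma count matters to this pattern: anything with at least six commas is accepted, regardless of where the commas sit.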
I have tried numerous variations and have ended up with this expression:
case ...
...
@(*)@(,)@(*) ) ...
...
esac
This matches the smallest CSV list, "text, text", but extending it to seven fields still seems to require comma-counting logic that I would rather not include.
The commas and asterisks are what complicate every expression I have tried: essentially, * also matches commas. Production code is very hard to change where I work once it is implemented, so I would like to nail down a very precise expression now and let the final *) branch trap all malformed lines. What am I doing wrong?
By the way, I have no control over the data file I am given, so changing the data source is not an option.
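One way to keep * from swallowing commas is to stop using * for the fields altogether. What follows is a minimal sketch, not a definitive answer. It assumes ksh extended patterns (or bash with extglob enabled), that no field may be empty, and that the free-form text never contains a comma. Under those assumptions, +([!,]) matches one or more characters other than a comma, so seven of them joined by six literal commas match exactly seven fields. For comparison, @(*)@(,)@(*) is equivalent to *,* and therefore matches any line with at least one comma. The function name classify_strict is hypothetical.

```shell
# Only needed when trying this under bash; ksh understands +(...) natively.
if [ -n "${BASH_VERSION:-}" ]; then
    shopt -s extglob
fi

# Hypothetical helper: accept exactly seven non-empty, comma-free fields.
classify_strict() {
    case $1 in
        # +([!,]) = one or more non-comma characters, so a field cannot
        # absorb a comma the way * does.
        +([!,]),+([!,]),+([!,]),+([!,]),+([!,]),+([!,]),+([!,]))
            printf '%s\n' "csv" ;;
        # Everything else, including too many or too few fields.
        *)
            printf '%s\n' "malformed" ;;
    esac
}

classify_strict "filename.txt, filename.txt, alpha, 1234, 5678, 9012, Free form text"
classify_strict "field1, field2, field3, field4, field5, field6, field7, field8, field9"
classify_strict ",field1, field2, field3, field4, field5, field6, field7"
```

A field consisting only of blanks still passes, because a space is a non-comma character. If the free-form text can legitimately contain commas, the last +([!,]) would have to become * and only the first six fields would be strictly checked.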