How to match any number of newline characters (\n) in lex and yacc

vishwa787 · November 7, 2008, 1:00am

Hi,

I need to develop a parser that should match input like this:

text a=5 " a=20";
text a=..."
a=20";

3 text a=..."
a=20
b=34
c=12
";

I have used this regular expression in my Lex file to generate the tokens:

\".\s*.*\s.\"

which matches a " followed by anything, a newline, and so on.
The problem with this regular expression is that it can match at most two newline (\n) characters, because . does not match \n and the match breaks at that point. So I had to add \s* explicitly to match the newlines between the quotes (" "). If my file contains something like:

text a=10 "
b=25
c=30
d= 40
e=11
";
the above regular expression (\".\s*.*\s.\") fails to match it.

Can anyone write a regular expression that accepts any number of newlines inside the quotes (" ") and so solves my problem?

Please help.

otheus · November 10, 2008, 5:44am

The <newline> is not matched by the "period" operator; you have to include it explicitly. Also, \s doesn't match a space in lex.

\"(.|[ \t\v\n])*\"

should work on:

1: ""
2: "blah blah"
3: "

"
4: " blah "
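A minimal sketch of the two points above, assuming flex rather than another lex implementation (`%option noyywrap` is flex-specific). In flex, a backslash followed by a character that is not a recognized escape simply stands for that character, so `\s` matches a literal `s`, while `.` matches any single character except newline. The printed labels are made up for illustration.

```lex
%option noyywrap
%{
#include <stdio.h>
%}
%%
\s      { printf("backslash-s rule matched '%c'\n", yytext[0]); }
\n      { printf("newline rule\n"); }
.       { printf("dot rule matched '%c'\n", yytext[0]); }
%%
/* Reads stdin and reports which rule matched each character. */
int main(void) { yylex(); return 0; }
```

Feeding it the letter s, a blank, and a newline should show the letter going to the first rule, the blank going to the dot rule, and the newline going only to the explicit `\n` rule.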

vishwa787 · November 13, 2008, 8:08am

Hi,

Thanks for your reply, otheus.

But when I use the regular expression

\"(.|[ \t\v\n])*\"

I get a fatal error: "input buffer overflow, can't enlarge buffer because scanner uses REJECT"
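A likely explanation, assuming flex: the group `(.|[ \t\v\n])` also matches the `"` character, and lex always takes the longest match, so the rule runs from the first quote to the last quote in the whole input. To find that last quote the scanner has to read the rest of the file into its buffer, and flex cannot enlarge the buffer when the scanner uses REJECT. One common workaround is a negated character class, which stops at the next quote and still matches newlines. The sketch below is an illustration under those assumptions, not the original poster's file.

```lex
%option noyywrap
%{
#include <stdio.h>
%}
%%
\"[^\"]*\"  { printf("STRING: %s\n", yytext); /* may span several lines */ }
[ \t\n]+    { /* ignore whitespace outside strings */ }
.           { /* other tokens would be handled here */ }
%%
int main(void) { yylex(); return 0; }
```

Because `[^\"]` excludes the quote itself, each match ends at the first closing quote, so the token stays small no matter how long the file is. This sketch does not handle escaped quotes inside a string.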

otheus · November 13, 2008, 8:26am

It's been 14+ years since I touched lex. Anyone else want to try here?

vishwa787 · November 13, 2008, 9:05am

Thanks a lot otheus.

I modified the expression to \"([a-zA-Z0-9_$?\.=~<>{}#@`!$]*|[ \t\v\n])*\"

It works well for my requirement.

Thanks a lot

otheus · November 13, 2008, 9:24am

Hrm, but it looks like you would miss spaces between words surrounded by quotes. Does it match:

"this is a long string"

vishwa787 · November 14, 2008, 12:41am

Hi,

Yes, it matches strings such as "this is a long string".

I don't know how it is able to parse them, even though I have not included a space character in the expression [a-zA-Z0-9_$?\.=~<>{}#@`!$]

I also have another rule:

[ \t]+ /* ignore whitespace */;

in my lex file.

Does this interfere with the expression I used, i.e. \"([a-zA-Z0-9_$?\.=~<>{}#@`!$]*|[ \t\v\n])*\" ?

Thanks
Vishwa
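On the interference question, a small sketch may help, assuming flex. Lex tries every rule at the current position and takes the longest match, so once the scanner is sitting on an opening quote, the string rule consumes the whole string, blanks included. The whitespace rule only fires when a token starts on a blank, that is, between tokens. The string pattern and the WORD rule below are assumptions chosen for illustration, not taken from the original file.

```lex
%option noyywrap
%{
#include <stdio.h>
%}
%%
\"[^\"]*\"      { printf("STRING<%s>\n", yytext); }
[ \t]+          { /* ignore whitespace between tokens */ }
[a-zA-Z0-9=]+   { printf("WORD<%s>\n", yytext); }
.|\n            { /* ignore anything else in this sketch */ }
%%
int main(void) { yylex(); return 0; }
```

With input such as text a=10 "x y", the blanks between the words are dropped by the whitespace rule, while the blank inside the quotes stays part of the STRING token.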

otheus · November 14, 2008, 4:54am

I see now that your original string is correct. But you don't need the first *; the * after the parenthesis takes care of it. The reason it works is the |: it matches either the first character class or the second, and the second one contains the space. Really, it's redundant: you should be able to put everything into one set of brackets.
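A sketch of the single-bracket form, assuming flex. It reuses the characters from the two classes in the expression above and merges them into one class, so only one * is needed. The action is a placeholder.

```lex
%option noyywrap
%{
#include <stdio.h>
%}
%%
\"[a-zA-Z0-9_$?\.=~<>{}#@`!$ \t\v\n]*\"   { printf("STRING: %s\n", yytext); }
.|\n                                      { /* skip everything else */ }
%%
int main(void) { yylex(); return 0; }
```

Since the class does not contain the quote character, the match ends at the first closing quote. Any character missing from the class, a comma for example, makes the string rule fail for that input.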

vishwa787 · November 14, 2008, 5:50am

Thanks for your timely help. It was of great help to me.

Keep up the good work.