How to match any number of newline characters (\n) in lex and yacc

Hi,

I need to develop a parser that matches input such as:

  1. text a=5 " a=20";

  2. text a=..."
    a=20";

  3. text a=..."
a=20
b=34
c=12
";

I have used this regular expression in my Lex file to generate the tokens:

\".\s*.*\s.\"

It matches a " followed by any characters, a newline, and so on.
The problem with this regular expression is that it can handle at most two newline (\n) characters: the . does not match \n, so the match breaks at that point and I had to add \s* explicitly to match the newlines between the quotes (" "). If my file contains something like:

text a=10 "
b=25
c=30
d= 40
e=11
";
the regular expression above (\".\s*.*\s.\") fails to match it.

Can anyone write a regular expression that accepts any number of newlines inside the quotes (" ")? That would solve my problem.

Please help.

The "period" operator does not match <newline>; you have to include it explicitly. Also, \s doesn't match whitespace in lex.

\"(.|[ \t\v\n])*\"

should work on:

1: ""
2: "blah blah"
3: "

"
4: " blah "

Hi,

Thanks for your reply, otheus.

But when I use the regular expression
\"(.|[ \t\v\n])*\"
I get the fatal error "input buffer overflow, can't enlarge buffer because scanner uses REJECT".

It's been 14+ years since I touched lex. Anyone else want to try here?
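
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the . matches every character except newline, and that includes the closing ". Since lex always takes the longest match, \"(.|[ \t\v\n])*\" runs on to the last " in the whole input, and flex reports this error when a token outgrows the input buffer in a scanner that uses REJECT. Excluding the quote from the repeated class stops the match at the first closing quote, and [^"] still matches newlines. The file below assumes flex (%option noyywrap is a flex feature), and the printf action is only a placeholder for whatever token the yacc grammar expects:

```lex
%{
#include <stdio.h>
%}
%option noyywrap
%%
\"[^"]*\"   { printf("STRING of %d characters\n", (int)yyleng); /* placeholder action */ }
[ \t\n]+    { /* ignore whitespace outside strings */ }
.           { /* ignore any other character in this sketch */ }
%%
int main(void)
{
    yylex();    /* scan standard input until end of file */
    return 0;
}
```

It can be built with something like "flex strings.l && cc lex.yy.c" (strings.l is a made-up file name). Note that this version does not handle an escaped quote (\") inside a string.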

Thanks a lot otheus.

I modified the expression to \"([a-zA-Z0-9_$?\.=~<>{}#@`!$]*|[ \t\v\n])*\"

It works well for my requirement.

Thanks a lot

Hrm, but it looks like you would miss spaces between words surrounded by quotes. Does it match:

"this is a long string"

Hi,

Yes, it matches strings such as "this is a long string".

I don't know how it is able to parse them, even though I have not included a space character in the expression [a-zA-Z0-9_$?\.=~<>{}#@`!$]

I also have another rule:

[ \t]+ /* ignore whitespace */;

in my lex file.

Does this interfere with the expression I used, i.e. \"([a-zA-Z0-9_$?\.=~<>{}#@`!$]*|[ \t\v\n])*\" ?

Thanks
Vishwa

I see now that your original string is correct. But you don't need the first *; the * after the parenthesis takes care of it. The reason it works is the |: it means either the first character class or the second, and the second class, [ \t\v\n], contains the space. Really, the alternation is redundant: you should be able to put everything into one set of brackets.
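
Putting everything into one set of brackets could look like the file below. It is only a sketch: it reuses the character list from your expression unchanged and adds the whitespace characters to the same class. It assumes flex, and the printf action is a placeholder for returning your real token.

```lex
%{
#include <stdio.h>
%}
%option noyywrap
%%
\"[a-zA-Z0-9_$?\.=~<>{}#@`!$ \t\v\n]*\"   { printf("STRING: %s\n", yytext); /* placeholder action */ }
[ \t\n]+    { /* ignore whitespace outside strings */ }
.           { /* ignore any other character in this sketch */ }
%%
int main(void)
{
    yylex();    /* scan standard input until end of file */
    return 0;
}
```

A quoted string containing a character that is not in the list (a comma, for example) will not match this rule as a whole, so the list has to cover everything you expect between the quotes.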

Thanks for your timely help. It was of great help to me.

Keep up the good work.