Perl, RegEx - Help me to understand the regex!

I am no regex expert and have only a little understanding of that language.
Could you help me understand this Perl regular expression:

^(?!if\b|else\b|while\b|[\s\*])(?:[\w\*~_&]+?\s+){1,6}([\w:\*~_&]+\s*)\([^\);]*\)[^\{;]*?(?:^[^\r\n\{]*;?[\s]+){0,10}\{

------
This is a regex for selecting functions from C/C++ source; it is defined in UltraEdit (in case you wonder where it comes from).
It works.
I am able to use it in my Perl script by applying the 'm' (multiline) modifier and reading the whole file as one string (via 'undef $/').
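
For readers who want to try this, here is a minimal sketch of that setup. The C sample, the variable names and the in-memory file handle are assumptions made for illustration; a real script would open an actual file instead.

```perl
use strict;
use warnings;

# Hypothetical C source, used here in place of a real file on disk.
my $sample = <<'END_C';
#include <stdio.h>

static int add(int a, int b)
{
    return a + b;
}

void greet(const char *name) {
    if (name) {
        printf("hello %s\n", name);
    }
}
END_C

# Open a read handle on the string; a real script would open a file name here.
open my $fh, '<', \$sample or die "cannot open sample: $!";

# Slurp everything in one read: $/ is undefined only inside this do block.
my $source = do { local $/; <$fh> };
close $fh;

# The regex from the question; the m modifier lets ^ match at every line start.
my $func_re = qr/^(?!if\b|else\b|while\b|[\s\*])(?:[\w\*~_&]+?\s+){1,6}([\w:\*~_&]+\s*)\([^\);]*\)[^\{;]*?(?:^[^\r\n\{]*;?[\s]+){0,10}\{/m;

my @names;
while ($source =~ /$func_re/g) {
    my $name = $1;
    $name =~ s/\s+\z//;    # group 1 may carry trailing whitespace
    push @names, $name;
}

print "$_\n" for @names;    # prints "add" then "greet"
```

Using 'local $/' inside a block, rather than a bare 'undef $/', keeps the change from leaking into the rest of the script.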
------
Below I show how much I understand, and I ask about the parts where I do not know what is happening.
------
Please correct me if any of my assumptions is wrong, and give me an idea where I have none!
Thanks!

So, my understanding of this regex so far is:

#initial:
"^(?!if\b|else\b|while\b|[\s\*])(?:[\w\*~_&]+?\s+){1,6}([\w:\*~_&]+\s*)\([^\);]*\)[^\{;]*?(?:^[^\r\n\{]*;?[\s]+){0,10}\{"

#by pieces:
# 1.
"^(?!if\b|else\b|while\b|[\s\*]) # - at the beginning it does NOT have the words 'if', 'else', 'while', or a whitespace or '*' character
                                 # I guess the '(?!' part means: do not select this part (defined by the stuff in '(...)')
# 2.
(?:[\w\*~_&]+?\s+){1,6}          # - allowed beginning: any alphanumeric or '*', '~', '_', '&', one or more (shortest, by +?),
                                 #   followed by ' ' (1 or more), so it should be a word; repeated from 1 to 6 times
                                 # Again: '(?:' - do not save it in the final selection
# 3.
([\w:\*~_&]+\s*)\([^\);]*\)      # word chars (plus ':*~_&') one or more times; zero or more spaces (saved as $1), followed by '(...)' without ';' inside
# 4.
[^\{;]*?                         # - no ';' and no '{', repeated any number of times, but shortest (by *?)
                                 # So, IS IT anything between <func_nm>(..) and {...}? So, comments only?
# 5.
(?:^[^\r\n\{]*;?[\s]+){0,10}     # ???? - This I do not understand:
                                 # What is '^[^'? A beginning followed by a NOT-block? How can it be a beginning? In a multiline
                                 # match, does it mean anything at the start of a new line?
                                 # After that comes \r\n, so a line break. No line break after the 'beginning'?! So one new line
                                 # is fine, but two are not??? That seems like nonsense. How should I understand it?
                                 # - Followed by ';'?!?! A statement between <fnc_nm>(...) and {...}?!?!? Nonsense?!?!
                                 # After that a '?', so shortest?
                                 # - Followed by spaces, at least one; and all of it up to 10 times (but not saved, because of the '(?:' at the beginning)
                                 # This understanding seems unreasonable to me.
                                 # Help me to get it!
# 6.
\{"                              # Finally, followed by '{'
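
A piece-by-piece breakdown like this can also be written as real Perl with the x modifier, which ignores whitespace and '#' comments inside the pattern. The sketch below is only an illustration: the comments are one reading of each part, and the three sample strings are made up. Note that '(?!' is a negative lookahead: it checks what follows without consuming anything.

```perl
use strict;
use warnings;

# The same regex, spread out with the x modifier; m is still needed for ^.
my $func_re = qr/
    ^                                  # start of a line (needs the m modifier)
    (?! if\b | else\b | while\b | [\s\*] )
                                       # 1. the line must not start with these keywords, whitespace or a star
    (?: [\w\*~_&]+? \s+ ){1,6}         # 2. one to six words, each followed by whitespace
    ( [\w:\*~_&]+ \s* )                # 3. the function name, captured as group 1 ...
    \( [^\);]* \)                      #    ... then a parameter list without ")" or ";" inside
    [^\{;]*?                           # 4. as little as possible of anything except "{" and ";"
    (?: ^ [^\r\n\{]* ;? [\s]+ ){0,10}  # 5. up to ten further lines that contain no "{"
    \{                                 # 6. the opening brace of the body
/xm;

# Hypothetical one-off samples to exercise parts 1 to 6.
my @samples = (
    "int main(void)\n{",        # ordinary definition
    "else if (x) {",            # rejected by the lookahead in part 1
    "  int indented(void) {",   # rejected: the line starts with whitespace
);

for my $code (@samples) {
    (my $shown = $code) =~ s/\n/\\n/g;    # make the newline visible
    if ($code =~ $func_re) {
        print "'$shown' matches, group 1 is '$1'\n";
    }
    else {
        print "'$shown' does not match\n";
    }
}
```

One caveat with x: a '$' or '@' inside such a comment would still be interpolated by Perl, so the comments here avoid both.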

Thanks!

(?:^[^\r\n\{]*;?[\s]+){0,10}
(){0,10} # a grouping that could repeat from 0 to 10 times max
?: # do not capture this group (do not save in memory to use later)
^ # match start of the line
[^\r\n\{]* # match any character that is not \r, \n or {, 0 or more times
;? # match a ; if present
[\s]+ # match any white space one or more times
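
To see these steps in action, here is a small sketch that runs this group on its own against a made-up three-line string. The text and variable names are assumptions for illustration only.

```perl
use strict;
use warnings;

# Part 5 on its own, with the m modifier so that ^ matches at each line start.
my $part5 = qr/^[^\r\n\{]*;?[\s]+/m;

# Hypothetical text: two complete lines, then a line that starts with "{".
my $text = "int a;\nchar *b;\n{";

# Collect every piece that one repetition of part 5 would consume.
my @pieces = $text =~ /($part5)/g;

for my $piece (@pieces) {
    (my $shown = $piece) =~ s/\n/\\n/g;    # make the newline visible
    print "[$shown]\n";
}
# prints "[int a;\n]" and "[char *b;\n]"
```

Each repetition takes the rest of one line (the negated class stops at the line end and at '{'), and then '[\s]+' swallows the line break itself. The line starting with '{' is left for the final '\{' of the full regex.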

Thanks, Aia!
So it seems I understood it pretty correctly.
That means what is unclear to me is actually the logic.

What is the reason to restrict newlines (the \r, \n) at the beginning of a line in C/C++?!
I mean the '^[^\r\n\{]*' in part #5: C/C++ puts no restriction on the number of newlines, as long as they do not break a word!

How can there be '<anything>;' between <func_nm>(..<params>..) and the {...}, the function body?! That is what I see in #4 and the beginning of #5:

  • [^\{;]*?(?:^[^\r\n\{]*;?
  • especially ending with ';'?! And up to 10 times?!

That regex searches for function declarations in C/C++ source.
How could those rules be useful in that task?
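
One possible reading, offered as an assumption because nothing in this thread quotes UltraEdit on the intent: old K&R-style C definitions put the parameter declarations, each ending in ';', between the ')' and the '{'. Part 5 would then consume those lines one at a time. The sketch below uses made-up samples and a hypothetical helper, first_function_name, to show how the regex behaves on such input; it needs Perl 5.10 or later for the '//' operator.

```perl
use strict;
use warnings;

# The regex from the question, with the m modifier.
my $func_re = qr/^(?!if\b|else\b|while\b|[\s\*])(?:[\w\*~_&]+?\s+){1,6}([\w:\*~_&]+\s*)\([^\);]*\)[^\{;]*?(?:^[^\r\n\{]*;?[\s]+){0,10}\{/m;

# Hypothetical helper: the captured name (group 1, trimmed), or undef on no match.
sub first_function_name {
    my ($code) = @_;
    return unless $code =~ $func_re;
    my $name = $1;
    $name =~ s/\s+\z//;
    return $name;
}

# K&R-style definition: parameter declarations sit between ")" and "{".
# The blank line shows that [\s]+ accepts several line breaks in a row.
my $knr = <<'END_C';
int max(a, b)
int a;

int b;
{
    return a > b ? a : b;
}
END_C

# A plain prototype: the ";" right after ")" stops part 4, so nothing matches.
my $prototype = "int max(int a, int b);\n";

print "K&R definition: ", first_function_name($knr)       // 'no match', "\n";
print "prototype:      ", first_function_name($prototype) // 'no match', "\n";
```

On this reading, '[^\r\n\{]*' does not forbid line breaks between the parameter list and the body. It only means "the rest of this line, as long as it contains no '{'", and the following '[\s]+' accepts one or more line breaks before the next repetition.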