Regex to return string - Why is the match greedy?

I want to obtain this value as a result:

$sql_array = array(
		'SELECT'	=> 'f.*',
		'FROM'		=> array(
			FORUMS_TABLE		=> 'f'
		),
		'LEFT_JOIN'	=> array(),
	); 

Having this code:
space=" "

regex_needle=".*?^$space\); *?$"
# I seek the needle in prohledavany_block
if [[ $prohledavany_block =~ $regex_needle ]]; then
    echo "Found: ${BASH_REMATCH[0]}"
else
    echo "Needle not found"
fi

$prohledavany_block contains this text (shortened):

$sql_array = array(
		'SELECT'	=> 'f.*',
		'FROM'		=> array(
			FORUMS_TABLE		=> 'f'
		),
		'LEFT_JOIN'	=> array(),
	);

	if ($config['load_db_lastread'] && $user->data['is_registered'])
	{
		$sql_array['LEFT_JOIN'][] = array('FROM' => array(FORUMS_TRACK_TABLE => 'ft'), 'ON' => 'ft.user_id = ' . $user->data['user_id'] . ' AND ft.forum_id = f.forum_id');
		$sql_array['SELECT'] .= ', ft.mark_time';
	}
	else if ($config['load_anon_lastread'] || $user->data['is_registered'])
	{
		$tracking_topics = $request->variable($config['cookie_name'] . '_track', '', true, \phpbb\request\request_interface::COOKIE);
		$tracking_topics = ($tracking_topics) ? tracking_unserialize($tracking_topics) : array();

		if (!$user->data['is_registered'])
		{
			$user->data['user_lastmark'] = (isset($tracking_topics['l'])) ? (int) (base_convert($tracking_topics['l'], 36, 10) + $config['board_startdate']) : 0;
		}
	}

	if ($show_active)
	{
		$sql_array['LEFT_JOIN'][] = array(
			'FROM'	=> array(FORUMS_ACCESS_TABLE => 'fa'),
			'ON'	=> "fa.forum_id = f.forum_id AND fa.session_id = '" . $db->sql_escape($user->session_id) . "'"
		);

(it is a bit longer)

The problem is that I am getting nothing with this:

regex_needle=".*?^$space\); *?$"

And greedy results with this:

regex_needle=".*?$space\); *?$"

which is wrong, because I need the pattern to anchor on the newline (the start of the line). It must return only the block shown at the top.

Can you help to fix the regex?

Why is .*? greedy?

Why does ^$space not match at the line feed (newline)?

The non-greedy quantifier *? is a Perl/PCRE extension, not POSIX.
man bash
says that [[ =~ ]] uses a POSIX ERE, which only supports greedy matching.
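To illustrate the greedy behaviour, here is a minimal sketch on a made-up one-line string (the variable names sample, re_greedy and re_short are hypothetical, not from the question). It also shows one common way to emulate a shortest match in an ERE: exclude the terminating character from the repeated part with a negated bracket expression.

```bash
# Hypothetical sample string with two ");" terminators.
sample='a); b); c'

# ".*" is greedy in a POSIX ERE: it runs up to the LAST ");".
re_greedy='.*\);'
[[ $sample =~ $re_greedy ]] && greedy=${BASH_REMATCH[0]}

# Emulate a shortest match by forbidding ")" inside the repeated part.
re_short='[^)]*\);'
[[ $sample =~ $re_short ]] && short=${BASH_REMATCH[0]}

printf 'greedy: %s\n' "$greedy"   # greedy: a); b);
printf 'short:  %s\n' "$short"    # short:  a);
```

The negated-bracket trick only works when the terminator starts with a character that cannot occur earlier in the wanted text, which is not the case for the PHP block in the question.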

I can't tell what ^$space is meant to do.
The POSIX character class for whitespace is [[:space:]]

The following might make sense:

regex_needle="[^[:space:]]+[[:space:]]*\); *$"

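I don't understand what should actually be returned from your text sample.
Line-oriented tools such as grep and sed apply an RE to one line at a time, so they cannot return several lines; in bash, [[ =~ ]] applies the RE to the whole string, embedded newlines included.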


$space is a variable that contains the same whitespace string that precedes the variable name $sql_array in the PHP code. I have a bash script, which I did not show, that copies that whitespace into the space variable. I then want to find the end of the array, which is the content of space followed by ");", that is "$space);". So the text I want to obtain starts with $sql_array = array( and ends with );. I have already found the surrounding block, and it is saved in $prohledavany_block as I said, so I only need to find the end of the array using the needle "$space);". As a result, the whole sql_array should be saved in the result. Important: in the original post there was a formatting problem and the first line, $sql_array = array(, was not visible, although it is essential for understanding the question. So it may not have made sense until I edited the code in the question.

@visitor123, screenshots (especially those with colour-coded text) are a pain to read. Simply copy/paste the actual code so that everyone can read it (and modify it if required when responding).
Also, showing what you ACTUALLY EXPECT would be a big help (even if it has to be typed by hand): you know what you are expecting, and that helps eliminate ambiguities.

Finally, for clarity: is this code in bash/perl/php/lua/?

tks

I have seen this and your attempted fix. As a moderator I made a further fix: the triple-backtick markdown fences must be on separate lines, one before and one after the code block.

Now I have an idea what you want.
The ^ only makes sense at the beginning of the RE: in [[ =~ ]] it marks the beginning of the whole string and does not match after an embedded newline.
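A small sketch of how ^ behaves on a multi-line string in [[ =~ ]], using a made-up two-line sample (text, re_caret and re_nl are hypothetical names). It assumes the indentation is a single tab.

```bash
nl=$'\n'
tab=$'\t'
# Hypothetical two-line sample; the second line starts with a tab.
text="first line${nl}${tab});"

# "^" is tested only at the start of the whole string, not after the newline.
re_caret="^${tab}\);"
if [[ $text =~ $re_caret ]]; then caret=match; else caret=nomatch; fi

# Spelling out the newline itself does find the start of the second line.
re_nl="${nl}${tab}\);"
if [[ $text =~ $re_nl ]]; then newline=match; else newline=nomatch; fi

printf 'caret: %s, newline: %s\n' "$caret" "$newline"
# caret: nomatch, newline: match
```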
I see you have a multi-line block captured in a variable (that is, sigh, certainly not the best idea in shell).
An RE (here, an ERE) is not really designed for multi-line text with embedded newline characters. Nevertheless, you can try whether a literal newline works:

regex_needle=".*
$space\); *
"

Or with an nl variable:

nl=$'\n'
regex_needle=".*${nl}$space\); *${nl}"
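Because .* is greedy, the patterns above run to the last line in the block that consists of $space followed by ");". If the block can contain more than one such line, a possible alternative (not the regex approach above, just a sketch) is plain parameter expansion, which can cut at the first terminator. The sample text, the one-tab indentation and the names end and result are assumptions made for illustration.

```bash
nl=$'\n'
space=$'\t'   # assumed indentation: one tab, as in the sample

# Shortened, hypothetical stand-in for $prohledavany_block with two
# lines that look like "<tab>);".
prohledavany_block=$'$sql_array = array(\n\t\t\'SELECT\' => \'f.*\',\n\t);\n\n\tif ($show_active)\n\t{\n\t}\n\t$other = array(\n\t);\n'

end="${nl}${space});"
if [[ $prohledavany_block == *"$end"* ]]; then
    # %% removes the longest suffix that starts with $end, i.e. everything
    # from the FIRST terminator onward; the terminator is then put back.
    result="${prohledavany_block%%"$end"*}${end}"
    printf 'Found:\n%s\n' "$result"
else
    echo "Needle not found"
fi
```

Unlike the regex, this sketch does not check that ");" is followed only by spaces up to the end of the line.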

Thank you for your solution. I have learned something new and important. This will save me a lot of trouble in future scripting.

May I share in this thread the code I use to get the whitespace before sql_array? The code works perfectly, but when I moved it into a separate function it stopped working: instead of the whitespace, it returns the complete PHP code that is in the variable prohledavany_block. Just let me know whether it is OK to post it here or whether I should create a new question. It could fit the context of this question, but it is about a bash function that uses awk to get the whitespace.

Make a new thread!