Help with sed substitution / regex

Jedimark · March 17, 2014, 2:24pm

Hi all, please can anyone show me how to use sed and regular expressions to achieve the following.

If a line contains a capital A followed by exactly 5 or 6 characters followed by an angle bracket (<), then insert an asterisk before the angle bracket.

So:

XCONFIGA12345<X

Becomes:

XCONFIGA12345*<X

Many thanks in advance.

Mark

Don_Cragun · March 17, 2014, 2:40pm

Is this a homework assignment?

Jedimark · March 17, 2014, 3:02pm

Haha, not quite - work!
We have a content checker that detects uuencoded strings before they enter a very strict environment, and I need to detect the rule above to block what could be a uuencoded string.

---------- Post updated at 02:02 PM ---------- Previous update was at 01:43 PM ----------

So I think the regex would be:

A.{5,6}<

But I am very inexperienced with sed so any help on that part would be much appreciated.
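
The expression above is written in extended regular expression (ERE) syntax, where the interval `{5,6}` needs no backslashes. One way to sanity-check it before involving sed is `grep -E`. This is only a sketch: the variable names and the four sample lines are assumptions chosen for illustration.

```shell
# Sample input: A followed by 4, 5, 6 and 7 characters before the "<".
sample=$(printf '%s\n' 'XCONFIGA1234<X' 'XCONFIGA12345<X' 'XCONFIGA123456<X' 'XCONFIGA1234567<X')

# -E selects ERE syntax, so the interval is written {5,6} rather than \{5,6\}.
# Only the lines with 5 or 6 characters between the A and the "<" are printed.
matches=$(printf '%s\n' "$sample" | grep -E 'A.{5,6}<')
printf '%s\n' "$matches"
```

Note that `.` matches any character, so the pattern does not require the 5 or 6 characters to be digits.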

Don_Cragun · March 17, 2014, 3:04pm

You could try something like:

sed -e 's/\(A......\)</\1*</' -e 's/\(A.....\)</\1*</' file

or

sed -e 's/\(A.\{5,6\}\)</\1*</' file

both of which (with the following text in file):

XCONFIGA1234<X
XCONFIGA12345<X
XCONFIGA123456<X
XCONFIGA1234567<X

produce the output:

XCONFIGA1234<X
XCONFIGA12345*<X
XCONFIGA123456*<X
XCONFIGA1234567<X
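
For anyone who wants to reproduce this without creating a file, here is a minimal sketch that pipes the same four lines into both commands and compares the results. The variable names are assumptions made for the example.

```shell
# The four test lines from the post above.
input=$(printf '%s\n' 'XCONFIGA1234<X' 'XCONFIGA12345<X' 'XCONFIGA123456<X' 'XCONFIGA1234567<X')

# Two-expression form: the 6-character case has to run first. Otherwise the
# asterisk added for a 5-character match would itself count as a sixth
# character and the line would get a second asterisk.
out_two=$(printf '%s\n' "$input" | sed -e 's/\(A......\)</\1*</' -e 's/\(A.....\)</\1*</')

# Interval form (BRE): \{5,6\} means "5 or 6 of the preceding item".
out_interval=$(printf '%s\n' "$input" | sed -e 's/\(A.\{5,6\}\)</\1*</')

printf '%s\n' "$out_interval"
[ "$out_two" = "$out_interval" ] && echo "both forms agree"
```

With a sed that accepts the -E option (GNU and BSD sed do), the same substitution can also be written in ERE syntax as sed -E 's/(A.{5,6})</\1*</'.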

Jedimark · March 17, 2014, 3:36pm

Thanks, and I got your alternative on my own too - so I'll go to bed happy tonight.

I'm guessing the ( stuff ) puts the expression into a group called \1 - does that mean I could define other groups and use \2 \3 etc?

Mark

Don_Cragun · March 17, 2014, 4:22pm

Yes. What you called a "group", the standards refer to as a "subexpression". In cases where subexpressions are nested, the position of the opening \( determines the subexpression number. The number of subexpressions allowed in a BRE isn't usually limited, but back references (\digit) in the replacement string can only reference the first nine subexpressions. (In a replacement string, \10 refers to the string matched by the first subexpression followed by a 0, not the string matched by the 10th subexpression.)
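
Both points can be checked with a short experiment. This is a sketch with made-up input strings, not something from the thread itself.

```shell
# Nested subexpressions are numbered by the position of their opening \( :
# \1 is the outer pair, \2 the first inner pair, \3 the second inner pair.
nested=$(printf '%s\n' 'abc' | sed 's/\(\(a\)\(b\)\)c/[\1][\2][\3]/')
printf '%s\n' "$nested"

# Ten subexpressions, one per letter. In the replacement, \10 is read as
# \1 followed by a literal 0, so the result is "a0", not "j".
ten=$(printf '%s\n' 'abcdefghij' | sed 's/\(a\)\(b\)\(c\)\(d\)\(e\)\(f\)\(g\)\(h\)\(i\)\(j\)/\10/')
printf '%s\n' "$ten"
```

The first command prints [ab][a][b] and the second prints a0.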