Curly braces in sed

tostay2003 · June 18, 2015, 2:50am

Hi,

I have the command below in one of my scripts. Can you please tell me what the curly braces do here: \{1,\}? I am able to understand at least the rest of the code.

sed -n 's/.*\-\([0-9])\{1,\}\)\-.*/\1/p'

Scrutinizer · June 18, 2015, 4:01am

Hi, it means 1 or more occurrences of the preceding character or sub-expression, in this case ")", so 1 or more closing parentheses.

bakunin · June 18, 2015, 4:59am

Scrutinizer is right. You can use this device to "multiply" a previous expression, similar to a "*", but with added functionality. For instance:

X            # matches exactly one single "X"
X*           # matches any number of "X"s, including zero
X\{3\}       # matches exactly 3 "X"s
X\{1,\}      # matches any number of "X"s, from 1 up
X\{,5\}      # matches up to 5 "X"s (accepted by GNU sed; the portable spelling is X\{0,5\})
X\{3,5\}     # matches 3 to 5 "X"s, hence either "XXX", "XXXX" or "XXXXX"
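
To try these interval expressions out, here is a minimal sketch. The `matches` helper is a hypothetical name introduced only for illustration; it anchors the pattern with ^ and $ and asks grep, which by default uses the same basic regular expressions as sed, whether the whole string matches.

```shell
# Hypothetical helper: print "yes" if the whole of $2 matches the BRE in $1, else "no".
matches() {
  if printf '%s\n' "$2" | grep -q "^$1\$"; then
    echo yes
  else
    echo no
  fi
}

matches 'X\{3\}'   'XXX'      # yes: exactly three
matches 'X\{3\}'   'XXXX'     # no:  four is one too many
matches 'X\{1,\}'  ''         # no:  at least one "X" is required
matches 'X*'       ''         # yes: zero "X"s are allowed
matches 'X\{3,5\}' 'XXXXXX'   # no:  six is above the upper bound
```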

Note that instead of a single character like "X", you can also apply this modifier to complex expressions. For instance:

|[^|]*          # matches a "field" in tabular, pipe-separated data
                # i.e. "|field1|field2|field3....."
 
\(|[^|]*\)\{3\} # matches 3 such fields
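
As a small illustration of the grouped form, the sketch below brackets whatever the expression matched, using & in the replacement. The helper name `mark_three_fields` and the sample line are assumptions made up for this example.

```shell
# Hypothetical helper: put <...> around the first three pipe-separated fields of each line.
mark_three_fields() {
  sed 's/\(|[^|]*\)\{3\}/<&>/'
}

echo '|field1|field2|field3|field4' | mark_three_fields
# prints: <|field1|field2|field3>|field4
```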

I hope this helps.

bakunin

tostay2003 · June 18, 2015, 5:47am

sed -n 's/.*\.\([0-9]\{1,\}\)\..*/\1/p'

I had it slightly wrong, but I got the idea of \{1,\}

However when I have the value such as

echo "testing.123.xyz.456.txt" | sed -n 's/.*\.\([0-9]\{1,\}\)\..*/\1/p'

I am getting the value "456" instead of the first match. Is there anything I am doing wrong?

Aia · June 18, 2015, 6:14am

echo "testing.123.xyz.456.txt" | sed -n 's/.*\.\([0-9]\{1,\}\)\..*/\1/p'

The issue is that the leading `.*' is greedy: it matches as much as it can, here "testing.123.xyz", so the group captures the last number, 456.

Perhaps, modifying your regex a bit:

echo "testing.123.xyz.456.txt" | sed -n 's/[^.]*\.\([0-9]\{1,\}\)\..*/\1/p'

[^.]* : keeps matching any character that is not a literal period, and stops when it reaches one.
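
The two variants can be compared side by side. This is only a sketch; the variable names `line`, `greedy` and `bounded` are invented for the comparison.

```shell
line='testing.123.xyz.456.txt'

# The greedy leading .* swallows "testing.123.xyz", so the group captures the last number.
greedy=$(printf '%s\n' "$line" | sed -n 's/.*\.\([0-9]\{1,\}\)\..*/\1/p')

# [^.]* cannot cross a period, so it stops after "testing" and the group captures 123.
bounded=$(printf '%s\n' "$line" | sed -n 's/[^.]*\.\([0-9]\{1,\}\)\..*/\1/p')

echo "greedy:  $greedy"     # greedy:  456
echo "bounded: $bounded"    # bounded: 123
```

Note that the second expression is not anchored, so on a line whose first field is not followed by a number it may match further to the right; a leading ^ would prevent that.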

Scrutinizer · June 18, 2015, 6:44am

Or to get the first field with numbers, try:

sed -n 's/^\([^0-9.]*\.\)*\([0-9]\{1,\}\).*/\2/p'

or if it is always the second field, try:

cut -d. -f2

tostay2003 · June 18, 2015, 7:55am

Aia, great, this works. Is ^ inside square brackets used as a negation?

bakunin · June 18, 2015, 10:14am

Exactly:

[AB]     # matches "A" or "B"
[^AB]    # matches any character except "A" or "B"

You need this quite often, because matches in sed are always "greedy". Suppose the following text:

AXXBABABABXABABABAB

the expression /A.*B/ will match the whole string, not just the first four characters! If several possibilities for a match exist, the longest possible one will always be taken:

AXXBABABABXABABABAB
A<------ .* ----->B

If you want to match only up to the first "B", you need to exclude "B" from what may come in between:

AXXBABABABXABABABAB
A[^B]*B
AXXB
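
The difference can be made visible with sed itself. The sketch below wraps the matched part in <...> by using & in the replacement; the variable name `text` is just for illustration.

```shell
text='AXXBABABABXABABABAB'

# .* is greedy: the match runs from the first "A" to the last "B".
printf '%s\n' "$text" | sed 's/A.*B/<&>/'
# prints: <AXXBABABABXABABABAB>

# [^B]* cannot step over a "B", so the match ends at the first "B".
printf '%s\n' "$text" | sed 's/A[^B]*B/<&>/'
# prints: <AXXB>ABABABXABABABAB
```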

I hope this helps.

bakunin

tostay2003 · June 24, 2015, 7:40am

Hi,

Thanks for the detailed description.

I tried changing the sed command below to fetch the first available number, but was unable to get the result I wanted.

echo "testing.123.xyz.456.txt" | sed -n 's/[^.]*\.\([0-9]\{1,\}\)\..*/\1/p'

Ideally, my code has to fetch the first numeric value in a string that matches a pattern.

Pattern -> testing.123.xyz.*.txt 
Actual Value -> testing.123.xyz.456.txt 
----> This should result in 123

If the pattern DOES NOT contain a *, only the first available value needs to be extracted:

abc123testing456 ----> should result in 123
123abctesting456 ----> should result in 123

Any ideas how we can change the code to achieve the above result?

RudiC · June 24, 2015, 9:37am

Not quite sure what you mean by "above result". However, try

sed -rn 's/[^0-9]*([0-9]{1,})[^0-9]*.*/\1/p' file

tostay2003 · June 24, 2015, 12:57pm

In all the cases below (for example), the result should be "1234":

abc.1234.xyz.456.999
abc.1234testing456 
abc1234testing456 
1234abctesting456

Result :1234

I do not have a system in front of me; I can check it tomorrow.

Does the code you gave above return "23"? Please correct me if I am wrong, but do the digits 1 and 4 get absorbed by [^0-9]?

RudiC · June 24, 2015, 2:53pm

Result of sed script in post#10 applied to your last sample:
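
For readers who want to reproduce this, here is a hedged sketch. The helper name `first_number` is invented, and -E is used in place of -r on the assumption that the local sed accepts it. Since [^0-9] can never match a digit, the 1 and the 4 are not absorbed, and the greedy [0-9]{1,} takes the whole run of digits.

```shell
# Hypothetical helper: print the first run of digits found on each input line.
first_number() {
  sed -En 's/[^0-9]*([0-9]{1,})[^0-9]*.*/\1/p'
}

first_number <<'EOF'
abc.1234.xyz.456.999
abc.1234testing456 
abc1234testing456 
1234abctesting456
EOF
# prints 1234 four times, one per input line
```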

Aia · June 24, 2015, 10:21pm

$ cat numbers.file
abc.1234.xyz.456.999
abc.1234testing456 
nonumbershere
abc1234testing456 
voidofnumbers
1234abctesting456

$ perl -nle '/(\d+)/ and print $1' numbers.file 
1234
1234
1234
1234

Scrutinizer · June 25, 2015, 9:12am

Regular sed:

sed -n 's/^[^0-9]*\([0-9]\{1,\}\).*/\1/p' file

GNU awk:

awk -v FPAT='[0-9]+' 'NF{print $1}' file

Regular awk:

awk 'match($0,/[0-9]+/){print substr($0,RSTART,RLENGTH)}' file
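
A quick way to check that the portable sed and awk variants behave the same, including on lines without any digits. This is a sketch only; the sample data and the variable names `sample`, `with_sed` and `with_awk` are made up for the comparison.

```shell
sample='abc.1234.xyz.456.999
nonumbershere
1234abctesting456'

# Regular sed: -n plus the p flag prints only lines where the substitution succeeded.
with_sed=$(printf '%s\n' "$sample" | sed -n 's/^[^0-9]*\([0-9]\{1,\}\).*/\1/p')

# Regular awk: match() sets RSTART and RLENGTH to the first run of digits on the line.
with_awk=$(printf '%s\n' "$sample" | awk 'match($0,/[0-9]+/){print substr($0,RSTART,RLENGTH)}')

printf '%s\n' "$with_sed"    # 1234 on two lines; "nonumbershere" is skipped
[ "$with_sed" = "$with_awk" ] && echo "sed and awk agree"
```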