Cut out string in bash script

cootue · February 21, 2010, 5:57pm

Hi all,
I'm trying to extract a string from a variable in bash. That's probably trivial with grep, but I couldn't figure it out.

I want to get "name". There's sometimes a digit after it, but it should be left out.

STRING=http://name5.domain.com:8000/file.dat

Could someone help me with that?
Any solution will do.

alister · February 21, 2010, 6:09pm

The following works fine if there's always a digit after "name" and none embedded within it.

$ STRING=http://name5.domain.com:8000/file.dat
$ n=${STRING%%[0-9]*}
$ n=${n#*//}
$ echo "$n"
name
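
If the digit may be absent, a pure-bash variant can cut at the first dot and then trim trailing digits one character at a time. This is only a minimal sketch under that assumption, and the function name host_label is made up for illustration:

```shell
#!/usr/bin/env bash
# host_label (hypothetical name): print the first host component of a URL
# with any trailing digits removed. Leading or embedded digits are kept.
host_label() {
    local label=${1#*//}      # drop everything up to and including the first "//"
    label=${label%%.*}        # keep only the text before the first "."
    while [[ $label == *[0-9] ]]; do
        label=${label%?}      # trim one trailing digit per iteration
    done
    printf '%s\n' "$label"
}

host_label "http://name5.domain.com:8000/file.dat"   # prints: name
host_label "http://name.domain.com:8000/file.dat"    # prints: name
host_label "http://na5me.domain.com:8000/file.dat"   # prints: na5me
```

The loop avoids `${n%%[0-9]*}`, which cuts at the first digit anywhere in the string rather than only at the end of the name.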

durden_tyler · February 21, 2010, 6:12pm

$ 
$ STRING=http://name5.domain.com:8000/file.dat
$ 
$ echo $STRING | perl -lne '/http:\/\/(.*?)\d.*/ && print $1'
name
$ 
$

tyler_durden

alister · February 21, 2010, 6:27pm

The following will extract the first component of the url and strip any trailing digits that may be present (leading or embedded numbers are left intact):

echo "$STRING" | awk -F'\\/\\/|\\.' '{sub(/[0-9]+$/, "", $2); print $2}'

@durden_tyler

That perl will not work if the trailing digit is not present or if there's a digit embedded elsewhere in the name. The latter may be unlikely, but the original query states that the digit is not always present.
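
Both failure modes come from the pattern requiring a digit. For comparison, a bash-only match with `[[ =~ ]]` can make the trailing digits optional while forcing the captured name to end in a non-digit. This is a sketch, assuming bash 3.2 or later; the function name label_re is hypothetical:

```shell
#!/usr/bin/env bash
# label_re (hypothetical name): capture the first host component without
# trailing digits, using a POSIX extended regular expression.
label_re() {
    # Group 1: characters before the first ".", ending in a non-digit.
    # [0-9]* then absorbs any trailing digits in front of the dot.
    local re='^[^:]+://([^.]*[^0-9.])[0-9]*\.'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1   # no match, e.g. no "://" or an all-digit first component
    fi
}

label_re "http://name5.domain.com:8000/file.dat"   # prints: name
label_re "http://name.domain.com:8000/file.dat"    # prints: name
label_re "http://na5me.domain.com:8000/file.dat"   # prints: na5me
```

Keeping the pattern in a variable sidesteps the quoting differences between bash versions when a regex is written inline.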

Cheers,
Alister

cootue · February 21, 2010, 6:35pm

Thanks, guys, for the fast response!

Maybe I wasn't clear enough about that digit, but yeah, alister's solution works in both cases.

Cheers.

alister · February 21, 2010, 6:40pm

One more

echo "$STRING" | sed 's|^http://\([^.]*\)\..*$|\1|; s|[0-9]*$||'

Alister

durden_tyler · February 21, 2010, 8:27pm

Sorry, my bad.

Here's a file that contains variations of the URL, with the digit at the end, at the beginning, in between, and non-existent.

$ 
$ cat -n f6
     1  http://name5.domain.com:8000/file.dat
     2  http://na5me.domain.com:8000/file.dat
     3  http://5name.domain.com:8000/file.dat
     4  http://name.domain.com:8000/file.dat
$ 
$

Here's the Perl equivalent of your awk script, i.e. remove the digit at the end (if present), but retain a digit in between or at the beginning (if present). If the word between "//" and "." has no digits at all, it is fetched unchanged.

$ 
$ perl -lne '/http:\/\/(.*?)(\d)*\..*/ && print $_,"\t=>\t",$1' f6
http://name5.domain.com:8000/file.dat   =>      name
http://na5me.domain.com:8000/file.dat   =>      na5me
http://5name.domain.com:8000/file.dat   =>      5name
http://name.domain.com:8000/file.dat    =>      name
$ 
$

Since nothing has been mentioned about the case when the digit is at the beginning or in-between, I shall consider a couple of cases.

Case 1:
The digit at the beginning is to be removed.
The digit in-between has to be retained.
The digit at the end is to be removed.
Which means - "remove digits from the ends, retain them in the middle".

$ 
$ perl -lne '/http:\/\/(\d)*(.*?)(\d)*\..*/ && print $_,"\t=>\t",$2' f6
http://name5.domain.com:8000/file.dat   =>      name
http://na5me.domain.com:8000/file.dat   =>      na5me
http://5name.domain.com:8000/file.dat   =>      name
http://name.domain.com:8000/file.dat    =>      name
$
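
For anyone who wants Case 1 without Perl, the same rule can be approximated in bash by trimming digits from both ends of the first host component. This is a rough sketch only; trim_edge_digits is a made-up name:

```shell
#!/usr/bin/env bash
# trim_edge_digits (hypothetical name): Case 1 behaviour for a single URL.
# Digits at the start and end of the first host component are removed;
# digits in the middle are kept.
trim_edge_digits() {
    local label=${1#*//}      # strip the scheme and "//"
    label=${label%%.*}        # keep the text before the first "."
    while [[ $label == [0-9]* ]]; do label=${label#?}; done   # leading digits
    while [[ $label == *[0-9] ]]; do label=${label%?}; done   # trailing digits
    printf '%s\n' "$label"
}

trim_edge_digits "http://5name.domain.com:8000/file.dat"   # prints: name
trim_edge_digits "http://na5me.domain.com:8000/file.dat"   # prints: na5me
```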

Case 2:
The digit at the beginning is to be removed.
The digit in-between has to be removed.
The digit at the end is to be removed.
Which means - "remove digits from the ends and the middle".

$ 
$ perl -lne 'if (/http:\/\/(\d)*(.*?)(\d)*\..*/){ ($x = $2) =~ s/\d//g; print $_,"\t=>\t",$x}' f6
http://name5.domain.com:8000/file.dat   =>      name
http://na5me.domain.com:8000/file.dat   =>      name
http://5name.domain.com:8000/file.dat   =>      name
http://name.domain.com:8000/file.dat    =>      name
$
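
Case 2 is the simplest one to express in bash, since the pattern substitution `${var//pattern/}` deletes every match. A minimal sketch, with strip_all_digits as a hypothetical function name:

```shell
#!/usr/bin/env bash
# strip_all_digits (hypothetical name): Case 2 behaviour for a single URL.
# Every digit in the first host component is removed.
strip_all_digits() {
    local label=${1#*//}      # strip the scheme and "//"
    label=${label%%.*}        # keep the text before the first "."
    printf '%s\n' "${label//[0-9]/}"   # delete all digits
}

strip_all_digits "http://na5me.domain.com:8000/file.dat"   # prints: name
strip_all_digits "http://5name.domain.com:8000/file.dat"   # prints: name
```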

tyler_durden

cootue · February 22, 2010, 1:54am

Thanks durden_tyler,
that's more ways to do it than I'll ever need.

cootue · February 26, 2010, 3:27pm

Sorry for the little bit of necro...

I have an extension to my problem:
the protocol doesn't have to be 'http', so it could be anything before the '://'.

Also, a second, much simpler thing: how do I get the URL protocol, again anything before '://url.com/blablabla'?

durden_tyler · February 26, 2010, 7:55pm

Not sure how that fits in here. "necro" = corpse, death, dead tissue etc. (Necro | Define Necro at Dictionary.com) But anyway, I digress.

So here's the same file "f6" again, with a few non-http protocols thrown in:

$
$ cat -n f6
     1  http://name5.domain.com:8000/file.dat
     2  http://na5me.domain.com:8000/file.dat
     3  http://5name.domain.com:8000/file.dat
     4  http://name.domain.com:8000/file.dat
     5  ftp://name5.domain.com:8000/file.dat
     6  ftp://na5me.domain.com:8000/file.dat
     7  ftp://5name.domain.com:8000/file.dat
     8  ftp://name.domain.com:8000/file.dat
$
$

The Perl script below fetches the word that is between "://" and ".".
If this word has one or more digits at the end, they are left off.
If this word has one or more digits in the middle, or at the beginning, they are retained.

$
$ perl -lne '/^.*?:\/\/(.*?)\d*\..*$/ && print $_,"\t=>\t",$1' f6
http://name5.domain.com:8000/file.dat   =>      name
http://na5me.domain.com:8000/file.dat   =>      na5me
http://5name.domain.com:8000/file.dat   =>      5name
http://name.domain.com:8000/file.dat    =>      name
ftp://name5.domain.com:8000/file.dat    =>      name
ftp://na5me.domain.com:8000/file.dat    =>      na5me
ftp://5name.domain.com:8000/file.dat    =>      5name
ftp://name.domain.com:8000/file.dat     =>      name
$
$

Here's the Perl script that fetches the protocol from the URL -

$
$ perl -lne '/^(.*?):\/\/(.*?)\d*\..*$/ && print $_,"\t=>\t",$1' f6
http://name5.domain.com:8000/file.dat   =>      http
http://na5me.domain.com:8000/file.dat   =>      http
http://5name.domain.com:8000/file.dat   =>      http
http://name.domain.com:8000/file.dat    =>      http
ftp://name5.domain.com:8000/file.dat    =>      ftp
ftp://na5me.domain.com:8000/file.dat    =>      ftp
ftp://5name.domain.com:8000/file.dat    =>      ftp
ftp://name.domain.com:8000/file.dat     =>      ftp
$
$
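
If only the protocol is needed inside a bash script, parameter expansion alone may be enough. The sketch below assumes the protocol is whatever precedes the first "://", and the function name url_scheme is hypothetical:

```shell
#!/usr/bin/env bash
# url_scheme (hypothetical name): print the text before the first "://".
# Returns non-zero if the argument contains no "://".
url_scheme() {
    case $1 in
        *://*) printf '%s\n' "${1%%://*}" ;;   # remove the longest suffix starting at "://"
        *)     return 1 ;;
    esac
}

url_scheme "http://name5.domain.com:8000/file.dat"   # prints: http
url_scheme "ftp://name.domain.com:8000/file.dat"     # prints: ftp
url_scheme "name.domain.com" || echo "no protocol found"
```

The explicit `case` check matters because `${1%%://*}` returns the whole string unchanged when there is no "://" to remove.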

HTH,
tyler_durden

cootue · February 27, 2010, 12:49am

Thanks,
durden_tyler, looks like Perl is the way to go.

And 'necro', like... to revive topics from the dead. Or maybe it was some other word? Hehe.