sed help with underscore problems

YogaBija · March 4, 2014, 8:21am

Hello,

I have spent a couple of hours trying to answer this myself, so forgive me if the answer is simple; I have tried.

I have a text file generated from svn log output which contains a list of files.

The two regexps I'm using are

[a-zA-Z0-9]*

and

[a-zA-Z0-9_]*

They both work, but some lines have a mixture of both formats. I have tried sed -e and other patterns; some (the simpler patterns) produce the same results and some don't work at all. This is to be expected.

The problem is that the input lines come in two forms, like this:

a12_fdgdfg/proga
a12/progb
a11_dsfsdf/progc
a11/progd

From a text list like the one above, I need to extract the following:

proga
progb
progc
progd

I'm looking for one regexp that would handle both forms, in the context of sed.

I'm sure this must be simple enough for someone who is familiar with regexps, but I just started today and need some help.

Many thanks in advance

RavinderSingh13 · March 4, 2014, 8:26am

Hello,

A request: please use code tags when posting commands and code, as per the forum rules. The following may help you.

awk '{sub(/.*\//, "")} 1' file_name

Output will be as follows.

proga
progb
progc
progd

EDIT: One more solution, this time with sed.

 sed 's/\(.*\/\)\(.*\)/\2/g'  file_name

Output will be as follows.

proga
progb
progc
progd

EDIT: One more solution, using basename.

# -r keeps backslashes literal; the quotes protect names containing spaces
while IFS= read -r line
do
basename "$line"
done < "file_name"

Thanks,
R. Singh

YogaBija · March 4, 2014, 8:31am

Many thanks for the reply. I will use code tags in future.

Franklin52 · March 4, 2014, 8:32am

awk -F/ '{print $NF}' file

or

sed 's/.*\///' file
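
A quick way to check this against the sample lines from the first post is sketched below. The variable name result is only for illustration. Because .* is greedy, it swallows everything up to and including the last slash, so it does not matter which characters the directory name contains.

```shell
# Feed the four sample lines from the first post through the sed command.
# .*\/ matches greedily up to the LAST slash, so the directory part is
# removed whether or not it contains an underscore.
result=$(printf '%s\n' 'a12_fdgdfg/proga' 'a12/progb' 'a11_dsfsdf/progc' 'a11/progd' |
  sed 's/.*\///')

printf '%s\n' "$result"
```

This should print proga, progb, progc and progd, one per line.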

YogaBija · March 4, 2014, 10:22am

Actually, based on your reply using the \( \) groups and \2 etc., I think I should have given a fuller example. My mistake: I had thought the answer to my simplified question would be enough.

I have a script

Pattern='   [A-Z] \/Group\/Subgroup\/[.]*\/'

# $Interim is generated by a special svn log command
cat $Interim | sed "s/$Pattern/.\//g" | sed "s/.src//g"  | sort | uniq -u >$Unique

I am happy to use a fully in-line style as follows

cat $Interim | sed "s/   [A-Z] \/Group\/Subgroup\/[.]*\//.\//g" | sed "s/.src//g"  | sort | uniq -u >$Unique

I'm trying to see how I can integrate the provided solution into my wider example. I wish I had shown the fuller example from the start; I was hoping I could apply the answer to my real issue.

Many thanks

---------- Post updated at 10:13 AM ---------- Previous update was at 08:42 AM ----------

Hello Franklin52 and everyone else.

One thing that confuses me is that the following pattern

[a-zA-Z0-9_]*

doesn't seem to work correctly. I thought the * meant that any number of the characters, or none, would be matched. So why would hhh_bbb be processed and hhh be ignored?

I'm still having problems integrating the solutions into the fuller example I gave. I am unable to use PHP as it's not on the server. I could use Perl, but I would really like to understand why the underscore is making things so tricky. I realise HP-UX sed isn't normally as fully featured as other versions.

Thanks

---------- Post updated at 10:22 AM ---------- Previous update was at 10:13 AM ----------

Let me share the full problem so you can see it more clearly:

Example file to process

   A /Branch/Subbranch/x9_llll/something/xx/SourceA
   M /Branch/Subbranch/x23_llll/else/dir/SourceB
   M /Branch/Subbranch/x49/else/dir/subdir/SourceC
   M /Branch/Subbranch/x1/else/dir/subdir/SourceD

As far as I can tell the pattern

[a-zA-Z0-9_]*

Should cope with both x11_lll and x11

When I remove the underscore from the pattern, I see a mixture of correctly stripped paths and unprocessed lines, as follows:

   M /Branch/Subbranch/x24_lll/else/dir/SourceH
   M /Branch/Subbranch/x24_lll/something/dir/SourceJ
./something/dir/subdir/SourceX
./else/dir/subdir/SourceY

When I keep the underscore, the resulting file looks okay but misses all the lines that had x11 only (as opposed to x11_lll). In the last code fragment, the last two lines show the final output I'm looking for.

Scrutinizer · March 4, 2014, 10:54am

Have you tried using [[:alnum:]_]* instead of [a-zA-Z0-9_]* ?
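
A minimal sketch of that substitution, using two lines from the example file above (one directory with an underscore, one without). The variable name stripped is only for illustration, and this has not been checked on HP-UX sed.

```shell
# [[:alnum:]_]* matches any run of letters, digits and underscores,
# so x9_llll and x49 are both consumed by the same expression.
stripped=$(printf '%s\n' \
  '   A /Branch/Subbranch/x9_llll/something/xx/SourceA' \
  '   M /Branch/Subbranch/x49/else/dir/subdir/SourceC' |
  sed 's/   [A-Z] \/Branch\/Subbranch\/[[:alnum:]_]*\//.\//')

printf '%s\n' "$stripped"
```

With a POSIX sed this should print ./something/xx/SourceA and ./else/dir/subdir/SourceC.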

RudiC · March 4, 2014, 1:04pm

In your post #5, [.]* means zero or more literal dots, which is not what you intend to match. .* in turn would be a greedy match, removing too many characters. Try instead

sed "s/   [A-Z ] \/Branch\/Subbranch\/[^/]*\//.\//g" file
                                        ^--- any number of non-slash chars
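
To see the difference between the three forms side by side, here is a small sketch run against one line of the example file. The variable names line, dots, greedy and nonslash are made up for the demonstration.

```shell
line='   M /Branch/Subbranch/x49/else/dir/subdir/SourceC'

# [.]* only matches literal dots, so the pattern never matches here and
# the line passes through unchanged.
dots=$(printf '%s\n' "$line" | sed 's/   [A-Z] \/Branch\/Subbranch\/[.]*\//.\//')

# .* is greedy: it runs on to the last slash and strips the whole path.
greedy=$(printf '%s\n' "$line" | sed 's/   [A-Z] \/Branch\/Subbranch\/.*\//.\//')

# [^/]* stops at the next slash, so only one directory level is removed.
nonslash=$(printf '%s\n' "$line" | sed 's/   [A-Z] \/Branch\/Subbranch\/[^/]*\//.\//')

printf '%s\n' "$dots" "$greedy" "$nonslash"
```

The first result is the untouched input line, the second is ./SourceC, and the third is ./else/dir/subdir/SourceC, which is the form wanted in the final output.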

MadeInGermany · March 5, 2014, 12:38pm

It is more readable with a different separator, one that does not appear in the expression string:

sed "s#   [A-Z ] /Branch/Subbranch/[^/]*/#./#" file

The g option is not needed unless the whole expression should match more than once within one line.
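
Both points can be checked with a short sketch. The variable names are only for illustration, and the a/b/c string is a made-up input to show what g changes.

```shell
input=$(printf '%s\n' \
  '   A /Branch/Subbranch/x9_llll/something/xx/SourceA' \
  '   M /Branch/Subbranch/x49/else/dir/subdir/SourceC')

# The same substitution, once with / and once with # as the separator.
with_slash=$(printf '%s\n' "$input" | sed 's/   [A-Z ] \/Branch\/Subbranch\/[^/]*\//.\//')
with_hash=$(printf '%s\n' "$input" | sed 's#   [A-Z ] /Branch/Subbranch/[^/]*/#./#')

# Without g only the first match on a line is replaced; with g, all of them.
first_only=$(printf 'a/b/c\n' | sed 's#/#-#')
every=$(printf 'a/b/c\n' | sed 's#/#-#g')

printf '%s\n' "$with_hash" "$first_only" "$every"
```

Both separator styles give the same two stripped paths. The last two results are a-b/c and a-b-c, which shows that g only matters when the pattern can match several times on one line.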