Remove string between number and character

Hello!

I need to remove the text between a number and a set of characters. For example:

35818    -stress  - -       -stress  - - - - - - DB-3754 
44412  caul kid notify      DB-3747 
54432  roberto -, notify   DB-3725 
55522  aws _ _int _ _classified 2_a _a 2_m _m 2_classified 2_search _search 2_shoing _shoing 2_windows _windows shp_ shp_ DB-3755 DB-3721 DB-3361 DB-3327 DB-3688 DB-3750 DB-3677 DB-3728 DB-3730

From the above, what I need is:

35818    DB-3754 
44412    DB-3747 
54432   DB-3725 
55522    DB-3755 DB-3721 DB-3361 DB-3327 DB-3688 DB-3750 DB-3677 DB-3728 DB-3730

From the examples I found here, I know how to remove a string with sed, as below:

sed -s '/[0-9]{6}/,/DB-/d' 

but that is not working.
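A note on why that sed command has no effect: `/re1/,/re2/d` is an address range, so it deletes whole lines from a line matching re1 through the next line matching re2; it never edits text inside a line. In a basic regular expression `{6}` is also literal (the interval form is `\{6\}`), and the IDs here have five digits anyway, so the range never starts and the input is printed unchanged. A minimal sed sketch that edits within each line instead, assuming `@` never occurs in the data (it is borrowed as a temporary marker) and that your sed accepts `-E` (GNU and BSD sed do); the function name `strip_middle` is made up for illustration:

```shell
#!/bin/sh
# Sketch: keep the leading number and everything from the first " DB-" onward.
# Assumes '@' never appears in the input; it is used as a temporary marker.
strip_middle() {
  # 1st s///: put '@' in front of the FIRST " DB-" only (no g flag).
  # 2nd s///: replace "number ... @ " with "number ", removing the junk.
  sed -E 's/ DB-/@&/; s/^([0-9]+)[^@]*@ /\1 /'
}

printf '%s\n' \
  '35818    -stress  - -       -stress  - - - - - - DB-3754 ' \
  '44412  caul kid notify      DB-3747 ' \
  '55522  aws _ _int shp_ DB-3755 DB-3721 DB-3361' |
  strip_middle
```

sed has no non-greedy `.*?`, which is why the marker trick is used: a plain `s/^\([0-9]*\).*DB-/.../` would run to the LAST `DB-` and lose all but one ID on the multi-ID line. Lines without any ` DB-` pass through unchanged.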

Any help is truly appreciated.

Also, one more thing, if I may: instead of one-to-many, how do I convert it to one-to-one? That is, how do I turn this:

55522    DB-3755 DB-3721 DB-3361 DB-3327 DB-3688

to

55522    DB-3755 
55522   DB-3721 
55522   DB-3361 
55522   DB-3327 
55522   DB-3688

Hello ManoharMa,

Kindly use code tags, as per forum rules, for the commands, code and Input_file in your posts.
Could you please try the following and let me know if it helps.

awk '{sub(/[a-zA-Z][^DB]*/,"");sub("-DB","DB")} 1'   Input_file
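To see how the two sub() calls cooperate, here is a small trace (a sketch; the function name `trace_subs` is hypothetical). The first sub() removes from the first letter up to, but not including, the next uppercase D or B; the second removes a "-" that can be left glued to "DB". One consequence: an uppercase D or B inside the junk text would stop the first sub() early.

```shell
#!/bin/sh
# Trace the two sub() calls on each input line.
trace_subs() {
  awk '{
    sub(/[a-zA-Z][^DB]*/, "")   # step 1: drop from the first letter up to (not including) the next D or B
    print "after step 1: [" $0 "]"
    sub("-DB", "DB")            # step 2: drop a "-" left glued to "DB"
    print "after step 2: [" $0 "]"
  }'
}

printf '%s\n' '35818    -stress  - -       -stress  - - - - - - DB-3754 ' | trace_subs
```

For that line, step 1 leaves `35818    -DB-3754 ` (the "-" before "stress" is not a letter, so it survives), and step 2 turns it into `35818    DB-3754 `.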

EDIT: The above command does not split the last line, which has several DB entries, so if you want each DB entry on its own line, the following may help too.

awk '{sub(/[a-zA-Z][^DB]*/,"");sub("-DB","DB");num=split($0, array," ");for(i=2;i<=num;i++){print array[1],array[i]}}' Input_file

Thanks,
R. Singh

How about

awk '{C = NF; while ($C ~ /^DB-/) print $1, $(C--)}' file
35818 DB-3754
44412 DB-3747
54432 DB-3725
55522 DB-3730
55522 DB-3728
55522 DB-3677
55522 DB-3750
55522 DB-3688
55522 DB-3327
55522 DB-3361
55522 DB-3721
55522 DB-3755
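The one-liner above is compact; here is the same logic spelled out with comments (a sketch; the function name `db_pairs_reverse` is made up). `$(C--)` reads field C and then decrements C, so the loop walks from the last field towards the front and stops at the first field, counted from the right, that does not start with "DB-". That is why the IDs come out in reverse order.

```shell
#!/bin/sh
# Expanded form of the one-liner: walk the fields from the LAST one backwards
# and print "ID field" for as long as the field starts with "DB-".
db_pairs_reverse() {
  awk '{
    C = NF                  # start at the last field
    while ($C ~ /^DB-/) {   # stop at the first non-DB field seen from the right
      print $1, $C          # print the ID together with this DB- field
      C--                   # then move one field to the left
    }
  }'
}

printf '%s\n' '55522  aws shp_ DB-3755 DB-3721 DB-3361' | db_pairs_reverse
```

This prints `55522 DB-3361`, `55522 DB-3721`, `55522 DB-3755`, one per line.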

RavinderSingh13 and RudiC,

I can't thank you enough for trying to help me here. Maybe I am not being clear enough.

So, here is the input:

Code:

35818    -stress  - -       -stress  - - - - - - DB-3754 
44412  caul kid notify      DB-3747 

Here is the expected output:

Code:

35818  DB-3754 
44412  DB-3747

Code:

awk '{sub(/[a-zA-Z][^DB]*/,"");sub("-DB","DB")} 1' 

gave me:

35818
44412

and all the related DB stuff is lost.

What in the result of the proposal in post #3 are you unhappy with?

awk '{C = NF; while ($C ~ /^DB-/) print $1, $(C--)}' file
35818 DB-3754
44412 DB-3747

RudiC,

I am not a regular Unix user. Whenever my wife asks me to help her with an interesting problem, I try, fail, and then come here to ask you folks.

You guys are putting on a very nice show out here, helping folks like us.

Truly grateful for the help!

It is a bit more robust to bound the loop at both ends (start at NF and never go below field 1), and a for loop reads more clearly.

awk '{for (C=NF; C && $C~/^DB-/; C--) print $1, $C}' file

It also takes little effort to check every field and print the matches in their original order:

awk '{for (C=2; C<=NF; C++) if ($C~/^DB-/) print $1, $C}' file
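The same field test can also produce the original one-to-many layout by collecting the DB- fields on one output line instead of printing each one. A sketch (the function name `db_one_line` is hypothetical); like the perl one-liner further down, it skips lines that contain no DB- field:

```shell
#!/bin/sh
# Sketch: same field test as the loop above, but collect the DB- fields on
# one output line per input line (the "one to many" layout).
db_one_line() {
  awk '{
    out = $1; n = 0                   # leading ID, count of DB- fields found
    for (C = 2; C <= NF; C++)         # scan every remaining field in order
      if ($C ~ /^DB-/) { out = out " " $C; n++ }
    if (n) print out                  # skip lines that had no DB- field
  }'
}

printf '%s\n' \
  '35818    -stress  - -       -stress  - - - - - - DB-3754 ' \
  '55522  aws shp_ DB-3755 DB-3721 DB-3361' |
  db_one_line
```

This prints `35818 DB-3754` and `55522 DB-3755 DB-3721 DB-3361`. Because it tests each field rather than deleting a character range, stray letters such as D or B in the junk text do not affect it.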

Clean and show:

perl -nle '@p=/^(\d+).*?(?=DB)(.*)/ and print "@p"' example
35818 DB-3754
44412 DB-3747
54432 DB-3725
55522 DB-3755 DB-3721 DB-3361 DB-3327 DB-3688 DB-3750 DB-3677 DB-3728 DB-3730

Clean and convert:

perl -nle '($k,$v) = /^(\d+).*?(?=DB)(.*)/; for(split (/\s+/, $v)){print "$k $_"}' example
35818 DB-3754
44412 DB-3747
54432 DB-3725
55522 DB-3755
55522 DB-3721
55522 DB-3361
55522 DB-3327
55522 DB-3688
55522 DB-3750
55522 DB-3677
55522 DB-3728
55522 DB-3730