Remove matching pattern on each line with number variations

martinsmith · February 16, 2017, 2:04am

Hello folks!

I have a file containing lines like this

Something text 18:37Remove This: 1,111"Keep this text"
Some more text 19:37Remove This: 222"Keep this text"
More text 20:50Remove This: 3,333Keep this text
And more text 25:50Remove This: 44,444Keep this text

I would like to replace "Remove This: 1,111" (or whatever number follows "Remove This:") with a tab character, so the output would be:

Something text 18:37	"Keep this text"
Some more text 19:37	"Keep this text"
More text 20:50	Keep this text
And more text 25:50	Keep this text

Something text 18:37[TABSPACE]"Keep this text"
Some more text 19:37[TABSPACE]"Keep this text"
More text 20:50[TABSPACE]Keep this text
And more text 25:50[TABSPACE]Keep this text

Thank you for your help!

rbatte1 · February 16, 2017, 3:15am

Hello martinsmith,

I have a few questions to pose in response first:

Is this homework/assignment? There are specific forums for these.
What have you tried so far?
What output/errors do you get?
What OS and version are you using?
What are your preferred tools? (C, shell, perl, awk, etc.)
What logical process have you considered? (to help steer us to follow what you are trying to achieve)

Most importantly, What have you tried so far?

There are probably many ways to achieve most tasks, so giving us an idea of your style and thoughts will help us guide you to an answer most suitable to you so you can adjust it to suit your needs in future.

We're all here to learn and getting the relevant information will help us all.

Kind regards,
Robin

martinsmith · February 17, 2017, 6:35pm

Hi Robin,

Thank you!

It's not a homework assignment. I'm just trying to clean up some data which I had extracted.

I tried some commands with sed and awk but could not figure this out. I'm not a programmer, so I don't know much about working my way around this. I thought maybe someone here might have a solution.

What I'm trying to accomplish is the following.

I have a data file which is about 40K lines long, so it would take me forever to clean it up manually.

In my file I have data structured like this:

Random text 18:37Same text: 1,111"Some random text"
More random text 19:37Same text: 222"Some more random text"

After "Same text:" the numbers are randomized from 1 to 50,000.

I would like to replace everything from "Same text: RANDOMIZED NUMBER" with a tab space so the output can be:

Random text 18:37	"Some random text"
More random text 19:37	"Some more random text"

I'd appreciate any help for this if there is a way.

Thanks so much again

Scrutinizer · February 17, 2017, 7:32pm

With GNU sed, try:

sed 's/Remove This: [0-9,]*/\t/' file

With regular sed you need to use a real TAB character in the replacement, which can be entered with CTRL-V TAB:

sed 's/Remove This: [0-9,]*/	/' file

With any awk, try:

awk '{sub(/Remove This: [0-9,]*/, "\t")}1'  file
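
If typing a literal TAB with CTRL-V is awkward, one alternative that should work with any POSIX sed is to let printf produce the TAB and pass it in through a shell variable. This is only a sketch: the file name sample.txt and the variable names tab and result are made up for illustration, and the two input lines are copied from the question.

```shell
# Sample input taken from the question (sample.txt is a hypothetical file name)
cat > sample.txt <<'EOF'
Something text 18:37Remove This: 1,111"Keep this text"
More text 20:50Remove This: 3,333Keep this text
EOF

# Build a literal TAB with printf, so neither CTRL-V nor GNU sed's \t is needed
tab=$(printf '\t')

# Replace "Remove This: " plus the number (digits and commas) with the TAB.
# Double quotes are used so the shell expands ${tab} before sed sees the script.
result=$(sed "s/Remove This: [0-9,]*/${tab}/" sample.txt)
printf '%s\n' "$result"
```

The pattern [0-9,]* matches any run of digits and commas, so it covers 222 as well as 44,444.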

martinsmith · February 17, 2017, 8:01pm

Thank you Scrutinizer!

All of your solutions worked perfectly and solved the problem.

Many thanks for your help! It is very much appreciated.