Replace everything but a pattern in a line using sed

flatley · April 5, 2012, 7:11pm

I have a file with multiple lines like this:

<junk><PATTERN><junk><PATTERN><junk>
<junk><PATTERN><junk><PATTERN><junk><PATTERN><junk>

Note that:

- There may be a variable number of occurrences of PATTERN in a line.
- The <> are just placeholders; they are not part of the pattern.

I need to replace all the <junk> with a nice separator, like a comma.

So, in the above case, I would be left with:

,PATTERN,PATTERN,
,PATTERN,PATTERN,PATTERN,

In my case, PATTERN is "AB:" followed by 9 digits.

I am able to replace the pattern with a comma using the following command:

sed 's/AB:[0-9]\{9\}/,/g' input_file

So I thought replacing everything BUT the pattern would be just one step further, but I can't find a solution. Isn't there a way to do this in sed, perhaps with the ! operator? I can't quite put my finger on it. Any help would be greatly appreciated!
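A sketch of one way to do it in sed alone. Note that sed's ! negates an address, i.e. it selects which lines a command runs on, not which text s/// replaces, so there is no one-character tweak that inverts a substitution. Instead, the script below first wraps every match in newline markers, so the junk always sits between markers, and then turns each junk segment into a comma. It assumes GNU sed (\n in the replacement and inside [^\n] are GNU extensions); keep_patterns is a hypothetical function name, and lines with no match are passed through unchanged.

```shell
# keep_patterns (hypothetical name): replace every run of text that is not
# an "AB:" + 9 digits match with a single comma. Requires GNU sed.
keep_patterns() {
  # -e 1: put a newline before and after every match, so the junk is
  #       always the text between markers (possibly empty).
  # -e 2: leading junk up to the first marker -> ","
  # -e 3: each "newline, junk, newline" between two matches -> ","
  # -e 4: trailing junk after the last marker -> ","
  # Lines with no match contain no marker and are printed unchanged.
  sed -e 's/AB:[0-9]\{9\}/\n&\n/g' \
      -e 's/^[^\n]*\n/,/' \
      -e 's/\n[^\n]*\n/,/g' \
      -e 's/\n[^\n]*$/,/' \
      "$@"
}
```

Usage: keep_patterns input_file. The order of the -e expressions matters: the leading junk must be removed first, otherwise the leftmost "\n...\n" match in step 3 would swallow the first pattern instead of the junk after it.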

Thanks

cjcox · April 5, 2012, 7:46pm

Sometimes you can use a trick: add extra marker data to break the patterns out, for example a control character (it needs to be a character not found in the data). Consider:

sed -e 's/PATTERN/^BPATTERN^B/g' <test2.txt | tr '\012\002' '\002\012' | grep -v '^PATTERN$' | tr -d '\012' | tr '\002' '\012'

Here your data is in test2.txt, and in the first sed the ^B are literal Ctrl-B characters. This breaks the patterns out onto lines of their own (by swapping newlines with Ctrl-B) and stitches everything back together afterwards.
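A runnable sketch of the same marker idea, adapted to the question. It generates Ctrl-B with printf instead of typing it literally and assumes Ctrl-B never occurs in the data. As posted, the grep -v '^PATTERN$' stage discards the pattern lines and keeps the junk; here that stage is swapped for a sed that leaves the pattern lines alone and turns every other piece into a comma, while the Ctrl-B bytes standing for the original newlines are kept. marker_keep is a hypothetical name.

```shell
# marker_keep (hypothetical name): Ctrl-B marker trick, with the filter
# stage changed so that non-pattern pieces become commas.
# Assumes the byte \002 (Ctrl-B) never appears in the input.
marker_keep() {
  cb=$(printf '\002')
  # 1. wrap each match in Ctrl-B; 2. swap newlines and Ctrl-B, so every
  # match sits on a line of its own; 3. on non-match lines, replace each
  # run of non-Ctrl-B text with a comma (Ctrl-B = original newline, kept);
  # 4. join the pieces; 5. turn Ctrl-B back into newlines.
  sed "s/AB:[0-9]\{9\}/$cb&$cb/g" "$@" |
    tr '\012\002' '\002\012' |
    sed '/^AB:[0-9]\{9\}$/!s/[^'"$cb"']\{1,\}/,/g' |
    tr -d '\012' |
    tr '\002' '\012'
}
```

One behavioural detail: because empty pieces produce no comma, a pattern at the very start or end of a line gets no comma on that side.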

Scrutinizer · April 5, 2012, 8:48pm

Are you perhaps only interested in the patterns themselves, or do they need to remain on their original lines? If only the patterns matter and your system has grep -o, you can do this:

grep -Eo 'AB:[0-9]{9}' infile

and if your system does not have grep -o, you could do this:

sed 's/AB:[0-9]\{9\}/\n&\n/g' infile | grep -E 'AB:[0-9]{9}'

or, for an older sed that does not support \n in the replacement:

sed 's/AB:[0-9]\{9\}/\
&\
/g' infile | grep -E 'AB:[0-9]{9}'

codecaine · April 5, 2012, 9:17pm

awk 'gsub("<junk>",",")' Your_File

flatley · April 6, 2012, 9:40am

Thanks for all the replies!

Scrutinizer, I need all the patterns that occurred in a line, and they need to remain on the original lines, with the junk filtered out. I did try out the grep -o option, but then I couldn't tell which line a given pattern corresponds to.
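If grep -o is otherwise acceptable, adding -n prefixes every match with its line number, which is enough to regroup the matches per line. A sketch, assuming a grep with -o and a single input file or stdin (with several files grep also prefixes the file name); grep_by_line is a hypothetical name, and lines without any match do not appear in the output.

```shell
# grep_by_line (hypothetical name): regroup "grep -no" output by line.
# Assumes grep supports -o and reads stdin or a single file.
grep_by_line() {
  grep -nEo 'AB:[0-9]{9}' "$@" |
    awk -F: '
      # Each record looks like "LINENO:AB:123456789"; start a new output
      # line whenever the line number changes.
      $1 != prev { if (NR > 1) print out; out = ","; prev = $1 }
      # Append the match (everything after "LINENO:") plus a comma.
      { out = out substr($0, length($1) + 2) "," }
      END { if (NR > 0) print out }
    '
}
```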

codecaine, the junk follows no particular pattern, so I can't match and delete it directly with sed (or awk).
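Since the junk can't be matched, another option is to match only the pattern and rebuild each line from its matches. A sketch in POSIX awk using match(), RSTART and RLENGTH; the nine [0-9] are spelled out because some older awks do not support {9} intervals. awk_keep is a hypothetical name, and a line with no match becomes a lone comma.

```shell
# awk_keep (hypothetical name): rebuild each line from its matches only.
awk_keep() {
  awk '{
    out = ","      # leading separator, as in the desired output
    line = $0
    # Find the next match, append it and a comma, then continue after it.
    while (match(line, /AB:[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/)) {
      out = out substr(line, RSTART, RLENGTH) ","
      line = substr(line, RSTART + RLENGTH)
    }
    print out
  }' "$@"
}
```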

cjcox, I see what you mean and how it would work. I was wondering if there is a more straightforward solution. If I can replace all occurrences of a pattern with a string of my choice, surely I should be able to replace everything BUT the pattern with a small tweak to the command? Or is it not that straightforward?

Thanks again for all the help, really appreciate it!