using sed to get rid of duplicated columns...

fedora · April 10, 2008, 3:34pm

I can't figure this one out, so I'm turning to unix.com for help. I have a file in which some lines contain consecutive duplicate columns, like the following:

adb abc abc asd adfj
123 123 123 345
234 444 444 444 444 444 23

and the output I want is

adb abc asd adfj
123 345
234 444 23

Is it possible to do this using sed?

Oh, btw, I thought this should work:

sed 's/(\([^ ]\)+ )+/\1/' file

but it does not...

era · April 10, 2008, 3:55pm

You need a backreference in the search pattern itself, not just in the replacement, to make sure you are actually matching repeats; otherwise it will simply reduce every line to one token (or two, if you require a trailing space after the replacement). You also need to make up your mind on whether your sed requires a backslash before grouping parentheses or not: your attempt mixes ( and \(.

sed 's/\([^ ]* \)\1*/\1/g' file

I used * instead of +; if your sed understands the plus, then by all means use that.
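
As a rough illustration of how that expression fits together, here is the same command wrapped in a small shell function and annotated piece by piece. The function name collapse_repeats is only a label made up for this sketch; the sample lines are the ones from the first post.

```shell
# Annotated version of the substitution above (collapse_repeats is a made-up name).
#
#   \([^ ]* \)   group 1: one column, i.e. a run of non-space characters
#                plus the single space that follows it
#   \1*          zero or more further copies of exactly that column
#   /\1/         replace the whole run with one copy of the column
#   g            do this for every run on the line, not just the first
collapse_repeats() {
    sed 's/\([^ ]* \)\1*/\1/g'
}

printf '%s\n' \
    'adb abc abc asd adfj' \
    '123 123 123 345' \
    '234 444 444 444 444 444 23' | collapse_repeats
```

Because the space is inside the group, a column only counts as part of a run when a space follows it.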

fedora · April 10, 2008, 4:03pm

Thanks, I messed up, time to go over sed and regexp again....

fedora · April 10, 2008, 4:16pm

Hmm, on second thought, it seems that something is still wrong:

>cat /tmp/test
123 123 123 345
akljsdfaljskd 7878 7878 7878 7878 123
akljsdfaljskd 7878 7878 7878 7878 123 123 123 345 234 345 345

>sed 's/\([^ ]* \)\1*/\1/g' /tmp/test
123 345
akljsdfaljskd 7878 123
akljsdfaljskd 7878 123 345 234 345 345

Note the last line: the duplicate "345" columns are still there.

fedora · April 10, 2008, 4:21pm

Hmm, the last column is not followed by a space, which is why...
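
One possible workaround, sketched here on the assumption that columns are separated by single spaces: temporarily append a space to every line so the last column looks like all the others, collapse the repeats with the same expression, then strip the added space again. The function name dedupe_columns is hypothetical and only used for this example.

```shell
# Collapse runs of identical space-separated columns, including a run
# that ends at the end of the line (dedupe_columns is a made-up name).
dedupe_columns() {
    # 1. append a space so the final column is also followed by one
    # 2. collapse each run of identical "column + space" units to one copy
    # 3. remove the single trailing space added in step 1
    sed -e 's/$/ /' \
        -e 's/\([^ ]* \)\1*/\1/g' \
        -e 's/ $//'
}

printf '%s\n' \
    'akljsdfaljskd 7878 7878 7878 7878 123 123 123 345 234 345 345' | dedupe_columns
```

With this variant the trailing "345 345" should be reduced to a single "345". The earlier "345" is kept because it is not adjacent to the other two.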

sumit207 · November 28, 2008, 9:45am

Can you explain in detail the syntax you have given?

sed 's/\([^ ]* \)\1*/\1/g' file

Thanks