I've got a file of fields separated by commas. I need a sed or awk command that will delete all spaces between two commas, as long as there are only spaces between the commas.
For example,
,abc, ,sd , ,dr at
would become
,abc,,sd ,,dr at
I have tried sed -e 's/, .*,/,,/g' but it does not work.
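To see why that attempt fails: ".*" matches any run of characters, letters and commas included, and it is greedy, so the match runs from the first comma-space all the way to the last comma on the line. A quick check on the example line with plain POSIX sed, next to a narrower pattern that only allows spaces between the commas (comma, space, space-asterisk, comma):

```shell
# The OP's attempt: ".*" is greedy and matches anything, so it swallows
# " ,sd , " and everything else up to the LAST comma.
echo ',abc, ,sd , ,dr at' | sed -e 's/, .*,/,,/g'    # ,abc,,dr at

# Narrower: a comma, one space, zero or more further spaces, then a comma.
echo ',abc, ,sd , ,dr at' | sed -e 's/,  *,/,,/g'    # ,abc,,sd ,,dr at
```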
Leading and trailing spaces: fair game or verboten? This handles only the all-space fields:
sed '
s/,  *,/,,/g
s/,  *,/,,/g
'
Narrative: replace a comma, then a space, then any number of further spaces, then a comma, with two commas. Repeat the substitution for adjacent fields of spaces (the repeat can be a second, piped sed for speed).
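A minimal sketch of that two-substitution script wrapped in a shell function, so it can be tried on the OP's line and on a line with several adjacent all-space fields. The name blank_fields_twice is hypothetical; only POSIX sed features are used.

```shell
# Hypothetical helper name; reads lines on stdin, writes to stdout.
# Pattern: a comma, one space, zero or more further spaces, a comma.
# The substitution is listed twice so that adjacent all-space fields
# are also emptied.
blank_fields_twice() {
  sed -e 's/,  *,/,,/g' -e 's/,  *,/,,/g'
}

echo ',abc, ,sd , ,dr at' | blank_fields_twice   # ,abc,,sd ,,dr at
echo 'a, , , ,b' | blank_fields_twice            # a,,,,b
```

For a whole file, something like blank_fields_twice < in.csv > out.csv (hypothetical file names).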
This trims leading and trailing spaces too:
sed '
s/,  */,/g
s/  *,/,/g
'
Narrative: replace a comma, then a space, then any number of further spaces, with a comma (trims leading spaces). Replace a space, then any number of further spaces, then a comma, with a comma (trims trailing spaces).
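A sketch of the trimming variant as a hypothetical function, trim_around_commas, to show how it differs from the all-spaces one: it also turns "sd " into "sd", which the OP did not ask for, and it leaves the leading spaces of the first field and the trailing spaces of the last field alone, since there is no comma on that side.

```shell
# Hypothetical helper name; reads lines on stdin, writes to stdout.
# 1st substitution: a comma followed by one or more spaces -> a comma
#   (trims the leading spaces of every field after the first).
# 2nd substitution: one or more spaces followed by a comma -> a comma
#   (trims the trailing spaces of every field before the last).
trim_around_commas() {
  sed -e 's/,  */,/g' -e 's/  *,/,/g'
}

echo ',abc, ,sd , ,dr at' | trim_around_commas   # ,abc,,sd,,dr at
```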
This misses every other instance when all-space fields are adjacent, because the closing comma of one match cannot also serve as the opening comma of the next one. You must do it twice.
---------- Post updated at 05:28 PM ---------- Previous update was at 05:27 PM ----------
This also removes leading and trailing spaces, which is more than was originally requested, but it avoids the second run that the patterns with two commas need.
---------- Post updated at 05:34 PM ---------- Previous update was at 05:28 PM ----------
Does that work? Putting ^ or $ inside a \( \| \) alternation is usually no good.
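One portable way around ^ and $ inside an alternation is sketched below. This is a swapped-in technique, not one posted here: wrap each line in an extra comma on both ends, so that the first and last fields become ordinary comma-bounded fields, apply the comma-spaces-comma substitution twice, then strip the two sentinel commas again. The name blank_fields_portable is hypothetical; only POSIX basic regular expressions are used.

```shell
# Hypothetical helper name; reads lines on stdin, writes to stdout.
blank_fields_portable() {
  # Add a sentinel comma at the start and end of the line,
  sed -e 's/^/,/' -e 's/$/,/' \
      -e 's/,  *,/,,/g' -e 's/,  *,/,,/g' \
      -e 's/^,//' -e 's/,$//'
  # ...empty all-space fields (twice, for adjacent ones), then drop the
  # sentinels. The substitution always keeps the first and last comma,
  # so the stripped commas are exactly the ones that were added.
}

echo ' , abc, , sd , , , , ' | blank_fields_portable   # , abc,, sd ,,,,
```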
You have to do it twice for ", , ,"!
None of us took care of the first or last field except with your sed -r (but it needs to run twice)! I would put '^' second in the (|) list, since it is less often true, for speed.
Hi, I found out that this works by trying it out. Good point about the ^ in the second part, and especially about the ",,," or more situation; I hadn't thought of that. I also realized that looking for space-asterisk is less efficient than looking for space-space-asterisk. So I think the best option is to run it twice, not by piping, but by using a loop, and to leave the g flag in (leaving it out is less efficient):
sed ':a;s/\(,\|^\)  *\(,\|$\)/\1\2/g;ta'
Or, with GNU sed:
sed -r ':a;s/(,|^)  *(,|$)/\1\2/g;ta'
Or, for some older seds:
sed -e ':a' -e 's/\(,\|^\)  *\(,\|$\)/\1\2/g;ta'
Of course, every space can be replaced by [ \t] if the need arises (not every sed understands \t inside brackets; [[:blank:]] matches a space or a tab portably).
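A sketch of the loop version with [[:blank:]], so tabs are handled as well. It assumes a sed that accepts -E for extended regular expressions (GNU sed and BSD sed do); the name blank_fields_loop is hypothetical. The "+" (one or more blanks) matters here: if the pattern allowed zero blanks, ",," would be replaced by ",," on every pass, which still counts as a successful substitution, so ta would keep branching forever.

```shell
# Hypothetical helper name; reads lines on stdin, writes to stdout.
# (,|^) and (,|$) let the first and last fields count as bounded fields.
# [[:blank:]]+ = one or more spaces or tabs, so a substitution only
# happens when something is deleted and the ta loop terminates.
blank_fields_loop() {
  sed -E -e ':a' -e 's/(,|^)[[:blank:]]+(,|$)/\1\2/g' -e 'ta'
}

printf ' \t,a,\t , \t,b, \n' | blank_fields_loop   # ,a,,,b,
```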
I'm sorry, but I don't agree with some of the ideas from you and DGPickett.
What's happening there?
I say that if there are other characters (except the null character) between the pattern matches, then running it twice makes no difference in this example,
because we are searching for spaces or tabs between two commas.
I do not understand what you are referring to. A loop, or running sed twice, is necessary if you want to do a replacement in two adjacent fields. This is because sed resumes searching at the character after the end of the previous match. If you include the second comma in the pattern, the field that follows is not matched, because its opening comma was already consumed by the previous match.
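A small demonstration of that point, using a hypothetical one_pass function that applies the comma-spaces-comma substitution once. With three adjacent all-space fields, the middle one survives the first pass because its opening comma was the closing comma of the previous match; a second pass picks it up.

```shell
# Hypothetical helper name: ONE pass of the comma-spaces-comma substitution.
one_pass() {
  sed -e 's/,  *,/,,/g'
}

echo 'a, , , ,b' | one_pass              # a,, ,,b   (middle field missed)
echo 'a, , , ,b' | one_pass | one_pass   # a,,,,b
```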
A couple of suggestions leave out the second comma, but then whitespace next to the commas gets deleted even in fields where there are other characters, which is not what the OP asked for.
I added the possibility of doing the replacement not only in fields between commas, but also in the two fields that have only one comma, namely the first and the last. That is not precisely what the OP asked, but it is likely what he requires, since he speaks of fields separated by commas.
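Since the OP also asked for awk, here is a field-based sketch (not posted in the thread) that treats the first and last fields like any other: split on commas, empty every field that consists only of spaces, and print the line rejoined with commas. Because it works per field rather than per match, no second pass or loop is needed. The name blank_fields_awk is hypothetical, and it assumes fields never contain quoted commas.

```shell
# Hypothetical helper name; reads lines on stdin, writes to stdout.
blank_fields_awk() {
  awk 'BEGIN { FS = OFS = "," }
       {
         # Assigning to a field makes awk rebuild $0 with OFS (a comma).
         for (i = 1; i <= NF; i++)
           if ($i ~ /^ +$/) $i = ""
       }
       1'
}

echo ' , abc, , sd , , , , ' | blank_fields_awk   # , abc,, sd ,,,,
```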
I don't understand you either.
I already said when running it twice is necessary and when it is not,
and I explained it with examples. When DGPickett said that running it twice is mandatory in this example, I answered him about that. I hope this is enough.
And let's come to your examples.
I said a loop is not necessary.
It was not a matter of which method is faster, but of which method actually works. Have a look at the 4th field, which contains spaces and the letters sd, and at what comes after it in post #17. That field should remain unchanged, but it does not, so IMO your original "non-loop" suggestion does not work properly. This needs to be done either by running the sed twice or through a loop like I suggested.
This time you are leaving your single-step approach and presenting yet another alternative, with two search-and-replace statements (which you said earlier would not be necessary); that is much like a loop with the g flag.
One thing I just noticed is that in post #11 a g flag was used, but in the middle one, the sed -r option, the g flag was accidentally missing. It is a little more efficient to leave it in, as I noted in that same post. When we add the g flag as was intended:
sed -r ':a;s/(,|^)  *(,|$)/\1\2/g;ta'
There is no significant difference on my system.
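Wall-clock time may not differ much on short lines, but the number of loop iterations does. A rough sketch that mimics the :a ... ta loop by running one sed pass at a time until the line stops changing, and counts the passes (including the final no-change pass). Without g, each pass empties only one field; with g, two changing passes are enough. The name count_passes is hypothetical, and it assumes a sed that accepts -E.

```shell
# Hypothetical helper: $1 = s/// flag ("g" or ""), $2 = input line.
# Prints how many passes it took until the line stopped changing.
count_passes() {
  flags=$1
  line=$2
  n=0
  while :; do
    new=$(printf '%s\n' "$line" | sed -E "s/(,|^)  *(,|\$)/\\1\\2/$flags")
    n=$((n + 1))
    [ "$new" = "$line" ] && break
    line=$new
  done
  echo "$n"
}

count_passes g  ' , abc, , sd , , , , '   # 3
count_passes '' ' , abc, , sd , , , , '   # 7
```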
---------- Post updated 01-11-10 at 00:13 ---------- Previous update was 31-10-10 at 23:00 ----------
I tested your suggestion, but it does not work properly for the last field:
$ echo ' , abc, , sd , , , , ' | sed -r ':a;s/(,|^)  *(,|$)/\1\2/g;ta' | od -c
0000000 , a b c , ,
0000020 s d , , , , \n
0000034
The loop version cuts the spaces in the last field, as it should, whereas
$ echo ' , abc, , sd , , , , ' | sed -r 's/( *,|,|^) *(,|$)/\1\2/g;s/, *,/,,/g' | od -c
0000000 , a b c , ,
0000020 s d , , , ,
0000040 \n
0000042
As you can see, one pass misses the adjacent field of spaces, and without the (), ^ and $ of the extended regex, the first and last fields are not handled. You can work around that with two passes, using space-space-asterisk to ignore the fields possibly emptied by the first pass, and with extra commas: