Cannot convert multiple spaces to one space

npatwardhan · March 25, 2010, 4:53pm

Hi Guys,

I am using a CentOS (Red Hat-based) Linux machine and trying to convert multiple spaces in a file to one space. I am using:

sed '/./,/^$/!d' input_file > output_file

I also tried

cat -s

Both gave me no change in the output file.
I tried this on cygwin and it worked just fine. I am using the bash shell.

Is there any other "universal" way to do this?

Thanks.
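
A possible reason neither command changed the spacing: as far as I know, both that sed one-liner and cat -s squeeze runs of blank lines, not runs of spaces within a line. The sketch below uses a made-up sample (the variable names are only for illustration) to show the difference against tr -s ' '.

```shell
# Hypothetical sample: three spaces inside line 1, then two blank lines.
sample='a   b


c'

# Both of these collapse the two blank lines into one and keep the spaces.
sed_out=$(printf '%s\n' "$sample" | sed '/./,/^$/!d')
cat_out=$(printf '%s\n' "$sample" | cat -s)

# tr -s ' ' collapses the run of spaces and keeps the blank lines.
tr_out=$(printf '%s\n' "$sample" | tr -s ' ')

printf '%s\n---\n%s\n---\n%s\n' "$sed_out" "$cat_out" "$tr_out"
```

If the input file has no consecutive blank lines, the first two commands would leave it unchanged.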

EAGL · March 25, 2010, 5:06pm

Hi, I hope this works:

awk '{gsub(/[ ]+/," ")}1' FILE

rdcwayx · March 25, 2010, 7:46pm

awk '{$1=$1}1' FILE
tr -s ' ' < FILE
sed 's/ \+/ /g' FILE

alister · March 26, 2010, 11:28am

The AWK proposal (awk '{$1=$1}1') should definitely not be used. The goal is to squeeze multiple spaces into one, but that AWK one-liner will delete all leading and trailing whitespace (even if it's just a single space). Worse, it will replace any tabs in the data with spaces.
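
The difference is easy to see on a made-up line with a leading space, a tab, and a trailing space. This is only a sketch; the sample text and variable names are hypothetical.

```shell
# Hypothetical sample: leading space, run of spaces, a tab, trailing space.
line=$(printf ' a   b\tc ')

awk_out=$(printf '%s\n' "$line" | awk '{$1=$1}1')
tr_out=$(printf '%s\n' "$line" | tr -s ' ')

# Brackets make leading and trailing whitespace visible.
printf 'awk: [%s]\n' "$awk_out"
printf 'tr:  [%s]\n' "$tr_out"
```

The awk result loses the edge spaces and turns the tab into a space, while tr only touches the run of spaces.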

A nitpick regarding the sed: I believe \+ is a GNU extension to basic regular expressions. In standardized basic regular expressions, a plus sign is an ordinary character and a backslash will not turn it into a multiplier. Instead of x\+, it would be more portable to use either x\{1,\} or xx*.
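
A minimal sketch of the two portable spellings, applied to a space (the sample line is made up):

```shell
line='a    b  c'

# Interval expression: one or more spaces.
interval_out=$(printf '%s\n' "$line" | sed 's/ \{1,\}/ /g')

# Same idea: one space followed by zero or more spaces.
star_out=$(printf '%s\n' "$line" | sed 's/  */ /g')

printf '%s\n%s\n' "$interval_out" "$star_out"
```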

I vote for the tr solution

Cheers,
Alister

asalman.qazi · March 26, 2010, 11:56am

Can somebody explain how this awk option works?

awk '{$1=$1}1' FILE

What is {$1=$1}1 doing in the above command? Thanks in advance.

EAGL · March 26, 2010, 12:20pm

Hi Alister,

After seeing your comment I tested my code below again with leading and trailing spaces and/or tabs; fortunately, it worked well in all of these cases.
But it is good to remember that we can use "tr" too.

awk '{gsub(/[ ]+/," ")}1' FILE
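
For reference, a small sketch of what the gsub version does on a made-up line: it only rewrites runs of the space character, so single leading and trailing spaces stay in place and tabs are left untouched.

```shell
# Hypothetical sample: leading space, run of spaces, a tab, trailing space.
line=$(printf ' a   b\tc ')

gsub_out=$(printf '%s\n' "$line" | awk '{gsub(/[ ]+/," ")}1')

# Brackets make leading and trailing whitespace visible.
printf '[%s]\n' "$gsub_out"
```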

alister · March 26, 2010, 2:12pm

AWK programs are pairs of patterns and actions. Either the pattern or the action may be absent, but not both. A missing pattern evaluates to true for all lines. A missing action (the code within curly braces) defaults to "{print $0}", which prints the line.

That code represents two pattern-action pairs.

The first pair is "{$1=$1}". This pair is missing the pattern, so it will match on every line read. The action is the code between the curly braces. For each line read, it assigns the value of $1 to $1. This does not change the field's value, but it does cause $0 to be recomputed, with each field separated by OFS (output field separator, whose default value is a space). This leads to the loss of leading and trailing whitespace, and replacement of whatever strings were originally field separators (one space, multiple spaces, tabs, combinations thereof) with one OFS.

The second pair is simply the trailing one, "1", nothing more. The pattern is the "1", which is a boolean true and will cause its corresponding action to execute for every line read by AWK. The action however is missing, so it defaults to printing the line (as stated above). You see this idiom used often by AWKers trying to save a few characters; it is shorthand for printing the current line ($0).
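
The explanation above can be sketched on a made-up line. Setting OFS to "-" is only there to make the rebuilt separators visible; the variable names are hypothetical.

```shell
# Hypothetical sample: fields separated by a mix of spaces and a tab.
line=$(printf '  a \t b  ')

# The short idiom and its spelled-out equivalent.
short_out=$(printf '%s\n' "$line" | awk '{$1=$1}1')
long_out=$(printf '%s\n' "$line" | awk '{ $1 = $1 } 1 { print $0 }')

# With the assignment, $0 is rebuilt using OFS between fields.
dash_out=$(printf '%s\n' "$line" | awk -v OFS='-' '{$1=$1}1')

# Without the assignment, $0 is printed exactly as it was read.
plain_out=$(printf '%s\n' "$line" | awk -v OFS='-' '1')

printf '[%s]\n[%s]\n[%s]\n[%s]\n' "$short_out" "$long_out" "$dash_out" "$plain_out"
```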

Cheers,
Alister

ygemici · March 26, 2010, 3:27pm

To convert multiple spaces to one space (note the g flag, so that every run on the line is replaced, not just the first):

sed 's/  */ /g' FILE

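If your text also contains tabs that should become a single space, convert the tabs first and then squeeze the spaces, so that a tab next to a space does not leave a double space behind:

sed -e 's/\t/ /g' -e 's/  */ /g' FILE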

If you want to save the changes to the file in place, you can use the -i option (GNU sed).

For example:

sed -i -e 's/\t/ /g' -e 's/  */ /g' FILE
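
As an alternative sketch, spaces and tabs can be handled in a single expression with the POSIX [[:blank:]] class, which avoids relying on \t being understood by sed. The sample line and variable names are made up, and the tr variant assumes a tr that pads the shorter second set, as GNU tr does.

```shell
# Hypothetical sample: mixed runs of spaces and tabs between words.
line=$(printf 'a \t  b\t\tc')

# One or more blanks (space or tab) become a single space.
sed_out=$(printf '%s\n' "$line" | sed 's/[[:blank:]][[:blank:]]*/ /g')

# tr: map every blank to a space, then squeeze repeated spaces.
tr_out=$(printf '%s\n' "$line" | tr -s '[:blank:]' ' ')

printf '%s\n%s\n' "$sed_out" "$tr_out"
```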