Search for an undetermined number of spaces

bathtime · February 24, 2018, 9:29am

I would like to find runs of an undetermined number of spaces and shorten each run to one space. I am running Debian with mksh, and the script runs under #!/bin/sh. Sorry for not including all of the code: the program is too big and involves an online file, which is too much hassle for a simple issue.

For example, I start with (pretend the periods are spaces):

"This . . . . sentence has . . . . . . an undetermined . . number of . spaces between . . . . . . each word."

The result would be:

"This sentence has an undetermined number of spaces between each word."

What I have so far works, but it is extremely poor code:

        {gsub (/  /, " ")}
        {gsub (/  /, " ")}
        {gsub (/  /, " ")}
        {gsub (/  /, " ")}
        {gsub (/  /, " ")}
        {gsub (/  /, " ")}
        {gsub (/  /, " ")}
        {gsub (/  /, " ")}

I've tried several combinations to no effect. I've been reading an awk tutorial, but so far it hasn't covered this situation.
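Why the repeated calls above are needed: each gsub(/  /, " ") replaces non-overlapping pairs of spaces and does not rescan its own output, so one pass only shrinks a run of n spaces to about half its length, and eight passes cope with runs of up to 256 spaces. A minimal sketch (not one of the answers below, and the input line is made up) that keeps the same two-space regex but repeats it until no double space is left, instead of hard-coding the number of calls:

```shell
# Repeat the two-space substitution while the line still contains "  ".
# index() returns 0 once no pair of adjacent spaces remains.
printf 'This    sentence has      spaces\n' |
  awk '{ while (index($0, "  ") > 0) gsub(/  /, " ") } 1'
# prints: This sentence has spaces
```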

RudiC · February 24, 2018, 9:56am

Did you consider the tr -s command?
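For reference, tr -s ' ' squeezes each run of repeated spaces into a single space and leaves TABs and newlines untouched. A minimal example with a made-up input line:

```shell
# -s (squeeze) collapses each run of the listed character, here a space, to one.
printf 'This    sentence has  an undetermined   number of spaces\n' | tr -s ' '
# prints: This sentence has an undetermined number of spaces
```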

bathtime · February 24, 2018, 10:09am

This looks like a great program for one- or two-step jobs, and I'll likely use it for other tasks now that I know it exists, but I wanted to stick with awk and keep the number of programs to a minimum.

I should have mentioned that.

Scrutinizer · February 24, 2018, 10:11am

Try:

awk '{$1=$1}1' file

What it does is translate the default field separator (one or more space or TAB characters) to the default output separator (a single space).
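One side effect worth knowing: rebuilding the record this way also drops leading and trailing blanks and turns TABs into single spaces, because with the default FS they count as field separators too. A small illustration with made-up input:

```shell
# Leading blanks, a mixed space/TAB run and trailing blanks all disappear
# when $1=$1 forces awk to rebuild $0 from its fields joined by OFS (" ").
printf '  This \t  sentence   \n' | awk '{$1=$1}1'
# prints: This sentence
```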

bathtime · February 24, 2018, 10:20am

I was just about to post that this is not what I was looking for. I have used it before, and it also collapses line breaks and other characters I need to keep.

I only want to use awk to reduce runs of more than one space to a single space.

Sorry for the runaround. :o

RudiC · February 24, 2018, 10:21am

Try also

awk '{gsub (/ +/, " ")}1' file

or

awk '{gsub (/  */, " ")}1' file
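Both regexes match the same thing: / +/ is one or more spaces, and /  */ (two spaces, then a star) is one space followed by zero or more spaces. Unlike {$1=$1}1, they touch only spaces, so a leading or trailing run becomes one space instead of disappearing, and TABs are left alone. A quick check with made-up input:

```shell
# Both commands print the same line: " a b<TAB>c ".
# Leading and trailing runs shrink to one space each; the TAB is kept.
printf '  a   b\tc  \n' | awk '{gsub(/ +/, " ")}1'
printf '  a   b\tc  \n' | awk '{gsub(/  */, " ")}1'
```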

bathtime · February 24, 2018, 10:38am

Both work perfectly! I was so close: I only had to add another space before the asterisk in the second method and it would have worked!

Thank you all!
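Why that one extra space matters: / */ (a single space followed by a star) means zero or more spaces, so it also matches the empty string at every position, and gsub then inserts the replacement between characters as well. A sketch that makes this visible by substituting an underscore instead of a space (the underscore is only for illustration):

```shell
# '/ */' can match an empty string, so even a line with no spaces is changed:
# the output gains underscores between the letters.
printf 'ab\n' | awk '{gsub(/ */, "_")}1'
# '/  */' needs at least one real space, so the same line is left alone.
printf 'ab\n' | awk '{gsub(/  */, "_")}1'
# prints: ab
```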

Scrutinizer · February 24, 2018, 11:05am

No worries. Another option may be to just adjust the field separator:

awk -F'  *' '{$1=$1}1' file

or perhaps a tiny, tiny bit more efficient, since single spaces are then not treated as field separators at all:

awk -F'  +' '{$1=$1}1' file

But what if you have a space and a TAB together? Do you leave the space or delete it?
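In these variants the field separator is a regular expression (an FS longer than one character is treated as an ERE), and $1=$1 rebuilds the record with the default output separator, a single space. With '  +' (two spaces, then a plus) only runs of two or more spaces act as separators. A minimal check with made-up input, including the space/TAB case raised above:

```shell
# "a b" stays one field; the run of three spaces is the only separator.
printf 'a b   c\n' | awk -F'  +' '{$1=$1}1'
# prints: a b c

# A space/TAB/space sequence has no two adjacent spaces, so the line is unchanged.
printf 'a \t b\n' | awk -F'  +' '{$1=$1}1'
```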

bathtime · February 24, 2018, 5:19pm

As far as I know there are no tabs to deal with in the program, thankfully!

I ended up using this code:

awk '{gsub (/ +/, " ")}1' file

But if your code is more efficient, I could use it instead. I just don't know how to append it to the end of my script:

awk ... blah blah blah ... {print $mytexttobedespaced} ... -F' +' '{$1=$1}1'

I don't want to use another pipe, and I cannot put this at the beginning of the code, because it's a job that has to wait until the end. Either way, the previous code works well enough.
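One way to avoid both the extra pipe and a second awk, sketched under the assumption that the text to be cleaned is already held in a variable at the point where it is printed: gsub takes an optional third argument naming its target, so the collapse can run on that variable right before the print. The variable name text and the BEGIN-block setup are hypothetical stand-ins for the real script:

```shell
awk 'BEGIN {
    # Hypothetical: in the real script this value comes from earlier processing.
    text = "This    sentence has     spaces"
    # Collapse runs of spaces in the variable itself rather than in $0.
    gsub(/ +/, " ", text)
    print text
}'
# prints: This sentence has spaces
```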

bakunin · February 25, 2018, 2:29am

Just in case there are tabs:

awk '{gsub (/[<space><tab>]+/, " ")}1' file

Replace "<space>" and "<tab>" with literal space/tab characters when using this.

I hope this helps.

bakunin
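If typing a literal TAB into the script is awkward, awk regular expressions also understand the \t escape, so the same bracket expression can be written without invisible characters. A small sketch with made-up input:

```shell
# [ \t]+ : one or more characters, each a space or a TAB, replaced by one space.
printf 'a \t  b\t\tc\n' | awk '{gsub(/[ \t]+/, " ")}1'
# prints: a b c
```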

abdulbadii · February 25, 2018, 9:16am

echo "This    sentence has     an undetermined    number of   spaces between   each word." | sed -r 's/\s+/ /g'
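Note that \s is a GNU sed extension (it matches TABs and other whitespace too), and -r is not available in every sed. With a plain POSIX sed, touching only spaces as the original question asks, the same regex as RudiC's second awk command works; a sketch with made-up input:

```shell
# '  *' = one space followed by zero or more spaces; works in any POSIX sed.
printf 'This    sentence has     spaces\n' | sed 's/  */ /g'
# prints: This sentence has spaces
```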

Don_Cragun · February 25, 2018, 9:45am

If one doesn't want to use bakunin's suggestion because one is afraid that someone reading the code might not notice the literal <space> or literal <tab> in the ERE (and isn't willing to add a comment noting that the characters in that ERE are a literal <space> and <tab>), one could also use:

awk '{gsub (/[[:blank:]]+/, " ")}1' file

which does the same thing in most locales and is self-documenting. And in locales where additional characters are members of the blank character class, you might want this global substitution to replace them as well. If your code is likely to be used in such locales, you should seriously consider whether you want to hard-code these two characters or use the character class.
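A quick check of the character-class version, run in the C locale, where [[:blank:]] is exactly space and TAB (the input is made up). Some older awk builds, such as older mawk releases, may not support bracket character classes at all, so this is worth testing on the target system.

```shell
# Runs made of spaces and/or TABs each become a single space.
printf 'a \t b\t\tc\n' | LC_ALL=C awk '{gsub(/[[:blank:]]+/, " ")}1'
# prints: a b c
```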