Replacing multiple spaces in flat file

Greetings all

I have a delimited text file (the delimiter is ';') where certain fields consist of many blanks e.g. ; ; and ; ;

Before I separate the data I need to eliminate these blanks altogether.

I tried the sed command using the following syntax:

sed -i 's/; *;/;;/g' <filename>

The * means that any number of spaces between two semicolons should be matched, and the replacement is an empty string surrounded by two semicolons.

However I see that this only works on ASCII files whereas my files are ISO-8859. In these files this command doesn't detect multiple spaces with the * character.

Would anyone have some ideas?

Thanks & regards

Tony

Try ;[[:space:]]*;

Did you mean the command?

sed -i 's/;[[:space:]]*;/;;/g' <filename>

If so, it didn't work. FYI, originally I was representing the space with a literal space (' '), so the command took the form:

sed -i 's/; *;/;;/g' <filename>

Neither works at the moment.
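
One possible contributor, independent of the file encoding: with the g flag, sed replaces non-overlapping matches, and each match of ; *; consumes the closing semicolon, so that semicolon cannot also open the next match. When several blank fields are adjacent, every second one is left alone. A minimal sketch on a made-up sample line (the variable names and the looping variant are illustrations, not something posted in this thread):

```shell
# Hypothetical sample line with several adjacent blank fields.
line='AAA;1; ; ;      ;  ;XXX'

# Single pass with the g flag: matches cannot overlap, so some
# blank fields survive.
once=$(printf '%s\n' "$line" | sed 's/; *;/;;/g')

# Loop variant: replace one blank field at a time and branch back to
# label "a" while a substitution succeeded. The pattern requires at
# least one space (";  *;"), otherwise ";;" would match forever.
looped=$(printf '%s\n' "$line" | sed -e ':a' -e 's/;  *;/;;/' -e 'ta')

printf '%s\n%s\n' "$once" "$looped"
```

The first line printed still contains blanks between some semicolons; the second should have none.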

Please post a hexdump / od of your file.

awk -F; '{ for(N=1; N<=NF; N++) sub(/^[ \r\n\t]*$/, "", $N); } 1' OFS=";" inputfile > outputfile

Here is the content of the file:

AAA;1; ; ;      ;  ;XXX
  BBB;2; ; ;             ;  ;YYY

That is neither a hexdump nor an od... it's not even in code tags.

Could you try my example? It should handle most whitespace, I hope.

Hello Corona

Your command does not seem to work. I admit I'm pretty new to UNIX, so I copied it as is, substituting only the input file and output file names.

Is there anything else that needs to be substituted?

Thanks again

Tony

You need to provide details - like the error message and/or output you're getting.


Hello Corona

Thank you for your reply. Here are more details:

Command used:

awk -F '{ for(N=1; N<=NF; N++) sub(/^[ \r\n\t]*$/, "", $N); } 1' OFS=";" file1 > file2

Output:

[admdecdev@IPLSID02 QUOT]$ awk -F '{ for(N=1; N<=NF; N++) sub(/^[ \r\n\t]*$/, "", $N); } 1' OFS=";" file1 > file2
awk: OFS=;
awk:     ^ syntax error
awk: cmd. ligne:1: Each rule must have a pattern or action component

There is a huge difference between what Corona688 suggested:

awk -F; '{ for(N=1; N<=NF; N++) sub(/^[ \r\n\t]*$/, "", $N); } 1' OFS=";" inputfile > outputfile

and what you showed above:

 awk -F '{ for(N=1; N<=NF; N++) sub(/^[ \r\n\t]*$/, "", $N); } 1' OFS=";" file1 > file2

Corona688's code uses semicolon as a field separator and specifies an awk script to read input from inputfile and write output to outputfile. Your version of his script uses his entire awk script as a field separator.


I think the semicolon must be quoted: -F";" (unquoted, the shell treats ; as a command separator, so awk never receives it).
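
For reference, a minimal sketch of the quoted form run against two made-up lines resembling the sample posted earlier. The temporary file and the variable names are assumptions for illustration only:

```shell
# Hypothetical sample data, written to a temporary file.
infile=$(mktemp)
printf '%s\n' 'AAA;1; ; ;      ;  ;XXX' '  BBB;2; ; ;             ;  ;YYY' > "$infile"

# -F';' is quoted so the shell hands the semicolon to awk instead of
# ending the command there. Each field that holds only whitespace is
# emptied; OFS=';' keeps the delimiter when awk rebuilds the record.
cleaned=$(awk -F';' '{ for (N = 1; N <= NF; N++) sub(/^[ \r\n\t]*$/, "", $N) } 1' OFS=';' "$infile")

printf '%s\n' "$cleaned"
rm -f "$infile"
```

Fields that contain other characters, such as the indented BBB, are left untouched because the pattern is anchored at both ends.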


That was exactly it. With just the semicolon the command failed, and I thus thought maybe Corona had left the semicolon after awk inadvertently.

Now that I've put the semicolon within quotes, it works like a charm.

Thanks Corona, Don & MadeInGermany.

I do have a last question though: is it possible to keep the same filename with a single command? That is, overwrite the original file with the trimmed version, without going through multiple steps like deleting the original file and renaming the new one.

Yes, with perl.
Perl not only has -i but also has a "look-ahead" feature in its regexp, so we can use the /g modifier:

perl -i -pe 's/;\s+(?=;)/;/g' inputfile


I just noticed that it does not empty the first and last fields.
The most comprehensive and correct expression seems to be

perl -i -lpe 's/(^|;)\s+((?=;)|$)/$1/g' inputfile

Again, works like a charm. Thanks a lot.