Remove leading and trailing spaces from a file

Hi,

I am trying to remove leading and trailing spaces from a file using awk but somehow I have not been able to do it.

Here is the data that I want to trim.

07/12/2017 15:55:00               |entinfdev        |AD ping Time ms          |      .474|      1.41|      .581|green        |flat
07/12/2017 15:55:00              |entinfdev                         |CPU Busy%             |         1|         2|         1|green      |flat
07/12/2017 15:55:00        |entinfdev            |Collected at                |1499888700|1499889000|1499889300|grey        |flat
07/12/2017 15:55:00      |entinfdev     |FS /tmp Used%             |         1|         1|         1|green  |flat
07/12/2017 15:55:00 |entinfdev    |FS /var/tmp Used%            |        74|        74|        74|orange    |flat

And I want the data to look like this.

07/12/2017 15:55:00|entinfdev|AD ping Time ms|.474|1.41|.581|green|flat
07/12/2017 15:55:00|entinfdev|CPU Busy%|1|2|1|green|flat
07/12/2017 15:55:00|entinfdev|Collected at|1499888700|1499889000|1499889300|grey|flat
07/12/2017 15:55:00|entinfdev|FS /tmp Used%|1|1|1|green|flat
07/12/2017 15:55:00|entinfdev|FS /var/tmp Used%|74|74|74|orange|flat

Please let me know if there is a better way to do it.

Thank you.

Seems to be an easier job for sed than awk. Show us the awk you have, maybe we can work with that.

Try using sed. First create a file like demo.sed containing

s/ *| */|/g

and run it with

sed -f demo.sed myFileToConvert >convertedFile

Of course "demo" can be replaced with something meaningful in your shop.

What this does is delete any spaces on either side of the "|" character, as many times as needed for each line.
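
As a quick check of the substitution described above, here is a minimal sketch that runs it as a one-liner instead of from a script file. The sample line and the function name squeeze_pipes are made up for illustration. The pattern is written with a plain | because, in a basic regular expression, an unescaped | is a literal pipe, whereas GNU sed treats \| as alternation.

```shell
# squeeze_pipes is a name invented for this sketch.
# The plain | is a literal pipe in a basic regular expression.
squeeze_pipes() {
    sed 's/ *| */|/g'
}

# Hypothetical sample line modelled on the data in the question.
printf '%s\n' '07/12/2017 15:55:00   |entinfdev    |CPU Busy%   |         1|green   |flat' | squeeze_pipes
# prints: 07/12/2017 15:55:00|entinfdev|CPU Busy%|1|green|flat
```

Spaces inside a field, such as the one between the date and the time, are left alone because they are not adjacent to a pipe.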

HTH

I tried the following awk command.

awk 'BEGIN{FS=" * *"; OFS="|"} {$1=$1; print}' temp_dbinfo.7880

However, it is inserting pipes where they are not required, as shown below. In some places it also inserts double pipes instead of one.

07/12/2017|15:55:00||entinfdev||AD|ping|Time|ms|||.474||1.41||.581|green||flat
07/12/2017|15:55:00||entinfdev||CPU|Busy%|||1||2||1|green||flat
07/12/2017|15:55:00||entinfdev||Collected|at||1499888700|1499889000|1499889300|grey||flat
07/12/2017|15:55:00||entinfdev||FS|/tmp|Used%|||1||1||1|green||flat

I want the date column and a few other columns to be printed without a '|' between the words, for example between the date and the time. Something like:

07/12/2017 15:55:00
AD ping Time ms

Try:

awk '{for(i=1; i<=NF; i++) gsub(/^[ \t]+|[ \t]+$/,x,$i)}1' FS=\| OFS=\| file

How about

awk 'gsub(/ *\| */, "|")+1' file

@RudiC: I'd say that would work if there are only spaces and no tabs and there is no leading space in the first field and no trailing space in the last field.
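
The limitation described above can be seen with a small sketch. The input line is invented for illustration: it has leading spaces before the first field and trailing spaces after the last one, and the function name pipes_only exists only for this example.

```shell
# RudiC's one-liner wrapped in a function (the name is only for this illustration).
pipes_only() {
    awk 'gsub(/ *\| */, "|")+1'
}

# The brackets added by sed make the surviving outer spaces visible.
printf '  a | b  \n' | pipes_only | sed 's/.*/[&]/'
# prints: [  a|b  ]
```

The spaces around the pipe are gone, but the ones at the start and end of the line remain because they are not next to a pipe.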

Building on what RudiC suggested and avoiding the issues mentioned by Scrutinizer (which also apply to the sed code suggested by wbport), you could try:

awk 'gsub(/[[:space:]]*\|[[:space:]]*/, "|")+gsub(/^[[:space:]]+|[[:space:]]+$/, "")+1' temp_dbinfo.7880

If you are using a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk .

It worked perfectly for my requirement. Thank you Scrutinizer!

Would you spare a few minutes to explain the awk command that you posted?

Thanks.

Hi, you are welcome. It means the following:

  • The awk code uses "|" as both the input and the output field separator (using the FS and OFS awk variables).
  • The for loop iterates over those fields.
  • For every field, the gsub function replaces leading whitespace (one or more spaces or TABs), ^[ \t]+, and trailing whitespace, [ \t]+$, with the empty string (contained in the uninitialized variable "x").
  • The "1" means "print the record".
  • This is repeated for every line/record in the file.
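
The steps above can also be written out in long form. This is only a sketch of the same logic with comments, using gsub as the explanation describes. The function name trim_fields and the sample line are made up for illustration, and "" is used in place of the uninitialized variable x.

```shell
# trim_fields is a name chosen for this sketch only.
trim_fields() {
    awk '
    BEGIN { FS = "|"; OFS = "|" }             # split and rejoin on the pipe character
    {
        for (i = 1; i <= NF; i++)             # visit every field of the record
            gsub(/^[ \t]+|[ \t]+$/, "", $i)   # drop leading and trailing blanks
        print                                 # what the bare "1" does in the one-liner
    }'
}

# Hypothetical sample line modelled on the data in the question.
printf '%s\n' '07/12/2017 15:55:00 |entinfdev    |FS /var/tmp Used%    |        74|orange    |flat' | trim_fields
# prints: 07/12/2017 15:55:00|entinfdev|FS /var/tmp Used%|74|orange|flat
```

Because each field is trimmed separately, spaces at the very start and end of the line are removed too, while spaces inside a field are kept.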