Remove leading and trailing spaces from a file

svajhala · July 12, 2017, 4:51pm

Hi,

I am trying to remove leading and trailing spaces from a file using awk but somehow I have not been able to do it.

Here is the data that I want to trim.

07/12/2017 15:55:00               |entinfdev        |AD ping Time ms          |      .474|      1.41|      .581|green        |flat
07/12/2017 15:55:00              |entinfdev                         |CPU Busy%             |         1|         2|         1|green      |flat
07/12/2017 15:55:00        |entinfdev            |Collected at                |1499888700|1499889000|1499889300|grey        |flat
07/12/2017 15:55:00      |entinfdev     |FS /tmp Used%             |         1|         1|         1|green  |flat
07/12/2017 15:55:00 |entinfdev    |FS /var/tmp Used%            |        74|        74|        74|orange    |flat

And I want the data to look like this.

07/12/2017 15:55:00|entinfdev|AD ping Time ms|.474|1.41|.581|green|flat
07/12/2017 15:55:00|entinfdev|CPU Busy%|1|2|1|green|flat
07/12/2017 15:55:00|entinfdev|Collected at|1499888700|1499889000|1499889300|grey|flat
07/12/2017 15:55:00|entinfdev|FS /tmp Used%|1|1|1|green|flat
07/12/2017 15:55:00|entinfdev|FS /var/tmp Used%|74|74|74|orange|flat

Please let me know if there is a better way to do it.

Thank you.

Scott · July 12, 2017, 5:12pm

Seems to be an easier job for sed than awk. Show us the awk you have, maybe we can work with that.

wbport · July 12, 2017, 5:25pm

Try using sed. First create a file like demo.sed containing

s/ *| */|/g

and run it with

sed -f demo.sed myFileToConvert >convertedFile

Of course "demo" can be replaced with something meaningful in your shop.

What this does is delete any spaces on either side of the "|" character, as many times as needed for each line.
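
A quick way to check the substitution on one line before converting a whole file is sketched below. The sample line is made up for illustration, and the pattern is written with an unescaped "|", which is a literal pipe in a POSIX basic regular expression (GNU sed treats "\|" as alternation).

```shell
# Illustrative sample line with padding around the "|" separators.
line='07/12/2017 15:55:00    |entinfdev   |CPU Busy%   |     1|green  |flat'

# Delete any run of spaces on either side of every "|".
trimmed=$(printf '%s\n' "$line" | sed 's/ *| */|/g')

printf '%s\n' "$trimmed"
# prints: 07/12/2017 15:55:00|entinfdev|CPU Busy%|1|green|flat
```

Note that this only touches spaces next to a "|"; spaces at the very start or end of a line are left alone.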

HTH

svajhala · July 12, 2017, 5:29pm

I tried the following awk command.

awk 'BEGIN{FS=" * *"; OFS="|"} {$1=$1; print}' temp_dbinfo.7880

However, it is inserting pipes where they are not required, as shown below. In some places it also inserts double pipes instead of one.

07/12/2017|15:55:00||entinfdev||AD|ping|Time|ms|||.474||1.41||.581|green||flat
07/12/2017|15:55:00||entinfdev||CPU|Busy%|||1||2||1|green||flat
07/12/2017|15:55:00||entinfdev||Collected|at||1499888700|1499889000|1499889300|grey||flat
07/12/2017|15:55:00||entinfdev||FS|/tmp|Used%|||1||1||1|green||flat

I want the date column and a few other columns to be printed without the '|' between the words, for example between the date and the time. Something like:

07/12/2017 15:55:00

AD ping Time ms

Scrutinizer · July 13, 2017, 1:07am

Try:

awk '{for(i=1; i<=NF; i++) gsub(/^[ \t]+|[ \t]+$/,x,$i)}1' FS=\| OFS=\| file

RudiC · July 13, 2017, 1:20am

How about

awk 'gsub(/ *\| */, "|")+1' file

Scrutinizer · July 13, 2017, 1:58am

@RudiC: I'd say that would work if there are only spaces and no tabs and there is no leading space in the first field and no trailing space in the last field.
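
A small sketch of those limitations, using a made-up input that has a leading space, a tab before the first separator, and a trailing space. The brackets in the output are only there to make the leftover whitespace visible.

```shell
# Hypothetical input: leading space, a tab before the first "|", trailing space.
sample=$(printf ' a \t| b |c ')

# The space-only pattern removes spaces touching a "|" and nothing else.
result=$(printf '%s\n' "$sample" | awk 'gsub(/ *\| */, "|")+1')

# The leading space, the space and tab after "a", and the trailing space survive.
printf '[%s]\n' "$result"
```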

Don_Cragun · July 13, 2017, 3:01am

Building on what RudiC suggested and avoiding the issues mentioned by Scrutinizer (which also apply to the sed code suggested by wbport), you could try:

awk 'gsub(/[[:space:]]*\|[[:space:]]*/, "|")+gsub(/^[[:space:]]+|[[:space:]]+$/, "")+1' temp_dbinfo.7880

If you are using a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk .

svajhala · July 13, 2017, 1:39pm

It worked perfectly for my requirement. Thank you Scrutinizer!

Would you spare a few minutes to explain the awk command that you posted?

Thanks.

Scrutinizer · July 13, 2017, 4:56pm

Hi, you are welcome. It means the following:

The awk code uses "|" as both the input and the output field separator (through the FS and OFS awk variables).
The for loop iterates over those fields.
For every field, the gsub call replaces leading whitespace (one or more spaces or tabs), ^[ \t]+ , and trailing whitespace, [ \t]+$ , with the empty string (the value of the uninitialized variable "x").
The "1" is a condition that is always true, so it means "print the record".
This is repeated for every line/record in the file.
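
The same steps can be spelled out in a longer, commented form. This is only a sketch: trim_fields is a hypothetical helper name, the sample line is made up, and an explicit "" stands in for the uninitialized variable. It uses gsub so that a field padded on both sides is trimmed at both ends (sub would stop after the first match).

```shell
# trim_fields is a hypothetical wrapper name, not something from the thread.
trim_fields() {
  awk '
    BEGIN { FS = "|"; OFS = "|" }        # same separator for input and output
    {
      for (i = 1; i <= NF; i++)          # visit every field of the record
        gsub(/^[ \t]+|[ \t]+$/, "", $i)  # drop leading and trailing blanks
      print                              # what the bare "1" does in the one-liner
    }
  ' "$@"
}

# Reads standard input when no file name is given.
printf '%s\n' ' 07/12/2017 15:55:00   |entinfdev  |FS /tmp Used%   |    1|green  |flat ' | trim_fields
# prints: 07/12/2017 15:55:00|entinfdev|FS /tmp Used%|1|green|flat
```

It can also be called with a file name, for example: trim_fields temp_dbinfo.7880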