Removing extra unwanted spaces

Hi,

I need to remove the extra spaces in the 2nd field.

Sample:

abc|bd |bkd123 .. 1space
abc|badf  |bakdsf123 .. 2space
abc|bqe   |bakuowe .. 3space

Output:

abc|bd|bkd123
abc|badf|bakdsf123
abc|bqe|bakuowe

I used the following command:

nawk -F\~ 'OFS=FS { gsub(" ", "", $2) }1'

But it doesn't remove the spaces when there is more than one.

sed 's/ .*|/|/' <file.txt>

output:

abc|bd|bkd123 .. 1space
abc|badf|bakdsf123 .. 2space
abc|bqe|bakuowe .. 3space

You are using the wrong field separator, friend.
Try this one:

$ awk -F\| 'OFS=FS {gsub(" ","",$2);print}' test

Input :

$ cat test
abc|bd     |bkd123
abc|badf   |asdasf132e2
abc|bqe      |badfafa

Output :

abc|bd|bkd123
abc|badf|asdasf132e2
abc|bqe|badfafa
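
As a quick sanity check of the separator diagnosis above, one could count how many fields awk sees with each separator. This is only a sketch, assuming any POSIX awk; the sample line is taken from the original question.

```shell
# With "~" as the separator the sample line is one single field,
# so $2 is empty and there is nothing for gsub to act on.
printf '%s\n' 'abc|bd |bkd123' | awk -F'~' '{ print NF }'

# With "|" as the separator the same line splits into three fields.
printf '%s\n' 'abc|bd |bkd123' | awk -F'|' '{ print NF }'
```

The first command should print 1 and the second 3, which is why the original command never touched the second field.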

Your command should work once you give it the correct field separator (maybe a bit tricky though for "|"; see other recent posts)

I guess

tr -d ' ' <infile

would do too much as it removes all spaces in the entire file...
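
Between deleting every space in the file and deleting every space in one field, there is a narrower option: strip only the trailing blanks of field 2, so that blanks embedded in a value survive. A minimal sketch, assuming a POSIX awk; the function name trim_field2 is made up for illustration and is not from the thread.

```shell
# Hypothetical helper: trims trailing spaces/tabs from the 2nd
# pipe-delimited field only and leaves every other field untouched.
trim_field2() {
  awk 'BEGIN { FS = OFS = "|" }      # same separator for input and output
       { sub(/[ \t]+$/, "", $2) }    # strip trailing blanks, keep embedded ones
       { print }' "$@"
}

printf '%s\n' 'abc|bd |bkd123' 'abc |b q   |bakuowe' | trim_field2
```

The second sample line shows the difference from the gsub approach: the blank inside "b q" and the trailing blank of the first field are both kept.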

Try this:

$ cat file
abc|bd |bkd123 .. 1space
abc  |badf  |bakdsf123 .. 2space
abc |bqe   |bakuowe .. 3space

$ awk -F "|" '{ gsub(" ","",$2)}1' OFS=\| file

abc|bd|bkd123 .. 1space
abc  |badf|bakdsf123 .. 2space
abc |bqe|bakuowe .. 3space

This will work only for the first occurrence in a line, because you omitted the "g" in the command's options, probably an oversight. But even with the "g", it will work only in the sample case of trailing blanks and will fail if there are blanks within values:

|first|second  |third fourth|

will give you:

|first|second|

This one would work as expected, though (replace "<spc>" and "<tab>" with literal spaces/tabs):

sed 's/[<spc><tab>][<spc><tab>]*|/|/g' <file.txt>

I hope this helps.

bakunin
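
To compare the two sed expressions discussed above, they can be run side by side on the example line. This is a sketch and not part of the original posts; it assumes a sed that supports the POSIX [[:blank:]] class (space and tab), which is used here in place of the literal <spc><tab> characters.

```shell
line='|first|second  |third fourth|'

# ".*" is greedy: the match starts at the first space and runs to the LAST "|".
printf '%s\n' "$line" | sed 's/ .*|/|/'

# Only runs of blanks that sit directly in front of a "|" are removed.
printf '%s\n' "$line" | sed 's/[[:blank:]][[:blank:]]*|/|/g'
```

The first command cuts everything from the first space up to the final delimiter and prints |first|second|, while the second keeps the embedded blank and prints |first|second|third fourth|.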

Both behaviors may very well be correct.

Regards,
Alister

For sed, I would suggest the following, which is resilient to trailing spaces in the first field:

sed 's/ *|/|/2'

Regards,
Alister
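
The effect of the numeric flag can be checked on a few sample lines. A small sketch, assuming a POSIX sed; the input lines are adapted from the samples earlier in the thread.

```shell
# The numeric flag 2 replaces only the 2nd match of " *|" on each line.
# Because " *" may match zero spaces, every "|" counts as a match, so the
# 2nd match is always the delimiter that ends field 2.
printf '%s\n' 'abc|bd |bkd123' 'abc  |badf  |bakdsf123' 'abc|bqe|bak uowe' |
  sed 's/ *|/|/2'
```

The trailing blanks of the first field in the second line are left alone, and the third line, which has nothing to trim in field 2, passes through unchanged.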

True. Re-reading the original poster's description of the goal, I have to admit it is not at all clear what he really wants, and I might have misunderstood him.

Alister wrote:

For sed, I would suggest the following, which is resilient to trailing spaces in the first field:

sed 's/ *|/|/2'

Hmm, it will change the second instance of trailing blanks in any field, not the trailing blanks in the second field, no?

I suggest the following revised regexp, which changes only trailing blanks, and only in the second field:

sed 's/^\([^|]*|[^<spc><tab>]\)[<spc><tab>]*|/\1|/' /path/to/input

Arguably one of the most fire-proof regexps I have ever written. ;-))

bakunin

No. It will remove trailing spaces (if present) only in the second field. The first field will always match, since the spaces are optional (there is only one space in my regular expression). The second match will always be the second field.

I believe you omitted a * quantifier, otherwise your re will only match a second field with a single non-blank followed by blanks. But including the * could lead to a problem with greediness spilling over into subsequent fields if the second field has no blanks. The first [^<spc><tab>] should be [^|] . Regardless, it will fail to remove trailing blanks if there is an embedded blank.

If trailing spaces in the second field need to be deleted, I believe my suggestion is both accurate and robust.

Regards,
Alister
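
The points made in this reply can be tried out on three second fields: a single character, several characters, and a value with an embedded blank. This is a sketch only; the variable names are made up for illustration, and [[:blank:]] (assumed to be supported by your sed) stands in for the literal <spc><tab> characters.

```shell
# bakunin's expression as posted (hypothetical variable name).
posted='s/^\([^|]*|[^[:blank:]]\)[[:blank:]]*|/\1|/'
# Alister's suggestion (hypothetical variable name).
second='s/ *|/|/2'

# The posted expression only matches when field 2 is one non-blank
# character followed by blanks, so only the first line is changed.
printf '%s\n' 'abc|b  |x' 'abc|bd  |x' 'abc|b d  |x' | sed "$posted"

# The numeric-flag version trims the trailing blanks in all three lines
# and keeps the embedded blank in "b d".
printf '%s\n' 'abc|b  |x' 'abc|bd  |x' 'abc|b d  |x' | sed "$second"
```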

You know what? You are right. After a night's sleep and a fresh look, I have come to the conclusion that your solution is more robust than mine.

Most of my colleagues suddenly have something else to do when I say "sed", so I enjoy being able to discuss regexps for a change.

bakunin