SED help - cleaning up code, extra spaces won't go away

Matthias03 · April 24, 2010, 8:44am

Hello,

Within the script I'm working on, I need to take a column from one file and format it into a variable that I can use as an egrep pattern against another file, both to match it and to invert the match.

My solution is this:

VAR=`awk -F, '{print $2}' $FAIL | sed 's/-i/\|/g'`
VAR2=`echo $VAR | sed 's/ //g;s/^.\{1\}//g'`

egrep "$VAR2" file.txt >> newfile.txt
egrep -v "$VAR2" file.txt >> newfile.txt

The above works, but it's just ugly sed code.

All together...
input.txt:

-h a, -i b, -j c
-h d, -i e, -j f
-h g, -i h, -j i

VAR=`awk -F, '{print $2}' input.txt | sed 's/-i/\|/g'`

... echo $VAR, prints: | b | e | h

So I'm running the variable again, through "another filter":

VAR2=`echo $VAR | sed 's/ //g;s/^.\{1\}//g'`

... echo $VAR2, prints b|e|h
... which is what I'd need for a good egrep command.

What I don't understand, why 'this' isn't working:

VAR=`awk -F, '{print $2}' input.txt | sed 's/-i/\|/g;s/^.\{1\}//g;s/ //g'`

???
It'll produce something like this:
b e h
... I'm losing the "|", and the spaces are still there?

Can someone please help me understand what I'm doing wrong w/ SED? I can get the results I want, but it's ugly. Is what I'm doing correct? Lastly... if there's nothing wrong w/ how I'm doing things... is there a better or more efficient way?

Thanks everybody.

vgersh99 · April 24, 2010, 9:04am

nawk -f matt.awk input.txt file.txt

matt.awk:

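BEGIN {
  FS=","                              # fields are comma-separated
}
FNR==NR {                             # true only while reading the first file (input.txt)
   match($2, "[^ ][^ ]*$")            # locate the last space-free word in field 2
   str=substr($2, RSTART)             # e.g. "b" from " -i b"
   regex=(!regex)?str:regex "|" str   # build the alternation: b|e|h
   next                               # do not fall through to the filter below
}
$0 !~ regex                           # second file: print lines that do NOT match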

durden_tyler · April 24, 2010, 9:16am

matthias03:

...
What I don't understand, why 'this' isn't working:
VAR=`awk -F, '{print $2}' input.txt | sed 's/-i/\|/g;s/^.\{1\}//g;s/ //g'`
???
It'll produce something like this:
b e h
... I'm losing the "|", and the spaces are still there?
...

Well, I do see the "|" characters in my output:

$ 
$ cat -n input.txt
     1    -h a, -i b, -j c
     2    -h d, -i e, -j f
     3    -h g, -i h, -j i
$ 
$ awk -F, '{print $2}' input.txt | sed 's/-i/\|/g;s/^.\{1\}//g;s/ //g'
|b
|e
|h
$

But if I assign the quoted output of that pipeline to a shell variable, then spaces are introduced.

$ 
$ VAR="`awk -F, '{print $2}' input.txt | sed 's/-i/\|/g;s/^.\{1\}//g;s/ //g'`"
$ 
$ echo $VAR
|b |e |h
$

The command pipeline's output did *not* have spaces at the end:

$ 
$ awk -F, '{print $2}' input.txt | sed 's/-i/\|/g;s/^.\{1\}//g;s/ //g' | od -bc
0000000 174 142 012 174 145 012 174 150 012
          |   b  \n   |   e  \n   |   h  \n
0000011
$

So I guess the shell replaces those newlines by blank spaces when it is assigned to a (shell) variable:

$ 
$ VAR=`awk -F, '{print $2}' input.txt | sed 's/-i/\|/g;s/^.\{1\}//g;s/ //g'`
$ echo $VAR
|b |e |h
$ 
$ ## or quoted
$ VAR="`awk -F, '{print $2}' input.txt | sed 's/-i/\|/g;s/^.\{1\}//g;s/ //g'`"
$ echo $VAR
|b |e |h
$ 
$ echo $VAR | od -bc
0000000 174 142 040 174 145 040 174 150 012
          |   b       |   e       |   h  \n
0000011
$

Here's another way to extract a pipe-delimited output from the file using plain awk:

$ 
$ cat -n input.txt
     1    -h a, -i b, -j c
     2    -h d, -i e, -j f
     3    -h g, -i h, -j i
$ 
$ awk '{sub(",","",$4); x = NR==1 ? $4 : x"|"$4} END{print x}' input.txt
b|e|h
$

tyler_durden

alister · April 24, 2010, 11:28am

The value of $VAR does contain the newlines. It may seem that the shell is converting newlines to spaces, but it is not. Since the variable expansion is unquoted, the shell splits the result into words at each of those newlines (and at spaces and tabs, assuming a default value for IFS). echo never sees the newlines. echo does its job, printing its arguments as a space-delimited list. If VAR is double-quoted, echo will be invoked with one argument, which will contain the newlines.

$ VAR=`awk -F, '{print $2}' input.txt | sed 's/-i/\|/g'`
$ echo $VAR
| b | e | h
$ echo "$VAR"
 | b
 | e
 | h
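
To make the splitting visible, here is a minimal sketch that counts how many arguments a command actually receives. The helper name count_args is hypothetical (it is not from the thread), and the value of VAR is simply the three lines the pipeline above would produce.

```shell
#!/bin/sh
# Hypothetical helper: print how many arguments it was called with.
count_args() {
    echo "$#"
}

# Three lines, as produced by the awk | sed pipeline in the thread.
# Command substitution strips only the trailing newline.
VAR=$(printf '|b\n|e\n|h\n')

# Unquoted: the shell splits on the embedded newlines -> three arguments.
count_args $VAR

# Quoted: no field splitting -> one argument that still contains newlines.
count_args "$VAR"
```

With a default IFS this should print 3 and then 1, which is the same thing echo experiences: three separate words in the first case, one multi-line word in the second.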

Sidenote: Just as echo isn't seeing any of the newlines (because the shell "consumed" them during the field splitting step), echo is not seeing any of the spaces either. You are just less likely to miss them, since echo prints a space to delimit its arguments, which are often space-delimited to begin with. (Although you would notice that multiple consecutive spaces are squeezed into one.)

$ VAR='a b c'
$ # Gives the impression that field splitting did not happen, but it did.
$ echo $VAR
a b c
$ VAR='a          b              c'
$ echo $VAR
a b c
$ echo "$VAR"
a          b              c

Regards,
Alister

Scrutinizer · April 24, 2010, 11:48am

In short, try using:

awk -F, '{printf "%s", $2}' input.txt

alister · April 24, 2010, 1:23pm

matthias03:

What I don't understand, why 'this' isn't working:
VAR=`awk -F, '{print $2}' input.txt | sed 's/-i/\|/g;s/^.\{1\}//g;s/ //g'`
???
It'll produce something like this:
b e h
... I'm losing the "|", and the spaces are still there?

Can someone please help me understand what I'm doing wrong w/ SED? I can get the results I want, but it's ugly. Is what I'm doing correct? Lastly... if there's nothing wrong w/ how I'm doing things... is there a better or more efficient way?

Thanks everybody.

I cannot reproduce the "b e h" pipe-loss result.

My results:

$ cat input.txt 
-h a, -i b, -j c
-h d, -i e, -j f
-h g, -i h, -j i
$ VAR=`awk -F, '{print $2}' input.txt | sed 's/-i/\|/g;s/^.\{1\}//g;s/ //g'`
$ echo $VAR
|b |e |h
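
A possible explanation for why the single sed invocation behaves differently from the two-step version: sed applies its commands to each input line separately, so s/^.\{1\}// removes the first character of every line (the leading space), not the first "|" of the joined result. In the two-step version, the unquoted echo $VAR had already merged the lines into one before the second sed ran. The sketch below reproduces both cases; the variable names sample, per_line and joined are illustrative only.

```shell
#!/bin/sh
# Sample data copied from the thread.
sample='-h a, -i b, -j c
-h d, -i e, -j f
-h g, -i h, -j i'

# One sed, three input lines: each line loses its leading space and
# keeps its own "|", so the result is still three separate lines.
per_line=$(printf '%s\n' "$sample" |
    awk -F, '{print $2}' |
    sed 's/-i/|/; s/^.//; s/ //g')
printf '%s\n' "$per_line"

# Strip the " -i " prefix per line, then let paste join the lines with "|".
joined=$(printf '%s\n' "$sample" |
    awk -F, '{print $2}' |
    sed 's/^ *-i *//' |
    paste -sd'|' -)
printf '%s\n' "$joined"
```

The first printf should show |b, |e and |h on separate lines; the second should show b|e|h on one line.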

Personally, I'm partial to ...

awk -F'[, ]+' '{print $4}' input.txt | paste -sd\| -
b|e|h

... and ...

awk -F'[, ]+' '{printf("%s", (NR!=1 ? "|" : "") $4)}' input.txt
b|e|h

Regards,
Alister
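
For completeness, a sketch of how the pieces from this thread might fit together. The contents of file.txt were never shown, so the three rows below are made up for illustration, as are the output file names matched.txt and unmatched.txt. grep -E is used here in place of egrep; the two are equivalent for this purpose.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d) || exit 1

# input.txt as posted in the thread.
cat > "$workdir/input.txt" <<'EOF'
-h a, -i b, -j c
-h d, -i e, -j f
-h g, -i h, -j i
EOF

# Hypothetical file.txt: the original was not shown in the thread.
printf '%s\n' 'row b 1' 'row x 2' 'row h 3' > "$workdir/file.txt"

# Build the alternation b|e|h without any intermediate cleanup pass.
pattern=$(awk -F'[, ]+' '{print $4}' "$workdir/input.txt" | paste -sd'|' -)

# Quote the expansion so the pattern reaches grep as a single argument.
grep -E  "$pattern" "$workdir/file.txt" > "$workdir/matched.txt"
grep -Ev "$pattern" "$workdir/file.txt" > "$workdir/unmatched.txt"

printf 'pattern: %s\n' "$pattern"
```

Note that an unanchored pattern such as b|e|h matches any line containing one of those letters anywhere, so real data may call for word boundaries or field-specific matching.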