Confusing sed error message

Ralph · December 16, 2018, 6:57am

This situation is extracted from a larger context. My intention for now is to escape the forward slashes in the path of a filename. (Ultimately the LINEs will come from a file.)

while read LINE ; do
        sed 's/\//\\\//g' <<< "$LINE"    # ok
        escaped=`sed 's/\//\\\//g' <<< "$LINE"`   # error message
        echo $escaped
done <<here
==: dir1/dir2/file1 dir3/dir4/file2
==: dir5/dir6/file3 dir1/file4
==: dir3/file5 dir3/file6
==: dir1/file4 dir5/dir6/file3
==: dir3/dir4/file2 dir1/dir2/file1
==: dir3/file6 dir3/file5
here

While the direct sed output looks the way I want, the next line, where I try to assign that output to a variable, gives me an error message:

==: dir1\/dir2\/file1 dir3\/dir4\/file2
sed: -e expression #1, char 9: unknown option to `s'

What's the problem?

--- Post updated at 11:57 AM ---

Well... when I use $() instead of the `` it works as it should:

while read LINE ; do
        sed 's/\//\\\//g' <<< "$LINE"
#        escaped=`sed 's/\//\\\//g' <<< "$LINE"`   # doesn't work
        escaped=$(sed 's/\//\\\//g' <<< "$LINE")   # works
        echo $escaped \<-\$escaped
done <<here
==: dir1/dir2/file1 dir3/dir4/file2
==: dir5/dir6/file3 dir1/file4
==: dir3/file5 dir3/file6
==: dir1/file4 dir5/dir6/file3
==: dir3/dir4/file2 dir1/dir2/file1
==: dir3/file6 dir3/file5
here

Output:

==: dir1\/dir2\/file1 dir3\/dir4\/file2
==: dir1\/dir2\/file1 dir3\/dir4\/file2 <-$escaped
==: dir5\/dir6\/file3 dir1\/file4
==: dir5\/dir6\/file3 dir1\/file4 <-$escaped
etc.

But what's the difference between the two? The Bash Reference doesn't mention any.

nezabudka · December 16, 2018, 8:17am

There are spaces in the string

escaped=`sed 's/\//\\\//g' <<< "$LINE"`   # error message

Change it to:

escaped="$(sed 's/\//\\\//g' <<<"$LINE")"

The separator in s (substitution) can be any character, for example % or |

Sorry, maybe you should also use

read -r

Ralph · December 16, 2018, 8:39am

Right. But my main question is now why does this work
$(sed 's/\//\\\//g' <<< "$LINE")
but not this:
`sed 's/\//\\\//g' <<< "$LINE"`

nezabudka · December 16, 2018, 8:51am

This works:

escaped=`sed 's|/|\\\/|g' <<< "$LINE"`

and so does this:

escaped=`sed 's/\//\\\\\//g' <<< "$LINE"`

Ralph · December 16, 2018, 10:15am

This
escaped=`sed 's|/|\\\/|g' <<< "$LINE"`
and this
escaped=`sed 's/\//\\\\\//g' <<< "$LINE"`

is not this

`sed 's/\//\\\//g' <<< "$LINE"`

Thanks for your suggestions, though. I'll look into it. For now I will have to stick to

$(sed 's/\//\\\//g'  <<< "$LINE")

--- Post updated at 03:15 PM ---

Actually, what I'm trying to do is remove duplicate pairs from a file like this:

==: dir1/dir2/file1 dir3/dir4/file2
==: dir5/dir6/file3 dir1/file4
==: dir3/file5 dir3/file6
==: dir1/file4 dir5/dir6/file3
==: dir3/dir4/file2 dir1/dir2/file1
==: dir3/file6 dir3/file5

I found out that it doesn't really work if I redirect the file into a while-loop that uses read to read a line, like this:

while read LINE ; do
   swap column 2 with column 3
   remove swapped line from file (using sed)
done < file

I got the idea because while read works line by line from the beginning of the file, and the swapped line is always located after the other one, so if I remove it, read will never see it. But apparently the entire original file is still available to read no matter what I remove.

Is there a better approach?
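A likely reason, sketched under an assumption: if the removal step rewrites the file by creating a new one and renaming it over the original (which is how GNU sed -i works, and what a temp file plus mv does), the loop's standard input stays attached to the old, unchanged file. The file name, its contents and the grep/mv pair below are illustrative stand-ins, not the original script:

```shell
# Hypothetical demo file; names and contents are illustrative only.
demo=$(mktemp)
printf '%s\n' 'a' 'b' 'c' > "$demo"

seen=0
while read -r line; do
    # Emulate an in-place delete: write a filtered copy, then rename it
    # over the original. The loop's stdin still points at the old file.
    grep -v '^c$' "$demo" > "$demo.new"
    mv "$demo.new" "$demo"
    seen=$((seen + 1))
done < "$demo"

left=$(cat "$demo")
rm -f "$demo"
printf 'lines read by the loop: %s\n' "$seen"
printf 'lines left in the file:\n%s\n' "$left"
```

The loop still reads all three lines, although the file on disk only holds two after the first iteration.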

Scrutinizer · December 16, 2018, 10:28am

Backticks have been deprecated for a long time. They offer no advantage over $( ... ) and have quoting, nesting, and escaping issues.

Unless you are writing for a legacy shell like pre-Posix Bourne shell, use $( ... ) instead. I never use backticks.
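The escaping issue can be made visible with a small sketch (not from the original posts; printf stands in for sed so that the argument itself can be inspected). Inside backticks the shell removes a backslash that precedes $, a backtick or another backslash before the command is parsed, single quotes notwithstanding. The Bash Reference Manual mentions this under Command Substitution. The script therefore reaches sed with one backslash fewer:

```shell
# printf replaces sed here, so we see the script exactly as sed would get it.
old=`printf '%s' 's/\//\\\//g'`     # backticks: \\ is reduced to \ first
new=$(printf '%s' 's/\//\\\//g')    # $( ... ): passed through untouched

printf 'backticks: %s\n' "$old"     # s/\//\\//g
printf '$( ... ):  %s\n' "$new"     # s/\//\\\//g
```

With s/\//\\//g, sed takes \\ as the replacement, treats the next / as the closing delimiter and then finds a stray / where the flags should be, at character 9, which matches the error message in the first post.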

Unless you need further line level processing in shell, you could of course use:

sed 's|/|\\/|g' << "here"
...
here

If you need the line processing in a loop, calling an external program with each iteration is expensive.
If you use bash, ksh93 or zsh as a shell you could use something like this (parameter expansion):

while read LINE
do
  escaped=${LINE//\//\\/} 
  echo "$escaped"
done << "here"
...
here

-or-

Feed the sed output into a loop (and use read's -r option):

{
  sed 's|/|\\/|g' << "here"
...
here
} |
while read -r LINE
do
  echo "processing ${LINE}"
done

--
Note: as was suggested, a different delimiter removes the need to escape the forward slash, which makes the code more readable. The examples given with | as the delimiter had one backslash too many; this is enough:

sed 's|/|\\/|g'

Also, to prevent headaches, it is recommended to quote variable expansions, so use:

echo "$escaped"

or better yet:

printf "%s\n" "$escaped"
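A minimal sketch of what the quoting changes, using a made-up value with several consecutive spaces (unquoted expansions are also subject to filename globbing, which is not shown here):

```shell
escaped='dir1   dir2'                 # three spaces between the words
unquoted=$(echo $escaped)             # word splitting collapses the spaces
quoted=$(printf '%s\n' "$escaped")    # the value survives unchanged

printf '[%s]\n[%s]\n' "$unquoted" "$quoted"
# [dir1 dir2]
# [dir1   dir2]
```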

Scrutinizer · December 16, 2018, 4:26pm

ralph:

[..]
--- Post updated at 03:15 PM ---

Actually, what I'm trying to do is remove duplicate pairs from a file like this:
==: dir1/dir2/file1 dir3/dir4/file2
==: dir5/dir6/file3 dir1/file4
==: dir3/file5 dir3/file6
==: dir1/file4 dir5/dir6/file3
==: dir3/dir4/file2 dir1/dir2/file1
==: dir3/file6 dir3/file5
I found out that it doesn't really work if I redirect the file into a while-loop that uses read to read a line, like this:
while read LINE ; do
   swap column 2 with column 3
   remove swapped line from file (using sed)
done < file
I got the idea because while read works line by line from the beginning of the file, and the swapped line is always located after the other one, so if I remove it, read will never see it. But apparently the entire original file is still available to read no matter what I remove.

Is there a better approach?

Assuming the fields in your input file are whitespace separated, you could try this approach:

awk '!A[$2,$3]++ && !A[$3,$2]++' file
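Applied to the sample data from the question, the one-liner can be tried like this (the variable names are only for the demonstration):

```shell
pairs='==: dir1/dir2/file1 dir3/dir4/file2
==: dir5/dir6/file3 dir1/file4
==: dir3/file5 dir3/file6
==: dir1/file4 dir5/dir6/file3
==: dir3/dir4/file2 dir1/dir2/file1
==: dir3/file6 dir3/file5'

# A line is printed only if neither (col2,col3) nor (col3,col2) was seen before.
unique=$(printf '%s\n' "$pairs" | awk '!A[$2,$3]++ && !A[$3,$2]++')
printf '%s\n' "$unique"
# ==: dir1/dir2/file1 dir3/dir4/file2
# ==: dir5/dir6/file3 dir1/file4
# ==: dir3/file5 dir3/file6
```

The first test marks the pair in its given order, the second marks the swapped order, so a later line with the columns exchanged fails the first test and is skipped.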

RudiC · December 16, 2018, 4:43pm

ralph:

...
while read LINE ; do
.
.
.
I got the idea because while read works line by line from the beginning of the file, and the swapped line is always located after the other one, so if I remove it, read will never see it. But apparently the entire original file is still available to read no matter what I remove.

Is there a better approach?

You might want to read the input data into several variables, not just the one $LINE . Like while read V1 DIR1 DIR2 V2 , and then operate on the two DIR variables...
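One way this suggestion could look in plain shell, as a rough sketch: dedupe_pairs and its variable names are hypothetical, and the file names are assumed to contain no whitespace. Instead of editing the file, the loop remembers the pairs it has already printed:

```shell
# dedupe_pairs (hypothetical name) reads "tag file1 file2" lines on stdin
# and prints each pair once, regardless of the order of the two names.
dedupe_pairs() {
    nl='
'
    seen=$nl                          # newline-delimited list of printed pairs
    while read -r tag file1 file2; do
        case $seen in
            *"$nl$file1 $file2$nl"* | *"$nl$file2 $file1$nl"*)
                continue ;;           # already seen in either order
        esac
        seen="$seen$file1 $file2$nl"
        printf '%s %s %s\n' "$tag" "$file1" "$file2"
    done
}

pairs='==: dir1/dir2/file1 dir3/dir4/file2
==: dir5/dir6/file3 dir1/file4
==: dir3/file5 dir3/file6
==: dir1/file4 dir5/dir6/file3
==: dir3/dir4/file2 dir1/dir2/file1
==: dir3/file6 dir3/file5'

unique_pairs=$(printf '%s\n' "$pairs" | dedupe_pairs)
printf '%s\n' "$unique_pairs"
```

For long inputs the growing string makes this slow; an associative array (bash 4, ksh93, zsh) or awk scales better.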

Ralph · December 17, 2018, 5:00pm

Thanks. That worked well.

I also modified my script to avoid duplicates in the first place, using an array to save filenames and compare them to incoming new ones.

It is reassuring that the results are the same as with awk '!A[$2,$3]++ && !A[$3,$2]++' file .

Now spending some time with awk to figure out what that actually does...

--- Post updated at 10:00 PM ---

Hm...
I converted this, with some trial and error, into what I think an awk program would look like:

#!/usr/bin/awk -f
{
#       !A[$2,$3]++ && !A[$3,$2]++
        !A[$2 " " $3]++ && !A[$3 " " $2]++
}
END {
        for ( i in A ){
                if ( A[i] == 1 )
                print A[i], i;
        }
}

Two questions:
1) If I keep the A[$2,$3] and A[$3,$2] then the output produces a funny character between the two filenames but the command line version works fine. What is the problem?
2) How does the command line version know to print only those keys (i) for which the count is 1?

I'm using GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5, GNU MP 6.1.2)

(I'll figure it out somehow but it's getting late and it doesn't hurt to ask.)
Thanks.

nezabudka · December 18, 2018, 7:00am

awk does not print when the value is 1. It prints when the value is zero. ++ is a post-increment, so the expression first sees 0 and then 1, 2, etc.
0 == false == no print
1 == true == print
but the negation sign ! inverts that:
!0 == true == print
!1 == false == no print
!2 == false == no print
When you enclose the script in braces, the negation can no longer select lines.
Inside an action body the result of the expression is simply discarded.

Ralph · December 18, 2018, 7:25am

The keys are '$3,$2' and '$2,$3'. Right? The values after the entire file has been processed are either 2 or 1.

What I mean is when I leave out the END block nothing gets printed.

#!/usr/bin/awk -f
{
       !A[$2,$3]++ && !A[$3,$2]++
}

What awk program will deliver the same result as the one-liner awk '!A[$2,$3]++ && !A[$3,$2]++' file ?

(I suspect I'm missing something obvious.)

nezabudka · December 18, 2018, 8:15am

I was probably still correcting my message while you were answering.
Sorry, please re-read it.

awk '1' file
awk '0' file
awk '!0' file

--- Post updated at 13:09 ---
To print the array:

awk '{A[$2" "$3]++; A[$3" "$2]++} END {for(i in A) print i}' file

--- Post updated at 13:15 ---

In

'!A[$0]++'

awk doesn't print the array; the expression just yields true or false for every line, which decides whether that line is printed.
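As for the funny character from question 1: in awk, a comma inside a subscript joins the parts with the built-in variable SUBSEP, which defaults to the non-printing character \034. The one-liner never shows it because it prints $0, not the array keys. A small sketch with a made-up input line, splitting a key back into its parts:

```shell
keys=$(printf '%s\n' '==: dir1/file4 dir3/file5' |
    awk '{ A[$2,$3]++ }                        # key is $2 SUBSEP $3
         END {
             for (k in A) {
                 split(k, part, SUBSEP)        # undo the join
                 print part[1], part[2]
             }
         }')
printf '%s\n' "$keys"
# dir1/file4 dir3/file5
```

Building the key as $2 " " $3 avoids the character altogether.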

Scrutinizer · December 18, 2018, 11:14am

ralph:

The keys are '$3,$2' and '$2,$3'. Right? The values after the entire file has been processed are either 2 or 1.

What I mean is when I leave out the END block nothing gets printed.
#!/usr/bin/awk -f
{
   !A[$2,$3]++ && !A[$3,$2]++
}
What awk program will deliver the same result as the one-liner awk '!A[$2,$3]++ && !A[$3,$2]++' file ?

(I suspect I'm missing something obvious.)

You would need to leave out the braces:

#!/usr/bin/awk -f
!A[$2,$3]++ && !A[$3,$2]++

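Everything in awk has the form condition{action} . If the condition evaluates to true (a non-zero number or a non-empty string) then the action is performed. If the condition is omitted then it defaults to true, so the action is performed for every record. If the action is omitted then - if the condition is true - the default action is performed, which is {print $0} , which is "print the record", by default a line of the input file. With the braces, your expression became an action without a condition: it is evaluated for every line, but its result is discarded and nothing is printed.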

Since there is just a condition with the default action, in this case there are no braces.
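The pattern and action defaults can be checked with a short sketch. The three input lines and the variable names are made up for the demonstration:

```shell
lines='x
y
x'

# Pattern only: the default action prints each line whose pattern is true.
pattern_only=$(printf '%s\n' "$lines" | awk '!seen[$0]++')

# Same expression inside braces: it runs for every line, prints nothing.
inside_braces=$(printf '%s\n' "$lines" | awk '{ !seen[$0]++ }')

# Action only: no pattern means the action runs for every record.
action_only=$(printf '%s\n' "$lines" | awk '{ n++ } END { print n }')

printf 'pattern only:\n%s\n' "$pattern_only"       # x and y, once each
printf 'inside braces: [%s]\n' "$inside_braces"    # []
printf 'action only: %s\n' "$action_only"          # 3
```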