Handle special characters in awk -F

Hello Folks,

I need to split strings based on a substring.
The following works well:

echo /a/b/c/d | awk -F"/c/d$" '{print $1}'
/a/b

However, it goes awry with special characters.

echo /a/b/c+/d | awk -F"/c+/d$" '{print $1}'
/a/b/c+/d

Desired output:

/a/b

Escaping the special characters didn't help either:

echo "/a/b/c\+/d" | awk -F"/c\+/d$" '{print $1}'
/a/b/c\+/d

All the arguments get their values from variables.
How can I handle special characters here?

What is the real source of the strings you are processing? (If you are echoing constant strings into awk, it would make much more sense to remove the awk and just echo the string you want.)

Again, if you can modify the input string as well as the ERE, why are you using awk to modify it instead of just changing the echo to begin with?

Furthermore, if you are always trying to remove a fixed string from the end of an input line, use index() (which, unlike match(), takes a plain string rather than an ERE) to find the fixed string instead of worrying about escaping special characters in an extended regular expression.
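A minimal sketch of the fixed-string idea, assuming the suffix to remove arrives in a shell variable. Rather than building a regular expression, it compares the last characters of the line with the suffix using substr(), so no character in the suffix is special. The suffix is handed to awk through ENVIRON, because values given with -v or -F go through backslash-escape processing first. strip_suffix and SUFFIX are hypothetical names.

```shell
#!/bin/sh
# strip_suffix (hypothetical helper): print $1 with the literal suffix $2
# removed, or $1 unchanged if it does not end with $2.
strip_suffix() {
    # SUFFIX is passed via the environment: ENVIRON values are not subject
    # to the backslash-escape processing that -v and -F apply.
    printf '%s\n' "$1" | SUFFIX=$2 awk '
    {
        s = ENVIRON["SUFFIX"]
        n = length($0)
        m = length(s)
        # Compare the last m characters with the suffix as plain strings.
        if (m <= n && substr($0, n - m + 1) == s)
            print substr($0, 1, n - m)
        else
            print
    }'
}

strip_suffix '/a/b/c+/d' '/c+/d'    # prints /a/b
```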

If you really need to use an ERE to split fields, give us a clear specification of what characters might be in the input that are special in an ERE.

For the specific examples you provided, you could try:

echo /a/b/c+/d | awk -F"/c[+]/d$" '{print $1}'
/a/b

and:

echo /a/b/c\+/d | awk -F"/c[\][+]/d$" '{print $1}'
/a/b

Also note that using echo to feed data that might start with a minus sign or contain a backslash can produce radically different output depending on which shell you use and on which system you run it.
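A small illustration of the safer alternative, printf '%s\n', which prints its argument verbatim in any POSIX shell. The path value is made up to include both troublesome features.

```shell
#!/bin/sh
# A value that could trip up echo: a leading "-" and a backslash.
path='-n/a/b/c\+/d'

# echo may treat "-n" as an option, or interpret "\+", depending on the shell.
# printf '%s\n' prints its argument exactly as given, followed by a newline.
printf '%s\n' "$path"    # prints -n/a/b/c\+/d
```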

Thanks for the inputs.

Both the input string & the ERE are dynamically generated.
It's basically a folder path that needs to be split based on the current directory.
Both the folder path & the current directory depend on the machine & application being used.

We have been using this happily for some time now.
Recently someone added directories containing '++', and that is where it breaks. There shouldn't be extreme cases, since it has to be a valid directory name.

Do we have other options here?
If nothing else works, I can parse the variable argument & escape the special characters as shown above.

FWIW, this would work:

echo /a/b/c+/d | awk -F'/c\\+/d$' '{print $1}'

Note the single quotes and the doubled backslash before the + character.

Or with double quotes:

echo /a/b/c+/d | awk -F"/c\\\+/d$" '{print $1}'

In effect, that requires parsing the value & escaping it for awk's regular expression syntax.
It will be the last option, as the value is held in a variable & is dynamic.
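If you do end up escaping the value, here is a sketch of one way to do it in bash (3.1 or later, for the += operator); ere_escape and FSRE are hypothetical names, and it has not been verified against every awk implementation. It copies letters, digits and a few safe punctuation characters unchanged, backslash-escapes \ and ^, and wraps every other character in a bracket expression such as [+], the same trick as -F"/c[+]/d$" in this thread. The result is passed through ENVIRON so awk does not apply another round of backslash processing.

```shell
#!/bin/bash
# ere_escape (hypothetical helper): print $1 escaped for use as a literal
# in an awk extended regular expression.
#   letters, digits, "/", "_" and "-"  -> copied unchanged
#   "\" and "^"                         -> backslash-escaped (\\ and \^)
#   any other character c               -> bracket expression [c]
ere_escape() {
    local s=$1 out= c i
    for ((i = 0; i < ${#s}; i++)); do
        c=${s:i:1}
        case $c in
            [[:alnum:]/_-]) out+=$c ;;
            '\') out+='\\' ;;
            '^') out+='\^' ;;
            *) out+="[$c]" ;;
        esac
    done
    printf '%s\n' "$out"
}

WholePath='/a/b/c++/d'
TrailingPath='/c++/d'

# Anchor the escaped string at the end of the line with "$". FSRE is read
# via ENVIRON, which (unlike -F or -v) applies no backslash-escape processing.
printf '%s\n' "$WholePath" |
    FSRE="$(ere_escape "$TrailingPath")\$" awk 'BEGIN { FS = ENVIRON["FSRE"] } { print $1 }'
# prints /a/b
```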

Anyone have more options up their sleeve? :)

For maximum portability, assume FS should be something simple. However, realize that FS is used literally only when it is a single character (other than a space); anything longer is treated as an extended regular expression. That is standard POSIX awk behavior, not just gawk. So... you want:

echo '/a/b/c+/d' | awk -F'/c[+]/d$' '{print $1}'

Which returns:

/a/b

Anything that doesn't require escaping the special characters would be nice.
Awk is not a must; any other utility will do.
I've found awk to give the best results so far, though.

You're likely using this in a shell program that already holds this string in a variable? If not, where does the string come from: a stream? A file?

Here's how I'd solve this with a shell variable:

$ remove='/c+/d'; var=/a/b/c+/d; echo "${var%"$remove"}"
/a/b
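The inner double quotes in "${var%"$remove"}" are what make this safe for arbitrary directory names. A short illustration with a made-up value containing the glob character *:

```shell
#!/bin/sh
var=/a/b/cX/d
remove='/c*/d'

# Unquoted inside ${...%...}, $remove is a glob pattern: "*" matches "X".
printf '%s\n' "${var%$remove}"      # prints /a/b

# Quoted, $remove is matched as a literal string, so nothing is removed here.
printf '%s\n' "${var%"$remove"}"    # prints /a/b/cX/d
```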

OK. So we now know that you have two variables. Let's call them WholePath and TrailingPath. And you want to set another variable (let's call it ParentPath) to the initial part of $WholePath with the contents of $TrailingPath removed from its end.

There are several ways to do this, including but not limited to: awk, expr, sed, and various shell parameter expansions. Some are much simpler and some are much faster than others depending on additional information that you haven't shared with us yet:

  1. What operating system are you using?
  2. What shell are you using?
  3. What is your shell's version number? (For instance, if your shell is bash or ksh, what is the output from bash --version or ksh --version, respectively?)
  4. Is $TrailingPath always present in $WholePath? Or do you need to determine whether $TrailingPath is present in $WholePath, and set ParentPath to the first part of $WholePath if it is present and to the entire contents of $WholePath if it isn't?

Bingo, here are the precise requirements:

  1. OS: FreeBSD & Ubuntu Linux 12.x
  2. Shell: bash
  3. Version 3.2.39 & higher
  4. Not mandatory. TrailingPath may or may not be present in WholePath.

So, neutronscott's suggestion should do what you want:

#!/bin/bash
WholePath="/a/b/c+/d"
TrailingPath="/c+/d"
ParentPath=${WholePath%"$TrailingPath"}
printf '    with matching TrailingPath: "%s"\n' "$ParentPath"
TrailingPath="/no/match"
ParentPath=${WholePath%"$TrailingPath"}
printf 'with non-matching TrailingPath: "%s"\n' "$ParentPath"

which produces the output:

    with matching TrailingPath: "/a/b"
with non-matching TrailingPath: "/a/b/c+/d"
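If the calling code also needs to know whether TrailingPath was actually present (question 4 above), a case statement can test for the literal suffix before stripping it. A sketch using only POSIX sh syntax, so it should also run under FreeBSD's /bin/sh; matched is a hypothetical variable name:

```shell
#!/bin/sh
WholePath='/a/b/c+/d'
TrailingPath='/c+/d'

case $WholePath in
    *"$TrailingPath")
        # The quoted part of the pattern is literal; only the leading * is a glob.
        ParentPath=${WholePath%"$TrailingPath"}
        matched=yes ;;
    *)
        ParentPath=$WholePath
        matched=no ;;
esac
printf 'matched=%s ParentPath="%s"\n' "$matched" "$ParentPath"
```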

Thanks.
It works well in the initial tests.

I will integrate this with our main logic & get back in case of any issues.
Thanks again :)