Piping fails in locales other than English

adam.wis · July 12, 2012, 9:59am

Hi,

I am new to shell scripting and unix in general, and I am running into a problem.

I need to grep my script for a line that serves as a delimiter for an encoded file. I would like to get the line number of this delimiter so that I can use "tail" to get the encoded data and decode it.

I am piping the output of my "grep -n" command into the cut command in order to get the line number. This works when the locale is "C" or "en_US". However, when I change the locale to, for example, "cs_CZ", it fails.

PAX_PAYLOAD: is the delimiter.

payload_marker_line=$(grep -n '^PAX_PAYLOAD:$' $0 | cut -d ':' -f 1)
payload_start=$((payload_marker_line + 1))

I am getting these errors:

grep: input file "|": EDC5129I No such file or directory.
grep: input file "cut": EDC5129I No such file or directory.
grep: input file "-d": EDC5129I No such file or directory.
grep: input file ":": EDC5129I No such file or directory.
grep: input file "-f": EDC5129I No such file or directory.
grep: input file "1": EDC5129I No such file or directory.

Is there any reason that | is being treated as a file name?

Thank you for any and all help!!

Corona688 · July 12, 2012, 10:11am

What's your system? What's your shell?

adam.wis · July 12, 2012, 10:22am

I'm running on z/OS using an sh shell.

jim_mcnamara · July 12, 2012, 10:59am

$0 is the name of the script you are running, not its first parameter ($1).

Example:

# myscript.sh
echo "the code executing this is $0"
echo "my first parameter is $1"

$>  ./myscript.sh  t.lis
the code executing this is ./myscript.sh
my first parameter is t.lis
$>

adam.wis · July 12, 2012, 11:08am

Right. The encoded file is appended to the end of the script I am running.

jim_mcnamara · July 12, 2012, 11:53am

You should not append to a running script. Period. You append to an output file.
When you execute a shell script, it should remain unchanged during the run.

If that is what you meant...

adam.wis · July 12, 2012, 12:31pm

Ah. Thanks for the tip. I will keep that in mind.

I moved the encoded data into its own file, yet I am still having the same issue whenever a pipe symbol is reached. It says:

|: EDC5129I No such file or directory.

Does anyone know why this works fine in English but not in any other locale?

---------- Post updated at 12:31 PM ---------- Previous update was at 12:25 PM ----------

Also, just to clarify, the encoded file is appended to the script before it runs, not during. No changes are made to the running script; the encoded file is simply read and decoded. I'm not sure if this makes a difference.

alister · July 12, 2012, 12:54pm

The output you're seeing is what one would see if the pipe symbol were quoted, for example:

grep -n '^PAX_PAYLOAD:$' $0 '|' cut -d ':' -f 1

Seeing the entire script and knowing exactly how it's invoked (especially the value of $0) would be helpful. Also, further details of the environment may help. Which sh exactly is being used, for starters.

A shot in the dark (although I don't see how it could be the cause, it's good practice): try double-quoting $0.

Regards,
Alister

---------- Post updated at 12:54 PM ---------- Previous update was at 12:40 PM ----------

Some light may be shed on the problem if you enable tracing at the top of the script (or at least before the problematic section is entered). set -x enables tracing, set +x disables it.
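A minimal sketch of limiting tracing to just the suspect pipeline. The printf input is a stand-in (an assumption for illustration) for the script file that the original code reads through "$0". In a shell that parses the pipe correctly, each stage of the pipeline shows up as its own trace line.

```shell
# Trace only the suspect section.
# The printf text below is stand-in input, not the real installer.
set -x    # from here on, echo each command to stderr (prefixed with PS4, "+ " by default) before running it
payload_marker_line=$(printf 'echo hi\nPAX_PAYLOAD:\ndata\n' | grep -n '^PAX_PAYLOAD:$' | cut -d ':' -f 1)
set +x    # stop tracing
echo "marker is on line $payload_marker_line"
```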

Regards,
Alister

adam.wis · July 12, 2012, 4:34pm

Thanks alister. Turning tracing on was a great idea.

I was able to get around my previous issue by finding an alternative to piping. However, I have another spot where I am also using a pipeline and I am seeing the same issue.
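The thread doesn't say which alternative was used. One possible pipe-free way to get the marker's line number is a single awk call, sketched below under the assumption that awk is available (it is part of POSIX). Note that this only removes the pipe: the code still uses characters such as braces, "^" and "$", and whether any of those are also variant characters in a given z/OS locale is not covered here.

```shell
# find_marker_line FILE: print the number of the first line of FILE that is
# exactly "PAX_PAYLOAD:", using one awk process instead of grep | cut.
find_marker_line() {
    awk '/^PAX_PAYLOAD:$/ { print NR; exit }' "$1"
}

# Hypothetical use in the installer, reading the script itself ("$0", quoted):
#   payload_marker_line=$(find_marker_line "$0")
#   payload_start=$((payload_marker_line + 1))
```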

tail -n +$license_start $0 | head -n $length | uudecode -o /dev/stdout

(I am using a combination of head and tail to get the encoded data between two line numbers and then piping that into uudecode. There is probably a better way to do this, but for now I'd like to solve this piping issue.)
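As for the "better way": one option is to let a single sed call print the line range, which replaces the tail | head pair. This is a sketch; the helper name extract_lines is hypothetical, and it assumes START >= 1 and COUNT >= 1 (with COUNT = 0, sed would still print line START).

```shell
# extract_lines FILE START COUNT: print COUNT lines of FILE beginning at line START.
extract_lines() {
    _end=$(( $2 + $3 - 1 ))
    # -n suppresses default output; "START,ENDp" prints the wanted range and
    # "ENDq" quits at the last wanted line so the rest of the file is not read.
    sed -n "$2,${_end}p;${_end}q" "$1"
}

# Hypothetical use in the installer (one pipe remains, into uudecode):
#   extract_lines "$0" "$license_start" "$length" | uudecode -o /dev/stdout
```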

Alister, I took your advice and turned tracing on.
With the locale set to en_US or C, I get this:

+ tail -n +46382 ./install.sh
+ head -n 2217
+ uudecode -o /dev/stdout

(and then the decoded file)

In other locales, like cs_CZ, I get this:

+ tail -n +46382 ./install.sh | head -n 2217 | uudecode -o /dev/stdout

as if it is all one command...

I am very new to shells, encoding, etc. Sorry, I do not know exactly which sh I am using.

However, I noticed that when I change the locale from cs_CZ to cs_CZ.UTF-8 piping works. Is this just an encoding problem?

Corona688 · July 12, 2012, 4:44pm

This is probably just an encoding problem. Shells interpreting pipes as literal characters is extremely odd.

adam.wis · July 12, 2012, 4:52pm

If this is just an encoding problem, is there a way to fix it? Is it a matter of my script not having the right encoding? I tried using file -i to determine its encoding but all I get is "regular file."

I suppose I can get around piping by storing the output of each command in a temp file, but is this good practice?

Corona688 · July 12, 2012, 5:02pm

Not a good practice, no. Shells expect ASCII or something compatible with ASCII. Your encoding must be something strange which disagrees with ASCII on what | means.

Try rewriting the file from scratch after you've set your encoding to "C", UTF-8, or some other ASCII-compatible encoding.

alister · July 12, 2012, 7:00pm

Honestly, I don't know. I have never experienced this problem. Please make sure to report back if you determine the exact cause.

Regards,
Alister

fpmurphy · July 12, 2012, 10:43pm

The z/OS shell expects an EBCDIC environment. The pipe is a variant character (one of 13 widely used characters that can vary between EBCDIC flavours), which is represented by different byte values depending on the locale that is set. This is why a pipe can work in one locale and not in another.

Internally, the z/OS shell expects the script to be encoded in IBM-1047. If the active character set is not IBM-1047, then the script is transcoded to IBM-1047 before execution. The transcode operation modifies a specific set of bytes in the script, i.e. only variant characters are changed to their IBM-1047 equivalents.

To display the variant byte values for the current locale:

locale -ck LC_SYNTAX

alister · July 12, 2012, 11:56pm

Now that's how you drop a cluestick on a thread.

adam.wis · July 13, 2012, 9:41am

Thanks fpmurphy! I'm glad I finally know what's going on.

I also found some additional info:

blog.SOAL.org: Writing Portable Shell Scripts on z/OS UNIX

The suggestion there is to create a separate script that detects the codeset and converts the IBM-1047 script into that codeset.
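A rough sketch of that wrapper idea, not the blog's actual code. Assumptions to verify on your own system: that "locale charmap" prints the active code set name, and that iconv accepts both that name and IBM-1047. One caveat for this thread's installer: inside the converted copy, $0 names the temporary file, not the original script, and the appended payload is converted along with the code.

```shell
# run_transcoded FILE [FROM]: convert FILE from code set FROM (default IBM-1047)
# into the code set of the current locale, then run the converted copy with sh.
run_transcoded() {
    _from=${2:-IBM-1047}
    _to=$(locale charmap) || return 1    # assumption: reports the active code set
    _tmp=${TMPDIR:-/tmp}/run_transcoded.$$
    if iconv -f "$_from" -t "$_to" "$1" > "$_tmp"; then
        sh "$_tmp"
        _status=$?
    else
        _status=1
    fi
    rm -f "$_tmp"    # clean up the converted copy
    return "$_status"
}

# Hypothetical usage, with install.sh kept in IBM-1047:
#   run_transcoded ./install.sh
```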

I think I already know the answer, but is there any good-practice way to make this work from a single script?

Corona688 · July 13, 2012, 1:00pm

Wow. I had no inkling that EBCDIC was actually required by anything anymore...

Thank you for the correction.