which I would like to separate into tokens like a shell does. This would be easy to do with eval, but that would open a security hole big enough to drop a cow through: injecting arbitrary code would be easy as pie. How can I parse this into tokens without using eval and potentially running embedded commands?
string='root=/dev/sda3 noacpi foo "Baz mumble"'
eval "set -- $string" ## the tokens are now in the positional parameters
printf "%s\n" "$@" ## display each parameter on a new line
If you put that into the string variable and execute the line I posted, nothing will happen other than the tokens being placed into the positional parameters. The code in $string will not be executed.
Note that the backticks you used in your example are executed before being fed to the perl script. Whatever the user can execute using backticks on your command line, he already had the privileges to execute directly, before he ever ran your script.
The security "hole" only exists if you elevate privileges in your script and then have a way to execute arbitrary code, no?
If you are still concerned, perl or awk can split arbitrary strings just like the shell does, entirely inside the interpreter, though this is not entirely trivial. You would just need to decide which expansions you want to support and which you don't.
Where exactly and precisely and unambiguously is "string" stored?
What is the context of "string" ...?
If it is something to do with unix or unix shell, what Operating System and version do you have, and what shell is involved?
Where did "string" come from?
What code was used to process "string"?
The original post is unbelievably vague coming from someone who is concerned about someone executing arbitrary code on a unix/Linux? system. Perhaps the post comes from a potential hacker, perhaps not? (I know otherwise).
We have no context. This might be a server open to the Internet, letting unsolicited users type whatever they like. If that is the case I would issue "shutdown -i0 -g0 -y" and crush the server.
On a more practical note. First process and validate any potential unix commands outside of shell.
They are strings being fed into the kernel commandline itself, and being processed by my initramfs system by a full-fledged BASH shell. It occurred to me that splitting at the shell level like this was both very powerful and perilous, so I wondered if there was a general solution to this whole class of problems.
The perl solution looks very nice. It wouldn't be hard to feed it backticks instead of processing them first the way I get the data from the kernel. Unfortunately perl is a bit weighty to cram into an initramfs bootstrap loader. But on second thought -- doesn't perl have backticks too?
I don't think my original post was "unbelievably vague". The problem is the same no matter what the ultimate purpose -- splitting arguments intelligently in a shell without permitting any expansions or substitutions. Whether or not the code is executing with elevated permissions, this isn't the sort of thing you want to allow just incidentally.
To process and evaluate the commands I must first divide them so I know what they would actually be doing; otherwise I'm just doing ad-hoc "injection rejection". I could write my own char-by-char shell parser inside the shell, I suppose, but that seems like overkill. I could also make an escape-everything regex to make the string safe before eval-ing it, but it's hard to prove there are absolutely no holes or omissions in a system like that. Or I could just strip out all dollar signs and backticks, but what if someday I need to pass a literal backtick for some reason?
I was hoping there was some obvious and more elegant way I was missing I suppose. Oh well, thanks for your responses.
It's taken a bit but I've thought of a better way to parse strings like this into name-value pairs:
var1="asdf" var2=qwerty var3="string with spaces" var4
Putting it through eval could execute untoward things, but xargs understands quotes too:
$ xargs printf "%s\n" <<EOF
var1="asdf" var2=qwerty var3="string with spaces" var4
EOF
var1=asdf
var2=qwerty
var3=string with spaces
var4
$
Exactly what I want actually -- something powerful enough to understand arguments in quotes, but dumb enough to not actually evaluate everything.
So in BASH I can do this:
STRING="VAR=\"VALUE\" VAR2 VAR3='asdf'"
while IFS="=" read -r KEY VALUE
do
    echo "Variable $KEY is value $VALUE"
done <<<"$(xargs printf '%s\n' <<<"$STRING")"
In other shells, I'd use a temp file:
STRING="VAR=\"VALUE\" VAR2 VAR3='asdf'"
echo "$STRING" | xargs printf "%s\n" > /tmp/$$
while IFS="=" read KEY VALUE
do
...
done < /tmp/$$
rm -f /tmp/$$
Sorry to be a pedant, or just plain thick! My question in post #10 still applies, but I'll rephrase it.
How did we arrive at the situation in post #1? i.e. what code, parameters or whatever produced or defined "root"? I can achieve the assignment with backslashes, but I just wondered whether it is a free-standing command, a line from a parameter file, or just (as I now suspect) a visual representation of what is in the environment variable, without any syntax intended.
I have had a similar problem when writing a script to search thousands of alien scripts written to no particular standard. It was important that the search process never executed arbitrary code.
A string saved in /boot/grub/grub.conf. Once the system boots, it gets read back out of the kernel via /proc/cmdline and processed by my bootloader, to decide which real root device should be used depending on the device ID string or the like.
In trying to find a safe way to process it I realized I'd happened upon a general class of problems that's difficult to tackle in shell -- processing quoting for strings which have somehow landed in the shell with real, actual quotes intact.
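Putting the pieces together, the whole pass over the command line can be sketched like this, using a stand-in string rather than reading /proc/cmdline, and a root= parameter as the example key:

```shell
# Stand-in for the contents of /proc/cmdline; in the initramfs this
# would be read from the real file instead.
cmdline='root=/dev/sda3 noacpi foo "Baz mumble"'

# Tokenise with xargs (quote-aware, but never evaluates anything),
# then pick out the value of root= with plain pattern matching.
root=
while IFS= read -r token; do
    case $token in
        root=*) root=${token#root=} ;;   # strip the key, keep the value
    esac
done <<EOF
$(printf '%s\n' "$cmdline" | xargs printf '%s\n')
EOF

echo "root device: $root"
```

At no point is any part of the string passed to eval, so backticks or $(...) embedded in the kernel command line stay inert text.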