which I would like to separate into tokens like a shell does. This would be easy to do with eval, but that would open a security hole big enough to drop a cow through: injecting arbitrary code would be easy as pie. How can I parse this into tokens without using eval and potentially running embedded commands?
string='root=/dev/sda3 noacpi foo "Baz mumble"'
eval "set -- $string" ## the tokens are now in the positional parameters
printf "%s\n" "$@" ## display each parameter on a new line
If you put that into the string variable and execute the line I posted, nothing will happen other than the tokens being placed into the positional parameters. The code in $string will not be executed.
Note that the backticks you used in your example are executed before being fed to the perl script. Whatever the user can execute using backticks on your command line, he already had the privileges to execute directly, before he ever ran your script.
The security "hole" only exists if you elevate privileges in your script and then have a way to execute arbitrary code, no?
If you are still concerned, perl or awk can split arbitrary strings just like the shell does, entirely inside the interpreter, though this is not entirely trivial. You would just need to decide which expansions you want to support and which you don't.
Where exactly and precisely and unambiguously is "string" stored?
What is the context of "string" ...?
If it is something to do with unix or unix shell, what Operating System and version do you have, and what shell is involved?
Where did "string" come from?
What code was used to process "string"?
The original post is unbelievably vague coming from someone who is concerned about someone executing arbitrary code on a unix/Linux? system. Perhaps the post comes from a potential hacker, perhaps not? (I know otherwise).
We have no context. This might be a server open to the Internet, letting unsolicited users type whatever they like. If that is the case I would issue "shutdown -i0 -g0 -y" and crush the server.
On a more practical note. First process and validate any potential unix commands outside of shell.
They are strings being fed into the kernel commandline itself, and being processed by my initramfs system by a full-fledged BASH shell. It occurred to me that splitting at the shell level like this was both very powerful and perilous, so I wondered if there was a general solution to this whole class of problems.
The perl solution looks very nice. It wouldn't be hard to feed it backticks instead of processing them first the way I get the data from the kernel. Unfortunately perl is a bit weighty to cram into an initramfs bootstrap loader. But on second thought -- doesn't perl have backticks too?
I don't think my original post was "unbelievably vague". The problem is the same no matter what the ultimate purpose -- splitting arguments intelligently in a shell without permitting any expansions or substitutions. Whether or not the code is executing with elevated permissions, this isn't the sort of thing you want to allow just incidentally.
To process and evaluate the commands I must first divide them so I know what they would actually be doing; otherwise I'm just doing ad-hoc "injection rejection". I could write my own char-by-char shell parser inside the shell, I suppose, but that seems like overkill. I could also make an escape-everything regex to make the string safe before eval-ing it, but it's hard to prove there are absolutely no holes or omissions in a system like that. Or I could just strip out all dollar signs and backticks, but what if someday I need to pass a literal backtick for some reason?
I was hoping there was some obvious and more elegant way I was missing I suppose. Oh well, thanks for your responses.
It's taken a bit but I've thought of a better way to parse strings like this into name-value pairs:
var1="asdf" var2=qwerty var3="string with spaces" var4
Putting it through eval could execute untoward things, but xargs understands quotes too:
$ xargs printf "%s\n" <<EOF
var1="asdf" var2=qwerty var3="string with spaces" var4
EOF
var1=asdf
var2=qwerty
var3=string with spaces
var4
$
Exactly what I want actually -- something powerful enough to understand arguments in quotes, but dumb enough to not actually evaluate everything.
So in BASH I can do this:
STRING="VAR=\"VALUE\" VAR2 VAR3='asdf'"
while IFS="=" read -r KEY VALUE
do
    echo "Variable $KEY is value $VALUE"
done <<<"$(xargs printf '%s\n' <<<"$STRING")"
In other shells, I'd use a temp file:
STRING="VAR=\"VALUE\" VAR2 VAR3='asdf'"
echo "$STRING" | xargs printf "%s\n" > /tmp/$$
while IFS="=" read KEY VALUE
do
...
done < /tmp/$$
rm -f /tmp/$$
Sorry to be a pedant, or just plain thick! My question in post #10 still applies, but I'll rephrase it.
How did we arrive at the situation in post #1? i.e. what code, parameters or whatever produced or defined "root"? I can achieve the assignment with backslashes, but I just wondered whether it is a free-standing command, a line from a parameter file, or just (as I now suspect) a visual representation of what is in the environment variable, without any syntax intended.
I have had a similar problem when writing a script to search thousands of alien scripts written to no particular standard. It was important that the search process never executed arbitrary code.
A string saved in /boot/grub/grub.conf. Once the system boots, it gets read back out of the kernel via /proc/cmdline and processed by my bootloader, to decide which real root device should be used depending on the device ID string or the like.
In trying to find a safe way to process it I realized I'd happened upon a general class of problems that's difficult to tackle in shell -- processing quoting for strings which have somehow landed in the shell with real, actual quotes intact.
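Putting the pieces together, the whole pass over the command line can be sketched like this, using a stand-in string rather than reading /proc/cmdline, and a root= parameter as the example key:

```shell
# Stand-in for the contents of /proc/cmdline; in the initramfs this
# would be read from the real file instead.
cmdline='root=/dev/sda3 noacpi foo "Baz mumble"'

# Tokenise with xargs (quote-aware, but never evaluates anything),
# then pick out the value of root= with plain pattern matching.
root=
while IFS= read -r token; do
    case $token in
        root=*) root=${token#root=} ;;   # strip the key, keep the value
    esac
done <<EOF
$(printf '%s\n' "$cmdline" | xargs printf '%s\n')
EOF

echo "root device: $root"
```

At no point is any part of the string passed to eval, so backticks or $(...) embedded in the kernel command line stay inert text.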