The "read" command misinterprets file names containing spaces

The "read" command, which is built into bash, takes words from the standard input. However, "read" is not good at taking file names if the file names contain spaces. I would like my bash script to ask the user to enter file names, which may contain spaces. Can you think about any technique for this situation?

Suppose that the current directory has the following two files. Each file name contains a space.

Summer View.txt
Winter View.txt

Upon receiving these two file names, the "read" command misinterprets them as if there were four files.

read -p "Enter files: "
for i in $REPLY ; do
  echo $i
done

-----

Summer\ View.txt Winter\ View.txt

When the above line was passed to "read",
the following output was returned.

Summer
View.txt
Winter
View.txt

-----

"Summer View.txt" "Winter View.txt"

When the above line was passed to "read",
the following output was returned.

"Summer
View.txt"
"Winter
View.txt"

I also tried single quotes, which ended up with the same erroneous output as double quotes.

-----

Is there any solution that will correctly interpret the above kinds of inputs as two files?

The delimiter can be changed from space to other characters by setting $IFS. However, I would rather stick with space as the delimiter, because all characters, including all symbols, seem to be legal in file names or path names on modern GUI OS's.

If these file names are arguments to a script, then they would be correctly interpreted as two files. Consider the following script whose name is MyScript.

for arg ; do
  echo $arg
done

MyScript Summer\ View.txt Winter\ View.txt
and
MyScript "Summer View.txt" "Winter View.txt"

correctly output the following two file names.

Summer View.txt
Winter View.txt

However, I need user interaction. I want the user to pass file names in the middle of a script. Please show me a solution that correctly interprets file names when the user passes file names containing spaces upon the prompt by the script.

Thanks a lot, in advance.

Update:

To reflect the feedback from posts #2-#6, let me add the following.

I am not interested in any suggestions that attempt to change the delimiter or IFS, even if the change is temporary.

The real problem here is that the "read" command does not understand quotes and backslashes as the shell does. What the "read" command has obtained needs to be re-parsed by correctly handling quotes and backslashes like the shell. Can you think about any technique for such re-parsing?

Perhaps, is there any other command that reads the input from the user and that parses quotes and backslashes as the shell does?

Thanks a lot, in advance.

Try this... Enter the file names separated by , (comma)

#!/bin/bash

read -p "Enter Files: " line

OLD_IFS=$IFS
IFS=,

for i in $line
do
        echo $i
done

IFS=$OLD_IFS
 
ls -l
-rw-r--r-- 1 root root    7 2012-01-07 12:09 a b
-rw-r--r-- 1 root root    8 2012-01-07 12:11 c d

root@bt > ./run
Enter Files : a b,c d
a b
c d

HTH
--ahamed

As I wrote in the original post, file names may contain any symbols including commas. So, changing the delimiter to comma by $IFS would create a new problem, and thus would not work.

I'm afraid you didn't write that. What would help is for you to tell us which character is never going to be present in the file names. Otherwise, there would be no reliable way to parse the user's input.
Assuming your file names do not contain a newline, which is a reasonable expectation, this should work:

IFS=$'\n'
typeset -a a
read -p "Enter first file name (an empty string will end input): "
while true
do
  as=${#a[@]}
  [[ -z "$REPLY" ]] && break
  a[$as]="$REPLY"
  read -p "Enter next file name : "
done
for i in "${a[@]}" ; do
  echo "$i"
done

Even a LF (\n, 0x0A) can be in file names on GUI OS's. Though it may be silly to put LF's in file names, it can accidentally happen when one copies from text and pastes it to a file name.

I thought about setting the delimiter to a non-printable character that is really unusual, such as ETX (0x03), ACK (0x06). However, it would be difficult for the user to input such non-printable characters at the read's prompt. I would like to let the user enter file names as he/she would enter on command line.

I discourage any attempts that resort to IFS.

The real problem is that the "read" command does not understand quotes and backslashes as the shell does. What the "read" command has obtained needs to be re-parsed by correctly handling quotes and backslashes like the shell. Can you think about any technique for such re-parsing?

Isolate and scrub your proc's $@ variable. If there's a good chance that someone might paste in a bum set of chars, you'd want to exclude the chance of it breaking your flow.

If you'd assign the inbound $@ to its own variable (such as my_var="${@}"), you can then scrub this new variable for bad chars before you'd try to parse it further.

Use bash to read the input into an array:

read -p "Enter files: " -a vals
 
for ((i=0; i < ${#vals[@]}; i++)); do
  echo "$i: ${vals[i]}"
done

On entry, escape spaces with backslash:

$ ./getfiles.sh
Enter files: Summer\ View.txt Winter\ View.txt
0: Summer View.txt
1: Winter View.txt
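
The backslash form works because "read", when run without -r, treats a backslash as an escape for the next character, so an escaped space is not a delimiter. Here is a minimal sketch of the difference, feeding the sample line from a here-string instead of the keyboard. The array names cooked and raw are only illustrative.

```bash
#!/bin/bash
# Sample input, exactly as the user would type it at the prompt.
line='Summer\ View.txt Winter\ View.txt'

# Without -r, read honours backslash escapes: "\ " becomes a literal space.
read -a cooked <<< "$line"

# With -r, backslashes are kept as-is and every space splits a word.
read -r -a raw <<< "$line"

printf 'without -r: %d words\n' "${#cooked[@]}"
printf '  <%s>\n' "${cooked[@]}"
printf 'with -r:    %d words\n' "${#raw[@]}"
printf '  <%s>\n' "${raw[@]}"
```

Quotes are not interpreted in either case, so this only covers the backslash style of input.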

Alternative approach. Ask the user to enter the filenames one by one.
This will work for filenames with or without space characters.
In this script, entering a blank filename means end-of-list and we "break" out of the "while true" loop.
Notice that we always have double quotes round a filename if it might contain space characters.

while true
do
        read -p "Enter filename: " filename
        if [ "${filename}""X" = "X" ]
        then
                break
        fi
        if [ -f "${filename}" ]
        then
               ls -lad "${filename}"
        else
               echo "File does not exist: ${filename}"
        fi
done

Thank you, Chubler_XL. It is great progress that backslashes now work as I wanted. However, I still have further to go. I would like the wildcard asterisks (*) to work. I also want quotes to work.

In other words, I would like my script to take file names (or pathnames) from the user like arguments to the "rm" command, except that my script does not take options. I would like my script to interpret backslashes, quotes, wildcard asterisks (*) and delimiting spaces in the same manner as arguments to the "rm" command.

The only real option I can think of here (short of implementing all these expansions yourself) is the use of the eval command.

Problem is this opens you up to command injection. For example the filename could be $(chmod 777 /home/lessnux/.profile)

#!/bin/bash
process_files()
{
 while [ $# -gt 0 ]
 do
    echo "filename: " $1
    shift
 done
}
 
read -rp "Enter files: " files
eval process_files $files

Enter files: "Summer View.txt" Winter\ View.txt /etc/pro*
filename:  Summer View.txt
filename:  Winter View.txt
filename:  /etc/profile
filename:  /etc/protocols
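
The injection risk mentioned above can be shown without doing any damage. The following is only a sketch: the helper is modelled on process_files from the script, the "hostile" input is a made-up string, and a harmless echo stands in for a real command.

```bash
#!/bin/bash
# Helper modelled on process_files above: print one argument per line.
process_files() {
  while [ $# -gt 0 ]; do
    echo "filename: $1"
    shift
  done
}

# Hypothetical hostile input; a harmless echo stands in for a real command.
files='$(echo INJECTED)'

# eval re-parses the text, so the command substitution is executed.
eval_output=$(eval process_files $files)
echo "with eval:"
echo "$eval_output"

# Without eval the same text is only split into words, never executed.
plain_output=$(process_files $files)
echo "without eval:"
echo "$plain_output"
```

With eval, the text between $( and ) runs as a command before process_files ever sees it, which is why anything typed at the prompt has to be trusted completely.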

Thank you, Chubler_XL, for a further improvement. However, the code does not handle wildcard asterisks (*) very well, if the file intended to be matched contains a space.

Suppose that the current directory has the following four files. Also suppose that the name of the above code supplied by Chubler_XL is "readtest.sh" and located at the parent directory of the current one.

Summer View.txt
Winter View.txt
scene 1.sh
scene 2.sh

Because it is dangerous to experiment with "rm", let us use "ls" instead. I would like to allow the user to input file names in the same manner as arguments to the "ls" command, except that my script does not take options.

First, I gave the following line to "ls" and "readtest.sh".

"Summer View.txt" Winter\ View.txt scene*.sh

The results were a success with "ls" and a failure with "readtest.sh". The "readtest.sh" script interpreted "scene" and "1.sh" as two separate files.

$ ls -1 "Summer View.txt" Winter\ View.txt scene*.sh
Summer View.txt
Winter View.txt
scene 1.sh
scene 2.sh
$ 
$ 
$ ../readtest.sh
Enter files: "Summer View.txt" Winter\ View.txt scene*.sh
filename:  Summer View.txt
filename:  Winter View.txt
filename:  scene
filename:  1.sh
filename:  scene
filename:  2.sh

Second, I gave the following line to "ls" and "readtest.sh".

"Summer View.txt" Winter\ View.txt "scene*.sh"

Both "ls" and "readtest.sh" failed. The "readtest.sh" script treated "scene 1.sh scene 2.sh" as a single file with a long name.

$ ls -1 "Summer View.txt" Winter\ View.txt "scene*.sh"
ls: scene*.sh: No such file or directory
Summer View.txt
Winter View.txt
$ 
$ 
$ ../readtest.sh
Enter files: "Summer View.txt" Winter\ View.txt "scene*.sh"
filename:  Summer View.txt
filename:  Winter View.txt
filename:  scene 1.sh scene 2.sh

If "ls" correctly understands scene*.sh without quotes for "scene 1.sh" and "scene 2.sh", then I would like the script to understand it in the same manner as "ls".

If "ls" does not understand "scene*.sh" with quotes, then it will not matter to me whether the script understands it or not.

Many thanks, in advance.

It's got nothing to do with ls. ls doesn't know what * means.

The shell understands that, if you're doing globbing, you want literal unmangled filenames, so * gets you literal unmangled filenames. The shell also understands that "thing with spaces" in quotes means a literal string and doesn't split it.

If your actual string contains quotes, that's an entirely different thing than a quoted string, though. To get that string in shell you'd have to do "\"thing in quotes\"". The shell doesn't unwrap the second layer of quotes because it thinks you want them, and won't do that kind of double think unless you tell it to -- which is a good thing, because if it did process everything it found in a variable, someone could type `sudo rm -Rf /` into your input and wipe your system.

That sort of shenanigans is why I wouldn't recommend using eval to force the shell to evaluate the quotes, either. It'd technically work but would be a frightening security hole.
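
The difference between quotes typed as shell syntax and quotes stored inside a variable can be seen in a small sketch. It assumes a scratch directory made with mktemp, and the variable names are only illustrative.

```bash
#!/bin/bash
# Work in a scratch directory so the demo does not touch real files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch 'Summer View.txt' 'scene 1.sh' 'scene 2.sh'

# Quotes written in the script itself are syntax: the shell removes them,
# and the unquoted glob yields whole file names, spaces included.
set -- "Summer View.txt" scene*.sh
typed_count=$#

# Quotes that arrive inside a variable are ordinary characters. The unquoted
# expansion is split on spaces and globbed, but the quotes stay in the words.
line='"Summer View.txt" scene*.sh'
set -- $line
stored_count=$#
stored_first=$1

echo "typed as shell syntax: $typed_count words"
echo "stored in a variable:  $stored_count words, first is <$stored_first>"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```

In the second case the glob still produces unsplit names, because globbing happens after word splitting, while the quoted name falls apart at its space.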

xargs also processes quotes, though! I've spent more time fighting that feature than using it, forcing xargs to use raw filenames containing spaces or quotes, but it might actually be useful here...

# The line after the command is typed input, ended with ^D; the two lines after ^D are what xargs prints
$ xargs printf "%s\n"
"file with spaces" file_without_spaces
^D
file with spaces
file_without_spaces
$

So:

printf "Enter list of files: "
read LINE

echo "$LINE" | xargs printf "%s\n" > /tmp/$$
while read FILENAME
do
        echo "Got filename $FILENAME"
done < /tmp/$$
rm -f /tmp/$$

As you can tell, LessNux, we are very concerned with the security implications of using eval against user-entered data. There are a few limited situations where using eval could be OK, but in general it's pretty unsafe.

If you really need to do wildcard filename expansion/matching on user data, perhaps try something in perl, e.g.:

#!/usr/bin/perl
use strict;
use warnings;
my @files = glob("Winter\\ View.txt \"Summer View.txt\" scene*");
foreach my $file (@files) {
        print "Filename: $file\n";
}
exit 0;

My BASH script was supposed to call a command that takes file names (or pathnames). Suppose that the name of the command is TweakFile, and that its syntax is
TweakFile FILE...

After the perl function glob() parsed and expanded file names, how can I pass it to TweakFile on BASH?

Many thanks, in advance.

---------- Post updated at 10:27 AM ---------- Previous update was at 10:19 AM ----------

Does xargs parse and expand file names without execution?

Many thanks, in advance.

As in, does it execute things in backticks?

No, it doesn't, it should be much safer than eval.
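
Note that xargs only splits the line into words, honouring quotes and backslashes. It does not expand wildcards, so a pattern such as scene*.sh comes through literally. Under that limitation, the xargs idea could be sketched as a function that fills a bash array, which can then be passed to TweakFile. The function name parse_with_xargs and the array name files are illustrative, and mapfile needs bash 4 or later.

```bash
#!/bin/bash
# Split one line the way xargs does (quotes and backslashes are honoured,
# nothing is executed or glob-expanded) and store the words in the bash
# array "files", one word per element. Assumes bash 4+ for mapfile.
parse_with_xargs() {
  files=()
  # Nothing but whitespace typed: leave the array empty.
  [[ $1 =~ [^[:space:]] ]] || return 0
  mapfile -t files < <(printf '%s\n' "$1" | xargs printf '%s\n')
}

# Interactive use would look roughly like this:
#   read -r -p "Enter list of files: " line
#   parse_with_xargs "$line"
#   TweakFile "${files[@]}"

# Non-interactive demonstration with a fixed line.
parse_with_xargs '"Summer View.txt" Winter\ View.txt scene*.sh'
printf 'Got filename: %s\n' "${files[@]}"
```

An unmatched quote in the input, such as a lone apostrophe, makes xargs report an error instead of producing words, so the script should be prepared for an empty result.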