Dealing with white spaces in bash scripts

I'm trying to search for all files in a directory with a particular GID and then change the GID to match the UID of each file:

#!/bin/sh

for i in $(find /dump -gid 200 | sed 's/\ /\\\ /g' | sed 's/\&/\\\&/g'); do
  chgrp $(ls -ln ${i} | awk '{print $3}') ${i}
done

I'm using sed to deal with spaces and special characters.

I get clean output from the find command when run on its own; I also get the desired result when I run chgrp manually, substituting a line from the output of find for each instance of the variable ${i}.

But when I run the script, I get many errors and not all of the files/directories have their group changed as desired.

Here's an excerpt of the errors I'm seeing:

chgrp: missing operand after `/dump/aaa36/.evolution/memos/config'
Try `chgrp --help' for more information.
chgrp: missing operand after `/dump/aaa36/.evolution/calendar/config'
Try `chgrp --help' for more information.
chgrp: missing operand after `/dump/aaa36/.evolution/tasks/config'
Try `chgrp --help' for more information.
chgrp: missing operand after `/dump/aaa36/.evolution/cache'
Try `chgrp --help' for more information.
ls: cannot access /dump/aaa36/untitled\: No such file or directory
chgrp: missing operand after `/dump/aaa36/untitled\\'
Try `chgrp --help' for more information.
ls: cannot access folder: No such file or directory
chgrp: missing operand after `folder'
Try `chgrp --help' for more information.
ls: cannot access /dump/aaa36/untitled\: No such file or directory
chgrp: missing operand after `/dump/aaa36/untitled\\'
Try `chgrp --help' for more information.
ls: cannot access folder/neutron_EDM.pdf: No such file or directory
chgrp: missing operand after `folder/neutron_EDM.pdf'
Try `chgrp --help' for more information.

Please tell me what I'm doing wrong?! Thanks :)

OK, solved my own problem...

It's because the unquoted command substitution in the for loop undergoes word splitting, so spaces in filenames act as field separators. I found a neat way to get around this:

#!/bin/sh

SAVEIFS=$IFS
IFS=$(echo -en "\n\b")

for i in $(find /dump -gid 200 | sed 's/\s\+/\\ /g' | sed 's/\&\+/\\\&/g'); do
  chgrp $(ls -ln ${i} | awk '{print $3}') ${i}

done

IFS=$SAVEIFS

Thanks to: nixCraft (BASH Shell: For Loop File Names With Spaces)
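
To see the splitting that was biting the original loop, here's a minimal sketch (the /tmp filename is made up, not from the real /dump tree):

touch '/tmp/demo file'

# With the default IFS, the unquoted $(find ...) is split on spaces as well as
# newlines, so the loop sees two words: "/tmp/demo" and "file".
for i in $(find /tmp -name 'demo file'); do
  echo "word: [$i]"
done

# With IFS set to just a newline, the name comes through in one piece.
IFS='
'
for i in $(find /tmp -name 'demo file'); do
  echo "word: [$i]"
done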

Another way:

find /dump -gid 200 |
while read i; do
  echo Do something with "$i"
done 

Additional note: it is advisable to put double quotes around variable references.
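
As a quick illustration of why the quotes matter (hypothetical filename):

i='my file.txt'
ls -ln $i      # the unquoted $i splits into two arguments: "my" and "file.txt"
ls -ln "$i"    # quoted: a single argument, "my file.txt"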

Wow nice, thanks Scrutinizer...

Your solution certainly does away with having to fix white spaces, which is neater.

In fact my previous solution failed on directories, because ls -ln on a directory lists its contents (starting with a "total 0" line) rather than the directory itself, and you cannot chgrp that!
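
For anyone else hitting the same thing, the difference is the -d flag (directory name made up):

ls -ln /some/dir     # lists the directory's CONTENTS, starting with a "total" line
ls -lnd /some/dir    # lists the directory entry itself, which is what chgrp needs here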

And due to the nature of the find command, I had to split it into three parts. It's probably not sensible to mess about with sed anyway, as it wouldn't account for all special characters:

#!/bin/sh

SAVEIFS=$IFS
IFS=$(echo -en "\n\b")

for i in $(find /dump -type f -gid 200); do
  chgrp `ls -ln "${i}" | awk '{print $3}'` "${i}"
done

for i in $(find /dump -type d -gid 200); do
  chgrp `ls -lnd "${i}" | awk '{print $3}'` "${i}"
done

for i in $(find /dump -type l -gid 200); do
  chgrp -h `ls -lnd "${i}" | awk '{print $3}'` "${i}"
done

IFS=$SAVEIFS

Yours looks like:

#!/bin/sh

find /dump -type f -gid 200 |
while read i; do
  chgrp `ls -ln "${i}" | awk '{print $3}'` "${i}"
done

find /dump -type d -gid 200 |
while read i; do
  chgrp `ls -lnd "${i}" | awk '{print $3}'` "${i}"
done

find /dump -type l -gid 200 |
while read i; do
  chgrp -h `ls -lnd "${i}" | awk '{print $3}'` "${i}"
done

There's no reason to resort to three different traversals of /dump. Regular files, directories, and softlinks can be visited and modified simultaneously:

find /dump -gid 200 \( -type f -o -type d -o -type l \) |
while read i; do
  chgrp -h `ls -lnd "${i}" | awk '{print $3}'` "${i}"
done

If you only need to support GNU tools (I'm making the assumption that you're using GNU find), a simpler, more efficient solution presents itself:

find /dump -gid 200 \( -type f -o -type d -o -type l \) -printf '%U:%p\n' |
while IFS=: read -r uid fname; do
  chgrp -h "$uid" "$fname"
done
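
For illustration, with a made-up UID, the -printf format emits one colon-separated record per file:

# find ... -printf '%U:%p\n' produces lines like:
#   1000:/dump/aaa36/untitled folder/neutron_EDM.pdf
# IFS=: read -r uid fname then splits at the first colon:
#   uid=1000
#   fname=/dump/aaa36/untitled folder/neutron_EDM.pdf
# Any later colons stay in fname, because read leaves the remainder of the
# line in the last variable.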

Regards,
Alister

Thanks alister! I'm learning something every day :)

So, you're doing away with awk by using the -printf option to format the output of find, then using read to set the variables that can be used by chgrp. Very nice.

But can you believe people actually have file/directory names that are Windows paths?! And URLs, and even names with line breaks built in! Special characters and spaces galore... amazing!

In the end, I had to set/reset the IFS variable and use multiple sed substitutions to bring most of them into line. But not all! I'm bored of it now, so I'll send the few offending names to their respective users to fix themselves.

#!/bin/sh

SAVEIFS=$IFS
IFS=$(echo -en "\n\b")

find /home -gid 200 \( -type f -o -type d -o -type l \) | sed -e 's/\\/\\\\/g;s/:/\\:/g;s/ /\\ /g;s/\n//g' |
while read i; do
  chgrp -vh `ls -lnd "${i}" | awk '{print $3}'` "${i}"
#  ls -lnd "${i}"
done

IFS=$SAVEIFS

Messy ;)

Everything that you're doing with sed is utterly pointless, and the IFS gymnastics' only effect is to preserve leading and trailing spaces and tabs in filenames. If you remove the entire sed pipeline and add -r to the read, the result is identical.

IFS does not affect data flowing through a pipe, so it will not have any effect on what's sent between find and sed. It can affect the result of the read command, but only when there are multiple variables being read into or when there is leading or trailing IFS whitespace. In this case, you are not using multiple variables and there is no leading IFS whitespace, because \b is not whitespace and \n cannot possibly be seen during read's field splitting step, since it's used as the delimiter. Since "${i}" is quoted, the resulting filename will not undergo field splitting, so, again, the value of IFS is irrelevant. Finally, the command substitution (ls | awk), which is unquoted, does undergo field splitting, but since the result of that pipeline is always a series of digits, the value of IFS (\n\b) will not alter the result.
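
A quick sketch of that read behaviour (invented input):

# With the default IFS, read strips leading and trailing blanks even when
# reading into a single variable:
printf '  padded name  \n' | while read -r i; do printf '[%s]\n' "$i"; done
# prints: [padded name]

# With IFS emptied for the read, they are preserved:
printf '  padded name  \n' | while IFS= read -r i; do printf '[%s]\n' "$i"; done
# prints: [  padded name  ]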

With regard to sed, every single backslash inserted by sed will be immediately removed by read. With -r, you can instruct read not to treat backslashes specially.
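
For example (made-up input):

printf 'back\\slash name\n' | while read i; do printf '%s\n' "$i"; done
# prints: backslash name    (read consumed the backslash)
printf 'back\\slash name\n' | while read -r i; do printf '%s\n' "$i"; done
# prints: back\slash name   (-r leaves it alone)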

sed's s/\n//g will never, ever match. sed strips the newline as part of reading the line (replacing it upon output). The only way that there will ever be a newline in the text that sed's working with is if you insert it or use one of the sed commands which append (neither of which occurs here).
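
A quick test with GNU sed shows it:

printf 'one\ntwo\n' | sed 's/\n/X/g'
# output is unchanged:
#   one
#   two
# each line reaches sed's pattern space with its newline already removed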

The code I suggested in my previous post can handle any filename so long as it does not contain a newline. Spaces? No problem. Tabs? No problem. Backslashes? No problem. Colons? No problem. But newlines, nope. Why not? Because the find | read pipeline consumes newlines as delimiters.

Should you need to also handle newlines in filenames, with GNU tools, the following uses the null byte as delimiter (which is an illegal character in both UNIX and Windows pathnames), so it can handle anything:

find /dump -gid 200 \( -type f -o -type d -o -type l \) -printf '%U\0%p\0' | xargs -0n2 chgrp -vh
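
To sketch what flows through that pipe:

# find emits alternating NUL-terminated fields: UID, pathname, UID, pathname, ...
# xargs -0n2 consumes them two at a time, so each invocation is effectively
#   chgrp -vh <uid> <pathname>
# and because NUL is the only separator, spaces, newlines and backslashes in
# the pathname pass through untouched.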

Very profound, thanks for the detailed explanation. I'll need to digest this with some frantic Googling for further reading; I think I need a more rigorous understanding of what I'm doing!

Your shell man page should document the steps taken to parse a command, although it may be dense and terse. Experimentation usually helps fill in the blanks. If not, you can always ask us.

Should it be of interest, here's a portable solution that should handle any filename:

find /dump -group 200 \( -type f -o -type d -o -type l \) -exec sh -c '
    for f; do
        chgrp -vh $(ls -lnd "$f" | awk "NR==1 {print \$3; exit}") "$f"
    done
' sh {} +

Most importantly, note that awk has been restricted to only the first line of output. If there is indeed a possibility of a filename containing newlines, then the UID is only in the third field of the first line; the third field of subsequent lines will be some part of the multiline filename.
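
A rough, hypothetical illustration of what that guards against (ls output abbreviated, UID invented):

# For a file named "bad<newline>name" owned by UID 1000, ls -lnd writes the
# name across two lines when its output goes to a pipe:
#   -rw-r--r-- 1 1000 200 0 Jan  1 00:00 bad
#   name
# On the first line, $3 is the UID; on later lines, $3 is just a chunk of the
# filename (or nothing at all), so NR==1 keeps awk from printing garbage.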

Also, I changed -gid to -group, which is more portable. However, it works slightly differently: it first tries to look up a group name, and only if that name doesn't exist and the argument is numeric will it look it up as a group ID.

Regards,
Alister