Spaces in filenames screwing things up...

Here's the code... obviously

#!/bin/bash

SRC=$1  

function main {

        validate 
        run

}

function validate {
        if [[ -z $SRC ]]; then
                printf "You need to supply a source.\n";
                exit 1;
        elif [[ -f $SRC ]]; then
                printf "This is a file! You need to specify a directory.\n";
                exit 1;
        fi
}

function run {

        FILES=`find $SRC -type f |sed 's_ _\ _g' |  head`

        for FILE in $FILES
        do
                printf $FILE"\n";
        done
}

main

Here's the output

/media/Data_Bucket/Audio/(val)Liam/Vampire_Sunrise_(Disc_1)/04_-_As_Is.mp3_CBR.mp3
/media/Data_Bucket/Audio/)eib(/BC
Recordings
(BCRUK001)/Dogfight.mp3
/media/Data_Bucket/Audio/)eib(/Digital
Nation/08
Infinity.MP3_CBR.mp3
/media/Data_Bucket/Audio/)eib(/mix
1/05
Planet
Dust.mp3
/media/Data_Bucket/Audio/)eib(/Unknown
Album/Bounce.mp3
/media/Data_Bucket/Audio/)eib(/Unknown
Album/Dustball
(Dubplate).mp3
/media/Data_Bucket/Audio/)eib(/Unknown
Album/eib
./sync-indus.bash: line 28: printf: -\: invalid option
printf: usage: printf [-v var] format [arguments]
running
man.mp3
/media/Data_Bucket/Audio/)eib(/Unknown
Album/eib
./sync-indus.bash: line 28: printf: -\: invalid option
printf: usage: printf [-v var] format [arguments]
Speedball.mp3
/media/Data_Bucket/Audio/)eib(/Unknown
Album/eib
./sync-indus.bash: line 28: printf: -\: invalid option
printf: usage: printf [-v var] format [arguments]
start
the
fire.mp3
/media/Data_Bucket/Audio/)eib(/Unknown
Album/Orient
Express.mp3

As you can see, the file names break up into separate lines due to spaces and special characters. I tried to use sed on line 24 to replace the spaces with escaped spaces, but that didn't work.

Kinda clueless at this point.

Try:

function run {
        find $SRC -type f |sed 's/ /_/g'
}

See Useless Use of Backticks. Never ever use variables/backticks for open-ended lists. It's generally pointless, likely to truncate your data, and splits where it pleases instead of where you expect.

Since you're putting it in a loop anyway, you might as well save yourself the trouble and make it more direct.

The usual way to do this is

find ... | while read LINE
do
...
done

If you need to use the same results more than once, you could save find's output into a temporary file and read from that.

while read LINE
do
...
done < /tmp/filename
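
As a minimal sketch of the difference (the directory and file names here are made up for illustration), the following compares a for loop over an unquoted variable with a while read loop fed from a temporary file. The IFS= and -r on read are additions that the posts above don't use: they stop read from trimming leading or trailing whitespace and from interpreting backslashes.

```shell
#!/bin/bash
# Hypothetical test data: two files, one with a space in its name.
demo_dir=$(mktemp -d)
list=$(mktemp)
touch "$demo_dir/plain.mp3" "$demo_dir/Planet Dust.mp3"

# Variable approach: the unquoted expansion is split on every space.
FILES=$(find "$demo_dir" -type f)
count_for=0
for f in $FILES; do
    count_for=$((count_for + 1))
done

# File approach: one loop iteration per line of find's output.
find "$demo_dir" -type f > "$list"
count_while=0
while IFS= read -r f; do
    count_while=$((count_while + 1))
done < "$list"

printf 'for loop saw %d items, while loop saw %d\n' "$count_for" "$count_while"

# Clean up the demo data.
rm -rf "$demo_dir"
rm -f "$list"
```

Assuming the temporary directory path itself contains no spaces, the for loop should count three items for the two files, while the while loop counts two. Reading from a file rather than a pipe also keeps the counter in the current shell, since a loop on the right-hand side of a pipe normally runs in a subshell.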

That wouldn't work for me. I'm planning on processing the files in the script. So, I need to keep the name as the original.

---------- Post updated at 03:31 PM ---------- Previous update was at 03:14 PM ----------

So, would LINE be the "per entry" variable that I process? Like an i in:

for i in $LIST

I was never good at loops.

Would this be more "memory efficient"? Since it's in a file, that would be one less thing for it to stuff into memory. The targets can generate a very large list and the "server" doesn't really have that much RAM: 256MB.

Yes, exactly, except you don't need to have the entire $LIST in a variable like that (a bad idea for the reasons given above).

It'd be far more efficient than cramming it into a variable. Depending on your shell, it's quite questionable whether you can cram 256 megs of text into a variable and expect it to even work no matter how much RAM you have.


I'm one of those "BASH Punks". 256MB is how much memory I have in the "server". It's an old POS that I use to process things on instead of my workstation.

Yeah, I guess I'll go this route with my script then...

You can do a lot with 256 megs.

Either use mktemp to generate a name for the temporary file, or use files like /tmp/$$-appname . This will allow several instances of the script to run without stomping over each other's temp files.
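
A rough sketch of the mktemp route might look like the following. The trap line is an addition not mentioned above; it is one common way to make sure the temporary file is removed when the script exits. The two entries written to the file are placeholders standing in for find's output.

```shell
#!/bin/bash
# Create a uniquely named temporary file; give up if that fails.
TMP=$(mktemp) || exit 1

# Remove the temporary file when the script exits, whatever the reason.
trap 'rm -f "$TMP"' EXIT

# Placeholder for the real work: store a list, then read it back.
printf '%s\n' "first entry" "second entry" > "$TMP"

lines=0
while IFS= read -r LINE; do
    lines=$((lines + 1))
done < "$TMP"

printf 'read %d lines from a temp file\n' "$lines"
```

Because every run gets its own file name from mktemp, two copies of the script running at once should not overwrite each other's lists.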

Well, I got other stuff going on too. It's also my media server, NFS server, Bittorrent seedbox, and a backup server to copy stuff off of my Dreamhost servers. "That's CRON. He fights for the Sys Ops."

It can do all that plus some reasonably hefty database stuff. Just don't ask it to do Wordpress.

Let's try putting double quotes round string variables and avoiding for, because it cannot deal with lists containing space characters. Also remove the many extraneous semicolons and make sure that we run Test [ ] rather than a Conditional Expression [[ ]]. Test positively for a directory rather than assuming (wrongly) that everything which is not a file is a directory.
Not sure what the sed is for, so I left it out.

#!/bin/bash

SRC="${1}"   # Start Directory  

function main {

        validate 
        run

}

function validate {
        # Is parameter missing?
        if [ -z "${SRC}" ]
        then
                printf "You need to supply a source.\n"
                exit 1
        fi
        # Is parameter a directory?
        if [ ! -d "${SRC}" ]
        then
                printf "This is not a directory! You need to specify a directory.\n"
                exit 1
        fi
}

function run {

        find "${SRC}" -type f | while IFS= read -r filename
        do
                printf "${filename}\n"
        done
}

main

Ps. I must have missed the bit in post #1 which says what the script is meant to do!

Also, change printf to avoid issues with filenames starting with dash:

       printf "%s\n" "${filename}"

Well, eventually the script is going to go through my music library, $SRC, and analyze each file (they are all MP3s). All the files that contain ID3v2 tags with a specific genre will be copied into another directory in preparation for an rsync to a remote server. The new directory structure is going to be created according to the tags: Library => Artist => Album => Track. (I'll probably end up putting up another post on proper ASCII-safe filename conversion later. But that's off-topic.)

---------- Post updated at 09:33 PM ---------- Previous update was at 09:32 PM ----------

None of my filenames start with a dash; I know that for certain. If anything, the dash will likely be somewhere in the middle of the filename.

You may as well use printf correctly anyway. Aside from possible issues with a leading dash with GNU coreutils and bash printf implementations, there would be problems if anything in the filename looks like a format specifier or escape sequence.

Regards,
Alister


Forgot to mention that you should use double quotes when calling the script if your parameter contains space characters:

./scriptname "directory name"
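
To see why the quotes matter, here is a small sketch using a hypothetical helper, count_args, that just reports how many arguments it was given. The directory name is made up for illustration.

```shell
#!/bin/bash
# count_args (hypothetical helper) prints how many arguments it received.
count_args() {
    printf '%d\n' "$#"
}

dir='BC Recordings'           # example directory name containing a space

unquoted=$(count_args $dir)   # word splitting turns the name into two arguments
quoted=$(count_args "$dir")   # quoting keeps the name as a single argument

printf 'unquoted: %s, quoted: %s\n' "$unquoted" "$quoted"
```

The same splitting happens on the command line, which is why an unquoted directory name with a space reaches the script as $1 and $2 instead of just $1.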

Could you possibly educate me on how then? Feel free to PM me so we don't go off topic.

---------- Post updated at 02:34 AM ---------- Previous update was at 02:31 AM ----------

I believe in coding that type of flaw out. Hence the post.

---------- Post updated at 05:58 AM ---------- Previous update was at 02:34 AM ----------

My updated script... still having some issues

#!/bin/bash

SRC=$1
TMP=`mktemp`

function main {

	validate	
	run

}

function validate {
	if [[ -z $SRC ]]; then
		printf "You need to supply a source.\n";
		exit 1;
	elif [[ -f $SRC ]]; then
		printf "This is a file! You need to specify a directory.\n";
		exit 1;
	fi
}

function run {

	printf "Please wait...\n";
	find $SRC -type f -iname "*.mp3" > $TMP
	
	while read FILE
	do
		analyze_mp3
	done < $TMP

}

function analyze_mp3 {

	GENRE=`id3v2 -l "$FILE" | grep TCON | awk -F: '{printf $2"\n"}' | sed -e 's_^[[:space:]]*__g' | sed 's_([0-9][0-9][0-9])$__' | sed 's_([0-9][0-9])$__' | sed 's_([0-9])$__'`
	printf "FILE:\t$FILE\n";	
	printf "GENRE:\t$GENRE\n";

}

main

I keep getting errors like this.

FILE:   /media/Data_Bucket/Audio/Alex_M.O.R.P.H._&_Woody_van_Eyden/hardenergy.lv_(Disc_3)/03_-_A_State_Of_Trance_400_Pre-Reocrded_Guestmix_(18-04-09).mp3
GENRE:
./sync-indus.bash: line 38: printf: `_': invalid format character
FILE:   /media/Data_Bucket/Audio/Alex_Reece/100GENRE:
FILE:   /media/Data_Bucket/Audio/Alex_Twister/We_Will_Rock_You_(Remix)_[256_KBPS]_[Alex_Twister].mp3
GENRE:
./sync-indus.bash: line 38: printf: `_': invalid format character
FILE:   /media/Data_Bucket/Audio/Acetate/100GENRE:
FILE:   /media/Data_Bucket/Audio/Aceyalone_Chairman_Hahn/Reanimation/11_-_WTH_You.mp3
GENRE:
FILE:   /media/Data_Bucket/Audio/Act,_The/Too_Late_at_20/01_-_Too_Late_At_20.mp3
GENRE:  Powerpop

I'm thinking it might have something to do with the % in the file/directory names.

binary@bitslip:/media/Data_Bucket/Audio/Alex_Reece$ ls -l
total 4
drwx------ 1 binary binary 4096 2011-07-16 00:05 100%_Drum_&_Bass_(Disc_1)
binary@bitslip:/media/Data_Bucket/Audio/Alex_Reece$ cd ../Acetate/
binary@bitslip:/media/Data_Bucket/Audio/Acetate$ ls -l
total 0
drwx------ 1 binary binary 0 2011-07-16 00:01 100%_Drum_&_Bass_(Disc_2)

Is that something that is best dealt with using sed?

Not only is it not off-topic, incorrect use of printf is the root of your problem.

The first argument to printf is a format string. A conversion specifier (aka format specifier) is a sequence of characters within a format string which begins with a % .

What will be the format of the output of those printf commands? It's impossible to say. You have handed over control of the format string to external sources. How printf will behave and how many arguments it will require depend on the type and number of conversion specifiers in the format string. The type and number of specifiers in turn depend on the variable values $FILE and $GENRE.

You want to be very careful about what you allow into that first argument to printf.

You are correct about the % characters.

No, sed isn't the answer. There's no need to mangle the file names just to print them out. What you need to do is not allow arbitrary data into your format string.

Corrected printf statements:

	printf 'FILE:\t%s\n' "$FILE";	
	printf 'GENRE:\t%s\n' "$GENRE";

Note how the format string is now invariant. Whatever the value of the variables, the format string never changes (a point driven home by switching to strong single-quotes).
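
A quick way to see the difference is to run both forms against a name containing a percent sign. This is only a sketch; the file name is invented, modelled on the directory listing earlier in the thread.

```shell
#!/bin/bash
# Hypothetical name modelled on the directory listing shown earlier.
FILE='100%_Drum_&_Bass_(Disc_1)/01 - Intro.mp3'

# Unsafe: the data becomes part of the format string, so "%_" is parsed
# as a conversion specifier and printf reports an error.
unsafe=$(printf "FILE:\t$FILE\n" 2>&1)
unsafe_status=$?

# Safe: the format string is fixed and the data is passed as an argument.
safe=$(printf 'FILE:\t%s\n' "$FILE")
safe_status=$?

printf 'unsafe exit status: %d\n' "$unsafe_status"
printf 'safe exit status:   %d\n' "$safe_status"
printf 'safe output:        %s\n' "$safe"
```

The unsafe call should fail with a non-zero status, while the safe call prints the name unchanged, percent sign and all.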

The same bug lurks in your awk one-liner:

awk -F: '{printf $2"\n"}'

Make $2 an argument and in the format string replace it with an appropriate conversion specifier (%s in this case).

In shell scripting, this type of error is typically nothing worse than garbled output, but a format string bug in a language like C can be a major security issue. For more info, see Uncontrolled format string.

Regards,
Alister

I tried to fix it but it gave me errors

Please wait...
awk: {printf %sn "$2"}
awk:         ^ syntax error
FILE:   /media/Data_Bucket/Audio/(val)Liam/Vampire_Sunrise_(Disc_1)/04_-_As_Is.mp3_CBR.mp3
GENRE:  
awk: {printf %sn "$2"}
awk:         ^ syntax error
FILE:   /media/Data_Bucket/Audio/)eib(/BC Recordings (BCRUK001)/Dogfight.mp3
GENRE:  
awk: {printf %sn "$2"}
awk:         ^ syntax error

The modified function...

function analyze_mp3 {

        GENRE=`id3v2 -l "$FILE" | grep TCON | awk -F: '{printf '%s\n' "$2"}' | sed -e 's_^[[:space:]]*__g' | sed 's_([0-9][0-9][0-9])$__' | sed 's_([0-9][0-9])$__' | sed 's_([0-9])$__'`
        printf 'FILE:\t%s\n' "$FILE";
        printf 'GENRE:\t%s\n' "$GENRE";

}

What am I missing? Yeah this conversion string thing is new to me.

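You can't use single quotes inside single quotes. They don't nest -- the inner quote is treated as the end of the single-quoted string, so the shell strips the quotes and backslash and awk sees %sn.

Awk doesn't use single quotes anyway. Use double quotes.

Awk doesn't expand variables inside quotes; "$2" is just the literal string $2. Get rid of those quotes around $2.

You forgot the comma after the first argument, and the brackets (optional in awk, but they make the call clearer).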

You can also get rid of that grep by putting the regex inside awk itself. That's done really easily.

awk -F: '/TCON/ { printf("%s\n", $2) }'

In fact, you can replace that entire enormous pipe-chain with it. awk is a whole programming language, not a glorified cut. And you can match the numbers in one regex instead of three by using ?, a specifier like * that means "zero or one of the previous character".

awk -F: '/TCON/ {
        gsub(/[ \t]*/, "", $2); # Strip whitespace
        gsub(/_[0-9]?[0-9]?[0-9]$/, "__", $2); # Replace _123 at the end with __
        printf("%s\n", $2); }'

@Corona688 Think he wanted to strip leading whitespace, and a bracketed string of up to 3 digits from end of field 2 (he was using _ as the sed delimiter).

This slight change should cover it:

GENRE=`id3v2 -l "$FILE" | awk -F: '/TCON/ {
        gsub(/^[ \t]*/, "", $2); # Strip leading whitespace
        gsub(/\([0-9]?[0-9]?[0-9]\)$/, "", $2); # Remove bracketed string up to 3 digits from the end
        printf("%s", $2); }'`
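
To try the awk part without any MP3 files to hand, it can be wrapped in a function and fed a sample line. This is a sketch under assumptions: the sample TCON lines are guesses at what id3v2 -l prints, clean_genre is a hypothetical name, and the leading " *" in the second pattern is an addition that also removes the space before the bracket.

```shell
#!/bin/bash
# clean_genre (hypothetical name) reads "id3v2 -l" style text on stdin
# and prints the genre from the TCON line.
clean_genre() {
    awk -F: '/TCON/ {
        gsub(/^[ \t]*/, "", $2)                   # strip leading whitespace
        gsub(/ *\([0-9]?[0-9]?[0-9]\)$/, "", $2)  # drop a trailing "(NNN)"
        printf("%s", $2)
    }'
}

# Assumed sample line; real id3v2 output may be laid out differently.
sample='TCON (Content type): Powerpop (255)'
GENRE=$(printf '%s\n' "$sample" | clean_genre)
printf 'GENRE:\t%s\n' "$GENRE"
```

Note that splitting on the colon means a genre that itself contains a colon would be cut short, so this is only suitable if the genre names are known not to contain one.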