Bad substitution issues.. but why?

I am trying to prepare train and test datasets, for which I need to randomly split the data into the corresponding folders (train, test).

I started on a simple script, but I get some weird error messages that I cannot make sense of.

What am I doing wrong?

#!/bin/bash
RED='\033[0;31m'
NC='\033[0m' # No Color


if [[ $1 = "" ]]
	then
		echo -e "${RED}Missing Workspace name! -  Provide a name!${NC}"
	exit 1	
fi

if [[ $2 = "" ]]
	then 
		echo -e "${RED}Missing path to dataset - SPH files${NC}"
	#exit 1	
fi

if [[ $3 = "" ]]
	then 
		echo -e "${RED}Missing path to Utt!${NC}"
	#exit 1
fi		

WORKSPACE=$1
PATH_TO_DATASET=$2
PATH_TO_UTT=$3

#Create the folder 

mkdir ../${WORKSPACE}
 
cd ../${WORKSPACE}
ln -s ../wsj/s5/steps .
ln -s ../wsj/s5/utils .
ln -s ../../src .

cp ../wsj/s5/path.sh .

mkdir -p ../${WORKSPACE}/exp
mkdir -p ../${WORKSPACE}/conf
mkdir -p ../${WORKSPACE}/data
mkdir -p ../${WORKSPACE}/data/test
mkdir -p ../${WORKSPACE}/data/train
mkdir -p ../${WORKSPACE}/data/local
mkdir -p ../${WORKSPACE}/data/local/lang

# Modify if Help script is needed
#################################
# Change order of utterance and name! 
# python help_scripts/change_order_name_utt.py  /PATH/TO/UTTERANCE 

# Partition data randomly into train and test. 
SPLIT=0.5 #train/test split
NUMBER_OF_FILES = $(ls $PATH_TO_DATASET |  wc -l) ## number of directories in the dataset

for ((i=1; i<=$(NUMBER_OF_FILES); i++))
do
	ran = ${python -c "import random; print random.randdouble(0,1)"}
	echo ${ran}
done	

output

../workspace_setup.sh: line 54: NUMBER_OF_FILES: command not found
./workspace_setup.sh: line 56: NUMBER_OF_FILES: command not found
./workspace_setup.sh: line 56: ((: i<=: syntax error: operand expected (error token is "<=")

Did you try this code

NUMBER_OF_FILES = ${ls ${PATH_TO_DATASET} |  wc -l )

before including it in a script?

In cases like this, please ALWAYS post the entire script, so that the error line (54?) can be located. Or add line numbers.

It seems you are trying to use command substitution, which is done with $(...), not with ${...} as you are doing.

Been a while, but I think your line 54 should be using () and not {}

NUMBER_OF_FILES = $(ls ${PATH_TO_DATASET} |  wc -l) ## number of directories in the dataset

As has already been said, command substitutions are surrounded by parentheses $( command ) not by braces ${ command } .

Once you fix that you'll run into the problem that there cannot be any spaces around the = in shell variable assignments.
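
To illustrate why the spaces produce "command not found": with spaces around the =, the shell treats the first word as a command name and the rest as its arguments. A minimal sketch, assuming bash; the name number_of_files is only a stand-in for illustration:

```shell
#!/bin/bash
# With spaces, the shell looks for a command called "number_of_files"
# and passes it "=" and "3" as arguments. The error text is captured
# in err so the sketch keeps running.
if err=$( { number_of_files = 3; } 2>&1 ); then
    status=0
else
    status=$?
fi
echo "exit status: ${status}"   # 127 is the shell's "command not found" status
echo "message: ${err}"

# Without spaces, this is an assignment.
number_of_files=3
echo "number_of_files is ${number_of_files}"
```

The same rule applies when the right-hand side is a command substitution.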

Try changing:

NUMBER_OF_FILES = ${ls ${PATH_TO_DATASET} |  wc -l} ## number of directories in the dataset

for i in {1..${NUMBER_OF_FILES}}
do
	ran = ${python -c "import random; print random.randdouble(0,1)"}

to:

NUMBER_OF_FILES=$(ls ${PATH_TO_DATASET} |  wc -l) ## number of directories in the dataset

for i in {1..${NUMBER_OF_FILES}}
do
	ran=$(python -c "import random; print random.randdouble(0,1)")

This will work (note that there are no spaces around the =):

NUMBER_OF_FILES=$(ls $PATH_TO_DATASET |  wc -l) 

I suspect the for loop will not do as expected either...

for i in {1..${NUMBER_OF_FILES}}

Example, OSX 10.7.5, default bash terminal...

#!/bin/bash
# for_loop.sh
loop=10
for num in {1..${loop}}
do
	echo "$num"
done

Results:-

Last login: Thu Aug 25 19:29:16 on ttys000
AMIGA:barrywalker~> bash --version
GNU bash, version 3.2.48(1)-release (x86_64-apple-darwin11)
Copyright (C) 2007 Free Software Foundation, Inc.
AMIGA:barrywalker~> 
AMIGA:barrywalker~> cd Desktop/Code/Shell
AMIGA:barrywalker~/Desktop/Code/Shell> chmod 755 for_loop.sh
AMIGA:barrywalker~/Desktop/Code/Shell> ./for_loop.sh
{1..10}
AMIGA:barrywalker~/Desktop/Code/Shell> _

You cannot use a variable in a {..} sequence: bash performs brace expansion before variable expansion, so the braces are left as literal text.

Try this instead:

for ((i=1; i<=NUMBER_OF_FILES; i++))
do
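
A small sketch of the difference, assuming bash; n is a hypothetical stand-in for NUMBER_OF_FILES:

```shell
#!/bin/bash
n=3   # hypothetical loop bound

# Brace expansion is attempted before ${n} is expanded, finds no valid
# sequence, and leaves the word alone. The loop body runs once.
for i in {1..${n}}; do
    literal="$i"
done
echo "brace form saw: ${literal}"

# The arithmetic for loop reads n as a variable each time it tests the bound.
total=0
for ((i = 1; i <= n; i++)); do
    total=$((total + i))
done
echo "sum of 1..${n}: ${total}"
```

Inside (( )) a variable name can be written without the $, which is why i<=NUMBER_OF_FILES works there.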

I posted the full code and fixed some of the issues you mentioned with {}. The problem now is that my for loop can't see the variable. How come?

Have you read posts 7 and 8?
I pointed it out and Scrutinizer tells you why.

Please add a new post to your thread when you change things. Do not go back and edit the 1st post in a thread after people have added other posts to the thread. People who have responded to your thread do not receive any notice when you edit a post in a thread. And, doing that makes it hard for someone else reading your thread for the first time to figure out what happened.

You can't just randomly change parentheses to braces and braces to parentheses:

$( command )

is for command substitution.

${variable}

is variable expansion.

And, as I said in post #5 in this thread, there can't be any spaces around the = in a shell variable assignment (neither before nor after the equal sign).
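
The two forms can be seen side by side in a short sketch. It assumes bash and a temporary directory created with mktemp as a stand-in for the real dataset path; the variable names are made up for the example:

```shell
#!/bin/bash
dir=$(mktemp -d)              # command substitution: run mktemp, capture its output
touch "${dir}/a" "${dir}/b"   # variable expansion: insert the value of dir

count=$(ls "${dir}" | wc -l)  # both forms used together
echo "entries: ${count}"

# Putting a command inside ${ } is what triggers "bad substitution".
# A child bash is used here so the error does not stop this script.
msg=$(LC_ALL=C bash -c 'echo ${ls /tmp}' 2>&1) || true
echo "${msg}"

rm -r "${dir}"
```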

Maybe something like:

#!/bin/bash
RED='\033[0;31m'
NC='\033[0m' # No Color

if [[ $1 = "" ]]
then
	echo -e "${RED}Missing Workspace name! -  Provide a name!${NC}"
fi

if [[ $2 = "" ]]
then 
	echo -e "${RED}Missing path to dataset - SPH files${NC}"
fi

if [[ $3 = "" ]]
then 
	echo -e "${RED}Missing path to Utt!${NC}"
fi

if [[ $1 = "" ]] || [[ $2 = "" ]] || [[ $3 == "" ]]
then	
	exit 1
fi

WORKSPACE=$1
PATH_TO_DATASET=$2
PATH_TO_UTT=$3

#Create the folder 
mkdir ../${WORKSPACE}
 
cd ../${WORKSPACE}
ln -s ../wsj/s5/steps .
ln -s ../wsj/s5/utils .
ln -s ../../src .

cp ../wsj/s5/path.sh .

mkdir -p exp conf data/local/lang data/test data/train

# Modify if Help script is needed
#################################
# Change order of utterance and name! 
# python help_scripts/change_order_name_utt.py  /PATH/TO/UTTERANCE 

# Partition data randomly into train and test. 
SPLIT=0.5 #train/test split
NUMBER_OF_FILES=$(ls ${PATH_TO_DATASET} |  wc -l) # number of directories in the dataset

for ((i=1; i<=${NUMBER_OF_FILES}; i++))
do
	ran=$(python -c "import random; print random.randdouble(0,1)")
	echo ${ran}
done

Since I don't have a file hierarchy to copy as required by this script, it is totally untested, but this should be a step closer to what you want than what is now in post #1 in this thread.

Hi Don...
I have just noticed the OP's original edit! ;o/

However should your line 58 read:-

 ran=$(python -c "import random; print random.randdouble(0,1)") 

Regarding this line:

NUMBER_OF_FILES=$(ls ${PATH_TO_DATASET} |  wc -l) # number of directories in the dataset

If there are newline characters in any of the directory names (and heaven forbid that there are!) the output of wc -l will be incorrect. May I suggest instead:

NUMBER_OF_FILES=$(ls -1b ${PATH_TO_DATASET} |  wc -l) # number of directories in the dataset

or:

NUMBER_OF_FILES=$(ls -1q ${PATH_TO_DATASET} |  wc -l) # number of directories in the dataset

The -b and -q switches suppress the output of a newline as a character, replacing it with the string "\n" or with "?" respectively. The "-1" switch is redundant when the output goes to a pipe, but in my opinion it makes the line more readable: more obvious that we are expecting a single column out of ls. Either way, the output of wc -l is now the number of directories regardless of strange characters in their names.

Andrew
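
Another option, not suggested in the thread so far, is to skip ls entirely and count glob matches in a bash array, which is unaffected by newlines in names. A minimal sketch under the assumption that only directories should be counted; dataset is a throwaway stand-in for PATH_TO_DATASET:

```shell
#!/bin/bash
dataset=$(mktemp -d)   # throwaway stand-in for PATH_TO_DATASET
# Three directories, one of them with a newline in its name.
mkdir "${dataset}/spk1" "${dataset}/spk2" "${dataset}/"$'odd\nname'

shopt -s nullglob              # no matches gives an empty array, not a literal "*/"
entries=("${dataset}"/*/)      # the trailing slash restricts matches to directories
shopt -u nullglob

number_of_dirs=${#entries[@]}
echo "directories: ${number_of_dirs}"

rm -r "${dataset}"
```

Like plain ls, the glob skips names that start with a dot.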

Hi kidi...

Be very aware of your python line:-

 python -c "import random; print random.randdouble(0,1)" 

Your code suggests Python 2.x.x, probably 2.7.x, and as far as I know the attribute "randdouble" does not exist.

If, and this is a big if, it exists in version 3.5.x (IIRC it does not exist in 3.4.x either), the line will still crash there, because the print statement of version 2.x.x is a function in version 3.x.x.
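
On the Python side, random.random() does exist in both Python 2 and 3 and returns a float in [0.0, 1.0), and print(random.random()) is accepted by both. If starting an interpreter for every file is not needed, the split can also be decided in bash itself. The following is only a sketch of that swapped-in approach, using the built-in RANDOM variable; split_percent, pick_split and chosen are hypothetical names, and the 20 iterations stand in for the real file list:

```shell
#!/bin/bash
split_percent=50   # hypothetical integer form of SPLIT=0.5

# Set "chosen" to train or test for one file.
# RANDOM yields an integer from 0 to 32767; the modulo is slightly
# biased, which is acceptable for a rough split.
pick_split() {
    if (( RANDOM % 100 < split_percent )); then
        chosen=train
    else
        chosen=test
    fi
}

train=0
test_count=0
for ((i = 1; i <= 20; i++)); do
    pick_split
    if [[ ${chosen} == train ]]; then
        train=$((train + 1))
    else
        test_count=$((test_count + 1))
    fi
done
echo "train=${train} test=${test_count}"
```

The function sets a variable instead of printing its result, so no subshell is needed to read the answer.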