I am trying to prepare train and test datasets, for which I need to randomly split the data into the corresponding folders (train, test).
I started on a simple script, but I get some weird error messages that I cannot make sense of.
What am I doing wrong?
#!/bin/bash
RED='\033[0;31m'
NC='\033[0m' # No Color
if [[ $1 = "" ]]
then
echo -e "${RED}Missing Workspace name! - Provide a name!${NC}"
exit 1
fi
if [[ $2 = "" ]]
then
echo -e "${RED}Missing path to dataset - SPH files${NC}"
#exit 1
fi
if [[ $3 = "" ]]
then
echo -e "${RED}Missing path to Utt!${NC}"
#exit 1
fi
WORKSPACE=$1
PATH_TO_DATASET=$2
PATH_TO_UTT=$3
#Create the folder
mkdir ../${WORKSPACE}
cd ../${WORKSPACE}
ln -s ../wsj/s5/steps .
ln -s ../wsj/s5/utils .
ln -s ../../src .
cp ../wsj/s5/path.sh .
mkdir -p ../${WORKSPACE}/exp
mkdir -p ../${WORKSPACE}/conf
mkdir -p ../${WORKSPACE}/data
mkdir -p ../${WORKSPACE}/data/test
mkdir -p ../${WORKSPACE}/data/train
mkdir -p ../${WORKSPACE}/data/local
mkdir -p ../${WORKSPACE}/data/local/lang
# Modify if Help script is needed
#################################
# Change order of utterance and name!
# python help_scripts/change_order_name_utt.py /PATH/TO/UTTERANCE
# Partition data randomly into train and test.
SPLIT=0.5 #train/test split
NUMBER_OF_FILES = $(ls $PATH_TO_DATASET | wc -l) ## number of directories in the dataset
for ((i=1; i<=$(NUMBER_OF_FILES); i++))
do
ran = ${python -c "import random; print random.randdouble(0,1)"}
echo ${ran}
done
Output:
../workspace_setup.sh: line 54: NUMBER_OF_FILES: command not found
./workspace_setup.sh: line 56: NUMBER_OF_FILES: command not found
./workspace_setup.sh: line 56: ((: i<=: syntax error: operand expected (error token is "<=")
As has already been said, command substitutions are surrounded by parentheses, $( command ), not by braces, ${ command }.
Once you fix that, you'll run into the problem that there cannot be any spaces around the = in shell variable assignments.
Try changing:
NUMBER_OF_FILES = ${ls ${PATH_TO_DATASET} | wc -l} ## number of directories in the dataset
for i in {1..${NUMBER_OF_FILES}}
do
ran = ${python -c "import random; print random.randdouble(0,1)"}
to:
NUMBER_OF_FILES=$(ls ${PATH_TO_DATASET} | wc -l) ## number of directories in the dataset
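for ((i=1; i<=NUMBER_OF_FILES; i++)) # {1..${NUMBER_OF_FILES}} would not work: bash performs brace expansion before variable expansion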
do
ran=$(python -c "import random; print random.randdouble(0,1)")
Please add a new post to your thread when you change things. Do not go back and edit the 1st post in a thread after people have added other posts to the thread. People who have responded to your thread do not receive any notice when you edit a post in a thread. And, doing that makes it hard for someone else reading your thread for the first time to figure out what happened.
You can't just randomly change parentheses to braces and braces to parentheses:
$( command )
is for command substitution.
${variable}
is variable expansion.
And, as I said in post #5 in this thread, there can't be any spaces around the = in a shell variable assignment, neither before nor after the equal sign.
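To make the difference concrete, here is a small sketch that can be pasted into a bash prompt. The variable names (count, message) are made up for the illustration and are not part of the script in this thread.

```shell
#!/bin/bash
# Command substitution: $( ... ) runs a command and captures its output.
count=$(printf '%s\n' a b c | wc -l)

# Variable expansion: ${ ... } inserts the value of an existing variable.
message="lines counted: ${count}"
echo "${message}"

# With spaces around the =, bash does not see an assignment at all. It looks
# for a command called "count" and passes "=" and "5" to it as arguments,
# which is where a "command not found" message comes from:
#   count = 5
```

The same reasoning explains the message for $(NUMBER_OF_FILES) in the for loop: inside $( ) the name is run as a command rather than read as a variable.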
Maybe something like:
#!/bin/bash
RED='\033[0;31m'
NC='\033[0m' # No Color
if [[ $1 = "" ]]
then
echo -e "${RED}Missing Workspace name! - Provide a name!${NC}"
fi
if [[ $2 = "" ]]
then
echo -e "${RED}Missing path to dataset - SPH files${NC}"
fi
if [[ $3 = "" ]]
then
echo -e "${RED}Missing path to Utt!${NC}"
fi
if [[ $1 = "" ]] || [[ $2 = "" ]] || [[ $3 == "" ]]
then
exit 1
fi
WORKSPACE=$1
PATH_TO_DATASET=$2
PATH_TO_UTT=$3
#Create the folder
mkdir ../${WORKSPACE}
cd ../${WORKSPACE}
ln -s ../wsj/s5/steps .
ln -s ../wsj/s5/utils .
ln -s ../../src .
cp ../wsj/s5/path.sh .
mkdir -p exp conf data/local/lang data/test data/train
# Modify if Help script is needed
#################################
# Change order of utterance and name!
# python help_scripts/change_order_name_utt.py /PATH/TO/UTTERANCE
# Partition data randomly into train and test.
SPLIT=0.5 #train/test split
NUMBER_OF_FILES=$(ls ${PATH_TO_DATASET} | wc -l) # number of directories in the dataset
for ((i=1; i<=${NUMBER_OF_FILES}; i++))
do
ran=$(python -c "import random; print random.randdouble(0,1)")
echo ${ran}
done
Since I don't have a file hierarchy to copy as required by this script, it is totally untested, but this should be a step closer to what you want than what is now in post #1 in this thread.
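The loop above only prints a random number per file. One way the partition step itself might look is sketched below. It is only a sketch under several assumptions: it uses bash's built-in RANDOM instead of calling Python once per file, it expresses the split as an integer percentage (SPLIT_PERCENT is a made-up name) because bash arithmetic cannot compare a float like 0.5, it copies files rather than moving or linking them, and it works on throwaway directories created with mktemp instead of the real dataset.

```shell
#!/bin/bash
# Assumption: the split is given as an integer percentage, because bash
# arithmetic cannot compare a float such as 0.5.
SPLIT_PERCENT=50

# Throwaway stand-ins for PATH_TO_DATASET and the workspace data/ folders.
dataset=$(mktemp -d)
workdir=$(mktemp -d)
mkdir -p "${workdir}/data/train" "${workdir}/data/test"
for ((n=1; n<=10; n++)); do
    touch "${dataset}/utt${n}.sph"
done

n_train=0
n_test=0
for f in "${dataset}"/*; do
    # RANDOM yields 0..32767; modulo 100 is very slightly biased, which is
    # acceptable for a rough split.
    if (( RANDOM % 100 < SPLIT_PERCENT )); then
        cp "${f}" "${workdir}/data/train/"
        n_train=$((n_train + 1))
    else
        cp "${f}" "${workdir}/data/test/"
        n_test=$((n_test + 1))
    fi
done
echo "train: ${n_train}, test: ${n_test}"
```

Because every file is drawn independently, the result is only approximately 50/50. If an exact ratio matters, shuffling the file list once and cutting it at the desired position would be another option. The two temporary directories are left in place so the result can be inspected; remove them when done.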
NUMBER_OF_FILES=$(ls ${PATH_TO_DATASET} | wc -l) # number of directories in the dataset
If there are newline characters in any of the directory names (and heaven forbid that there are!) the output of wc -l will be incorrect. May I suggest instead:
NUMBER_OF_FILES=$(ls -1b ${PATH_TO_DATASET} | wc -l) # number of directories in the dataset
or:
NUMBER_OF_FILES=$(ls -1q ${PATH_TO_DATASET} | wc -l) # number of directories in the dataset
The -b and -q switches suppress the output of a newline as a raw character, replacing it with the string "\n" or with "?" respectively. The "-1" switch is redundant when the output goes to a pipe, but in my opinion it makes the command more readable: it is more obvious that we are expecting a single column out of ls. Either way, the output of wc -l is now the number of directories regardless of strange characters in their names.
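The effect is easy to try out on a scratch directory. The following is a minimal sketch, assuming bash and an ls that supports -q; the directory and file names are invented for the demonstration. It also shows a glob-based count that avoids parsing ls output altogether, which is an alternative and not what was suggested above.

```shell
#!/bin/bash
# Scratch directory with one ordinary name and one name containing a newline.
demo_dir=$(mktemp -d)
touch "${demo_dir}/plain" "${demo_dir}/with"$'\n'"newline"

# Plain ls writes the embedded newline as-is, so wc -l sees an extra line.
naive=$(ls "${demo_dir}" | wc -l)

# With -q the newline is printed as ?, giving one line per entry.
safe=$(ls -1q "${demo_dir}" | wc -l)

# Alternative without ls: let the shell expand the names into an array.
shopt -s nullglob
entries=("${demo_dir}"/*)

echo "naive=$((naive)) safe=$((safe)) glob=${#entries[@]}"
rm -rf "${demo_dir}"
```

The nullglob option makes the array empty, rather than containing the literal pattern, when the directory has no entries.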
Your code suggests Python 2.x.x, probably 2.7.x, and as far as I know the attribute "randdouble" does not exist in the random module.
If, and this is a big if, it exists in version 3.5.x (IIRC it does not exist in 3.4.x either), then the call will still crash there, because the print statement of version 2.x.x is a function in version 3.x.x.
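A call that should behave the same under both major versions is sketched below. It swaps in random.random(), which returns a float in the range [0.0, 1.0), for the non-existent randdouble, and it writes print with parentheses, which is valid in 2.x.x for a single argument as well. The PYTHON variable and the interpreter lookup are assumptions added for the example.

```shell
#!/bin/bash
# Assumption: at least one of python3 or python is on the PATH.
PYTHON=$(command -v python3 || command -v python)

if [ -z "${PYTHON}" ]; then
    echo "No Python interpreter found" >&2
else
    # random.random() returns a float in [0.0, 1.0); print(...) with a single
    # argument works in both Python 2 and Python 3.
    ran=$("${PYTHON}" -c "import random; print(random.random())")
    echo "${ran}"
fi
```

random.uniform(0, 1) would be the closest match to the two-argument form in the original script, if the explicit bounds are preferred.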