Scripts don't give a consistent output

I have created a setup script that sets up a workspace for a Kaldi environment.

The script can be found here: setup_base_file

I guess you would not be able to run it without having Kaldi installed, but since this question relates more to scripting than to the Kaldi framework, it should not be necessary to install it.

The problem occurs when I run the script

./workspace_setup.sh

The last command in this script validates the data directory it has created. Sometimes it succeeds and other times it does not, and the "sometimes" part is what bothers me. Why is this the case? I usually test it by deleting all the files it has created with

rm -rf ../${WORKSPACE}

and then run the script again.

Does something in the script seem to cause this? It quite annoys me that it works sometimes and not other times.

example:

kidi@kidi-ThinkPad-T420s:~/kaldi-trunk/egs/setup_base_files$ ./workspace_setup.s - Pastebin.com

Interesting parts from example:

Line 144: utils/validate_data_dir.sh: Successfully validated data-directory data/train # Workspace validation successful
Line 146: kidi@kidi-ThinkPad-T420s:~/kaldi-trunk/egs/setup_base_files$ rm -rf ../start/ # Deleting the created workspace and all files in it

Line 147: kidi@kidi-ThinkPad-T420s:~/kaldi-trunk/egs/setup_base_files$ ./workspace_setup.sh "start" /home/kidi.. # Creating a new workspace
Line 290: utils/validate_data_dir.sh: file data/train/utt2spk is not in sorted order or has duplicates # Same validation as in line 144, but this time an error is encountered, with no change in procedure.

So you want to force-delete ${WORKSPACE} in the parent directory.

 rm -rf ../${WORKSPACE} 

What happens IF the variable 'WORKSPACE' is NULL?
What happens IF said variable is an invalid directory?
What happens IF said directory does not allow you to delete it?
Etc, etc...
What error report(s) do you get?
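
For what it's worth, a guarded version of that cleanup step could look roughly like the sketch below. The function name remove_workspace is made up for illustration; it only assumes that the workspace sits directly under some parent directory, as in the rm command above.

```shell
#!/bin/sh
# Hypothetical helper: remove a workspace directory only if the name is
# non-empty and the directory really exists under the given parent.
remove_workspace() {
    parent=$1
    name=$2

    if [ -z "$name" ]; then
        echo "refusing to delete: workspace name is empty" >&2
        return 1
    fi
    if [ ! -d "$parent/$name" ]; then
        echo "refusing to delete: $parent/$name is not a directory" >&2
        return 1
    fi

    # "--" stops option parsing in case the path starts with a dash.
    rm -rf -- "$parent/$name"
}
```

It would be called as remove_workspace .. "$WORKSPACE", and it returns a non-zero status instead of deleting anything when the name is empty or the directory is missing.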


OK, that part might have been a bit unclear. I don't intend to force-delete the parent directory. I just do it to test whether the script provides a consistent output, by removing what it has created.

${WORKSPACE} is in this case the name of the workspace, i.e. start.

Another script is supposed to validate the directories within the workspace, as written in the first post. It approves them sometimes and other times not. Why am I getting this inconsistent output?

I would go through all these files and quote your variables.
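
To show why the quoting matters, here is a small sketch that compares an unquoted and a quoted expansion of a value containing a space. The count_args helper and the sample value are made up for illustration and are not taken from the scripts in question.

```shell
#!/bin/sh
# Hypothetical helper: print how many arguments it received.
count_args() {
    echo "$#"
}

dir="my workspace"

count_args $dir      # unquoted: split at the space into two arguments, prints 2
count_args "$dir"    # quoted: stays one argument, prints 1
```

Any command that receives the unquoted form, such as cd, mkdir or rm, sees two separate words instead of one path.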

I did not say that you are deleting the parent directory, but that you intend to delete 'WORKSPACE' inside the parent directory. That said, I am looking at your "workspace_setup.sh"...

if [ "$#" -ne "3" ]
then
    echo -e "${RED}USAGE : script.sh WORKSPACE DATASET_PATH UTT_PATH${NC}"
    exit 1
fi

if [[ $1 = "" ]]
then
    echo -e "${RED}Missing Workspace name! -  Provide a name!${NC}"
    exit 1
fi

if [[ $2 = "" ]]
then
    echo -e "${RED}Missing path to dataset - SPH files${NC}"
    exit 1
fi

if [[ $3 = "" ]]
then
    echo -e "${RED}Missing path to Utt!${NC}"
    exit 1
fi

Firstly '$1' can NEVER be NULL if '$2' and '$3' exist. They just shift places so that '$2' becomes '$1' and so on.

cd ../../../

Ouch! Where does this go in the event of an error?
I tried it and it put me into my root directory. If you need the root directory, then why not call it as cd / ?
If you know what the absolute directory paths are, then why not use them?
Alternatively, test thoroughly with /tmp/your/directory/tree/ and then change all sources to /full/path/to/your/directory/tree/
You create links to parents of parents too - ouch!
You are also calling Python scripts, which I don't intend to check at this point.
Also, '$1', '$2' and '$3' are script arguments; if any one of them is wrong or not present, do you have a failsafe to handle that scenario?
There are other bits and pieces that I, as an amateur, would not do either.

wisecracker wrote:

if [[ $1 = "" ]]
then
    echo -e "${RED}Missing Workspace name! -  Provide a name!${NC}"
    exit 1
fi
[..]
Firstly '$1' can NEVER be NULL if '$2' and '$3' exist. They just shift places so that '$2' becomes '$1' and so on.
[..]
Note: here $1 is tested for being the empty string. $1 CAN be the empty string while $2 and $3 are not, if we call the script like so:
./script "" value2 value3

or if IFS is set in a certain way.
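
The difference between a missing argument and an empty one can be sketched like this. The check_first function is a made-up stand-in for the script, not part of workspace_setup.sh.

```shell
#!/bin/sh
# Hypothetical stand-in for the script: report the argument count and
# whether the first argument is the empty string.
check_first() {
    if [ -z "$1" ]; then
        echo "$# args, first is empty"
    else
        echo "$# args, first is '$1'"
    fi
}

check_first "" value2 value3    # prints: 3 args, first is empty
check_first value2 value3       # prints: 2 args, first is 'value2'
```

In the first call the argument count is still three, so only the empty-string test catches it; in the second call the count test catches it.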


I was well aware of the double quotes but I was also aware of the fact that this is probably not going to be in the '$1' position...
All the more reason to NOT have things like ../../../
However, the 'IFS' reason is new to me and I would like to know more, so thanks a lot. Off to look...

I agree that the cd statements deserve special attention.

  • The first cd is relative to the current directory, so where it ends up varies with the directory from which the script is called.
  • Every command, like cd, cp and mkdir, should have a return code (RC) check to see if the command failed, and appropriate action (like exit) should be taken if it did.
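
A minimal sketch of such a return code check, assuming the script should stop at the first failed step. The die and make_workspace names are invented for illustration and do not come from workspace_setup.sh.

```shell
#!/bin/sh
# Hypothetical helper: print a message to stderr and stop with a non-zero status.
die() {
    echo "ERROR: $*" >&2
    exit 1
}

# Hypothetical setup step: create a workspace directory and enter it,
# stopping as soon as either command fails.
make_workspace() {
    mkdir -p -- "$1" || die "cannot create $1"
    cd -- "$1"       || die "cannot cd into $1"
    pwd
}
```

Called as make_workspace "../$WORKSPACE", the later steps are never reached from the wrong directory, because a failed mkdir or cd ends the script immediately.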

--
@wisecracker: a bit theoretical and probably not relevant for this example, but:

IFS=,
PAR1=,b,c
./script $PAR1 $PAR2 $PAR3
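
A small way to try this out, with a made-up show_args function standing in for ./script, is sketched below. It prints the argument count followed by each argument in brackets.

```shell
#!/bin/sh
# Hypothetical helper: print the argument count and each argument in brackets.
show_args() {
    printf '%s:' "$#"
    for arg in "$@"; do
        printf ' [%s]' "$arg"
    done
    printf '\n'
}

PAR1=,b,c

old_ifs=$IFS
IFS=,
show_args $PAR1      # unquoted: split at the commas, prints: 3: [] [b] [c]
show_args "$PAR1"    # quoted: no field splitting, prints: 1: [,b,c]
IFS=$old_ifs
```

The leading comma produces an empty first field, so $1 is the empty string while $2 and $3 are not.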

The "internal field separator" (IFS) is a shell variable which tells the shell how separate "words" (= input fields) are delimited from each other. Consider this command:

command arg1 arg2 arg3

Somehow the shell has to know that this is to be parsed as a call to "command" with three different arguments, "arg1", "arg2" and "arg3". The reason is that the command in fact looks (to the shell's parser) like this:

<string1><IFS><string2><IFS><string3><IFS><string4>

and because the default value of IFS includes a space (along with tab and newline), the shell sees four strings, of which the first is a command and the others are arguments. This is why you have to quote "words" (commands, arguments) containing spaces: quoting turns off this splitting inside the quotes.

Still, you can redefine IFS to split "words" at boundaries other than the usual spaces. Consider the following input file (named "ifs" here):

word1 word2, word3 word4, word5 word6 
bla,1 bla,2 bla,3 bla,4 bla,5 bla,6

Now execute the following command (ksh, change "print -" to "echo" for bash):

while read w1 w2 w3 w4 w5 w6 ; do
     print - "w1: \"$w1\""
     print - "w2: \"$w2\""
     print - "w3: \"$w3\""
     print - "w4: \"$w4\""
     print - "w5: \"$w5\""
     print - "w6: \"$w6\""
done < ifs

Now change the while-line to:

while IFS=, read w1 w2 w3 w4 w5 w6 ; do

and run again to see the difference.
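
If you would rather not create the input file, the same effect can be sketched on a single line of it. The split_line function is a made-up helper; it reads one line into three variables, using the given separator for that one read only, and prints each field in brackets.

```shell
#!/bin/sh
# Hypothetical helper: read one line into three variables using the given
# separator for this read only, then print each field in brackets.
split_line() {
    sep=$1
    line=$2
    IFS=$sep read -r f1 f2 f3 <<EOF
$line
EOF
    printf '[%s] [%s] [%s]\n' "$f1" "$f2" "$f3"
}

split_line ' ' 'word1 word2, word3 word4'   # prints: [word1] [word2,] [word3 word4]
split_line ',' 'word1 word2, word3 word4'   # prints: [word1 word2] [ word3 word4] []
```

Note that the last variable receives whatever is left of the line, and that with a comma as separator the spaces become ordinary characters inside the fields.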

I hope this helps.

bakunin

Not quite. In the example given, the separator is always a sequence of [[:blank:]] class characters, irrespective of the value of IFS:

<string1>[[:blank:]]+<string2>[[:blank:]]+<string3>[[:blank:]]+<string4>

Where [[:blank:]]+ means one or more characters from the [[:blank:]] character class (i.e. space or TAB).
This is established during token recognition when a line is read, where these strings are determined to be word tokens.

Then the shell grammar is used to determine that this is a simple command where the first field is the command name and remaining fields are the arguments for the command.

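---
IFS is only used in the following cases:

  • field splitting of the results of unquoted parameter expansion, command substitution and arithmetic expansion
  • splitting the input line into fields in the read builtin
  • joining the positional parameters in "$*", which uses the first character of IFS

To illustrate the difference: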

$ bla=1,2,3
$ IFS=,
$ printf "%s\n" 1,2,3  4     # No field splitting, IFS is not used, besides %s\n there are 2 fields that were determined as words during token recognition: 
1,2,3
4
$ printf "%s\n" $bla  4      # Field splitting after variable expansion, IFS is used. Besides %s\n there are now 4 fields, one determined as word during token recognition, the other field is split into 3 further fields
1
2
3
4
$ printf "%s\n" "$bla"  4    # Variable expansion between double quotes, so no field splitting, IFS is not used, besides %s\n  there are 2 argument fields that were determined as words during token recognition: 
1,2,3
4
$ printf "%s\n" "$bla  4"    # Variable expansion between double quotes, so no field splitting, IFS is not used, besides %s\n there is one argument field after quote removal
1,2,3  4
$ IFS=$' \t\n'
$ printf "%s\n" $bla  4      # Field splitting after variable expansion, IFS is used, but the comma is not part of IFS, so no split is performed, besides %s\n there are 2 argument fields that were determined as words during token recognition: 
1,2,3
4