Shell Variable visibility

dae · January 13, 2017, 10:47am

Dear All,

Say I have two distinct functions with the same goal: counting the lines that contain a specific pattern in a file, MyFile.

To do that, I used a "while loop" with two different syntaxes (the "grep" command would be much better here, but that is not the point):

(1) cat MyFile | while read -r line; do ...; done

(2) while read -r line; do ...; done < MyFile

Inside the while loop, I used a variable var to store the result.

My questions:

(1) Is it expected behaviour that the value of var is not available outside the while loop with the first syntax?
(2) What is the difference, at the system level, between the two syntaxes?

Thanks a lot for your attention,

See the examples below:

[x004191a@xsnl11p317a log]$ echo $SHELL
/bin/bash
[x004191a@xsnl11p317a log]$ fnct_dae_V1 ()
> {
>
>   var=0
>
>   cat "STG_INSTNCE_COMPLTED_01_451_20170112193018.log" | while read -r line
>   do
>
>     if [[ $(echo "$line" | grep -i 'completed') ]]
>     then
>
>       ((var++))
>
>     echo "- INSIDE WHILE:var:$var"
>
>     fi
>
>   done
>
>   echo "- OUTSIDE WHILE:var:$var"
>
> }
[x004191a@xsnl11p317a log]$ fnct_dae_V1
- INSIDE WHILE:var:1
- INSIDE WHILE:var:2
- INSIDE WHILE:var:3
- INSIDE WHILE:var:4
- OUTSIDE WHILE:var:0
[x004191a@xsnl11p317a log]$
[x004191a@xsnl11p317a log]$ fnct_dae_V2 ()
> {
>
>   var=0
>
>   while read -r line
>
>   do
>
>     if [[ $(echo "$line" | grep -i 'completed') ]]
>     then
>
>       ((var++))
>
>     echo "- INSIDE WHILE:var:$var"
>
>     fi
>
>   done < STG_INSTNCE_COMPLTED_01_451_20170112193018.log
>
>   echo "- OUTSIDE WHILE:var:$var"
>
> }
[x004191a@xsnl11p317a log]$ fnct_dae_V2
- INSIDE WHILE:var:1
- INSIDE WHILE:var:2
- INSIDE WHILE:var:3
- INSIDE WHILE:var:4
- OUTSIDE WHILE:var:4
[x004191a@xsnl11p317a log]$

Peasant · January 13, 2017, 11:10am

In the first example, the pipe | makes bash run the while loop in a subshell, separate from your current shell.
The loop increments var in that subshell; when the loop finishes, the subshell exits, and var in your current shell is still 0.

The second example increments var in the current shell, so the new value is still visible after the loop.
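A minimal, self-contained sketch of the two forms from the question, assuming bash; the temporary sample file and the function names count_with_pipe and count_with_redirect are made up for illustration. The only difference between the two functions is how the file reaches the loop:

```shell
#!/usr/bin/env bash
# Hypothetical sample data: two of the three lines contain "completed".
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'step 1 completed' 'step 2 running' 'step 3 COMPLETED' > "$sample"

count_with_pipe() {
  local var=0
  # The while loop is a pipeline stage, so bash runs it in a subshell:
  # the increments change the subshell's copy of var only.
  cat "$sample" | while read -r line; do
    if echo "$line" | grep -qi 'completed'; then
      var=$((var + 1))
    fi
  done
  echo "$var"   # the current shell's var was never changed
}

count_with_redirect() {
  local var=0
  # Redirecting the file into the loop keeps the loop in the current shell.
  while read -r line; do
    if echo "$line" | grep -qi 'completed'; then
      var=$((var + 1))
    fi
  done < "$sample"
  echo "$var"
}

echo "pipe:     $(count_with_pipe)"
echo "redirect: $(count_with_redirect)"
```

Run as a script, it should print 0 for the pipeline version and 2 for the redirect version.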

Hope that helps
Regards
Peasant.

Scrutinizer · January 13, 2017, 3:47pm

One way to get around this, for demonstration's sake only (because

done < STG_INSTNCE_COMPLTED_01_451_20170112193018.log

is superior to this useless use of cat (UUOC), and there are better ways than running grep -i inside the loop), is to put the loop and the final echo in the same brace group:

fnct_dae_V1 ()
{
  var=0
  cat "STG_INSTNCE_COMPLTED_01_451_20170112193018.log" | 
  {
    while read -r line
    do
      if [[ $(echo "$line" | grep -i 'completed') ]]
      then
        ((var++))
        echo "- INSIDE WHILE:var:$var"
      fi
    done
    echo "- OUTSIDE WHILE:var:$var"
  }
}
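As for better ways than running grep -i inside the loop, here is a sketch of two options, assuming bash 3.1 or newer (for nocasematch); the sample file and the function names are hypothetical. grep -c can count the matching lines on its own, or the loop can use bash's own [[ ]] pattern matching so that no grep process is started for every line:

```shell
#!/usr/bin/env bash
# Hypothetical sample data: two of the three lines contain "completed".
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'step 1 completed' 'step 2 running' 'step 3 COMPLETED' > "$sample"

# Option 1: no loop at all; grep -c prints the number of matching lines.
count_with_grep() {
  grep -ci 'completed' "$sample"
}

# Option 2: keep the loop, but match in bash itself (no grep per line).
count_with_bash() {
  local var=0 line
  local old_nocasematch
  old_nocasematch=$(shopt -p nocasematch)   # remember the current setting
  shopt -s nocasematch                      # make [[ == ]] case-insensitive
  while IFS= read -r line; do
    if [[ $line == *completed* ]]; then
      var=$((var + 1))
    fi
  done < "$sample"
  eval "$old_nocasematch"                   # restore the previous setting
  echo "$var"
}

echo "grep -ci: $(count_with_grep)"
echo "bash:     $(count_with_bash)"
```

Both should print 2 for this sample. The grep version is the simplest when only the count is needed; the loop version is useful when each matching line also has to be processed.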

joker · January 13, 2017, 4:22pm

If the loop's input comes from a command rather than from a file, the corresponding construct, process substitution, will be helpful (it is available in bash, ksh93 and zsh, but not in plain POSIX sh).

Instead of < file, use < <( command )
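A sketch of the < <( command ) form, assuming bash; printf is only a stand-in for the real command, and count_from_command is a made-up name:

```shell
#!/usr/bin/env bash
count_from_command() {
  local var=0 line
  # <( ... ) runs the command asynchronously and exposes its output as a
  # file name; the leading < redirects that into the loop, so the loop
  # itself still runs in the current shell.
  while IFS= read -r line; do
    if [[ $line == *completed* ]]; then
      var=$((var + 1))
    fi
  done < <(printf '%s\n' 'step 1 completed' 'step 2 running' 'step 3 completed')
  echo "$var"   # the increments are visible here
}

count_from_command   # prints 2
```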

dae · January 13, 2017, 6:36pm

Many thanks to all of you for your answers ...

A few comments I would like to add:

Peasant, I do not think a subshell is created with that syntax: I printed the PID in both functions, and I do not see a separate PID for the "while loop" in the first function:

fnct_dae_V1:

[x004191a@xsnl11p317a log]$ fnct_dae_V1 ()
> {
>
>   echo "PID_Fonction: $$"
>
>   var=0
>
>   cat "STG_INSTNCE_COMPLTED_01_451_20170112193018.log" | while read -r line
>   do
>
>     if [[ $(echo "$line" | grep -i 'completed') ]]
>     then
>
>       ((var++))
>
>     echo "- INSIDE WHILE:var:$var"
>     echo "PID_WHILE: $$"
>
>     fi
>
>   done
>
>   echo "- OUTSIDE WHILE:var:$var"
>
> }
[x004191a@xsnl11p317a log]$ fnct_dae_V1
PID_Fonction: 15468
- INSIDE WHILE:var:1
PID_WHILE: 15468
- INSIDE WHILE:var:2
PID_WHILE: 15468
- INSIDE WHILE:var:3
PID_WHILE: 15468
- INSIDE WHILE:var:4
PID_WHILE: 15468
- OUTSIDE WHILE:var:0
[x004191a@xsnl11p317a log]$

fnct_dae_V2:

[x004191a@xsnl11p317a log]$ fnct_dae_V2 ()
> {
>
>   echo "PID_Fonction: $$"
>
>   var=0
>
>   while read -r line
>
>   do
>
>     if [[ $(echo "$line" | grep -i 'completed') ]]
>     then
>
>       ((var++))
>
>     echo "- INSIDE WHILE:var:$var"
>     echo "PID_WHILE: $$"
>
>     fi
>
>   done < STG_INSTNCE_COMPLTED_01_451_20170112193018.log
>
>   echo "- OUTSIDE WHILE:var:$var"
>
> }
[x004191a@xsnl11p317a log]$ fnct_dae_V2
PID_Fonction: 15468
- INSIDE WHILE:var:1
PID_WHILE: 15468
- INSIDE WHILE:var:2
PID_WHILE: 15468
- INSIDE WHILE:var:3
PID_WHILE: 15468
- INSIDE WHILE:var:4
PID_WHILE: 15468
- OUTSIDE WHILE:var:4
[x004191a@xsnl11p317a log]$

Scrutinizer, I applied your suggestion (see fnct_dae_V3 below), but it seems that I still cannot keep the value of the variable with that syntax:

fnct_dae_V3:

[x004191a@xsnl11p317a log]$ fnct_dae_V3 ()
> {
>
>   echo "PID_Fonction: $$"
>
>   var=0
>
>   cat "STG_INSTNCE_COMPLTED_01_451_20170112193018.log" |
>
>   {
>
>     while read -r line
>     do
>
>       if [[ $(echo "$line" | grep -i 'completed') ]]
>       then
>
>         ((var++))
>
>       echo "- INSIDE WHILE:var:$var"
>       echo "PID_WHILE: $$"
>
>       fi
>
>     done
>
>   }
>
>   echo "- OUTSIDE WHILE:var:$var"
>
> }
[x004191a@xsnl11p317a log]$ fnct_dae_V3
PID_Fonction: 15468
- INSIDE WHILE:var:1
PID_WHILE: 15468
- INSIDE WHILE:var:2
PID_WHILE: 15468
- INSIDE WHILE:var:3
PID_WHILE: 15468
- INSIDE WHILE:var:4
PID_WHILE: 15468
- OUTSIDE WHILE:var:0
[x004191a@xsnl11p317a log]$

So let me ask once more: how can the variable lose its value if the whole execution uses a single PID (see fnct_dae_V1 and, I assume, fnct_dae_V3)?

Thanks again,

jgt · January 13, 2017, 9:03pm

This thread may help you: Sh vs bash (Post: 302912242)

Don_Cragun · January 13, 2017, 11:37pm

dae:

[..]
Peasant, I do not think a subshell is created when using that syntax: in fact, I show the PID for both functions and I do not notice a specific PID for "while loop" with the first function:
[..]
Scrutinizer, I applied your suggestion (illustration below with fnct_dae_V3) but, in that case, it seems that I can not keep the value of the variable with that syntax:
[..]
So, I allow myself to ask you one more time: how is it possible to lose the value of the variable if the whole execution use a unique PID (cf. fnct_dae_V1 and I assume fnct_dae_V3) ?

You did not apply Scrutinizer's suggestion. Also, a subshell and a separate process are two distinct things; in particular, $$ expands to the PID of the main shell even inside a subshell, so printing $$ cannot tell you whether a subshell was created.

And, again, the answer is exactly the same: using cat and a pipeline to feed input into your while read loop places the while read loop in a subshell (when you are using bash and many other shells). Get rid of the unneeded cat (which creates an extra process, consumes system resources, and slows down your script without serving any useful purpose), and with it the subshell, and your script will work just fine.

Alternatively, if you use a Korn shell (ksh) instead of the shell you're currently using, the last stage of a pipeline runs in the current shell execution environment instead of in a subshell environment. (But I haven't checked whether you are using any other extensions to the POSIX standard shell requirements that ksh might treat differently than your current shell does.)
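bash also has a counterpart to the ksh behaviour described above, which the thread does not mention: from bash 4.2 on, shopt -s lastpipe runs the last stage of a pipeline in the current shell, but only while job control is off (the default for scripts; an interactive shell would need set +m as well). A sketch, assuming bash 4.2 or newer, with made-up input lines and a hypothetical function name:

```shell
#!/usr/bin/env bash
# Assumes bash >= 4.2. lastpipe has no effect while job control is on,
# which is the case in interactive shells unless you run "set +m".
shopt -s lastpipe

count_with_lastpipe() {
  local var=0 line
  # With lastpipe, the while loop (the last pipeline stage) runs in the
  # current shell, so its increments survive after "done".
  printf '%s\n' 'step 1 completed' 'step 2 running' 'step 3 completed' |
    while IFS= read -r line; do
      if [[ $line == *completed* ]]; then
        var=$((var + 1))
      fi
    done
  echo "$var"
}

count_with_lastpipe   # prints 2
```

Removing the cat is still the simpler fix; lastpipe mainly helps when the input genuinely has to come from a pipeline.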

Peasant · January 14, 2017, 12:02am

With cat ... | while read, you should see an extra process when running ps -ef | grep <your script pid> .
That extra process is the subshell, and its parent is your script's PID.

For instance :

echo $$
var=0
echo "junk" | while read line
do
	sleep 100
	((var++))
done
echo $var

Yields :

user@machine:~/posao$ ./sshell.sh 
2775

user@machine:~/work$ ps -ef | grep 2775
user     2775  2488  0 05:46 pts/0    00:00:00 bash
user     2777  2775  0 05:46 pts/0    00:00:00 bash # var will be incremented in shell with PID 2777, when it completes it will exit, parent (our script with PID 2775) will not be aware of var increment.
user     2782  2528  0 05:46 pts/1    00:00:00 grep 2775
user@machine:~/work$ ps -ef | grep sleep
user     2778  2777  0 05:46 pts/0    00:00:00 sleep 100
user     2784  2528  0 05:46 pts/1    00:00:00 grep sleep

Other example :

echo $$
var=0
while read line
do
	sleep 100
	((var++))
done < junk.txt
echo $var

Yields :

user@machine:~/work$ ./shell.sh 
2755

user@machine:~/work$ ps -ef | grep 2755
user     2755  2488  0 05:40 pts/0    00:00:00 bash
user     2756  2755  0 05:40 pts/0    00:00:00 sleep 100 # after this, var will be incremented in shell with pid 2755 (our script).
user     2759  2528  0 05:41 pts/1    00:00:00 grep 2755

Notice the difference in the number of bash processes, and in their parent/child relationship, between the two examples.
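The same thing can be seen without ps by using bash's $BASHPID (bash 4.0 or newer), which expands to the PID of the process actually running the code, whereas $$ keeps the PID of the main shell even inside a subshell. That is why every PID_WHILE line earlier in the thread printed the same number. A sketch (show_pids is a made-up name):

```shell
#!/usr/bin/env bash
# $$ stays the PID of the main shell, even inside subshells.
# $BASHPID (bash >= 4.0) is the PID of the process running this code.
show_pids() {
  echo "main      \$\$=$$ BASHPID=$BASHPID"
  # Pipeline: the loop runs in a forked subshell, so BASHPID changes.
  echo junk | while read -r line; do
    echo "pipeline  \$\$=$$ BASHPID=$BASHPID"
  done
  # Redirect (here-string): the loop runs in the current shell.
  while read -r line; do
    echo "redirect  \$\$=$$ BASHPID=$BASHPID"
  done <<< junk
}

show_pids
```

The pipeline line should show the same $$ as the main line but a different BASHPID, while the redirect line should match the main line in both.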

Hope that helps
Regards
Peasant.

Scrutinizer · January 14, 2017, 4:03am

dae:

[..]
2. Scrutinizer, I applied your suggestion (illustration below with fnct_dae_V3) but, in that case, it seems that I can not keep the value of the variable with that syntax:

[..]
So, I allow myself to ask you one more time: how is it possible to lose the value of the variable if the whole execution use a unique PID (cf. fnct_dae_V1 and I assume fnct_dae_V3) ?

Thanks again,

As Don noted earlier, you did not apply my suggestion. Look carefully: the closing brace is in the wrong place. The echo must be inside the braces so that it runs in the same subshell as the loop:

wrong:

    done
  }
  echo "- OUTSIDE WHILE:var:$var"
}

right:

    done
    echo "- OUTSIDE WHILE:var:$var"
  }
}
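Even with the echo inside the braces, the value exists only within that brace group, because the group as a whole still runs in a subshell. If the count is needed later in the function, one option is to print it from inside the group and capture it with command substitution. A sketch, with made-up input lines and a hypothetical function name count_completed:

```shell
#!/usr/bin/env bash
count_completed() {
  local var
  # The pipeline still runs in a subshell; only its printed output comes
  # back, and $( ) stores that output in var in the current shell.
  var=$(printf '%s\n' 'step 1 completed' 'step 2 running' 'step 3 completed' |
    {
      n=0
      while IFS= read -r line; do
        if [[ $line == *completed* ]]; then
          n=$((n + 1))
        fi
      done
      echo "$n"   # printed from the same subshell as the loop
    }
  )
  echo "- OUTSIDE WHILE:var:$var"
}

count_completed   # prints "- OUTSIDE WHILE:var:2"
```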