Shell script explanation

newuser21 · February 1, 2018, 4:27pm

Hey,
can someone explain this script to me?

i=0
while read WORT
do
 echo $WORT|grep a>/dev/null || echo$WORT|grep B>dev/null || let i=$i+1
done
echo $i

The first line initializes the variable i with the value 0.
The line inside the loop has 3 different options because of ||. The only option I understand
is that i is incremented by 1 every time the other two options don't apply.

The last line just prints the value of i after the while loop is done.

I don't understand this part of the line: " echo $WORT|grep a>/dev/null || echo$WORT|grep B>dev/null "

rdrtx1 · February 1, 2018, 5:06pm

The script tries to count how many input lines contain neither the letter a nor the letter B . When reading from the terminal, the while loop does not stop until it reaches end-of-file, though. The second grep should redirect to /dev/null (the slash is missing). There should also be a space after echo . A slight change could be:

i=0

WORT=0

while read WORT && [ -n "$WORT" ]
do
     echo $WORT|grep a>/dev/null || echo $WORT|grep B>/dev/null || let i=$i+1
done

echo $i

The read loop will stop with an empty input.

newuser21 · February 1, 2018, 5:37pm

I still don't really understand it. In my solution there are 15 files:
Abbe, Ananas, Apfel, Apfelsine, Asterix, Backen, Berg, Burg, Hacken, Halle,
Huepfen, Obelix, Schuber, Werbung, Barbier

To use this script, the command would be ls | /home/notroot/scripts/myscript.sh

16 times read WORT
15 times echo $WORT|grep a>/dev/null
10 times echo $WORT|grep B>/dev/null
8 times let i=$i+1
1 time echo $i

Can you tell me why it would run 15 times for a and 10 times for B?

rdrtx1 · February 1, 2018, 5:47pm

8 times let i=$i+1 is correct. In the list of file names evaluated, there are 8 names that contain neither a lowercase a nor an uppercase B .
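As a quick check, here is a rough sketch that counts how often each part of the line runs for the 15 names listed above. The counter names (n_a, n_B, n_let) and the tally function are made up for this illustration, and it uses $(( )) arithmetic instead of let so that it also runs in a plain POSIX sh:

```shell
#!/bin/sh
# Tally how often each part of the OR-list runs.
# tally, n_a, n_B and n_let are hypothetical names used only for this sketch.
tally() {
    n_a=0 n_B=0 n_let=0
    while read WORT
    do
        n_a=$((n_a + 1))                 # grep a runs for every name
        echo "$WORT" | grep a >/dev/null && continue
        n_B=$((n_B + 1))                 # grep B runs only if grep a failed
        echo "$WORT" | grep B >/dev/null && continue
        n_let=$((n_let + 1))             # reached only if both greps failed
    done
    echo "$n_a $n_B $n_let"
}

names="Abbe Ananas Apfel Apfelsine Asterix Backen Berg Burg Hacken Halle Huepfen Obelix Schuber Werbung Barbier"
printf '%s\n' $names | tally             # prints: 15 10 8
```

The three numbers correspond to the 15, 10 and 8 in the question. The 16th read is the one that hits end-of-file and ends the loop.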

bakunin · February 1, 2018, 5:48pm

Yes, but this might lead to incorrect results. Consider this input:

line a
line B

another line

The while loop would terminate at line 3, no?
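A small sketch to illustrate the point, assuming the four input lines above (the third one is empty). The two function names are invented for this example; each one only counts how many lines its loop actually processes:

```shell
#!/bin/sh
# lines_until_empty and lines_until_eof are hypothetical helper names.
lines_until_empty() {
    n=0
    while read WORT && [ -n "$WORT" ]    # stops at the first empty line
    do
        n=$((n + 1))
    done
    echo "$n"
}

lines_until_eof() {
    n=0
    while read WORT                      # stops only at end-of-file
    do
        n=$((n + 1))
    done
    echo "$n"
}

input='line a
line B

another line'

printf '%s\n' "$input" | lines_until_empty   # prints 2
printf '%s\n' "$input" | lines_until_eof     # prints 4
```

With the extra [ -n "$WORT" ] test, everything after the empty line is silently ignored.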

Furthermore:

WORT=0

does not make sense with a variable which is supposed to take string values afterwards.

WORT=""

or, depending on the shell used, one of these:

local WORT=""
typeset WORT=""
declare WORT=""

would make more sense.

I hope this helps.

bakunin

Don_Cragun · February 1, 2018, 5:49pm

What is it that you don't understand?

Is it the echo $WORT|grep a>/dev/null you don't understand or is it that you don't understand what || does as a command separator?

newuser21 · February 1, 2018, 5:56pm

It might be ||

With the explanation from earlier, I get that grep a>/dev/null checks whether the file name contains the letter a, right?
Is the reason for the 15 times that it is the first instruction? If yes, then I guess I understand this part. But what about B?

rdrtx1 · February 1, 2018, 6:12pm

Of the 15 names, 10 do not contain a lowercase "a" ( grep a fails), so they go on to grep B .

newuser21 · February 1, 2018, 6:50pm

Ah, now I understand this part. But was I right about the 15 times? Is it because it is the first instruction?

rdrtx1 · February 1, 2018, 7:27pm

Yes. All 15 names get evaluated by grep a first.

Don_Cragun · February 2, 2018, 1:57am

Let us go back to your original code (with a space added between echo and $WORT and with the missing slash character added before dev/null ):

i=0
while read WORT
do
 echo $WORT|grep a>/dev/null || echo $WORT|grep B>/dev/null || let i=$i+1
done
echo $i

You know what the first line does.

The read WORT reads the next line from standard input; removes the trailing <newline>; and, as long as the last character in the remainder of that line is not a backslash ( \ ) character, returns an exit code 0, which causes the while to run the code between the following do and done . If the last character of a line is a backslash character, read will also remove the backslash character; read the next line from standard input; and append its contents to WORT, repeating this until a line is found that does not end with a backslash character. The loop continues until an I/O error is encountered or EOF is reached while reading from standard input. Either of these conditions will cause read to exit with a non-zero exit status, causing the loop to be terminated.
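To see the line joining in action, here is a minimal sketch. The two-line input (Ana\ followed by nas) is made up for the demonstration, and printf '%s\n' is used for output only so that the backslash is shown unchanged in every shell:

```shell
#!/bin/sh
# The input is two physical lines; the first one ends with a backslash.
# Without -r, read joins them into one logical line.
printf 'Ana\\\nnas\n' | { read WORT; printf '%s\n' "$WORT"; }      # prints: Ananas

# With -r, the backslash is kept and only the first physical line is read.
printf 'Ana\\\nnas\n' | { read -r WORT; printf '%s\n' "$WORT"; }   # prints: Ana\
```

If the input never contains backslashes, both forms behave the same way.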

Moving on to the grep commands inside the loop... The command:

echo $WORT|grep a>/dev/null

uses echo to write one line (after read has removed trailing backslash characters and merged lines) to grep a > /dev/null .
The grep utility will read that line and, if it contains one or more lowercase Latin letter a ( a ) characters, it will copy that line to its standard output (in this case redirected to /dev/null ) and return success (i.e., a 0 exit status). Otherwise, nothing will be written to standard output and grep will return failure (i.e., a non-zero exit status).
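A short sketch of that exit status, using two of the names from the list above. The helper name contains_a is hypothetical and only wraps the pipeline from the script:

```shell
#!/bin/sh
# contains_a is an invented helper; it returns the exit status of grep.
contains_a() {
    echo "$1" | grep a >/dev/null
}

for WORT in Ananas Berg
do
    if contains_a "$WORT"
    then echo "$WORT: grep succeeded (exit status 0)"
    else echo "$WORT: grep failed (exit status $?)"
    fi
done
# prints:
# Ananas: grep succeeded (exit status 0)
# Berg: grep failed (exit status 1)
```

Only the exit status matters here; the matched line itself is thrown away by the redirection to /dev/null. grep -q would have the same effect without the redirection.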

In an OR-list (i.e., a list of pipelines separated by the shell || operator), the first pipeline in the list is executed and its exit status is evaluated. If that pipeline succeeds, the OR-list is complete at that point; any remaining pipelines in the OR-list will be skipped; and the exit status of the OR-list will be success (i.e., exit code 0). If the first pipeline in the OR-list failed (i.e., returned a non-zero exit code), the next pipeline in the list is executed and its exit status is evaluated just like the first pipeline in the list. If no pipeline in the list succeeded, the exit status of the OR-list will be the exit status of the last pipeline executed. If any pipeline in the OR-list succeeded, the exit status of the OR-list will be success (i.e., 0).
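These rules can be tried out with a minimal sketch that uses the standard true and false utilities in place of the grep pipelines. The messages and the exit code 3 are arbitrary choices for the illustration:

```shell
#!/bin/sh
# true always succeeds and false always fails, so they make the rule easy to see.
true  || echo "not printed: the first pipeline succeeded"
false || true  || echo "not printed: the second pipeline succeeded"
false || false || echo "printed: every earlier pipeline failed"

# The status seen after a failed list is that of the last pipeline executed.
false || sh -c 'exit 3' || echo "status of the list so far: $?"
# prints: status of the list so far: 3
```

In the script, let i=$i+1 plays the role of the last echo: it only runs when both grep pipelines before it have failed.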

An OR-list could also be written as a nested if statement. One producing the same results as the OR-list in the loop in your code could be written as:

	if echo $WORT|grep a>/dev/null
	then	continue
	else 	if echo $WORT|grep B>/dev/null
		then	continue
		else	let i=$i+1
		fi
	fi

As we can see from this (if it wasn't obvious when reading the OR-list in your code), this increments the value of the variable i for each line read from your script's standard input that contains neither a lowercase Latin a nor an uppercase Latin B.

Note that executing grep once or twice for each line in a text file containing lots of input lines is a slow and compute-intensive operation. You haven't given us any sample input data and you haven't given us any indication of the sizes of the input files you will be processing, so we can only make wild comments about possible alternatives... For example, if your input files do not contain any continuation lines (i.e., lines ending with a backslash character immediately followed by a <newline> character), your entire script could be replaced by a much simpler (and, for large files, much faster) single grep command:

grep -cv '[aB]'
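As a sanity check, here is that command run on the 15 file names posted earlier in this thread (assuming one name per line, as ls produces when writing to a pipe):

```shell
#!/bin/sh
# -v selects the lines that do NOT match, -c prints their count instead of the lines.
names="Abbe Ananas Apfel Apfelsine Asterix Backen Berg Burg Hacken Halle Huepfen Obelix Schuber Werbung Barbier"
printf '%s\n' $names | grep -cv '[aB]'     # prints 8
```

The bracket expression [aB] matches a line that contains a lowercase a or an uppercase B, so the inverted count is the number of lines containing neither.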

If your input does contain continuation lines, your input files are relatively long, and you want continuation lines to be merged into single lines before looking for the characters you're trying to match, you could still replace your script with an awk script that would join continuation lines (like read does when invoked without the -r option), look for the two patterns on joined lines and increment a count for lines that don't match either pattern, and print the count when it hits EOF on its input. (I'll leave writing this simple awk script as an exercise for the reader. With no description of the data that your script will be processing, there is no reason to waste the time writing a script like that if your input files will never contain continuation lines.)
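For anyone who wants a starting point for that exercise, here is one possible sketch, wrapped in a shell function with the invented name count_joined. It is a simplification and not a full imitation of read: it only joins lines that end in a single backslash, and it does not handle backslashes elsewhere in a line or a line ending in an escaped backslash ( \\ ). The four-line test input is made up:

```shell
#!/bin/sh
# count_joined is a hypothetical name; buf, line and n are arbitrary awk variables.
count_joined() {
    awk '
        # Line ends with a backslash: drop it and keep collecting.
        /\\$/ { buf = buf substr($0, 1, length($0) - 1); next }
        # Complete logical line: count it if it has neither a nor B.
              { line = buf $0; buf = ""
                if (line !~ /[aB]/) n++ }
        END   { print n + 0 }
    '
}

# "xyz\" and "abc" are joined into "xyzabc", which contains an a.
printf 'xyz\\\nabc\nfoo\nBar\n' | count_joined        # prints 1
printf 'xyz\\\nabc\nfoo\nBar\n' | grep -cv '[aB]'     # prints 2
```

The difference between the two results comes from the first physical line, which grep counts on its own but which the awk version merges with the line after it.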

I hope this helps you better understand what your script is doing.