However, it is a very slow way to read a file line by line.
E.g. in a file that has 3 columns and fewer than 400 rows, like this:
I run the following script:
cat $line | while read line; do              ## Reads each line
    grup=`echo "$line" | cut -d " " -f3`     ## Takes third column
    if [ "$grup" == "27" ]; then             ## If column 3 == "27", print column 2
        exp=`echo "$line" | cut -d " " -f2`
        echo $exp
    fi
done
Using the "time" command, it takes:
That is a huge waste of time just to read fewer than 400 rows. Is there any way to make it faster?
I have occasionally used awk to process a file line by line, and it is much faster. Why? Any hints on reading a file in bash?
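For reference, the awk version of this kind of filter is a one-liner along these lines (the sample file and its contents are made up, just to illustrate the pattern):

```shell
# Illustrative three-column input file (hypothetical data).
cat > sample.txt <<'EOF'
a1 b1 10
a2 b2 27
a3 b3 27
a4 b4 30
EOF

# Print column 2 wherever column 3 equals "27". A single awk
# process handles every line, so there are no per-line forks.
awk '$3 == "27" { print $2 }' sample.txt
```

For the sample data above this prints `b2` and `b3`, one per line.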
You spend a lot of time looping round and calling cut again and again. Each time, you start a new process, so the system spends effort there. The awk answer is probably the way to go if you are comfortable with it; however, you can simplify your script by using the read statement better:-
cat $line | while read first second third rest; do   ## Reads each line into separate variables
    if [ "$third" == "27" ]; then                    ## If column 3 == "27", print column 2
        echo $second
    fi
done
I did an "Ask Jeeves" search with +bash +read specified and got quite a few examples.
As for the time command, have a read of the man page. The main figure, though, is real, as this is the elapsed time you will actually experience.
while read first second third rest; do   ## Reads each line into separate variables
    if [ "$third" == "27" ]; then        ## If column 3 == "27", print column 2
        echo $second
    fi
done < $line
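A self-contained run of that redirected loop, using a made-up three-column file, might look like this (file name and data are illustrative):

```shell
# Illustrative three-column input (names and values are made up).
cat > groups.txt <<'EOF'
x1 alpha 27
x2 beta 12
x3 gamma 27
EOF

# Redirecting the file into the loop keeps everything in the
# current shell: one read builtin per line, no cat, no cut.
while read first second third rest; do
    if [ "$third" = "27" ]; then
        echo "$second"
    fi
done < groups.txt
```

With the sample data this prints `alpha` and `gamma`.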
vivek d r, I actually didn't write a script. I just ran the whole thing from the command line, preceded by time, i.e.:
rbatte1, do I understand correctly that you recommend I learn awk for scripting?
I ran the command Franklin52 posted (using awk), and the results were awesome.
And using redirection (while.... < $file), the results were also very good:
Why is there so much difference in performance between using redirection and using pipes as I did? Could it be because with redirection the whole script runs in one shell, while with pipes (cat $file | while...) several shells are used?
And even more, why does awk (which is an external program) have better performance than bash built-in commands?
Thank you very much for all the answers, they have helped me a lot. And sorry for my English.
The main reason is that your original had the following logic:-
Start a process to read a line from the input
Start a process to perform the cut (*1)
Do a compare looking for value 27
If we match, start a process for another cut (*2)
Display the result
Start from the top to read the next line
For a 400-line file, you are forcing 400 cut processes to be run for *1 and another set for the cut in *2.
Depending on your shell, you might start 400 read processes, plus 400 echo processes in *1 and more for *2 for each line matching value 27.
All of this generates vast amounts of work just in the overheads. I'm not very good with awk myself but it all runs in a single process so is excellent if you can invest the time to get into the syntax. My variation removed many of these processes, but probably could still be improved. Every process launch requires memory to be allocated, perhaps logs to be written, paging/swap space to be altered etc, so before it actually does anything, there is a significant processing overhead - and then there may be end-of-process overheads too.
The use of the cat at the front makes it more readable for some, although I'm sure purists may not agree. I suppose it depends how you describe your logic in your mind before writing code. I just tried to follow your logic with a few tweaks so it doesn't become too different and need documentation or lots of work on your part to decipher, but it's the difference between thinking:-
Working on this file, I will do these things to it, versus
I will do these things, and this is the file to work on.
By the way, I forgot to mention that I was using cygwin to run those commands and scripts, although it probably doesn't make any difference to what you told me.