Help with executing awk and While loop

machomaddy · March 18, 2013, 6:07am

Hi All,
I have a file say, sample.txt

Source Name: xxx
Department|Revenue
1001|3252
1002|3345

I am using the above file in one of my scripts. I need to read the file from line 3 onwards and do some processing.

My script has this code:

awk 'NR > 2' sample.txt | while read Dep; do awk -F '|' '{print $1}' ; done

The Output is:

1002   #2nd Department Name in Line 4

Expected is:

1001   #1st Department Name in Line 3
1002   #2nd Department Name in Line 4

The above command ignores line 3 and starts from line 4 even though I gave awk 'NR > 2'. Not sure why this happens.

I just want to understand why awk skips line 3 even though 'NR > 2' is specified :o

PikK45 · March 18, 2013, 6:24am

I did this, and got your expected output!!

echo "Source Name: xxx
Department|Revenue
1001|3252
1002|3345" | awk -F"|" 'NR>2 {print $1}'

machomaddy · March 18, 2013, 7:47am

I tried... it's not working. The while loop is not reading the 1st line. I even created a temp file with the Dept names alone and passed it to the while loop as below

while read i
do
.
.
.
done < temp.txt

While is skipping the 1st line :rolleyes:

---------- Post updated at 05:17 PM ---------- Previous update was at 05:09 PM ----------

OKAY...Got it!!!

WRONG CODE

awk 'NR > 2' sample.txt | while read Dep; do awk -F '|' '{print $1}' ; done

CORRECT CODE

awk 'NR > 2' sample.txt | while read Dep; do
echo "$Dep" | awk -F '|' '{print $1}'
done

[SOLVED]

Don_Cragun · March 18, 2013, 8:10am

OK. Let's analyze what is happening. We have a file named sample.txt containing:

Source Name: xxx
Department|Revenue
1001|3252
1002|3345

and we have a script (reformatted and line numbers added for discussion):

1  awk 'NR > 2' sample.txt |
2  while read Dep;
3  do awk -F '|' '{print $1}' ;
4  done

So, first, the awk on line 1 reads and discards the first two lines from sample.txt and writes the last two lines from sample.txt into the pipe feeding line 2.

The read on line 2 sets Dep to 1001|3252 (consuming the 3rd line from sample.txt that was the 1st line output by the awk on line 1).

Then (since no other input file is specified) the awk on line 3 reads the remainder of the output from the awk on line 1 and prints 1002 .

Then the next call to read on line 2 hits EOF and terminates the while loop.
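
The sequence above can be reproduced with a small sketch. It assumes a POSIX shell with mktemp available; the function name demo_stdin_leak and the temporary file are illustrative only, and cat stands in for any command that reads standard input when given no file operand (as the awk on line 3 does).

```shell
# Illustrative sketch: demo_stdin_leak is a hypothetical name, not from the thread.
demo_stdin_leak() {
    tmp=$(mktemp) || return 1
    # Recreate the four-line sample.txt from the question in a temporary file.
    printf '%s\n' 'Source Name: xxx' 'Department|Revenue' '1001|3252' '1002|3345' > "$tmp"

    awk 'NR > 2' "$tmp" | while read -r Dep; do
        echo "read consumed: $Dep"
        # cat has no file operand, so it inherits the loop's stdin (the pipe)
        # and drains every remaining line before read can run again.
        echo "inner command saw: $(cat)"
    done

    rm -f "$tmp"
}

demo_stdin_leak
```

This should print "read consumed: 1001|3252" followed by "inner command saw: 1002|3345", and the loop body runs only once because nothing is left in the pipe for the second read.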

If you want the awk on line 3 to process the data stored in $Dep , you could change your script to something like:

awk 'NR > 2' sample.txt | while read Dep; do echo "Processing $Dep"; echo "$Dep" | awk -F '|' '{print $1}' ; done

Note what I've added to your script: echo "Processing $Dep"; to show each line as it is read, and echo "$Dep" | to feed that line to the second awk. This modified script produces the output:

Processing 1001|3252
1001
Processing 1002|3345
1002

I hope this helps,
Don

PikK45 · March 18, 2013, 8:20am

All of this could have been done in a single command. No need for echo, multiple awk calls, and while loops.

 
awk -F"|" 'NR>2 { print "Processing "$0; print $1;}' file

Don_Cragun · March 18, 2013, 8:42am

Of course, I agree.

I assumed machomaddy plans to perform other processing on each (non-header) line of an input file and that the second awk was a placeholder for some other processing.

If all that machomaddy wants is to print up to the first vertical bar in every line after line 2 from an input file:

sed -e '1,2d' -e 's/|.*//' sample.txt

would be much more efficient.

PikK45 · March 18, 2013, 8:44am

I agree

Scrutinizer · March 18, 2013, 9:55am

An all shell approach would be:

{ 
  read; read
  while IFS="|" read dep rev
  do
    printf "%s\n" "$dep"
  done
} < infile

machomaddy · March 18, 2013, 11:07am

Thanks Don!! Your explanation was much appreciated. I realized the fault... I posted the correct code just before your 1st reply.
You were right, I had a few operations to be performed in the second awk!!!

Don_Cragun · March 18, 2013, 2:42pm

Unless you have other things that have to be done by something other than awk between your calls to awk, it would be much more efficient to do all of the work in a single awk script rather than calling awk n-1 times (where n is the number of lines in your input file).
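
As a rough illustration of that point, the sketch below does the header skipping and the per-line work inside one awk process. The per-line work shown (printing the department and summing the revenue column) is only a stand-in, since the thread never says what the real processing is; the function name single_awk_demo and the temporary file are likewise hypothetical.

```shell
# Hypothetical stand-in for the real per-line processing, which the thread does not describe.
single_awk_demo() {
    tmp=$(mktemp) || return 1
    # Recreate the four-line sample.txt from the question in a temporary file.
    printf '%s\n' 'Source Name: xxx' 'Department|Revenue' '1001|3252' '1002|3345' > "$tmp"

    # One awk process: skip the two header lines, handle each data line,
    # and print a total once the input is exhausted.
    awk -F '|' '
        NR > 2 {
            print "Department " $1 " revenue " $2
            total += $2
        }
        END { print "Total revenue " total }
    ' "$tmp"

    rm -f "$tmp"
}

single_awk_demo
```

With the sample data this starts awk once, however many lines the file has, instead of once per data line.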