Read command restarts at beginning of file

I am reading in a file in a Korn shell script. The file has one header row followed by possibly millions of data rows. I want to do one thing to the header row and write it out to a new output file, then something different to the data rows. My code looks like this:

read header < $infile 
    lineout=...$header...
echo $lineout >> $outfile
 
while read dataline
do
   lineout=....$dataline.....
   echo $lineout >> $outfile
done < $infile
 

But when I do this, the while loop reads the first line again, so I get the header line twice. Either the file pointer goes back to the start of the file or the file is closed and reopened; I'm not sure which. I know I could put the header logic inside the while loop and test for the first line there, but I'd rather not run that test millions of times when only the first line needs different processing. Why doesn't the reader move on to the next line? I assume the "done < $infile" part forces it to start at the beginning again. Is there any way to make this approach work?

Every time you write <$infile, the shell opens the file again, starting at the beginning; that is simply what the redirection means.
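A small sketch of that behaviour, in case it helps to see it in isolation. The scratch file and its contents are made up for the demo, and mktemp is assumed to be available on your system:

```shell
# Demo: each "< file" redirection opens the file afresh,
# so two separate reads both see line 1.
demo=$(mktemp)                      # hypothetical scratch file
printf '%s\n' 'HEADER' 'row1' 'row2' > "$demo"

read -r first  < "$demo"            # opens the file, reads line 1, closes it
read -r second < "$demo"            # opens it again, so reading starts at offset 0

echo "first=$first second=$second"  # prints: first=HEADER second=HEADER
rm -f "$demo"
```

Both variables end up holding the header, which is the same thing that happens between your read and your while loop.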

If you want to open the file only once, you can do this:

exec 5<"$infile" # Open the file on file descriptor 5

read header <&5 # Read from file descriptor 5.  It will remember its place.

while read dataline
do
...
done <&5

exec 5<&- # Close file descriptor 5
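Here is the same idea as a self-contained sketch you can run as-is. The sample data and the HDR:/ROW: prefixes are placeholders standing in for your real header and data-row processing, and mktemp is assumed to be available:

```shell
# Sketch: open the input once on fd 5 so the header read and the loop
# share one file offset.
infile=$(mktemp); outfile=$(mktemp)          # hypothetical scratch files
printf '%s\n' 'id,name' '1,ann' '2,bob' > "$infile"

exec 5< "$infile"                            # open once on file descriptor 5

IFS= read -r header <&5                      # consumes line 1 only
printf 'HDR:%s\n' "$header" >> "$outfile"    # placeholder header processing

while IFS= read -r dataline                  # continues from line 2
do
    printf 'ROW:%s\n' "$dataline"            # placeholder row processing
done <&5 >> "$outfile"

exec 5<&-                                    # close file descriptor 5

result=$(cat "$outfile")
rm -f "$infile" "$outfile"
```

Redirecting the loop's output once at the done line, rather than on every echo, also avoids reopening the output file for each of the millions of rows.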

A redirection is opened, used for reading or writing, and closed for every single command (simple or compound) that uses it (see, e.g., man bash).
You're redirecting stdin twice, so $infile is opened (and reset) twice. How about collecting all of the above into one "group command" and redirecting its stdin?


Thank You Corona688. That's exactly what I was looking for. Works fine now.


Since there is only one input file and one output file in your original code:

read header < $infile 
    lineout=...$header...
echo $lineout >> $outfile
 
while read dataline
do
   lineout=....$dataline.....
   echo $lineout >> $outfile
done < $infile

you could put all of your script in a group and redirect the input and output of the group instead of redirecting each command in the group:

{   read header
    lineout=...$header...
    echo "$lineout"

    while read dataline
    do
        lineout=....$dataline.....
        echo "$lineout"
    done
} < "$infile" >> "$outfile"
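If you want to convince yourself that the grouped form handles the header only once, a quick check along these lines should do it. The sample data and the HDR:/ROW: prefixes are invented for the test, and mktemp is assumed to be available:

```shell
# Check: with one redirection on the whole group, the header is read once
# and the loop picks up at the second line.
infile=$(mktemp); outfile=$(mktemp)      # hypothetical scratch files
printf '%s\n' 'id,name' '1,ann' '2,bob' > "$infile"

{
    IFS= read -r header
    printf 'HDR:%s\n' "$header"          # placeholder header processing

    while IFS= read -r dataline
    do
        printf 'ROW:%s\n' "$dataline"    # placeholder row processing
    done
} < "$infile" >> "$outfile"

header_count=$(grep -c '^HDR:' "$outfile")
row_count=$(grep -c '^ROW:' "$outfile")
echo "headers=$header_count rows=$row_count"   # prints: headers=1 rows=2
rm -f "$infile" "$outfile"
```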