Error while reading line by line.

Hi,

I have a number of files with an average of about 10,000 lines each. I have to read each file line by line and write the parsed lines into a single output file. The combined size of all the input files is around 25 MB.
I'm using a redirected while-read loop:

while read LINE
do
.
.
. 
done < $FILENAME
 

But not all the lines make it into the output file. Some lines in the big files (more than 10,000 lines) are omitted, and the script just goes on to read the next file. Files below 3,000 lines are read correctly, with nothing omitted.

I haven't got any clue as to why this is happening. Is it a memory issue or a code issue?
Can anyone give me an insight into why this is happening and how it can be rectified?

This is just an example? Is your parsing really just adding a line count? How are you iterating over the files?

As it is, this is much better:

awk '{ print NR, $0; }' file1 file2 ...

This is just an example. I just wanted to show the skeleton of the loop. There are a huge number of line-parsing conditions inside the loop.

And as for awk, the body of the loop contains many backslashes, so converting the loop gives many errors.
So I thought I would attack one problem at a time. First I want to fix the problem of lines being omitted while reading.
Is it caused by the while-read loop?

There is nothing wrong with the skeleton of the loop.

Everything else, the stuff you didn't post? I have no idea.

Post a sample of the input you have and the output you want and we'll show you how to do it.

How did you notice these lines were missing? In a text editor? One thing that comes to mind: if the lines have \r line endings and you print them to a terminal, they will seem to overwrite each other. Also, in bash I'd use read -r.
Are you using any continue or break? Maybe your logic is wrong. It would certainly be easier if you posted real code.
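
A quick way to check for DOS-style line endings (just a sketch; yourfile.dat is a placeholder for one of your input files):

head -3 yourfile.dat | od -c        # a record ending in \r \n means DOS/Windows line endings
grep -c $'\r$' yourfile.dat         # bash: count lines that end in a carriage return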

It really depends on your input files and what you do inside the loop. The only thing that will not always work correctly with this loop skeleton is the missing double quotes around "$FILENAME".
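
For what it's worth, a safer version of that skeleton (quoting the file name, and using read -r so backslashes in the data are not interpreted) would look something like this; OUTFILE is just a placeholder:

while IFS= read -r LINE
do
    # ... your parsing of "$LINE" goes here ...
    printf '%s\n' "$LINE" >> "$OUTFILE"
done < "$FILENAME"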

Provided that these are correctly-formatted unix text files, the only major stopper might be individual records that are too long, or the use of arrays (whether in Shell or awk).

The input files are double-quote delimited, fixed-length files.
I have used double quotes in this form: "${FILENAME}".dat, as the filename with its extension isn't stored in any variable.
Inside the loop, the code performs basic extraction from each field using if-conditions, the cut command and the sed command, which does not affect the while-read loop at all.
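Roughly, the body does something like this (field positions and values made up purely for illustration, this is not the real code):

while read LINE
do
    # strip the surrounding double quotes, then cut fixed-width fields
    CLEAN=$(echo "$LINE" | sed 's/^"//;s/"$//')
    FIELD1=$(echo "$CLEAN" | cut -c1-10)
    FIELD2=$(echo "$CLEAN" | cut -c11-20)

    if [ "$FIELD1" = "SOMEVALUE" ]
    then
        echo "$FIELD1 $FIELD2" >> output.dat
    fi
done < "${FILENAME}".dat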

My problem is that it works perfectly for small files, i.e. it reads every line of the small files and gives the expected output. The error only occurs for very large files of more than about 7,000 lines. So I just want to know why it works for small files but not for large files.
If you could give me suggestions on what can go wrong, I will check for that.


What do you mean by correctly-formatted unix text files?
Yes, the individual files contain around 10,000 to 30,000 lines. For files with fewer than 5,000 lines, the code works perfectly. And the output file obtained after consolidating will be around 100,000 to 200,000 lines.

You would really need to give us samples of the input files that go wrong (just a couple of anonymized lines), and some insight into the code inside the loop.

A correctly-formatted unix text file contains characters from the ASCII printable range, and each record is terminated with a single line-feed character. The line terminator is normally invisible in programs like vi.

The number of records in a file should not make any difference unless you are using arrays. What might matter is the length of the record or the number of delimited fields in the record.

Based on the figures supplied, you have 100,000 records totalling 25 MB, which gives an average record length of 262 bytes. On some unix Operating Systems, just an echo statement this long would exceed the maximum command length.
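
If you want to check whether record length is the issue, something like this will report the longest record in a file (a sketch; adjust the file name):

awk 'length > max { max = length; nr = NR } END { print "longest record:", max, "bytes at line", nr }' yourfile.dat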

What Operating System and version do you have and what Shell do you use?

How about posting the command and the matching error message?