My next play would be a tool that tails the file (stdin) and writes periodic line counts to a log (stdout), so you could start it and just check the log occasionally. It could be scripted as I described above, or written in Perl or C. You could even have it emit a carriage return as the line separator and just glance at a dedicated xterm, where each new count overwrites the last.
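A minimal C sketch of that idea (the 5-second interval and 256 KB buffer are just my picks, not anything magic): it reads stdin in big blocks, counts newlines, and every few seconds prints the running total with a bare '\r' so the number overwrites itself in the xterm.

#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define BUFSZ    (256 * 1024)   /* big read buffer */
#define INTERVAL 5              /* seconds between reports */

int main(void)
{
    static char buf[BUFSZ];
    long long lines = 0;
    ssize_t n;
    time_t last = time(NULL);

    while ((n = read(STDIN_FILENO, buf, sizeof buf)) != 0) {
        if (n < 0) {
            perror("read");
            return 1;
        }
        char *p = buf, *end = buf + n;
        while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
            lines++;
            p++;
        }
        if (time(NULL) - last >= INTERVAL) {
            /* '\r' so the count overwrites itself on one xterm line */
            printf("%lld\r", lines);
            fflush(stdout);
            last = time(NULL);
        }
    }
    printf("%lld total\n", lines);
    return 0;
}

Run it as ./linecount < bigfile > count.log, or feed it from tail -f if the file is still being written.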
PS: the right buffer size for fwcl varies by system, so it might be nice to try sizes from 8 KB up and see how the timing varies. You want each read to drain any disk cache or controller block, but not exceed it. Since many files are written sequentially to the media, big blocks mean fewer seeks and let the other advantages of sequential reads be exploited.
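The fwcl source isn't in this post, so here is a stand-in you could time at different block sizes (the name blkcount and its arguments are mine); e.g. time it in a loop over 8K, 16K, 32K, ... and watch where the curve flattens.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s file bufsize\n", argv[0]);
        return 2;
    }
    size_t bufsz = (size_t)atol(argv[2]);
    char *buf = malloc(bufsz);
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0 || buf == NULL) {
        perror(argv[1]);
        return 1;
    }
    long long lines = 0;
    ssize_t n;
    while ((n = read(fd, buf, bufsz)) > 0) {
        char *p = buf, *end = buf + n;
        while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
            lines++;
            p++;
        }
    }
    printf("%lld\n", lines);
    close(fd);
    free(buf);
    return 0;
}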
I suppose you could partition the file and use separate processes or threads to count each segment. The advantage probably dies after two threads, as the disk I/O saturates. However, as the on-disk layout gets less sequential, this might help by queuing a lot of requests, letting a good disk queue manager sweep the carriage in and out, satisfying block requests in cylinder order and keeping the queue on every SCSI spindle from going empty.
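A rough pthreads sketch of that partitioning, assuming pread() and a fixed two-way split (compile with -lpthread; the segment count and buffer size are arbitrary). Splitting at arbitrary byte offsets is fine here because each '\n' still gets counted exactly once.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/stat.h>

#define NTHREADS 2
#define BUFSZ    (256 * 1024)

struct seg {
    const char *path;
    off_t start, len;
    long long lines;
};

static void *count_seg(void *arg)
{
    struct seg *s = arg;
    int fd = open(s->path, O_RDONLY);   /* private fd, so no shared offset */
    char *buf = malloc(BUFSZ);
    if (fd < 0 || buf == NULL) {
        free(buf);
        if (fd >= 0) close(fd);
        return NULL;
    }
    off_t off = s->start, end = s->start + s->len;
    while (off < end) {
        size_t want = (size_t)(end - off < BUFSZ ? end - off : BUFSZ);
        ssize_t n = pread(fd, buf, want, off);
        if (n <= 0)
            break;
        char *p = buf, *q = buf + n;
        while ((p = memchr(p, '\n', (size_t)(q - p))) != NULL) {
            s->lines++;
            p++;
        }
        off += n;
    }
    free(buf);
    close(fd);
    return NULL;
}

int main(int argc, char **argv)
{
    struct stat st;
    if (argc != 2 || stat(argv[1], &st) != 0) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 2;
    }
    struct seg segs[NTHREADS];
    pthread_t tid[NTHREADS];
    off_t chunk = st.st_size / NTHREADS;
    long long total = 0;

    for (int i = 0; i < NTHREADS; i++) {
        segs[i].path = argv[1];
        segs[i].start = i * chunk;
        segs[i].len = (i == NTHREADS - 1) ? st.st_size - segs[i].start : chunk;
        segs[i].lines = 0;
        pthread_create(&tid[i], NULL, count_seg, &segs[i]);
    }
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(tid[i], NULL);
        total += segs[i].lines;
    }
    printf("%lld\n", total);
    return 0;
}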
You can get the size cheaply with ls -l, and there is very likely a fairly stable average line length, but if you just have to know the exact line count, an estimate will not satisfy that daemon, which is not logic but psychology.
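If an estimate were acceptable, it is just the ls -l size divided by an average line length sampled off the front of the file. A quick C sketch (the 10000-line sample size is arbitrary):

#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define SAMPLE_LINES 10000      /* arbitrary sample size */

int main(int argc, char **argv)
{
    struct stat st;
    FILE *fp;
    char line[65536];
    long long sampled = 0, bytes = 0;

    if (argc != 2 || stat(argv[1], &st) != 0 || (fp = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 2;
    }
    while (sampled < SAMPLE_LINES && fgets(line, sizeof line, fp) != NULL) {
        bytes += (long long)strlen(line);
        sampled++;
    }
    fclose(fp);
    if (sampled == 0 || bytes == 0)
        return 1;
    /* estimate = file size / average sampled line length */
    printf("~%lld lines (avg line %lld bytes)\n",
           (long long)st.st_size * sampled / bytes, bytes / sampled);
    return 0;
}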
Once I wrote a tool that took file names from stdin, mmap64()'d each file, did a string search in the map, and then munmap()'d it. With a long file list, it was amazingly good at stopping every other process dead: everything else got rolled out (paged out). So mmap() is the fastest way to scan a file, but this line-count task is not the foremost priority of this system.
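Reconstructed from memory, the shape of that tool was roughly this (the fixed search string is hypothetical; compile with -D_FILE_OFFSET_BITS=64 so plain mmap()/munmap() handle big files). Note the warning above: mapping a long list of big files can roll everything else out.

/* reads file names from stdin, mmap()s each, searches for PATTERN */
#define _GNU_SOURCE             /* for memmem() (GNU/BSD extension) */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

#define PATTERN "needle"        /* hypothetical search string */

int main(void)
{
    char path[4096];

    while (fgets(path, sizeof path, stdin) != NULL) {
        path[strcspn(path, "\n")] = '\0';

        int fd = open(path, O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) != 0 || st.st_size == 0) {
            if (fd >= 0)
                close(fd);
            continue;
        }
        void *map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);
        if (map == MAP_FAILED)
            continue;

        /* print the name of any file containing the pattern */
        if (memmem(map, (size_t)st.st_size, PATTERN, strlen(PATTERN)) != NULL)
            printf("%s\n", path);

        munmap(map, (size_t)st.st_size);
    }
    return 0;
}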