Pardon me if I'm posting a duplicate thread, but...
I have a text file with over 150 million records; the file size is in the MB range (close to a GB).
The requirement is to read ALL the lines except the FIRST LINE, which is the file header, and the LAST LINE, which is its trailer record.
What is the most efficient way to do it?
I'm aware that a sed solution will take a significantly long time to process such a huge file, hence I'm not opting for it.
You're not really giving much information, but you could always start by keeping a count of the records being processed and throwing out the first and last.
totalrecs=$(wc -l < inputfile)
will give you the total number of records in the file. (Redirect stdin so wc prints only the count; `wc -l inputfile | read totalrecs` would capture the filename too, and the pipe-into-read trick only works in ksh anyway.) So, ...
recordCount=0
totalrecs=$(wc -l < filename)    # stdin redirect: wc prints only the count
while read -r rec
do
    ((recordCount+=1))
    # skip the first record (the header)
    if ((recordCount == 1)) ; then
        continue
    fi
    # stop before the last record (the trailer)
    if ((recordCount == totalrecs)) ; then
        break
    fi
    # ... your other processing goes here
done < filename
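Bear in mind that a shell while-read loop is usually far slower than sed or awk on a file with 150 million lines, since the shell runs the loop body once per record. If you don't actually need per-record shell logic, a minimal streaming alternative, assuming GNU coreutils (head's negative line count is a GNU extension, not POSIX):

tail -n +2 filename | head -n -1

tail -n +2 starts output at line 2 (dropping the header) and head -n -1 drops the final line (the trailer); both tools stream, so memory use stays constant no matter how big the file is.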
Tested with a 2 GB file (output sent to /dev/null, since writing to a file should cost roughly the same for all approaches):
$ time sed '1d;$d' greptestin1 > /dev/null
real 0m29.835s
user 0m29.186s
sys 0m0.591s
$ time awk 'NR>2{print p}{p=$0}' greptestin1 > /dev/null # BSD awk
real 1m44.183s
user 1m43.627s
sys 0m0.481s
$ time mawk 'NR>2{print p}{p=$0}' greptestin1 > /dev/null
real 0m14.982s
user 0m14.463s
sys 0m0.498s
$ time gawk 'NR>2{print p}{p=$0}' greptestin1 > /dev/null
real 0m24.682s
user 0m24.210s
sys 0m0.414s
$ time gawk4 'NR>2{print p}{p=$0}' greptestin1 > /dev/null
real 0m27.621s
user 0m27.173s
sys 0m0.419s
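For reference, the awk program benchmarked above strips the header and trailer in a single pass by buffering one line: each record is printed only after the next one has been read, so the last line is never emitted, and the NR>2 guard suppresses the header. The same logic, spelled out:

awk '
    NR > 2 { print p }   # from record 3 on, emit the line saved on the previous pass
    { p = $0 }           # always buffer the current line; the final one is never flushed
' greptestin1

Like sed '1d;$d', this needs no advance line count, which is why both can stream straight through the file without a separate wc pass.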