The above is a file extracted from SQL Server, with each field delimited by <EOFD> and each row ending with <EORD>. I need to split the file into chunks of roughly 2 million rows each. This is not a normal delimited file: it is basically a single huge line, with <EORD> marking the end of each row.
Can someone please advise how I can proceed?
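For illustration only (the column values here are made up), a three-column extract of this shape would be one long line like:

    1<EOFD>John<EOFD>2017-01-01<EORD>2<EOFD>Jane<EOFD>2017-01-02<EORD>3<EOFD>...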
If you have GNU awk (gawk) or mawk, you could try something like this, which should split the file into chunks (new files ending with "-chunknr") of 20,000,000 rows, where the last file contains the remaining rows:
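The command itself did not survive in this excerpt, but a minimal sketch of the kind of gawk/mawk invocation described (both support a multi-character RS, which POSIX awk does not) could look like this; bigfile is a placeholder for the actual file name:

    awk -v n=20000000 '
      BEGIN { RS = "<EORD>"; ORS = "<EORD>" }   # records are terminated by <EORD>
      NR % n == 1 {                             # every n records, start a new chunk file
        if (out) close(out)
        out = FILENAME "-" ++chunk
      }
      { print > out }                           # print re-appends <EORD> via ORS
    ' bigfile

This writes bigfile-1, bigfile-2, and so on; if anything (even a trailing newline) follows the final <EORD>, it will show up as one extra short record in the last chunk.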
Thanks. The command works fine. Just one thing: it takes a really long time to split a file of, say, 3 GB. Is there a workaround for this? And just a small correction: I changed n=20000000 to n=2000000, as I need files in chunks of 2 million rows, not 20 million.
In a scenario where the file has around 20 million records and is around 3 GB in size, the split kept running for more than 10 minutes, so I had to close the session; this performance was unacceptable.
And I am reading from and writing to the same disk. As for disk speed, I am a bit of a novice in Unix, so I will have to do some research on that.
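As a rough starting point for checking disk speed (bigfile is again a placeholder name), a sequential read test with dd reports throughput directly:

    # Read the whole file in 1 MiB blocks, discarding the data;
    # dd prints bytes copied, elapsed time, and throughput when it finishes.
    time dd if=bigfile of=/dev/null bs=1M

Note that a second run may look much faster because the file is then served from the page cache rather than the disk.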
In addition to posting numbers instead of the very subjective "a really long time" (to me, that might mean a life expectancy, say 80 years), it's always useful to see some comparisons, like the results of time wc <your-file-name>, time grep 'e' <your-file-name>, etc.
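For instance, a hypothetical baseline session (file name made up) might compare a pure read pass against the split itself:

    time cat bigfile > /dev/null    # lower bound: raw sequential read
    time wc bigfile                 # read plus minimal per-byte processing
    time grep 'e' bigfile           # read plus a simple scan
    # ...then time the awk split on the same file for comparison.

If the awk split takes far longer than these single-pass baselines, the bottleneck is the processing or the write-back to the same disk, not the read.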