Cutting a large log file into smaller ones

I have a very large (150 megs) IRC log file from 2000-2001 which I want to cut down to individual daily log files. I have a very basic knowledge of the cat, sed and grep commands. The log file is time stamped and each day in the large log file begins with a "Session Start" string like so:

Session Start: Tue Mar 21 03:54:00 2000

Any idea how I could take this large log file and use sed/grep to break it up into individual daily versions? Ideally the output would save into a directory of my choosing in sequential format, something like "resultingfile001.log", "resultingfile002.log" and so on. I will manually rename the files afterwards.

Any and all help would be greatly appreciated!

I would suggest using awk...

awk '
/Session Start:/ {
  if (f) close(f)   # finish writing the previous day's file
  ++n               # advance the sequence number
}
{
  f = sprintf("resultingfile.%03d", n)
  print $0 > f      # append the current line to file number n
}
' largelogfile
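
If you want the pieces to land in a specific directory, one simple approach (the directory name below is just an example) is to create it and run the awk from inside it:

mkdir -p dailylogs
cd dailylogs
awk '/Session Start:/ { if (f) close(f); ++n } { f = sprintf("resultingfile.%03d", n); print $0 > f }' ../largelogfile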

split and dd will also do what you need.

Carriage-control files work best with split, IMO.

How? Have you got an example? I didn't know that split or dd could search files for a particular pattern. My versions of split and dd can only break up a file into fixed-size pieces.
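
For reference, this is roughly all the stock tools can do (sizes here are arbitrary):

split -b 1m largelogfile piece.
dd if=largelogfile of=firstmeg bs=1024 count=1024

The first command cuts the file into one-megabyte pieces named piece.aa, piece.ab, and so on; the second copies out just the first megabyte. Neither can split at a search pattern.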

My bad.

They are modified versions of dd and split. They accept '-exec'-style arguments, which lets us embed commands like 'grep -n' that return line numbers.

We reformat huge numbers of files and convert between ASCII and EBCDIC, so a decision was made a long time ago to do it this way.

I appreciate everyone's follow-ups. I dumped this code into a shell script, chmod'ed it, and tried running it. There seems to be a problem, however: it gave me one "resultingfile.001" but stopped afterwards. Any pointers? Otherwise it appears it will do exactly what I need!

Thank you very much for your help, Ygor!

Perhaps you are using an old version of awk? On some systems the newer awk is called nawk.

I am using Mac OS X 10.3.3.

Thank you for the prompt replies, Ygor!

Any idea how I should proceed next? Thank you immensely in advance.

I'm not sure which version(s) of awk OSX has. To find out use...

ls /usr/bin/*awk*

...then try again with nawk or gawk. Also check that the search pattern exists many times in the file, using...

grep 'Session Start:' largelogfile
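
...or count the matches directly with grep -c:

grep -c 'Session Start:' largelogfile

If that prints 1, the awk script is working correctly and the input simply contains a single session.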

I did some testing and it does seem to work...

$ cat largelogfile
Session Start: Tue Mar 21 03:54:00 2000
line one
line two
line three
Session Start: Wed Mar 22 03:54:00 2000
line four
line five
line six
$ run_the_script
$ ls -o
-rw-r--r-- 1 ygor 138 May 18 12:14 largelogfile
-rw-r--r-- 1 ygor 69 May 18 12:16 resultingfile.001
-rw-r--r-- 1 ygor 69 May 18 12:16 resultingfile.002
$ head resulting*
==> resultingfile.001 <==
Session Start: Tue Mar 21 03:54:00 2000
line one
line two
line three

==> resultingfile.002 <==
Session Start: Wed Mar 22 03:54:00 2000
line four
line five
line six

Hi ALL,

I have a file of around 300 lines which has a string, say "SERVER", occurring 35 times.

I am using an awk command to split the file wherever SERVER occurs.

It creates 10 files...

But this awk has a restriction: it creates a maximum of 10 files.
I have tried nawk also; it gives an error.

There is no gawk on my system (SunOS).

Please help!

Thanks in adv.

Try csplit instead
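
For example, something like this (untested; the '{*}' repeat is a GNU csplit extension -- with Solaris csplit you must give an explicit count instead, e.g. '{34}' for 35 occurrences, and -k keeps the pieces already written if the count overshoots):

csplit -k -f list. /vikas/final '/SERVER/' '{*}'

This splits the file before each line matching SERVER and writes the pieces as list.00, list.01, and so on.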

The following version of your script has no such restriction, because it closes each finished output file before opening the next one, so only one output file is open at a time:

awk '/SERVER/ { if (n) close(f n); n++ } { print > (f n) }' f=/vikas/list /vikas/final

(The assignment f=/vikas/list on the command line sets the awk variable f before the input file /vikas/final is read, so the pieces are written as /vikas/list1, /vikas/list2, and so on.)

Jean-Pierre.