I have a very large (150 MB) IRC log file from 2000-2001 which I want to split into individual daily log files. I have a very basic knowledge of the cat, sed and grep commands. The log file is timestamped, and each day in the large log file begins with a "Session Start" string like so:
Session Start: Tue Mar 21 03:54:00 2000
Any idea how I could take this large log file and use sed/grep to break it up into individual daily files? Ideally the output would be saved into a directory of my choosing, numbered sequentially. Something like so: "resultingfile001.log" and "resultingfile002.log" and so on. I will manually rename the files afterwards.
How? Have you got an example? I didn't know that split or dd could search files for a particular pattern. My versions of split and dd can only break up a file into fixed-size pieces.
They are modified versions of dd and split. They accept '-exec'-like arguments, which let us embed commands like 'grep -n' that return multiple line numbers.
We reformat huge numbers of files and convert between ASCII-EBCDIC, so a decision was made a long time ago to do it this way.
I appreciate everyone's follow-ups. I tried this code: I put it into a shell script, chmod'ed it and ran it. There seems to be a problem, however. It gave me one "resultingfile.001" but then stopped. Any pointers? Otherwise it looks like it will do exactly what I need!
I'm not sure which version(s) of awk OS X has. To find out, use...
ls /usr/bin/*awk*
...then try again with nawk or gawk. Also check that the search pattern exists many times in the file, using...
grep 'Session Start:' largelogfile
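With a 150 MB log that grep will print a lot of lines, so a count may be easier to read. A small sketch, assuming the marker always sits at the very start of a line; the function name count_sessions is made up for illustration:

```shell
# Hypothetical helper: print how many lines of the given file
# begin with the "Session Start:" marker.
count_sessions() {
    grep -c '^Session Start:' "$1"
}
```

Run it as `count_sessions largelogfile`. If it prints 0 or 1, there is only one place (or none) to split on, which could explain getting a single output file.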
I did some testing and it does seem to work...
$ cat largelogfile
Session Start: Tue Mar 21 03:54:00 2000
line one
line two
line three
Session Start: Wed Mar 22 03:54:00 2000
line four
line five
line six
$ run_the_script
$ ls -o
-rw-r--r-- 1 ygor 138 May 18 12:14 largelogfile
-rw-r--r-- 1 ygor 69 May 18 12:16 resultingfile.001
-rw-r--r-- 1 ygor 69 May 18 12:16 resultingfile.002
$ head resulting*
==> resultingfile.001 <==
Session Start: Tue Mar 21 03:54:00 2000
line one
line two
line three
==> resultingfile.002 <==
Session Start: Wed Mar 22 03:54:00 2000
line four
line five
line six
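For reference, here is a minimal awk sketch that would produce files like the ones shown above. It is not necessarily the script used in that test. The function name split_sessions and the optional prefix argument are made up for illustration, and it assumes an awk that supports -v and close() (nawk, gawk and similar).

```shell
# Hypothetical helper: split_sessions INPUT [PREFIX]
# Writes PREFIX.001, PREFIX.002, ... into the current directory,
# starting a new file at every line that begins with "Session Start:".
split_sessions() {
    awk -v prefix="${2:-resultingfile}" '
        /^Session Start:/ {
            if (out != "") close(out)   # close the previous file so awk does not hit its open-file limit
            out = sprintf("%s.%03d", prefix, ++n)
        }
        out != "" { print > out }       # lines before the first marker are skipped
    ' "$1"
}
```

To write into a directory of your choosing, cd there first and pass the full path of the log, e.g. `split_sessions /path/to/largelogfile`. The close() call matters for a log covering a year or more, since some awk versions only allow a limited number of files open at once.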