Cutting a large log file into smaller ones

MrTangent · May 17, 2004, 6:39am

I have a very large (150 megs) IRC log file from 2000-2001 which I want to cut down to individual daily log files. I have a very basic knowledge of the cat, sed and grep commands. The log file is time stamped and each day in the large log file begins with a "Session Start" string like so:

Session Start: Tue Mar 21 03:54:00 2000

Any idea how I could take this large log file and use sed/grep to break it up into individual daily versions? Ideally the output would be saved into a directory of my choosing with sequential names, something like "resultingfile001.log", "resultingfile002.log" and so on. I will manually rename the files afterwards.

Any and all help would be greatly appreciated!

Ygor · May 17, 2004, 8:02am

I would suggest using awk...

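awk '
/Session Start:/ {
  close(f)   # close the previous output file; harmless before the first one is opened
  ++n        # start the next numbered file
}
{
  f = sprintf("resultingfile.%03d", n)
  print $0 > f
}
' largelogfile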
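
To get the naming asked for in the original question (resultingfile001.log and so on, in a chosen directory), the same idea might be sketched as below. The directory name daily_logs and the file sample.log are made up for illustration, and the sample data is only there so the sketch runs as-is; -v needs nawk, gawk or another POSIX awk.

```shell
# Build a tiny sample log so the sketch can be run as-is (hypothetical data).
cat > sample.log <<'EOF'
Session Start: Tue Mar 21 03:54:00 2000
line one
line two
Session Start: Wed Mar 22 03:54:00 2000
line three
EOF

outdir=daily_logs            # assumed output directory
mkdir -p "$outdir"

awk -v dir="$outdir" '
/Session Start:/ {
  if (f != "") close(f)      # finish the previous day before starting a new file
  f = sprintf("%s/resultingfile%03d.log", dir, ++n)
}
f != "" { print > f }        # lines before the first header are skipped
' sample.log

ls "$outdir"
```

For the real log, replace sample.log with largelogfile and drop the here-document at the top.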

jim_mcnamara · May 17, 2004, 11:15am

split and dd will also do what you need.

Carriage-control files work best with split, IMO.

Ygor · May 17, 2004, 12:13pm

How? Have you got an example? I didn't know that split or dd could search files for a particular pattern. My versions of split and dd can only break up a file into fixed-size pieces.

jim_mcnamara · May 17, 2004, 1:40pm

My bad.

They are modified versions of dd and split. They accept '-exec'-like arguments, which let us embed commands like 'grep -l' that return multiple line numbers.

We reformat huge numbers of files and convert between ASCII-EBCDIC, so a decision was made a long time ago to do it this way.

MrTangent · May 17, 2004, 5:53pm

I appreciate everyone's follow-ups. I tried using this code: I dumped it into a shell script, chmod'ed it and tried running it. There seems to be a problem, however. It gave me one "resultingfile.001" but stopped afterwards. Any pointers? Otherwise it appears it will do exactly what I need it to!

Thank you very much for your help, Ygor!

Ygor · May 18, 2004, 3:57am

Perhaps you are using an old version of awk? On some systems the newer awk is called nawk.

MrTangent · May 18, 2004, 4:50am

I am using Mac OS 10.3.3.

Thank you for the prompt replies, Ygor!

Any idea how I should proceed next? Thank you immensely in advance.

Ygor · May 18, 2004, 7:28am

I'm not sure which version(s) of awk OS X has. To find out, use...

ls /usr/bin/*awk*

...then try again with nawk or gawk. Also check that the search pattern exists many times in the file, using...

grep 'Session Start:' largelogfile
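
To get a count rather than a screenful of matches, grep -c can be used. One possible cause of a single output file, which is only a guess and not something confirmed in this thread, is a log saved with old Mac-style CR-only line endings: awk and grep would then see the whole file as one line. A minimal sketch of that check, using made-up sample files cr_sample.log and lf_sample.log:

```shell
# Hypothetical sample with CR-only line endings (no LF anywhere).
printf 'Session Start: Tue Mar 21 03:54:00 2000\rline one\rSession Start: Wed Mar 22 03:54:00 2000\rline two\r' > cr_sample.log

# grep -c prints the number of matching LINES; CR-only input counts as a single line.
grep -c 'Session Start:' cr_sample.log

# Convert CR to LF into a new file, then count again.
tr '\r' '\n' < cr_sample.log > lf_sample.log
grep -c 'Session Start:' lf_sample.log
```

If the count for the real log is far lower than the number of days it covers, converting the line endings first may be worth a try. For CRLF (Windows-style) files, tr -d '\r' would be the equivalent step.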

I did some testing and it does seem to work...

$ cat largelogfile
Session Start: Tue Mar 21 03:54:00 2000
line one
line two
line three
Session Start: Wed Mar 22 03:54:00 2000
line four
line five
line six
$ run_the_script
$ ls -o
-rw-r--r-- 1 ygor 138 May 18 12:14 largelogfile
-rw-r--r-- 1 ygor 69 May 18 12:16 resultingfile.001
-rw-r--r-- 1 ygor 69 May 18 12:16 resultingfile.002
$ head resulting*
==> resultingfile.001 <==
Session Start: Tue Mar 21 03:54:00 2000
line one
line two
line three

==> resultingfile.002 <==
Session Start: Wed Mar 22 03:54:00 2000
line four
line five
line six

vikas027 · October 29, 2007, 4:13am

Hi ALL,

I have a file of around 300 lines in which a string, say "SERVER", occurs 35 times.

I am using an awk command to split the file wherever SERVER occurs.

It creates 10 files...

But this awk has a restriction: it makes a maximum of 10 files.
I have also tried nawk, but it gives an error.

There is no gawk on my system (SunOS).

Please help!

Thanks in adv.

Ygor · November 7, 2007, 2:00am

Try csplit instead
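
A rough sketch of what that could look like, using only POSIX csplit options. The file name csplit_sample.txt and the prefix part. are invented for the example, and the sample assumes the file does not begin with a SERVER line:

```shell
# Hypothetical sample: a preamble line, then three sections that each start with SERVER.
printf 'preamble\nSERVER a\nx\nSERVER b\ny\nSERVER c\nz\n' > csplit_sample.txt

# Count the marker lines so the repeat count can be exact.
count=$(grep -c 'SERVER' csplit_sample.txt)

# -f sets the output prefix, -n the number of suffix digits, -s silences the byte counts.
# '/SERVER/' cuts just before the next matching line; '{N}' repeats that cut N more times.
csplit -s -f part. -n 3 csplit_sample.txt '/SERVER/' "{$((count - 1))}"

ls part.*
```

Here part.000 holds whatever precedes the first SERVER line, and each later piece starts at a SERVER line. GNU csplit also accepts '{*}' to repeat as often as possible, but that is an extension and may not exist on SunOS.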

aigles · November 7, 2007, 5:55am

No restriction with the following version of your script:

awk '/SERVER/{if (n) close(f n); n++}{print > (f n)}' f=/vikas/list /vikas/final

Jean-Pierre.