Sed to parse log file

Hi all, thanks for reading the post.

I'm trying to parse hundreds of log files in a directory. Each log file looks similar to the one below:

Investigator  : Jim_Foo
Custodian     : Jim_Foo-HDD1-FOO-1234
Export Path   : N:\FOO-1234\Foo_Foo
Compute MD5   : No
File List Only: No
Extensions Selected:
     DOC
     DOCX
     EML
     MSG
     OST
     PDF
     PPT
     PPTX
     PST
     XLS
     XLSX
     XLSM
     ZIP
     REP

Matched File : foo_file1.pdf
Match Type   : File Extension Match
File Number  : 0
File Size    : 640
MD5 Hash     : 'Compute MD5 Hash' not selected
MAC Times    : M=09/27/10 02:05:26PM  A=02/01/12 09:59:49AM  C=09/27/10 02:05:26PM  
Original Path: foopath\foopath\foopath\foobar\foo_file.pdf

Matched File : foo_file2.pdf
Match Type   : File Extension Match
File Number  : 0
File Size    : 123
MD5 Hash     : 'Compute MD5 Hash' not selected
MAC Times    : M=09/27/10 02:05:26PM  A=02/01/12 09:59:49AM  C=09/27/10 02:05:26PM  
Original Path: foopath\foopath\foopath\foobar\foo_file.pdf

I would like to search for strings with sed and put the results into a CSV file. I can match the right lines, but I'm having trouble printing them in the right format.

$ sed -e '/Custodian/b' -e '/Export Path/b' -e '/Matched File/b' -e '/File Size/b' -e '/Original Path/b' -e d *

Custodian : Jim_Foo-HDD1-FOO-1234
Export Path : N:\FOO-1234\Foo_Foo
Matched File : foo_file1.pdf
File Size : 640
Original Path: foopath\foopath\foopath\foobar\foo_file1.pdf
Matched File : foo_file2.pdf
File Size : 123
Original Path: foopath\foopath\foopath\foobar\foo_file.pdf

CSV needs to look something like this:

Custodian,Export Path,Matched File,File Size,Original Path
"Jim_Foo-HDD1-FOO-1234","N:\FOO-1234\Foo_Foo","foo_file1.pdf","640","foopath\foopath\foopath\foobar\foo_file1.pdf"
"Jim_Foo-HDD1-FOO-1234","N:\FOO-1234\Foo_Foo","foo_file2.pdf","123","foopath\foopath\foopath\foobar\foo_file.pdf"

I'd use awk for this, since it makes it much easier to remember values in variables and sparse arrays. Working on it.

$ awk -F": " '/Custodian/{c=$2}/Export Path/{p=$2}/Matched File/{m=$2}/File Size/{s=$2}/Original Path/{o=$2} {if(c!="" && p!="" && m!="" && s!="" && o!=""){printf("\"%s\",\"%s\",\"%s\",\"%s\",\"%s\"\n",c,p,m,s,o);m=s=o="";}}' test.txt
"Jim_Foo-HDD1-FOO-1234","N:\FOO-1234\Foo_Foo","foo_file1.pdf","640","foopath\foopath\foopath\foobar\foo_file.pdf"
"Jim_Foo-HDD1-FOO-1234","N:\FOO-1234\Foo_Foo","foo_file2.pdf","123","foopath\foopath\foopath\foobar\foo_file.pdf"
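For anyone who finds the one-liner hard to read, here is a rough multi-line sketch of the same idea that also prints the header row and loops over every log in a directory. The temp directory, the file name sample.log and the shortened sample data (including the made-up size of 0) are only there for the demo, and it assumes every record ends with an "Original Path" line, as in the sample above.

```shell
#!/bin/sh
# Hypothetical demo data: a shortened copy of the sample log in a temp dir.
tmp=$(mktemp -d)
cat > "$tmp/sample.log" <<'EOF'
Investigator  : Jim_Foo
Custodian     : Jim_Foo-HDD1-FOO-1234
Export Path   : N:\FOO-1234\Foo_Foo
Compute MD5   : No

Matched File : foo_file1.pdf
File Size    : 640
Original Path: foopath\foobar\foo_file.pdf

Matched File : foo_file2.pdf
File Size    : 0
Original Path: foopath\foobar\foo_file.pdf
EOF

csv=$(awk -F': ' '
    # Header row, printed once before any input is read.
    BEGIN            { print "Custodian,Export Path,Matched File,File Size,Original Path" }
    # First line of each input file: forget values from the previous file.
    FNR == 1         { c = p = m = s = "" }
    /^Custodian/     { c = $2 }
    /^Export Path/   { p = $2 }
    /^Matched File/  { m = $2 }
    /^File Size/     { s = $2 }
    # Original Path is the last line of a record, so emit one CSV row here.
    /^Original Path/ {
        printf "\"%s\",\"%s\",\"%s\",\"%s\",\"%s\"\n", c, p, m, s, $2
    }
' "$tmp"/*.log)

printf '%s\n' "$csv"
rm -rf "$tmp"
```

Because the row is printed on the "Original Path" line instead of testing whether every variable is set, a file size of 0 still produces a row. To run it against real logs, point the awk command at the log directory and redirect the output to a .csv file.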


Perhaps more easily configurable:

$ cat myscript.awk

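# Join the output fields with "," so only the outer quotes need adding.
BEGIN { OFS="\",\"" }

# New input file: forget the values collected from the previous one.
FNR==1 { for(X in A) delete A[X] }

# Split each "Name : Value" line at the first colon and remember it.
match($0, /:/) {
        N=substr($0, 1, RSTART-1);
        V=substr($0, RSTART+2);
        sub(/ *$/, "", N);      # strip the padding after the name
        A[N]=V;
}

# "Original Path" is the last line of a record: print one CSV row.
/^Original Path/ {
        print "\"" A["Custodian"], A["Export Path"], A["Matched File"], A["File Size"], A["Original Path"] "\""
}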

$ awk -f myscript.awk datafile datafile datafile

"Jim_Foo-HDD1-FOO-1234","N:\FOO-1234\Foo_Foo","foo_file1.pdf","640","foopath\foopath\foopath\foobar\foo_file.pdf"
"Jim_Foo-HDD1-FOO-1234","N:\FOO-1234\Foo_Foo","foo_file2.pdf","123","foopath\foopath\foopath\foobar\foo_file.pdf"
"Jim_Foo-HDD1-FOO-1234","N:\FOO-1234\Foo_Foo","foo_file1.pdf","640","foopath\foopath\foopath\foobar\foo_file.pdf"
"Jim_Foo-HDD1-FOO-1234","N:\FOO-1234\Foo_Foo","foo_file2.pdf","123","foopath\foopath\foopath\foobar\foo_file.pdf"
"Jim_Foo-HDD1-FOO-1234","N:\FOO-1234\Foo_Foo","foo_file1.pdf","640","foopath\foopath\foopath\foobar\foo_file.pdf"
"Jim_Foo-HDD1-FOO-1234","N:\FOO-1234\Foo_Foo","foo_file2.pdf","123","foopath\foopath\foopath\foobar\foo_file.pdf"

$
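To show what "configurable" could mean in practice, here is a rough sketch where the list of columns is passed in from the command line instead of being written into the script. The variable name cols, the "|" separator and the sample file are my own choices for the demo, not part of the script above, and trimming trailing blanks or carriage returns from the values is an assumption about what real logs might contain.

```shell
#!/bin/sh
# Hypothetical demo data: one record from the sample log, in a temp dir.
tmp=$(mktemp -d)
cat > "$tmp/sample.log" <<'EOF'
Custodian     : Jim_Foo-HDD1-FOO-1234
Export Path   : N:\FOO-1234\Foo_Foo

Matched File : foo_file1.pdf
File Size    : 640
Original Path: foopath\foobar\foo_file.pdf
EOF

# cols is a made-up variable: the labels to export, in order, separated by "|".
out=$(awk -v cols='Custodian|Matched File|File Size' '
    BEGIN    { n = split(cols, C, "|") }
    # First line of each input file: forget values from the previous file.
    FNR == 1 { for (k in A) delete A[k] }
    {
        # The first colon separates the label from the value.
        i = index($0, ":")
        if (i == 0) next
        label = substr($0, 1, i - 1)
        value = substr($0, i + 1)
        sub(/ +$/, "", label)
        sub(/^ +/, "", value)
        sub(/[ \t\r]+$/, "", value)
        A[label] = value
        # Original Path closes a record: print the requested columns.
        if (label == "Original Path") {
            row = ""
            for (j = 1; j <= n; j++)
                row = row (j > 1 ? "," : "") "\"" A[C[j]] "\""
            print row
        }
    }
' "$tmp/sample.log")

printf '%s\n' "$out"
rm -rf "$tmp"
```

Changing the columns is then just a matter of editing the cols string; a label that never appears in the log comes out as an empty quoted field.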

Made some progress, but still need help formatting the output:

$ sed -e '/Custodian/b' -e '/Matched File/b' -e '/File Size/b' -e '/Original Path/b' -e d * | sed -e 's/.*= \(.*\) \(.*\)/"\2, \1",/' -e 's/.*= \(.*\)/"\1",/' -e 's/.*: \(.*\)/"\1",/' -e 's/"\(.*\) ..:..:.*/"\1"/'

"Jim_Foo-HDD1-FOO-1234",
"foo_file1.pdf",
"640",
"foopath\foopath\foopath\foobar\foo_file.pdf",
"foo_file2.pdf",
"123",
"foopath\foopath\foopath\foobar\foo_file.pdf",

It should print as:

"Jim_Foo-HDD1-FOO-1234","foo_file1.pdf","640","foopath\foopath\foopath\foobar\foo_file.pdf"
"Jim_Foo-HDD1-FOO-1234","foo_file2.pdf","123","foopath\foopath\foopath\foobar\foo_file.pdf"

Need to:

  1. Repeat "Custodian" before each "Matched File"
  2. Remove some of the extra commas
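For reference, both points can probably be handled in sed alone by using the hold space: keep the custodian there, append each value of a record to it, and join everything into one line when "Original Path" comes along. This is only a sketch under the assumption that "Original Path" is always the last line of a record; the temp directory, the file name sample.log and the shortened data are made up for the demo.

```shell
#!/bin/sh
# Hypothetical demo data: a shortened copy of the sample log in a temp dir.
tmp=$(mktemp -d)
cat > "$tmp/sample.log" <<'EOF'
Custodian     : Jim_Foo-HDD1-FOO-1234
Export Path   : N:\FOO-1234\Foo_Foo

Matched File : foo_file1.pdf
Match Type   : File Extension Match
File Size    : 640
Original Path: foopath\foobar\foo_file.pdf

Matched File : foo_file2.pdf
Match Type   : File Extension Match
File Size    : 123
Original Path: foopath\foobar\foo_file.pdf
EOF

rows=$(sed -n '
# Custodian: strip the label and store the value in the hold space.
/^Custodian *: */ {
    s/^Custodian *: *//
    h
    d
}
# Matched File and File Size: strip the label, append the value to the hold space.
/^Matched File *: */ {
    s/^Matched File *: *//
    H
    d
}
/^File Size *: */ {
    s/^File Size *: *//
    H
    d
}
# Original Path closes a record: append it, then join the held lines into one row.
/^Original Path *: */ {
    s/^Original Path *: *//
    H
    g
    s/\n/","/g
    s/.*/"&"/
    p
    # Cut the hold space back to the custodian so the next record starts clean.
    g
    s/\n.*//
    h
}
' "$tmp/sample.log")

printf '%s\n' "$rows"
rm -rf "$tmp"
```

Joining on the embedded newlines puts the commas only between fields, which takes care of the extra trailing commas, and restoring the hold space to the custodian after each row is what repeats it for every "Matched File".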

---------- Post updated at 11:17 AM ---------- Previous update was at 11:04 AM ----------

This is perfect, thank you sir.