Parsing chunks of text and finding data

erick_tuk · May 29, 2011, 10:22pm

Hi, I need a script that parses a text file and greps data out of it.
I have a text file that has this structure:

File1
host1.localdomain

text random text

Found errors

this text is random (41123) --- random random
at.5165 ---- random random
at.5165 ---- random random
at.5165 ---- random random
at.5123 ---- random random
at.5155 ---- random random
at.5333 ---- random random
200
200 hostname.localdomain

extra this text is random (41239) --- random random
at.5123 ---- random random:87654
at.5123 ---- random random:232
at.5123 ---- random random:23
at.5123 ---- random random
at.5123 ---- random random
at.5123 ---- random random
4565
4565 hostname.localdomain.end

this text is random (41123) --- random random
at.5165 ---- random random
at.5165 ---- random random:53
at.5165 ---- random random:5523323
at.5123 ---- random random:322
at.5155 ---- random random
at.5333 ---- random random
200
200 hostname.localdomain

I want a script that recognizes these chunks of data. A chunk doesn't necessarily start with the word "this" (it could be anything), but it always ends with a number on a line by itself, followed on the next line by the same number with extra data.

I want the script to find the chunks whose number is larger than 1000 (I mean the number that is on a line by itself, in this case "4565"), then take that chunk, which here begins at "extra" and ends at "end", and create an md5sum of it.

Please note that the number to compare against 1000 should be matched only on lines that contain nothing but a number, not on regular lines that also contain text.

Please help, preferably in Python or bash.

Thanks in advance

rdcwayx · May 29, 2011, 10:45pm

awk 'BEGIN{FS="\n";RS=""} $(NF-1)~/^[0-9]+$/ && $(NF-1)>1000 {print > (++i ".new.txt")}' File1

md5sum *.new.txt

If the number "4565" is unique, you can generate the file name from that number directly:

awk 'BEGIN{FS="\n";RS=""} $(NF-1)~/^[0-9]+$/ && $(NF-1)>1000 {print > ($(NF-1) ".new.txt")}' File1
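
Since Python was also mentioned, the same idea could be sketched roughly as below. This is only an illustration, not a tested drop-in: the function names `large_chunks` and `md5_of_chunks` are made up here, chunks are assumed to be separated by blank lines, and a newline is appended before hashing on the assumption that the result should match `md5sum` run on a file written by awk's `print`. Unlike the awk one-liner, it also checks that the last line starts with the same number.

```python
import hashlib
import re


def large_chunks(text, threshold=1000):
    """Yield (number, chunk) for each blank-line-separated chunk whose
    second-to-last line is a bare number greater than threshold."""
    for chunk in re.split(r"\n\s*\n", text.strip()):
        lines = chunk.splitlines()
        if len(lines) < 2:
            continue  # too short to hold "number" plus "number extra data"
        number_line = lines[-2].strip()
        # Only lines made up entirely of digits count as "the number".
        if not re.fullmatch(r"[0-9]+", number_line):
            continue
        # The last line is expected to repeat the number, then extra data.
        if not lines[-1].strip().startswith(number_line):
            continue
        if int(number_line) > threshold:
            yield int(number_line), chunk


def md5_of_chunks(text, threshold=1000):
    """Return a list of (number, md5 hex digest) for the matching chunks."""
    return [
        # Trailing newline added so the digest lines up with a file
        # produced by awk's print (an assumption about the desired output).
        (number, hashlib.md5((chunk + "\n").encode("utf-8")).hexdigest())
        for number, chunk in large_chunks(text, threshold)
    ]
```

Calling `md5_of_chunks(open("File1").read())` should then give one (number, digest) pair per chunk whose stand-alone number is above 1000.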

erick_tuk · May 29, 2011, 11:04pm

Wow! That was really quick and helpful, thanks a lot!

---------- Post updated at 10:04 PM ---------- Previous update was at 09:59 PM ----------

Genius! It works.