parsing a database with java

stuggler · March 29, 2011, 2:43am

Hi all,
I have a database which consists of a few fields in the form
Time (starting from 0, in ascending order), bytes, service, flags, protocols

e.g.

0, 0, 0, 56, 86
1, 524, 3, 6, 65
1, 624, 0, 43, 33
2, 72, 0, 43, 80
3, 26, 3, 6, 86
4, 323, 3, 1459, 95
5, 325, 3, 1459, 33
6, 225, 3, 1436, 57

Here, within a window of 2 seconds (measured on the time field), the last column repeats the sequence 86-95-33. (The window size is 2 seconds in both cases, as you can see: 2-0=2 and 5-3=2.)

So the output should list each pattern found in that field along with the number of times it is repeated throughout the table (there are around 50000 such records in the file).

sgruenwald · March 29, 2011, 3:03am

I still don't get it. Could you show an example of the output you would like to have? Also, why is the 1-second line repeated?

cgkmal · March 29, 2011, 3:18am

Hi stuggler,

I'm not sure whether you need to count how many times the sequence 86-95-33 is repeated in field 5 within the whole table. If so, try this:

echo "0, 0, 0, 56, 86
1, 524, 3, 6, 65
1, 624, 0, 43, 33
2, 72, 0, 43, 80
3, 26, 3, 6, 86
4, 323, 3, 1459, 95
5, 325, 3, 1459, 33
6, 225, 3, 1436, 57
7, 225, 3, 1436, 86
8, 225, 3, 1436, 57
9, 225, 3, 1436, 95
10, 225, 3, 1436, 33
11, 524, 3, 6, 86
12, 624, 0, 43, 95
13, 72, 0, 43, 33" | 
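awk -F"," '$5+0==86{a=NR}                 # line number of the latest 86
$5+0==95{b=NR}                            # line number of the latest 95
$5+0==33{c=NR;if(a==b-1 && b==c-1) s++}   # 33 directly after 95 directly after 86
END{print "Sequence is repeated",s+0,"times"}'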
Sequence is repeated 2 times
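
If the sequence to look for changes from run to run, one option is to pass it in as a variable instead of hard-coding 86, 95 and 33. The following is only a sketch: the function name count_seq and its two arguments are made up for illustration, and it assumes the same comma-separated layout with the value of interest in field 5.

```shell
# count_seq FILE SEQ
# Hypothetical helper: counts how often SEQ (values joined by "-")
# appears on consecutive lines in field 5 of FILE.
count_seq() {
  awk -F',' -v seq="$2" '
    BEGIN { n = split(seq, want, "-") }      # want[1..n] = values to match
    {
      v[NR] = $5 + 0                         # numeric value of field 5
      if (NR < n) next                       # not enough lines read yet
      for (k = 1; k <= n; k++)               # compare the last n lines
        if (v[NR - n + k] != want[k] + 0) next
      hits++                                 # all n values matched
    }
    END { print hits + 0 }
  ' "$1"
}
```

With the 15 sample lines above saved to a file, count_seq file 86-95-33 prints 2, the same count as the one-liner.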

Hope it helps,

Regards.

stuggler · March 29, 2011, 4:03am

Thanks cgkmal,
it works perfectly fine.
But what if the sequence is not a fixed one?
I mean the final output must include all the patterns that are repeated within a particular time duration, together with their counts.
Is there a way to do that in the shell?
Thanks in advance.

cgkmal · March 29, 2011, 3:08pm

Hi stuggler,

Great that it works for you.

Could you show a real sample and the desired output?

Regards

stuggler · March 30, 2011, 3:53pm

The output should consist of all the sequences that are repeated within the particular time slot.

Each sequence should also be displayed along with the number of times it occurred in that time slot.

cgkmal · March 30, 2011, 10:29pm

Hi stuggler,

I'm not sure whether you need something like the following; if so, try this.
Sample (there are some interlaced sequences):

cat inputfile
0, 0, 0, 56, 11
1, 524, 3, 6, 74
1, 624, 0, 43, 33
2, 72, 0, 43, 80
3, 26, 3, 6, 86
4, 323, 3, 1459, 95
5, 325, 3, 1459, 33
6, 225, 3, 1436, 57
7, 225, 3, 1436, 86
8, 225, 3, 1436, 57
9, 225, 3, 1436, 95
10, 225, 3, 1436, 33
11, 524, 3, 6, 86
12, 624, 0, 43, 95
14, 72, 0, 43, 33
15, 72, 0, 43, 11
16, 72, 0, 43, 74
17, 225, 3, 1436, 33
18, 225, 3, 1436, 11
19, 225, 3, 1436, 74
20, 225, 3, 1436, 33

# 1-) Show all sequences of 3 consecutive numbers counting their occurrences.

awk 'BEGIN{print "Sequence Found|Occurrences";OFS="|"}
{a[NR]=$5;nr=NR}
{for(i=1;i<=nr-2;i++) S[i]=a[i]"-"a[i+1]"-"a[i+2]}
{for(j=1;j<=nr-2;j++) $1=sprintf("%s", S[j]);c[$1]++}
END{for (m in c) if(m~/-/) print m,c[m]}' inputfile
Sequence Found|Occurrences
33-86-95|1
57-95-33|1
74-33-80|1
33-11-74|2
57-86-57|1
95-33-57|1
80-86-95|1
95-33-86|1
86-57-95|1
95-33-11|1
74-33-11|1
86-95-33|2
11-74-33|3
33-57-86|1
33-80-86|1
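
The command above rebuilds every triple again on each input line, which may get slow with around 50000 records. A possible single-pass variant that also lets the sequence length vary is sketched below. The function name count_ngrams and its optional length argument are invented for this example; it assumes comma-separated input with the value in field 5.

```shell
# count_ngrams FILE [N]
# Hypothetical helper: prints every run of N consecutive field-5 values
# (default 3) as "sequence|count", most frequent first.
count_ngrams() {
  awk -F',' -v n="${2:-3}" '
    { v[NR] = $5 + 0 }                       # remember field 5 of each line
    NR >= n {
      key = v[NR - n + 1]                    # first value of the run
      for (k = NR - n + 2; k <= NR; k++)     # append the remaining values
        key = key "-" v[k]
      c[key]++                               # one more run ending here
    }
    END { for (m in c) print m "|" c[m] }
  ' "$1" | sort -t'|' -k2,2nr -k1,1
}
```

For the sample inputfile, count_ngrams inputfile 3 lists the same 15 sequences and counts as above, starting with 11-74-33|3.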

# 2-) Show only sequences of 3 consecutive numbers with more than one occurrence.

awk 'BEGIN{print "Sequence Found|Occurrences";OFS="|"}
{a[NR]=$5;nr=NR}
{for(i=1;i<=nr-2;i++) S[i]=a[i]"-"a[i+1]"-"a[i+2]}
{for(j=1;j<=nr-2;j++) $1=sprintf("%s", S[j]);c[$1]++}
END{for (m in c) if(m~/-/ && c[m]>1) print m,c[m]}' inputfile
Sequence Found|Occurrences
33-11-74|2
86-95-33|2
11-74-33|3
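
Neither command above looks at the time column. If the 2-second window matters, something along these lines might be a starting point. It rests on a guess about what the window means: a triple is counted only when the time on its last line minus the time on its first line is at most the window. The function name count_windowed and its optional window argument are made up for this sketch.

```shell
# count_windowed FILE [WINDOW]
# Hypothetical helper: counts triples of consecutive field-5 values whose
# first and last lines are at most WINDOW time units apart (default 2),
# and prints only those seen more than once.
count_windowed() {
  awk -F',' -v win="${2:-2}" '
    {
      t[NR] = $1 + 0                         # time (field 1)
      v[NR] = $5 + 0                         # value of interest (field 5)
      if (NR >= 3 && t[NR] - t[NR - 2] <= win)
        c[v[NR - 2] "-" v[NR - 1] "-" v[NR]]++
    }
    END { for (m in c) if (c[m] > 1) print m "|" c[m] }
  ' "$1" | sort
}
```

For the sample inputfile, count_windowed inputfile 2 prints 11-74-33|3 and 33-11-74|2. Under this reading 86-95-33 drops out, because its second occurrence spans times 11 to 14.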

Hope it helps,

Regards

stuggler · April 2, 2011, 6:56am

Yup, it will help so much.
Thanks, dear.