So basically, the $1 is epoch date, $2 and $3 is some application data
From one if the threads, I was following I am using this code to group the 1st column based on the interval
awk -F "|" '{while ($1>=t*w) t++; A[t]++} END {for (i=1;i<=t;i++) print " "strftime("%F %T",(i-1)*w)" to "strftime("%F %T",i*w-1)"|"(A)}' w=3600 ${work_file} |tail
What I need is sub-group based on 3rd column as well
Meaning if the interval is say 3600 (1 hours) the output should be
1440993600 to 1440997200|ABCDEF|2
1440993600 to 1440997200|ABCD|1
1440993600 to 1440997200|ABC|1
1440993600 to 1440997200|ABCDE|1
1440997200 to 1441000800|ABCDE|12
1440997200 to 1441000800|ABC|3
1441000800 to 1441004400|ABCD|5
1441000800 to 1441004400|ABCDE|3
1441000800 to 1441004400|ABCDEF|7
.
.
.
---------- Post updated at 17:29 ---------- Previous update was at 17:28 ----------
Howsoever, try
awk -F\| '
{INTV=int($1/3600)
I[INTV]++
AD[$3]++
T[INTV,$3]++
}
END {for (i in I) for (a in AD) if (T[i,a]) print i*3600 " to " (i+1)*3600, a, T[i,a]
}
' OFS="|" file
1441015200 to 1441018800|AB|1
1441015200 to 1441018800|ABCDEF|1
1441015200 to 1441018800|ABC|1
1440993600 to 1440997200|ABCDEF|2
1440993600 to 1440997200|ABCDE|1
1440993600 to 1440997200|ABCD|1
1440993600 to 1440997200|ABC|1
In relation to the same data file, I am trying to group using the command I mentioned above
awk -F "|" '{while ($1>=t*w) t++; A[t]++} END {for (i=1;i<=t;i++) print " "strftime("%F %T",(i-1)*w)" to "strftime("%F %T",i*w-1)"|"(A)}' w=3600 ${work_file} |tail
This code behaves correctly for small set data in a file
The moment the data is like some 2M records, it does not throw correct results
For example in the same data example above, when I run it over 2M records, I get the output as this way
2015-09-01 00:00:00 to 2015-09-01 00:59:59|377387
2015-09-01 01:00:00 to 2015-09-01 01:59:59|372157
2015-09-01 02:00:00 to 2015-09-01 02:59:59|386135
2015-09-01 03:00:00 to 2015-09-01 03:59:59|335708
2015-09-01 04:00:00 to 2015-09-01 04:59:59|382802
2015-09-01 05:00:00 to 2015-09-01 05:59:59|6449915
The last count 6449915 is actually not correct
It is displayed as
Total records in the file minus (sum of 377387 + 372157 + 386135 + 335708 + 382802) which is visually not correct at all
@Scrutinizer: brilliant and terse; works well if the first epoch time in a group is an integer multiple of 3600 but shifts the entire range output if the time has some additional seconds in it.
@hemanty4u: With the data as given, t is calculated by successive addition of 1, i.e. 400277 loops, while one single division yields the same result. And, in the END section, you print 400283 times your output line. No surprise run time is that long! Try for (a in A) print a, A[a] in the END section.