ernesto
September 9, 2015, 3:33am
1
Hi all,
i have a file which contains something as below:
INPUT
00:00 0
01:00 0
02:00 0
03:00 0
04:00 0
05:00 0
06:00 0
07:00 0
08:00 0
09:00 0
10:00 5
11:00 0
13:00 5
14:00 4
15:00 0
16:00 7
17:00 6
18:00 0
19:00 0
20:00 0
23:00 0
what i wanted to achieved is that there should be a checking if there's a missing field/row on the first column. Desired output is shown below.
DESIRED OUTPUT
00:00 0
01:00 0
02:00 0
03:00 0
04:00 0
05:00 0
06:00 0
07:00 0
08:00 0
09:00 0
10:00 5
11:00 0
12:00 0
13:00 5
14:00 4
15:00 0
16:00 7
17:00 6
18:00 0
19:00 0
20:00 0
21:00 0
22:00 0
23:00 0
Thanks in advance.
Is this a homework assignment? If so, it must be filed in the Homework and Coursework Forum and a completely filled out Homework template must be included in your post.
If this is not homework, please explain why you are trying to do this and show us what you have tried. Please also tell us what operating system and shell you're using.
ernesto
September 9, 2015, 4:12am
3
Hi Sir,
it's not. We are getting stats based on a table which has 00:00:00 to 23:59:00 logging. and i am getting the total for an hour.
Hope you could help me.
Cheers,
---------- Post updated at 03:11 AM ---------- Previous update was at 03:02 AM ----------
We are using a hadoop nosql as the DB
Please see code below. The code part is more of the function that creates the files.
function func_formater
{
file=$1
outfile=$2
awk 'BEGIN{
OFS="\t";
}
{
sub(/\|/,X,$NF);
A[substr($2,1,2)]+=$NF
}
END {
for(i in A){
print i":00" OFS A
}
}
' OFS="\t" $file | sort -n > $outfile
}
function 24Hour
{
STARTTIME=$1
ENDTIME=$2
handlerFile="24HourHandler.dat"
comp="SELECT time, server, component, sum(a+b) as total FROM <table> where logtime >= '${STARTTIME}' and logtime < '${ENDTIME}' GROUP BY time, server, component ORDER BY time, server, component ASC;"
#run in hadoop db
out=`ssh $user@$host "echo \"$comp\" > ${compFile};dbname run -file=${compFile} address=$host -port=$port"`
echo $outHandler | cut -d'-' -f13- | sed -e 's/---//g' | xargs -n5 | head -n -1 > 24HourExtract.dat
ids=$(cat 24HourExtract.dat | awk -F'|' '{print $2}' | sort -u | sed -e 's/\n//g')
comp=$(cat 24HourExtract.dat | awk -F'|' '{print $3}' | sed -e 's/\n//g' | sort -u)
for id in `echo $ids`
do
for component in `echo $comp`
do
grep -i ${id} 24HourExtract.dat | grep -i ${component} > ${id}_${component}.dat
#Call formater function
func_formater {id}_${component}.dat ${id}_${component}_final.dat
echo "${component}" > ${id}_${component}_final1.DAT
cat ${id}_${component}_final.dat >> ${id}_${component}_final1.DAT
done
done
}
TIMEEND="2015-09-08 23:59:00.0"
TIMESTART="2015-09-08 00:00:00.0"
24Hour "${TIMESTART}" "${TIMEEND}"
---------- Post updated at 03:12 AM ---------- Previous update was at 03:11 AM ----------
we are using Linux RedHat and bash as the default SHELL
ernesto:
Hi all,
i have a file which contains something as below:
INPUT
00:00 0
01:00 0
02:00 0
03:00 0
04:00 0
05:00 0
06:00 0
07:00 0
08:00 0
09:00 0
10:00 5
11:00 0
13:00 5
14:00 4
15:00 0
16:00 7
17:00 6
18:00 0
19:00 0
20:00 0
23:00 0
what i wanted to achieved is that there should be a checking if there's a missing field/row on the first column. Desired output is shown below.
DESIRED OUTPUT
00:00 0
01:00 0
02:00 0
03:00 0
04:00 0
05:00 0
06:00 0
07:00 0
08:00 0
09:00 0
10:00 5
11:00 0
12:00 0
13:00 5
14:00 4
15:00 0
16:00 7
17:00 6
18:00 0
19:00 0
20:00 0
21:00 0
22:00 0
23:00 0
Thanks in advance.
Hello Ernesto,
Following may help you in same. Let's say our Input_file is as follows.
cat test2323
00:00 0
01:00 0
02:00 0
03:00 0
04:00 0
05:00 0
06:00 0
07:00 0
08:00 0
09:00 0
10:00 5
11:00 0
13:00 5
14:00 4
15:00 0
16:00 7
17:00 6
18:00 0
19:00 0
20:00 0
23:00 0
27:00 0
81:00 0
Then following code may help us in same.
awk -F":" 'NR==1{print;A=$1}{if($1==A+1){print;A=$1} else {diff=$1-A;while(diff>0){print A+1 FS "00 0";A=A+1;diff--}};}' test2323
Output will be as follows.
00:00 0
01:00 0
02:00 0
03:00 0
04:00 0
05:00 0
06:00 0
07:00 0
08:00 0
09:00 0
10:00 5
11:00 0
12:00 0
13:00 0
14:00 4
15:00 0
16:00 7
17:00 6
18:00 0
19:00 0
20:00 0
21:00 0
22:00 0
23:00 0
24:00 0
25:00 0
26:00 0
27:00 0
28:00 0
29:00 0
30:00 0
31:00 0
32:00 0
33:00 0
34:00 0
35:00 0
36:00 0
37:00 0
38:00 0
39:00 0
40:00 0
41:00 0
42:00 0
43:00 0
44:00 0
45:00 0
46:00 0
47:00 0
48:00 0
49:00 0
50:00 0
51:00 0
52:00 0
53:00 0
54:00 0
55:00 0
56:00 0
57:00 0
58:00 0
59:00 0
60:00 0
61:00 0
62:00 0
63:00 0
64:00 0
65:00 0
66:00 0
67:00 0
68:00 0
69:00 0
70:00 0
71:00 0
72:00 0
73:00 0
74:00 0
75:00 0
76:00 0
77:00 0
78:00 0
79:00 0
80:00 0
81:00 0
Thanks,
R. Singh
Below code will suffice even if the records are missing from the top or bottom of the list
awk -F ':' '
BEGIN {
p = -1;
t = "00\t0"
}
{
p++;
if($1 + 0 != p) {
for(i = p; i < $1+0; i++) {
printf "%02d:%s\n", p++, t
}
};
print
}
END {
for(i = ++p; i <= 23; i++) {
printf "%02d:%s\n", p++, t
}
}' file
RudiC
September 9, 2015, 6:17pm
6
Try also
awk '{split($1,N,":"); while (++L < N[1]) printf "%02d:00\t0 <--\n", L; L=N[1]} 1' file
00:00 0
01:00 0
02:00 0
03:00 0
04:00 0
05:00 0
06:00 0
07:00 0
08:00 0
09:00 0
10:00 5
11:00 0
12:00 0 <--
13:00 5
14:00 4
15:00 0
16:00 7
17:00 6
18:00 0
19:00 0
20:00 0
21:00 0 <--
22:00 0 <--
23:00 0
1 Like