Replace missing row with 0

ernesto · September 9, 2015, 3:33am

Hi all,

I have a file that contains something like the following:

INPUT

What I want is a check for missing rows based on the first column: any hour that is absent should be added with a value of 0. The desired output is shown below.

DESIRED OUTPUT

Thanks in advance.

Don_Cragun · September 9, 2015, 3:56am

Is this a homework assignment? If so, it must be filed in the Homework and Coursework Forum and a completely filled out Homework template must be included in your post.

If this is not homework, please explain why you are trying to do this and show us what you have tried. Please also tell us what operating system and shell you're using.

ernesto · September 9, 2015, 4:12am

Hi Sir,

It's not. We are collecting statistics from a table that logs from 00:00:00 to 23:59:00, and I am computing the total for each hour.

I hope you can help me.

Cheers,

---------- Post updated at 03:11 AM ---------- Previous update was at 03:02 AM ----------

We are using a Hadoop NoSQL database.

Please see the code below. It is mainly the function that creates the files.

function func_formater
{

   file=$1
   outfile=$2

   # Sum the last field per hour (the first two characters of field 2)
   awk 'BEGIN{
                   OFS="\t";

             }
             {
                   sub(/\|/,X,$NF);
                   A[substr($2,1,2)]+=$NF
             }
        END  {
                for(i in A){
                                print i":00" OFS A[i]
                           }
             }
        ' OFS="\t" "$file" | sort -n > "$outfile"

}

function 24Hour
{
        STARTTIME=$1
        ENDTIME=$2

        handlerFile="24HourHandler.dat"

        comp="SELECT time, server, component, sum(a+b) as total FROM <table> where logtime >= '${STARTTIME}' and logtime < '${ENDTIME}'  GROUP BY time, server, component ORDER BY time, server, component ASC;"

        # run in hadoop db
        out=`ssh $user@$host "echo \"$comp\" > ${compFile};dbname run -file=${compFile} address=$host -port=$port"`

        echo $out | cut -d'-' -f13- | sed -e 's/---//g' | xargs -n5 | head -n -1  > 24HourExtract.dat

        ids=$(cat 24HourExtract.dat | awk -F'|' '{print $2}' | sort -u | sed -e 's/\n//g')
        comp=$(cat 24HourExtract.dat | awk -F'|' '{print $3}' | sed -e 's/\n//g' | sort -u)

        for id in `echo $ids`
        do
                for component in `echo $comp`
                do
                        grep -i ${id} 24HourExtract.dat | grep -i ${component} > ${id}_${component}.dat

                        #Call formater function
                        func_formater ${id}_${component}.dat ${id}_${component}_final.dat

                        echo "${component}" > ${id}_${component}_final1.DAT
                        cat ${id}_${component}_final.dat >> ${id}_${component}_final1.DAT

                done
        done
}


TIMEEND="2015-09-08 23:59:00.0"
TIMESTART="2015-09-08 00:00:00.0"

24Hour "${TIMESTART}" "${TIMEEND}"

---------- Post updated at 03:12 AM ---------- Previous update was at 03:11 AM ----------

We are using Red Hat Linux with bash as the default shell.

RavinderSingh13 · September 9, 2015, 4:39am

ernesto:

Hi all,

I have a file that contains something like the following:

INPUT
00:00   0
01:00   0
02:00   0
03:00   0
04:00   0
05:00   0
06:00   0
07:00   0
08:00   0
09:00   0
10:00   5
11:00   0
13:00   5
14:00   4
15:00   0
16:00   7
17:00   6
18:00   0
19:00   0
20:00   0
23:00   0
What I want is a check for missing rows based on the first column: any hour that is absent should be added with a value of 0. The desired output is shown below.

DESIRED OUTPUT
00:00   0
01:00   0
02:00   0
03:00   0
04:00   0
05:00   0
06:00   0
07:00   0
08:00   0
09:00   0
10:00   5
11:00   0
12:00   0
13:00   5
14:00   4
15:00   0
16:00   7
17:00   6
18:00   0
19:00   0
20:00   0
21:00   0
22:00   0
23:00   0
Thanks in advance.

Hello Ernesto,

The following may help. Let's say our input file is as follows.

 cat test2323
00:00   0
01:00   0
02:00   0
03:00   0
04:00   0
05:00   0
06:00   0
07:00   0
08:00   0
09:00   0
10:00   5
11:00   0
13:00   5
14:00   4
15:00   0
16:00   7
17:00   6
18:00   0
19:00   0
20:00   0
23:00   0
27:00   0
81:00   0

Then the following code fills the gaps between consecutive rows and prints every existing row unchanged.

  awk -F":" 'NR==1{print;A=$1;next}{while($1-A>1){A++;printf "%02d%s00   0\n",A,FS};print;A=$1}' test2323

Output will be as follows.

Thanks,
R. Singh

SriniShoo · September 9, 2015, 6:32am

The code below also works when records are missing from the top or bottom of the list.

awk -F ':' '
BEGIN {
 p = -1; 
 t = "00\t0"
}
{
 p++; 
 if($1 + 0 != p) {
  for(i = p; i < $1+0; i++) {
   printf "%02d:%s\n", p++, t
  }
 };
 print
} 
END {
 for(i = ++p; i <= 23; i++) {
  printf "%02d:%s\n", p++, t
 }
}' file
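
To see the top-and-bottom behaviour in isolation, the same idea can be wrapped in a shell function and fed a deliberately sparse sample. This is only a sketch: the name fill_hours and the sample rows are made up for illustration, and it assumes the input is already sorted by hour in "HH:00<TAB>count" form.

```shell
# fill_hours: hypothetical helper, not part of the thread's code.
# Reads rows sorted by hour ("HH:00<TAB>count") on stdin and prints
# all 24 hours, adding "HH:00<TAB>0" for every hour that is absent.
fill_hours() {
    awk -F ':' '
    BEGIN { next_hour = 0 }
    {
        # zero rows for hours skipped before this record (covers a missing 00:00 too)
        while (next_hour < $1 + 0)
            printf "%02d:00\t0\n", next_hour++
        print
        next_hour = $1 + 1
    }
    END {
        # zero rows for hours missing after the last record
        while (next_hour <= 23)
            printf "%02d:00\t0\n", next_hour++
    }'
}

# Made-up sample: only 03:00 and 20:00 are present.
printf '03:00\t7\n20:00\t2\n' | fill_hours
```

With that sample, the rows for 03:00 and 20:00 pass through untouched and every other hour from 00:00 to 23:00 is printed with a count of 0.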

RudiC · September 9, 2015, 6:17pm

Try also

awk '{split($1,N,":"); while (++L < N[1]) printf "%02d:00\t0 <--\n", L; L=N[1]} 1' file
00:00   0
01:00   0
02:00   0
03:00   0
04:00   0
05:00   0
06:00   0
07:00   0
08:00   0
09:00   0
10:00   5
11:00   0
12:00    0 <--
13:00   5
14:00   4
15:00   0
16:00   7
17:00   6
18:00   0
19:00   0
20:00   0
21:00    0 <--
22:00    0 <--
23:00   0
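
Since func_formater already keys its totals by the two-digit hour, another option is to fill the gaps where the totals are produced instead of post-processing the output file. The following is a minimal sketch, not a drop-in replacement: hourly_totals is a hypothetical name, and it assumes the same layout the original function relies on (whitespace-separated fields, the time in field 2, the count in the last field, possibly with a stray "|").

```shell
# hourly_totals: hypothetical variant of func_formater's awk step.
# Assumes field 2 is a time such as 10:15:00.0 and the last field is the count.
hourly_totals() {
    awk '
    {
        sub(/\|/, "", $NF)              # strip a stray "|" from the count
        total[substr($2, 1, 2)] += $NF  # key by two-digit hour, e.g. "13"
    }
    END {
        # walk all 24 hours in order; hours never seen print as 0
        for (h = 0; h <= 23; h++) {
            key = sprintf("%02d", h)
            printf "%s:00\t%d\n", key, total[key]
        }
    }' "$1"
}

# usage (file names are only examples):
# hourly_totals srv1_comp1.dat > srv1_comp1_final.dat
```

Because the END block walks the hours in order, the output needs no separate sort, and a missing 00:00 or 23:00 is handled the same way as a gap in the middle.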