Missing date

Hi team,

I have a file that contains data as follows:

F1 file

date system name
1-jan-2012 x
1-jan-2012 y
1-jan-2012 x
5-jan-2012 y
3-jan-2012 z
3-jan-2012 z
4-jan-2012 x
4-jan-2012 x

Now I want to find the missing dates for each system, if any, from F2.

For example, say system x has no entries in this file for 2 and 3 Jan 2012. In that case I need to display 0 for each missing date for system x.

Likewise, say system z has no entries in this file for 1 and 2 Jan 2012. In that case I need to display 0 for each missing date for system z.

Please let me know if you have any queries.

A quick reply would be appreciated.

What would the output look like given the F1 file above?

The output will look like this:
date systemName count
1-jan-2012 x 1
1-jan-2012 x 0
2-jan-2012 x 0
3-jan-2012 x 0
1-jan-2012 y 0
2-jan-2012 y 0
3-jan-2012 y 0
4-jan-2012 y 1
1-jan-2012 z 0
2-jan-2012 z 0
3-jan-2012 z 0

Why is there no 4-jan-2012 z 0 record?

Output:
4-jan-2012 y 2
--- because it appears 2 times in the file

If you have GNU awk, you could do this:

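awk -F "[- ]" '
BEGIN {
   # Map month abbreviations to month numbers: m["jan"]=1 ... m["dec"]=12
   split("jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec", mth, ",")
   for(i=1;i<=12;i++) m[mth[i]]=i;
   min=999999999999;
   OFS=" "
   print "date", "systemName","count"
}
NR>1{
  # Fields are day, month, year, system; NR>1 skips the header row
  dt=mktime(sprintf("%04d %02d %02d 00 00 00", $3, m[$2], $1));
  min=min>dt?dt:min;
  max=max<dt?dt:max;
  h[$4]          # remember every system seen
  c[dt,$4]++     # occurrences per (date, system)
}
END {
   # One row per system for every day between the first and last date
   for(host in h)
      for(d=min;d<=max;d+=3600*24)
          print strftime("%d-%b-%Y", d),host, (d SUBSEP host) in c ? c[d,host] : 0
}' infile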
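
If GNU awk is not available, the same idea can be sketched without mktime and strftime by turning each date into a sortable yyyymmdd number and stepping through the calendar by hand. This is only a rough sketch: fill_missing, next_day, leap, num and dim are made-up names, and it assumes the F1 layout shown above (a header row, then lines like "1-jan-2012 x"). Since it never converts to epoch seconds, it also does not depend on the local time zone.

```shell
# Hypothetical helper: fill_missing FILE
# Assumes FILE has a header row, then lines like "1-jan-2012 x".
fill_missing() {
  awk -F '[- ]' '
  function leap(y) { return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0 }
  # Advance a yyyymmdd key by one calendar day.
  function next_day(k,    y, m, d, n) {
    y = int(k / 10000); m = int(k / 100) % 100; d = k % 100
    n = dim[m] + (m == 2 && leap(y))          # days in this month
    if (++d > n) { d = 1; if (++m > 12) { m = 1; y++ } }
    return y * 10000 + m * 100 + d
  }
  BEGIN {
    split("jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec", mth, ",")
    for (i = 1; i <= 12; i++) num[mth[i]] = i
    split("31,28,31,30,31,30,31,31,30,31,30,31", dim, ",")
  }
  NR > 1 && NF >= 4 {
    k = $3 * 10000 + num[tolower($2)] * 100 + $1   # sortable yyyymmdd key
    if (min == "" || k < min) min = k
    if (k > max) max = k
    hosts[$4]
    count[k, $4]++
  }
  END {
    print "date", "systemName", "count"
    if (min == "") exit                        # no data rows at all
    for (h in hosts)
      for (k = min; k <= max; k = next_day(k)) {
        n = ((k, h) in count) ? count[k, h] : 0
        printf "%d-%s-%d %s %d\n", k % 100, mth[int(k / 100) % 100], int(k / 10000), h, n
      }
  }' "$1"
}
```

Call it as fill_missing infile. The order in which systems come out is whatever awk's for-in loop yields, so pipe the result through sort if a fixed order matters.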

file content sample
--------------------------

01-aug-12,-
01-aug-12,-
01-aug-12,ARTACT001
01-aug-12,ARTACT001
01-aug-12,ARTACT001
01-aug-12,ARTACT001
01-aug-12,ARTBRIO
01-aug-12,ARTBRIO
01-aug-12,ARTBRIO
01-aug-12,ARTBRIO

your output is below:

01-Aug-12 i1b 3
02-Aug-12 i1b 3
03-Aug-12 i1b 3
04-Aug-12 i1b 3
05-Aug-12 i1b 2
06-Aug-12 i1b 2
07-Aug-12 i1b 2
08-Aug-12 i1b 2
09-Aug-12 i1b 2
10-Aug-12 i1b 2
11-Aug-12 i1b 2

Please let me know if you have any queries.

Umm, you've changed the format of the input file.

Updated solution for:

  • 2-digit year
  • comma as FS
  • no header row
  • "-" as hostname
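awk -F "[-, ]" '
BEGIN {
   # Map month abbreviations to month numbers: m["jan"]=1 ... m["dec"]=12
   split("jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec", mth, ",")
   for(i=1;i<=12;i++) m[mth[i]]=i;
   min=999999999999;
   OFS=" "
   print "date", "systemName","count"
}
{
  # A hostname of "-" is consumed as a separator, leaving $4 empty
  if(length($4)==0) $4="-"
  # Treat 2-digit years as 20xx
  if($3 < 1800) $3+=2000
  dt=mktime(sprintf("%04d %02d %02d 00 00 00", $3, m[$2], $1));
  min=min>dt?dt:min;
  max=max<dt?dt:max;
  h[$4]          # remember every hostname seen
  c[dt,$4]++     # occurrences per (date, hostname)
}
END {
   # One row per hostname for every day between the first and last date
   for(host in h)
      for(d=min;d<=max;d+=3600*24)
          print tolower(strftime("%d-%b-%Y", d)),host, (d SUBSEP host) in c ? c[d,host] : 0
}' infile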
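
The length($4)==0 check above is needed because "-" is part of the field separator, so a hostname that is just "-" gets treated as a separator and $4 comes out empty. A quick way to see this, using two lines from the sample above (the out variable is only there for illustration):

```shell
# Show how -F "[-, ]" splits a line whose hostname is "-"
out=$(printf '%s\n' '01-aug-12,-' '01-aug-12,ARTACT001' |
  awk -F '[-, ]' '{ printf "NF=%d host=[%s]\n", NF, $4 }')
printf '%s\n' "$out"
```

The first line splits into five fields with an empty $4 (NF=5 host=[]), the second into four (NF=4 host=[ARTACT001]).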

That's really a smart answer, you are on the right track.

input file format is like this
-------------------------------------

01-aug-12,-,cpu
03-aug-12,-,mem
01-aug-12,ARTACT001
01-aug-12,ARTACT001
01-aug-12,ARTACT001
01-aug-12,ARTACT001
01-aug-12,ARTBRIO,cpu
03-aug-12,ARTBRIO,disk

The output will be like this..
-----------------------------

01-aug-12,-,cpu,1
02-aug-12,-,cpu,0
03-aug-12,-,cpu,1
01-aug-12,ARTACT001,3
01-aug-12,ARTBRIO,cpu,1
02-aug-12,ARTBRIO,0
03-aug-12,ARTBRIO,disk

A quick response would be really appreciated.

Hi friend,

Can you please advise on the scenario below?

If any component (cpu/mem/disk) is missing, then I need to know the missing date along with the component name.

Thanks
Rabindra

Try this:

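awk '
BEGIN {
   # Map month abbreviations to month numbers: m["jan"]=1 ... m["dec"]=12
   split("jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec", mth, ",")
   for(i=1;i<=12;i++) m[mth[i]]=i;
   min=999999999999;
   OFS=","
}
/,/ {
  # The date is everything before the first comma; the rest is the key
  p=index($0,",")
  split(substr($0,1,p-1),f,"-")
  dt=mktime(sprintf("%04d %02d %02d 00 00 00", 2000+f[3], m[f[2]], f[1]));
  min=min>dt?dt:min;
  max=max<dt?dt:max;
  h[substr($0,p+1)]          # key is "host" or "host,component"
  c[dt,substr($0,p+1)]++     # occurrences per (date, key)
}
END {
   # One row per key for every day between the first and last date
   for(host in h)
      for(d=min;d<=max;d+=3600*24)
          print tolower(strftime("%d-%b-%y", d)),host, (d SUBSEP host) in c ? c[d,host] : 0
}' infile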

Hi friend,

The previous code is working fine. Now these are my file contents:

-----------------------------------
component,available_date,count
-----------------------------------
Mg_Message_count,5-Aug-12,48
Mg_Message_count,6-Aug-12,48
Mg_Message_count,7-Aug-12,42
Mg_Message_count,20-Aug-12,24
Mg_Message_count,21-Aug-12,24
Mg_Message_count,22-Aug-12,24
Mg_Message_count,23-Aug-12,24
Mg_Message_count,24-Aug-12,24

Then my output should be:
----------------------------

Mg_Message_count,5-Aug-12,48
Mg_Message_count,6-Aug-12,48
Mg_Message_count,7-Aug-12,42
Mg_Message_count,8-Aug-12,0
Mg_Message_count,9-Aug-12,0
Mg_Message_count,10-Aug-12,0
Mg_Message_count,11-Aug-12,0
Mg_Message_count,12-Aug-12,0
Mg_Message_count,13-Aug-12,0
Mg_Message_count,14-Aug-12,0
Mg_Message_count,15-Aug-12,0
Mg_Message_count,16-Aug-12,0
Mg_Message_count,17-Aug-12,0
Mg_Message_count,18-Aug-12,0
Mg_Message_count,19-Aug-12,0
Mg_Message_count,20-Aug-12,24

I need your help as soon as possible.

I am still in doubt and need your help soon. Please let me know if you have any queries.

This is solved in http://www.unix.com/shell-programming-scripting/201775-solved-missing-date-unix-2.html#post302708793. I'm not sure double posting really helps.
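
For anyone who lands here first, below is a rough sketch for the component,available_date,count layout. It is not taken from the linked thread, just one possible way to do it in plain awk. It assumes one record per line, records sorted by date within each component, and 2-digit years meaning 20xx; header and separator lines are skipped because they do not contain a date. The names fill_counts, next_day, leap, fmt, num, dim and last are made up for the example.

```shell
# Hypothetical helper: fill_counts FILE
# Assumes lines like "Mg_Message_count,5-Aug-12,48", sorted by date per component.
fill_counts() {
  awk -F, '
  function leap(y) { return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0 }
  # Advance a yyyymmdd key by one calendar day.
  function next_day(k,    y, m, d, n) {
    y = int(k / 10000); m = int(k / 100) % 100; d = k % 100
    n = dim[m] + (m == 2 && leap(y))          # days in this month
    if (++d > n) { d = 1; if (++m > 12) { m = 1; y++ } }
    return y * 10000 + m * 100 + d
  }
  # Turn a yyyymmdd key back into d-Mon-yy.
  function fmt(k) {
    return sprintf("%d-%s-%02d", k % 100, mth[int(k / 100) % 100], int(k / 10000) % 100)
  }
  BEGIN {
    OFS = ","
    split("Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec", mth, ",")
    for (i = 1; i <= 12; i++) num[tolower(mth[i])] = i
    split("31,28,31,30,31,30,31,31,30,31,30,31", dim, ",")
  }
  # Only lines whose second field looks like a date are data records.
  $2 ~ /^[0-9]+-[A-Za-z]+-[0-9]+$/ {
    split($2, f, "-")
    k = (2000 + f[3]) * 10000 + num[tolower(f[2])] * 100 + f[1]
    # Emit a zero row for every day skipped since the previous record
    # of the same component.
    if ($1 in last)
      for (g = next_day(last[$1]); g < k; g = next_day(g))
        print $1, fmt(g), 0
    print                                      # the record itself, unchanged
    last[$1] = k
  }' "$1"
}
```

Called as fill_counts infile, this prints each existing record as it is and inserts a row with count 0 for every date missing between two consecutive records of the same component.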