Help with time function

alula · December 23, 2016, 10:05am

For each unique ID ($5), I want to print, in a new field, the difference in days between the date in $7 and the oldest record date in $6. In addition, I want to print the difference in days between the most recent and the oldest record date ($6) for each unique ID ($5).

Here is what the data looks like:

 7  81  1    47  32070  2010-12-14    20101009  
 7  82  2    10  41920  2010-12-14    20100724  
 7  83  1    67  29446  2010-12-14    20101118   
 7  81  1    47  32070  2011-5-11     20101009     
 7  83  1    67  29446  2011-6-22     20101118     
 7  82  2    10  41920  2011-5-14     20100724

I would like the output to look as follows:

 7  81  1    47  32070   2010-12-14   20101009     65      147    
 7  82  2    10  41920   2010-12-14   20100724     170     150  
 7  83  1    67  29446   2010-12-14   20101118     26      188  
 7  81  1    47  32070   2011-5-11    20101009      65      147  
 7  83  1    67  29446   2011-6-22    20101118      26      188  
 7  82  2    10  41920   2011-5-14    20100724      170     150

I have used the following code, but it doesn't work well. Could you help me, please?

BEGIN{
  lastid=0
  lastmilk=0
}
{
   milkyear=substr($6,1,4)
  milkmonth=substr($6,5,2)
  milkday=substr($6,7,2)
  
  startmilksec = mktime(""milkyear" "milkmonth" "milkday"  00")
  endmilk = mktime(""milkyear" "milkmonth" "milkday" 00")
   if(lastid!=$5) 
  {
    printf("%s 0 0 0\n", $0) 
  }
  else
  {
    startmilkdiff=startmilk-lastmilk
    printf("%s %0.f %f %f\n", $0, startmilkdiff) # 
  }
   lastid=$5
  lastmilk=startmilk
}

rbatte1 · December 23, 2016, 11:38am

Sadly, the above doesn't help us diagnose it very well. What output/errors do you get?
Can you get any trace output from/after the mktime function? I think that the date input to mktime is actually a string in the format YYYY MM DD HH MM SS and you don't have all of it, hence I'm wondering if that's where it's going wrong. There is also confusion about all the double quotes.

I also don't see where you are trying to output the difference as days. You are displaying the raw difference between two timestamps, and we don't know whether those timestamps are formatted correctly.

Can you add some printf statements into your code and show us the output from a single input record?

Kind regards,
Robin

RudiC · December 23, 2016, 11:47am

Your specification is not very clear, and the code doesn't help in understanding or interpreting it.
It is always beneficial to show WHAT exactly "doesn't work well". A few comments that come to mind looking at your code:

where is $7 used for computations?
milkmonth and milkday will not have correct values, as the substr offsets are wrong for a hyphenated date in $6, and the positions can shift because months and days are not zero-padded (e.g. 2011-5-11).
lastmilk is undefined for the first line, and not reset if $5 changes. This may be intended behaviour, though.
endmilk is never used; why not drop it entirely?
as startmilkdiff is the only value passed to the printf statement besides $0, the two trailing %f format specifiers have no arguments to consume.
as corresponding $5 values are NOT in sequence, you'd better work with arrays, or do a sort beforehand.

EDIT:

plus, the mktime function requires a date string with 6 or 7 fields (YYYY MM DD HH MM SS, optionally followed by a DST flag); you supply only 4.
you want to output the difference in days, but you calculate and print values in seconds only.
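
To make the last two points concrete, here is a minimal sketch, assuming an awk that provides mktime (gawk does; it is not in POSIX awk). The one-line input is made up from the sample data, with a record date in the first field and a YYYYMMDD date in the second. Running under TZ=UTC is my own choice so that every day is exactly 86400 seconds long. In gawk, a date string with too few fields makes mktime return -1.

```shell
# Assumes an awk with mktime (a gawk extension). TZ=UTC avoids
# daylight-saving shifts, so each day is exactly 86400 seconds.
days=$(echo "2010-12-14 20101118" | TZ=UTC awk '{
    split($1, d, "-")       # works for 2011-5-11 as well as 2010-12-14
    # mktime wants six fields: YYYY MM DD HH MM SS
    rec  = mktime(d[1] " " d[2] " " d[3] " 0 0 0")
    base = mktime(substr($2, 1, 4) " " substr($2, 5, 2) " " substr($2, 7, 2) " 0 0 0")
    # convert the difference in seconds to whole days
    print int((rec - base) / 86400 + 0.5)
}')
echo "$days"
```

For these two dates the script prints 26.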

alula · December 23, 2016, 12:20pm

The data has repeated records on different days for the same ID [$5], and I want to print, in a separate field, the difference in days between the oldest and the most recent record date [$6] for each unique ID. In addition, I also want to print separately the difference between field [$7] (which is a date) and the oldest record date in field [$6] for each unique ID. I am struggling to write the command. Could you help me write the script, please?

jim_mcnamara · December 24, 2016, 9:47am

Want help? Please show us the output your code currently produces.

alula · December 24, 2016, 12:04pm

I need to print, in a new field, the difference in days between the first and last record dates ($6) for each unique ID ($5).

 7  65  2    5   32070  2010-12-14    13:25:30
 7  82  2    10  41920  2010-12-14    11:30:45
 7  83  1    67  29446  2010-12-14    04:15:25
 7  81  1    47  32070  2011-5-11      08:14:20
 7  83  1    67  29446  2011-6-22      07:13:24
 7  82  2    10  41920  2011-5-14      06:15:25

I want to see output like this:

 7  65  2    5   32070   2010-12-14      13:25:30      65
 7  82  2    10  41920   2010-12-14      11:30:45     170
 7  83  1    67  29446   2010-12-14       04:15:25    26
 7  81  1    47  32070   2011-5-11         08:14:20    65
 7  83  1    67  29446   2011-6-22         07:13:24    26
 7  82  2    10  41920   2011-5-14          06:15:25   70

jim_mcnamara · December 25, 2016, 5:15pm

This is some awk code. Because you want the day count on every line of an ID, including the earlier line where it cannot yet be calculated, this code is two awk scripts in one bash script. It could be cleaned up.
It adds -1 to lines in error, an error being an ID that occurs on only one line.

$ cat filename && ./t.awk
7  65  2    5   32070  2010-12-14    13:25:30
7  82  2    10  41920  2010-12-14    11:30:45
7  83  1    67  29446  2010-12-14    04:15:25
7  81  1    47  32070  2011-5-11      08:14:20
7  83  1    67  29446  2011-6-22      07:13:24
7  82  2    10  41920  2011-5-14      06:15:25
7  82  2    10  41921  2011-5-14      06:15:25

7  65  2    5   32070  2010-12-14    13:25:30   147
7  82  2    10  41920  2010-12-14    11:30:45 150
7  83  1    67  29446  2010-12-14    04:15:25   189
7  81  1    47  32070  2011-5-11      08:14:20 147
7  83  1    67  29446  2011-6-22      07:13:24     189
7  82  2    10  41920  2011-5-14      06:15:25 150
7  82  2    10  41921  2011-5-14      06:15:25 -1

# 7  82  2    10  41920  2011-5-14      06:15:25


awk '{
       split($6,arr,"-")
       a=sprintf("%s %s %s 0 0 0",arr[1], arr[2], arr[3])
       d=mktime(a)
       # this is to handle the fact that you want the same data early and late 
       # in the output
       delta[$5]=delta[$5] " " d
     } 
     END {for(i in delta) {print i, delta[i]}  }'  filename > tmp.dat

#    
# display output, print error "-1" if each $5 does not occur twice in the input file
awk '{
      if (FILENAME=="tmp.dat" )
      { 
        delta[$1]=$0; 
        next
      }
      if (FILENAME=="filename")
      { 
        a="-1"  # default is error
        if($5 in delta)
        {
           cnt=split(delta[$5],arr)
           if(cnt==3) # correct number, 1 for $5 two & three are epoch seconds
           {
             a=arr[3] - arr[2]
             a/=86400
             a=int(a)
           }
        }
        print $0, a      
        next
      }
    }' tmp.dat filename   
# note the order of files here is important
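
For comparison, the same result can be sketched as a single awk program that reads the data file twice. This is an illustration, not the script above. It swaps mktime for a plain Julian day number formula, so it needs no gawk extension and is not affected by time zones. The names daynum, lo, hi and cnt, and the temporary sample file, are made up for the sketch.

```shell
# Hypothetical sample file with the same layout as the data above.
data=$(mktemp)
cat > "$data" <<'EOF'
7  65  2    5   32070  2010-12-14    13:25:30
7  82  2    10  41920  2010-12-14    11:30:45
7  83  1    67  29446  2010-12-14    04:15:25
7  81  1    47  32070  2011-5-11      08:14:20
7  83  1    67  29446  2011-6-22      07:13:24
7  82  2    10  41920  2011-5-14      06:15:25
7  82  2    10  41921  2011-5-14      06:15:25
EOF

result=$(awk '
# Julian day number of a Gregorian calendar date (integer arithmetic only).
function daynum(y, m, d,    a, yy, mm) {
    a  = int((14 - m) / 12)
    yy = y + 4800 - a
    mm = m + 12 * a - 3
    return d + int((153 * mm + 2) / 5) + 365 * yy + int(yy / 4) - int(yy / 100) + int(yy / 400) - 32045
}
{
    split($6, p, "-")
    n = daynum(p[1], p[2], p[3])
}
NR == FNR {                  # first pass: oldest and newest date per ID
    if (!($5 in lo) || n < lo[$5]) lo[$5] = n
    if (!($5 in hi) || n > hi[$5]) hi[$5] = n
    cnt[$5]++
    next
}
{                            # second pass: append the span, or -1 for a single record
    print $0, (cnt[$5] > 1 ? hi[$5] - lo[$5] : -1)
}' "$data" "$data")
echo "$result"
rm -f "$data"
```

On this sample the appended values are 148, 151 and 190 (and -1 for the single 41921 record), one more than the 147, 150 and 189 shown above. Presumably the mktime run was in a time zone with daylight saving time: a difference in epoch seconds that spans the clock change is an hour short of a whole number of days, and int() then rounds it down.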

alula · December 26, 2016, 6:11am

@jim, thanks for the script. I have tried each of the scripts, but it doesn't work. Could you explain it further and tell me which script to use? Thanks for your help.

RudiC · December 26, 2016, 7:27am

It is always helpful to post EXACTLY what "doesn't work" and so keep people from guessing.
Any output? No output? Error messages?

Both scripts need to be run in sequence, and the data file name needs to be "filename", so either rename your file or adapt/correct line 7 of the second script. On top of that, it would be beneficial to remove the ^M (\r, 0x0D) DOS line terminators from the input file.
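
Stripping the DOS line terminators could look like the sketch below. It uses tr, and dosfile.txt is a made-up name for the original data file. If the \r is left in, it stays attached to the last field of every line, so anything appended by print $0, a ends up after a carriage return.

```shell
# Hypothetical input with DOS line endings (\r\n).
printf '7  82  2    10  41920  2010-12-14    11:30:45\r\n' > dosfile.txt

# Delete every carriage return; "filename" is the name the scripts expect.
tr -d '\r' < dosfile.txt > filename

# The last field is now clean, with no trailing \r.
lastfield=$(awk '{print $NF}' filename)
echo "$lastfield"
```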

It would also be beneficial if the requirements and data structures DIDN'T change between posts (compare #1 and #6).

Don_Cragun · December 26, 2016, 3:00pm

Moderator comments were removed during original forum migration.

alula · December 27, 2016, 3:46am

This is the error message that appears:

awk: datecommand.txt:1: awk '{
awk: datecommand.txt:1:     ^ invalid char ''' in expression
awk: datecommand.txt:1: awk '{
awk: datecommand.txt:1:     ^ syntax error

Do you have a solution?
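
A note on the message itself: the "datecommand.txt:1:" prefix suggests that the whole posting, including the awk '{ wrapper, was saved in datecommand.txt and then passed to awk with -f. This is an assumption, since the exact command was not shown. A file in that form is a shell script and would have to be run by the shell. The sketch below uses a made-up one-line stand-in for the file.

```shell
# Hypothetical stand-in for datecommand.txt: a shell command line,
# not a bare awk program.
printf '%s\n' "awk '{ print \$5 }'" > datecommand.txt

# awk -f datecommand.txt   # fails: awk tries to parse "awk '{" as awk code
out=$(echo "7 65 2 5 32070" | sh datecommand.txt)   # works: the shell runs the awk command
echo "$out"
rm -f datecommand.txt
```

The other option is to keep only the text between the single quotes in the file and run it with awk -f.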

Don_Cragun · December 27, 2016, 4:05am

Moderator comments were removed during original forum migration.