I want to print, in a new field, the difference in days between the date in ($7) and the oldest record date ($6) for each unique ID ($5). In addition, for each unique ID ($5), I want to subtract the oldest date from the more recent dates ($6), also in days.
Sadly, the above doesn't help us diagnose it very well. What output/errors do you get?
Can you get any trace output from/after the mktime function? I think that the date input to mktime is actually a string in the format YYYY MM DD HH MM SS and you don't have all of it, hence I'm wondering if that's where it's going wrong. There is also confusion about all the double quotes.
I also don't see where you are trying to output the difference as days. You are displaying the raw difference between two timestamps, and we don't know whether those timestamps were built correctly.
Can you add some printf statements into your code and show us the output from a single input record?
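A trace of that kind could look like the sketch below. It prints the exact string handed to mktime next to the value that comes back, for a single record. The sample record and the TZ=UTC setting are assumptions made for illustration, and it needs an awk that provides mktime (gawk does). In gawk, mktime returns -1 when the string has too few fields.

```shell
# Trace one record: show the string handed to mktime() and what comes back.
# Assumes an awk with mktime() (gawk has it); TZ=UTC is set only so the
# epoch value does not depend on the local timezone.
rec='7 82 2 10 41920 2011-5-14 06:15:25'
trace=$(printf '%s\n' "$rec" | TZ=UTC awk '{
    split($6, p, "-")                       # p[1]=year, p[2]=month, p[3]=day
    spec = p[1] " " p[2] " " p[3] " 0 0 0"  # "YYYY MM DD HH MM SS"
    t = mktime(spec)
    printf "id=%s datespec=[%s] epoch=%d\n", $5, spec, t
}')
printf '%s\n' "$trace"
```

If the epoch shows up as -1, the datespec in the brackets is the first thing to check.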
Your specification is not too clear, and the code doesn't help in understanding or interpreting it.
It is always beneficial to show WHAT exactly "doesn't work well". A few comments that spring to mind looking at your code:
where is $7 used for computations?
milkmonth and milkday will not have correct values, as the substr parameters are wrong and the character positions can shift between records.
lastmilk is undefined for the first line, and not reset if $5 changes. This may be intended behaviour, though.
endmilk is never used; why not drop it entirely?
as startmilkdiff is the only variable in the printf statement, three %f format specifiers are redundant.
as corresponding $5 values are NOT in sequence, you'd better work with arrays, or do a sort beforehand.
EDIT:
plus, the string passed to mktime needs 6 fields (YYYY MM DD HH MM SS), with an optional 7th for DST; you supply 4 only.
you want to output the difference in days, but calculate and print values in seconds only.
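Several of the points above (arrays keyed on $5 instead of relying on input order, a six-field string for mktime, and seconds converted to days) can be put together roughly as follows. This is only a sketch: the sample lines are invented, it assumes $5 is the ID and $6 a YYYY-M-D date, it needs an awk with mktime (gawk), and TZ=UTC is set so that every day is exactly 86400 seconds long.

```shell
# Invented sample: $5 = ID, $6 = record date; IDs deliberately out of sequence.
span=$(printf '%s\n' \
    'a b c d 41920 2011-5-14' \
    'a b c d 50001 2011-6-1' \
    'a b c d 41920 2011-5-20' \
    'a b c d 50001 2011-6-30' |
TZ=UTC awk '
{
    split($6, p, "-")
    t = mktime(p[1] " " p[2] " " p[3] " 0 0 0")    # six fields, midnight
    if (!($5 in oldest) || t < oldest[$5]) oldest[$5] = t
    if (!($5 in newest) || t > newest[$5]) newest[$5] = t
}
END {
    for (id in oldest)                              # one line per unique ID
        printf "%s %d\n", id, (newest[id] - oldest[id]) / 86400
}' | sort)
printf '%s\n' "$span"
```

The sort at the end is only there because awk does not guarantee any order for "for (id in oldest)".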
The data has repeated records on different days for the same ID [$5], and I want to print the difference between the oldest and the most recent record date (in days) [$6] for each unique ID in a separate field. In addition, I want to print separately the difference between field [$7] (which is a date) and the oldest record date of field [$6] for each unique ID. I am struggling to write the command. Could you help me write the script, please?
This is some awk code. Because you wanted to add days to lines where it could not be calculated, this code is two awk scripts in one bash script. It could be cleaned up.
It appends -1 to lines in error, an error being an ID ($5) that does not occur exactly twice, such as a singleton line.
# 7 82 2 10 41920 2011-5-14 06:15:25
awk '{
    split($6,arr,"-")
    a=sprintf("%s %s %s 0 0 0",arr[1], arr[2], arr[3])
    d=mktime(a)
    # this is to handle the fact that you want the same data early and late
    # in the output
    delta[$5]=delta[$5] " " d
}
END {for(i in delta) {print i, delta[i]} }' filename > tmp.dat
#
# display output, print error "-1" if each $5 does not occur twice in the input file
awk '{
    if (FILENAME=="tmp.dat" )
    {
        delta[$1]=$0;
        next
    }
    if (FILENAME=="filename")
    {
        a="-1" # default is error
        if($5 in delta)
        {
            cnt=split(delta[$5],arr)
            if(cnt==3) # correct number, 1 for $5 two & three are epoch seconds
            {
                a=arr[3] - arr[2]
                a/=86400
                a=int(a)
            }
        }
        print $0, a
        next
    }
}' tmp.dat filename
# note the order of files here is important
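As one possible cleanup of the two-step approach above, the same data file can be handed to awk twice, with NR==FNR telling the first pass from the second, so no tmp.dat is needed. This is a sketch under several assumptions, not something tested against the poster's data: $5 is the ID, both $6 and $7 are YYYY-M-D dates (as in the clarified request), the file name sample.txt and its contents are invented, days() is a hypothetical helper, mktime needs gawk or another awk that has it, and TZ=UTC keeps every day at exactly 86400 seconds.

```shell
# Invented sample data: $5 = ID, $6 = record date, $7 = a second date.
cat > sample.txt <<'EOF'
x x x x 41920 2011-5-20 2011-6-1
x x x x 50001 2011-6-1 2011-6-11
x x x x 41920 2011-5-14 2011-6-1
EOF

# Hypothetical helper days(): whole days since the epoch for a YYYY-M-D string.
result=$(TZ=UTC awk '
function days(s,   p) {
    split(s, p, "-")
    return int(mktime(p[1] " " p[2] " " p[3] " 0 0 0") / 86400)
}
NR == FNR {                       # first pass: remember the oldest $6 per ID
    d = days($6)
    if (!($5 in oldest) || d < oldest[$5]) oldest[$5] = d
    next
}
{                                 # second pass: append both differences
    print $0, days($6) - oldest[$5], days($7) - oldest[$5]
}' sample.txt sample.txt)
printf '%s\n' "$result"
```

Each output line is the input line followed by two new fields: days between its own $6 and the oldest $6 for that ID, and days between $7 and that oldest $6. IDs that occur only once get 0 in the first new field rather than -1.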
@jim thanks for the script. I have tried each of the scripts, but they don't work. Could you explain them further and tell me which script to use? Thanks for your help.
It is always helpful to post EXACTLY what "doesn't work" and so keep people from guessing.
Any output? No output? Error messages?
Both scripts need to be run in sequence, and the data file name needs to be "filename", so either rename your file or adapt the name in line 7 of the second script and on both awk command lines. On top of that, it would be beneficial to remove the ^M (\r, 0x0D) DOS line terminators from the input file.
It would also be beneficial if the requirements and data structures DIDN'T change between posts (compare #1 and #6).
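Removing the DOS line terminators mentioned above could be done along these lines. The file names dos.txt and unix.txt and the sample line are invented for the demonstration; tr -d '\r' is the part that does the work.

```shell
# Create a small file with DOS (CRLF) line endings for demonstration.
printf 'a b c d 41920 2011-5-14\r\n' > dos.txt

# Delete every carriage return and write the result to a new file.
tr -d '\r' < dos.txt > unix.txt

# Without this step the last field would carry a trailing \r into awk;
# here its length is just that of the date string itself.
last=$(awk '{ print length($NF) }' unix.txt)
printf '%s\n' "$last"
```

A stray \r at the end of the last field is easy to miss, because it is invisible in most terminals but still ends up in whatever is built from that field.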