Extract repetead values in date range

Gents,

Using the following values:

3687650458 08/29/2017 1.67581
3687650388 08/29/2017 1.67581
3687650330 08/29/2017 1.67581
3687650330 08/29/2017 1.67581
3687650330 08/28/2017 2.67581
3687650330 08/28/2017 2.67581
3687650330 08/27/2017 3.67581
3687650330 08/27/2017 3.67581
3687650330 08/26/2017 4.67581
3687650330 08/26/2017 4.67581
3687650330 08/25/2017 5.67581
3687650330 08/25/2017 5.67581
3687650330 08/24/2017 6.67581
3687650330 08/24/2017 6.67581
3687650330 08/23/2017 7.67581
3687649144 08/29/2017 1.67581
3687649144 08/29/2017 1.67581
3687649144 08/28/2017 2.67581
3687649144 08/28/2017 2.67581
3687649144 08/27/2017 3.67581
3687649144 08/27/2017 3.67581
3687649144 08/26/2017 4.67581
3687649144 08/26/2017 4.67581
3687649144 08/25/2017 5.67581
3687649144 08/25/2017 5.67581
3687649144 08/24/2017 6.67581
3687649144 08/24/2017 6.67581
3687649144 08/23/2017 7.67581
3686449774 08/25/2017 5.67581
3686449774 08/24/2017 6.67581
3686449774 08/24/2017 6.67581
3686449774 08/23/2017 7.67581
3685250156 08/29/2017 1.67581
3685250156 08/29/2017 1.67581
3685250156 08/28/2017 2.67581
3685250156 08/28/2017 2.67581
3685250156 08/27/2017 3.67581
3685250156 08/27/2017 3.67581
3685250156 08/26/2017 4.67581
3685250156 08/26/2017 4.67581
3685250156 08/25/2017 5.67581
3685250156 08/25/2017 5.67581
3685250156 08/24/2017 6.67581
3685250156 08/24/2017 6.67581
3685250156 08/23/2017 7.67581
3685250108 08/29/2017 1.67581
3685250108 08/29/2017 1.67581
3685250108 08/28/2017 2.67581
3685250108 08/28/2017 2.67581
3685250108 08/27/2017 3.67581
3685250108 08/27/2017 3.67581
3685250108 08/26/2017 4.67581

Desired output

3687650330 08/23/2017 08/29/2017 7
3687649144 08/23/2017 08/29/2017 7
3685250156 08/23/2017 08/29/2017 7
3685250108 08/26/2017 08/29/2017 4

The required data in output are the minimum and maximum date where the same value in column is repeated during this days.. the column 3 is the difference between the 2 dates.. .

As u notice I need only the values which are more than 3 days repeated and always keep only the last date in this case

08/29/2017 

.

Example i don't extract the value below because the repeated values are not ending with date

08/29/2017 

.

3686449774 08/23/2017 08/25/2017 3

Please help, thanks..

Any attempts / ideas / thoughts from your side? You promised last time...

1 Like

Hi RudiC

Using the following code i get close..

dt=`awk '{print $2}' tmp5 | sort -n -t"/" -k3 -k1 -k2 | tail -1`

awk     '               {D1=$1
                         D2=$2
             D3=$3
                        }
         !(D1 in MIN)   {MIN[D1]=D2
                         MAX[D1]=D2
             TIM[D1]=D3
                         next
                        }
          D2 < MIN[D1]  {MIN[D1]=D2}
          D2 > MAX[D1]  {MAX[D1]=D2}
          END           {for (m in MIN) print m, MIN[m], MAX[m], TIM[m]}
        ' tmp5 | awk '{if($3 ~ "'$dt'") print}' 

output

3685250108 08/26/2017 08/29/2017 1.67581
3687650330 08/23/2017 08/29/2017 1.67581
3685250156 08/23/2017 08/29/2017 1.67581
3687650458 08/29/2017 08/29/2017 1.67581
3687650388 08/29/2017 08/29/2017 1.67581
3687649144 08/23/2017 08/29/2017 1.67581

only i need to get the correct value in diff between 2 days..

Please, help.. Tks

Not sure I understand - what exactly is missing? And, how to get at it?

1 Like

Hi RudiC

I will like to get the days diff between min and max date.

3685250108 08/26/2017 08/29/2017 4
3687650330 08/23/2017 08/29/2017 7
3685250156 08/23/2017 08/29/2017 7
3687650458 08/29/2017 08/29/2017 1
3687650388 08/29/2017 08/29/2017 1
3687649144 08/23/2017 08/29/2017 7

thanks in advance.

I am missing YOUR approach to calculate that required difference in your (?) attempt... howsoever, try

awk     '
                {DV = substr ($2, 7) substr ($2, 1, 2) substr ($2, 4, 2)
                }
!($1 in MIN)    {MIN[$1] = DV
                 MAX[$1] = DV
                 MIS[$1] = $2
                 MAS[$1] = $2
                 next
                }
DV > MXDV       {MXDV = DV
                }
DV < MIN[$1]    {MIN[$1] = DV
                 MIS[$1] = $2
                }
DV > MAX[$1]    {MAX[$1] = DV
                 MAS[$1] = $2
                }

END             {for (m in MIN) if (MAX[m] == MXDV) print m, MIS[m], MAS[m], MAX[m] - MIN[m] + 1
                }
' file
3685250108 08/26/2017 08/29/2017 4
3687650330 08/23/2017 08/29/2017 7
3685250156 08/23/2017 08/29/2017 7
3687650458 08/29/2017 08/29/2017 1
3687650388 08/29/2017 08/29/2017 1
3687649144 08/23/2017 08/29/2017 7
1 Like

Hello RudiC,

It works perfectly.

Thanks a lot