Perl read file with dates in

ab52 · May 13, 2013, 11:10am

Hi All

I have a text file that has a list of dates in it (see the example below). Is there a way I can pull out just the lines that are from this week (week starting on Monday) and then work out how many occurrences there are of each name in column 2?

2013-05-13 08:20:02       bacha           Blah              524gps
2013-05-13 08:30:02       guppy           blah              125gps
2013-05-13 12:10:02       mojarra        blah                  104gps
2013-05-13 13:59:01       longfin          blah               105gps

The columns are made using sprintf "%-25s %-15s %-20s %-6s %-20s"

many thanks
A

DGPickett · May 13, 2013, 5:30pm

So, filter for these lines, filter for date range, extract user name and get counts? Shell/sed might be easier, but I am not a Perl fancier. For the date part, you need the range from input or from a clock algorithm, plus your definition of a week; then get rid of the hyphens and do an integer compare. You can use an associative array to keep counts, or sort | uniq -c.
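
The filter-and-count pipeline described above might look roughly like the sketch below. The function name count_in_range and its two arguments are made up for illustration, and the week bounds are assumed to be supplied by hand as YYYYMMDD integers.

```bash
# Hypothetical helper: count names (column 3 once date and time are split)
# for lines dated between lo and hi, both given as YYYYMMDD integers.
count_in_range() {
  local lo=$1 hi=$2
  # Keep only lines that start with YYYY-MM-DD and drop the hyphens.
  sed -n 's/^\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\) /\1\2\3 /p' |
  while read -r ymd t name rest; do
    # Integer compare on the hyphen-free date (10# forces base 10).
    if (( 10#$ymd >= lo && 10#$ymd <= hi )); then
      printf '%s\n' "$name"
    fi
  done | sort | uniq -c
}
```

Usage would be something like: count_in_range 20130513 20130519 < input.txt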

durden_tyler · May 14, 2013, 12:38pm

$
$ # Current week is from Monday, 5/13/2013 to Sunday, 5/19/2013
$ cat input.txt
2013-05-11 09:10:11 bacha Blah 524gps
2013-05-13 08:20:02 bacha Blah 524gps
2013-05-13 08:30:02 guppy blah 125gps
2013-05-13 12:10:02 mojarra blah 104gps
2013-05-13 13:59:01 longfin blah 105gps
2013-05-14 20:21:22 longfin blah 105gps
2013-05-19 22:23:24 longfin blah 105gps
2013-05-20 09:45:59 longfin blah 105gps
$
$
$ ##
$ perl -lne 'BEGIN {
               use Time::Local qw(timelocal_nocheck);
               @x = localtime;
               # $x[6] = weekday (0 = Sunday), $x[7] = day of year (0-based)
               if ($x[6] == 0) { $sdow = $x[7] - 5;         $edow = $x[7] + 1         }
               else            { $sdow = $x[7] - $x[6] + 2; $edow = $x[7] - $x[6] + 8 }
               @s = localtime timelocal_nocheck 0,0,0,$sdow,0,(1900+$x[5]);
               $s[5] += 1900; $s[4]++;
               $sow = sprintf("%4d%02d%02d",$s[5],$s[4],$s[3]);
               @e = localtime timelocal_nocheck 59,59,23,$edow,0,(1900+$x[5]);
               $e[5] += 1900; $e[4]++;
               $eow = sprintf("%4d%02d%02d",$e[5],$e[4],$e[3]);
             }
             ($y, $mn, $d, $h, $mi, $s, $name) = m/^(\d+)-(\d+)-(\d+)\s+(\d+):(\d+):(\d+)\s+(\w+).*/;
             $curr_dt = sprintf("%4d%02d%02d",$y,$mn,$d);
             if ($sow <= $curr_dt and $curr_dt <= $eow) { $occurrences{$name}++ }
             END {
               while (($k, $v) = each %occurrences) {
                 print "Name : $k\tNo. of occurrences = $v";
               }
             }
            ' input.txt
Name : longfin  No. of occurrences = 3
Name : mojarra  No. of occurrences = 1
Name : bacha    No. of occurrences = 1
Name : guppy    No. of occurrences = 1
$
$

DGPickett · May 14, 2013, 2:42pm

Just mostly off the cuff, not Perl but the correct general approach, not tested; narrative follows:

export lo=20130513 hi=20130519
(IFS="-$IFS"
 declare -A cts 
 while read y m d t u rest
 do
  ymd="$y$m$d"
 
  case "$ymd" in
  ([21][09][0-9][0-9][0-1][0-9][0-3][0-9])
   if (( ymd <= hi && ymd >= lo ))
   then
    if (( ++cts[$u] == 1 ))
    then
     us="$us $u"
    fi
   fi
   ;;
  (*)
   ;;
  esac
 done
 for u in $us
 do
  echo "$u ${cts[$u]}"
 done
)<input_file >cts_file

Store low and high dates in integer form
Open a subshell to segregate code with funny $IFS including dash. (Hope it does not mess up other things!)
Create an associative array cts for counts.
Begin reading lines into 6 variables: year, month, day, time, user, rest.
If the line starts with a date, increment/create a count for that user.
If this is the first count, save user names in us.
After reading all lines, dump the cts for all users.

I might have left IFS alone and read the date as d, then removed the hyphens to create ymd: ymd=`echo $d | tr -d '[-]'`, but that's an exec per line, so I need to read the bash man page to see how to remove them with builtins. I could also have neatened up the input with sed so it was only 'ymd u' lines.
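
For the builtin route, bash parameter expansion can delete the hyphens without any exec. A minimal sketch, with strip_hyphens as a made-up name used only for illustration:

```bash
# Hypothetical helper: turn 2013-05-13 into 20130513 with no external command.
strip_hyphens() {
  local d=$1
  # ${var//pattern/} removes every match of pattern from the value.
  printf '%s\n' "${d//-/}"
}
```

Inside the read loop this would presumably become read -r d t u rest followed by ymd=${d//-/}, leaving IFS at its default.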

ab52 · May 21, 2013, 4:12pm

Thanks guys, works like a charm.

ab52 · May 29, 2013, 3:38pm

Thanks for this, it works great. Can you point out the place in the code where I can adjust the date it looks for?

Thanks
A

durden_tyler · May 29, 2013, 10:54pm

I don't understand.
You want to adjust the date that this code looks for??
You'll have to adjust the date in the input file (data file) itself. Not sure what the point of doing that would be.
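
For what it's worth, the one-liner takes its week from localtime in the BEGIN block, so it always reports the week in which it is run. If the goal is a different week, one possible approach is to compute the Monday-to-Sunday bounds from an explicit date and feed them to a range filter such as the lo/hi variables in the shell version. The sketch below assumes GNU date (the -d option); week_bounds is a made-up name.

```bash
# Hypothetical helper: print "lo hi" (Monday and Sunday as YYYYMMDD) for the
# week containing the given YYYY-MM-DD date. Assumes GNU date; -u keeps the
# arithmetic in UTC so daylight-saving changes cannot shift a day.
week_bounds() {
  local ref=$1 dow mon
  dow=$(date -u -d "$ref" +%u)                      # 1 = Monday ... 7 = Sunday
  mon=$(date -u -d "$ref -$((dow - 1)) days" +%F)   # back up to Monday
  printf '%s %s\n' "$(date -u -d "$mon" +%Y%m%d)" \
                   "$(date -u -d "$mon +6 days" +%Y%m%d)"
}
```

For example, read -r lo hi < <(week_bounds 2013-05-14) would set lo and hi for the week shown in the sample run.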

jim_mcnamara · May 30, 2013, 7:01am

If your input file is different from what you posted, please post a few EXACT lines of the input file you use. Not what you think we need to see.

DGPickett · May 30, 2013, 4:37pm

You can grep out just one date, or use sed, since sed can quit when it sees a higher date. Sed can find the first desired date, start passing data until it sees the next date or EOF, and quit without passing (printing) that line. You might have to keep 1-2 lines in the buffer so you can detect the exit in time to prevent it being printed. Sed can do this using N and P. I am sure Perl can emulate sed in this even more simply.
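
A rough sketch of the quit-early idea follows. Note that it uses sed's hold space as a "seen a match" flag rather than the N and P commands mentioned above, and it assumes the file is in date order as in the sample. The name lines_for_date is invented for illustration.

```bash
# Hypothetical helper: print the lines for one date (YYYY-MM-DD) and stop
# reading at the first non-matching line after a match.
lines_for_date() {
  # Matching line: print it, copy it to the hold space as a flag, next cycle.
  # Other line: swap in the hold space; if it is non-empty a match was already
  # seen, so quit (nothing is printed under -n); otherwise swap back.
  sed -n "/^$1 /{p;h;d;}; x; /./q; x"
}
```

Usage: lines_for_date 2013-05-13 < input.txt. Plain grep '^2013-05-13 ' gives the same lines but reads the whole file.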