I have a process that requires me to read data from huge log files and find the most recent entry on a per-user basis. The number of users may fluctuate wildly from month to month, so I can't hard-code user names or a fixed number of variables to capture the data, and the files are large, so I don't want to read them several times.
The entries of interest contain a particular string, so I can extract just those from the overall log file, and I have a way to split the output into separate files on a per-user basis. My plan is then to read just the last line of each file created with tail -1, with the filename giving me the user account in question.
My boss, however, worries about false-positive data matches for my expression (by chance or maliciously) that might try to overwrite a critical file.
My data has a syslog-type date in it, which means doing a sort -u is proving tricky too. I've got this far with splitting the data out to files under /tmp/logs as splitlog.rbatte1 or similar, but if field 11 were ever */../../etc/passwd then potentially I would be in trouble.
The date is the first three fields and 'as far as I am aware' a valid user name would be in field 11, but ........
A simplified part of the code would be:-
grep "Active transaction started" /var/log/qapplog | awk '{print $1, $2, $3, $11 > ("/tmp/logs/splitlog." $11)}'
for userfile in /tmp/logs/splitlog.*
do
   # Quote the filename so odd characters in it are not word-split or globbed
   lastrecord=$(tail -n 1 "$userfile")
   printf "User %s last record is %s\n" "$userfile" "$lastrecord"
   # .... whatever else here ....
done
I have considered adding tr -d "\/" to strip out the characters, but now that it's been raised, I'm concerned that there may be other things I'm not considering.
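Rather than stripping known-bad characters, one alternative is to accept only names that match a known-good pattern and reject everything else. The following is a minimal sketch, not a tested solution: split_by_user is a hypothetical helper, the allowed character set is an assumption about what a valid account name looks like here, and the output directory is assumed to start out empty because the sketch appends.

```shell
#!/bin/sh
# Hypothetical helper: $1 = log file, $2 = output directory (assumed empty).
split_by_user() {
    grep "Active transaction started" "$1" |
    awk -v dir="$2" '
        # Assumed rule: letters, digits, underscore, dot and hyphen only,
        # and the first character may not be a dot or a hyphen.
        $11 ~ /^[A-Za-z0-9_][A-Za-z0-9_.-]*$/ {
            out = dir "/splitlog." $11
            print $1, $2, $3, $11 >> out
            close(out)      # keep the number of open files small
            next
        }
        # Anything else is reported on stderr and never used as a filename.
        { print "rejected: " $0 > "/dev/stderr" }
    '
}
```

It would be called as split_by_user /var/log/qapplog /tmp/logs. Because the pattern contains no slash, a value such as */../../etc/passwd is rejected outright instead of being rewritten into something that merely looks safe.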
Is there a better way to work here, potentially with awk getting the equivalent of basename "$11" or variable substitution in the shell of "${11##*/}"?
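For reference, awk has no built-in basename, but two sub() calls get close to it. This is only a sketch: strip_path and base are hypothetical names, and the field positions are the ones described above.

```shell
#!/bin/sh
# Hypothetical helper: reads log lines on stdin, prints date plus cleaned name.
strip_path() {
    awk '
        # Rough equivalent of basename: keep only the last path component.
        function base(p) {
            sub(/\/+$/, "", p)   # drop any trailing slashes
            sub(/.*\//, "", p)   # drop everything up to the last slash
            return p
        }
        { print $1, $2, $3, base($11) }
    '
}
```

With this, a field 11 of */../../etc/passwd comes out as passwd, so the resulting splitlog.passwd would stay inside the output directory. It does not check that the name is a real user, so it only addresses the path-traversal part of the concern.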
Any suggestions welcome. Perhaps there is a better design overall that will find the last entry on a per-user basis. The log is thankfully written in time order, so the last in the file by user name is the last by time already.
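Since the log is in time order, one possible single-pass design is to keep the latest record per user in an awk array and never create per-user files at all, which would sidestep the filename question entirely. A minimal sketch, assuming the field layout described above; last_per_user is a hypothetical name, and memory use grows with the number of distinct users rather than with the size of the log.

```shell
#!/bin/sh
# Hypothetical helper: $1 = log file; prints one line per user.
last_per_user() {
    awk '
        /Active transaction started/ {
            # Later lines overwrite earlier ones, so the value left at the
            # end is the most recent record for that user.
            last[$11] = $1 " " $2 " " $3 " " $11
        }
        END {
            for (user in last)
                printf "User %s last record is %s\n", user, last[user]
        }
    ' "$1"
}
```

It would be called as last_per_user /var/log/qapplog. The order of a for-in loop in awk is unspecified, so the output can be piped through sort if a stable order matters. Field 11 is only ever used as an array key and printed, so an odd value cannot touch the filesystem.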
Kind regards,
Robin