Hi,
I have a script that searches log files for the string "CORRUPTION DETECTED" and then prints the 10 lines before and after each match. Let's call it pattern_match.ksh.
First I do a
grep -in "CORRUPTION DETECTED" $DIR_PATH/alert_${sid}* > ${tmpfile_00}.${sid}
which gives me one FILE:LINE:TEXT entry for every matching line, i.e. the files that contain "CORRUPTION DETECTED" together with the line numbers of the matches.
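One caveat with the grep -in step: grep only prefixes each match with the file name when it searches more than one file. If $DIR_PATH/alert_${sid}* expands to a single alert log, the output is just LINE:TEXT, and the awk -F":" split in the loop picks up the wrong fields. A common portable fix is to add /dev/null as an extra file operand. A minimal sketch; list_matches and its DIR/SID arguments are hypothetical names for illustration:

```shell
# list_matches DIR SID
# Hypothetical helper: prints FILE:LINE:TEXT for every line of DIR/alert_SID*
# that contains "CORRUPTION DETECTED" (case-insensitive).
# /dev/null never matches, but as an extra file operand it guarantees grep
# always sees more than one file, so every match is prefixed with its file
# name even when the glob expands to a single alert log.
list_matches() {
    grep -in "CORRUPTION DETECTED" "$1"/alert_"$2"* /dev/null
}
```

In pattern_match.ksh this would be list_matches "$DIR_PATH" "$sid" > ${tmpfile_00}.${sid}, or simply /dev/null appended to the existing grep line.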
Then, in a while loop, I do something like the code below. Ignore the "..." lines; I am only showing the part where I print the matching line and the lines before and after it.
while read line
do
ALERTLOG=$(echo "$line" | awk -F":" '{ print $1 }')
str_found=$(echo "$line" | awk -F":" '{ print $2 }')
let str_before=${str_found}-10
let str_after=${str_found}+10
...
...
sed -n "${str_before},${str_after}p" ${ALERTLOG} > ${WORK_DIR}/${thisSCRIPT}.${thisSERVER}.${sid}.tmp.CURRENT
echo
count=$(ls -l ${WORK_DIR}/${thisSCRIPT}.${thisSERVER}.${sid}.out.* 2>/dev/null | wc -l | awk '{ print $1 }')
if [[ $count = 0 ]] ; then
let next=${count}+1
cp -p ${WORK_DIR}/${thisSCRIPT}.${thisSERVER}.${sid}.tmp.CURRENT ${WORK_DIR}/${thisSCRIPT}.${thisSERVER}.${sid}.out.${next}
else
...
...
...
cp -p ${WORK_DIR}/${thisSCRIPT}.${thisSERVER}.${sid}.tmp.CURRENT ${WORK_DIR}/${thisSCRIPT}.${thisSERVER}.${sid}.out.${next}
fi
fi
...
...
...
done < ${tmpfile_00}.${sid}
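The two echo | awk calls per input line in the loop can be avoided by letting read split the line on colons itself. A minimal sketch, assuming the FILE:LINE:TEXT format produced by grep -in; show_ranges is a hypothetical name, and it only prints the ranges instead of running sed:

```shell
# show_ranges MATCHFILE
# Hypothetical helper: reads FILE:LINE:TEXT lines (as written by grep -in)
# and prints each file with the line range to extract (+/- 10 lines).
# IFS=: makes read split on colons; the last variable (text) receives the
# rest of the line, so colons inside the log text do no harm.
show_ranges() {
    while IFS=: read -r ALERTLOG str_found text
    do
        str_before=$((str_found - 10))
        str_after=$((str_found + 10))
        echo "$ALERTLOG $str_before $str_after"
    done < "$1"
}
```

For example, show_ranges ${tmpfile_00}.${sid}. Note that str_before comes out as zero or negative whenever the match is within the first 10 lines of a file.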
So at the moment it does what I am after: I now have extracts of every file containing the "CORRUPTION DETECTED" string, with 10 lines either side of each match.
This is similar to
awk 'c-->0;$0~s{if(b)for(c=b+1;c>1;c--)print r[(NR-c+1)%b];print;c=a}b{r[NR%b]=$0}' b=3 a=5 s="abcd"
from the "Print lines before and after pattern match" thread. Unfortunately, I don't have the nawk/gawk needed to run it.
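The ring-buffer idea in that one-liner may not strictly need nawk/gawk. The sketch below sticks to a small set of features (var=value operands instead of -v, index() instead of a regex held in a variable, plain arrays, no user-defined functions) in the hope that the system awk accepts it; this is an assumption I have not verified against an old awk. context_awk is a hypothetical name, and unlike grep -i the match is a fixed, case-sensitive string:

```shell
# context_awk BEFORE AFTER STRING FILE
# Hypothetical helper: prints BEFORE lines, the matching line and AFTER lines
# around each line of FILE containing STRING (fixed string, case-sensitive).
# BEFORE must be at least 1, because it is used as the ring-buffer size.
context_awk() {
    awk '
    index($0, s) {
        # Print the buffered lines preceding this match, oldest first,
        # skipping any that were already printed as part of an earlier match.
        start = NR - b
        if (start < 1) start = 1
        for (i = start; i < NR; i++)
            if (i > last) print buf[i % b]
        print
        last = NR
        c = a + 0
        next
    }
    # Print the lines following a match while the counter lasts.
    c > 0 { print; c--; last = NR }
    # Remember every non-matching line in a ring buffer of size b.
    { buf[NR % b] = $0 }
    ' b="$1" a="$2" s="$3" "$4"
}
```

For example, context_awk 10 10 "CORRUPTION DETECTED" alert.log gives the same +/- 10 line window as the grep/sed approach, without the line-number arithmetic.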
There is also this sed one-liner:
sed -n -e '/regexp/{=;x;1!p;g;$!N;p;D;}' -e h
but unfortunately I can't work out the syntax to make it print more than one line before the match. I do know how to print more lines after the match, by chaining several "n;p;" commands. Is there a shorter way in sed to write
n;p;n;p;n;p;n;p;n;p;n;p;n;p;n;p;n;p;n;p;
to print the 10 lines after the match?
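Portable sed has no repeat count for n;p (the /regexp/,+10 address form is a GNU sed extension), but the shell can build the repeated string. A minimal sketch; sed_after is a hypothetical name, and it assumes your sed accepts ; separators inside { } (some older seds want newlines instead):

```shell
# sed_after N STRING FILE
# Hypothetical helper: builds "p;n;p;n;p;..." with N copies of "n;p"
# instead of typing them out, then prints each line matching STRING
# followed by the N lines after it.
sed_after() {
    cmd=p
    i=0
    while [ "$i" -lt "$1" ]
    do
        cmd="$cmd;n;p"
        i=$((i + 1))
    done
    # STRING is used as a sed regex, so avoid / and regex metacharacters in it.
    sed -n "/$2/{$cmd;}" "$3"
}
```

For example, sed_after 10 "CORRUPTION DETECTED" alert.log. One difference from grep -A: a second match inside those N lines is printed, but it does not restart the count.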
I also don't have a version of grep that can print lines before and after a match, i.e. the
grep -A1 -B1
options.
Hence I end up doing a grep -in, then the +/- arithmetic, then a sed -n. It is a long and crude way of doing what I am after, but it is the only way I know that I actually understand; I am having a hard time following the sed and awk one-liners. My method also makes it easy to print more than +/- 10 lines: I just change the lines that do the +/- arithmetic.
However, as always, my script has flaws:
- If the log file is small, for example only 10 lines long, sed -n "${str_before},${str_after}p" gives an error. I can't find a way to get sed to check that the line numbers are valid. Is there one?
- Because the log files I grep aren't deleted until a month or so later, and I run this corruption check daily, I end up with several duplicate extract files under different names.
How do I check for and remove duplicate files that have different names? I wrote the script below, which uses md5sum. It is called x.ksh at the moment; I will rename it later.
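On the first flaw: only the start address is the problem. When the match is within the first 10 lines, str_before becomes zero or negative, and sed treats a line address of 0 or below as an error; an end address past the last line is harmless, since sed simply stops at end of file. So clamping the start to 1 before calling sed should be enough, without sed having to check anything. A minimal sketch; extract_context is a hypothetical name:

```shell
# extract_context FILE LINE N
# Hypothetical helper: prints N lines either side of line LINE of FILE.
extract_context() {
    start=$(($2 - $3))
    end=$(($2 + $3))
    # sed rejects 0 or a negative number as a line address, so clamp to 1.
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    # An end address beyond the last line is fine: sed stops at EOF.
    sed -n "${start},${end}p" "$1"
}
```

In the loop, extract_context "$ALERTLOG" "$str_found" 10 would take the place of the sed -n line.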
Here is a sample run of x.ksh with some example log files:
$: ls -1 *log*
log.1
log.10
log.11
log.12
log.13
log.14
log.15
log.16
log.17
log.18
log.2
log.3
log.4
log.5
log.6
log.7
log.8
log.9
$: md5sum *log*
c931703fc30e4b98c0352029dca44573 log.1
d92e2c0237a6e575287f10c1a86f4353 log.10
c931703fc30e4b98c0352029dca44573 log.11
d92e2c0237a6e575287f10c1a86f4353 log.12
c931703fc30e4b98c0352029dca44573 log.13
d92e2c0237a6e575287f10c1a86f4353 log.14
c931703fc30e4b98c0352029dca44573 log.15
d92e2c0237a6e575287f10c1a86f4353 log.16
c931703fc30e4b98c0352029dca44573 log.17
d92e2c0237a6e575287f10c1a86f4353 log.18
d92e2c0237a6e575287f10c1a86f4353 log.2
c931703fc30e4b98c0352029dca44573 log.3
d92e2c0237a6e575287f10c1a86f4353 log.4
c931703fc30e4b98c0352029dca44573 log.5
d92e2c0237a6e575287f10c1a86f4353 log.6
c931703fc30e4b98c0352029dca44573 log.7
d92e2c0237a6e575287f10c1a86f4353 log.8
c931703fc30e4b98c0352029dca44573 log.9
$: ./x.ksh
$: ls -1 *log*
log.1
log.2
$: cat x.ksh
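#!/bin/ksh
#
# Keep the lowest-numbered copy of each set of identical log.N files
# and delete the other copies.
md5sum *log* | sort > tmp.00
# One line per distinct checksum.
awk '{ print $1 }' tmp.00 | sort -u > tmp.01
while read md5
do
    # Files with this checksum, in numeric order of their .N suffix;
    # skip the first (lowest-numbered) one and delete the rest.
    # -f avoids an error if xargs runs rm with no file names.
    grep "^${md5}" tmp.00 | awk '{ print $2 }' | sort -t. -k2,2n | awk 'NR>1 { print }' | xargs rm -f
done < tmp.01
rm tmp.00 tmp.01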
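Since md5sum is available, the two temporary files and the per-checksum loop in x.ksh can be collapsed into one pipeline: checksum the files in numeric order and let awk report every file whose checksum it has already seen. A sketch, assuming the files are named log.N with no spaces in their names; dedupe_logs is a hypothetical name:

```shell
# dedupe_logs DIR
# Hypothetical helper: in DIR, deletes every log.N whose contents duplicate
# a lower-numbered log.N. Runs in a subshell so the cd does not leak out.
dedupe_logs() (
    cd "$1" || exit 1
    # seen[$1]++ is 0 (false) the first time a checksum appears, so the
    # first, lowest-numbered file is kept and later copies are printed.
    ls log.* | sort -t. -k2,2n | xargs md5sum |
        awk 'seen[$1]++ { print $2 }' | xargs rm -f
)
```

It keeps the same file x.ksh keeps in each group, the one with the lowest N.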
Is there any other way of checking for duplicate files? At the moment I run pattern_match.ksh and call x.ksh from it. My question is: is there a way to check for a duplicate 'immediately', when the file is created, instead of running x.ksh afterwards?
For example, suppose I already have files log.1 to log.50, all with different checksums, i.e. no duplicates. Then pattern_match.ksh (via sed) generates log.51, and I want to check that log.51 isn't a duplicate of any of log.1 to log.50. Or is this exactly what my x.ksh script already does, and I am just over-complicating things? :o I hope I am explaining this clearly.
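To check a new extract "immediately", it can be compared with the existing extracts before it is copied, rather than cleaned up afterwards. cmp -s compares byte by byte and exits 0 only when two files are identical, so no checksums are needed. A sketch, assuming the extracts are named PREFIX.N as in the loop (e.g. PREFIX = ${WORK_DIR}/${thisSCRIPT}.${thisSERVER}.${sid}.out); save_if_new is a hypothetical name:

```shell
# save_if_new NEWFILE PREFIX
# Hypothetical helper: copies NEWFILE to PREFIX.N (N = highest existing
# suffix + 1) unless it is byte-identical to an existing PREFIX.* file.
# Returns 0 if the file was saved, 1 if it was a duplicate.
save_if_new() {
    for old in "$2".*
    do
        # With no matches the glob stays literal, so skip non-files.
        [ -f "$old" ] || continue
        # cmp -s prints nothing and exits 0 only for identical files.
        if cmp -s "$1" "$old"; then
            echo "duplicate of $old, not saved"
            return 1
        fi
    done
    # Highest numeric suffix so far (empty if there are no copies yet).
    last=$(ls "$2".* 2>/dev/null | sed 's/.*\.//' | sort -n | tail -n 1)
    next=$(( ${last:-0} + 1 ))
    cp -p "$1" "$2.$next"
    echo "saved as $2.$next"
}
```

In the loop, the cp -p of the .tmp.CURRENT file would become save_if_new ${WORK_DIR}/${thisSCRIPT}.${thisSERVER}.${sid}.tmp.CURRENT ${WORK_DIR}/${thisSCRIPT}.${thisSERVER}.${sid}.out. x.ksh reaches the same end state, but only after the duplicate has been written; checking before the copy means it is never written. Using the highest suffix rather than a file count also avoids reusing a number after x.ksh has deleted files from the middle of the sequence.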
Anyway, please advise. Thanks in advance.