My issue is that because I'm looking back 10 lines, it's pulling in more data than I want. The 10 lines include lines containing the word policy for other policies, whereas I'm only interested in the first occurrence of policy in the reverse search.
So, for example, my string 9005 occurs in 2 different parts of the file. The first occurrence works fine (because no other line in the preceding 10 contains policy), but the second occurrence pulls in 2 other lines besides the one I want.
How do I break out of the search when the first occurrence of policy is reached for each 9005? Or, alternatively, instead of searching back 10 lines, how do I search back only as far as the word policy for each 9005?
Providing representative samples of your data and expected output invites fast and accurate responses. Otherwise the answers will either use no data, or will use individual and/or eccentric datasets. Here is how I interpreted your question.
The nonstandard utility glark has options for this kind of task. Here it produces sets of lines in which 2009 and policy occur within 2 lines of one another:
% ./s1
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution : Debian GNU/Linux 5.0.8 (lenny)
bash GNU bash 3.2.39
glark version 1.8.0
-----
Input data file data1:
apple
policy
banana
cherry
date
2009
fig
grape
kiwi
lemon
mango
nectarine
policy
orange
2009
peach
rhubarb
-----
Results:
13 policy
14 orange
15 2009
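For readers without Ruby, here is a rough awk sketch that reproduces the Results above for data1. It is an approximation under stated assumptions, not glark: print_backward_span is a hypothetical helper name, and unlike glark's "within N lines" matching it only looks backward from each 2009 line to the most recent policy line.

```shell
# print_backward_span is a hypothetical helper, not part of glark.
# For each line matching the end pattern, look back at most MAX lines for the
# most recent line matching the start pattern; if one is found, print that
# span of lines prefixed with their line numbers. Only looks backward.
print_backward_span() {
    # $1 = start pattern, $2 = end pattern, $3 = MAX distance, $4 = input file
    awk -v pat_start="$1" -v pat_end="$2" -v max="$3" '
        {
            buf[NR] = $0                  # remember this line by number
            delete buf[NR - max - 1]      # keep only the last max+1 lines
        }
        $0 ~ pat_start { last_start = NR }    # newest start-pattern line
        $0 ~ pat_end && last_start && NR - last_start <= max {
            for (i = last_start; i <= NR; i++)
                printf "%d %s\n", i, buf[i]
        }
    ' "$4"
}
```

Usage: print_backward_span policy 2009 2 data1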
The glark command was in the Debian repository. Otherwise, see the web page noted in the script for more examples and downloads. The glark code is written in Ruby, so Ruby needs to be available.
Unfortunately we don't have Ruby, glark, tac, etc. installed, which is why I was looking for a sed/awk solution.
Sorry if the question was more confusing than necessary.
Basically, here's some sample data from the file:
CS02010002 Policy 9999998599
CS13000008 Tax processing was done for 17/03/2012.
CS95869005 No BC record found. Please review urgently
CS02010002 Policy 9999998599
SS00200001 Change of adress processed
CS13000008 Tax processing was done for 18/03/2012.
CS02010002 Policy 9999999609
CS02010002 Policy 9999999619
CS02010002 Policy 9999999629
CS43500005 Payout Number A0002 is being processed now.
CS43500005 Payout Number A0003 is being processed now.
CS02010002 Policy 9999999639
CS43500005 Payout Number A0001 is being processed now.
CS02010002 Policy 9999999759
CS02010002 Policy 9999999899
CS43500005 Payout Number A0003 is being processed now.
CS13000008 Tax processing was done for 17/03/2012.
CS95869005 No BC record found. Please review urgently
The output I'm looking for is
9999998599
9999999899
corresponding to the most recent policy reference before each "CS95869005 No BC record found. Please review urgently" line,
but when I run the command
sed s/^M//g test | nawk 'c-->0;$0~s{if(b)for(c=b+1;c>1;c--)print r[(NR-c+1)%b];print ;c=a}b{r[NR%b]=$0}' b=10 a=0 s="9005"|grep "Policy "|sort -u |awk '{print $3}'
I understand why I'm getting the extra policy numbers (due to b=10), but I can't shorten the gap because I don't know how many lines there will be between the message "CS95869005 No BC record found. Please review urgently" and the preceding "CS02010002 Policy " message.
This is why I was looking for a way to stop at the first occurrence in the backward search, or something to that effect.
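Searching backward to the first Policy line gives the same answer as remembering the most recent Policy line while reading forward, so a single awk pass avoids the fixed 10-line window altogether. A minimal sketch, assuming the policy number is always the third field of a line whose second field is Policy; last_policy_before_error is a hypothetical name, and if your awk is the old one (as on Solaris) you may need nawk in its place.

```shell
# last_policy_before_error is a hypothetical helper name.
# Assumes policy records look like "CS02010002 Policy <number>", i.e. the
# word Policy is field 2 and the number is field 3.
last_policy_before_error() {
    # $1 = error code to look for (e.g. 9005), $2 = input file
    # tr -d '\r' removes DOS carriage returns, like sed s/^M//g in the question
    tr -d '\r' < "$2" | awk -v code="$1" '
        # Remember the number from every Policy record as it goes by
        $2 == "Policy" { last = $3; next }
        # On an error line, print the most recent policy number (if any)
        index($0, code) && last != "" { print last }
    '
}
```

Usage: last_policy_before_error 9005 test. If two error lines occur with no Policy line between them, the same number is printed twice; append | sort -u as in the original pipeline if that matters.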
I'm still not confident that I understand your question, so perhaps these meta-answers may help.
There was a suggested solution in Perl from balajesuri that you may have missed. I don't know how fast the reading-backwards module is, and you seem unable to install anything, but I think that module might be standard. (The sample code I tried in 2010 was almost instantaneous, but that was with a very short file.) If you do not have it, there are other solutions.
Assuming that the suggestion from birei is correct, you could try various means for reversing a file.
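birei's suggestion isn't reproduced here; assuming it amounts to reversing the file and then reading forward, the first Policy line after each error line in the reversed stream is the nearest one above it in the original. A sketch under that assumption, using the well-known sed idiom 1!G;h;$p in place of tac (it keeps the whole file in sed's hold space, so older seds may struggle with very large files); policy_via_reverse is a hypothetical name.

```shell
# policy_via_reverse is a hypothetical helper name.
# Steps: strip carriage returns, reverse the file with sed (same as tac),
# take the first Policy line after each error line in the reversed stream,
# then reverse the results back into top-to-bottom order.
policy_via_reverse() {
    # $1 = error code to look for (e.g. 9005), $2 = input file
    tr -d '\r' < "$2" |
        sed -n '1!G;h;$p' |
        awk -v code="$1" '
            # An error line arms the search for the next Policy line
            index($0, code) { want = 1; next }
            # The first Policy line after it is the nearest one above it
            want && $2 == "Policy" { print $3; want = 0 }
        ' |
        sed -n '1!G;h;$p'
}
```

Unlike a forward scan, two error lines with no Policy line between them yield a single number here, because both map to the same Policy line.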
You may not have tac or rev, but there are Perl versions of both from one of the CPAN projects, the Perl Power Tools (PPT).
So you could try using either of those: PPT: tac
Other approaches to producing a reversed copy of a file include this one in sed:
sed -n '1{
h
}
1 !{
x
H
}
${
x
H
p
}' inputfile
You could also add line numbers (cat -n), sort on them in reverse, and remove them again (cut) to get a reversed copy of a file.
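A minimal sketch of the cat -n / sort / cut idea; reverse_with_sort is a hypothetical name. It relies on cat -n separating the number from the line with a TAB, which is also cut's default delimiter, so lines that themselves contain TABs come through intact.

```shell
# reverse_with_sort is a hypothetical helper name.
reverse_with_sort() {
    # cat -n writes "<spaces><number><TAB><line>"; sort the numbers in
    # descending order, then cut -f2- (TAB is the default delimiter) drops
    # the number and keeps the rest of each line unchanged
    cat -n "$1" | sort -k1,1nr | cut -f2-
}
```

Usage: reverse_with_sort filename.txt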
Here is a completely different approach, though it is unsuitable for very large files because it makes an extra pass over the file for each error found.
It works by numbering the lines of the input, finding each occurrence of "No BC record found", and then scanning the ten lines above that record for the last line containing "Policy".
cat -n filename.txt | grep "No BC record found" | awk '{print $1}' | while read E1
do
    # First line of the ten-line block above "No BC record found"
    E2=$((E1 - 10))
    if [ ${E2} -le 0 ]
    then
        E2=1
    fi
    # Line number one line above "No BC record found"
    E3=$((E1 - 1))
    # Nothing to search if the error is on the very first line
    [ ${E3} -ge 1 ] || continue
    # Print the block, keep its last "Policy" line, show the policy number
    sed -n "${E2},${E3}p;${E3}q" filename.txt | \
        grep "Policy" | tail -1 | awk '{print $3}'
done
./scriptname
9999998599
9999999899
... and I know that it has a "cat" command in it!