=245 this is testing
=035 abc123
=245 this is testing1
=035 abc124
=245 this is testing2
=035 abc125
=035 abc126
=245 this is testing3
Here I have to pull out the lines where two =035 lines occur in a row, instead of alternating =035 and =245 lines, i.e. extract abc125 and abc126. Is there any command or script for this? Please help.
So if two lines begin the same then pull out those values? Or if two lines are exactly =035, then extract the rest of the line? Or if any number of consecutive lines are =035 then print out them all? Give this a go:
I think a small amendment is required; I don't know if it's possible.
=245 this is testing
=035 abc123
=245 this is testing1
=035 abc124
=245 this is testing2
=035 abc125
=035 abc126
=245 this is testing3
=035 abc127
=035 abc128
=035 abc129
=245 this is testing 4
Here it extracts abc125, abc126, abc127, abc128 and abc129, but it should not extract abc126 and abc129, because each of them is followed by a =245 line.
Thanks for the reply. Yes, you are right: it should check whether the following line is also =035, and if so, print the second column value; if the line is followed by =245, it should skip it. Please guide.
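A minimal sketch of that rule in awk, assuming the records look exactly like the samples above (tag in the first field, value in the second). It remembers the previous line and prints the previous =035 value only when the current line is also =035, so a =035 line followed by =245 is skipped. The function name extract_035_pairs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of every =035 line whose next line
# is also =035. Reads the records on stdin, one record per line.
extract_035_pairs() {
    awk '
        # this line is =035 and the previous line was too: print previous value
        $1 == "=035" && prev_tag == "=035" { print prev_val }
        # remember this line for the comparison with the next one
        { prev_tag = $1; prev_val = $2 }
    '
}

# Sample data from the thread
printf '%s\n' '=245 this is testing' '=035 abc123' '=245 this is testing1' \
    '=035 abc124' '=245 this is testing2' '=035 abc125' '=035 abc126' \
    '=245 this is testing3' '=035 abc127' '=035 abc128' '=035 abc129' \
    '=245 this is testing 4' | extract_035_pairs
# prints abc125, abc127 and abc128, one per line
```

Only one previous line is kept, so memory use does not grow with the file, and a =035 line on the very last line of the file is never printed, since nothing follows it.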
Hi Drewk, thanks a lot, it works perfectly. Can you also help me understand the code logic?
Hi Dennis, thanks. When I run your code, it's more or less correct, but it also extracts a few values that are followed by =245, especially at the start and at the end.
1) Invoking perl with -0777 means "slurp" the whole file: the entire file is read into memory as one string, which is needed here because the pattern spans more than one line. You could write something that reads the file a line at a time and remembers the previous line, but that is more complex logic. Perl can handle very big files this way, but it may still fail on files too large to fit in memory...
2) Note the regex /=035\s+(.*)\n(?!=245)/g used in the while loop. Here are the details:
"=035\s+" matches =035 followed by one or more whitespace characters;
"(.*)" captures the rest of the line, up to (but not including) the \n, since . does not match a newline;
"\n" matches the end of the line;
"(?!=245)" is a 'zero-width negative lookahead assertion'. In plain English: the match only succeeds if the text right after the newline is NOT =245, and the check itself consumes no characters;
"g" means global: each time the while loop evaluates the match, the search resumes where the previous match ended, so every match in the file is found.
My last post was done quickly, which usually means sloppier. It first printed the input, then deleted each line matching =035.* that was followed by a line matching =245.*, and then printed the remaining =035 lines. I did it stepwise instead of in one sweep...
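The stepwise idea can be sketched with standard tools. This is an illustration assuming sed, grep and cut are available, not the code from that post (which isn't shown here), and extract_035_stepwise is a hypothetical name: sed removes every =035 line that is directly followed by a =245 line, then grep keeps the =035 lines that survive and cut prints their values.

```shell
#!/bin/sh
# Hypothetical helper illustrating the stepwise approach.
# Step 1 (sed): $!N appends the next line (except on the last line), so the
#   pattern space holds two lines. If they are "=035 ..." then "=245 ...",
#   P is skipped and the =035 line is dropped; otherwise P prints the first
#   line. D deletes that first line and restarts with the second one.
# Step 2 (grep, cut): keep the surviving =035 lines and print their values.
extract_035_stepwise() {
    sed '$!N;/^=035.*\n=245/!P;D' | grep '^=035' | cut -d' ' -f2
}

printf '%s\n' '=245 this is testing2' '=035 abc125' '=035 abc126' \
    '=245 this is testing3' '=035 abc127' '=035 abc128' '=035 abc129' \
    '=245 this is testing 4' | extract_035_stepwise
# prints abc125, abc127 and abc128
```

One edge case: the last line of the file has no following line, so a trailing =035 line is never removed in step 1 and is still printed.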
I cannot overemphasize how easy this becomes if you use a regex tool.
With either one, you can just play with patterns on your sample text until they do what you expect. There >>can<< be some bugs (for example, gskinner does not handle negative lookahead or lookbehind assertions properly), but it sure beats scratching your head...
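#!/bin/ksh
# read the file into the array, one line per element
# (set -A array $(</tmp/inputfile) would split on every blank, one word per element)
i=0
while IFS= read -r line
do
    array[$i]=$line
    i=$((i + 1))
done < /tmp/inputfile
c=${#array[*]}    # the no. of elements in the array (= no. of lines)
i=0
while [ $i -lt $((c - 1)) ]
do
    j=$((i + 1))
    v1=$(echo "${array[$i]}" | cut -d" " -f1)
    v2=$(echo "${array[$j]}" | cut -d" " -f1)
    # mark the first of two consecutive =035 lines with F.
    if [ "$v1" = "=035" ] && [ "$v2" = "=035" ]
    then
        array[$i]="F.${array[$i]}"
    fi
    i=$((i + 1))
done
: > newfile       # start with an empty output file
i=0
while [ $i -lt $c ]
do
    x=$(echo "${array[$i]}" | cut -c1)
    if [ "$x" = "F" ]
    then
        # "F.=035 abc125" -> field 2 is the value abc125
        echo "${array[$i]}" | cut -d" " -f2 >> newfile
    fi
    i=$((i + 1))
done
cat newfile
See if the code given above works.
Here I store each line of the input file in an array (reading it line by line, because set -A array $(<file) would split the file into words).
c is the number of array elements (i.e. the number of lines in the input file).
I check whether two consecutive lines both start with =035. If so, I put F. in front of the first of the two lines and store it back in the array.
Later, in another loop, I check whether the first character of each line is F (this marks a =035 line that is followed by another =035 line). If so, I write its value to newfile.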
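The array version above keeps every line of the file in memory. A minimal sketch of the same check in plain POSIX sh that reads one line at a time and keeps only the previous line, assuming the same two-field layout (extract_035_stream is a hypothetical name). A shell read loop is slow on large files compared with awk or perl, but it needs nothing beyond the shell itself.

```shell
#!/bin/sh
# Hypothetical helper: the same rule in plain POSIX sh, reading stdin one
# line at a time and keeping only the previous line instead of an array.
extract_035_stream() {
    prev_tag= prev_val=
    # the extra -n test also handles a last line without a trailing newline
    while read -r tag val rest || [ -n "$tag" ]
    do
        # print the previous value only if this line is =035 as well
        if [ "$tag" = "=035" ] && [ "$prev_tag" = "=035" ]
        then
            printf '%s\n' "$prev_val"
        fi
        prev_tag=$tag
        prev_val=$val
    done
}

printf '%s\n' '=035 abc125' '=035 abc126' '=245 this is testing3' \
    '=035 abc127' '=035 abc128' '=035 abc129' | extract_035_stream
# prints abc125, abc127 and abc128
```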
Agreed. I think everyone should use these types of regex tools. I do, and they save a lot of time. I also use the "fat client" version on my XP machine. It is really good: it stores prior patterns, explains things, etc.