Finding strings through multiple lines

Hi,
I need to search for a multi-line pattern and remove it.

The pattern to search for is

(ln number) <TABLE name=*>

and if the line 3 lines below that is

(ln number) </TABLE>

Then remove those 4 lines.

Thank you

Try:

perl -0pe 's/\(ln number\) <TABLE name=\*>.*?\(ln number\) <\/TABLE>//s' file

Sorry, that didn't work. I will try to be clearer.

The (ln number) bit is the actual line number, without brackets. There are also places in the text that have <TABLE name=...> but are not followed 3 lines later by </TABLE>, and I want to keep those. I only want to delete the lines that follow this exact pattern:

number <TABLE name=*>
number <ROW>
number </ROW>
number </TABLE>

Thank you for all the help :slight_smile:

Can you provide exact sample data?

perl -0pe 's/\d+ <TABLE name=\*>.*?\d+ <\/TABLE>//s' file

I want to remove:

112  <TABLE name="something>
113  <ROW>
123  </ROW>
124  </TABLE>
125  <TABLE name="somethingelse>
126  <ROW>
129  </ROW>
130  </TABLE>

I want to keep:

493  <TABLE name="Headers_11" number="15">
494  <ROW>
495  <dest_semi_11><![CDATA[destination]]></dest_semi_11>
496  <num_semi_11><![CDATA[number sent]]></num_semi_11>
497  <cost_semi_11><![CDATA[cost]]></cost_semi_11>
498  </ROW>
499  </TABLE>

perl -0pe 's/\d+\s+<TABLE name=[^>]*>\n\d+\s+<ROW>\n\d+\s+<\/ROW>\n\d+\s+<\/TABLE>\n//g' file

Unfortunately that isn't working :frowning:
I tried cutting out the line numbers and running it again, but it still doesn't remove anything from the file.

It is working for the data that you provided:

[root@rhel2 ~]# cat file
112  <TABLE name="something>
113  <ROW>
123  </ROW>
124  </TABLE>
125  <TABLE name="somethingelse>
126  <ROW>
129  </ROW>
130  </TABLE>
493  <TABLE name="Headers_11" number="15">
494  <ROW>
495  <dest_semi_11><![CDATA[destination]]></dest_semi_11>
496  <num_semi_11><![CDATA[number sent]]></num_semi_11>
497  <cost_semi_11><![CDATA[cost]]></cost_semi_11>
498  </ROW>
499  </TABLE>
[root@rhel2 ~]# perl -0pe 's/\d+\s+<TABLE name=[^>]*>\n\d+\s+<ROW>\n\d+\s+<\/ROW>\n\d+\s+<\/TABLE>\n//g' file
493  <TABLE name="Headers_11" number="15">
494  <ROW>
495  <dest_semi_11><![CDATA[destination]]></dest_semi_11>
496  <num_semi_11><![CDATA[number sent]]></num_semi_11>
497  <cost_semi_11><![CDATA[cost]]></cost_semi_11>
498  </ROW>
499  </TABLE>

Can you post output of:

cat -Te sample_file
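
For anyone following along, here is a rough sketch of what that command makes visible. With GNU cat (an assumption about the system), `-T` prints tabs as `^I` and `-e` marks every line end with `$` while showing non-printing characters, so a stray carriage return shows up as `^M` just before the `$`. The two input lines below are made up for illustration:

```shell
# Hypothetical two-line sample: the first line ends in CRLF, the second in plain LF.
printf '112\t<ROW>\r\n113\t</ROW>\n' | cat -Te
```

A line that ends in `^M$` rather than just `$` will not match a regex that expects `>` to be followed directly by `\n`.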

I have just created a file with only that sample in it, and the output I got was correct, just as you said.
However, when I use it on the larger file it doesn't seem to work :s
The line numbers go into the thousands. I don't know if that makes a difference?

Maybe it would be easier if I gave a better example covering the full range of cases?

  2823  <TABLE name="TotalEventSemiSum_13>
  2824  </TABLE>
  2827  <TABLE name="TotalEventSemiSum_22>
  2828  </TABLE>
  2831  <TABLE name="TotalEventSemiSum_57>
  2832  <ROW>
  2837  </ROW>
  2838  </TABLE>
  2841  <TABLE name="TotalEventSemiSum_58>
  2842  <ROW>
  2843  <EventSemiSumTotal_58><![CDATA[�20.40]]></EventSemiSumTotal_58>
  2844  <EventSemiSumDescription_58><![CDATA[Roaming text messages non-EU]]></EventSemiSumDescription_58>
  2845  <EventSemiSumTotalSent_58><![CDATA[51]]></EventSemiSumTotalSent_58>
  2846  <source_58><![CDATA[07775884968]]></source_58>
  2847  </ROW>
  2848  </TABLE>
  2851  <TABLE name="TotalEventSemiSum_16>
  2852  <ROW>
  2857  </ROW>
  2858  </TABLE>

Output needed:

  2841  <TABLE name="TotalEventSemiSum_58>
  2842  <ROW>
  2843  <EventSemiSumTotal_58><![CDATA[�20.40]]></EventSemiSumTotal_58>
  2844  <EventSemiSumDescription_58><![CDATA[Roaming text messages non-EU]]></EventSemiSumDescription_58>
  2845  <EventSemiSumTotalSent_58><![CDATA[51]]></EventSemiSumTotalSent_58>
  2846  <source_58><![CDATA[07775884968]]></source_58>
  2847  </ROW>
  2848  </TABLE>

Does that help?
Thank you for your continued support. You are all very helpful on here :slight_smile:

Try:

perl -0pe 's/\s+\d+\s+<TABLE name=[^>]*>\n(\s+\d+\s+<ROW>\n\s+\d+\s+<\/ROW>\n)?\s+\d+\s+<\/TABLE>//g' file

I hope you can see now that simplifying sample data does not always help in getting to the solution :wink:

Many thanks for all your help!

Could you let me know just one more thing: which part of that code indicates that the line number precedes the text? I need to change it so that it works when there is no line number and each line starts directly with the tag.
Example:

<TABLE name="something>
<ROW> 
</ROW> 
</TABLE> 
<TABLE name="somethingelse> 
<ROW> 
</ROW> 
</TABLE> 
<TABLE name="Headers_11" number="15"> 
<ROW> 
<dest_semi_11><![CDATA[destination]]></dest_semi_11> 
<num_semi_11><![CDATA[number sent]]></num_semi_11> 
<cost_semi_11><![CDATA[cost]]></cost_semi_11> 
</ROW> 
</TABLE>

Thanks

Try:

perl -0pe 's/\s*(\d+\s+)?<TABLE name=[^>]*>\n(\s*(\d+\s+)?<ROW>\n\s*(\d+\s+)?<\/ROW>\n)?\s*(\d+\s+)?<\/TABLE>//g' file

Sorry, that didn't work. It needs to do exactly the same thing as before, but now the file no longer has line numbers in it.

P.S. I mean that the input file has no line numbers in it, not that I want them removed from the output.

[root@rhel2 ~]# cat file
<TABLE name="TotalEventSemiSum_13>
</TABLE>
<TABLE name="TotalEventSemiSum_22>
</TABLE>
<TABLE name="TotalEventSemiSum_57>
<ROW>
</ROW>
</TABLE>
<TABLE name="TotalEventSemiSum_58>
<ROW>
<EventSemiSumTotal_58><![CDATA[�20.40]]></EventSemiSumTotal_58>
<EventSemiSumDescription_58><![CDATA[Roaming text messages non-EU]]></EventSemiSumDescription_58>
<EventSemiSumTotalSent_58><![CDATA[51]]></EventSemiSumTotalSent_58>
<source_58><![CDATA[07775884968]]></source_58>
</ROW>
</TABLE>
<TABLE name="TotalEventSemiSum_16>
<ROW>
</ROW>
</TABLE>
[root@rhel2 ~]# perl -0pe 's/\s*(\d+\s+)?<TABLE name=[^>]*>\n(\s*(\d+\s+)?<ROW>\n\s*(\d+\s+)?<\/ROW>\n)?\s*(\d+\s+)?<\/TABLE>//g' file

<TABLE name="TotalEventSemiSum_58>
<ROW>
<EventSemiSumTotal_58><![CDATA[�20.40]]></EventSemiSumTotal_58>
<EventSemiSumDescription_58><![CDATA[Roaming text messages non-EU]]></EventSemiSumDescription_58>
<EventSemiSumTotalSent_58><![CDATA[51]]></EventSemiSumTotalSent_58>
<source_58><![CDATA[07775884968]]></source_58>
</ROW>
</TABLE>

The problem I was having was that there was a ^M at the end of each line, which I hadn't noticed until I opened the file by chance in the vi editor. I have now solved my problems.
Thanks a lot for your continued patience with me :slight_smile: