Dividing tag blocks in a bash script

Hi everyone,
I have a data.xml file that contains nothing but thousands of data (tag) blocks. Part of the file looks exactly like this:

<data>
Line
Line
Line
</data>
<data>
Line
Line
Line
</data>

The rest of the file simply repeats this pattern. Each data block contains several data lines. I need to split the blocks so that every data line sits in its own block, like this:

<data>
Line
</data>
<data>
Line
</data>
<data>
Line
</data>
<data>
Line
</data>
<data>
Line
</data>
<data>
Line
</data>

I need to write a bash script which reads data.xml and then executes the necessary operations. I guess the script file should have a loop with a conditional statement (if...then...else).

Any help will be highly appreciated.
Thanks a lot.

awk '/<data>/{f=1}/<\/data>/{f=0}f==1&&!/data/{printf "<data>\n%s\n</data>\n",$0;}' xml_filename

OR

awk '/<data>/{f=1;next;}/<\/data>/{f=0;next;}f==1{printf "<data>\n%s\n</data>\n",$0;}' xml_filename

Thanks a lot, bipinajith, both work flawlessly :)

Could you please explain step by step what you did there?

You can do it in sed: read the second and third lines into the pattern space with N, N and test whether it holds
open data, line, line
If it does, turn it into
open data, line, close data, open data, line
then P out and discard the first three lines, N in another line and branch back to the test.

Otherwise P out and discard one line, then go back to the second N and the test.

Do a $q before each N so the script stops cleanly on the last line.
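
If the N/P/D loop described above is hard to follow, a simpler sed sketch reaches the same output by a different route: delete every tag line, then wrap each remaining line in its own pair of tags. This is a simplification, not the multi-line approach outlined above, and it assumes every tag sits alone on its line with no surrounding whitespace. The sample lines and the variable names are made up for illustration.

```shell
# Made-up sample input: two blocks, three data lines in total.
sample=$(printf '%s\n' '<data>' 'one' 'two' '</data>' '<data>' 'three' '</data>')

# 1) delete the original opening and closing tag lines
# 2) for every line that is left, insert <data> before it (i\)
#    and append </data> after it (a\)
sed_out=$(printf '%s\n' "$sample" | sed -e '/^<data>$/d' -e '/^<\/data>$/d' -e 'i\
<data>' -e 'a\
</data>')

printf '%s\n' "$sed_out"
```

On the real file the same sed expressions would take data.xml as the input file instead of the piped sample.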

awk '/<data>/{                          # If the line contains the pattern <data>
  f=1;                                  # Set flag variable f = 1
  next;                                 # Skip the remaining rules for this record
 } /<\/data>/ {                         # If the line contains </data> (/ delimits the regex, so it is escaped as \/)
  f=0;                                  # Set flag variable f = 0
  next;                                 # Skip the remaining rules for this record
 } f==1 {                               # If flag variable f equals 1
  printf "<data>\n%s\n</data>\n",$0;    # Print <data>, newline, the current record ($0), newline, </data>, newline
}' xml_filename                         # Input file: xml_filename
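
The same flag logic can also be written as the kind of bash loop with a conditional that the original question had in mind. This is only a minimal sketch, assuming each tag sits alone on its own line; `split_blocks` is a made-up function name, and on a file with thousands of blocks awk will most likely be faster than a shell loop.

```shell
#!/usr/bin/env bash
# split_blocks (hypothetical name): read stdin and write one
# <data> block per inner line to stdout.
split_blocks() {
  local inside=0 line
  # "|| [ -n "$line" ]" also handles a last line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    if [ "$line" = "<data>" ]; then
      inside=1          # entering a block; the tag itself is not printed
    elif [ "$line" = "</data>" ]; then
      inside=0          # leaving a block; the tag itself is not printed
    elif [ "$inside" -eq 1 ]; then
      printf '<data>\n%s\n</data>\n' "$line"
    fi
  done
}

# Made-up sample input to try the function on.
loop_out=$(printf '%s\n' '<data>' 'one' 'two' '</data>' '<data>' 'three' '</data>' | split_blocks)
printf '%s\n' "$loop_out"
```

On the real file this would be called as `split_blocks < data.xml > data_split.xml`, where the output name is again just an example.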

I'd have thought you would need to print "</data>\n<data>\n%s\n",$0 to close the first block and put the next line in a new tag. OK, I see: you are discarding the old tags. That works.

Why do you use next here?
Does it speed things up a bit, since the script does more than one thing in a single pass when it finds the data tag?
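
On the next question: it looks to be more about correctness than speed. next stops processing the current record, so a tag line never reaches the later f==1 rule; any time saved by skipping the remaining rules is a side effect. Here is a small sketch of the difference, using made-up sample lines. The second command drops the first next purely to show what changes.

```shell
# Made-up sample input: one block with two data lines.
sample=$(printf '%s\n' '<data>' 'one' 'two' '</data>')

# With next: tag lines only flip the flag and are never printed.
with_next=$(printf '%s\n' "$sample" |
  awk '/<data>/{f=1;next} /<\/data>/{f=0;next} f==1{printf "<data>\n%s\n</data>\n",$0}')

# Without the first next: the opening tag sets f=1, falls through to the
# f==1 rule, and is wrapped as if it were a data line.
without_next=$(printf '%s\n' "$sample" |
  awk '/<data>/{f=1} /<\/data>/{f=0;next} f==1{printf "<data>\n%s\n</data>\n",$0}')

printf 'with next:\n%s\n\nwithout next:\n%s\n' "$with_next" "$without_next"
```

The first one-liner in the thread avoids the same problem without next by adding the !/data/ condition to the printing rule.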

$ awk 'BEGIN {print "<data>"} /data/ {next} 1' ORS="\n</data>\n<data>\n" file | sed '$d'

or

$ awk '/data/ {next} 1' ORS="\n</data>\n<data>\n" file | sed '1 s/^/<data>\n/; $d'