Dividing tag blocks in a bash script

hayreter · January 31, 2013, 2:32pm

Hi everyone,
I have a data.xml file that contains nothing but thousands of data (tag) blocks. Part of the file looks exactly like this:

<data>
Line
Line
Line
</data>
<data>
Line
Line
Line
</data>

The rest of the file is simply a repetition of this part. Each data block contains several data lines. I need to put each data line into its own block, so that the file looks like this:

<data>
Line
</data>
<data>
Line
</data>
<data>
Line
</data>
<data>
Line
</data>
<data>
Line
</data>

I need to write a bash script which reads data.xml and then executes the necessary operations. I guess the script file should have a loop with a conditional statement (if...then...else).

Any help will be highly appreciated.
Thanks a lot.

Yoda · January 31, 2013, 2:49pm

awk '/<data>/{f=1}/<\/data>/{f=0}f==1&&!/data/{printf "<data>\n%s\n</data>\n",$0;}' xml_filename

OR

awk '/<data>/{f=1;next;}/<\/data>/{f=0;next;}f==1{printf "<data>\n%s\n</data>\n",$0;}' xml_filename

hayreter · January 31, 2013, 3:10pm

Thanks a lot bipinajith, both work flawlessly.

hayreter · January 31, 2013, 3:14pm

Could you please explain step by step what you did there?

DGPickett · January 31, 2013, 3:15pm

You can do it in sed: read the second and third lines into the pattern space with N, N, and if it holds
open data, line, line
then turn it into
open data, line, close data, open data, line
then P out and discard three lines, N another line, and branch back to the if.

Otherwise, P out and discard one line, then go back to the second N and the if.

Do a $q before each N.
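
The N/P/D procedure above is not spelled out as code in this thread. For comparison, here is a simpler sed sketch that gets the same result on the sample input by a different route: delete every tag line, then wrap each remaining line. It is not the sliding-window method described above, and it assumes each tag sits alone on its line with no surrounding whitespace. The function name split_blocks_sed is made up for illustration.

```shell
split_blocks_sed() {
  # 1st expression: drop lines that are exactly <data> or </data>
  # 2nd expression: replace each remaining line with <data>, the line, </data>
  #                 (backslash + newline inserts a line break in the replacement)
  sed -e '/^<\/*data>$/d' -e 's/.*/<data>\
&\
<\/data>/' "$@"
}

# Demo on a small inline sample
printf '%s\n' '<data>' 'Line 1' 'Line 2' '</data>' | split_blocks_sed
```

Unlike the awk flag approach, this would also wrap any line that sits outside a data block, so it only fits files that contain nothing but such blocks.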

Yoda · January 31, 2013, 3:23pm

awk '/<data>/{                          # If the line contains the pattern <data>
  f=1;                                  # Set flag variable f = 1
  next;                                 # Skip to the next record
 } /<\/data>/ {                         # If the line contains the pattern </data> (/ is the regex delimiter, so it is escaped as \/)
  f=0;                                  # Set flag variable f = 0
  next;                                 # Skip to the next record
 } f==1 {                               # If the flag variable f is equal to 1
  printf "<data>\n%s\n</data>\n",$0;    # Print <data>, a newline, the current record ($0), a newline, then </data>
}' xml_filename                         # Read the file xml_filename
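
Since the original question asked for a bash script, the commented awk program above could be wrapped in one roughly as follows. This is a minimal sketch: the function name split_blocks and the sample file are made up for illustration, and the flag is called inside instead of f purely for readability.

```shell
#!/usr/bin/env bash
# split_blocks: hypothetical helper; wraps every line found between
# <data> and </data> in its own <data>...</data> pair.
split_blocks() {
  awk '
    /<data>/   { inside = 1; next }   # opening tag: start of a block, do not print it
    /<\/data>/ { inside = 0; next }   # closing tag: end of a block, do not print it
    inside     { printf "<data>\n%s\n</data>\n", $0 }
  ' "$1"
}

# Small demo on a throwaway sample file
sample=$(mktemp)
printf '%s\n' '<data>' 'Line 1' 'Line 2' '</data>' '<data>' 'Line 3' '</data>' > "$sample"
split_blocks "$sample"
rm -f "$sample"
```

To process the real file, the call would be something like split_blocks data.xml > output.xml, where output.xml is a placeholder name.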

DGPickett · January 31, 2013, 3:29pm

I'd think you would need to print "</data>\n<data>\n%s\n",$0 to close out the first and put the second in a new tag. OK, you are discarding the old tags. That works.

Jotne · February 1, 2013, 2:49am

Why do you use the next here?
Will this speed up the process some, since it does more than one thing in one run when finding the data tag?
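
On the question of next: in awk, next stops processing the current record and moves on to the next input line, so the remaining rules are never tested against the tag line. In the first rule that matters for the result and not only for speed, because f is already 1 by the time the f==1 rule would be reached. A small sketch to compare the two; the function names with_next and without_next are made up for this illustration.

```shell
with_next() {
  # The program from the thread, reading standard input
  awk '/<data>/{f=1;next} /<\/data>/{f=0;next} f==1{printf "<data>\n%s\n</data>\n",$0}'
}

without_next() {
  # Same program with both next statements removed
  awk '/<data>/{f=1} /<\/data>/{f=0} f==1{printf "<data>\n%s\n</data>\n",$0}'
}

printf '%s\n' '<data>' 'Line' '</data>' | with_next
echo '--- without next ---'
printf '%s\n' '<data>' 'Line' '</data>' | without_next
```

Without next, the opening <data> line itself reaches the f==1 rule and gets wrapped as if it were data. The closing tag is not affected, since f is set back to 0 before that rule is tested, so the second next only saves a comparison.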

RudiC · February 1, 2013, 3:02am

$ awk 'BEGIN {print "<data>"} /data/ {next} 1' ORS="\n</data>\n<data>\n" file | sed '$d'

or

$ awk '/data/ {next} 1' ORS="\n</data>\n<data>\n" file | sed '1 s/^/<data>\n/; $d'
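
One thing to watch with /data/ as the skip pattern: it also drops any content line that happens to contain the word data. A possible variation, sketched under the assumption that each tag sits alone on its line, anchors the pattern and prints the wrapper explicitly, which also removes the need for the trailing sed. The function name split_anchored is hypothetical.

```shell
split_anchored() {
  awk '
    /^<\/?data>$/ { next }                         # skip lines that are exactly <data> or </data>
    { print "<data>"; print; print "</data>" }     # wrap every other line
  ' "$@"
}

# The content line containing the word "data" is kept
printf '%s\n' '<data>' 'some data here' 'Line' '</data>' | split_anchored
```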