Replacing tags

Hi,

I have a file that looks like an XML file.

File 1:

<tag1> Value11</tag1><tag2>value12</tag2>
<tag1>Value21</tag1>
...... Continues

Now what I want as output is

Value11|value12
Value21||
......Continues

This is just for learning purposes. I tried reading each line and then parsing it. It's working fine, but I'm looking for a better suggestion :).

try this..

awk -F "[<>]" '{ for(i=3;i<=NF;i+=4){ printf $i"|"}}{print ""}' file

another one..

awk -F "[<>]" '{ for(i=3;i<=NF;i+=4){if(s){s=s"|"$i}else{s=$i}}}{print s;s=""}' file

It's giving

Value11|value12|
Value21|

not

Value11|value12
Value21||

In the second line <tag2> is missing (it is optional), so an extra '|' should still be printed for that empty column.

Something like this?

 awk -F "[<>]" '{ for(i=3;i<=NF;i+=4){if(s){s=s"|"$i}else{s=$i}}}{if(NF>5){print s;s=""}else{print s"||";s=""}}' file

Like this?

perl -pe 'while(s:<.*?>(.*?)</.*?>:\1|:){next}' file

Yes...

But can it be done without hardcoding the value?

What if I have more than 2 tags? The first tag is always mandatory; the rest are optional. So if a file has 5 tags in total, every output line should contain exactly 4 '|' characters.

To elaborate
Input: Number of tags here is 4
<tag1> Value11</tag1><tag2>value12</tag2><tag3>value13</tag3><tag4>value14</tag4>
<tag1>Value21</tag1><tag3>value23</tag3>

Output:
Value11|value12|value13|value14
Value21||value23|

for more tags..

$ cat file
<tag1> Value11</tag1><tag2>value12</tag2>
<tag1>Value21</tag1>
<tag1> Value11</tag1><tag2>value12</tag2><tag1> Value11</tag1><tag2>value12</tag2>

$ awk -F "[<>]" '{ for(i=3;i<=NF;i+=4){if(s){s=s"|"$i}else{s=$i}}}{if(NF>5){print s;s=""}else{print s"||";s=""}}' file

 Value11|value12
Value21||
 Value11|value12| Value11|value12

Let me know if you want to add something else.


My requirement is a little bit different. Sorry for asking so much.

If you look at my expected output

Output:
Value11|value12|value13|value14
Value21||value23|

All lines have exactly the same number of '|'.

try this...

awk -F "[<>]" '{ for(i=2;i<=NF;i+=4){a++;(gsub("[a-z]","",$i));if($i ~ a){if(s){s=s"|"$(i+1)}else{s=$(i+1)}}else{s=s"||"$(i+1)"|"}}}{print s;s="";a=0}' file

Well, I must say it works well, but only up to a point.
For example, for a file containing
<tag1> Value11</tag1><tag2>value12</tag2><tag3>value13</tag3><tag4>value14</tag4>
<tag1>Value21</tag1><tag3>value23</tag3>
<tag1> Value11</tag1><tag2>value12</tag2><tag3>value13</tag3><tag4>value14</tag4>
<tag1> Value11</tag1><tag2>value12</tag2><tag3>value13</tag3><tag4>value14</tag4>
<tag1> Value11</tag1><tag2>value12</tag2>

Output is coming as

Value11|value12|value13|value14
Value21||value23|
Value11|value12|value13|value14
Value11|value12|value13|value14
Value11|value12

So not every line has the same number of pipes :).
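
A quick way to check that requirement is to count the '|' characters on every output line and compare against the first line. Here is a minimal sketch; check_pipes is a made-up helper name, not something from the posts above, and it assumes the values themselves never contain a '|'.

```shell
# check_pipes: hypothetical helper. Reads delimited lines on stdin and
# reports every line whose '|' count differs from the first line's.
# Exit status is 0 when all lines agree, 1 otherwise.
check_pipes() {
  awk -F'[|]' '
    NR == 1        { want = NF - 1 }   # pipes on a line = fields - 1
    NF - 1 != want { printf "line %d: %d pipes, expected %d\n", NR, NF - 1, want; bad = 1 }
    END            { exit bad + 0 }
  '
}

# Usage with the desired output from this thread:
printf 'Value11|value12|value13|value14\nValue21||value23|\n' | check_pipes && echo "all lines consistent"
```

Piping the output of any of the awk attempts into check_pipes should show which lines come up short.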

Try the following:

awk -F "(<[/]*tag|>)" '
{       for(i = 2; i <= NF; i += 4) {
                o[NR,$i] = $(i + 1)
                if($i > maxf) maxf = $i
        }
}
END {   for(i = 1; i <= NR; i++)
                for(j = 1; j <= maxf; j++)
                        printf("%s%s", o[i,j], j == maxf ? "\n" : "|")
}' input

If the file input contains:

<tag1> Value11</tag1><tag2>value12</tag2>
<tag10> Value10</tag10><tag2>value12</tag2>
<tag5>field5</tag5><tag4>field4</tag4><tag3>field3</tag3><tag2>field2</tag2><tag1>field1</tag1><tag7>field7, no 6</tag7><tag10>no 8 or 9; this is 10</tag10>

it will produce the following output:

 Value11|value12||||||||
|value12|||||||| Value10
field1|field2|field3|field4|field5||field7, no 6|||no 8 or 9; this is 10
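
To see why the loop starts at field 2 and steps by 4, it may help to dump the fields that this -F setting produces for a short line. A small sketch, assuming the same field separator as above (show_fields is only an illustrative name):

```shell
# show_fields: hypothetical helper. Prints every awk field of each input
# line when the separator is "<tag", "</tag" or ">".
show_fields() {
  awk -F '(<[/]*tag|>)' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

printf '<tag1>a</tag1><tag2>b</tag2>\n' | show_fields
```

For that sample line, $2 and $6 hold the tag numbers and $3 and $7 hold the values, which is the i and i + 1 pairing the script relies on.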

Awesome :).

Can you change it a little bit? Unfortunately, we don't get tags named like tag1, tag2.

What we get is:

  1. One parameter.
    Example: All tags="Tag1,Tag2,tag3,.....,Tagn"

From it we can derive information like

  1. how many tags are there
  2. What the exact sequence of the tags is. The sequence will never change in the file, so in the above example Tag3 will never come before Tag2.
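
Both pieces of information can be pulled out of such a parameter with awk's split(). This is only a sketch: list_tags is a hypothetical name, and it assumes the tag names are separated by plain commas with no spaces.

```shell
# list_tags: hypothetical helper. Takes the comma-separated tag list as $1
# and prints the number of tags, then each tag with its column position.
list_tags() {
  awk -v tags="$1" 'BEGIN {
    n = split(tags, t, ",")        # n = how many tags; t[1..n] keeps their order
    print "count=" n
    for (i = 1; i <= n; i++) print i ": " t[i]
  }'
}

list_tags 'Tag1,Tag2,Tag3'
```

The position of a name in t[] is then the output column it belongs to, so a missing tag simply leaves that column empty.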

Just change this line in Don's solution (awesome code by Don :)):

awk -F "(<[/]*Tag|>)" '
cat file.txt

<tag1> Value11</tag1><tag2>value12</tag2><tag3>value13</tag3><tag4>value14</tag4>
<tag1>Value21</tag1><tag3>value23</tag3>
<tag1> Value11</tag1><tag2>value12</tag2><tag3>value13</tag3><tag4>value14</tag4>
<tag1> Value11</tag1><tag2>value12</tag2><tag3>value13</tag3><tag4>value14</tag4>
<tag1> Value11</tag1><tag2>value12</tag2>


tags='tag1,tag2,tag3,tag4'

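awk -v tags="$tags" 'BEGIN{max=split(tags,t,",")}   # t[1..max] = tag names in column order
{p=""
for(i=1;i<=max;i++)
{
 patt="<" t[i] ">[^<]*</" t[i] ">"   # matches <name>value</name> for the i-th tag
 if(match($0,patt))
 {
  v=substr($0,RSTART,RLENGTH)
  gsub(/<[^>]+>/,"",v)   # strip the opening and closing tag, keep the value
 }
 else
  v=""   # tag missing on this line: leave the column empty
 p=p v "|"
}
sub(/[|]$/,"",p)   # drop the trailing pipe
print p
}' file.txt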

 Value11|value12|value13|value14
Value21||value23|
 Value11|value12|value13|value14
 Value11|value12|value13|value14
 Value11|value12||

Thank you Puma, Don, elixir_sinari for helping me out.