Hi,
I have a file that looks like an XML file.
File 1:
<tag1> Value11</tag1><tag2>value12</tag2>
<tag1>Value21</tag1>
...... Continues
Now what I want as output is
Value11|value12
Value21||
......Continues
This is just for learning purposes. I tried reading each line and then parsing it. It's working fine, but I'm looking for better suggestions :).
pamu
September 27, 2012, 9:42am
2
try this..
awk -F "[<>]" '{ for(i=3;i<=NF;i+=4){ printf "%s|", $i}}{print ""}' file
another one..
awk -F "[<>]" '{ for(i=3;i<=NF;i+=4){if(s){s=s"|"$i}else{s=$i}}}{print s;s=""}' file
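For anyone wondering why the loops above start at field 3 and step by 4, here is a small sketch that dumps the fields awk sees with the same separator. The helper name show_fields is made up for illustration and is not part of the one-liners above.

```shell
# show_fields is a hypothetical helper: it prints every field of one line
# after splitting on "<" or ">", the same separator used in the one-liners.
show_fields() {
  printf '%s\n' "$1" |
    awk -F '[<>]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

show_fields '<tag1>Value11</tag1><tag2>value12</tag2>'
# 1=[]
# 2=[tag1]
# 3=[Value11]
# 4=[/tag1]
# 5=[]
# 6=[tag2]
# 7=[value12]
# 8=[/tag2]
# 9=[]
```

The values sit in fields 3 and 7, so each tag pair takes up four fields. A line with a single tag pair ends at field 5.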
It's giving
Value11|value12|
Value21|
not
Value11|value12
Value21||
In the second line, <tag2> is missing (it is optional). A '|' should still be printed for it.
pamu
September 27, 2012, 9:51am
4
something like this.?
awk -F "[<>]" '{ for(i=3;i<=NF;i+=4){if(s){s=s"|"$i}else{s=$i}}}{if(NF>5){print s;s=""}else{print s"||";s=""}}' file
Like this?
perl -pe 'while(s:<.*?>(.*?)</.*?>:\1|:){next}' file
Yes...
But can it be done without hardcoding the number of '|' characters?
What if I have more than 2 tags? The only condition is that the 1st tag is always mandatory; the rest are optional. So if a file has 5 tags in total, every output line should contain exactly 4 '|' characters.
To elaborate:
Input (the number of tags here is 4):
<tag1> Value11</tag1><tag2>value12</tag2><tag3>value13</tag3><tag4>value14</tag4>
<tag1>Value21</tag1><tag3>value23</tag3>
Output:
Value11|value12|value13|value14
Value21||value23|
pamu
September 27, 2012, 10:03am
7
for more tags..
$ cat file
<tag1> Value11</tag1><tag2>value12</tag2>
<tag1>Value21</tag1>
<tag1> Value11</tag1><tag2>value12</tag2><tag1> Value11</tag1><tag2>value12</tag2>
$ awk -F "[<>]" '{ for(i=3;i<=NF;i+=4){if(s){s=s"|"$i}else{s=$i}}}{if(NF>5){print s;s=""}else{print s"||";s=""}}' file
Value11|value12
Value21||
Value11|value12| Value11|value12
Let me know if you want to add something else.
My requirement is a little bit different. Sorry for asking so much.
If you look at my expected output:
Output:
Value11|value12|value13|value14
Value21||value23|
All lines have exactly the same number of '|'.
pamu
September 27, 2012, 10:17am
9
try this...
awk -F "[<>]" '{ for(i=2;i<=NF;i+=4){a++;(gsub("[a-z]","",$i));if($i ~ a){if(s){s=s"|"$(i+1)}else{s=$(i+1)}}else{s=s"||"$(i+1)"|"}}}{print s;s="";a=0}' file
Well, I must say it works well, but only to a certain extent.
For example, for a file containing
<tag1> Value11</tag1><tag2>value12</tag2><tag3>value13</tag3><tag4>value14</tag4>
<tag1>Value21</tag1><tag3>value23</tag3>
<tag1> Value11</tag1><tag2>value12</tag2><tag3>value13</tag3><tag4>value14</tag4>
<tag1> Value11</tag1><tag2>value12</tag2><tag3>value13</tag3><tag4>value14</tag4>
<tag1> Value11</tag1><tag2>value12</tag2>
the output comes out as
Value11|value12|value13|value14
Value21||value23|
Value11|value12|value13|value14
Value11|value12|value13|value14
Value11|value12
So not every line has the same number of pipes :).
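A quick way to test any candidate solution against this requirement is to count the pipes on every output line. The sketch below is only an illustration: check_pipes is a made-up helper name, and it treats the first line as the reference count.

```shell
# check_pipes is a hypothetical helper: it reports every line whose number
# of "|" characters differs from the first line, and returns non-zero if
# any such line was found.
check_pipes() {
  awk -F '|' '
    NR == 1    { want = NF }      # NF - 1 is the number of pipes on a line
    NF != want { printf "line %d: %d pipes, expected %d\n", NR, NF - 1, want - 1
                 bad = 1 }
    END        { exit bad }
  ' "$@"
}

# Example: the last line has 1 pipe instead of 3, so it gets reported.
printf '%s\n' 'Value11|value12|value13|value14' 'Value21||value23|' 'Value11|value12' | check_pipes
# line 3: 1 pipes, expected 3
```

It reads standard input when no file name is given, so the output of any of the commands in this thread can be piped straight into it.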
Try the following:
awk -F "(<[/]*tag|>)" '
{   for(i = 2; i <= NF; i += 4) {
        o[NR,$i] = $(i + 1)
        if($i > maxf) maxf = $i
    }
}
END {   for(i = 1; i <= NR; i++)
            for(j = 1; j <= maxf; j++)
                printf("%s%s", o[i,j], j == maxf ? "\n" : "|")
}' input
If the file input contains:
<tag1> Value11</tag1><tag2>value12</tag2>
<tag10> Value10</tag10><tag2>value12</tag2>
<tag5>field5</tag5><tag4>field4</tag4><tag3>field3</tag3><tag2>field2</tag2><tag1>field1</tag1><tag7>field7, no 6</tag7><tag10>no 8 or 9; this is 10</tag10>
it will produce the following output:
Value11|value12||||||||
|value12|||||||| Value10
field1|field2|field3|field4|field5||field7, no 6|||no 8 or 9; this is 10
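The script above keeps every value in memory until the END block. If that is a concern for a large file, one possible variation is to read the file twice: once to find the highest tag number and once to print. This is only a sketch under that assumption (the input must be a regular file that can be read twice, not a pipe), and the function name two_pass is made up for illustration.

```shell
# two_pass is a hypothetical helper that takes one file name.
# Pass 1 finds the highest tag number; pass 2 prints one padded row per line.
two_pass() {
  awk -F '(<[/]*tag|>)' '
    NR == FNR {                       # first pass over the file
      for (i = 2; i <= NF; i += 4)
        if ($i + 0 > maxf) maxf = $i + 0
      next
    }
    {                                 # second pass over the same file
      split("", row)                  # clear the per-line array
      for (i = 2; i <= NF; i += 4)
        row[$i + 0] = $(i + 1)        # tag number -> value
      for (j = 1; j <= maxf; j++)
        printf("%s%s", row[j], j == maxf ? "\n" : "|")
    }' "$1" "$1"
}
```

Call it as two_pass input. The separator and the field positions are the same as in the script above; only the buffering differs.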
Awesome :).
Can you change it a little bit? We don't get tags named like tag1, tag2.
What we get instead is one parameter.
Example: All tags="Tag1,Tag2,tag3,.....,Tagn"
From it we get the following information:
How many tags there are.
The exact sequence of the tags. The sequence never changes within the file, so in the above example tag3 will never come before Tag2.
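To illustrate how both pieces of information can be read from such a parameter, here is a minimal sketch using awk's split(). The function name list_tags and the sample tag list are made up for illustration.

```shell
# list_tags is a hypothetical helper: it splits a comma-separated tag list
# and prints how many tags there are, followed by their order.
list_tags() {
  awk -v tags="$1" 'BEGIN {
    n = split(tags, t, ",")           # n is the number of tags
    printf "%d tags\n", n
    for (i = 1; i <= n; i++)          # t[1]..t[n] keep the original order
      printf "%d: %s\n", i, t[i]
  }'
}

list_tags 'Tag1,Tag2,tag3'
# 3 tags
# 1: Tag1
# 2: Tag2
# 3: tag3
```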
pamu
September 28, 2012, 2:05am
13
Just change this line in Don's solution (awesome code by Don :)):
awk -F "(<[/]*Tag|>)" '
cat file.txt
<tag1> Value11</tag1><tag2>value12</tag2><tag3>value13</tag3><tag4>value14</tag4>
<tag1>Value21</tag1><tag3>value23</tag3>
<tag1> Value11</tag1><tag2>value12</tag2><tag3>value13</tag3><tag4>value14</tag4>
<tag1> Value11</tag1><tag2>value12</tag2><tag3>value13</tag3><tag4>value14</tag4>
<tag1> Value11</tag1><tag2>value12</tag2>
tags='tag1,tag2,tag3,tag4'
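awk -v tags="$tags" 'BEGIN{max=split(tags,t,",")}
{
    p=""
    for(i=1;i<=max;i++)
    {
        # build a pattern for the i-th tag, e.g. <tag1>[^<]*</tag1>
        patt="<" t[i] ">[^<]*</" t[i] ">"
        if(match($0,patt))
        {
            v=substr($0,RSTART,RLENGTH)
            gsub(/<[^>]+>/,"",v)    # strip the tags, keep the value
        }
        else
            v=""
        p=p v "|"
    }
    sub(/[|]$/,"",p)    # drop the trailing pipe
    print p
}' file.txt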
Value11|value12|value13|value14
Value21||value23|
Value11|value12|value13|value14
Value11|value12|value13|value14
Value11|value12||
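Note that the sample input has a space after the opening tag in "<tag1> Value11". If that space is really in the file and is not wanted in the output, the value can be trimmed before it is appended. The sketch below is one way to do that, assuming blanks and tabs are the only padding; the function name extract_fields is made up for illustration.

```shell
# extract_fields is a hypothetical helper.
# Usage: extract_fields 'tag1,tag2,...' file
extract_fields() {
  taglist=$1
  shift
  awk -v tags="$taglist" '
    BEGIN { max = split(tags, t, ",") }
    {
      p = ""
      for (i = 1; i <= max; i++) {
        v = ""
        patt = "<" t[i] ">[^<]*</" t[i] ">"
        if (match($0, patt)) {
          v = substr($0, RSTART, RLENGTH)
          gsub(/<[^>]+>/, "", v)            # strip the tags, keep the value
          gsub(/^[ \t]+|[ \t]+$/, "", v)    # trim surrounding blanks and tabs
        }
        p = p v (i < max ? "|" : "")        # no pipe after the last column
      }
      print p
    }' "$@"
}
```

For example, extract_fields "$tags" file.txt prints one line per input line, with one column per entry in the tag list.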
Thank you pamu, Don, and elixir_sinari for helping me out.