Print value between desired HTML tags

Hi,

I have an HTML line as below:

<dataFilter><filterName>Customer.PromotionsProfile</filterName></dataFilter><dataFilter><filterName>Customer.Messages</filterName></dataFilter><dataFilter><filterName>Customer.PaymentProfileDetail</filterName></dataFilter></dataFilters><validateCustomerCompleteness <sessionToken>           <tokenValue>kfuYW9mcmjkasfkjsasvIR/hm/bb945chszG8zSIC89DBq9Q7NiB</tokenValue></sessionToken>< dataFilter><filterName>Customer.PaymentProfileDetail</filterName></dataFilter></dataFilters>

From the above, I would like to have only the text between the tokenValue tags.

Expected output: kfuYW9mcmjkasfkjsasvIR/hm/bb945chszG8zSIC89DBq9Q7NiB

P.S. All of the above tags are on a single line in my text file, so grep returns the entire line. I need an alternative solution.

Thanks in advance,
Satish.

Try:

perl -nle '/<tokenValue>(.*)<\/tokenValue>/ && print $1' file

If there can be multiple tags per line, try:

awk '$1=="tokenValue"{print $2}' RS=\< FS=\> file

An additional advantage: on a very long line, the tool's maximum record length (awk's, in this case) is unlikely to be exceeded, because RS=\< cuts the line into short records of one tag each.
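
To make the record splitting easier to see, here is a small self-contained sketch. The sample line and the names sample, extract_tokens and tokens are made up for illustration; the awk program is the same one as above, only with RS and FS passed via -v.

```shell
# Hypothetical sample: one line holding two tokenValue elements.
sample='<a><tokenValue>AAA</tokenValue></a><b><tokenValue>BBB</tokenValue></b>'

# RS='<' starts a new record at every "<", so a record looks like
# "tokenValue>AAA" or "/tokenValue>". FS='>' then splits each record
# into the tag name ($1) and the text following the tag ($2).
extract_tokens() {
    awk -v RS='<' -v FS='>' '$1 == "tokenValue" { print $2 }'
}

tokens=$(printf '%s\n' "$sample" | extract_tokens)
printf '%s\n' "$tokens"
```

Because the closing tag becomes a record whose first field is "/tokenValue", the exact comparison with == skips it, and each opening tag yields exactly one output line.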

If your grep supports -o you could do:

$ grep -Eo "<tokenValue>.*</tokenValue>" file
<tokenValue>kfuYW9mcmjkasfkjsasvIR/hm/bb945chszG8zSIC89DBq9Q7NiB</tokenValue>

(although that still leaves you with the tags)
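
One way to get rid of the tags is to pipe the grep output through sed. This is only a sketch: the sample line and the variable names are invented, and it assumes the value itself never contains a "<" (hence [^<]* rather than .*).

```shell
# Hypothetical sample line with a single tokenValue element.
sample='<sessionToken><tokenValue>abc123/xyz</tokenValue></sessionToken>'

# grep -o prints only the matched element, tags included;
# sed then deletes everything of the form <...>, leaving the bare value.
value=$(printf '%s\n' "$sample" |
    grep -o '<tokenValue>[^<]*</tokenValue>' |
    sed 's/<[^>]*>//g')

printf '%s\n' "$value"
```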

Or use sed:

$ sed 's#.*<tokenValue>\(.*\)</tokenValue>.*#\1#g' file
kfuYW9mcmjkasfkjsasvIR/hm/bb945chszG8zSIC89DBq9Q7NiB

or awk:

$ awk '/tokenValue/ {print $2}' FS=">" RS="<|</" file
kfuYW9mcmjkasfkjsasvIR/hm/bb945chszG8zSIC89DBq9Q7NiB

(all cygwin/GNU)

The "g" flag in the sed command has no effect here, because the pattern can match only once per line: it yields either the value (if there is only one tag), or the value plus everything up to the last "tokenValue" tag on the line.

The awk version is not as precise, because it matches any record containing "tokenValue" (for example "tokenValue2"), whether that string appears in the tag or in the value...

True, it's not correct.

Possibly of interest, though: what sed actually gives you is the last tokenValue value, not the whole intervening text as you might expect (and as you get with the perl solution), because the leading .* greedily eats everything before the last match (at least with my sed).

$ cat file
<dataFilter><filterName>Customer.PromotionsProfile</filterName></dataFilter><dataFilter><filterName>Customer.Messages</filterName></dataFilter><dataFilter><filterName>Customer.PaymentProfileDetail</filterName></dataFilter></dataFilters><validateCustomerCompleteness  <sessionToken>            <tokenValue>1AkfuYW9mcmjkasfkjsasvIR/hm/bb945chszG8zSIC89DBq9Q7NiB</tokenValue></sessionToken><   dataFilter><filterName>Customer.PaymentProfileDetail</filterName></dataFilter></dataFilters><dataFilter><filterName>Customer.PromotionsProfile</filterName></dataFilter><dataFilter><filterName>Customer.Messages</filterName></dataFilter><dataFilter><filterName>Customer.PaymentProfileDetail</filterName></dataFilter></dataFilters><validateCustomerCompleteness  <sessionToken>            <tokenValue>1BkfuYW9mcmjkasfkjsasvIR/hm/bb945chszG8zSIC89DBq9Q7NiB</tokenValue></sessionToken><   dataFilter><filterName>Customer.PaymentProfileDetail</filterName></dataFilter></dataFilters>
<dataFilter><filterName>Customer.PromotionsProfile</filterName></dataFilter><dataFilter><filterName>Customer.Messages</filterName></dataFilter><dataFilter><filterName>Customer.PaymentProfileDetail</filterName></dataFilter></dataFilters><validateCustomerCompleteness  <sessionToken>            <tokenValue>2kfuYW9mcmjkasfkjsasvIR/hm/bb945chszG8zSIC89DBq9Q7NiB</tokenValue></sessionToken><   dataFilter><filterName>Customer.PaymentProfileDetail</filterName></dataFilter></dataFilters>
$ awk '$1=="tokenValue"{print $2}' RS='<' FS='>' file
1AkfuYW9mcmjkasfkjsasvIR/hm/bb945chszG8zSIC89DBq9Q7NiB
1BkfuYW9mcmjkasfkjsasvIR/hm/bb945chszG8zSIC89DBq9Q7NiB
2kfuYW9mcmjkasfkjsasvIR/hm/bb945chszG8zSIC89DBq9Q7NiB
$ perl -nle '/<tokenValue>(.*)<\/tokenValue>/ && print $1' file
1AkfuYW9mcmjkasfkjsasvIR/hm/bb945chszG8zSIC89DBq9Q7NiB</tokenValue></sessionToken>< dataFilter><filterName>Customer.PaymentProfileDetail</filterName></dataFilter></dataFilters><dataFilter><filterName>Customer.PromotionsProfile</filterName></dataFilter><dataFilter><filterName>Customer.Messages</filterName></dataFilter><dataFilter><filterName>Customer.PaymentProfileDetail</filterName></dataFilter></dataFilters><validateCustomerCompleteness <sessionToken>           <tokenValue>1BkfuYW9mcmjkasfkjsasvIR/hm/bb945chszG8zSIC89DBq9Q7NiB
2kfuYW9mcmjkasfkjsasvIR/hm/bb945chszG8zSIC89DBq9Q7NiB
$ sed 's#.*<tokenValue>\(.*\)</tokenValue>.*#\1#g' file
1BkfuYW9mcmjkasfkjsasvIR/hm/bb945chszG8zSIC89DBq9Q7NiB
2kfuYW9mcmjkasfkjsasvIR/hm/bb945chszG8zSIC89DBq9Q7NiB
$ sed --version | head -1
sed (GNU sed) 4.2.2

Try:

$ cat file
<dataFilter><filterName>Customer.PromotionsProfile</filterName></dataFilter><dataFilter><filterName>Customer.Messages</filterName></dataFilter><dataFilter><filterName>Customer.PaymentProfileDetail</filterName></dataFilter></dataFilters><validateCustomerCompleteness <sessionToken>           <tokenValue>kfuYW9mcmjkasfkjsasvIR/hm/bb945chszG8zSIC89DBq9Q7NiB</tokenValue></sessionToken>< dataFilter><filterName>Customer.PaymentProfileDetail</filterName></dataFilter></dataFilters>
$ awk '{gsub(".*<tokenValue>|</tokenValue>.*",x)}1' file
kfuYW9mcmjkasfkjsasvIR/hm/bb945chszG8zSIC89DBq9Q7NiB

OR

$ grep -Po '(?<=<tokenValue>).*(?=</tokenValue>)' file
kfuYW9mcmjkasfkjsasvIR/hm/bb945chszG8zSIC89DBq9Q7NiB

@CarloM: my grep, like yours, fails if there are multiple entries on a line.
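
A possible fix for the multiple-entry case is to replace the greedy .* with [^<]*, so a match cannot run past the next tag. This is a sketch only: it needs a grep built with -P (PCRE) support, such as GNU grep, and assumes the value never contains "<". The sample line and variable names are invented.

```shell
# Hypothetical sample: two tokenValue elements on one line.
sample='<tokenValue>first</tokenValue><other>x</other><tokenValue>second</tokenValue>'

# The lookbehind and lookahead keep the tags out of the output;
# [^<]* limits each match to the text of a single element.
values=$(printf '%s\n' "$sample" |
    grep -Po '(?<=<tokenValue>)[^<]*(?=</tokenValue>)')

printf '%s\n' "$values"
```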