Copy and paste text inside an XML file

I have a really big XML file. I need to copy the value of one attribute into another attribute of the same tag. Here is an example.

  <channel update="i" site="merge-xmltv" site_id="" xmltv_id="Rai 1">Rai 1</channel>
  <channel update="i" site="merge-xmltv" site_id="" xmltv_id="Rai 1 +2HD">Rai 1 +2HD</channel>
  <channel update="i" site="merge-xmltv" site_id="" xmltv_id="Rai 1 +1HD">Rai 1 +1HD</channel>
  <channel update="i" site="merge-xmltv" site_id="" xmltv_id="Rai 2">Rai 2</channel>
  <channel update="i" site="merge-xmltv" site_id="" xmltv_id="Rai 2 +2HD">Rai 2 +2HD</channel>
  <channel update="i" site="merge-xmltv" site_id="" xmltv_id="Rai 2 +1HD">Rai 2 +1HD</channel>

I need the file to become:

  <channel update="i" site="merge-xmltv" site_id="Rai 1" xmltv_id="Rai 1">Rai 1</channel>
  <channel update="i" site="merge-xmltv" site_id="Rai 1 +2HD" xmltv_id="Rai 1 +2HD">Rai 1 +2HD</channel>
  <channel update="i" site="merge-xmltv" site_id="Rai 1 +1HD" xmltv_id="Rai 1 +1HD">Rai 1 +1HD</channel>
  <channel update="i" site="merge-xmltv" site_id="Rai 2" xmltv_id="Rai 2">Rai 2</channel>
  <channel update="i" site="merge-xmltv" site_id="Rai 2 +2HD" xmltv_id="Rai 2 +2HD">Rai 2 +2HD</channel>
  <channel update="i" site="merge-xmltv" site_id="Rai 2 +1HD" xmltv_id="Rai 2 +1HD">Rai 2 +1HD</channel>

I tried to use the sed command:

sed -i 's/\(.*xmltv_id="\)\(.*\)\(">.*site_id="\)\(**\)\(" xmltv_id.*\)/\1\2\3\2\5/' WebGrab++.config.xml

but something is wrong...

I am using Cygwin, and Perl is not really supported. Thank you to anyone who can help!

YMMV:

awk -F'"' '{$(NF-3)=$(NF-1)}1' OFS='"' myFile

Can you explain the command to me? I receive an error:

awk: cmd. line:1: (FILENAME=web.xml FNR=2) fatal: attempt to access field -2

I can't understand the reason; I have never used awk. If you explain it to me, I think I can fix it.

Assuming that you get the above diagnostic when running the script vgersh99 suggested:

awk -F'"' '{$(NF-3)=$(NF-1)}1' OFS='"' myFile

it is telling us that line number 2 in your input file (the one named web.xml) does not contain any double-quote ( " ) characters. With -F'"', such a line has only one field (NF is 1), so $(NF-3) refers to field -2. And, since the sample input that you showed us had four pairs of double-quotes on every input line, the script wasn't prepared to handle input in a different format.

If you can't be bothered to accurately describe the format of the input data your script will be processing, we have to assume that you will be able to modify any suggestions provided to weed out (or otherwise process) lines in your input file(s) that do not match the format of the data you said you wanted to process.
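To see the problem for yourself, you can print NF for each line. The following is only a minimal sketch: the two sample lines are taken from this thread, and the NF >= 9 threshold is an assumption based on the four quoted attributes shown in the first post (eight double quotes give nine fields).

```shell
# Two sample lines: one in the expected format, one without any double quotes.
sample='  <channel update="i" site="merge-xmltv" site_id="" xmltv_id="Rai 1">Rai 1</channel>
<settings>'

# Print the number of "-separated fields on each line (9 for the first, 1 for the second).
counts=$(printf '%s\n' "$sample" | awk -F'"' '{ print NF }')
printf '%s\n' "$counts"

# Guarded variant (assumed threshold): only modify lines with at least 9 fields,
# and print every other line unchanged.
guarded=$(printf '%s\n' "$sample" | awk -F'"' -v OFS='"' 'NF >= 9 { $(NF-3) = $(NF-1) } 1')
printf '%s\n' "$guarded"
```

With the guard in place, the <settings> line is passed through untouched instead of triggering the "attempt to access field -2" error.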


Mmmm...
After your explanation it is much clearer, but it pushes me to ask again.
I am asking for a command like sed or awk because I previously used a different command that I am sure was right. The command was:

xml ed --inplace -u "//channel/@site_id" -x "string(../@xmltv_id)" WebGrab++.config.xml

Using it, I got an error similar to the one I get now.
The error is:

WebGrab++.config.xml:2.2: Extra content at the end of the document

So I think the errors I am getting are not caused by the commands I am running now (awk or xmlstarlet) but by the commands I ran before. The file is really big and the original format is like this:

<?xml version="1.0"?>
<settings>
.
.
.
.
<!--line 136 -->
<!--01-->
    <channel update="i" site="merge-xmltv-utc" site_id="" xmltv_id="Rai 1">Rai 1</channel>          <!-- Rai 1 -->
    <channel offset="2" same_as="Rai 1" xmltv_id="Rai 1 +2HD">Rai 1 +2HD</channel>          <!-- Rai 1 +2 HD -->
    <channel offset="1" same_as="Rai 1" xmltv_id="Rai 1 +1HD">Rai 1 +1HD</channel>          <!-- Rai 1 +1 HD-->
.
.
.
</settings>

If I apply either command (xmlstarlet or awk) to the original file, it works, but the result is not what I want. To fix the document and get it into the form I published in the first post, I need to run some commands beforehand. The commands are:

# Delete everything between "<channel" and "xmltv_id"
sed -ri 's/(    <channel )(.*)(xmltv_id)/\1\3/g' WebGrab++.config.xml
# After "<channel" add the attributes update="i" site="merge-xmltv" site_id=""
sed -i 's/    <channel /    <channel update="i" site="merge-xmltv" site_id="" /g'  WebGrab++.config.xml
# Delete lines from 1 to 136
sed -i '1,136d' WebGrab++.config.xml
# Delete last line
sed -i '$d' WebGrab++.config.xml
# Delete all lines that start with "<" (the comment lines)
sed -i '/^</d' WebGrab++.config.xml
# Delete all empty lines
sed -i '/^ *$/d' WebGrab++.config.xml
# Delete all comments till the end of the line
sed -i 's/<!--.*//' WebGrab++.config.xml
# Delete all blank space or tab at the end of the line
sed -i 's/[[:blank:]]*$//' WebGrab++.config.xml

After I applied this whole procedure, the file looks damaged, as if sed did something wrong after the second line.

Do you have any idea what happened?

It is still not much clearer to me WHAT you really need in the end, but Don Cragun's analysis is proven right by your second sample input file.
You might want to safeguard your script by adding some tests for the applicability (or justification) of the modifications:

awk -F'"' '/channel update/ && !$6 {$(NF-3)=$(NF-1)}1' OFS='"' file

Adding some or all of your "damage producers" is not a problem once it is known which ones should be included.
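To illustrate how the two tests in the guarded awk command work, here is a small sketch run against lines shaped like the second sample in this thread. It first lists the fields of the first line as awk sees them with -F'"', then applies the guarded command. The three-line sample is an assumption made up for the demonstration.

```shell
# Sample lines modelled on the second sample input (illustrative only).
sample='    <channel update="i" site="merge-xmltv-utc" site_id="" xmltv_id="Rai 1">Rai 1</channel>
    <channel offset="2" same_as="Rai 1" xmltv_id="Rai 1 +2HD">Rai 1 +2HD</channel>
<settings>'

# List every "-separated field of the first line: field 6 is the empty
# site_id value, field 8 is the xmltv_id value.
fields=$(printf '%s\n' "$sample" |
  awk -F'"' 'NR == 1 { for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')
printf '%s\n' "$fields"

# Only lines that contain "channel update" AND have an empty field 6 are modified;
# all other lines are printed unchanged by the trailing 1.
result=$(printf '%s\n' "$sample" |
  awk -F'"' -v OFS='"' '/channel update/ && !$6 { $(NF-3) = $(NF-1) } 1')
printf '%s\n' "$result"
```

Only the first line receives site_id="Rai 1"; the offset line and the <settings> line are left alone. Note that !$6 is also true when field 6 is the number 0, so a stricter test such as $6 == "" may be preferable if that value can occur.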


When I get back home, I will attach the file to make things clearer.

---------- Post updated 01-29-17 at 04:12 AM ---------- Previous update was 01-28-17 at 08:26 AM ----------

I found the solution. The command

sed -ri 's/(    <channel )(.*)(xmltv_id)/\1\3/g' WebGrab++.config.xml

damages the file, and afterwards awk can't replace the fields. Using the command:

sed -i -e 's/\(    <channel \).*\(xmltv_id\)/\1\2/' WebGrab++.config.xml

everything works fine!
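For anyone who prefers to stay with sed for the copy step as well: the sed attempt in the first post cannot match, because its pattern expects xmltv_id to appear before site_id on the line, while in the data site_id comes first. One possible sed-only sketch is shown below. It assumes that the empty site_id="" is immediately followed by a single space and xmltv_id="..." on the same line, as in the samples in this thread; lines in any other layout are left unchanged.

```shell
# One sample line in the format from the first post.
line='  <channel update="i" site="merge-xmltv" site_id="" xmltv_id="Rai 1 +2HD">Rai 1 +2HD</channel>'

# \1 captures the whole ' xmltv_id="..."' part, \2 captures only the value.
# The replacement writes the value into site_id and puts \1 back unchanged.
fixed=$(printf '%s\n' "$line" |
  sed 's/site_id=""\( xmltv_id="\([^"]*\)"\)/site_id="\2"\1/')
printf '%s\n' "$fixed"
```

The same s command could be run with sed -i on a copy of WebGrab++.config.xml once it has been checked against a few real lines.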