Creating new XML files with labels from an existing XML file

Hi, I want to create a new file for every actor in an existing XML file.
The source is the .nfo file of a movie.
I need one file per actor, named after the actor's name, containing all of that actor's info from the original file.
The original file looks something like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!--created on 2024-02-05 20:45:02 - tinyMediaManager 4.3.14-->
<movie>
  <title>Exploradores</title>
  <originaltitle>Explorers</originaltitle>
  <sorttitle/>
  <epbookmark/>
  <year>1985</year>
  <ratings>
    <rating default="false" max="10" name="themoviedb">
      <value>6.2</value>
      <votes>426</votes>
    </rating>
    <rating default="false" max="100" name="tomatometerallcritics">
      <value>74.0</value>
      <votes>0</votes>
    </rating>
    <rating default="false" max="10" name="tomatometeravgcritics">
      <value>5.9</value>
      <votes>0</votes>
    </rating>
  </ratings>
  <userrating>0.0</userrating>
  <top250>0</top250>
  <set/>
  <plot>Ben Crandall es un niño consumidor compulsivo de películas de monstruos de serie B y cómics de ciencia-ficción que vive obsesionado con los viajes espaciales y la existencia de extraterrestres. Una noche, vislumbra en sueños los planos de un circuito electrónico y decide dibujarlo para construirlo junto a sus amigos Wolfgang y Darren. El primero es un pequeño genio capaz de traducir el sueño de Ben en un complejo programa de ordenador que funciona de verdad. El paso siguiente es crear una nave espacial casera. Los tres amigos están listos para embarcarse en una aventura inimaginable en la que viajarán hasta otra galaxia, donde tendrán contacto con una cultura alienígena que quizás no termine de ser exactamente como Ben había soñado y cuyo principal representante sólo conoce la Tierra a través de lo representado en series de televisión de los 60.</plot>
  <outline>Ben Crandall es un niño consumidor compulsivo de películas de monstruos de serie B y cómics de ciencia-ficción que vive obsesionado con los viajes espaciales y la existencia de extraterrestres. Una noche, vislumbra en sueños los planos de un circuito electrónico y decide dibujarlo para construirlo junto a sus amigos Wolfgang y Darren. El primero es un pequeño genio capaz de traducir el sueño de Ben en un complejo programa de ordenador que funciona de verdad. El paso siguiente es crear una nave espacial casera. Los tres amigos están listos para embarcarse en una aventura inimaginable en la que viajarán hasta otra galaxia, donde tendrán contacto con una cultura alienígena que quizás no termine de ser exactamente como Ben había soñado y cuyo principal representante sólo conoce la Tierra a través de lo representado en series de televisión de los 60.</outline>
  <tagline>La aventura comienza en tu patio trasero.</tagline>
  <runtime>102</runtime>
  <thumb aspect="poster">https://image.tmdb.org/t/p/original/f7aN2yei6bjZAlfvk5uurixtZuv.jpg</thumb>
  <fanart>
    <thumb>https://image.tmdb.org/t/p/original/1XDJzR6EVXQoVEKjnVjCUe9SZQ0.jpg</thumb>
    <thumb>https://image.tmdb.org/t/p/original/qOuiakCdH3pULYUdxq0k2cOwhbl.jpg</thumb>
    <thumb>https://image.tmdb.org/t/p/original/Aqg1oGujKXEkmpSTmt5bspKkOm6.jpg</thumb>
  </fanart>
  <mpaa>US:PG / US:Rated PG</mpaa>
  <certification>US:PG / US:Rated PG</certification>
  <id>tt0089114</id>
  <tmdbid>9872</tmdbid>
  <uniqueid default="false" type="tmdb">9872</uniqueid>
  <uniqueid default="true" type="imdb">tt0089114</uniqueid>
  <country>US</country>
  <status/>
  <code/>
  <premiered>1985-07-12</premiered>
  <watched>false</watched>
  <playcount>0</playcount>
  <genre>Familiar</genre>
  <genre>Ciencia Ficción</genre>
  <genre>Fantasía</genre>
  <studio>Paramount</studio>
  <credits tmdbid="59943">Eric Luke</credits>
  <director tmdbid="4600">Joe Dante</director>
  <tag>spacecraft</tag>
  <tag>dream</tag>
  <tag>washington dc, usa</tag>
  <tag>space travel</tag>
  <tag>bullying</tag>
  <tag>alien</tag>
  <tag>best friend</tag>
  <tag>middle school</tag>
  <tag>extraterrestrial life form</tag>
  <tag>young love</tag>
  <tag>circuit board</tag>
  <actor>
    <name>Ethan Hawke</name>
    <role>Ben Crandall</role>
    <thumb>https://image.tmdb.org/t/p/h632/a7rgJl8TYUWAfJuM4fxbLHgD7BL.jpg</thumb>
    <profile>https://www.themoviedb.org/person/569</profile>
    <tmdbid>569</tmdbid>
  </actor>
  <actor>
    <name>River Phoenix</name>
    <role>Wolfgang Müller</role>
    <thumb>https://image.tmdb.org/t/p/h632/2kAB7jDmPfEa6BTmXlGcidCzMSz.jpg</thumb>
    <profile>https://www.themoviedb.org/person/741</profile>
    <tmdbid>741</tmdbid>
  </actor>
  <actor>
    <name>Jason Presson</name>
    <role>Darren Woods</role>
    <thumb>https://image.tmdb.org/t/p/h632/jmMHUinxfCmMUgYONlOEQf8m7Lz.jpg</thumb>
    <profile>https://www.themoviedb.org/person/103829</profile>
    <tmdbid>103829</tmdbid>
  </actor>
  <actor>
    <name>Amanda Peterson</name>
    <role>Lori Swenson</role>
    <thumb>https://image.tmdb.org/t/p/h632/m1O3HsAupOQvakIYqcxb6QpZ6dg.jpg</thumb>
    <profile>https://www.themoviedb.org/person/59942</profile>
    <tmdbid>59942</tmdbid>
  </actor>
  <actor>
    <name>Bobby Fite</name>
    <role>Steve Jackson</role>
    <thumb>https://image.tmdb.org/t/p/h632/zvHkJibEwxwqWmDGajG5WGTitRa.jpg</thumb>
    <profile>https://www.themoviedb.org/person/59941</profile>
    <tmdbid>59941</tmdbid>
  </actor>
  <actor>
    <name>Dana Ivey</name>
    <role>Mrs. Müller</role>
    <thumb>https://image.tmdb.org/t/p/h632/oQEeRwEb9w4TuSVp334OP9z6Ljm.jpg</thumb>
    <profile>https://www.themoviedb.org/person/13314</profile>
    <tmdbid>13314</tmdbid>
  </actor>
  <actor>
    <name>Bradley Gregg</name>
    <role>Steve Jackson's Gang</role>
    <thumb>https://image.tmdb.org/t/p/h632/wdutERs05CdWND79gtzXA8WVYrD.jpg</thumb>
    <profile>https://www.themoviedb.org/person/3039</profile>
    <tmdbid>3039</tmdbid>
  </actor>
  <actor>
    <name>Georg Olden</name>
    <role>Steve Jackson's Gang</role>
    <profile>https://www.themoviedb.org/person/1075017</profile>
    <tmdbid>1075017</tmdbid>
  </actor>
  <actor>
    <name>Chance Schwass</name>
    <role>Steve Jackson's Gang</role>
    <profile>https://www.themoviedb.org/person/1075018</profile>
    <tmdbid>1075018</tmdbid>
  </actor>
  <actor>
    <name>Danny Nucci</name>
    <role>Nasty Kid at School</role>
    <thumb>https://image.tmdb.org/t/p/h632/w8nMQvgBFY5vuDoN5B698ostLWN.jpg</thumb>
    <profile>https://www.themoviedb.org/person/8540</profile>
    <tmdbid>8540</tmdbid>
  </actor>
  <actor>
    <name>Taliesin Jaffe</name>
    <role>Ludwig Müller</role>
    <thumb>https://image.tmdb.org/t/p/h632/6J7WD9B9ijvP8rw5hREV47MQbUt.jpg</thumb>
    <profile>https://www.themoviedb.org/person/37713</profile>
    <tmdbid>37713</tmdbid>
  </actor>
  <actor>
    <name>James Cromwell</name>
    <role>Mr. Müller</role>
    <thumb>https://image.tmdb.org/t/p/h632/vpNQQbM5PtxsYmVm4oh79SGFyUK.jpg</thumb>
    <profile>https://www.themoviedb.org/person/2505</profile>
    <tmdbid>2505</tmdbid>
  </actor>
  <actor>
    <name>Brooke Bundy</name>
    <role>Science Teacher</role>
    <thumb>https://image.tmdb.org/t/p/h632/ak0d9nYCA7UN2EBcBOK3cIVFgo9.jpg</thumb>
    <profile>https://www.themoviedb.org/person/72157</profile>
    <tmdbid>72157</tmdbid>
  </actor>
  <actor>
    <name>Tricia Bartholome</name>
    <role>Girl in Classroom</role>
    <profile>https://www.themoviedb.org/person/987886</profile>
    <tmdbid>987886</tmdbid>
  </actor>
  <actor>
    <name>Eric Luke</name>
    <role>Darren's Teacher</role>
    <thumb>https://image.tmdb.org/t/p/h632/yQCcIvXtdd0N4HRcji4CxIf5Szj.jpg</thumb>
    <profile>https://www.themoviedb.org/person/59943</profile>
    <tmdbid>59943</tmdbid>
  </actor>
  <actor>
    <name>Robert Picardo</name>
    <role>Starkiller / Wak / Wak and Neek's Father</role>
    <thumb>https://image.tmdb.org/t/p/h632/9TOY0mhWRMlU5Ur8QHMoDecWs2n.jpg</thumb>
    <profile>https://www.themoviedb.org/person/16180</profile>
    <tmdbid>16180</tmdbid>
  </actor>
  <actor>
    <name>Karen Mayo-Chandler</name>
    <role>Starkiller's Girlfriend</role>
    <thumb>https://image.tmdb.org/t/p/h632/ltaoXOQ7HqyTL6ytYL3Ez2hQm9J.jpg</thumb>
    <profile>https://www.themoviedb.org/person/100480</profile>
    <tmdbid>100480</tmdbid>
  </actor>
  <actor>
    <name>Robert F. Boyle</name>
    <role>Starkiller's Girlfriend's Father</role>
    <profile>https://www.themoviedb.org/person/2657</profile>
    <tmdbid>2657</tmdbid>
  </actor>
  <actor>
    <name>John P. Navin, Jr.</name>
    <role>Couple at Drive-In</role>
    <thumb>https://image.tmdb.org/t/p/h632/aXtqDkyDm6g2k49IMmqdZ4p4FTs.jpg</thumb>
    <profile>https://www.themoviedb.org/person/92557</profile>
    <tmdbid>92557</tmdbid>
  </actor>
  <actor>
    <name>Mary Hillstead</name>
    <role>Couple at Drive-In</role>
    <profile>https://www.themoviedb.org/person/1545229</profile>
    <tmdbid>1545229</tmdbid>
  </actor>
  <actor>
    <name>Simone Blue</name>
    <role>Snack Bar Girl</role>
    <profile>https://www.themoviedb.org/person/1545230</profile>
    <tmdbid>1545230</tmdbid>
  </actor>
  <actor>
    <name>Meshach Taylor</name>
    <role>Gordon Miller</role>
    <thumb>https://image.tmdb.org/t/p/h632/mvjrIKMKz3w9DhRKkISX9dSHUpb.jpg</thumb>
    <profile>https://www.themoviedb.org/person/101784</profile>
    <tmdbid>101784</tmdbid>
  </actor>
  <actor>
    <name>Dick Miller</name>
    <role>Charlie Drake</role>
    <thumb>https://image.tmdb.org/t/p/h632/zyQXh8BkLg0DDo23pxJN2dy6Bqj.jpg</thumb>
    <profile>https://www.themoviedb.org/person/102441</profile>
    <tmdbid>102441</tmdbid>
  </actor>
  <actor>
    <name>Leslie Rickert</name>
    <role>Neek</role>
    <profile>https://www.themoviedb.org/person/1075019</profile>
    <tmdbid>1075019</tmdbid>
  </actor>
  <actor>
    <name>Mary Kay Place</name>
    <role>Mrs. Crandall (uncredited)</role>
    <thumb>https://image.tmdb.org/t/p/h632/2kMJaI9g9kOMw7BXb8tlztUxQZe.jpg</thumb>
    <profile>https://www.themoviedb.org/person/5960</profile>
    <tmdbid>5960</tmdbid>
  </actor>
  <producer tmdbid="16154">
    <name>Michael Finnell</name>
    <thumb>https://image.tmdb.org/t/p/h632/iVKOtioYuV9rLCbrBt8JRGTILE1.jpg</thumb>
    <profile>https://www.themoviedb.org/person/16154</profile>
  </producer>
  <producer tmdbid="11302">
    <name>Edward S. Feldman</name>
    <profile>https://www.themoviedb.org/person/11302</profile>
  </producer>
  <producer tmdbid="45861">
    <name>Tom Jacobson</name>
    <profile>https://www.themoviedb.org/person/45861</profile>
  </producer>
  <producer tmdbid="59944">
    <name>David Bombyk</name>
    <profile>https://www.themoviedb.org/person/59944</profile>
  </producer>
  <trailer/>
  <languages>inglés, alemán</languages>
  <dateadded>2024-02-05 19:58:23</dateadded>
  <lockdata>true</lockdata>
  <fileinfo>
    <streamdetails>
      <video>
        <codec>HEVC</codec>
        <aspect>1.85</aspect>
        <width>2560</width>
        <height>1392</height>
        <durationinseconds>6555</durationinseconds>
        <stereomode/>
      </video>
      <audio>
        <codec>AC3</codec>
        <language>spa</language>
        <channels>2</channels>
      </audio>
      <audio>
        <codec>AC3</codec>
        <language>eng</language>
        <channels>6</channels>
      </audio>
      <subtitle>
        <language>spa</language>
      </subtitle>
      <subtitle>
        <language>eng</language>
      </subtitle>
    </streamdetails>
  </fileinfo>
  <!--tinyMediaManager meta data-->
  <source>UNKNOWN</source>
  <edition>NONE</edition>
  <original_filename>Exploradores v.ext (1985, Joe Dante).(Spanish.English.Subs).WQHD.HEVC.10b-AC3.by.Geot.mkv</original_filename>
  <user_note/>
</movie>

So, I want a file named 'Ethan Hawke.nfo' with all his data except the 'role' value, which should stay empty so that I can reuse the file in other movies with different roles.

<?xml version="1.0" encoding="UTF-8"?>
<actor>
  <name>Ethan Hawke</name>
  <role></role>
  <thumb>https://image.tmdb.org/t/p/h632/a7rgJl8TYUWAfJuM4fxbLHgD7BL.jpg</thumb>
  <profile>https://www.themoviedb.org/person/569</profile>
  <tmdbid>569</tmdbid>
</actor>

and so on...

So far I extract the name of all the actors with:

awk -F '>' '/^name/ {print $2}' RS='<' "$file"

I need to loop the result and create each file with the info from the original xml file.

I am now using this loop, which keeps names that contain spaces in one piece:

awk -F '>' '/^name/ {print $2}' RS='<' "$file" | while IFS= read -r actor; do echo "- $actor" > ".actors/$actor.nfo"; done

But I still haven't figured out how to write each actor's info into their file.

@jagonzalez, welcome. The forum is primarily a collaboration: you present your challenge along with your attempts, and we come back with suggestions and fixes (some of which may be solutions).

The last 4 names in your output from awk are not actors, they're producers.

You are right.
I am fixing it now.
Thanks

You are right as well.
I also need help filtering out the names that do not belong to an actor.
Thanks

To extract only the actor blocks, for example:

awk '/<actor>/,/<\/actor>/ { print $0 }' input.xml | head -22
  <actor>
    <name>Ethan Hawke</name>
    <role>Ben Crandall</role>
    <thumb>https://image.tmdb.org/t/p/h632/a7rgJl8TYUWAfJuM4fxbLHgD7BL.jpg</thumb>
    <profile>https://www.themoviedb.org/person/569</profile>
    <tmdbid>569</tmdbid>
  </actor>
  <actor>
    <name>River Phoenix</name>
    <role>Wolfgang Müller</role>
    <thumb>https://image.tmdb.org/t/p/h632/2kAB7jDmPfEa6BTmXlGcidCzMSz.jpg</thumb>
    <profile>https://www.themoviedb.org/person/741</profile>
    <tmdbid>741</tmdbid>
  </actor>
  <actor>
    <name>Jason Presson</name>
    <role>Darren Woods</role>
    <thumb>https://image.tmdb.org/t/p/h632/jmMHUinxfCmMUgYONlOEQf8m7Lz.jpg</thumb>
    <profile>https://www.themoviedb.org/person/103829</profile>
    <tmdbid>103829</tmdbid>
  </actor>
  <actor>
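
The same range pattern can also be combined with the earlier name extraction, so that only names inside actor blocks are printed and the producers are skipped. This is a minimal sketch, not a definitive solution: the shortened sample file and the `tmp` and `names` variables are assumptions made up for illustration, and it relies on each tag sitting on its own line as in the .nfo above.

```shell
# Hypothetical, shortened sample input; the real file would be the movie .nfo.
tmp=$(mktemp -d)
cat > "$tmp/input.xml" <<'EOF'
<movie>
  <actor>
    <name>Ethan Hawke</name>
    <role>Ben Crandall</role>
  </actor>
  <actor>
    <name>John P. Navin, Jr.</name>
    <role>Couple at Drive-In</role>
  </actor>
  <producer tmdbid="16154">
    <name>Michael Finnell</name>
  </producer>
</movie>
EOF

# Print <name> values only while inside an <actor>...</actor> range.
names=$(awk '/<actor>/,/<\/actor>/ {
    if ($0 ~ /<name>/) {
        split($0, x, /[><]/)   # x[3] is the text between <name> and </name>
        print x[3]
    }
}' "$tmp/input.xml")

printf '%s\n' "$names"
```

Because `<producer tmdbid="...">` never matches `/<actor>/`, the producer names never enter the range.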

Thank you. That solves filtering all the names that are not actors.
I don't know how to export this data to individual files named after each actor, with the 'role' tag removed.

Try this:

cat t.awk
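# Collect each <actor>...</actor> block in a buffer
/<actor>/,/<\/actor>/{
        if ( $0 ~ "<name>" ) {
                split($0,x,/[><]/);     # x[3] holds the text between the tags
                gsub(" ","",x[3]);      # remove spaces from the name
                actorsFile=x[3]".xml"
        }
        buffer=buffer $0"\n"
}
# At the end of a block, write the buffer to the actor's file
/<\/actor>/ {
        printf "%s", buffer > actorsFile        # buffer already ends with a newline
        close(actorsFile)
        actorsFile=""
        buffer=""
}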

awk -f t.awk input.xml

This should produce one .xml file per actor, named after the actor, for example:

AmandaPeterson.xml
BobbyFite.xml
BradleyGregg.xml
BrookeBundy.xml
ChanceSchwass.xml
DanaIvey.xml
DannyNucci.xml

@jagonzalez,
NB: this example can be simplified further; that is left for you to attempt.

I don't really know how to simplify it, but I have added a filter that leaves the 'role' tag out, as wanted:

/<actor>/,/<\/actor>/{
        if ( $0 ~ "<name>" ) {
                split($0,x,/[><]/);
                gsub(" ","_",x[3]);
                actorsFile="/route/to/file/"x[3]".xml"
        }
        if ( $0 !~ "<role>" ) {
                buffer=buffer $0"\n"
        }
}
/<\/actor>/ {
        print buffer > actorsFile
        actorsFile=""
        buffer=""
}

I also replaced the spaces in the name with '_' and added a path prefix so the output files can go to another directory.

@jagonzalez, nice to see you've added some logic. The version below has a couple of additional tweaks that may also be helpful:

BEGIN { hdr = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" }

/<actor>/,/<\/actor>/ {
        if ( $0 ~ /<actor>/ )           # start of a block: begin with the XML header
                buffer = hdr ORS

        if ( $0 ~ /<name>/ ) {
                split($0, x, /[><]/)
                gsub(/[ ,.]/, "_", x[3])        # replace spaces, commas and dots
                actorsFile = x[3] ".xml"
        }

        if ( $0 ~ /<role>/ )            # skip this line
                next

        buffer = buffer $0 ORS

        if ( $0 ~ /<\/actor>/ ) {       # end of block: write the file
                printf "%s", buffer > actorsFile        # buffer already ends with ORS
                close(actorsFile)
                actorsFile = ""
                buffer = ""
        }
}

@jagonzalez

I'd initially posted an in-progress version; the one posted now should work. As always, you need to test it and be satisfied that it fits your needs.
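
One thing to watch with the substitution in the script above: every space, comma, and dot becomes its own underscore, so a name such as "John P. Navin, Jr." ends up with doubled and trailing underscores. The following is only a sketch of one possible cleanup; `sanitize` is a hypothetical helper name, not something from the thread, and whether collapsed underscores are wanted is a matter of taste.

```shell
# sanitize: hypothetical helper that turns an actor name into a file-name stem.
sanitize() {
    printf '%s\n' "$1" | awk '{
        gsub(/[ ,.]+/, "_")   # collapse each run of spaces, commas, dots into one "_"
        sub(/_$/, "")         # drop a trailing underscore, if any
        print
    }'
}

sanitize "John P. Navin, Jr."    # prints John_P_Navin_Jr
sanitize "Karen Mayo-Chandler"   # prints Karen_Mayo-Chandler
```

The same two calls (`gsub` with `+`, then `sub`) could be applied to `x[3]` inside the script instead of the single `gsub`.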

Another technique is to use a state variable, which avoids repeating /<actor>/ and /<\/actor>/ inside the action:

/<actor>/ {
        actor=1
        buffer="<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
}
actor {
        if ( $0 ~ /<name>/ ) {
                split($0, x, /[><]/)
                gsub(" ", "_", x[3])
                actorsFile=(x[3] ".xml")
        }
        if ( $0 !~ /<role>/ ) {
                buffer=(buffer ORS $0)
        }
}
/<\/actor>/ {
        actor=0
        print buffer > actorsFile
}
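
The original request showed an empty `<role></role>` tag in the output rather than no tag at all, so here is a sketch of the state-variable version adapted to that. Several things are assumptions for illustration only: the shortened sample input, the `outdir` variable, the `.nfo` extension (taken from the 'Ethan Hawke.nfo' wish in the first post), and the added `close()` call, which simply releases each output file once it has been written.

```shell
# Hypothetical, shortened sample input written to a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/input.xml" <<'EOF'
<movie>
  <actor>
    <name>Ethan Hawke</name>
    <role>Ben Crandall</role>
    <tmdbid>569</tmdbid>
  </actor>
  <producer tmdbid="16154">
    <name>Michael Finnell</name>
  </producer>
</movie>
EOF

awk -v outdir="$tmp" '
/<actor>/ {
    actor = 1                                  # we are inside an actor block
    buffer = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
}
actor {
    line = $0
    if (line ~ /<name>/) {
        split(line, x, /[><]/)                 # x[3] is the actor name
        gsub(" ", "_", x[3])
        actorsFile = outdir "/" x[3] ".nfo"
    }
    if (line ~ /<role>/)
        sub(/<role>.*<\/role>/, "<role></role>", line)   # keep the tag, drop its value
    buffer = buffer ORS line
}
/<\/actor>/ {
    actor = 0
    print buffer > actorsFile
    close(actorsFile)                          # release the file before the next actor
}' "$tmp/input.xml"

cat "$tmp/Ethan_Hawke.nfo"
```

Producers are ignored because `actor` is still 0 when their `<name>` line is read, so no file is created for them.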