Problem parsing

Hi,

I want to extract some text from a text clipping.

{version='1encoding='UTF-8'?><!DOCTYPExmeml><xversion='4'i='2C360C00115905800800460202155173'><name>C0001</name><duration>184</duration><rate><ntsc>FALSE</ntsc><timebase>25</timebase></rate><labels><label2></label2></labels><comments><mastercomment1></mastercomment1><mastercomment2>En-montage</mastercomment2><clipcommenta></clipcommenta><clipcommentb></clipcommentb></comments><logginginfo><lognote></lognote><description></description><scene>rugby-fabien-galtier-pour-tous-le-sport</scene><shottake>2011-12-1608:53:32</shottake><scenenote>FJean-philipJri</scenenote></logginginfo><fileid='2C360C00115905800800460202155173_file'><name>C0001</name><pathurl>file://localhost/Volumes/intermediation/4D760F00085905C12202080046020125/Clip/C0001MXF</pathurl><duration>184</duration></file></clip><cl='F3900D00115905800800460202154F6F'><name>C0001</name><duration>50</duration><rate><ntsc>FALSE</ntsc><timebase>25</timebase></rate><labels><label2></label2></labels><comments><mastercomment1></mastercomment1><mastercomment2>En-montage</mastercomment2><clipcommenta></clipcommenta><clipcommentb></clipcommentb></comments><logginginfo><lognote></lognote><description>Presse-Bouton-B-point-de-Tugny</description><scene>Disparition-KADA</scene><shottake>2011-12-09:52:45</shottake><scenenote>RENARDAurelieJournaliste-redactrice</scenenote></logginginfo><fileid='F3900D00115905800800460202154F6F_file'><name>C0001</name><pathurl>file://localhost/Volumes/intermediation/72A11400105905C12202080046020125/Clip/C0001MXF</pathurl><duration>50</duration></file></clip><cli='81C31000115905800800460202155176'><name>C0001</name><duration>586</duration><rate><ntsc>FALSE</ntsc><timebase>25</timebase></rate><labels><label2></label2></labels><comments><mastercomment1>BRI</mastercomment1><mastercomment2>En-montage</mastercomment2><clipcommenta></clipcommenta><clipcommentb></clipcommentb></comments><logginginfo><lognote></lognote><description></description><scene>musee-du-moulages</scene><shottake>2011-12-1612:12:25<
/shottake><scenenote>ACa/Redacteur</scenenote></logginginfo><filid='81C31000115905800800460202155176_file'><name>C0001</name><pathurl>file://localhost/Volumes/intermediation/C2EF1B00875805C20111216164346692/Clip/C0001MXF</pathurl><duration>586</duration></file></clip><importoptions><createnewproject>FALSE</createnewproject></importoptions></xmeml>};{utxtustl

I have the "file" on one line, and I want every occurrence of the path that follows "intermediation/":

4D760F00085905C12202080046020125/Clip/C0001MXF
72A11400105905C12202080046020125/Clip/C0001MXF
C2EF1B00875805C20111216164346692/Clip/C0001MXF

I tried with awk, but I don't know how to use the substr function. My attempts were unsuccessful.

THX

Hi,
maybe you can try sed instead of awk :)

# sed 's|intermediation/|\n|g' input.text|sed -n 's|^\([^ ]*/[^ ]*/[^ ]*\)</pathurl>.*|\1|p;'
4D760F00085905C12202080046020125/Clip/C0001MXF
72A11400105905C12202080046020125/Clip/C0001MXF
C2EF1B00875805C20111216164346692/Clip/C0001MXF

regards
ygemici

Thanks for your answer, but the first sed command doesn't replace "intermediation/" with a newline.
My shell is bash and my OS is OS X.

Then try it like this:

sed 's/intermediation\//\'$'\n/g' input.text

With OS X's BSD sed, try this:

sed 's|intermediation/|\
|g' input.text |sed -n 's|^\([^ ]*/[^ ]*/[^ ]*\)</pathurl>.*|\1|p;'
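
A note on why the earlier version fails on OS X, as far as I understand it: `\n` in the replacement part of `s` is a GNU sed extension, and BSD sed does not turn it into a newline. A backslash followed by a real line break is the form POSIX describes, so it should behave the same with both. Here is a minimal sketch of the same two-step pipeline wrapped in a function. The function name `extract_paths` and the sample input are made up for illustration.

```shell
# Hypothetical helper: reads the file(s) given as arguments, or stdin if none.
extract_paths() {
  # Step 1: break the single long line after every "intermediation/".
  # The backslash followed by a literal line break inserts the newline.
  sed 's|intermediation/|\
|g' "$@" |
    # Step 2: on each resulting line, keep only what precedes </pathurl>,
    # provided it contains at least two slashes and no spaces.
    sed -n 's|^\([^ ]*/[^ ]*/[^ ]*\)</pathurl>.*|\1|p'
}

# Made-up sample input with two clips on one line.
printf '%s\n' '<pathurl>file://localhost/Volumes/intermediation/AAA/Clip/C0001MXF</pathurl><x>1</x><pathurl>file://localhost/Volumes/intermediation/BBB/Clip/C0001MXF</pathurl>' |
  extract_paths
```

On the real data you would call it as `extract_paths input.text`.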

Excellent, it works. Thanks.

I also tried this:

awk '{gsub(/intermediation/,"\n",$0);print}'

It works this way too.

Anyway, you found the trick. I hadn't thought of replacing the string "intermediation" with a newline.
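
Coming back to the original question about substr: the extraction can probably also be done in awk alone, by looping with match() and cutting each hit out with substr(). This is only a sketch under a few assumptions: every wanted path starts right after "intermediation/" and ends at the next "<", and the function name `extract_paths_awk` and the sample input are invented for the example.

```shell
# Hypothetical helper: reads the file(s) given as arguments, or stdin if none.
extract_paths_awk() {
  awk -v marker='intermediation/' '{
    rest = $0
    # match() returns 0 when nothing is found; otherwise it sets
    # RSTART (position of the match) and RLENGTH (its length).
    while (match(rest, marker "[^<]*")) {
      # Print the matched text without the leading marker.
      print substr(rest, RSTART + length(marker), RLENGTH - length(marker))
      # Keep searching in the text that follows this match.
      rest = substr(rest, RSTART + RLENGTH)
    }
  }' "$@"
}

# Made-up sample input with two clips on one line.
printf '%s\n' '<pathurl>file://localhost/Volumes/intermediation/AAA/Clip/C0001MXF</pathurl><x>1</x><pathurl>file://localhost/Volumes/intermediation/BBB/Clip/C0001MXF</pathurl>' |
  extract_paths_awk
```

substr(s, start, len) counts characters from 1, which is why RSTART can be used directly as the starting position.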