Extracting parts of a file.

srivat79 · May 27, 2009, 1:07am

Hello,

I have an XML file like the one below, and I would like to extract all the lines between <JOB and </JOB> for every such occurrence. The number of lines between them is not fixed.

Is there any way to do this with awk?

============
<JOB APR="1" AUG="1" DEC="1" FEB="1" JAN="1" JUL="1" JUN="1" MAR="1" MAY="1" NOV="1" OCT="1" SEP="1" >
<QUANTITATIVE NAME="B2_ADJ" QUANT="1"/>
<QUANTITATIVE NAME="B2_NR" QUANT="1"/>
</JOB>
<JOB APR="1" AUG="1" DEC="1" FEB="1" JAN="1" JUL="1" JUN="1" MAR="1" MAY="1" NOV="1" OCT="1" SEP="1" >
<QUANTITATIVE NAME="B2_ADJ" QUANT="1"/>
<QUANTITATIVE NAME="B2_NR" QUANT="1"/>
</JOB>

zaxxon · May 27, 2009, 1:30am

sed:

sed '/^<JOB/,/^<\/JOB/!d; /^<\/*JOB/d' infile

awk:

awk '/^<JOB/,/^<\/JOB/ {if ( $0 ~ /^<\/*JOB/ ) {next} else {print}}' infile

ghostdog74 · May 27, 2009, 1:59am

If you have Python:

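#!/usr/bin/env python3
inside = False  # True while we are between <JOB ...> and </JOB>
with open("file") as fh:
    for line in fh:
        if "</JOB" in line:   # closing tag: leave the block
            inside = False
            continue
        if "<JOB" in line:    # opening tag: enter the block
            inside = True
            continue
        if inside:
            print(line.strip())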

Output:

# ./test.py
<QUANTITATIVE NAME="B2_ADJ" QUANT="1"/>
<QUANTITATIVE NAME="B2_NR" QUANT="1"/>
<QUANTITATIVE NAME="B2_ADJ" QUANT="1"/>
<QUANTITATIVE NAME="B2_NR" QUANT="1"/>
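
If you need the lines grouped per JOB rather than as one flat stream, the same flag idea can be wrapped in a function. This is only a rough sketch: the name extract_job_blocks and the two regexes are my own and not from anything above, and it assumes each <JOB ...> and </JOB> tag sits at the start of its own line, as in the sample.

```python
import re

# Hypothetical helper (not from the thread): collects the lines of each
# <JOB ...> ... </JOB> block into its own list.
# Assumption: opening and closing tags each start their own line.
JOB_OPEN = re.compile(r"^\s*<JOB\b")
JOB_CLOSE = re.compile(r"^\s*</JOB>")


def extract_job_blocks(lines):
    """Return a list of blocks; each block is a list of stripped lines."""
    blocks = []
    current = None  # None means we are outside any JOB block
    for line in lines:
        if JOB_OPEN.match(line):
            current = []  # start collecting a new block
        elif JOB_CLOSE.match(line):
            if current is not None:
                blocks.append(current)
            current = None  # back outside
        elif current is not None:
            current.append(line.strip())
    return blocks


if __name__ == "__main__":
    # "file" is the same placeholder name used in the script above.
    with open("file") as fh:
        for number, block in enumerate(extract_job_blocks(fh), 1):
            print("JOB", number)
            for line in block:
                print(line)
```

Lines outside any JOB block are skipped, and a <JOB that is never closed contributes nothing, which may or may not be what you want.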

srivat79 · May 27, 2009, 2:01am

Thanks guys.