Filter a .kml file (xml) to remove unwanted entries

Phear46 · May 10, 2011, 1:59pm

OK, I have a .kml file that I want to trim down and get rid of the rubbish from. It's formatted like so:

<Placemark>
<name><![CDATA[BTHomeHub****]]></name>
<description><![CDATA[BSSID: <b>*********</b><br/>Capabilities: <b>[WPA-PSK-TKIP]</b><br/>Frequency: <b>2437</b><br/>Timestamp: <b>1304892397000</b><br/>Date: <b>2011-05-08 23:06:37</b>]]></description><styleUrl>#red</styleUrl>
<Point>
<coordinates>******************</coordinates></Point>
</Placemark>

There are about 1200 of these Placemark tags within the file.

What I want to do is remove the entire Placemark element if the name tag contains a set string, e.g. BTHomeHub.

I know this CAN be done, but I literally have no idea how. My shell scripting experience is lacking, to say the least. Even a point in the right direction is more than welcome. Thanks in advance for any help!
Nathan

bartus11 · May 10, 2011, 2:13pm

Try:

perl -p0e 's/<Placemark>.*?BTHomeHub.*?<\/Placemark>\n//sg' file

Phear46 · May 10, 2011, 2:35pm

Thank you for the speedy reply, bartus. For some reason this isn't working. It returns the file with 5 Placemark tags, none of which are the BTHomeHub ones (which is correct), but there should be a bunch more.

I thought maybe it was just outputting the first few, so I ran

perl -p0e 's/<Placemark>.*?BTHomeHub.*?<\/Placemark>\n//sg' file | gedit

to pipe the output to a text file, but I just get a blank page.

Any ideas? Also, could you explain what that line actually does? What does -p0e mean?

bartus11 · May 10, 2011, 3:00pm

I think you should show us more sample entries (like the first 50 or so).
edit: Never mind, I know what is wrong. I'll try to get you the right solution soon.
edit2: Try this:

perl -p0e 's/<Placemark>\n[^\n]+BTHomeHub.*?<\/Placemark>\n//sg' file

ahamed101 · May 10, 2011, 4:12pm

Not a one-liner, but it does the job:

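#!/bin/bash
# Usage: script pattern   (reads the KML from a file named "infile")
# i = next free slot in the buffer x
# f = 1 once the pattern has been seen in the current block
# s = 0 while inside a <Placemark> block, 1 otherwise
i=1;f=0;s=1
while read -r line
do
  if [ "$line" == "<Placemark>" ]; then s=0; fi
  # -F matches the pattern as a fixed string, -q suppresses grep's output
  if printf '%s\n' "$line" | grep -qF -- "$1"; then f=1; fi
  if [ "$line" == "</Placemark>" ]; then s=1; fi
  x[$i]="$line"
  if [ $s -eq 1 ];then
    # block finished (or line outside any block): print the buffer unless it matched
    if [ $f -eq 0 ];then
      for((j=1;j<=i;j++))
      do
        printf '%s\n' "${x[$j]}"
      done
    fi
    i=0;unset x;f=0
  fi
  ((i=i+1))
done < infile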

Usage : script pattern

regards,
Ahamed

Phear46 · May 10, 2011, 4:20pm

That works perfectly! Thank you.

Now... I assumed I would just replace 'BTHomeHub' with any other set string within the <Placemark> tags to remove said tag, but that doesn't work. How are you telling perl to only search the name line?

I was going to remove the placemarks that are 'wpa-psk-tkip' enabled.

ahamed101 · May 10, 2011, 4:25pm

Try the shell script.

Usage: script pattern_to_be_removed

e.g.: ./script "WPA-PSK-TKIP" (the match is case-sensitive, so write the pattern exactly as it appears in the file)

bartus11 · May 10, 2011, 4:28pm

perl -p0e 's/<Placemark>\n[^\n]+\n[^\n]+WPA-PSK-TKIP.*?<\/Placemark>\n//sg' file
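
The one-liners above depend on which line of the entry holds the string (one [^\n]+\n for every line to skip). A sketch of a position-independent alternative, assuming a POSIX awk is available: buffer each Placemark block and print it only if no line in it contains the string. The function name filter_placemarks is hypothetical, and the string is compared literally (index), not as a regular expression.

```shell
# filter_placemarks STRING FILE
# Hypothetical helper: prints FILE without the Placemark blocks containing STRING.
filter_placemarks() {
  awk -v pat="$1" '
    # An opening tag starts a fresh buffer.
    /<Placemark>/ { inblock = 1; buf = ""; hit = 0 }
    inblock {
      buf = buf $0 "\n"
      if (index($0, pat)) hit = 1        # literal substring test
      if ($0 ~ /<\/Placemark>/) {        # block complete: keep it or drop it
        if (!hit) printf "%s", buf
        inblock = 0
      }
      next
    }
    # Lines outside any block (header, footer) pass through unchanged.
    { print }
    # An unterminated final block is kept unless it matched.
    END { if (inblock && !hit) printf "%s", buf }
  ' "$2"
}
```

Usage: filter_placemarks 'WPA-PSK-TKIP' file.kml > filtered.kml. Because awk -v interprets backslash escapes in the value, a string containing backslashes would need extra care.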

Phear46 · May 10, 2011, 4:43pm

Your shell script worked too, ahamed, but I don't seem to be able to output the result to a text file.

> file.kml returns an error
| gedit returns an empty page

Sorry I'm such a n00b. I'm not really up on this sort of thing.

ahamed101 · May 10, 2011, 4:46pm

What is the error you got? Paste it.
It should be like this:

./script "pattern" >> file.kml
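
Two shell details are worth keeping in mind here. >> appends to file.kml, so running the command twice leaves two copies of the output in it, while > replaces the contents. And if the output file is also the file being read, > empties it before the command has read anything. A small sketch that sidesteps both by writing to a temporary file first; save_output is a hypothetical helper name, not part of the script above:

```shell
# save_output OUTFILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND and stores its standard output in OUTFILE.
save_output() {
  out=$1
  shift
  # Temporary file next to the target; $$ (the shell's PID) keeps the name unique enough here.
  tmp="$out.tmp.$$"
  if "$@" > "$tmp"; then
    # Replace OUTFILE only after the command has finished successfully.
    mv -f "$tmp" "$out"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Usage: save_output file.kml ./script "BTHomeHub". Note that a command such as grep, which exits non-zero when it prints nothing, counts as a failure here and leaves OUTFILE untouched.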

regards,
Ahamed

ygemici · May 11, 2011, 4:08pm

# GNU sed. The marker line is matched as ^x\n so that lines that merely contain an "x" (such as the <?xml ...> header) do not start the deletion.
remove=BTHomeHub
sed -ne '/<Placemark>/{N;s/\(.*\)\n\(.*'"$remove"'.*\)/x\n\2/};/^x\n/,/<\/Placemark>/d;p' file >newfile1
or
sed -ne '/<Placemark>/{N;s/\(.*\)\n\(.*'"$remove"'.*\)/x\n\2/;};/^x\n/,/<\/Placemark>/{;{/'"$remove"'/{;:x;;n;/<\/Placemark>/!bx;/<\/Placemark>/d;};}};p' file > newfile1

remove=WPA-PSK-TKIP
sed -ne '/<Placemark>/{N;N;s/\(.*\)\n.*\(.*'"$remove"'.*\)/x\n\2/};/^x\n/,/<\/Placemark>/d;p' file >newfile2

regards
ygemici

snappy46 · May 19, 2011, 9:27pm

Does anyone know how ahamed101's bash script above can be implemented using busybox? The first issue I had was with the for loop, which I was able to fix with this and a counter j=j+1 in the loop:

while [ $j -le $i ]

which I think does the same thing, but now this statement:

echo ${x[$j]}

causes this error: Syntax error: Bad substitution.

I am not really sure how this script does its magic, so I am not sure how to make it work with busybox with the same results, which is to eliminate the whole XML element if the string is found in it (<Movie>....</Movie> instead of <Placemark> in my case).

Thank you
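
busybox's default shell (ash) has no arrays, which is why ${x[$j]} is rejected as a bad substitution, and it has no for((...)) loop either. A minimal sketch of the same idea in plain POSIX sh, which keeps the current element in an ordinary string variable instead of an array. The function name drop_elements and its three arguments are assumptions for this illustration; it also assumes the opening and closing tags carry no attributes:

```shell
# drop_elements TAG STRING FILE
# Hypothetical helper: prints FILE without the <TAG>...</TAG> elements containing STRING.
drop_elements() {
  tag=$1
  pattern=$2
  file=$3
  buf=
  inblock=0
  hit=0
  nl='
'
  # "|| [ -n "$line" ]" also handles a last line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # Quoted parts of a case pattern are matched literally, so no regex surprises.
    case $line in
      *"<$tag>"*) inblock=1; buf=; hit=0 ;;
    esac
    if [ "$inblock" -eq 1 ]; then
      buf=$buf$line$nl
      case $line in
        *"$pattern"*) hit=1 ;;
      esac
      case $line in
        *"</$tag>"*)
          # Element complete: print it only if STRING never appeared in it.
          if [ "$hit" -eq 0 ]; then printf '%s' "$buf"; fi
          inblock=0
          ;;
      esac
    else
      # Lines outside any element pass through unchanged.
      printf '%s\n' "$line"
    fi
  done < "$file"
}
```

Usage: drop_elements Movie "some string" movies.xml > filtered.xml (the file names are placeholders). An element that is still open at the end of the file is dropped in this sketch.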