How to use shell scripts to get EPG program data with sed

single · November 6, 2007, 1:08pm

This is the test version of my script:

#!/bin/sh
wget -q -O /tmp/axn 'http://www.axn.pt/programacion/'
sed -e '/^$/ d' /tmp/axn > /tmp/temp     # Delete blank lines

sed -e 's/<[^>]*>//g' /tmp/temp > /tmp/temp1  #Remove HTML
rm -f /tmp/temp    # Each step writes a new temp file, then the previous one is deleted
sed 's/>[    ]*</></g'  /tmp/temp1 > /tmp/temp2
#
rm -f /tmp/temp1
sed -e '/^$/ d' /tmp/temp2 > /tmp/temp3     # Delete empty lines
rm -f /tmp/temp2
sed -n '102,130p' /tmp/temp3 > /tmp/temp4 # Keep only lines 102 to 130
rm -f /tmp/temp3
sed -e '/^$/ d' /tmp/temp4 > /tmp/temp5     # Delete empty lines
rm -f /tmp/temp4
sed 's/^\([0-9][0-9]:[0-9][0-9]\)/\n\1/g' /tmp/temp5 > /tmp/temp6
rm -f /tmp/temp5
sed -e 's/&[^;]*;//g' /tmp/temp6 > /tmp/temp7  # Remove HTML entities such as &nbsp;
rm -f /tmp/temp6
cat /tmp/temp7  # Show temp7

And the result is this:

Manhã
                
        06:26C.S.I. Nova Iorque07:10Hospital Central08:33Sem Rasto09:20Stingers: Infiltrados10:15C.S.I. Nova Iorque11:05Hospital Central12:30C.S.I. Miami13:25Asas nos Pés14:15Medium

These are the programs on this channel. The channel has no EPG, so I use the internet to go to its web page and grab the "EPG" from there.

The problem is that this is not what I have in mind; I want something like this:

Manhã
06:26 C.S.I. Nova Iorque
07:10 Hospital Central
08:33 Sem Rasto
09:20 Stingers: Infiltrados
10:15 C.S.I. Nova Iorque
11:05 Hospital Central
12:30 C.S.I. Miami
13:25 Asas nos Pés
14:15 Medium

One more thing: this is not meant to run on a PC but on a Linux-based STB (set-top box).

Maybe an XML approach would be better. Any ideas?
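The cleanup steps in the script above can also be chained as one pipeline that reads HTML on stdin, so no intermediate files are written to /tmp. This is a sketch with deliberate differences, not a drop-in replacement: `clean_html` is a hypothetical name, tags are replaced by a space instead of being deleted (so a time and the title in the next element are not glued together), blanks are squeezed, and the hardcoded line range (102-130) and the timestamp step are left out because they depend on the page layout. Like the original entity step, it would also drop accented letters written as entities (e.g. `&atilde;`).

```shell
#!/bin/sh
# Sketch only: a one-pipeline variant of the cleanup steps, reading HTML
# on stdin and writing text on stdout (no temp files in /tmp).
# clean_html is a hypothetical helper name.
clean_html() {
    # 1. Replace every tag with a space, so text from adjacent elements
    #    (for example a time and a title) is not glued together.
    # 2. Remove entities such as &nbsp; ([^;]* stops at the first ';').
    # 3. Squeeze runs of whitespace to one space, then trim both ends.
    # 4. Delete lines that end up empty.
    sed -e 's/<[^>]*>/ /g' \
        -e 's/&[^;]*;//g' \
        -e 's/[[:space:]][[:space:]]*/ /g' \
        -e 's/^ //' \
        -e 's/ $//' \
        -e '/^$/d'
}

# Usage on made-up markup (not the real page layout):
printf '%s\n' '<dt><span>06:26</span><a href="#">C.S.I. Nova Iorque</a></dt>' '' '<p>&nbsp;</p>' | clean_html
```

On the real page it would be used as `wget -q -O - 'http://www.axn.pt/programacion/' | clean_html`, assuming the STB's BusyBox wget accepts `-O -` for stdout.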

gus2000 · November 6, 2007, 2:36pm

I don't know which UNIX you're using, but my version of sed does not accept "\n" in the replacement as an embedded newline. I need to do this:

#echo abcX123 | sed -e 's/X/\
/'

I don't know why there's no space after the timestamp, either, but both issues should be fixed by this:

# Note the space after the "\1"

sed 's/^\([0-9][0-9]:[0-9][0-9]\)/\
\1 /g' /tmp/temp5 > /tmp/temp6

That being said, there are probably much cleaner ways to parse this data. Sadly I know little of XML parsing.
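One caveat, judging from the output in the first post: the whole schedule comes out as a single line that starts with whitespace, so a pattern anchored with `^` cannot match any of the timestamps (and with `^`, the `g` flag can match at most once per line anyway). A sketch that drops the anchor, splits before every HH:MM, and then trims the pieces; `split_entries` is a hypothetical name and the sample line is shortened from that output:

```shell
#!/bin/sh
# Sketch: put every HH:MM timestamp on its own line, wherever it occurs.
# The backslash-newline in the replacement is the portable way to insert
# a newline with sed. split_entries is a hypothetical helper name.
split_entries() {
    # 1st sed: insert a newline before each timestamp and a space after it.
    # 2nd sed: the text before the first timestamp (here only blanks) becomes
    #          its own line, so trim every line and delete the empty ones.
    sed 's/\([0-9][0-9]:[0-9][0-9]\)/\
\1 /g' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}

printf '        06:26C.S.I. Nova Iorque07:10Hospital Central08:33Sem Rasto\n' | split_entries
```

This prints one "HH:MM Title" entry per line. It assumes no program title itself contains a HH:MM pattern.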

aigles · November 6, 2007, 2:42pm

A possible solution with awk:

#!/bin/sh
wget -q -O /tmp/axn 'http://www.axn.pt/programacion/'
awk '
/<h1>.*<\/h1>/,/<\/div>/ {
   if (/<h1>/) {
      sub(/.*<h1>/,   "");
      sub(/<\/h1>.*/, "");
      print;
   } else if (/<dt>/) {
      gsub(/<\/a>/, "\n");
      gsub(/<[^>]*>/, " ");
      gsub(/[ \t]+/, " ");
      if (! /^ *$/) print;
   }
}
' /tmp/axn

Output:

Manhã
 06:18 C.S.I. Nova Iorque
 07:03 Hospital Central
 08:31 The Nine
 09:20 Stingers: Infiltrados
 10:15 C.S.I. Nova Iorque
 11:05 Hospital Central
 12:30 C.S.I. Miami
 13:25 Asas nos Pés
 14:15 Medium

Tarde
 15:10 Sem Rasto
 16:00 O Protector
 17:00 Investigação Criminal
 17:50 C.S.I. Nova Iorque
 18:40 Stingers: Infiltrados
 19:35 Medium

Noite
 20:33 Asas nos Pés
 21:30 Investigação Criminal
 22:26 C.S.I.
 23:20 C.S.I. Miami

Madrugada
 00:15 Investigação Criminal
 01:03 C.S.I.
 01:50 Corre Lola corre
 03:10 Stingers: Infiltrados
 03:58 Insert Coin
 04:24 Hospital Central
 05:40 Asas nos Pés

Jean-Pierre.
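The key trick in the awk script above is that `gsub()` can put newlines into the current record, so one `<dt>` line prints as several entries. A stripped-down sketch of just that step; `split_on_links` is a hypothetical name, the markup is made up (the real page layout may differ), and tags are deleted here instead of being turned into spaces as the original does:

```shell
#!/bin/sh
# Sketch of one step from the awk script: every closing </a> becomes
# a newline, so a single input line is printed as several output lines.
# split_on_links is a hypothetical helper name.
split_on_links() {
    awk '{
        gsub(/<\/a>/, "\n")     # end each entry at its closing link tag
        gsub(/<[^>]*>/, "")     # delete the remaining tags
        print                   # one line per entry, plus a final empty one
    }'
}

# Usage on made-up markup:
printf '%s\n' '<dt><a href="#">15:10 Sem Rasto</a><a href="#">16:00 O Protector</a></dt>' | split_on_links
```

The trailing empty line comes from the last `</a>`; it matches the blank lines between periods in the output above.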

single · November 7, 2007, 11:56am

aigles:

You're amazing, hehe. Thanks, really nice script; you make it look so simple...

I must test it, then I'll tell you the results...
About the UNIX question, the STB uses this "MIX":

+ CVS 25.08.2007
+ kernel v. 2.6.9 (1.10)
+ enigma v. 1.10 Mod (25.08.2007)
+ BusyBox v1.01 
+ Web Interface: 6.02
+ Gcc 3.4.4
+ FP Firmware 1.06
+ LZMA Patch

CU, and regards.