Advanced sed/awk help

I have thousands of HTML files that look like this:

    <!-- table horaire -->             <!-- table horaire -->
        <table border="0" cellspacing="0" cellpadding="0" class="tblHoraires" summary="Table des horaires de la ligne 12">
<th scope="row" class="horaireColFill_even">05h</th>
<td class="horaireColFill_even">22</td>
<td class="horaireColFill_even">38</td>
<td class="horaireColFill_even">52</td>
<td class="horaireColEmpty_even">&nbsp;</td><td class="horaireColEmpty_even">&nbsp;</td><td class="horaireColEmpty_even">&nbsp;</td><td class="horaireColEmpty_even">&nbsp;</td></tr>
<th scope="row" class="horaireColFill_odd">06h</th>

<td class="horaireColFill_odd">06</td>
<td class="horaireColFill_odd">19</td>
<td class="horaireColFill_odd">32</td>
<td class="horaireColFill_odd">44</td>
<td class="horaireColFill_odd">55</td>
<td class="horaireColEmpty_odd">&nbsp;</td><td class="horaireColEmpty_odd">&nbsp;</td></tr>
<th scope="row" class="horaireColFill_even">07h</th>
<td class="horaireColFill_even">06</td>
<td class="horaireColFill_even">16</td>

<td class="horaireColFill_even">26</td>
<td class="horaireColFill_even">36</td>
<td class="horaireColFill_even">47</td>
<td class="horaireColFill_even">58</td>
<td class="horaireColEmpty_even">&nbsp;</td></tr>

I would like to extract data from all of them so that the output looks like the following:


Here the numbers are the text from each <th> joined with the text of each corresponding <td>.
Would that be possible using sed/awk?

Where are "filename1", "filename2", etc. located in the input file?

Something along these lines:
nawk -f char.awk myFiles*

# field separator: either > or <
BEGIN { FS = "[<>]" }

# first line of the current file? print the FILENAME of the current file
# (preceded by ORS/end-of-line for every file except the first)
FNR==1 { printf("%s%s", (NR==1) ? "" : ORS, FILENAME) }

# second field contains the "th scope=" pattern? save the value of the third field in the variable "h"
$2 ~ "th scope=" { h = $3; next }

# second field contains the "td class=.*ColFill.*" pattern? print "#", followed by "h", followed by the value of the third field
$2 ~ "td class=.*ColFill.*" { printf("#%s%s", h, $3) }

# print ORS/end-of-line to terminate the last printf
END { printf("%s", ORS) }
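
To see what the script does, here is a small self-contained sketch. It assumes the field separator is set in a BEGIN block and the final end-of-line is printed in an END block; the scratch files a.html and b.html and their contents are made up for the demo and are not your real data.

```shell
# Hypothetical scratch files; names and contents are invented for this demo.
tmp=$(mktemp -d)
printf '%s\n' '<th scope="row" class="horaireColFill_even">05h</th>' \
              '<td class="horaireColFill_even">22</td>' \
              '<td class="horaireColFill_even">38</td>' > "$tmp/a.html"
printf '%s\n' '<th scope="row" class="horaireColFill_odd">06h</th>' \
              '<td class="horaireColFill_odd">06</td>' \
              '<td class="horaireColEmpty_odd">x</td></tr>' > "$tmp/b.html"

# How one line splits on "<" or ">": $1 is empty, $2 is the tag, $3 is the text.
fields=$(printf '%s\n' '<td class="horaireColFill_even">22</td>' |
         awk -F'[<>]' '{ printf "%s|%s", $2, $3 }')
echo "$fields"

# Same logic as the script, run from inside the scratch directory
# so that FILENAME stays short.
result=$(cd "$tmp" && awk '
BEGIN  { FS = "[<>]" }
FNR==1 { printf "%s%s", (NR==1 ? "" : ORS), FILENAME }
$2 ~ /th scope=/          { h = $3; next }
$2 ~ /td class=.*ColFill/ { printf "#%s%s", h, $3 }
END    { printf "%s", ORS }
' a.html b.html)
echo "$result"

rm -rf "$tmp"
```

Each input file ends up on one output line: the file name, then one "#hour+minute" entry per filled cell. The ColEmpty cell is skipped because its tag does not match ColFill.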

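How about Perl?

for my $file (@ARGV) {
    print $file;                                               # file name first, no newline yet
    open(FH, "<", $file) || die "FAIL - $!\n";
    while (<FH>) {
        if (/^<th.*>(.+?)<\/th>$/) { $th = $1; }               # remember the hour from the <th> row
        if (/^<td.*>(.+?)<\/td>$/) { printf "#%s", $th.$1; }   # hour followed by the minutes
    }
    print "\n";
}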


perl script.pl myfiles_*.html   (script.pl being whatever name you saved the script under)

Thank you for your support.

The awk script works perfectly fine; however, it does not display the filename.
Would you kindly comment the code or explain in detail what it does, so I can make a few more changes to it?

The Perl script does not return anything; it only prints a blank line. I believe it would probably be a simple adjustment, but I am not familiar with Perl at all, so I am not sure how to do it.

Commented the code.
How do you call the script? As suggested?
What OS are you on? If on Solaris, use nawk or /usr/xpg4/bin/awk (instead of old/plain/broken awk).

Thank you for the comments.
I am actually using Linux (Kubuntu). I have awk installed; I will try to get nawk and try with it. I am executing from bash.

that's fine - you don't need 'nawk' - you can use 'awk' on Linux.
How do you execute the code for all input files?
Please post the exact execution sequence/script!

I am typing:

awk -f awk.txt Folder/* > result.txt

Anyway, I tried on Cygwin and it seems to work fine.

I just added this line:

 /var point = new / { printf "#" $0  } 

to grab another piece of info. Would it be possible to print only what is between the parentheses on this line? It's not very important, as I can remove the extras later with grep, but it would be nice to have everything in one command.

post a sample data file - your previous sample doesn't contain 'var point = new'.

var point = new GLatLng(46.20004142,6.168357236);

I would like to get the numbers for the Lat/Long

/var point = new / { split($0,"[()]",a);printf "#" a[2]  }

I get this error when I run it:

$ awk -f ../awk.awk.txt ligne12_aller_ermt
 ligne12_aller_ermtawk: ../awk.awk.txt:4: (FILENAME=ligne12_aller_ermt FNR=55) fatal: split: second argument is not an array


/var point = new / { split($0,a,"[()]");printf "#" a[2]  }
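
The error came from the argument order: split() takes the string, then the array, then the separator. Here is a quick sketch run against the sample line. The second command, which splits again on the comma to separate latitude and longitude, is an extra step that was not asked for.

```shell
line='var point = new GLatLng(46.20004142,6.168357236);'

# split(string, array, separator): the array is the SECOND argument.
# "[()]" splits on either parenthesis, so a[2] is the text between them.
coords=$(printf '%s\n' "$line" |
         awk '/var point = new / { split($0, a, "[()]"); print a[2] }')
echo "$coords"

# Optional extra step: split a[2] on the comma to get the two values separately.
latlon=$(printf '%s\n' "$line" |
         awk '/var point = new / { split($0, a, "[()]"); split(a[2], c, ","); print c[1], c[2] }')
echo "$latlon"
```

The first command prints the two coordinates as one comma-separated string; the second prints them separated by a space.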