thread_id=666&page=6666#666666">Post title 1</a><br><div style="padding:2px 0px 3px 0px;">Text from the post itself</div>
thread_id=666&page=6666#666666">Post title 2</a><br><div style="padding:2px 0px 3px 0px;">Text from the post itself</div>
thread_id=666&page=6666#666666">Post title 3</a><br><div style="padding:2px 0px 3px 0px;">Text from the post itself</div>
What I want as the result:
I assume there is a fairly easy way to do this with awk?
Is there any way to apply this to a full HTML page, where there is a lot more than "what I have"? That is, first find thread_id= and then take everything from there up to the first </div>.
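The expected output isn't shown, so this is a minimal Python sketch (not awk) that assumes the goal is to pull out each post title and its text. The pattern is built around the exact markup in the three sample lines. `POST_RE` and `extract_posts` are hypothetical names. Because the whole page is searched as one string, "from thread_id= to the first </div>" works however many posts the page contains:

```python
import re

# From "thread_id=" to the first "</div>" after it:
# group 1 = link text (the post title), group 2 = the post text.
# re.DOTALL lets ".*?" cross line breaks, so the pattern also works
# when it is run over an entire page read into one string.
POST_RE = re.compile(
    r'thread_id=[^"]*">(.*?)</a><br><div[^>]*>(.*?)</div>',
    re.DOTALL,
)


def extract_posts(html: str) -> list[tuple[str, str]]:
    """Return a (title, text) pair for each post link found in html."""
    return POST_RE.findall(html)
```

Usage, assuming the page has already been read into a string `page`: `for title, text in extract_posts(page): print(title, text, sep="\t")`. Regular expressions are brittle on arbitrary HTML, so this only holds as long as the page keeps this exact structure.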
That doesn't seem to catch 'em all, though. It only finds 4 of the 25 titles and posts.
Looking at the result I get, I think it skips every post that contains a quote, which is written as [user:quoted text], or a link. For example, these two aren't caught:
Sandl�dan">
thread_id=8083&page=3034#1299591">Sandlådan - Prataomvadsomhelstnärsomhelst-tråden</a><br><div style="padding:2px 0px 3px 0px;">[The Ultra:De är menade att skjutas upp i luften egentligen.]
Den där typen av flares är ju inte till för att skjutas upp.</div>
thread_id=48046&page=1#1287691"><b>Arcaflex...?</b></a><br><div style="padding:2px 0px 3px 0px;">www.gamer.se
Youmeet, vad är upp?</div>
Or is it maybe the <b> </b> that breaks it in the second example?
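In both examples the post text continues onto a second line, so a pattern applied one line at a time never sees `thread_id=` and the closing `</div>` together. That is a likely cause, more so than the quote or the `<b>` tags. The Python sketch below (hypothetical names, same assumption that the markup is exactly as shown) compares matching each line separately with matching the whole text, and strips tags such as `<b>` from the titles:

```python
import re

POST_RE = re.compile(
    r'thread_id=[^"]*">(.*?)</a><br><div[^>]*>(.*?)</div>',
    re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]+>")  # any tag, e.g. <b> or </b>

# The two posts that were missed, with their line breaks kept.
SAMPLE = (
    'thread_id=8083&page=3034#1299591">Sandlådan - Prataomvadsomhelstnärsomhelst-tråden</a><br>'
    '<div style="padding:2px 0px 3px 0px;">[The Ultra:De är menade att skjutas upp i luften egentligen.]\n'
    "Den där typen av flares är ju inte till för att skjutas upp.</div>\n"
    'thread_id=48046&page=1#1287691"><b>Arcaflex...?</b></a><br>'
    '<div style="padding:2px 0px 3px 0px;">www.gamer.se\n'
    "Youmeet, vad är upp?</div>\n"
)


def match_per_line(html: str) -> list[tuple[str, str]]:
    """Apply the pattern to each line separately (a one-line-at-a-time approach)."""
    hits = []
    for line in html.splitlines():
        hits.extend(POST_RE.findall(line))
    return hits


def match_whole_text(html: str) -> list[tuple[str, str]]:
    """Apply the pattern to the whole text and strip tags from each title."""
    return [(TAG_RE.sub("", title), text) for title, text in POST_RE.findall(html)]


print(len(match_per_line(SAMPLE)))    # 0
print(len(match_whole_text(SAMPLE)))  # 2
```

In awk, the corresponding change would be to make records end at `</div>` instead of at a newline. A multi-character `RS` like that is supported by gawk, but POSIX awk leaves it unspecified.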
Dirty or not, I piped the first one to the second one, and it works like a charm! Thank you so much.
---------- Post updated at 10:38 PM ---------- Previous update was at 02:53 PM ----------
If someone has time to help me, I would need some additional help with the following:
In an HTML file, I have this text:
<div class="forum_thread_text"><span class="forum_text_quote">">
KidCactus';"><div class="forum_thread_text"><span class="forum_text_quote"><strong>Andreas Berg:</strong> Det var inte många år sedan jag tog hjälp av Google för att koka ett ägg <img src="http://gameplayer.se/gfx/smilies/blush.gif" alt="[blush]" border=0 width=15 height=15> (till mitt försvar äter jag i princip aldrig ägg och har knappt gjort det alls, så det har inte riktigt funnits anledning för mig att veta hur länge ett ägg ska koka <img src="http://gameplayer.se/gfx/smilies/crazy.gif" alt="[crazy]" border=0 width=15 height=15>)</span><br/>Är du från <a class="forum_text_url" href="http://www.svd.se/nyheter/inrikes/artikel_774535.svd" target="_blank">Storbritannien</a>?<br/><br/>Jag googlar rätt ofta för att rättstava ord, eller för att kolla om vissa ord ens existerar utanför min hjärna.</div>
Anywhere in the file where this is found:
KidCactus';"><div class="forum_thread_text">
I want to cut out the text between that and:
</div>
So the result would be:
<span class="forum_text_quote"><strong>Andreas Berg:</strong> Det var inte m�nga �r sedan jag tog hj�lp av Google f�r att koka ett �gg <img src="http://gameplayer.se/gfx/smilies/blush.gif" alt="[blush]" border=0 width=15 height=15> (till mitt f�rsvar �ter jag i princip aldrig �gg och har knappt gjort det alls, s� det har inte riktigt funnits anledning f�r mig att veta hur l�nge ett �gg ska koka <img src="http://gameplayer.se/gfx/smilies/crazy.gif" alt="[crazy]" border=0 width=15 height=15>)</span><br/>�r du fr�n <a class="forum_text_url" href="http://www.svd.se/nyheter/inrikes/artikel_774535.svd" target="_blank">Storbritannien</a>?<br/><br/>Jag googlar r�tt ofta f�r att r�ttstava ord, eller f�r att kolla om vissa ord ens existerar utanf�r min hj�rna.
It would be awesome if the <br/> tags could also be converted to newlines at the same time. I have tried this, but I guess something is wrong, since I don't get any output at all: