
html parser

Help for those learning Tcl or writing their own scripts.
Post by romprod »

I'm trying to create a basic script to rip info from a URL and spit it out to a channel, but for some reason it isn't working. Can anyone point out the obvious to me, please? It's driving me crazy! :)

Code:

# Config
set url "http://feed43.com/3222412860174114.xml"
set dcctrigger "test"
# End of config

if {![info exists egghttp(version)]} {
  putlog "egghttp.tcl was NOT successfully loaded."
  putlog "egghttp_example.tcl has not been loaded as a result."
} else {
  proc your_callbackproc {sock} {
    global url
    set headers [egghttp:headers $sock]
    set body [egghttp:data $sock]
  
    regsub -all "\n" $body "" body
    regsub -all -nocase {<br>} $body "<br>\n" body

    regexp {<b>(.*)<br/>} $body - team

    putlog "Team: $team"
  }

  bind dcc o|o $dcctrigger our:dcctrigger
  proc our:dcctrigger {hand idx text} {
    global url 
    set sock [egghttp:geturl $url your_callbackproc]
    return 1
  }  

  putlog "egghttp_example.tcl has been successfully loaded."
}

Post by romprod »

The above script didn't work because of the page it was getting data from. I've changed the source now and it is working, but I'm unable to make it loop through to the next line of text. I'll also include a sample of the HTML code I'm trying to parse.

Code:

set rssfeed "http://www.fred.co.uk"
set trigger "!latest"
set channel "#12321"

if {![info exists egghttp(version)]} {
  putlog "egghttp.tcl was NOT successfully loaded."
  putlog "egghttp_example.tcl has not been loaded as a result."
} else {
  proc your_callbackproc {sock} {

    global rssfeed channel

    set headers [egghttp:headers $sock]
    set body [egghttp:data $sock]

    regexp {"><h2>(.*?)</h2>} $body - date
    puthelp "PRIVMSG $channel : $date"

    set xml { $body }  
    foreach line [split $xml "\n"] {
      regexp {<td valign="top" class="tblRow colmNum000">(.*?)</td><td valign="top" class="tblRow">(.*?)</td></tr>} $body - time1 game1
      puthelp "PRIVMSG $channel : $time1 $game1"
    }
  }

  bind pub -|* $trigger top:trigger
  proc top:trigger {nick host hand chan text} {
    global rssfeed
    set sock [egghttp:geturl $rssfeed your_callbackproc]
    return 1
  } 
}
This is the HTML that I need to parse:

Code:

<div class="content"><h1>Barclays Premier League fixtures</h1></div><div class="tblContain"><h2>4 Dec 2010</h2><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz1</td><td valign="top" class="tblRow">xxx1</td></tr></table><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz2</td><td valign="top" class="tblRow">xxx2</td></tr></table><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz3</td><td valign="top" class="tblRow">xxx3</td></tr></table><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz4</td><td valign="top" class="tblRow">xxx4</td></tr></table><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz5</td><td valign="top" class="tblRow">xxx5</td></tr></table><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz6</td><td valign="top" class="tblRow">xxx6</td></tr></table><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz7</td><td valign="top" class="tblRow">xxx7</td></tr></table><br/><h2>5 Dec 2010</h2><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz8</td><td valign="top" class="tblRow">xxx8</td></tr></table><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz9</td><td valign="top" class="tblRow">xxx9</td></tr></table><br/><h2>6 Dec 2010</h2><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz10</td><td valign="top" class="tblRow">xxx10</td></tr></table><br/><h2>11 Dec 2010</h2><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz11</td><td valign="top" class="tblRow">xxx11</td></tr></table><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz12</td><td valign="top" class="tblRow">xxx12</td></tr></table></div><div class="content infoArea">
At the moment the only output is:

Code:

[02:43:56] <@nick> !latest
[02:44:01] <+bot> 4 Dec 2010
[02:44:03] <+bot> zzz1 xxx1
But I would like:

Code:

[02:43:56] <@nick> !latest
[02:44:01] <+bot> 4 Dec 2010
[02:44:03] <+bot> zzz1 xxx1
[02:44:03] <+bot> zzz2 xxx2
[02:44:03] <+bot> zzz3 xxx3
[02:44:03] <+bot> zzz4 xxx4
[02:44:03] <+bot> zzz5 xxx5
etc etc etc
Thanks in advance! :)
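
One possible reading of why only the first row is printed: `set xml { $body }` stores the literal text ` $body ` because braces suppress variable substitution, so the `foreach` body runs only once, and the `regexp` inside it is applied to the whole `$body`, which always returns the first match. A minimal sketch of one way around this, assuming the rows always follow the markup shown in the sample HTML, is to collect every match in one pass with `regexp -all -inline`. The proc name `extract_fixtures` and the shortened `sample` string are hypothetical and exist only for illustration; the sketch runs in plain tclsh without egghttp.

```tcl
# Hypothetical helper (not part of egghttp): returns a flat list
# {time1 game1 time2 game2 ...} for every table row found in $body.
proc extract_fixtures {body} {
    # Same row pattern as in the script above.
    set rowRe {<td valign="top" class="tblRow colmNum000">(.*?)</td><td valign="top" class="tblRow">(.*?)</td></tr>}
    set fixtures {}
    # -all -inline returns, for each match, the whole match followed
    # by the two captured groups, so the list is walked three at a time.
    foreach {whole time game} [regexp -all -inline $rowRe $body] {
        lappend fixtures $time $game
    }
    return $fixtures
}

# Shortened stand-in for the page body (assumption: same markup as the sample).
set sample {<h2>4 Dec 2010</h2><table class="tblResults"><tr>
<td valign="top" class="tblRow colmNum000">zzz1</td><td valign="top" class="tblRow">xxx1</td></tr></table><table class="tblResults"><tr>
<td valign="top" class="tblRow colmNum000">zzz2</td><td valign="top" class="tblRow">xxx2</td></tr></table>}

foreach {time game} [extract_fixtures $sample] {
    # In the egghttp callback this line would be:
    #   puthelp "PRIVMSG $channel :$time $game"
    puts "$time $game"
}
```

Note that this sketch returns the rows for every date on the page, not only those under the first `<h2>`; limiting the output to a single date would need an extra step that splits the body on the `<h2>` headings first.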