Scraping a little bit of text

paulOr
Voice
Posts: 10
Joined: Sat Nov 01, 2008 10:32 am

Post by paulOr »

Code:

package require http
setudef flag serverinfo
variable serverquery "http://www.imghostr.net/"
variable servertimeout 10

bind pub - "!imghostr" checkserver


proc checkserver {nick host hand chan rest} {
       # chanset catch, use .chanset #yourchan +serverinfo to enable
       if {[lsearch -exact [channel info $chan] +serverinfo] == -1} { return 0 }
       # browser agent
       set http [::http::config -useragent "Mozilla"]

       # get url with error control
       catch {set http [::http::geturl "$::serverquery" -timeout [expr 1000 * $::servertimeout]]} error

       # case 1, no socket
       if {[string match -nocase "*couldn't open socket*" $error]} {
              putserv "privmsg $chan : Cannot open socket. Try again later."
              ::http::cleanup $http
              return 0
       }

       # case 2, timeout
       if { [::http::status $http] == "timeout" } {
              putserv "privmsg $chan : Website has timed out. Try again later."
              ::http::cleanup $http
              return 0
       }

       # case 3, success, get html
       set html [::http::data $http]

       # scrape the page
       if {![regexp -- {<li><label>Currently Hosting:</label>.*?</li>} $html - s_login]} {set s_login Unknown}

       # reformat scraped information and message to irc.
       puthelp "privmsg $chan :images : $s_login"
       return 1
}

So I did some searching and found what I think should do the job. I've added in the HTML surrounding what I want to show.

http://imghostr.net <-- I want the current image count: Currently Hosting ### Images.

Can anyone see where I'm going wrong?
Papillon
Owner
Posts: 724
Joined: Fri Feb 15, 2002 8:00 pm
Location: *.no

Post by Papillon »

try:

Code:

if {![regexp -- {<li><label>Currently Hosting:</label>(.+)</li>} $html - s_login]} {set s_login Unknown}
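
To show why the capture group matters, here is a minimal sketch run against a made-up fragment of HTML (the sample markup and the variable names noGroup, greedy and lazy are assumptions for illustration, not the real page). Without parentheses regexp still matches, but the sub-match variable is set to an empty string. With (.+) the group is greedy and may run on to the last </li> in the page, so a non-greedy (.+?) is probably the safer choice.

```tcl
# Hypothetical sample of the page markup; the real page may differ.
set html {<ul><li><label>Currently Hosting:</label> 1234 Images</li><li><label>Uptime:</label> 3 days</li></ul>}

# No capture group: the pattern matches, but the sub-match variable
# has no parenthesised subexpression to fill, so it ends up empty.
regexp -- {<li><label>Currently Hosting:</label>.*?</li>} $html - noGroup

# Greedy group: .+ takes as much as it can, up to the last </li>.
regexp -- {<li><label>Currently Hosting:</label>(.+)</li>} $html - greedy

# Non-greedy group: stops at the first </li> after the label.
regexp -- {<li><label>Currently Hosting:</label>(.+?)</li>} $html - lazy

puts "no group:   '$noGroup'"
puts "greedy:     '[string trim $greedy]'"
puts "non-greedy: '[string trim $lazy]'"
```

On this sample the non-greedy version yields "1234 Images", while the greedy one also drags in the following list item.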
Elen sila lúmenn' omentielvo
arfer
Master
Posts: 436
Joined: Fri Nov 26, 2004 8:45 pm
Location: Manchester, UK

Post by arfer »

Code:

package require http

setudef flag images

set vTimeout 10
set vUrl http://imghostr.net/

bind PUB - !images pImages

proc pImages {nick uhost hand channel txt} {
    global vTimeout vUrl
    if {[channel get $channel images]} {
        # present a browser-like user agent; the return value of config is not needed
        ::http::config -useragent "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
        if {![catch {set http [::http::geturl $vUrl -timeout [expr {$vTimeout * 1000}]]}]} {
            switch -- [::http::status $http] {
                "timeout" {putserv "PRIVMSG $channel :attempt to scrape $vUrl timed out after $vTimeout seconds"}
                "error" {putserv "PRIVMSG $channel :attempt to scrape $vUrl returned error [::http::error $http]"}
                "ok" {
                    switch -- [::http::ncode $http] {
                        200 {
                            regexp -- {Currently Hosting:\</label\>(.+?)Images} [::http::data $http] -> images
                            if {([info exists images]) && ([regexp -- {[0-9]+} $images])} {
                                putserv "PRIVMSG $channel :$vUrl is currently hosting [string trim $images] images"
                            } else {putserv "PRIVMSG $channel :the number of images hosted by $vUrl could not be found"}
                        }
                        default {putserv "PRIVMSG $channel :attempt to scrape $vUrl returned ncode [::http::ncode $http]"}
                    }
                }
            }
            ::http::cleanup $http
        } else {putserv "PRIVMSG $channel :attempted connection to $vUrl failed"}
    }
    return 0
}
I must have had nothing to do