Trying to get some user stats of a forum with regexp. SOLVED

arcanedreams
Voice
Posts: 9
Joined: Thu Sep 06, 2007 2:48 am

Post by arcanedreams »

OK, so the user types "!joined <username>".

Then the bot should check the site and get the user's join date. In the source code for the page that the query is set to, the join date appears like this:

Code: Select all

<div style="padding:3px">
					Join Date: <strong>09-30-2005</strong>
				</div>

So, this is what I came up with. It doesn't work at all. Perhaps you guys can tell me where I went wrong?

This script is eventually going to be expanded to get the user's post count, posts per day, birthday, etc.

Code: Select all

bind pub - !joined joined
proc joined {nick host handle chan text} {


set query "http://forums.shooshtime.com/member.php?username=$text"
regexp {Join Date: <strong>(.*?)</strong>} $data - join

putserv "PRIVMSG $chan :$text joined on $join"
}

Thanks in advance!




It has come to my attention that the bot must log in before it can see the info it needs. So that is yet another dilemma I have to overcome.
Last edited by arcanedreams on Thu Sep 06, 2007 9:35 pm, edited 1 time in total.
rosc2112
Revered One
Posts: 1454
Joined: Sun Feb 19, 2006 8:36 pm
Location: Northeast Pennsylvania

Post by rosc2112 »

Where is the geturl call? There's nothing in the $data var because nothing is fetching the page, for example:

set data [::http::geturl $query]

As for how to authenticate using the facilities available in Tcl, I have not attempted it myself and don't know if it's even possible (aside from a plain query/post type login). Maybe someone else has some info about that.
arcanedreams
Voice
Posts: 9
Joined: Thu Sep 06, 2007 2:48 am

Post by arcanedreams »

Sorry, I'm new to the whole Tcl thing.

What I have been doing is picking apart other scripts and figuring out how they work, then writing my own stuff based on that.

So far it has worked..until now.

Could you explain what the bit of code you posted does?


From looking at it I would assume that it sets the variable $data with whatever is in the [] brackets.

I am guessing that [::http::geturl $query] is a command that actually accesses the web page stored in the $query variable.

So I could do away with my $query variable altogether and simply put the URL in directly.


I am however still a little fuzzy on the regexp function. Is mine set up correctly?

The way I am understanding it is that it searches for the string that is in the {} brackets located in the $data variable.

It then inputs the (.*?) area into the variable $join.

Is the (.*?) a standard wildcard to take everything in that area..or does the *, and the ? signify different things? I know generally the * is a wildcard for any character, but I am not familiar with the . or the ?

So correct code would be:

Code: Select all

bind pub - !joined joined

proc joined {nick host handle chan text} {

set data [::http::geturl http://forums.shooshtime.com/member.php?username=$text] 


regexp {Join Date: <strong>(.*?)</strong>} $data - join

putserv "PRIVMSG $chan :$text joined on $join"
}

If I'm wrong, let me know. I won't have a chance to test it out until I get out of class.

Thanks for your help though!
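
The question above about what `.`, `*` and `?` mean in `(.*?)` can be illustrated with a small stand-alone sketch. It runs in plain tclsh with no bot and no network. The first part of the sample string is the markup quoted at the top of the thread; the second `<strong>` pair is an assumption, added only to show why the `?` matters.

```tcl
# Sample HTML: the "Join Date" part is from the page source quoted earlier;
# the "Posts" part is a hypothetical addition for illustration.
set html {Join Date: <strong>09-30-2005</strong> Posts: <strong>42</strong>}

# .    matches any single character
# *    repeats the previous item zero or more times (greedy: as much as possible)
# *?   the same repeat, but non-greedy: as little as possible
# ( )  captures whatever matched inside and stores it in the next variable

# regexp returns 1 on a match and 0 otherwise. The "-" is just a throwaway
# variable name that receives the whole match.
set ok [regexp {Join Date: <strong>(.*?)</strong>} $html - join]
puts "matched=$ok join=$join"
# prints: matched=1 join=09-30-2005

# Greedy version: the capture runs on to the LAST </strong> in the string.
regexp {Join Date: <strong>(.*)</strong>} $html - greedy
puts "greedy=$greedy"
# prints: greedy=09-30-2005</strong> Posts: <strong>42
```

So the pattern in the script is set up correctly for this markup: the text between the first `<strong>` after "Join Date: " and the nearest `</strong>` ends up in `$join`.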
arcanedreams
Voice
Posts: 9
Joined: Thu Sep 06, 2007 2:48 am

Post by arcanedreams »

Oh, and if anyone wants to, they can try this version as a test for me. It shouldn't require any type of login at all.

Code: Select all

bind pub - !shooshstats shooshstats

proc shooshstats {nick host handle chan text} {

set data [::http::geturl http://forums.shooshtime.com/] 


regexp {<div>Threads: (.*?),  Posts: (.*?), Members: (.*?)</div>} $data - threads posts members

putserv "PRIVMSG $chan :There are currently $threads total threads, $posts total posts, and $members total members."
}
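
A literal pattern like the one above only matches if the page's spacing is exactly the same as in the pattern. The following is a minimal sketch of a more tolerant version that uses `\s+` between the fields. The two sample lines are hypothetical (the real page markup may differ); they are the same except for the number of spaces before "Posts:".

```tcl
# Hypothetical sample lines; the real forum page may be laid out differently.
set one {<div>Threads: 120, Posts: 4500, Members: 300</div>}
set two {<div>Threads: 120,  Posts: 4500, Members: 300</div>}

# \s+ matches one or more whitespace characters, so single and double
# spaces after the commas are both accepted.
set pattern {<div>Threads: (.*?),\s+Posts: (.*?),\s+Members: (.*?)</div>}

set results {}
foreach html [list $one $two] {
    # Each ( ) group fills the next variable: threads, posts, members.
    if {[regexp $pattern $html - threads posts members]} {
        lappend results "threads=$threads posts=$posts members=$members"
    } else {
        lappend results "no match"
    }
}
puts [join $results \n]
```

Checking the return value of regexp before using the variables also avoids a "no such variable" error when the page layout changes and nothing matches.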
r0t3n
Owner
Posts: 507
Joined: Tue May 31, 2005 6:56 pm
Location: UK

Post by r0t3n »

You should read the Tcl http docs. You need to get the body of the page using http::data, and then release the token using http::cleanup.


Code: Select all

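package require http

bind pub - !joined joined

proc joined {nick host handle chan text} {
    # URL-encode the username so it is safe to send in the request.
    set query [http::formatQuery username $text]
    # Note: passing -query makes geturl send the request as a POST.
    set token [::http::geturl http://forums.shooshtime.com/member.php -query $query]
    set data [http::data $token]
    # http::status needs the token of the request it should report on.
    if {[http::status $token] == "error"} {
        putserv "PRIVMSG $chan :Error grabbing join data of $text."
    } else {
        set found "0"
        # Check the page line by line and stop at the first match.
        foreach line [split $data \n] {
            if {$line != "" && [regexp -nocase {Join Date: <strong>(.*?)</strong>} $line full date]} {
                putserv "PRIVMSG $chan :$text joined on $date."
                set found "1"
                break
            }
        }
        if {!$found} {
            putserv "PRIVMSG $chan :Could not find join date for $text."
        }
    }
    # Release the state kept for this request.
    http::cleanup $token
}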

Not tested, give it a try.
r0t3n @ #r0t3n @ Quakenet
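
For reference, here is a small sketch of what `http::formatQuery` produces and how the same query string can be used for a GET instead of a POST. It only builds the URL and makes no network request. The username value is just an example; whether the forum expects GET or POST for member.php is not confirmed anywhere in this thread.

```tcl
package require http

# formatQuery takes key/value pairs and returns "key=value" with the
# value escaped for use in a URL.
set user "arcanedreams"
set query [::http::formatQuery username $user]

# GET: append the query string to the URL yourself.
set getUrl "http://forums.shooshtime.com/member.php?$query"
puts $getUrl
# prints: http://forums.shooshtime.com/member.php?username=arcanedreams

# POST: pass the same string with -query instead (not run here):
#   set token [::http::geturl http://forums.shooshtime.com/member.php -query $query]
```

The GET form matches the URL used in the first post of the thread; the `-query` form sends the same data in the request body.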
arcanedreams
Voice
Posts: 9
Joined: Thu Sep 06, 2007 2:48 am

Post by arcanedreams »

What is all that extra stuff for? I realize the error checking thing..but a lot of that is all new to me.

This is my finished tested and working code I came up with:

Code: Select all

package require http

bind pub - !shooshstats shooshstats

proc shooshstats {nick host handle chan text} {

set query "http://forums.shooshtime.com"
set http [::http::geturl $query]
set html [::http::data $http]

regexp {<div>Threads: (.*?), Posts: (.*?), Members: (.*?)</div>} $html - threads posts members


putserv "PRIVMSG $chan :There are currently $threads total threads, $posts total posts, and $members total members."
}

Is there a reason I should do it the way you suggested over mine? Is it more stable, or faster, or what?

What happens if I don't use the cleanup function?
rosc2112
Revered One
Posts: 1454
Joined: Sun Feb 19, 2006 8:36 pm
Location: Northeast Pennsylvania

Post by rosc2112 »

You have 2 threads with the same topic going, so pick one and merge them =p

If you don't clean up after each request, the state that the http package keeps for that token (including the downloaded page) is never released. Trigger the command enough times and that leaked memory will slow things down, or worse. The http man page strongly recommends calling ::http::cleanup once you are done with a request.

The other benefit of that type of code is that it handles HTTP errors, so you know what's going on and don't just assume the bot isn't working. It's good practice to test for error conditions and handle them gracefully.
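
A minimal sketch of that kind of error handling, split into two helpers so the parsing can be tried without a network connection. The proc names `extract_join_date` and `fetch_body`, the 10 second timeout, and the check for HTTP code 200 are assumptions made for illustration; they are not taken from any script in this thread.

```tcl
package require http

# Hypothetical helper: pull the join date out of a page body.
# Returns the date string, or "" when the pattern is not found.
proc extract_join_date {html} {
    if {[regexp -nocase {Join Date: <strong>(.*?)</strong>} $html - date]} {
        return $date
    }
    return ""
}

# Hypothetical helper: fetch a URL and return its body, or "" on failure.
proc fetch_body {url} {
    # geturl raises a Tcl error if it cannot connect, so wrap it in catch.
    # On success, token holds a name such as ::http::1, ::http::2, and so on;
    # a new name is used for every request.
    if {[catch {::http::geturl $url -timeout 10000} token]} {
        return ""
    }
    set body ""
    if {[::http::status $token] eq "ok" && [::http::ncode $token] == 200} {
        set body [::http::data $token]
    }
    # Always release the token's state, whether or not the request worked.
    ::http::cleanup $token
    return $body
}

# Parsing can be checked on its own, without fetching anything:
puts [extract_join_date {Join Date: <strong>09-30-2005</strong>}]
# prints: 09-30-2005
```

Inside the bot, the pub proc would call `fetch_body`, pass the result to `extract_join_date`, and send an error message to the channel when either one returns an empty string.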

And yes the man pages will be very useful if you're taking bits and pieces of code from here and there.. Understand what the code does, then you'll be writing your own code without looking at docs in a very short time (regexp, and the http & socket code are a good challenge to start with, everything after that is easy ;)
arcanedreams
Voice
Posts: 9
Joined: Thu Sep 06, 2007 2:48 am

Post by arcanedreams »

I put SOLVED in the title of this one..as this thread was made for problems with the actual regexp function.

My new thread is devoted to being able to login and redirect without errors.

;)

Oh, and I have added the ::http::cleanup call, but I am not sure it is working, because the value in the $http variable keeps increasing by 1 every time I make a call.

But that can be discussed in my other thread.