handling regular expression

rix
Halfop
Posts: 42
Joined: Wed Sep 21, 2005 1:04 pm
Location: Estonia

Post by rix »

Hey!

I'm new to this, so I'll try to make myself as clear as possible. I need to take the URL and the link text from this page and show them in the channel.

I tried something like this:

Code:

set url "http://www.starpump.ee/linkme.php"

if {![info exists egghttp(version)]} {
  putlog "egghttp.tcl was NOT successfully loaded."
  putlog "lasttopics.tcl has not been loaded as a result."
} else {
  proc your_callbackproc {sock} {
    global url
    set headers [egghttp:headers $sock]
    set body [egghttp:data $sock]

    foreach line [split $body \n] {
      regexp -all {<a href=".*?">(.*?)</a>} $line text
    }

    putserv "PRIVMSG $chan : $text
  }

  bind pub - !lasttopics our:dcctrigger
  proc our:dcctrigger {hand idx text} {
    global url 
    set sock [egghttp:geturl $url your_callbackproc]
    set chan #star
    return 1
  }  

  putlog "egghttp_example.tcl has been successfully loaded."
}
It gives this error: Tcl error [our:dcctrigger]: called "our:dcctrigger" with too many arguments. I thought I could use up to 5 arguments in the brackets.

Anyway, there are plenty of tutorials saying what to do, and that's good. My one major problem is that I can't find any instructions on how to turn it into an actually working script.
avilon
Halfop
Posts: 64
Joined: Tue Jul 13, 2004 6:58 am
Location: Germany

Post by avilon »

De Kus
Revered One
Posts: 1361
Joined: Sun Dec 15, 2002 11:41 am
Location: Germany

Post by De Kus »

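Read TCLCommands.doc, section "BIND", especially the entries for DCC and PUB. The argument count is not an "up to" limit; your proc must accept exactly the arguments that the bind type passes (well, there are actually workarounds, but they should only be used if a dynamic argument list is really required).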
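To see the argument-count rule outside eggdrop, here is a minimal plain-tclsh sketch (it needs Tcl 8.5+ for `{*}`). The five-element list mimics what a PUB bind passes, in the same order as the `action {nick host hand chan text}` proc used later in this thread; the host and other values are made up. tclsh reports the mismatch as "wrong # args", which eggdrop reported above as "called ... with too many arguments".

```tcl
# A proc written with the three DCC-style parameters...
proc dcc_style {hand idx text} {
    return "dcc: $hand $text"
}

# ...and one written with the five parameters a PUB bind supplies.
proc pub_style {nick uhost hand chan text} {
    return "pub: $nick said '$text' in $chan"
}

# Hypothetical PUB arguments: nick, user@host, handle, channel, text.
set pubArgs [list rix rix@example.host rix #star !lasttopics]

# Calling the three-parameter proc with five arguments fails.
set failed [catch {dcc_style {*}$pubArgs} err]
puts "dcc_style failed: $failed ($err)"

# The five-parameter proc accepts them.
puts [pub_style {*}$pubArgs]
```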
Tip: you will probably NOT want to send the output of regexp -all to the channel unfiltered.
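A small sketch of why: with -all plus match variables, regexp returns only the number of matches, and the variables hold the captures of the last match alone. Adding -inline instead returns every match as a flat list that can be walked with foreach and filtered or formatted before output. The sample markup below is hypothetical.

```tcl
# Hypothetical sample markup standing in for the fetched page.
set body {<a href="http://example.com/1">First</a> <a href="http://example.com/2">Second</a>}

# -all with match variables: returns the match count, but link/title
# only hold the captures of the LAST match.
set count [regexp -all {<a href="([^"]+)">([^<]+)</a>} $body -> link title]
puts "$count matches, variables hold: $title - $link"

# -all -inline: returns a flat list of
# full match, capture 1, capture 2, full match, capture 1, capture 2, ...
set results {}
foreach {whole link title} [regexp -all -inline {<a href="([^"]+)">([^<]+)</a>} $body] {
    lappend results "$title - $link"
}
puts [join $results \n]
```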
De Kus
StarZ|De_Kus, De_Kus or DeKus on IRC
Copyright © 2005-2009 by De Kus - published under The MIT License
Love hurts, love strengthens...

Post by rix »

Okay, I did a little more research and got it working. Just one more thing: it shows only one (the first) result. How do I loop this thingie?

Code:

set url "http://www.starpump.ee/linkme.php"
set trigger "!latestposts"
set channel "#star"

if {![info exists egghttp(version)]} {
  putlog "egghttp.tcl was NOT successfully loaded."
  putlog "latestposts.tcl has not been loaded as a result."
} else {
  proc setup {sock} {

    global url channel

    set headers [egghttp:headers $sock]
    set body [egghttp:data $sock]

   foreach line [split $body \n] {
    regexp {<a href=".*?"target="_blank">(.*?)</a>} $body - title
    regexp {<a href="(.*?)"target="_blank">} $body - link
   }
    puthelp "PRIVMSG $channel :$title - $link "
  }

  bind pub -|* $trigger action
  proc action {nick host hand chan text} {
    global rssfeed
    set sock [egghttp:geturl $url setup]
    return 1
  } 

  putlog "latestposts.tcl has been successfully loaded."
}
Thanks!

Post by De Kus »

Move the 'puthelp "PRIVMSG $channel :$title - $link "' line inside the foreach loop.

By the way, you can merge the two regexps into one:
regexp {<a href="([^"]+)" target="_blank">([^<]+)</a>} $body {} link title

Note: I assumed the missing space in the expression was a mistake. I also usually prefer something like [^<] to . because a greedy regexp tries to find the widest match, and you don't want the match to run past the closing </ tag or the closing " quote, so matching every character except those is usually more predictable :). I hope it doesn't cause too big a slowdown; to be honest, I haven't compared them yet :D.
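The difference between . and a negated class shows up as soon as one line holds two links. In this sketch (the line is hypothetical), the greedy .* version lets the first capture run across the first link's closing tag, while [^"] and [^<] stop at the end of the attribute and of the link text.

```tcl
# Hypothetical line containing two links.
set line {<a href="/one" target="_blank">One</a> | <a href="/two" target="_blank">Two</a>}

# Greedy .* runs as far as it can, so the first capture swallows
# everything up to the last '" target="_blank">' on the line.
regexp {<a href="(.*)" target="_blank">(.*)</a>} $line -> greedyLink greedyTitle

# Negated classes cannot cross a quote or the start of a tag,
# so each capture ends where the attribute or link text ends.
regexp {<a href="([^"]+)" target="_blank">([^<]+)</a>} $line -> link title

puts "greedy:  $greedyTitle - $greedyLink"
puts "negated: $title - $link"
```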

Post by rix »

Speed isn't important at the moment. By the way, I merged the two regexps into one and put puthelp right after the regexp inside the loop, but it keeps repeating the same result. :s

[20:01:10] <rix> !latestposts
[20:01:13] <StarBot> Q: DVD subtiitrid - http://www.starpump.ee/viewthread.php?tid=29612
[20:01:15] <StarBot> Q: DVD subtiitrid - http://www.starpump.ee/viewthread.php?tid=29612
[20:01:17] <StarBot> Q: DVD subtiitrid - http://www.starpump.ee/viewthread.php?tid=29612
...and so on

How do I avoid that?
Ehlanna
Voice
Posts: 15
Joined: Thu Jul 21, 2005 12:08 pm

Post by Ehlanna »

At a rough guess, and speaking as a self-taught beginner Tcl coder, I would say you want to change $body to $line in the regexp command, as $line is the variable that changes with each iteration of the loop.
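A quick plain-tclsh sketch of the effect described above, using a hypothetical three-line page: matching against $body inside the loop finds the first link in the whole page on every iteration, which produces exactly the repeated output shown in the log, while matching against $line picks up each line's link in turn.

```tcl
# Hypothetical page body with one link per line.
set body "<a href=\"/t1\">Topic 1</a>\n<a href=\"/t2\">Topic 2</a>\n<a href=\"/t3\">Topic 3</a>"

set fromBody {}
set fromLine {}
foreach line [split $body \n] {
    # Against $body: every iteration finds the FIRST link in the whole page.
    regexp {<a href="([^"]+)">([^<]+)</a>} $body -> link title
    lappend fromBody $title

    # Against $line: each iteration only sees the current line.
    regexp {<a href="([^"]+)">([^<]+)</a>} $line -> link title
    lappend fromLine $title
}
puts "against \$body: $fromBody"
puts "against \$line: $fromLine"
```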

Post by De Kus »

Ehlanna wrote: I would say that you want to swap $body to $line in the regexp command as $line is the variable that changes with each iteration of the loop.
lol, beats me why I didn't notice that myself; I was probably pasting things together too faithfully :D.

Post by Ehlanna »

De Kus wrote: probably too faithfully pasting
Wood/trees syndrome, far too easy to suffer from!

Post by rix »

Big thanks, guys!

I hope I'm not being too demanding by asking one more thing. :roll:

I have another script based on the same structure, but it doesn't seem to recognize the "?" in the URL. That matters, because the channel= parameter selects the right feed. It's weird, because other scripts handle it fine. :S

Code:

set feed "http://www.w3.ee/export/tv.php?channel=2"
set trigger "!kanal2"
set channel "#star"

if {![info exists egghttp(version)]} {
  putlog "egghttp.tcl was NOT successfully loaded."
  putlog "tv2.tcl has not been loaded as a result."
} else {
  proc callback{sock} {

    global feed channel

    set headers [egghttp:headers $sock]
    set body [egghttp:data $sock]

   foreach line [split $body \n] {
    regexp {(.*?)<b>} $body - time
    regexp {<b>(.*?)</b>} $body - title
    regexp {</b>.*?<br>(.*?)<br>} $body - desc
   }
   puthelp "PRIVMSG $channel :(Algus: $time) \002$title\002"
   puthelp "PRIVMSG $channel :$desc"
  }
  bind pub -|* $trigger top:trigger
  proc top:trigger {nick host hand chan text} {
    global feed
    set sock [egghttp:geturl $rssfeed callback]
    return 1
  }


  putlog "tv2.tcl has been successfully loaded."
}
It fetches w3.ee/export/tv.php, though it should fetch w3.ee/export/tv.php?channel=2.

Post by Ehlanna »

My best guess is that it's being treated as a special character, so you would need to escape it: \?
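For reference, a plain-tclsh check suggests that a "?" inside a double-quoted Tcl string, such as the set feed line, is an ordinary character and survives untouched; it only has a special meaning inside a regexp pattern, where it makes the preceding item optional. So the lost query string may come from elsewhere in the script: as posted, top:trigger declares `global feed` but passes `$rssfeed` to egghttp:geturl, and `proc callback{sock}` has no space before `{sock}`. This sketch only covers the "?" part.

```tcl
set feed "http://www.w3.ee/export/tv.php?channel=2"

# In a plain string the ? and channel=2 are kept as-is.
puts $feed

# Inside a regexp pattern, ? is special: here it makes the second "p" optional
# and . matches any character, so the pattern does not match the URL.
puts [regexp {tv.php?channel} $feed]    ;# 0

# Escaped, . and ? match literally.
puts [regexp {tv\.php\?channel} $feed]  ;# 1
```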

Post by rix »

I'm still messing around with these files. :(

The first script still gives only one result. I only changed the regexp command. Somehow the merged two-in-one regexp doesn't work anymore; it gives no result at all. If I put the puthelp command in the loop, it doesn't show any result either.

Code:

set url "http://www.starpump.ee/linkme.php"
set trigger "!latestposts"
set channel "#star"

if {![info exists egghttp(version)]} {
  putlog "egghttp.tcl was NOT successfully loaded."
  putlog "latestposts.tcl has not been loaded as a result."
} else {
  proc setup {sock} {

    global url channel

    set headers [egghttp:headers $sock]
    set body [egghttp:data $sock]

   foreach line [split $body \n]  {
    regexp {<a href=".*?"target="_blank">(.*?)</a>} $line - title
    regexp {<a href="(.*?)"target="_blank">} $line - link
   }
    puthelp "PRIVMSG $channel :$title - $link "
  }

  bind pub -|* $trigger action
  proc action {nick host hand chan text} {
    global url
    set sock [egghttp:geturl $url setup]
    return 1
  }

  putlog "latestposts.tcl has been successfully loaded."
} 
If the puthelp command is inside the loop as well, it doesn't reply at all.
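One possible explanation, offered as an assumption rather than a confirmed diagnosis: the original patterns have no space before target=, while the merged regexp requires one, so if the page really has no space the merged version never matches. And when the output is inside the loop, the first line without a link leaves $title unset (or holding an earlier value). This sketch, using hypothetical markup, only outputs when the regexp actually matched and uses \s* so the space is optional. It collects lines with lappend; in the bot that line would be the puthelp call.

```tcl
# Hypothetical markup: first link has no space before target=, second has one.
set body "<p>Latest:</p>\n<a href=\"/t/1\"target=\"_blank\">First</a>\n<a href=\"/t/2\" target=\"_blank\">Second</a>"

set sent {}
foreach line [split $body \n] {
    # \s* allows zero or more whitespace characters between the attributes.
    if {[regexp {<a href="([^"]+)"\s*target="_blank">([^<]+)</a>} $line -> link title]} {
        # In the bot: puthelp "PRIVMSG $channel :$title - $link"
        lappend sent "$title - $link"
    }
    # Lines without a link are skipped, so nothing stale or unset is output.
}
puts [join $sent \n]
```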

Post by De Kus »

All the characters with a special meaning listed under REGULAR EXPRESSION SYNTAX on http://www.tcl.tk/man/tcl8.4/TclCmd/re_syntax.htm must be escaped if you want them matched literally. Reading the manual is usually enlightening :).
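A minimal sketch of both ways to match a string such as a URL literally. The helper escape_re is hypothetical (not part of eggdrop or Tcl); it prefixes each regexp metacharacter with a backslash via regsub. The second form uses the ***= prefix, which the re_syntax page documents as making the rest of the pattern a literal string.

```tcl
# Hypothetical helper: prefix every regexp metacharacter with a backslash.
proc escape_re {s} {
    regsub -all {[][{}()*+?.\\^$|]} $s {\\&} escaped
    return $escaped
}

set url "http://www.w3.ee/export/tv.php?channel=2"
set pattern [escape_re $url]
puts $pattern
puts [regexp -- $pattern "see $url for details"]

# Alternative: ***= makes the rest of the pattern a plain literal string.
puts [regexp -- "***=$url" "see $url for details"]
```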