Parse url from web content

Help for those learning Tcl or writing their own scripts.
Elfriede
Halfop
Posts: 67
Joined: Tue Aug 07, 2007 4:21 am

Post by Elfriede »

Hopefully someone can tell me what's wrong with this. I'm trying to parse a URL out of a web page, but all I get is "Data:" many times ^^ I've searched a lot on this forum, but I still don't get how to parse it :/ I just want to output the first matching URL.

Code:

bind pub - !geturl geturl:proc
proc geturl:proc {nick host handle channel text} {
	set url [lindex $text 0]
	set token [::http::geturl $url]
	set content [::http::data $token]
	::http::cleanup $content
	foreach line [split $content \n] {
		if {[regexp -nocase {http(.*?)} $content match url]} {
			sendmsg #test "Data: [join $url]"
		}
	}
}
nml375
Revered One
Posts: 2860
Joined: Fri Aug 04, 2006 2:09 pm

Post by nml375 »

Try the greedy quantifier (.*) instead of the non-greedy one (.*?). The non-greedy form is satisfied by zero characters, which is why the capture comes back empty.
Also, the output is most likely not a list, so don't use join. Similarly, $text is a string, not a list, so use split before attempting to use lindex.
Next, use $line, not $content, in your regular expression; otherwise the foreach loop would be pretty pointless...

Code:

package require http

bind pub - !geturl geturl:proc
proc geturl:proc {nick host handle channel text} {
   # $text is a string, so split it before taking the first word
   set url [lindex [split $text] 0]
   set token [::http::geturl $url]
   set content [::http::data $token]
   # cleanup takes the token, not the page body
   ::http::cleanup $token
   foreach line [split $content \n] {
      if {[regexp -nocase {http(.*)} $line match url]} {
         sendmsg #test "Data: $url"
      }
   }
}
NML_375
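
The two points above (greedy versus non-greedy matching, and split before lindex) can be tried outside the bot in a plain tclsh. This is only a small sketch: the sample line, the URL, and the variable names are made up for illustration and are not taken from the original script.

```tcl
# Sample input; the URL is made up for illustration.
set line {<a href="http://example.com/page">link</a>}

# Non-greedy: (.*?) is satisfied by zero characters, so the capture is empty.
regexp -nocase {http(.*?)} $line lazyMatch lazyRest

# Greedy: (.*) runs to the end of the line, so the capture holds
# everything after "http" (the scheme itself is not part of the capture).
regexp -nocase {http(.*)} $line greedyMatch greedyRest

puts "non-greedy: match=<$lazyMatch> capture=<$lazyRest>"
puts "greedy:     match=<$greedyMatch> capture=<$greedyRest>"

# $text arrives as a plain string. Treating it as a list can fail when a
# word starts with a brace; split turns it into a well-formed list first.
set text "http://example.com/page \{oops"
set rawFailed [catch {lindex $text 0}]
set firstWord [lindex [split $text] 0]
puts "lindex on raw string failed: $rawFailed"
puts "first word via split: $firstWord"
```

The greedy capture starts after "http" because only the parenthesised part of the pattern is stored in the second variable; the first variable always holds the whole match.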

Post by Elfriede »

Many thanks for your answer, but the output currently looks like:

Data: ://imdb.de/title/... Û

The "http" is cut off, and there's a space after the URL where the output should end - can you please add that? :)

PS: How do I stop on the first match? ^^

Post by nml375 »

If you want the full line matching the url, please use $match instead of $url in your sendmsg command.

To stop further processing within the foreach-loop, use the break command just after the sendmsg command.
NML_375
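
A rough sketch of how $match and break fit together, runnable in a plain tclsh. The helper name first_url and the sample page are hypothetical, and the tighter pattern is an assumption that goes beyond the advice above: it ends the URL at whitespace, quotes, or angle brackets so that nothing trailing is included, and it is not a full URL grammar.

```tcl
# first_url is a hypothetical helper, not part of Eggdrop or the http package.
# It returns the first URL found in $content, or an empty string if none.
proc first_url {content} {
    set found ""
    foreach line [split $content \n] {
        # Assumed pattern: the URL ends at whitespace, a quote or an angle bracket.
        # With no capture group, the whole URL is available in match.
        if {[regexp -nocase {https?://[^[:space:]"'<>]+} $line match]} {
            set found $match
            # Stop after the first hit so later lines are not scanned.
            break
        }
    }
    return $found
}

# Made-up page content for illustration.
set page "<p>intro</p>\n<a href=\"http://example.com/title\">x</a> more\nhttp://second.example/"
set result [first_url $page]
puts "Data: $result"
```

In the bot script the same loop body would sit inside the pub proc, with the message command placed just before break.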

Post by Elfriede »

Many thanks!!! Now it's working just like I wanted :)