fewyn wrote:
Yeah I saw Birdy but it's a bit more than I want actually. Just want to keep it simple.
The problem is that /#!/ in the URL. That form goes through Twitter's API, which serves JSON/XML payloads, and those are handled under OAuth. You won't be able to read them as easily as you're thinking.
Instead, what you can do is use their non-API method, which still exists.
<speechles> !webby twitter.com/wilw/statuses/108277782815051776 --regexp class="entry-content">(.*?)</span>.*?<span class="published timestamp".*?>(.*?)</a></span>--
<sp33chy> regexp: capture1 ( Our caterers put out franks and beans to go with lunch, labeled as "Tubesteak Chowder." My inner 12 year-old is still laughing. #Eureka )
<sp33chy> regexp: capture2 ( about 2 hours ago via Twitter for iPad )
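For anyone who wants to poke at that pattern outside of Webby, here is a minimal Tcl sketch of the same two-capture regexp. The sample_html string is made up to resemble the status-page markup (an assumption, not a real page dump), and strip_hashbang is a hypothetical helper name, so treat this as an illustration rather than the way Webby itself does it.

```tcl
# Hypothetical helper: drop the "/#!" marker so the plain (non-API) page URL is requested.
proc strip_hashbang {url} {
    regsub -all -- {/#!} $url "" url
    return $url
}

# Made-up fragment shaped like the old status-page markup (an assumption, not a real page).
set sample_html {<span class="entry-content">Hello from a test tweet</span> <a href="/x"><span class="published timestamp" data="x">about 2 hours ago</span> via web</a></span>}

# Same two-capture pattern as the Webby query: tweet text first, timestamp block second.
set pattern {class="entry-content">(.*?)</span>.*?<span class="published timestamp".*?>(.*?)</a></span>}

if {[regexp -nocase -- $pattern $sample_html -> tweet ago]} {
    # Scrub leftover tags from the timestamp capture.
    regsub -all -- {<.*?>} $ago "" ago
    puts "tweet: $tweet"
    puts "when:  $ago"
}
puts [strip_hashbang "http://twitter.com/#!/wilw/statuses/108277782815051776"]
```

Every quantifier in the pattern is non-greedy, so each capture stops at the first closing tag that lets the rest of the pattern match.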
Webby proves it can be done, so here's what I'm giving you.
Code:
# Script to grab titles from webpages - Copyright C.Leonhardt (rosc2112 at yahoo com) Aug.11.2007
# http://members.dandy.net/~fbn/urltitle.tcl.txt
# Loosely based on the tinyurl script by Jer and other bits and pieces of my own..
# Hacked by speechles, to add special things for twitter! :P
################################################################################################################
# Usage:
# 1) Set the configs below
# 2) .chanset #channelname +urltitle ;# enable script
# 3) .chanset #channelname +logurltitle ;# enable logging
# Then just input a url in channel and the script will retrieve the title from the corresponding page.
# When reporting bugs, PLEASE include the .set errorInfo debug info!
# Read here: http://forum.egghelp.org/viewtopic.php?t=10215
################################################################################################################
# Configs:
set urltitle(ignore) "bdkqr|dkqr" ;# User flags script will ignore input from
set urltitle(pubmflags) "-|-" ;# user flags required for channel eggdrop use
set urltitle(length) 5 ;# minimum url length to trigger channel eggdrop use
set urltitle(delay) 1 ;# minimum seconds to wait before another eggdrop use
set urltitle(timeout) 60000 ;# geturl timeout (1/1000ths of a second)
################################################################################################################
# Script begins:
package require http ;# You need the http package..
set urltitle(last) 111 ;# Internal variable, stores time of last eggdrop use, don't change..
setudef flag urltitle ;# Channel flag to enable script.
setudef flag logurltitle ;# Channel flag to enable logging of script.
set urltitlever "0.01a"
bind pubm $urltitle(pubmflags) {*://*} pubm:urltitle
proc pubm:urltitle {nick host user chan text} {
    global urltitle
    # Start with an empty result so the logging block below never reads an unset
    # variable when the lookup is skipped (flag off, delay not elapsed, ignored user).
    set urtitle ""
    if {([channel get $chan urltitle]) && (([unixtime] - $urltitle(delay)) > $urltitle(last)) && \
        (![matchattr $user $urltitle(ignore)])} {
        foreach word [split $text] {
            if {[string length $word] >= $urltitle(length) && \
                [regexp {^(f|ht)tp(s|)://} $word] && \
                ![regexp {://([^/:]*:([^/]*@|\d+(/|$))|.*/\.)} $word]} {
                set urltitle(last) [unixtime]
                set urtitle [urltitle $word]
                # One element means a plain title (or error), two means a custom reply.
                if {[llength $urtitle] < 2} {
                    puthelp "PRIVMSG $chan :$nick: URL Title for $word - \002[join [url_map $urtitle]]\002"
                } else {
                    puthelp "PRIVMSG $chan :[url_map [url_map [lindex $urtitle 0]]]"
                }
                break
            }
        }
    }
    if {[channel get $chan logurltitle]} {
        foreach word [split $text] {
            if {[string match "*://*" $word]} {
                putlog "<$nick:$chan> $word -> $urtitle"
            }
        }
    }
    # change to return 0 if you want the pubm trigger logged additionally..
    return 1
}
proc urltitle {url} {
    if {[info exists url] && [string length $url]} {
        # Every result is returned as a list, because the caller inspects it with llength/lindex.
        if {[catch {set http [::http::geturl $url -timeout $::urltitle(timeout)]} error]} {
            if {[string match -nocase "*couldn't open socket*" $error]} {
                return [list "Error: couldn't connect..Try again later"]
            }
            # Any other failure (unsupported url type, bad host, ...) leaves no token to use.
            return [list "Error: $error"]
        }
        if {[::http::status $http] == "timeout"} {
            ::http::cleanup $http
            return [list "Error: connection timed out while trying to contact $url"]
        }
        set data [::http::data $http]
        regsub -all {(?:\n|\t|\v|\r|\x01)} $data " " data
        set ncode [::http::ncode $http]
        ::http::cleanup $http
        set title ""
        if {[string match *twitter.com* $url]} {
            # Follow redirects by hand, stripping /#! so the non-api page is fetched.
            # The cap of 5 hops is an arbitrary guard against redirect loops.
            set hops 0
            while {[string match 30* $ncode] && [incr hops] <= 5} {
                regexp -nocase -- {<a href="(.*?)">} $data - url
                regsub -all -- {/#!} $url "" url
                if {[catch {set http [::http::geturl $url -timeout $::urltitle(timeout)]} error]} {
                    if {[string match -nocase "*couldn't open socket*" $error]} {
                        return [list "Error: couldn't connect..Try again later"]
                    }
                    return [list "Error: $error"]
                }
                if {[::http::status $http] == "timeout"} {
                    ::http::cleanup $http
                    return [list "Error: connection timed out while trying to contact $url"]
                }
                set data [::http::data $http]
                set ncode [::http::ncode $http]
                ::http::cleanup $http
            }
            regsub -all {(?:\n|\t|\v|\r|\x01)} $data " " data
            if {[regexp -nocase {class="entry-content">(.*?)</span>.*?<span class="published timestamp" data=".*?">(.*?)</a></span>} $data match ltweet ago]} {
                # scrub html elements out
                regsub -all {<.*?>} $ago "" ago
                regsub -all -nocase -- {<a href="(/search\?q\=.*?)".*?>(.*?)</a>} [string trim $ltweet] "\\2 ( http://twitter.com\\1 \)" ltweet
                regsub -all -nocase -- {<a href="(.*?)".*?>(.*?)</a>} $ltweet "( \\1 \)" ltweet
                regsub -all -nocase -- {(?!<@)<a class="tweet.*?href="(.*?)".*?>(.*?)</a>(?!>)} $ltweet "\\2 \( _\?\=\\1 \)" ltweet
                regsub -all -nocase -- {<\+@<a class="tweet.*?href=".*?".*?>(.*?)</a>>} $ltweet "<+@\\1>" ltweet
                regsub -all -nocase -- {<\@<a class="tweet.*?href=".*?".*?>(.*?)</a>>} $ltweet "<@\\1>" ltweet
                regsub -all -nocase -- {_\?\=(.*?) } $ltweet "http://twitter.com\\1 " ltweet
                # defaults, in case the screen-name markup is missing from the page
                set real ""; set screen ""
                regexp -nocase {class="tweet-url screen-name" hreflang="[a-z]{2}" title="(.*?)">(.*?)</a>} $data match real screen
                return [list "\002$screen\002 ($real): $ltweet ( $ago )" "twitter"]
            }
        }
        # Anything else, including a twitter page the pattern above did not match,
        # falls back to the plain <title>.
        if {[regexp -nocase {<title>(.*?)</title>} $data match title]} {
            return [list [string map { {href=} "" \" "" } $title]]
        }
        return [list "No title found."]
    }
}
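proc url_map {text {char "utf-8"} } {
    # code below is necessary to prevent numerous html markups
    # from appearing in the output (ie, &quot;, &#x1627;, etc)
    # stolen (borrowed is a better term) from tcllib's htmlparse ;)
    # works unpatched utf-8 or not, unlike htmlparse::mapEscapes
    # which will only work properly patched....
    set escapes {
        &nbsp; \xa0 &iexcl; \xa1 &cent; \xa2 &pound; \xa3 &curren; \xa4
        &yen; \xa5 &brvbar; \xa6 &sect; \xa7 &uml; \xa8 &copy; \xa9
        &ordf; \xaa &laquo; \xab &not; \xac &shy; \xad &reg; \xae
        &macr; \xaf &deg; \xb0 &plusmn; \xb1 &sup2; \xb2 &sup3; \xb3
        &acute; \xb4 &micro; \xb5 &para; \xb6 &middot; \xb7 &cedil; \xb8
        &sup1; \xb9 &ordm; \xba &raquo; \xbb &frac14; \xbc &frac12; \xbd
        &frac34; \xbe &iquest; \xbf &Agrave; \xc0 &Aacute; \xc1 &Acirc; \xc2
        &Atilde; \xc3 &Auml; \xc4 &Aring; \xc5 &AElig; \xc6 &Ccedil; \xc7
        &Egrave; \xc8 &Eacute; \xc9 &Ecirc; \xca &Euml; \xcb &Igrave; \xcc
        &Iacute; \xcd &Icirc; \xce &Iuml; \xcf &ETH; \xd0 &Ntilde; \xd1
        &Ograve; \xd2 &Oacute; \xd3 &Ocirc; \xd4 &Otilde; \xd5 &Ouml; \xd6
        &times; \xd7 &Oslash; \xd8 &Ugrave; \xd9 &Uacute; \xda &Ucirc; \xdb
        &Uuml; \xdc &Yacute; \xdd &THORN; \xde &szlig; \xdf &agrave; \xe0
        &aacute; \xe1 &acirc; \xe2 &atilde; \xe3 &auml; \xe4 &aring; \xe5
        &aelig; \xe6 &ccedil; \xe7 &egrave; \xe8 &eacute; \xe9 &ecirc; \xea
        &euml; \xeb &igrave; \xec &iacute; \xed &icirc; \xee &iuml; \xef
        &eth; \xf0 &ntilde; \xf1 &ograve; \xf2 &oacute; \xf3 &ocirc; \xf4
        &otilde; \xf5 &ouml; \xf6 &divide; \xf7 &oslash; \xf8 &ugrave; \xf9
        &uacute; \xfa &ucirc; \xfb &uuml; \xfc &yacute; \xfd &thorn; \xfe
        &yuml; \xff &fnof; \u192 &Alpha; \u391 &Beta; \u392 &Gamma; \u393 &Delta; \u394
        &Epsilon; \u395 &Zeta; \u396 &Eta; \u397 &Theta; \u398 &Iota; \u399
        &Kappa; \u39A &Lambda; \u39B &Mu; \u39C &Nu; \u39D &Xi; \u39E
        &Omicron; \u39F &Pi; \u3A0 &Rho; \u3A1 &Sigma; \u3A3 &Tau; \u3A4
        &Upsilon; \u3A5 &Phi; \u3A6 &Chi; \u3A7 &Psi; \u3A8 &Omega; \u3A9
        &alpha; \u3B1 &beta; \u3B2 &gamma; \u3B3 &delta; \u3B4 &epsilon; \u3B5
        &zeta; \u3B6 &eta; \u3B7 &theta; \u3B8 &iota; \u3B9 &kappa; \u3BA
        &lambda; \u3BB &mu; \u3BC &nu; \u3BD &xi; \u3BE &omicron; \u3BF
        &pi; \u3C0 &rho; \u3C1 &sigmaf; \u3C2 &sigma; \u3C3 &tau; \u3C4
        &upsilon; \u3C5 &phi; \u3C6 &chi; \u3C7 &psi; \u3C8 &omega; \u3C9
        &thetasym; \u3D1 &upsih; \u3D2 &piv; \u3D6 &bull; \u2022
        &hellip; \u2026 &prime; \u2032 &Prime; \u2033 &oline; \u203E
        &frasl; \u2044 &weierp; \u2118 &image; \u2111 &real; \u211C
        &trade; \u2122 &alefsym; \u2135 &larr; \u2190 &uarr; \u2191
        &rarr; \u2192 &darr; \u2193 &harr; \u2194 &crarr; \u21B5
        &lArr; \u21D0 &uArr; \u21D1 &rArr; \u21D2 &dArr; \u21D3 &hArr; \u21D4
        &forall; \u2200 &part; \u2202 &exist; \u2203 &empty; \u2205
        &nabla; \u2207 &isin; \u2208 &notin; \u2209 &ni; \u220B &prod; \u220F
        &sum; \u2211 &minus; \u2212 &lowast; \u2217 &radic; \u221A
        &prop; \u221D &infin; \u221E &ang; \u2220 &and; \u2227 &or; \u2228
        &cap; \u2229 &cup; \u222A &int; \u222B &there4; \u2234 &sim; \u223C
        &cong; \u2245 &asymp; \u2248 &ne; \u2260 &equiv; \u2261 &le; \u2264
        &ge; \u2265 &sub; \u2282 &sup; \u2283 &nsub; \u2284 &sube; \u2286
        &supe; \u2287 &oplus; \u2295 &otimes; \u2297 &perp; \u22A5
        &sdot; \u22C5 &lceil; \u2308 &rceil; \u2309 &lfloor; \u230A
        &rfloor; \u230B &lang; \u2329 &rang; \u232A &loz; \u25CA
        &spades; \u2660 &clubs; \u2663 &hearts; \u2665 &diams; \u2666
        &quot; \x22 &amp; \x26 &lt; \x3C &gt; \x3E &OElig; \u152 &oelig; \u153
        &Scaron; \u160 &scaron; \u161 &Yuml; \u178 &circ; \u2C6
        &tilde; \u2DC &ensp; \u2002 &emsp; \u2003 &thinsp; \u2009
        &zwnj; \u200C &zwj; \u200D &lrm; \u200E &rlm; \u200F &ndash; \u2013
        &mdash; \u2014 &lsquo; \u2018 &rsquo; \u2019 &sbquo; \u201A
        &ldquo; \u201C &rdquo; \u201D &bdquo; \u201E &dagger; \u2020
        &Dagger; \u2021 &permil; \u2030 &lsaquo; \u2039 &rsaquo; \u203A
        &euro; \u20AC &apos; \u0027
    };
    if {![string equal $char [encoding system]]} { set text [encoding convertfrom $char $text] }
    # Escape everything subst treats specially before the numeric entities are turned into
    # [format ...] calls. Backslashes are included so a "\[" in the page cannot turn back
    # into a live command substitution.
    set text [string map [list "\\" "\\\\" "\]" "\\\]" "\[" "\\\[" "\$" "\\\$" "\"" "\\\""] [string map $escapes $text]]
    regsub -all -- {&#([[:digit:]]{1,5});} $text {[format %c [string trimleft "\1" "0"]]} text
    regsub -all -- {&#x([[:xdigit:]]{1,4});} $text {[format %c [scan "\1" %x]]} text
    catch { set text "[subst "$text"]" }
    if {![string equal $char [encoding system]]} { set text [encoding convertto $char $text] }
    return "$text"
}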
putlog "Url Title Grabber $urltitlever (rosc) script loaded.. (super action rocket missles by speechles :P)"
Basically, yeah, it's rosc2112's good old url-title grabber, hacked together with the url decoder thingy from my other scripts. So this script will still show the title of any URL pasted, but when it sees a twitter one it replies differently. I hacked in a small modification that is extensible. This is the direction Webby, the script I wrote, is headed in: it will have custom parsing templates and all sorts of crazy stuff. That may take time, months, years.. who knows. But I do know how to accomplish it; all I need is time and motivation... haw
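A rough sketch of how such per-site parsing templates could be organised, assuming nothing more than a list of URL glob patterns paired with a one-capture regexp. This is not Webby's actual design; site_templates, parse_with_templates and the two sample templates are hypothetical names made up for the illustration.

```tcl
# Hypothetical template table: URL glob pattern -> regexp with one capture group.
# Entries are tried in order; "*" is the generic <title> fallback.
set site_templates {
    *twitter.com*  {class="entry-content">(.*?)</span>}
    *              {<title>(.*?)</title>}
}

# Return the first capture of the first template whose pattern matches the URL
# and whose regexp matches the page, or an empty string when nothing matches.
proc parse_with_templates {url html} {
    global site_templates
    foreach {pattern re} $site_templates {
        if {[string match -nocase $pattern $url] && [regexp -nocase -- $re $html -> hit]} {
            return [string trim $hit]
        }
    }
    return ""
}

puts [parse_with_templates "http://example.com/" "<html><title> Example page </title></html>"]
```

Adding support for another site would then mean adding one more line to the table ahead of the "*" entry, instead of another elseif branch in the proc.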