Twitter tweet displayer

fewyn · Post by **fewyn** » Mon Aug 29, 2011 4:51 pm

I've seen bots that have done this in the past but for the life of me I can't find a script anywhere that does something like this.

Basically, when someone pastes a Twitter URL into chat, it goes and fetches the tweet and spits it back out into IRC.

Example:
<fewyn> https://twitter.com/#!/wilw/statuses/108277782815051776
<Bot> Our caterers put out franks and beans to go with lunch, labeled as "Tubesteak Chowder." My inner 12 year-old is still laughing. #Eureka (wilw, 9m ago)

Any help?

speechles · Post by **speechles** » Mon Aug 29, 2011 5:19 pm

Have you seen Birdy?

While it doesn't yet allow referencing an exact tweet via ID, which is what you are doing, it allows for much more "interaction" than just spewing what someone on Twitter tweets. With Birdy you can also reply to them via !tweet, !retweet them, etc.

Maybe this is more than what you wanted. Basically, any URL-title fetching script can do what you are asking with minor changes to its parsing template, since all you are doing is feeding it URLs. Pretty dead simple. Instead, you could give the users of your IRC channel something more to play with: their own Twitter account for the channel to use. This is what Birdy allows you to do.

Anyway, I didn't know if you'd already seen Birdy or not. It will be updated soon to fix the shortcomings and errors it presently has. It is a constant work in progress and users dictate its evolution, so if you have a feature or something to include within Birdy, shout it out.
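To make the "feeding it URLs" part concrete, here is a minimal sketch of how a script might recognise a tweet URL and pull out the user and status ID. The proc name `tweet_parts` and the exact pattern are assumptions for illustration; they are not taken from Birdy or any existing script.

```tcl
# Hypothetical helper (not part of Birdy or any existing script):
# returns {user id} for a tweet URL, or an empty list for anything else.
proc tweet_parts {url} {
	# optional "#!/" hashbang, then the user, then status or statuses, then the numeric id
	set re {^https?://(?:www\.)?twitter\.com/(?:#!/)?([A-Za-z0-9_]+)/status(?:es)?/(\d+)}
	if {[regexp -nocase -- $re $url -> user id]} {
		return [list $user $id]
	}
	return [list]
}

puts [tweet_parts "https://twitter.com/#!/wilw/statuses/108277782815051776"]
# -> wilw 108277782815051776
```

Anything that doesn't look like a tweet URL comes back as an empty list, so a caller could fall through to ordinary title handling.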

fewyn · Post by **fewyn** » Mon Aug 29, 2011 5:28 pm

Yeah I saw Birdy but it's a bit more than I want actually. Just want to keep it simple.

speechles · Post by **speechles** » Mon Aug 29, 2011 7:40 pm

fewyn wrote:
Yeah I saw Birdy but it's a bit more than I want actually. Just want to keep it simple.

The problem is that /#!/ in the URL. That form goes through Twitter's API, which returns JSON or XML payloads, and reading those won't be as easy as you are thinking because they are handled under OAuth.

Instead, what you can do is use their non-API method, which still exists.
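One way to get from the pasted URL to the non-API form is simply to drop the hashbang. Everything after a `#` is a fragment and is never sent to the server, so fetching the hashbang URL as-is would only ever request the front page. A rough sketch, where `dehashbang` is a made-up name:

```tcl
# Hypothetical helper: turn a hashbang ("/#!/") twitter URL into the plain form.
proc dehashbang {url} {
	# replace the first "/#!/" with a single "/"; other URLs pass through untouched
	regsub -- {/#!/} $url "/" url
	return $url
}

puts [dehashbang "https://twitter.com/#!/wilw/statuses/108277782815051776"]
# -> https://twitter.com/wilw/statuses/108277782815051776
```

Note that Tcl's http package only speaks plain http out of the box; https needs the tls package registered with `::http::register`.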

<speechles> !webby twitter.com/wilw/statuses/108277782815051776 --regexp class="entry-content">(.*?)</span>.*?<span class="published timestamp".*?>(.*?)</a></span>--
<sp33chy> regexp: capture1 ( Our caterers put out franks and beans to go with lunch, labeled as "Tubesteak Chowder." My inner 12 year-old is still laughing. #Eureka )
<sp33chy> regexp: capture2 ( about 2 hours ago via Twitter for iPad )
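The same two-capture idea can be tried in plain Tcl without Webby. This is only a sketch: the HTML fragment below is made up, modelled on nothing more than the class names the regexp looks for, and the real page markup may differ.

```tcl
# Made-up fragment; only the class names come from the regexp used above.
set html {<span class="entry-content">Hello from a tweet</span> <span class="meta"><a href="/someone/status/1"><span class="published timestamp" data="x">about 2 hours ago</span></a> <span>via <a href="http://example.com/">web</a></span></span>}

set re {class="entry-content">(.*?)</span>.*?<span class="published timestamp".*?>(.*?)</a></span>}

if {[regexp -nocase -- $re $html -> tweet ago]} {
	# the second capture still carries tags, so scrub them out
	regsub -all {<.*?>} $ago "" ago
	puts "$tweet ( $ago )"
} else {
	puts "no match"
}
```

The first capture is the tweet text; the second is the timestamp block once the leftover tags are removed.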

Webby proves it can be done, so here's what I'm giving you.

<speechles> http://twitter.com/wilw/statuses/108277782815051776
<sp33chy> wilw (Wil Wheaton): Our caterers put out franks and beans to go with lunch, labeled as "Tubesteak Chowder." My inner 12 year-old is still laughing. #Eureka ( http://twitter.com/search?q=%23Eureka ) ( about 3 hours ago via Twitter for iPad )

Code:

# Script to grab titles from webpages - Copyright C.Leonhardt (rosc2112 at yahoo com) Aug.11.2007 
# http://members.dandy.net/~fbn/urltitle.tcl.txt
# Loosely based on the tinyurl script by Jer and other bits and pieces of my own..
# Hacked by speechles, to add special things for twitter! :P

################################################################################################################

# Usage: 

# 1) Set the configs below
# 2) .chanset #channelname +urltitle        ;# enable script
# 3) .chanset #channelname +logurltitle     ;# enable logging
# Then just input a url in channel and the script will retrieve the title from the corresponding page.

# When reporting bugs, PLEASE include the .set errorInfo debug info! 
# Read here: http://forum.egghelp.org/viewtopic.php?t=10215

################################################################################################################

# Configs:

set urltitle(ignore) "bdkqr|dkqr" 	;# User flags script will ignore input from
set urltitle(pubmflags) "-|-" 		;# user flags required for channel eggdrop use
set urltitle(length) 5	 		;# minimum url length to trigger channel eggdrop use
set urltitle(delay) 1 			;# minimum seconds to wait before another eggdrop use
set urltitle(timeout) 60000 		;# geturl timeout (1/1000ths of a second)

################################################################################################################
# Script begins:

package require http			;# You need the http package..
set urltitle(last) 111 			;# Internal variable, stores time of last eggdrop use, don't change..
setudef flag urltitle			;# Channel flag to enable script.
setudef flag logurltitle		;# Channel flag to enable logging of script.

set urltitlever "0.01a"
bind pubm $urltitle(pubmflags) {*://*} pubm:urltitle
proc pubm:urltitle {nick host user chan text} {
	global urltitle
	set urtitle ""				;# so the logging block below never reads an unset variable
	if {([channel get $chan urltitle]) && ([expr [unixtime] - $urltitle(delay)] > $urltitle(last)) && \
	(![matchattr $user $urltitle(ignore)])} {
		foreach word [split $text] {
			if {[string length $word] >= $urltitle(length) && \
			[regexp {^(f|ht)tp(s|)://} $word] && \
			![regexp {://([^/:]*:([^/]*@|\d+(/|$))|.*/\.)} $word]} {
				set urltitle(last) [unixtime]
				set urtitle [urltitle $word]
				if {[llength $urtitle] < 2} {
					puthelp "PRIVMSG $chan :$nick: URL Title for $word - \002[join [url_map $urtitle]]\002"
				} else {
					puthelp "PRIVMSG $chan :[url_map [url_map [lindex $urtitle 0]]]"
				}
				break
			}
		}
		}
	if {[channel get $chan logurltitle]} {
		foreach word [split $text] {
			if {[string match "*://*" $word]} {
				putlog "<$nick:$chan> $word -> $urtitle"
			}
		}
	}
	# change to return 0 if you want the pubm trigger logged additionally..
	return 1
}

proc urltitle {url} {
	if {[info exists url] && [string length $url]} {
		# geturl throws on any connect failure (refused, bad host, unsupported scheme),
		# so test the catch result instead of matching one error string
		if {[catch {set http [::http::geturl $url -timeout $::urltitle(timeout)]} error]} {
			return [list "Error: couldn't connect..Try again later"]
		}
		if { [::http::status $http] == "timeout" } {
			::http::cleanup $http
			return [list "Error: connection timed out while trying to contact $url"]
		}
		set data [::http::data $http]
		regsub -all {(?:\n|\t|\v|\r|\x01)} $data " " data
		set ncode [http::ncode $http]
		::http::cleanup $http
		set title ""
		if {[string match *twitter.com* $url]} {
			set hops 0
			# follow redirects, but give up after 5 hops so a redirect loop can't hang the bot
			while {[string match 30* $ncode] && [incr hops] <= 5} {
				# stop if the redirect page carries no link to follow
				if {![regexp -nocase -- {<a href="(.*?)">} $data - url]} { break }
				# strip the hashbang so we request the static page (regsub, not regexp)
				regsub -all -- {/#!} $url "" url
				if {[catch {set http [::http::geturl $url -timeout $::urltitle(timeout)]} error]} {
					return [list "Error: couldn't connect..Try again later"]
				}
				if { [::http::status $http] == "timeout" } {
					::http::cleanup $http
					return [list "Error: connection timed out while trying to contact $url"]
				}
				set data [::http::data $http]
				set ncode [::http::ncode $http]
				::http::cleanup $http
			}
			regsub -all {(?:\n|\t|\v|\r|\x01)} $data " " data
			if {[regexp -nocase {class="entry-content">(.*?)</span>.*?<span class="published timestamp" data=".*?">(.*?)</a></span>} $data match ltweet ago]} {
				# scrub html elements out
				regsub -all {<.*?>} $ago "" ago
				regsub -all -nocase -- {<a href="(/search\?q\=.*?)".*?>(.*?)</a>} [string trim $ltweet] "\\2 ( http://twitter.com\\1 \)" ltweet
				regsub -all -nocase -- {<a href="(.*?)".*?>(.*?)</a>} $ltweet "( \\1 \)" ltweet
				regsub -all -nocase -- {(?!<@)<a class="tweet.*?href="(.*?)".*?>(.*?)</a>(?!>)} $ltweet "\\2 \( _\?\=\\1 \)" ltweet
				regsub -all -nocase -- {<\+@<a class="tweet.*?href=".*?".*?>(.*?)</a>>} $ltweet "<+@\\1>" ltweet
				regsub -all -nocase -- {<\@<a class="tweet.*?href=".*?".*?>(.*?)</a>>} $ltweet "<@\\1>" ltweet
				regsub -all -nocase -- {_\?\=(.*?) } $ltweet "http://twitter.com\\1 " ltweet
				regexp -nocase {class="tweet-url screen-name" hreflang="[a-z]{2}" title="(.*?)">(.*?)</a>} $data match real screen
				return [list "\002$screen\002 ($real): $ltweet ( $ago )" "twitter"]
			}
		}
		# not a tweet page (or the twitter markup didn't match): fall back to the page title
		if {[regexp -nocase {<title>(.*?)</title>} $data match title]} {
			return [list [string map {{href=} "" \" ""} $title]]
		} else {
			return [list "No title found."]
		}
	}
}

proc url_map {text {char "utf-8"} } {
	# code below is necessary to prevent numerous html markups
	# from appearing in the output (ie, &quot;, &#5671;, etc)
	# stolen (borrowed is a better term) from tcllib's htmlparse ;)
	# works unpatched utf-8 or not, unlike htmlparse::mapEscapes
	# which will only work properly patched....
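	set escapes {
		&nbsp; \xa0 &iexcl; \xa1 &cent; \xa2 &pound; \xa3 &curren; \xa4
		&yen; \xa5 &brvbar; \xa6 &sect; \xa7 &uml; \xa8 &copy; \xa9
		&ordf; \xaa &laquo; \xab &not; \xac &shy; \xad &reg; \xae
		&macr; \xaf &deg; \xb0 &plusmn; \xb1 &sup2; \xb2 &sup3; \xb3
		&acute; \xb4 &micro; \xb5 &para; \xb6 &middot; \xb7 &cedil; \xb8
		&sup1; \xb9 &ordm; \xba &raquo; \xbb &frac14; \xbc &frac12; \xbd
		&frac34; \xbe &iquest; \xbf &Agrave; \xc0 &Aacute; \xc1 &Acirc; \xc2
		&Atilde; \xc3 &Auml; \xc4 &Aring; \xc5 &AElig; \xc6 &Ccedil; \xc7
		&Egrave; \xc8 &Eacute; \xc9 &Ecirc; \xca &Euml; \xcb &Igrave; \xcc
		&Iacute; \xcd &Icirc; \xce &Iuml; \xcf &ETH; \xd0 &Ntilde; \xd1
		&Ograve; \xd2 &Oacute; \xd3 &Ocirc; \xd4 &Otilde; \xd5 &Ouml; \xd6
		&times; \xd7 &Oslash; \xd8 &Ugrave; \xd9 &Uacute; \xda &Ucirc; \xdb
		&Uuml; \xdc &Yacute; \xdd &THORN; \xde &szlig; \xdf &agrave; \xe0
		&aacute; \xe1 &acirc; \xe2 &atilde; \xe3 &auml; \xe4 &aring; \xe5
		&aelig; \xe6 &ccedil; \xe7 &egrave; \xe8 &eacute; \xe9 &ecirc; \xea
		&euml; \xeb &igrave; \xec &iacute; \xed &icirc; \xee &iuml; \xef
		&eth; \xf0 &ntilde; \xf1 &ograve; \xf2 &oacute; \xf3 &ocirc; \xf4
		&otilde; \xf5 &ouml; \xf6 &divide; \xf7 &oslash; \xf8 &ugrave; \xf9
		&uacute; \xfa &ucirc; \xfb &uuml; \xfc &yacute; \xfd &thorn; \xfe
		&yuml; \xff &fnof; \u192 &Alpha; \u391 &Beta; \u392 &Gamma; \u393 &Delta; \u394
		&Epsilon; \u395 &Zeta; \u396 &Eta; \u397 &Theta; \u398 &Iota; \u399
		&Kappa; \u39A &Lambda; \u39B &Mu; \u39C &Nu; \u39D &Xi; \u39E
		&Omicron; \u39F &Pi; \u3A0 &Rho; \u3A1 &Sigma; \u3A3 &Tau; \u3A4
		&Upsilon; \u3A5 &Phi; \u3A6 &Chi; \u3A7 &Psi; \u3A8 &Omega; \u3A9
		&alpha; \u3B1 &beta; \u3B2 &gamma; \u3B3 &delta; \u3B4 &epsilon; \u3B5
		&zeta; \u3B6 &eta; \u3B7 &theta; \u3B8 &iota; \u3B9 &kappa; \u3BA
		&lambda; \u3BB &mu; \u3BC &nu; \u3BD &xi; \u3BE &omicron; \u3BF
		&pi; \u3C0 &rho; \u3C1 &sigmaf; \u3C2 &sigma; \u3C3 &tau; \u3C4
		&upsilon; \u3C5 &phi; \u3C6 &chi; \u3C7 &psi; \u3C8 &omega; \u3C9
		&thetasym; \u3D1 &upsih; \u3D2 &piv; \u3D6 &bull; \u2022
		&hellip; \u2026 &prime; \u2032 &Prime; \u2033 &oline; \u203E
		&frasl; \u2044 &weierp; \u2118 &image; \u2111 &real; \u211C
		&trade; \u2122 &alefsym; \u2135 &larr; \u2190 &uarr; \u2191
		&rarr; \u2192 &darr; \u2193 &harr; \u2194 &crarr; \u21B5
		&lArr; \u21D0 &uArr; \u21D1 &rArr; \u21D2 &dArr; \u21D3 &hArr; \u21D4
		&forall; \u2200 &part; \u2202 &exist; \u2203 &empty; \u2205
		&nabla; \u2207 &isin; \u2208 &notin; \u2209 &ni; \u220B &prod; \u220F
		&sum; \u2211 &minus; \u2212 &lowast; \u2217 &radic; \u221A
		&prop; \u221D &infin; \u221E &ang; \u2220 &and; \u2227 &or; \u2228
		&cap; \u2229 &cup; \u222A &int; \u222B &there4; \u2234 &sim; \u223C
		&cong; \u2245 &asymp; \u2248 &ne; \u2260 &equiv; \u2261 &le; \u2264
		&ge; \u2265 &sub; \u2282 &sup; \u2283 &nsub; \u2284 &sube; \u2286
		&supe; \u2287 &oplus; \u2295 &otimes; \u2297 &perp; \u22A5
		&sdot; \u22C5 &lceil; \u2308 &rceil; \u2309 &lfloor; \u230A
		&rfloor; \u230B &lang; \u2329 &rang; \u232A &loz; \u25CA
		&spades; \u2660 &clubs; \u2663 &hearts; \u2665 &diams; \u2666
		&quot; \x22 &amp; \x26 &lt; \x3C &gt; \x3E &OElig; \u152 &oelig; \u153
		&Scaron; \u160 &scaron; \u161 &Yuml; \u178 &circ; \u2C6
		&tilde; \u2DC &ensp; \u2002 &emsp; \u2003 &thinsp; \u2009
		&zwnj; \u200C &zwj; \u200D &lrm; \u200E &rlm; \u200F &ndash; \u2013
		&mdash; \u2014 &lsquo; \u2018 &rsquo; \u2019 &sbquo; \u201A
		&ldquo; \u201C &rdquo; \u201D &bdquo; \u201E &dagger; \u2020
		&Dagger; \u2021 &permil; \u2030 &lsaquo; \u2039 &rsaquo; \u203A
		&euro; \u20AC &apos; \u0027
	};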
	if {![string equal $char [encoding system]]} { set text [encoding convertfrom $char $text] }
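	# escape everything subst acts on; backslash is included so fetched text like \[cmd\] can't smuggle a command through
	set text [string map [list "\\" "\\\\" "\]" "\\\]" "\[" "\\\[" "\$" "\\\$" "\"" "\\\""] [string map $escapes $text]]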
	regsub -all -- {&#([[:digit:]]{1,5});} $text {[format %c [string trimleft "\1" "0"]]} text
	regsub -all -- {&#x([[:xdigit:]]{1,4});} $text {[format %c [scan "\1" %x]]} text
	catch { set text "[subst "$text"]" }
	if {![string equal $char [encoding system]]} { set text [encoding convertto $char $text] }
	return "$text"
}


putlog "Url Title Grabber $urltitlever (rosc) script loaded.. (super action rocket missles by speechles :P)"
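The numeric-entity step in url_map is easier to follow in isolation. Below is a stand-alone sketch of the same regsub + subst idea; `decode_numeric` is a made-up name, and it uses `scan` for the decimal case (an assumption on my part, not what the script above does) so leading zeros don't need special handling. The key point is that every character subst would act on, backslash included, gets escaped before the `[format %c ...]` calls are spliced in.

```tcl
# Hypothetical stand-alone helper; mirrors the regsub + subst idea used in url_map.
proc decode_numeric {text} {
	# neutralise everything subst would otherwise act on, backslash included
	set text [string map [list "\\" "\\\\" "\[" "\\\[" "\]" "\\\]" "\$" "\\\$"] $text]
	# rewrite decimal and hex character references into [format %c ...] calls
	regsub -all -- {&#([0-9]{1,5});} $text {[format %c [scan "\1" %d]]} text
	regsub -all -- {&#x([0-9A-Fa-f]{1,4});} $text {[format %c [scan "\1" %x]]} text
	# only the calls inserted above are left for subst to evaluate
	return [subst $text]
}

puts [decode_numeric {A&#66;&#x43; stays safe: [not run] $nope}]
# -> ABC stays safe: [not run] $nope
```

Because page content comes from whoever controls the URL, the escaping step matters more than the decoding itself.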

Basically, yeah, it's rosc2112's good old url-title grabber, hacked together with the URL decoder thingy from my other scripts. So this script will also show the title of any URLs pasted, but when it sees a Twitter one it replies differently. I hacked in a small modification that is extensible. This is the direction Webby, the script I wrote, is headed in: it will have custom parsing templates and all sorts of crazy stuff. But this may take time, months, years.. who knows. I do know how to accomplish it; all I need is time and motivation... haw
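For anyone wondering what "parsing templates" might look like, here is one possible shape, purely as a sketch. The `templates` table and `apply_template` proc are hypothetical and are not how Webby actually does it: each entry pairs a URL glob with a regexp and an output format, and the first entry that matches wins.

```tcl
# Hypothetical template table: each entry is {url-glob regexp format-string}.
set templates {
	{*twitter.com/*/status*  {class="entry-content">(.*?)</span>}  {tweet: %s}}
	{*                       {<title>(.*?)</title>}                {title: %s}}
}

proc apply_template {url html} {
	foreach entry $::templates {
		# unpack the three fields (works on Tcl versions without lassign)
		foreach {glob re fmt} $entry break
		if {[string match -nocase $glob $url] && [regexp -nocase -- $re $html -> hit]} {
			return [format $fmt [string trim $hit]]
		}
	}
	return "no template matched"
}

puts [apply_template "http://twitter.com/wilw/statuses/1" {<span class="entry-content">hi there</span>}]
# -> tweet: hi there
```

Adding support for another site would then mean adding a row rather than another branch in the proc.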