Selection of news on a site

neoclust
Halfop
Posts: 55
Joined: Fri Aug 14, 2009 11:03 am

Post by neoclust »

Hello, I want to publish news from this site, http://www.casafree.com/modules/news/, picking an item at random every 30 minutes. If you can help me, thanks.
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

This is why I wrote webby, it can easily illustrate this.
<speechles> !webby http://www.casafree.com/modules/news/ --regexp <div class="articletitre"><a href.*?>.*?<a href='(.*?)'>(.*?)</a>.*?<div class="itemBody">(.*?)</div>--
<sp33chy> regexp: capture1 ( http://www.casafree.com/modules/news/ar ... ryid=44181 )
<sp33chy> regexp: capture2 ( France : ouverture du 32e Festival International de films de femmes de Créteil )
<sp33chy> regexp: capture3 ( La 32e édition du Festival international de films de femmes de Créteil se déroule depuis vendredi à la Maison des Arts de Créteil, et prendra fin le 11 avril, rapporte un site web français spécialisé dans les affaires culturelles. )
Using this as a construct, this would be your news parser to show all the news.

Code: Select all

# -all makes regexp return every article on the page, not only the first match
set casafreenews [regexp -all -inline {<div class="articletitre"><a href.*?>.*?<a href='(.*?)'>(.*?)</a>.*?<div class="itemBody">(.*?)</div>} $html]
foreach {junk url title description} $casafreenews {
   puthelp "privmsg $chan :\002$title\002: $description @ $url"
}
Hopefully you already understand how to use the http package (the http::config, http::geturl, http::data, and http::cleanup commands) and know how to turn this into a full script. ;)
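For anyone less familiar with the http package, a minimal sketch of that fetch step might look like the following. The helper name fetch_page and its default timeout are assumptions made for illustration; only the http:: commands named above come from the package itself.

```tcl
package require http

# Hypothetical helper: fetch a URL and return its body, or raise an error.
proc fetch_page {url {timeout_ms 30000}} {
    ::http::config -useragent "Mozilla/5.0"
    set tok [::http::geturl $url -timeout $timeout_ms]
    # Read status and body first, then always release the token.
    set status [::http::status $tok]
    set body   [::http::data $tok]
    ::http::cleanup $tok
    if {$status ne "ok"} {
        error "fetch failed: $status"
    }
    return $body
}

# Example use inside a bind handler (needs network access):
#   if {[catch {fetch_page http://www.casafree.com/modules/news/} html]} {
#       putlog "news: $html"
#       return
#   }
```

Wrapping the call in catch covers both socket errors raised by http::geturl and the non-ok statuses (such as a timeout) turned into errors by the helper.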

Hint: the code to make it RSS style, and show only new news items would be like this:

Code: Select all

# -all is needed here too, otherwise only the first article is ever captured
set casafreenews [regexp -all -inline {<div class="articletitre"><a href.*?>.*?<a href='(.*?)'>(.*?)</a>.*?<div class="itemBody">(.*?)</div>} $html]
foreach {junk url title description} $casafreenews {
   if {[info exists ::casaLast] && [string equal $url $::casaLast]} {
      break
   }
   puthelp "privmsg $chan :\002$title\002: $description @ $url"
}
set ::casaLast [lindex $casafreenews 1]
Have a fun. Happy egg gathering, don't drop any. Today is the one day an "egg drop" is a bad thing. :P
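To see the "only new items" idea in isolation, here is a small sketch that runs without a bot. The helper new_items and the sample HTML are made up for illustration; it assumes the page lists the newest article first and uses regexp -all so that every article is captured.

```tcl
# Regex from the posts above.
set re {<div class="articletitre"><a href.*?>.*?<a href='(.*?)'>(.*?)</a>.*?<div class="itemBody">(.*?)</div>}

# Hypothetical helper: return {url title description} triples that are newer
# than the URL remembered in ::casaLast, then remember the newest URL.
proc new_items {re html} {
    set out {}
    set all [regexp -all -inline $re $html]
    foreach {junk url title description} $all {
        if {[info exists ::casaLast] && $url eq $::casaLast} { break }
        lappend out $url $title $description
    }
    if {[llength $all]} { set ::casaLast [lindex $all 1] }
    return $out
}

# Made-up sample page with two articles, newest first.
set html {<div class="articletitre"><a href="#">Cat</a> <a href='http://example.com/2'>Second</a></div><div class="itemBody">Body two</div>
<div class="articletitre"><a href="#">Cat</a> <a href='http://example.com/1'>First</a></div><div class="itemBody">Body one</div>}

set first  [new_items $re $html]   ;# both articles
set second [new_items $re $html]   ;# nothing new, so an empty list
```

The second call returns nothing because the newest URL is already stored in ::casaLast, so a repeated trigger stays silent until the site publishes something new.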

Post by neoclust »

Thanks
Last edited by neoclust on Sun Apr 04, 2010 5:37 pm, edited 2 times in total.

Post by neoclust »

When I tested the code the first time it looked good, but when I typed !news again I received no response. Can you tell me why, please?
[21:23:48] (me): !news
[21:23:53] (egg): France : ouverture du 32e Festival International de films de femmes de Créteil: La 32e édition du Festival international de films de femmes de Créteil se déroule depuis vendredi à la Maison des Arts de Créteil, et prendra fin le 11 avril, rapporte un site web français spécialisé dans les affaires culturelles. @ http://www.casafree.com/modules/news/ar ... ryid=44181

Code: Select all

bind pub - !news pub:news
proc pub:news {nick host hand chan arg} {
	set lien "http://www.casafree.com/modules/news/"
	set http [http::config -useragent mozilla]
	set http [http::geturl $lien -timeout [expr 1000 * 10]]
	set html [http::data $http]
	set casafreenews [regexp -inline {<div class="articletitre"><a href.*?>.*?<a href='(.*?)'>(.*?)</a>.*?<div class="itemBody">(.*?)</div>} $html]
	foreach {junk url title description} $casafreenews {
		if {[info exists ::casaLast] && [string equal $url $::casaLast]} {
			break
		}
		puthelp "privmsg $chan :\002$title\002: $description @ $url"
	}
	set ::casaLast [lindex $casafreenews 1]
}

Post by neoclust »

I'm using this code, but I want to publish a random article each time to avoid monotony.
[23:55:20] (me): !news
[23:55:29] (egg): [USA - Russie : Moscou et Washington signeraient jeudi à Prague un nouveau traité de réduction de leurs arsenaux nucléaires]: Le président américain Barack Obama et son homologue russe Dimitri Medvedev devraient signer jeudi à Prague un nouveau traité de réduction de leurs arsenaux nucléaires. @ http://www.casafree.com/modules/news/ar ... ryid=44220
[23:55:54] (me): !news
[23:55:55] (egg): [ USA - Russie : Moscou et Washington signeraient jeudi à Prague un nouveau traité de réduction de leurs arsenaux nucléaires ]: Le président américain Barack Obama et son homologue russe Dimitri Medvedev devraient signer jeudi à Prague un nouveau traité de réduction de leurs arsenaux nucléaires. @ http://www.casafree.com/modules/news/ar ... ryid=44220

Code: Select all

namespace eval news {}
setudef flag nopubnews
set news(pref) "!"
set news(commands) "news"
set news(time) 30
set news(page) http://www.casafree.com/modules/news/ 
set news(version) "1.0"
package require http
foreach bind [split $news(commands) " "] {
	bind pub -|- $news(pref)$bind ::news::pub
	bind msg -|- $news(pref)$bind ::news::msg
}
proc ::news::msg {nick uhost hand arg} {
	::news::news $nick $uhost $hand $nick $arg
}

proc ::news::pub {nick uhost hand chan arg} {
	if {[channel get $chan nopubnews]} return
	::news::news $nick $uhost $hand $chan $arg
}
proc ::news::news {nick uhost hand chan arg} {
	global news lastbind
	set arg [lindex [split $arg] 0]
	set agent "Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1"
	set news(host,$uhost) 1
	set news(timer,$uhost) [utimer $news(time) [list ::news::reset $uhost ] ]
	set news_tok [::http::config -useragent $agent]
	set news_tok [::http::geturl $news(page) -timeout 30000]
	set html [::http::data $news_tok]
	::http::cleanup $news_tok
	set casafreenews [regexp -inline {<div class="articletitre"><a href.*?>.*?<a href='(.*?)'>(.*?)</a>.*?<div class="itemBody">(.*?)</div>} $html]
	foreach {junk url title description} $casafreenews {
		puthelp "privmsg $chan :\[ $title \]: $description"
	}
	return
}

proc ::news::reset { uhost } {
	global news
	catch {killutimer $news(timer,$uhost)}
	catch {unset news(timer,$uhost)}
	catch {unset news(host,$uhost)}
}
putlog "news.tcl v$news(version) loaded."
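One possible way to get the random pick described above is to collect every article with regexp -all -inline and then choose one at random. This is only a sketch: the helper random_item and the sample match list are made up for illustration and are not part of the script above.

```tcl
# Hypothetical helper: pick one random {url title description} triple from
# the flat list returned by [regexp -all -inline] with three capture groups.
proc random_item {matches} {
    set count [expr {[llength $matches] / 4}]
    if {$count == 0} { return {} }
    # Each article occupies four elements: full match, url, title, description.
    set i [expr {int(rand() * $count) * 4}]
    return [lrange $matches [expr {$i + 1}] [expr {$i + 3}]]
}

# Made-up match list standing in for the regexp result.
set matches {m1 http://example.com/1 First "Body one" m2 http://example.com/2 Second "Body two"}
foreach {url title description} [random_item $matches] {
    puts "\[ $title \]: $description @ $url"
}
```

In the bot, the puts line would become the puthelp call, and the regexp would need -all so there is more than one article to choose from.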

Post by speechles »

Code: Select all

# Puh Rum Pum Pum Pum

package require http
setudef flag nopubnews

namespace eval news {
	# config
	set ary(pref) "!"
	set ary(commands) "news"
	set ary(throttle) 2
	set ary(throttle_time) 30
	set ary(bind_time) "30*"
	set ary(page) http://www.casafree.com/modules/news/
	set ary(regex) {<div class="articletitre"><a href.*?>.*?<a href='(.*?)'>(.*?)</a>.*?<div class="itemBody">(.*?)</div>}
	set ary(max_bot) 5
	set ary(max_user) 5
	set ary(version) "2.0"
}

# binds
foreach bind [split $::news::ary(commands)] {
	# bind each configured command on its own, not the whole command list
	bind pub -|- "$::news::ary(pref)$bind" ::news::pub_
	bind msg -|- "$::news::ary(pref)$bind" ::news::msg_
}
bind time - $::news::ary(bind_time) ::news::magic_

namespace eval news {
	# main - time bind - magic
	proc magic_ {args} {
		news_ $::botnick [getchanhost $::botnick] $::botnick "all" "magic"
	}

	# main - msg bind - notice
	proc msg_ {nick uhost hand arg} {
   		news_ $nick $uhost $hand $nick "notice"
	}

	# main - pub bind - privmsg
	proc pub_ {nick uhost hand chan arg} {
		if {[channel get $chan nopubnews]} { return }
		news_ $nick $uhost $hand $chan "privmsg"
	}

	# sub - give news
	proc news_ {nick uhost hand chan arg} {
		if {[throttle_ $uhost,$chan,news $::news::ary(throttle_time)]} {
			putserv "$arg $chan :$nick, you have been Throttled! You're going too fast and making my head spin!"
			# stop here so a throttled request does not fetch the page anyway
			return
		}
		set a "Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1"
		set t [::http::config -useragent $a]
		catch { set t [::http::geturl $::news::ary(page) -timeout 30000] } error
		# error condition 1, socket error or other general error
		if {![string match -nocase "::http::*" $error] && ![isbotnick $nick]} {
			putserv "$arg $chan :[string totitle [string map {"\n" " | "} $error]] \( $::news::ary(page) \)"
			return
		}
		# error condition 2, http error
		if {![string equal -nocase [::http::status $t] "ok"] && ![isbotnick $nick]} {
			putserv "$arg $chan :[string totitle [::http::status $t]] \( $::news::ary(page) \)"
			return
		}
		set html [::http::data $t]
		::http::cleanup $t
		set casafreenews [regexp -all -inline "$::news::ary(regex)" $html]
		set c 0
		foreach {junk url title description} $casafreenews {
			incr c
			if {[isbotnick $nick]} {
				if {$c > $::news::ary(max_bot) || ([info exists ::casaLast] && [string equal $url $::casaLast])} { break }
			} elseif {$c > $::news::ary(max_user)} { break }
			if {![string equal "magic" $arg]} {
				puthelp "$arg $chan :\[ [mapit_ $title] \]: [mapit_ $description] @ $url"
			} else {
				foreach ch [channels] {
					if {[channel get $ch nopubnews]} { continue }
					puthelp "$arg $ch :\[ [mapit_ $title] \]: [mapit_ $description] @ $url"
				}
			}	
		}
		if {[string equal "magic" $arg]} {
			set ::casaLast [lindex $casafreenews 1]
		}
	}

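	# sub - map it
	# note: the first map element was presumably an HTML entity for the
	# apostrophe that the forum rendered; as displayed here the map is a no-op
	proc mapit_ {t} { return [string map [list "'" "'"] $t] }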

	# Throttle Proc (slightly altered, super action missiles) - Thanks to user
	# see this post: http://forum.egghelp.org/viewtopic.php?t=9009&start=3
	proc throttle_ {id seconds} {
   		global ::news::throttle
   		if {[info exists ::news::throttle($id)]&&[lindex $::news::throttle($id) 0]>[clock seconds]} {
			set ::news::throttle($id) [list [lindex $::news::throttle($id) 0] [set value [expr {[lindex $::news::throttle($id) 1] +1}]]]
			if {$value > $::news::ary(throttle)} { set id 1 } { set id 0 }
		} {
			set ::news::throttle($id) [list [expr {[clock seconds]+$seconds}] 1]
			set id 0
		}
	}
	# delete expired entries every 10 minutes
	bind time - ?0* throttleclean_
	proc throttleclean_ {args} {
   		global ::news::throttle
		set now [clock seconds]
		foreach {id time} [array get ::news::throttle] {
			if {[lindex $time 0]<=$now} {unset ::news::throttle($id)}
		}
	}
}

putlog "news.tcl v$::news::ary(version) loaded."
Try this out. I've tested it and it works, when it doesn't time out, that is. That site needs the 30000 ms (30 second) timeout; it's very slow. It would also be very easy for someone to adapt this same script to parse their own site. It provides a pub trigger, a message trigger, and a timed event which will spam the newest news items from the page every 30 minutes. If you need any config options explained, just ask... :P
Linux
Halfop
Posts: 71
Joined: Sun Apr 04, 2004 4:20 pm
Location: Under The Sky

Post by Linux »

Code: Select all

Tcl error [throttleclean_]: invalid command name "throttleclean_"
I'm an idiot, At least this one [bug] took about 5 minutes to find...

Post by neoclust »

Hello, thanks dude. It works fine, except that I get this error message from the cleanup process:
Tcl error [throttleclean_]: invalid command name "throttleclean_"
Also, could you make the time bind publish the news automatically on the channel every hour?
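A likely cause of that error, sketched without Eggdrop: bind stores the proc name as given, and the bot appears to call it from the global namespace, where the short name of a proc defined inside namespace eval news does not resolve. The snippet below only imitates that call with namespace eval ::, which is an assumption about how the bot invokes bound procs.

```tcl
namespace eval news {
    proc throttleclean_ {args} { return "cleaned" }
}

# Called from the global namespace by its short name, the proc is not found.
set failed [catch {namespace eval :: {throttleclean_}} msg]
puts "$failed: $msg"

# The fully qualified name resolves from anywhere.
set ok [namespace eval :: {::news::throttleclean_}]
puts $ok

# So the bind inside "namespace eval news" would presumably need to read:
#   bind time - ?0* ::news::throttleclean_
```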

Post by speechles »

Code: Select all

Cannot post the code, this forum renders the html elements even within [c o d e] tags.. quite lame
Get it HERE instead.

To make it auto-post new news items every hour instead of every 30 minutes:

Code: Select all

#change this
set ary(bind_time) "30*"
#to this
set ary(bind_time) "00*"
I've also commented the config section more thoroughly so you can understand what each setting does. Enjoy ;)
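A note on how these masks behave, as a sketch: a time bind's mask is matched against a "minute hour day month year" string, so the masks can be tried offline. The helper matching_minutes is made up for illustration and approximates the bot's wildcard matcher with string match, which should agree for plain * and ? masks.

```tcl
# Hypothetical helper: list the minutes (00-59) of one sample hour at which a
# time-bind mask would match a "minute hour day month year" string.
proc matching_minutes {mask {rest "14 08 03 2010"}} {
    set hits {}
    for {set m 0} {$m < 60} {incr m} {
        set stamp "[format %02d $m] $rest"
        if {[string match $mask $stamp]} { lappend hits [format %02d $m] }
    }
    return $hits
}

puts [matching_minutes "30*"]   ;# 30
puts [matching_minutes "00*"]   ;# 00
puts [matching_minutes "?0*"]   ;# 00 10 20 30 40 50
```

Under that assumption, "30*" fires once an hour at minute 30 and "00*" once an hour at minute 00, so a true 30-minute cadence would need both masks bound.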

Post by neoclust »

Thanks man. I changed the time to 5 minutes to see if the timer would trigger the automatic publishing of news on the channel, but in vain.

Post by speechles »

neoclust wrote: Thanks man. I changed the time to 5 minutes to see if the timer would trigger the automatic publishing of news on the channel, but in vain.
Oh ho ho.. perhaps I should've tested that timer part. I realize the problem, and to make up for it I've added full configuration into the script. Check out the new config below:

Code: Select all

namespace eval news {
   # config - make your changes here
   # trigger character
   set ary(pref) "!"

   # command used to reply to user
   # this can be a list of space delimited commands
   set ary(commands) "news"

   # amount user can issue before throttle
   set ary(throttle) 2

   # throttle time
   set ary(throttle_time) 30

   # time to announce new news items
   # this can be a list of space delimited time binds.
   # leave the one you wish to use for bind_time uncommented.
   # set ary(bind_time) "00* 15* 30* 45*" ; # every 15 minutes
   # set ary(bind_time) "00* 30*" ; # every 30 minutes
   set ary(bind_time) "00*" ; # every 60 minutes at exactly the start of the hour

   # url to news page
   set ary(page) http://www.casafree.com/modules/news/

   # parsing regex used to gather news
   set ary(regex) {<div class="articletitre"><a href.*?>.*?<a href='(.*?)'>(.*?)</a>.*?<div class="itemBody">(.*?)</div>}

   # max amount of news items to announce
   set ary(max_bot) 5

   # max amount of news items for users
   set ary(max_user) 5

   # display format for news messages, variables are: %description, %title, %url 
   # these can be used and will be replaced with actual values, newline (\n) will
   # let you span multiple lines if you wish. If something is too long it will
   # be cut off, be aware of this... use colors, bold, but remember to \escape any
   # special tcl characters. 
   set ary(display_format) "\[\002%title\002\] %url\n%description"

   # script version
   set ary(version) "2.2"
}
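As a rough idea of how such a display format could be expanded (the real script may well do it differently), here is a sketch. The helper format_news is hypothetical; it substitutes the three placeholders with string map and splits on newline so each piece can be sent as its own message.

```tcl
# Hypothetical helper: expand a display format into a list of output lines.
proc format_news {fmt title url description} {
    set text [string map [list %title $title %url $url %description $description] $fmt]
    return [split $text "\n"]
}

set fmt "\[\002%title\002\] %url\n%description"
foreach line [format_news $fmt "Some title" "http://example.com/1" "Some description"] {
    puts $line
}
```

string map does not rescan text it has already replaced, so a title that happens to contain "%url" is left untouched.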
Here are samples of how this looks on IRC.
<speechles> !news
<sp33chy> [Grande-Bretagne : L'économie britannique progresserait plus rapidement que la majorité des pays du G7 selon l'OCDE] http://www.casafree.com/modules/news/ar ... ryid=44326
[5:17pm] [sp33chy] L'économique britannique devrait enregistrer au premier semestre de 2010, une croissance plus rapide que la majorité des pays du G7, a indiqué un rapport de l'Organisation de coopération et de développement économiques (OCDE) relayé par la presse britannique.
<sp33chy> [Israël est la principale menace à la paix au Moyen-Orient , dit le PM turc] http://www.casafree.com/modules/news/ar ... ryid=44328
<sp33chy> Israël est la principale menace pour la paix au Moyen-Orient, a indiqué mercredi le Premier ministre turc Recep Tayyip Erdogan, qui est actuellement en visite à Paris.
<sp33chy> [UE - Afrique : : Les compagnies aériennes de 11 pays africains interdites de vol en Europe] http://www.casafree.com/modules/news/ar ... ryid=44327
<sp33chy> La Commission européenne a décidé d'interdire de vol en Europe 278 compagnies aériennes dont celles de 11 pays africains, annonce un communiqué publié mercredi à Bruxelles.

..a short while later..

<sp33chy> [Maroc : Plus de 1200 personnes ont bénéficié des services du Samusocial Casablanca en 2009] http://www.casafree.com/modules/news/ar ... ryid=44329
<sp33chy> 1217 personnes recueillies dans la rue ont bénéficié, en 2009, des services du Centre d'hébergement du service d'aide mobile d'urgence social (Samusocial) de Casablanca.
Get the new script HERE and have a fun. ;)

Post by neoclust »

You have a missing } at the end, and I waited for the timer to post news automatically, but nothing happened.

Post by speechles »

My bad. Here's a version that should behave correctly.

click here

Post by neoclust »

Well, I set the timer to fire every twenty minutes using 00 * 30 *, then did a .restart. The result is:

Code: Select all

[23:31:29] *** Eggy (lamestxpa@network.org) has left #test (Left all channels)
[23:31:45] *** Eggy (lamestxih@network.org) has joined #test 
[23:35:03] (Eggy): [Russie - USA : Moscou et Washington déterminés à adopter de nouvelles sanctions contre l'Iran] http://www.casafree.com/modules/news/article.php?storyid=44370
[23:35:06] (Eggy): Le président américain Barack Obama et son homologue russe Dimitri Medvedev ont réaffirmé jeudi à Prague leur détermination à adopter une nouvelle série de sanctions contre l'Iran si ce pays refusait de mettre fin à son programme nucléaire.
[23:35:12] (Eggy): [France : La COPEAM veut avancer dans la concrétisation de ses projets pour l'émergence d'un paysage audiovisuel méditerranéen] http://www.casafree.com/modules/news/article.php?storyid=44369
[23:35:15] (Eggy): La 17ème Conférence Permanente de l'Audiovisuel Méditerranéen (COPEAM) s'ouvre jeudi à Paris avec l'ambition "d'aller plus loin" dans la concrétisation de ses projets phares destinés à contribuer à l'émergence d'un paysage audiovisuel méditerranéen.
[08:14:38] (neoclust): huh
When I went back to check what had happened on the channel after 6 hours, there was nothing much after the first action when it connected. It gets stuck somewhere, don't you think?

See this code : http://paste.tclhelp.net?id=6g1

Post by neoclust »

I would like to adapt the script to work with RSS. I tried, but it fails :s Thanks. http://www.zdnet.fr/feeds/rss/
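For an RSS feed, the same regexp approach can be pointed at the item elements instead of the HTML divs. The following is only a sketch under assumptions: it expects plain RSS 2.0 item, title and link tags without attributes, the helper rss_items is made up, and the sample feed is invented rather than taken from the URL above.

```tcl
# Hypothetical helper: extract {title link} pairs from RSS 2.0 <item> elements.
proc rss_items {xml} {
    set out {}
    foreach {whole body} [regexp -all -inline {<item>(.*?)</item>} $xml] {
        set title ""
        set link ""
        regexp {<title>(.*?)</title>} $body -> title
        regexp {<link>(.*?)</link>} $body -> link
        # Drop CDATA wrappers if the feed uses them.
        set title [string map {<![CDATA[ "" ]]> ""} $title]
        lappend out [string trim $title] [string trim $link]
    }
    return $out
}

# Made-up feed fragment for illustration.
set xml {<rss><channel><title>Feed</title>
<item><title><![CDATA[First story]]></title><link>http://example.com/a</link></item>
<item><title>Second story</title><link>http://example.com/b</link></item>
</channel></rss>}

foreach {title link} [rss_items $xml] {
    puts "\[ $title \] $link"
}
```

Feeds that add attributes to their tags or use a different RSS dialect would need a looser pattern or a real XML parser.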