holycrap wrote: Maybe Santa will give us a present on this one.
I guess that makes me santa...
Code: Select all
# valid, set our url
set input(url) "http://horoscopes.astrology.com/daily${input(query)}.html"
foreach sign [split ${incith::horoscope::en_chinese} " "] {
    if {$input(query) == $sign} {
        set input(url) "http://horoscopes.astrology.com/dailychinese${input(query)}.html"
        break
    }
}
Code: Select all
# valid, set our url
set input(url) "http://feeds.astrology.com/dailyoverview"
foreach sign [split ${incith::horoscope::en_chinese} " "] {
    if {$input(query) == $sign} {
        set input(url) "http://feeds.astrology.com/dailychinese"
        break
    }
}
Code: Select all
# html parsing
#
# fetch the sign and the horoscope
regexp {<div class="all_about_head_pad">ALL ABOUT (.+?)</div>} $html - output(sign)
regexp {<p style="margin-bottom: 20px;">(.*?)</p>} $html - output(horoscope)
Code: Select all
regsub -all {(?:<!\[CDATA\[)} $html {} html
# html parsing
#
# fetch the sign and the horoscope
set output(sign) [string totitle $output(query)]
set regex "<title>$output(sign) (.*?)</title>.+<description><p>(.*?)</p>"
regexp $regex $html - junk output(horoscope)
Code: Select all
regexp -nocase {<lastbuilddate>(.*?)</lastbuilddate>} $html - ::horoscope(lastbuild)
# -inline returns the matches as a list, so no match variables are allowed here
set parents [regexp -all -inline -nocase -- {<item>(.*?)</item>} $html]
foreach {junk child} $parents {
    regexp -nocase {<title>(.*?) Horoscope} $child - title
    regexp -nocase {<link>(.*?)</link>} $child - link
    regexp -nocase {<description>(.*?)</description>} $child - desc
    set ::horoscope($title) "$desc @ $link"
}
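A minimal, self-contained sketch of the same parsing idea as the loop above, run against a made-up two-item feed fragment. The sample XML, the `parse_feed` name, and the `cache` array are assumptions for illustration only; they are not the real feed contents or the script's own code.

```tcl
# Hypothetical helper: parse <item> blocks from an RSS string into an array.
proc parse_feed {xml arrayName} {
    upvar 1 $arrayName cache
    # -inline returns a flat list: full match, capture, full match, capture, ...
    foreach {junk child} [regexp -all -inline -nocase -- {<item>(.*?)</item>} $xml] {
        # Only store the item when all three fields were found.
        if {[regexp -nocase {<title>(.*?) Horoscope} $child - title]
            && [regexp -nocase {<link>(.*?)</link>} $child - link]
            && [regexp -nocase {<description>(.*?)</description>} $child - desc]} {
            set cache([string tolower $title]) "$desc @ $link"
        }
    }
    return [array size cache]
}

# Made-up sample data, not the real feed contents.
set sample {<rss><channel>
<item><title>Aries Horoscope for today</title><link>http://example.com/aries</link><description>Sample text A</description></item>
<item><title>Taurus Horoscope for today</title><link>http://example.com/taurus</link><description>Sample text B</description></item>
</channel></rss>}

set count [parse_feed $sample cache]
puts "$count signs cached"
puts $cache(aries)
```

Keying the array on the lowercased sign name means a lookup can use the user's query directly, and skipping items with a missing field avoids reusing stale values from the previous loop iteration.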
Together with the Last-Modified header, you can tell exactly when you need to download new html. Polling the site wouldn't need to start until 1 hour before midnight on their server's time, which appears to be GMT+0, making this exercise pretty easy.
<speechles> !webby http://feeds.astrology.com/dailyoverview --header
<sp33chy> Astrology.com Daily Overview Horoscopes ( http://cli.gs/mAq1D )( 200; text/xml; utf-8; 25423 bytes )
<sp33chy> Server=GFE/2.0; Last-Modified=Tue, 15 Dec 2009 23:02:29 GMT; Expires=Tue, 15 Dec 2009 23:02:29 GMT; ETag=UbQ446fscwam0RXAMW7hNN8HDGI; Date=Tue, 15 Dec 2009 23:02:29 GMT; Cache-Control=private, max-age=0
Note: a validate request cannot detect page body size or title, hence "Validated" and "0 bytes" in those places.
<speechles> !webby http://feeds.astrology.com/dailyoverview --validate
<sp33chy> Validated: http://feeds.astrology.com ( http://is.gd/5p58d )( 200; text/xml; utf-8; 0 bytes )
<sp33chy> Server=GFE/2.0; Last-Modified=Tue, 15 Dec 2009 23:39:57 GMT; Expires=Tue, 15 Dec 2009 23:39:57 GMT; ETag=UbQ446fscwam0RXAMW7hNN8HDGI; Date=Tue, 15 Dec 2009 23:39:57 GMT; Cache-Control=private, max-age=0
<sp33chy> X-XSS-Protection=0; X-Content-Type-Options=nosniff
<speechles> !webby http://feeds.astrology.com/dailyoverview --validate
<sp33chy> Validated: http://feeds.astrology.com ( http://cli.gs/497hU )( 200; text/xml; utf-8; 0 bytes )
<sp33chy> Server=GFE/2.0; Last-Modified=Tue, 15 Dec 2009 23:38:17 GMT; Expires=Tue, 15 Dec 2009 23:40:08 GMT; ETag=UbQ446fscwam0RXAMW7hNN8HDGI; Date=Tue, 15 Dec 2009 23:40:08 GMT; Cache-Control=private, max-age=0
<sp33chy> X-XSS-Protection=0; X-Content-Type-Options=nosniff
<speechles> !webby http://feeds.astrology.com/dailyoverview --validate
<sp33chy> Validated: http://feeds.astrology.com ( http://tinyurl.com/klkhyv )( 200; text/xml; utf-8; 0 bytes )
<sp33chy> Server=GFE/2.0; Last-Modified=Tue, 15 Dec 2009 23:39:38 GMT; Expires=Tue, 15 Dec 2009 23:40:18 GMT; ETag=UbQ446fscwam0RXAMW7hNN8HDGI; Date=Tue, 15 Dec 2009 23:40:18 GMT; Cache-Control=private, max-age=0
<sp33chy> X-XSS-Protection=0; X-Content-Type-Options=nosniff
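The polling idea above (only bother re-checking the feed in the last hour before midnight GMT) could be sketched roughly like this. The proc names and the one-hour window constant are assumptions made up for this example, and `clock scan -format` needs Tcl 8.5 or later.

```tcl
# Hypothetical helpers; the names are made up for this sketch.

# Convert an HTTP date header such as "Tue, 15 Dec 2009 23:02:29 GMT"
# into epoch seconds. Requires Tcl 8.5+ for [clock scan -format].
proc http_date_to_epoch {httpDate} {
    return [clock scan $httpDate -format "%a, %d %b %Y %H:%M:%S GMT" -gmt 1]
}

# Seconds left until the next midnight GMT (epoch days are GMT-aligned).
proc seconds_until_gmt_midnight {now} {
    return [expr {86400 - ($now % 86400)}]
}

# True when we are inside the assumed one-hour window before midnight GMT.
proc should_poll {now} {
    return [expr {[seconds_until_gmt_midnight $now] <= 3600}]
}

# Example using the Date value from the first header dump above.
set stamp [http_date_to_epoch "Tue, 15 Dec 2009 23:02:29 GMT"]
puts "poll now? [should_poll $stamp]"
```

In a bot you would pass `[clock seconds]` to `should_poll` from a timer and only send the validate request when it returns true.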
speechles wrote: Here's a question.. If it is taken from an rss page, why do you need to query the website always?
1.) Because I wanted to make as few changes as possible to the original code, to simplify the changes for people.
2.) I'm just starting in TCL coding - lol
Oh and I fixed the ram/goat/sheep bug too... same link as before ^^
Trixar_za wrote: 1.) Because I wanted to make as few changes as possible to the original code, to simplify the changes for people.
Can you paste the full code? The top part you are asking us to change doesn't match what we all have, which is this:
Code: Select all
proc fetch_html {input} {
set query "http://horoscopes.astrology.com/"
set input [string tolower $input]
regsub -- "(?q)${incith::horoscope::command_char}" $input {} input
cache wrote: Can you paste the full code? The top part you are asking us to change doesn't match what we all have, which is this:
Code: Select all
proc fetch_html {input} {
set query "http://horoscopes.astrology.com/"
set input [string tolower $input]
regsub -- "(?q)${incith::horoscope::command_char}" $input {} input
Maybe I'm using an older version... I'll download the latest and have a look quickly.
Code: Select all
# ram or goat becomes sheep
if {$input(query) == "ram" || $input(query) == "goat"} {
    set input(query) "sheep"
}
Code: Select all
# ram or sheep becomes goat
if {$input(query) == "ram" || $input(query) == "sheep"} {
    set input(query) "goat"
}
Trixar_za wrote: The reason I'm asking here first is to see if there is any interest in such a script and, secondly, maybe people should choose what they want in there first.
Sounds like a lot of neat functionality, but personally I don't think I would use it. I just use the astrology script kind of as a laugh, or to start conversations when no one is talking.
Code: Select all
# valid, set our url
set input(url) "http://feeds.astrology.com/dailyoverview"
foreach sign [split ${incith::horoscope::en_chinese} " "] {
    if {$input(query) == $sign} {
        set input(url) "http://feeds.astrology.com/dailychinese"
        break
    }
}
Code: Select all
# valid, set our url
set input(url) "http://www.astrology.com/horoscopes/daily-horoscope.rss"
foreach sign [split ${incith::horoscope::en_chinese} " "] {
    if {$input(query) == $sign} {
        set input(url) "http://www.astrology.com/horoscopes/daily-chinese.rss"
        break
    }
}