incith:horoscope (r94) (Jan. 20th, 2009)

holycrap · Post by **holycrap** » Sat Dec 12, 2009 7:57 pm

Maybe Santa will give us a present on this one.

Trixar_za · Post by **Trixar_za** » Tue Dec 15, 2009 3:44 pm

holycrap wrote:Maybe Santa will give us a present on this one.

I guess that makes me Santa...

Here is what you need to change:

Code: Select all

    # valid, set our url
    set input(url) "http://horoscopes.astrology.com/daily${input(query)}.html"
    foreach sign [split ${incith::horoscope::en_chinese} " "] {
      if {$input(query) == $sign} {
        set input(url) "http://horoscopes.astrology.com/dailychinese${input(query)}.html"
        break
      }
    }

to

Code: Select all

    # valid, set our url
    set input(url) "http://feeds.astrology.com/dailyoverview"
    foreach sign [split ${incith::horoscope::en_chinese} " "] {
      if {$input(query) == $sign} {
        set input(url) "http://feeds.astrology.com/dailychinese"
        break
      }
    }

and then you have to change this:

Code: Select all

    # html parsing
    #
    # fetch the sign and the horoscope
    regexp {<div class="all_about_head_pad">ALL ABOUT (.+?)</div>} $html - output(sign)
    regexp {<p style="margin-bottom: 20px;">(.*?)</p>} $html - output(horoscope)

to

Code: Select all

    regsub -all {(?:<!\[CDATA\[)} $html {} html

    # html parsing
    #
    # fetch the sign and the horoscope
    set output(sign) [string totitle $output(query)]
    set regex "<title>$output(sign) (.*?)</title>.+<description><p>(.*?)</p>"
    regexp $regex $html - junk output(horoscope)

And now the script will be using the rss feeds and should be working again - Enjoy!
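To see what the two changes do together, here is a minimal stand-alone sketch that runs the same regsub and regexp against a made-up feed fragment. The sample XML and the example.com links are invented for illustration (the real feed's layout may differ), and the plain `sign` variable stands in for the script's `output(sign)`.

```tcl
# Hypothetical feed fragment; the real feed is assumed to wrap each
# description in CDATA and to put other tags between title and description.
set html {<rss><channel><title>Daily Overview</title>
<item><title>Aries Horoscope for Dec 15</title><link>http://example.com/aries</link><description><![CDATA[<p>Sample text for Aries.</p>]]></description></item>
<item><title>Taurus Horoscope for Dec 15</title><link>http://example.com/taurus</link><description><![CDATA[<p>Sample text for Taurus.</p>]]></description></item>
</channel></rss>}

# Strip the CDATA opener so <description> is followed directly by <p>
regsub -all {<!\[CDATA\[} $html {} html

# Same pattern as the patch, with a plain variable instead of output(sign)
set sign [string totitle "taurus"]
set regex "<title>$sign (.*?)</title>.+<description><p>(.*?)</p>"
if {[regexp $regex $html - junk horoscope]} {
  puts "$sign: $horoscope"
} else {
  puts "no entry found for $sign"
}
```

Note that the `.+` in the pattern needs at least one character between `</title>` and `<description>`, so the sketch only works if some other tag (here `<link>`) sits between them.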

Edit: The lazy can download it here

speechles · Post by **speechles** » Tue Dec 15, 2009 7:14 pm

Here's a question.. If it is taken from an rss page, why do you need to query the website always?

Code: Select all

regexp -nocase {<lastbuilddate>(.*?)</lastbuilddate>} $html - ::horoscope(lastbuild)
# -inline returns the matches as a list, so it takes no match variables
set parents [regexp -all -inline -nocase -- {<item>(.*?)</item>} $html]
foreach {junk child} $parents {
  regexp -nocase {<title>(.*?) Horoscope} $child - title
  regexp -nocase {<link>(.*?)</link>} $child - link
  regexp -nocase {<description>(.*?)</description>} $child - desc
  set ::horoscope($title) "$desc @ $link"
}

The code above would let you create an array keyed on the sign itself; each array element would be composed of "horoscope @ link". This is how an rss based approach should be. Then you don't need to make an http request for every query. You time the query to start checking for an update based upon the header field returned from the reply, like below.

<speechles> !webby http://feeds.astrology.com/dailyoverview --header
<sp33chy> Astrology.com Daily Overview Horoscopes ( http://cli.gs/mAq1D )( 200; text/xml; utf-8; 25423 bytes )
<sp33chy> Server=GFE/2.0; Last-Modified=Tue, 15 Dec 2009 23:02:29 GMT; Expires=Tue, 15 Dec 2009 23:02:29 GMT; ETag=UbQ446fscwam0RXAMW7hNN8HDGI; Date=Tue, 15 Dec 2009 23:02:29 GMT; Cache-Control=private, max-age=0

With the Last-Modified header, you can know exactly when you should download new html. But polling the site wouldn't need to start until one hour before midnight on their server's time, which appears to be GMT 0, making this exercise pretty easy.

1) initialize and gather required elements
2) parse html from daily and chinese sites
3) add elements to horoscope arrays, store last-modified timestamp taken from header

Then when a user types anything, you simply compare against the array. If it's not an array element, then it's not a valid sign. This makes it much more intuitive. When a sign is found, we simply return what is stored within the array making it incredibly fast. No http waits at all for any user.
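The three steps and the lookup could be sketched roughly like this. It is only an illustration: the feed fragment and example.com links are invented, and `cache_feed` and `lookup_sign` are hypothetical proc names, not part of incith:horoscope.

```tcl
# Hypothetical feed fragment standing in for the downloaded RSS body.
set html {<item><title>Aries Horoscope for Dec 15</title><link>http://example.com/aries</link><description>Sample text for Aries.</description></item>
<item><title>Taurus Horoscope for Dec 15</title><link>http://example.com/taurus</link><description>Sample text for Taurus.</description></item>}

# Steps 1-3: parse every <item> once and cache it under its lowercase sign.
proc cache_feed {html} {
  global horoscope
  foreach {whole item} [regexp -all -inline -nocase -- {<item>(.*?)</item>} $html] {
    if {[regexp -nocase {<title>(\S+) Horoscope} $item - sign]
        && [regexp -nocase {<link>(.*?)</link>} $item - link]
        && [regexp -nocase {<description>(.*?)</description>} $item - desc]} {
      set horoscope([string tolower $sign]) "$desc @ $link"
    }
  }
}

# Lookup: a sign is valid only if it is a key in the cache.
proc lookup_sign {sign} {
  global horoscope
  set key [string tolower [string trim $sign]]
  if {[info exists horoscope($key)]} {
    return $horoscope($key)
  }
  return "'$sign' is not a valid sign"
}

cache_feed $html
puts [lookup_sign "Taurus"]
puts [lookup_sign "unicorn"]
```

Because the lookup is just `info exists` on the array, no http request happens while a user is waiting.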

At one hour before midnight GMT 0 (England) time, the script will set a state telling itself that every 5 minutes or so it should start checking the site's headers against its stored timestamp (http::geturl with the -validate option; it wouldn't make sense to read the body until we need it). If it isn't equal, it's time to initialize: make another http::geturl without validating and store our array and timestamp so we can do this again in 24 or so hours.
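A headers-only check along those lines might look like the sketch below. `header_value` and `feed_changed` are hypothetical helper names, and the sketch simply trusts whatever Last-Modified value the server sends.

```tcl
package require http

# Case-insensitive lookup in the flat name/value list that http::meta returns.
proc header_value {meta name} {
  foreach {key value} $meta {
    if {[string equal -nocase $key $name]} {
      return $value
    }
  }
  return ""
}

# Headers-only request (-validate 1 sends HEAD); returns 1 when the feed's
# Last-Modified differs from the stored stamp, 0 when unchanged or on failure.
proc feed_changed {url stored_stamp} {
  if {[catch {http::geturl $url -validate 1 -timeout 10000} token]} {
    return 0
  }
  set stamp [header_value [http::meta $token] "Last-Modified"]
  http::cleanup $token
  return [expr {$stamp ne "" && $stamp ne $stored_stamp}]
}
```

A timer would call `feed_changed` with the feed URL and the stamp saved at the last full download, and only fetch the body when it returns 1.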

Perhaps this is the real Christmas gift...

and while it's possible I could do this, I like waiting to see if incith pays attention to this thread... ;D

-> Update: Well, seems they already knew people might do this, so it appears they do some devious playing with header attributes...

<speechles> !webby http://feeds.astrology.com/dailyoverview --validate
<sp33chy> Validated: http://feeds.astrology.com ( http://is.gd/5p58d )( 200; text/xml; utf-8; 0 bytes )
<sp33chy> Server=GFE/2.0; Last-Modified=Tue, 15 Dec 2009 23:39:57 GMT; Expires=Tue, 15 Dec 2009 23:39:57 GMT; ETag=UbQ446fscwam0RXAMW7hNN8HDGI; Date=Tue, 15 Dec 2009 23:39:57 GMT; Cache-Control=private, max-age=0
<sp33chy> X-XSS-Protection=0; X-Content-Type-Options=nosniff
<speechles> !webby http://feeds.astrology.com/dailyoverview --validate
<sp33chy> Validated: http://feeds.astrology.com ( http://cli.gs/497hU )( 200; text/xml; utf-8; 0 bytes )
<sp33chy> Server=GFE/2.0; Last-Modified=Tue, 15 Dec 2009 23:38:17 GMT; Expires=Tue, 15 Dec 2009 23:40:08 GMT; ETag=UbQ446fscwam0RXAMW7hNN8HDGI; Date=Tue, 15 Dec 2009 23:40:08 GMT; Cache-Control=private, max-age=0
<sp33chy> X-XSS-Protection=0; X-Content-Type-Options=nosniff
<speechles> !webby http://feeds.astrology.com/dailyoverview --validate
<sp33chy> Validated: http://feeds.astrology.com ( http://tinyurl.com/klkhyv )( 200; text/xml; utf-8; 0 bytes )
<sp33chy> Server=GFE/2.0; Last-Modified=Tue, 15 Dec 2009 23:39:38 GMT; Expires=Tue, 15 Dec 2009 23:40:18 GMT; ETag=UbQ446fscwam0RXAMW7hNN8HDGI; Date=Tue, 15 Dec 2009 23:40:18 GMT; Cache-Control=private, max-age=0
<sp33chy> X-XSS-Protection=0; X-Content-Type-Options=nosniff

Note: a validate request cannot detect page body size or title, hence "validated" and "0 bytes" in those places.

The page is still the same, but the last-modified is a lie. Smells like cake to me. So the timestamp will NOT help here, but since they update before midnight rolls over, just set a simple time bind to "00 00 * * *". Of course this would only work for those using GMT 0; you must adjust the bind to fit your time zone. Then every single day when midnight comes to England the script will initialize the horoscope array and update it for everyone. This means only one query is required (pretend we never timeout/error) every 24 hours regardless of how many people are hammering your bot.
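As a rough sketch of that bind, the snippet below works out the local-time mask for 00:00 GMT instead of hard-coding it. `gmt_midnight_mask`, `horoscope_midnight` and `refresh_horoscopes` are hypothetical names, and `refresh_horoscopes` is assumed to be a proc defined elsewhere that re-downloads the feeds and rebuilds the array.

```tcl
# Local "MM HH * * *" mask for the moment it is 00:00 GMT.
proc gmt_midnight_mask {} {
  set midnight [clock scan "00:00" -gmt 1]
  return "[clock format $midnight -format {%M %H}] * * *"
}

# Time-bind handler; Eggdrop passes minute, hour, day, month, year.
# refresh_horoscopes is a hypothetical proc, not defined in this sketch.
proc horoscope_midnight {min hour day month year} {
  refresh_horoscopes
}

# bind only exists inside Eggdrop, so guard it to keep the file loadable elsewhere.
if {[llength [info commands bind]]} {
  bind time - [gmt_midnight_mask] horoscope_midnight
}
```

The mask is computed once when the script loads, so a bot in a zone with daylight saving would need a rehash after the clocks change.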

Incith, are you listening?

Trixar_za · Post by **Trixar_za** » Tue Dec 15, 2009 7:54 pm

speechles wrote:Here's a question.. If it is taken from an rss page, why do you need to query the website always?

1.) Because I wanted to make as few changes as possible to the original code - to simplify the changes for people.
2.) I'm just starting in TCL coding - lol

Oh and I fixed the ram/goat/sheep bug too... same link as before ^^

cache · Post by **cache** » Wed Dec 16, 2009 12:30 am

Trixar_za wrote:
speechles wrote:Here's a question.. If it is taken from an rss page, why do you need to query the website always?
1.) Because I wanted to make as few changes as possible to the original code - to simplify the changes for people.
2.) I'm just starting in TCL coding - lol

Oh and I fixed the ram/goat/sheep bug too... same link as before ^^

Can you paste the full code? The top part you are asking us to change doesn't match what we all have, which is this:

Code: Select all

    proc fetch_html {input} {
      set query "http://horoscopes.astrology.com/"
      set input [string tolower $input]
      regsub -- "(?q)${incith::horoscope::command_char}" $input {} input

Trixar_za · Post by **Trixar_za** » Wed Dec 16, 2009 4:20 am

cache wrote: Can you paste the full code? The top part you are asking us to change doesn't match what we all have, which is this:
Code: Select all
    proc fetch_html {input} {
      set query "http://horoscopes.astrology.com/"
      set input [string tolower $input]
      regsub -- "(?q)${incith::horoscope::command_char}" $input {} input

Maybe I'm using an older version... I'll download the latest and have a quick look.

Ok... The newest svn version uses the same procs as mine, but comes with the botnet support added in.

Yours seems to set the URL inside the fetch_html proc, where mine doesn't. If you upload yours I'll have a quick look if you want.

Oh and for those that are interested... to fix the ram/sheep/goat bug you need to change:

Code: Select all

    # ram or goat becomes sheep
    if {$input(query) == "ram" || $input(query) == "goat"} {
      set input(query) "sheep"
    }

to

Code: Select all

    # ram or sheep becomes goat
    if {$input(query) == "ram" || $input(query) == "sheep"} {
      set input(query) "goat"
    }

cache · Post by **cache** » Thu Dec 17, 2009 2:11 am

Thank you very much.. for some reason version 1.2 worked till now, I had no idea there was a new version to edit.

pogue · Post by **pogue** » Sun Dec 20, 2009 2:10 pm

Can someone paste the updated code on pastebin or something?

Thanks in advance,
pogue

Trixar_za · Post by **Trixar_za** » Mon Dec 21, 2009 12:32 pm

You can see the pastebin'd script here
and you can download the already modified copy here

Hope that helps

Trixar_za · Post by **Trixar_za** » Tue Jan 05, 2010 4:19 am

I'm thinking of expanding this script a bit to include some of the other feeds provided by http://www.astrology.com/rss

Just with the daily horoscopes you get dailyoverview, dailyquickie, dailyextended, dailyastroslam (one of my favorites), dailysinglelove, dailycoupleslove, dailyflirt, dailyteenhoroscope, dailybeautyscope, dailybabyscope, dailycatscope, dailydogscope and dailyhomeandgarden, which all use a similar feed format, while dailygayscope, dailyfoodscope, dailylesbianscope, dailyfinancescope, dailygreenscope and dailymomscope use a different layout, which the script can probably be adjusted for. There are also weekly and monthly feeds (although, luckily, fewer than the daily ones).

The reason I'm asking here first is to see if there is any interest in such a script and, secondly, because maybe people should choose what they want in there first.

pogue · Post by **pogue** » Sun Jan 17, 2010 8:39 pm

Trixar_za wrote:The reason I'm asking here first is to see if there is any interest in such a script and, secondly, because maybe people should choose what they want in there first.

Sounds like a lot of neat functionality, but I don't think personally I would use it. I just use the astrology script kind of as a laugh or when no one is talking to start conversations.

I can think of some other cool websites that would be fun to grab RSS feeds and other stuff off of though...

But, thanks for updating this nonetheless!
pogue

achilles1900 · Post by **achilles1900** » Tue Aug 31, 2010 9:44 am

Hi Trixar,

just wanted to say thanks a lot for fixing this, it helped me out today.

Really appreciate your work helping the TCL scripting community. By the way, I want to start learning TCL myself. Can you give me any pointers on where to start?

thanks in advance,

Achilles

Trixar_za · Post by **Trixar_za** » Thu Sep 02, 2010 8:23 pm

If you really want my advice about this, then I would suggest reading this article about TCL: http://antirez.com/articoli/tclmisunderstood.html

Next try looking at simple scripts and try figuring out how they work. Eggdrop adds a whole host of its own commands and syntax, which takes some getting used to.

Start by figuring out how they work, then try modifying the code to work like you want it to. Complicated scripts that don't follow normal programming logic are best (like some stats scripts). Then take everything you learned from that and write a completely original script from scratch. That's how I did it.

cache · Post by **cache** » Mon Apr 29, 2013 12:44 pm

This horoscope script stopped working. It has been repeating the same old horoscope a few days now but the website is showing new ones each day. Anyone else?

crazyVTr · Post by **crazyVTr** » Mon Apr 29, 2013 1:59 pm

The feeds need to be updated.
I got mine working again by changing:

Code: Select all

    # valid, set our url
    set input(url) "http://feeds.astrology.com/dailyoverview"
    foreach sign [split ${incith::horoscope::en_chinese} " "] {
      if {$input(query) == $sign} {
        set input(url) "http://feeds.astrology.com/dailychinese"
        break
      }
    }

to this

Code: Select all

    # valid, set our url
    set input(url) "http://www.astrology.com/horoscopes/daily-horoscope.rss"
    foreach sign [split ${incith::horoscope::en_chinese} " "] {
      if {$input(query) == $sign} {
        set input(url) "http://www.astrology.com/horoscopes/daily-chinese.rss"
        break
      }
    }