A simple tcl to pull a word origin from etymonline.com

manipulativeJack
Voice
Posts: 13
Joined: Tue Feb 17, 2009 9:52 pm

Post by manipulativeJack »

My channel often discusses the meaning of words and where they come
from. I refer to www.etymonline.com daily and was thinking it would be
great to have a script that offers the word origin information:


<Jack> !origin cat
<Bot> cat O.E. (c.700), from W.Gmc. (c.400-450), from P.Gmc. *kattuz, from L.L. cattus. The near-universal European word now, it appeared in Europe as L. catta (Martial, c.75 C.E.), Byzantine Gk. katta (c.350) and was in general use on the continent by c. 700, replacing L. feles. Probably ult. Afro-Asiatic (cf. Nubian kadis, Berber kadiska, both meaning "cat"). Ar. qitt "tomcat" may be from the same source. Cats were domestic in Egypt from c.2000 B.C.E., but not a familiar household animal to classical Greeks and Romans. The nine lives have been proverbial since at least c.1562. Extended to lions, tigers, etc. 1607. As a term of contempt for a woman, from c.1225. Slang sense of "prostitute" is from at least 1401. Slang sense of "fellow, guy," is from 1920, originally in U.S. Black Eng.; narrower sense of "jazz enthusiast" is recorded from 1931. Catcall first recorded 1659; catnap is from 1823; catfish is from 1620; catwalk is from 1917. Cat's-cradle is from 1768. Cat-o'-nine-tails (1695), probably so called in reference to its "claws," was legal instrument of punishment in British Navy until 1881. Cat's paw (1769, but cat's foot in the same sense, 1597) refers to old folk tale in which the monkey tricks the cat into pawing chestnuts from a fire; the monkey gets the nuts, the cat gets a burnt paw. To rain cats and dogs (c.1652) is probably an extension of cats and dogs as proverbial for "strife, enmity" (1579). Cat-witted "small-minded, obstinate, and spiteful" (1673) deserved to survive. For Cat's meow, cat's pajamas, see bee's knees.


...and maybe some kind of option to limit how much text it sends, so you could have it output only two or three lines' worth, followed by "Visit http://www.etymonline.com/index.php?sea ... hmode=none for the full info!"

If no one seems interested I will keep giving it a try on my own, but advice would be great also! Thanks!
arfer
Master
Posts: 436
Joined: Fri Nov 26, 2004 8:45 pm
Location: Manchester, UK

Post by arfer »

It's difficult to know how to deal with the HTML without knowing all the possible tags that are returned. Anyway, let's call this a first effort.

Only deals with a single word. You didn't say if you ever put phrases in.

In the partyline, requires .chanset #channelname +etymology

Requires Tcl 8.4+ (anything less probably won't deal correctly with the regsub commands)

Requires text embellishments (colour) to be allowed in the channel

vEtymologyMaxWords is the number of words from the html response that will appear in the channel output. The total number of characters in these words MUST be less than that permitted in a single output for the network you are on. I have assumed around 7 characters per word x the configured value of 40 words = 280 (this is considerably less than the max character output permitted on DALnet).
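
As a rough illustration of that word and character budget, here is a minimal sketch. The helper name truncateWords and the maxChars argument are hypothetical and are not part of the script below; the sketch only shows one way to cap the word count first and then trim further if the result is still too long for one line.

```tcl
# Hypothetical helper (an assumption for illustration, not part of the script):
# keep at most maxWords words, then drop trailing words until the joined
# text is no longer than maxChars characters.
proc truncateWords {text maxWords maxChars} {
  set words [lrange [split $text] 0 [expr {$maxWords - 1}]]
  while {[llength $words] > 0 && [string length [join $words]] > $maxChars} {
    set words [lrange $words 0 end-1]
  }
  return [join $words]
}

# Example: at most 4 words and at most 20 characters.
puts [truncateWords "cat O.E. (c.700), from W.Gmc. (c.400-450)" 4 20]
```

A safe value for maxChars depends on the network. An IRC line is limited to 512 bytes including the prefix, the command and the trailing CR-LF, so the usable message length is noticeably lower than that.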

Code:

set vEtymologyMaxWords 40

package require http

setudef flag etymology

set vEtymologyTimeout 10
set vEtymologyUrl http://www.etymonline.com/index.php

bind PUB - !etymology pEtymologySearch

proc pEtymologySearch {nick uhost hand channel txt} {
  global vEtymologyTimeout vEtymologyUrl vEtymologyMaxWords
  if {[channel get $channel etymology]} {
    set arguments [string trim $txt]
    if {[llength [split $arguments]] == 1} {
      set agent [::http::config -useragent "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"]
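      # Build the query from the trimmed word and URL-encode it, so stray whitespace or special characters cannot break the request
      if {![catch {set http [::http::geturl ${vEtymologyUrl}?[::http::formatQuery search $arguments] -timeout [expr {$vEtymologyTimeout * 1000}]]}]} {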
        switch -- [::http::status $http] {
          "timeout" {putserv "PRIVMSG $channel :attempt to scrape $vEtymologyUrl timed out after $vEtymologyTimeout seconds"}
          "error" {putserv "PRIVMSG $channel :attempt to scrape $vEtymologyUrl returned error [::http::error $http]"}
          "ok" {
            switch -- [::http::ncode $http] {
              200 {
                regexp -- {<dd class=\"highlight\">(.+?)</dd>} [::http::data $http] -> data
                if {[info exists data]} {
                  set data [regsub -all -- {<span class=\"foreign\">([^<]*?)</span>} $data "\00305\\1\003"]
                  set data [regsub -all -- {\(see.*<a href=\"/index\.php([^"]*?)\".*\)} $data "\[see \00312$vEtymologyUrl\\1\003\]" ]
                  set data [regsub -all -- {\(([^)]*?)\)} $data \(\00314\\1\003\)]
                  set char \00303[encoding convertto utf-8 \u25CF]\003
                  set data "\00304$txt\003 [join [lrange [split $data] 0 [expr {$vEtymologyMaxWords - 1}]]] [string repeat $char 5]"
                  putserv "PRIVMSG $channel :$data"
                  putserv "PRIVMSG $channel :$vEtymologyUrl\?search=$txt"
                } else {putserv "PRIVMSG $channel :$txt not found at $vEtymologyUrl"}
              }
              default {putserv "PRIVMSG $channel :attempt to scrape $vEtymologyUrl returned ncode [::http::ncode $http]"}
            }
          }
        }
        ::http::cleanup $http
      } else {putserv "PRIVMSG $channel :attempted connection to $vEtymologyUrl failed"}
    } else {putserv "PRIVMSG $channel :correct syntax is !etymology <word>"}
  }
  return 0 
}
[Image: screenshot of the script's channel output]

Don't you just love colours!! WHOOPEE!!!
I must have had nothing to do
manipulativeJack
Voice
Posts: 13
Joined: Tue Feb 17, 2009 9:52 pm

Post by manipulativeJack »

Oh, _so_ nice.

It looks perfect in your screen shot, _exactly_ what I was thinking.

However, when I try to implement it myself it gives me some problems. When I copy and paste it, it looks like some of the lines have wrapped. I went in and tried to make it look right again, but for some reason when I type:

!etymology cat

I just get the URL

<Jack> !etymology cat
<MyBot> http://www.etymonline.com/index.php?search=cat

I will keep playing with it and let you know if I get it right - I am SURE it is something I did on my end (with the pasting) because it looks perfect in your screenshot.

Thanks in advance.
arfer
Master
Posts: 436
Joined: Fri Nov 26, 2004 8:45 pm
Location: Manchester, UK

Post by arfer »

You really should use a decent quality text editor that won't wrap lines. Random line breaks in the script will make a mess for sure.

If you definitely had spurious line breaks, scrap it and start again. Don't 'play' with it.

Editpad Lite seems OK and is free.

I use ActiveState's Komodo Edit mostly (also free) because it checks syntax as I'm coding, though the current version requires the Tcl Linter to be installed separately which isn't exactly intuitively easy.

One other thing to check is your Tcl version. It should be 8.4+. I don't think 8.3 will correctly interpret the regsub commands I'm using. However, such an event would yield a partyline Tcl error.
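
For anyone unsure how to check the version, a minimal sketch follows. It uses only built-in Tcl commands and can be run in tclsh on the shell. On an Eggdrop where the owner-only .tcl partyline command has not been disabled in the config (an assumption about the setup), ".tcl info patchlevel" should report the same information.

```tcl
# Report the interpreter version and whether it meets the 8.4 minimum
# mentioned above. Inside Eggdrop, putlog would replace puts.
set version [info tclversion]
if {[package vcompare $version 8.4] >= 0} {
  puts "Tcl $version: new enough for the script"
} else {
  puts "Tcl $version: older than 8.4"
}
```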
manipulativeJack
Voice
Posts: 13
Joined: Tue Feb 17, 2009 9:52 pm

Post by manipulativeJack »

Well, I was just copying what was displayed on the forum screen here and pasting it into the editor on my shell (pico/nano). The editor does not wrap lines, but it looks like they are wrapped here on the forum ... unless I am just seeing things wrong?
arfer
Master
Posts: 436
Joined: Fri Nov 26, 2004 8:45 pm
Location: Manchester, UK

Post by arfer »

You might be right. They aren't wrapped on my screen but I'm at a high resolution.

http://www.nomorepasting.com/getpaste.php?pasteid=24822

Try that one, though I doubt if the text box on this forum adds line breaks.

You might want to try a Windows-based editor (assuming you use Windows) and then upload the script by FTP. Command line editors are pretty basic by comparison.

One other very important thing. If you exceed the characters allowed on your network OR the channel doesn't allow text embellishment (+c on DALnet), then in either case there will be no output at all. I say this because the URL line is not in colour and has relatively few characters, and you do see that line. My 50 pence each way is on this being the problem.
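
If the channel mode cannot be changed, one possible workaround is to strip the colour codes before sending. The sketch below is an assumption about how that could be done; the helper name stripCodes is hypothetical and is not part of the posted script. It removes mIRC-style colour codes (\003 followed by optional foreground and background numbers) and the common bold, reset, reverse and underline control characters.

```tcl
# Hypothetical helper (not part of the posted script): remove mIRC-style
# formatting codes so the text can be sent to a channel that blocks colour.
proc stripCodes {text} {
  # \003 optionally followed by a foreground number and ",background"
  regsub -all -- "\003\[0-9\]{0,2}(,\[0-9\]{1,2})?" $text "" text
  # bold (\002), reset (\017), reverse (\026), underline (\037)
  regsub -all -- "\[\002\017\026\037\]" $text "" text
  return $text
}

puts [stripCodes "\00304cat\003 (\00314c.700\003)"]
```

In the script, the call would wrap the text just before it goes out, for example putserv "PRIVMSG $channel :[stripCodes $data]".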
manipulativeJack
Voice
Posts: 13
Joined: Tue Feb 17, 2009 9:52 pm

Post by manipulativeJack »

Hmm, I am not sure what I am doing wrong.

I still get the same thing, just the URL displaying.

My admin said he is using the newest version of Tcl and I am not seeing any partyline errors. (I am not sure how to check the Tcl version myself.)

I am using egghttp.tcl - is that what you use?
arfer
Master
Posts: 436
Joined: Fri Nov 26, 2004 8:45 pm
Location: Manchester, UK

Post by arfer »

Nope, the script uses http.tcl

You would get an error from the line 'package require http' if it wasn't available.

Did you read my last post? Colour?? Characters ??
spithash
Master
Posts: 248
Joined: Thu Jul 12, 2007 9:21 am
Location: Libera

Post by spithash »

This script is just awesome :D

I tweaked it a little bit to suit my needs and it's just GREAT :)

Thanks Arfer :)
Libera ##rtlsdr & ##re - Nick: spithash