
UNOFFICIAL incith-google 2.1x (Nov 30, 2012)

Support & discussion of released scripts, and announcements of new releases.
User avatar
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

spithash wrote: OK, adding an extra } fixed it, but it's not working at all now.

Code: Select all

      # parse the html
      while {$results < $incith::google::youtube_results} {
        # somewhat extenuated regexp due to allowing that there might be an image next to the title
        if {[regexp -nocase {<span class="video-time">(.*?)</span.*?href="/watch\?v=(.+?)".+?title=".+?">(.+?)</a>.*?id="video\-description.*?>(.*?)</p.*?class="date\-added">(.+?)</span.*?class="viewcount">(.+?)</span} $html - ded4 cid desc ded ded2 ded3]} {
          if {[string match "*</span>*" $desc]} {
            regexp -nocase {<span class="video-time">(.*?)</span.*?href="/watch\?v=(.+?)".+?title="(.*?)">.*?id="video\-description.*?">(.*?)</p.*?class="date\-added">(.+?)</span.*?class="viewcount">(.+?)</span} $html - ded4 cid desc ded ded2 ded3
          }
          regsub -nocase {<span class="img">.*?</div>      </div>} $html "" html
        }
      }
my script looks like this ^

But after I do a !youtube search, it just shows/does nothing :?: :!:

EDIT: the bot ping-timeouts and never comes back after searching.
Buyer beware: you can't guess at how to fix it. Your bot is endlessly looping in that while, forever... It is harder than one thinks to alter a script and have it function correctly, isn't it? Yes. In this case it is...

Why? Because you've merely changed the scrape, not the scrub as well. That isn't an inline regexp you see. That is your plain-jane, ordinary, regular one that will continue to match. There is a corresponding scrubber (in this case, the regsub below) that goes hand in hand with this type of scraping method. If the regsub cannot scrub away the text the regexp just matched, the regexp will keep matching the exact same part of the page. Forever. I didn't make it this way; it was made this way originally by incith. Here is how you should alter that regsub to fix the scrub and that nasty endless looping. Change the regsub below:

Code: Select all

regsub -nocase {<span class="img">.*?</div>      </div>} $html "" html
"<span class="img>" and "</div> </div>" used to encapsulate each item. It no longer does, this will also need correcting. Hopefully this weekend I'll have a correct fix for this soon, until then try changing that above regsub.. to this:

Code: Select all

regsub -nocase {<span class="video-time">.*?</span.*?href="/watch\?v=.+?".+?title=".+?">.+?</a>.*?id="video\-description.*?>.*?</p.*?class="date\-added">.+?</span.*?class="viewcount">.+?</span} $html "" html
This is a complex scrubber that wastes clock cycles, but it will have to do until I get around to fixing it properly. See if this works.
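To see why the scrub matters, here is a minimal, self-contained sketch of the scrape/scrub pattern (toy HTML and variable names, not youtube's markup or the incith-google code): the regexp scrapes one item, then a regsub with the same pattern must remove that item from the page text, or the while loop can never advance.

```tcl
# Minimal sketch of the scrape/scrub loop pattern (illustrative only).
set html {<li>alpha</li><li>beta</li><li>gamma</li>}
set items {}
while {[regexp {<li>(.*?)</li>} $html -> item]} {
    lappend items $item
    # Scrub with the SAME pattern that scraped; if this regsub matched
    # nothing, the next regexp would re-match "alpha" forever.
    regsub {<li>.*?</li>} $html "" html
}
puts $items
```

If the regsub's pattern drifts out of sync with the regexp's (as happened here when only the scrape was updated), the scrub removes nothing and the loop spins forever, which is exactly the ping-timeout behavior described above.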
neocratic
Voice
Posts: 15
Joined: Sun May 16, 2010 11:59 am

Re: About !g time country

Post by neocratic »

speechles wrote: There will be shortly. It's just that this script needs a serious slice of continuous time; it can't be short bursts of 15 minutes here and there. This weekend I will have that time to eliminate some of the problems that have resurfaced over time: google time, wikipedia, wikimedia, youtube, etc. These all have issues in one way or another. While fixing these I will likely find even more issues and correct those along the way. This is why I tend to let things stack up before releasing a fix: I want to evolve the script forward, correcting long-standing issues (like no bold in results when utf-8 patched), inconsistent encodings, etc. These are the things that in the long run create a better end product. Rushing to fix regex parsing bugs is a short-term fix with no evolution, to me.

Suffice it to say, you don't need to read any of that diatribe above if you don't want to. It's just words. But expect a new version of this script this weekend; it will most assuredly correct the "time" problem you are experiencing. :)
Thanks a lot for the reply, I now understand what you are trying to tell me. I will be waiting for the next version update :)
User avatar
spithash
Master
Posts: 248
Joined: Thu Jul 12, 2007 9:21 am
Location: Libera
Contact:

Post by spithash »

speechles: do you have the youtube fix somewhere uploaded?

I must have done something wrong, I don't know, but I tried what you said in your previous post without any luck =/

I see sp33chy is working great though.. 8)
Libera ##rtlsdr & ##re - Nick: spithash
Click here for troll.tcl
bfoos
Voice
Posts: 6
Joined: Thu Sep 30, 2010 6:17 pm

Post by bfoos »

spithash wrote:speechles: do you have the youtube fix somewhere uploaded?

I must have done something wrong, I don't know, but I tried what you said in your previous post without any luck =/

I see sp33chy is working great though.. 8)
http://forum.egghelp.org/viewtopic.php?p=95867#95867
User avatar
spithash
Master
Posts: 248
Joined: Thu Jul 12, 2007 9:21 am
Location: Libera
Contact:

Post by spithash »

bfoos wrote:!yt was more broken than that. A better temporary solution is to set...

variable youtube_results 0

Then add...

"yt:g:site:youtube.com %search%"

Under Custom Trigger Phrasing.

speechles is due to address this issue amongst others in an upcoming update.
Actually, to be honest, it was my fault for not reading that post..

It worked great after doing so.. :)
Thanks!
User avatar
pogue
Voice
Posts: 28
Joined: Sun May 17, 2009 3:56 am
Contact:

Post by pogue »

I'm seeing some problems with the wikipedia lookup now. I attempted to set up debugging to see if there was any error, but nothing was sent to me.

Here is the query, all queries produce the same result:
[12:58am] <~pogue> !wiki suez canal
[12:58am] <+BodyBuildingBot> Jump to: navigation, search
I am using 2.0.0a

Info on the bot:
I am BodyBuild, running eggdrop v1.6.19: 13 users (mem: 841k).
Online for 1 day, 00:54 (background) - CPU: 00:06 - Cache hit: 4.0%
Admin: Kelso
Config file: bbbot.conf
OS: Linux 2.6.18-194.17.1.el5
Tcl library: /usr/share/tcl8.4
Tcl version: 8.4.13 (header version 8.4.13)
Tcl is threaded.
Here is the full text of the script I'm using (the only alterations are in the options section at the beginning):
http://tcl.pastebin.com/8gd9GE3R

Help would be appreciated!

Thanks,
pogue
Helpful Tools:
  • Notepad++: Windows Text Editor with TCL Syntax Highlighting
  • Pastebin TCL: For easy script collaboration
User avatar
spithash
Master
Posts: 248
Joined: Thu Jul 12, 2007 9:21 am
Location: Libera
Contact:

Post by spithash »

They changed Wikipedia's website; that's why you get this error.
User avatar
spithash
Master
Posts: 248
Joined: Thu Jul 12, 2007 9:21 am
Location: Libera
Contact:

Post by spithash »

OK, speechles fixed the wiki and added the temporary youtube fix in a kinda-working pre-release, so I thought it would be great to share it with you.

Make sure you set the script up yourselves, because I have edited it the way I have it loaded.

NOTE: MY BOT IS UTF-8 PATCHED, SO YOU NEED TO CHANGE THOSE SETTINGS BACK TO DEFAULT (SEE A PREVIOUS RELEASE OR SOMETHING)

Code: Select all

 variable dirty_decode 1

    # enable gzip compression for bandwidth savings? Keep in mind
    # this semi-breaks some of the present utf-8 work-arounds, and
    # eggdrop may mangle encodings when gzip compression is used
    # that it doesn't mangle when uncompressed html is used
    # (default). A setting of 0 means uncompressed html, a 1 or
    # higher means gzip.
    # ------
    # NOTE: If you do not have the Trf or zlib packages, setting
    # this to 0 is recommended. Leaving it at 1 is fine as well, as
    # the script will attempt to find these commands or packages on
    # every rehash or restart. But to keep gzip from ever being
    # used, it is best to set the variable below to 0.
    # NOTE2: If you have the Trf or zlib packages present, then this
    # should always be set to 1. You save enormous bandwidth and
    # time using it. If your bot is patched and you have Trf/zlib,
    # then you should definitely leave this at 1 and you will never
    # suffer issues.
    # ------
    variable use_gzip 0

    # THIS IS TO BE USED TO DEVELOP A BETTER LIST FOR USE BELOW.
    # To work around certain encodings, it is now necessary to give
    # the public a way to troubleshoot some parts of the script on
    # their own. Using these features involves the two settings below.
    # -- DEBUG INFORMATION GOES BELOW --
    # set debug and administrator here
    # this is used for debugging purposes
    # ------
    variable debug 1
    variable debugnick spithashhh

    # AUTOMAGIC
    # with this set to 1, the encode_strings setting at the bottom
    # becomes irrelevant. This makes the script follow the charset
    # encoding the site tells the bot it is using.
    # This DOES NOT affect wiki(media/pedia); those will not encode
    # automatically and still require the encode_strings section below.
    # ------
    # NOTE: If your bot is utf-8 patched, leave this option at 1; the
    # only time to change it to 0 is if you're having rendering problems.
    # ------
    variable automagic 1

    # UTF-8 Work-Around (for eggdrop, this helps automagic)
    # If you use automagic above, you may find that utf-8 charsets are
    # being mangled. To keep the ability to use automagic, yet when
    # utf-8 is the charset automagic detects, this makes the script
    # instead follow the settings for that country in the
    # encode_strings section below.
    # ------
    # NOTE: If your bot is utf-8 patched, set this to 0. Everyone
    # else, use 1.
    # ------
    variable utf8workaround 0
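As a side note, the gzip comments above say the script probes for the Trf or zlib packages on every rehash or restart before honoring use_gzip. A hypothetical sketch of such a probe (the proc name and exact checks are illustrative, not the script's own code):

```tcl
# Illustrative probe for gzip capability, as the config comments
# describe: look for Tcl 8.6's built-in zlib command, then fall
# back to trying the Trf extension package.
proc gzip_capable {} {
    if {[llength [info commands zlib]]} { return 1 }   ;# Tcl 8.6+ core zlib
    if {![catch {package require Trf}]} { return 1 }   ;# Trf extension
    return 0                                           ;# no gzip support found
}
```

With a probe like this, use_gzip can stay at 1 safely: the script simply falls back to uncompressed html when neither facility is present.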

So anyway, speechles is way too busy to make a complete release, but soon enough he will get back to it.

Until then, play with this one, here it is:

http://bsdunix.info/spithash/nagger/inc ... EMPfix.tcl
User avatar
pogue
Voice
Posts: 28
Joined: Sun May 17, 2009 3:56 am
Contact:

Post by pogue »

spithash wrote: OK, speechles fixed the wiki and added the temporary youtube fix in a kinda-working pre-release, so I thought it would be great to share it with you.
Thanks spithash & speechles!
Mabus4444
Halfop
Posts: 51
Joined: Mon Oct 30, 2006 7:40 pm

Post by Mabus4444 »

2.0 fixes the youtube problem, but the wiki problem still isn't fixed for me. I get this error in the console:

Tcl error [incith::google::public_message]: Unknown option -urlencoding, must be: -accept, -proxyfilter, -proxyhost, -proxyport, -useragent
User avatar
Trixar_za
Op
Posts: 143
Joined: Wed Nov 18, 2009 1:44 pm
Location: South Africa
Contact:

Post by Trixar_za »

Probably because you're using an older version of http.tcl; all you need is to find a newer copy of it and load it before this script. You can grab my copy @ http://www.trixarian.za.net/downloads/http.tcl
Mabus4444
Halfop
Posts: 51
Joined: Mon Oct 30, 2006 7:40 pm

Post by Mabus4444 »

I'm using http.tcl version 2.5.2

I tried loading your copy instead, and restarted the bot. Same error message.
User avatar
spithash
Master
Posts: 248
Joined: Thu Jul 12, 2007 9:21 am
Location: Libera
Contact:

Post by spithash »

Here is a newer version/update of http.tcl

http://bsdunix.info/spithash/nagger/http.tcl
Mabus4444
Halfop
Posts: 51
Joined: Mon Oct 30, 2006 7:40 pm

Post by Mabus4444 »

Thanks for the updated version.

The problem persists, however; I tried a rehash and a full restart to no avail. I get the following message in the console:

Tcl error [incith::google::public_message]: Unknown option -urlencoding, must be: -accept, -proxyfilter, -proxyhost, -proxyport, -useragent
User avatar
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

Mabus4444 wrote:Thanks for the updated version.

The problem persists, however; I tried a rehash and a full restart to no avail. I get the following message in the console:

Tcl error [incith::google::public_message]: Unknown option -urlencoding, must be: -accept, -proxyfilter, -proxyhost, -proxyport, -useragent
My good sir, the answer is simple. The answer is clear, the answer is close, and the answer is here:

Code: Select all

set http [::http::config -useragent $ua -urlencoding "utf-8"]
Change that, to look like this...

Code: Select all

set http [::http::config -useragent $ua]
Also, there might be one or two of these to change:

Code: Select all

set http [::http::config -useragent $ua -urlencoding "utf-8"]
To look like this:

Code: Select all

set http [::http::config -useragent $ua]
Done... Ready for more?
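As an aside, an alternative to deleting the option everywhere is to guard it by package version. This is a sketch under the assumption that the stock Tcl http package only accepts -urlencoding from roughly version 2.5 onward (older versions reject it with exactly the error quoted above); the $ua value here is a made-up placeholder for the script's real user-agent string:

```tcl
package require http

# Placeholder user-agent; the actual script builds its own $ua.
set ua "Mozilla/5.0 (compatible; example-bot)"

# Only pass -urlencoding when the loaded http package is new enough
# to understand it; otherwise fall back to the plain call.
if {[package vcompare [package provide http] 2.5] >= 0} {
    ::http::config -useragent $ua -urlencoding "utf-8"
} else {
    ::http::config -useragent $ua
}
```

This way one copy of the script runs on both old and new http.tcl installs, instead of needing hand edits per bot.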

Now, before you begin applying all those changes by hand: use the version below instead, which already updates and exchanges them for you.

::::: >> @everyone, especially Mabus4444
New Version: Incith:Google v2.0.0b
The version above corrects several small bugs, enhances mediawiki/wikimedia, and now parses wikia sites 100% as well. This brings a plethora of new built-in custom trigger phrases, with literally thousands more for you to design yourself.

Code: Select all

      "fg:wm:.familyguy.wikia.com %search%"
      "ad:wm:.americandad.wikia.com %search%"
      "sp:wm:.southpark.wikia.com %search%"
      "sw:wm:.starwars.wikia.com %search%"
      "na:wm:.naruto.wikia.com %search%"
      "in:wm:.inuyasha.wikia.com %search%"
      "gr:wm:.gremlins.wikia.com %search%"
      "wow:wm:.wowwiki.com %search%"
      "smf:wm:.smurf.wikia.com %search%"
      "sm:wm:.sailormoon.wikia.com %search%"
      "pk:wm:.pokemon.wikia.com %search%"
      "ss:wm:.strawberryshortcake.wikia.com %search%"
      "mlp:wm:.mlp.wikia.com %search%"
      "lps:wm:.lps.wikia.com %search%"
      "ant:wm:.ants.wikia.com %search%"
      "gm:wm:.gaming.wikia.com %search%"
      "nt:wm:.nothing.wikia.com %search%"
      "ff:wm:.finalfantasy.wikia.com %search%"
All of the "custom trigger" phrases above provide short-cuts to long wikimedia names. If you need any explanation of how to construct custom trigger phrases, ask. Very nested and complex trigger combinations are possible which may not be overly apparent to the mere user of this script.
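Purely for illustration, the trigger phrases above all share an alias:engine:site shape followed by a %search% placeholder. The real incith-google parser may work quite differently; the proc name and decomposition below are hypothetical, just to show how such a phrase could be split apart:

```tcl
# Hypothetical decomposition of a custom trigger phrase of the form
# "alias:engine:site %search%" (NOT the script's actual parser).
proc parse_trigger {phrase query} {
    # First whitespace-separated word carries the colon-delimited spec.
    set spec [lindex [split $phrase] 0]
    lassign [split $spec ":"] alias engine site
    # Substitute the user's query for the %search% placeholder.
    set expanded [string map [list %search% $query] $phrase]
    return [list $alias $engine $site $expanded]
}

puts [parse_trigger "fg:wm:.familyguy.wikia.com %search%" "peter griffin"]
```

So "fg" would be the short trigger, "wm" selects the wikimedia engine, and the remainder names the site to search.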

If you experience issues, shout them out. Yes, youtube is still technically broken. It is merely wrapped through google, with some custom trigger-phrasing logic to give the appearance that it works. This _will_ eventually be addressed when time permits. It demonstrates the power of custom trigger phrases and their potential to do wonderful things. So remember: youtube doesn't work. It will soon; until then, investigate the !video or !v trigger, which does work. Everybody forgets about that trigger...