forked: UNOFFICIAL incith-google 2.0.0c (Sep9,2o11)

xREVx · Post by **xREVx** » Sun Feb 05, 2012 10:52 am

oh i just thought you were gone, speechles. glad to hear from you!!

in the mean time, here's a quick fix for google results:

add these lines below the block that starts with the comment "# added because of recent google changes, needed to clean-up *.google.* links":

Code: Select all

regsub -all {%20class=(.*)$} $link { } link
regsub -all {</(.+)>} $desc { } desc
regsub -all {<div(.+)$} $desc { } desc
regsub -all {</(.*)$} $desc { } desc
regsub -all {<a href=(.*)$} $desc { } desc

so your code will look like this:

Code: Select all

# added because of recent google changes, needed to clean-up *.google.* links
if {[string match "*url\?*" $link]} {
regexp -- {url\?q=(.+?)$} $link - link
regexp -- {(.+?)\&sig=} $link - link
regexp -- {(.+?)\&usg=} $link - link
regexp -- {\?url=(.+?)$} $link - link
}

# quick fix
regsub -all {%20class=(.*)$} $link { } link
regsub -all {</(.+)>} $desc { } desc
regsub -all {<div(.+)$} $desc { } desc
regsub -all {</(.*)$} $desc { } desc
regsub -all {<a href=(.*)$} $desc { } desc

this should get us rolling until speechles releases the official fix

before the quick fix:

<~User> !g test
<~Bot> Test.com Web Based Testing and Certification Software v2.0</a></h3><div clas @ http://test.com/%20class=l%20onmousedown=return%20rwt
(this,'','','','1','AFQjCNFOu11ntRBzX7MsPNhB_fDzErp8qg','','0CDQQFjAA',null,event)

after:

<~User> !g test
<~Bot> Test.com Web Based Testing and Certification Software v2.0 @ http://test.com/
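
For anyone who wants to see what the five regsub lines do before pasting them in, here is a small standalone sketch that runs them on the broken link and description from the "before" output above. The two `string trim` calls at the end are an addition of mine (not part of the posted fix), there only because each regsub leaves a space behind.

```tcl
# Sample values copied from the "before" output in this post.
set link {http://test.com/%20class=l%20onmousedown=return%20rwt(this,'','','','1','AFQjCNFOu11ntRBzX7MsPNhB_fDzErp8qg','','0CDQQFjAA',null,event)}
set desc {Test.com Web Based Testing and Certification Software v2.0</a></h3><div clas}

# quick fix, as posted
regsub -all {%20class=(.*)$} $link { } link
regsub -all {</(.+)>} $desc { } desc
regsub -all {<div(.+)$} $desc { } desc
regsub -all {</(.*)$} $desc { } desc
regsub -all {<a href=(.*)$} $desc { } desc

# Assumption: trimming is added here only to drop the leftover spaces.
set link [string trim $link]
set desc [string trim $desc]

puts "$desc @ $link"
```

Run in a plain tclsh, this should print the same text as the "after" line above.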

Arkadietz · Post by **Arkadietz** » Sun Feb 05, 2012 3:52 pm

10x a lot xREVx

xREVx · Post by **xREVx** » Sun Feb 05, 2012 7:35 pm

np mate!

just added another line of code on my previous post, check it out

tommytom · Post by **tommytom** » Thu Feb 23, 2012 2:02 am

I wasn't having that problem, but I will add it anyways in case it comes up (can't stand when HTML is spewed all over IRC like that).

I need to get back to fixing these things. I was on 1.99 (and fixed it) for so long I still need to understand how this one works.

I really want weather working, but I did it the wrong way the last time. I meant to put more time up for this script myself but haven't, so I know where speechles is coming from.

tommytom · Post by **tommytom** » Thu Feb 23, 2012 4:15 am

normal search fix
search for this around line 1647 (mine might vary)

Code: Select all

# regular search

and then modify this:

Code: Select all

        # regular search
        } else {
          if {![regexp -- {class=g(?!b).*?<a href="(.+?)".*?>((?!<).+?)</a>} $html - link desc]} {
            if {[regexp -- {class=r.*?<a href="(.+?)".*?>((?!<).+?)</a>} $html - link desc ]} {
              regsub -- {class=r.*?<a href=".+?".*?>(?!<).+?</a>} $html "" html
            }
          } else {
            regsub -- {class=g(?!b).*?<a href=".+?".*?>.+?</a>} $html "" html

To this:

Code: Select all

        # regular search
        } else {
          if {![regexp -- {class="?g(?!b).*?<a href="(.+?)".*?>((?!<).+?)</a>} $html - link desc]} {
            if {[regexp -- {class="?r.*?<a href="(.+?)".*?>((?!<).+?)</a>} $html - link desc ]} {
              regsub -- {class="?r.*?<a href=".+?".*?>(?!<).+?</a>} $html "" html
            }
          } else {
            regsub -- {class="?g(?!b).*?<a href=".+?".*?>.+?</a>} $html "" html

Alternatively, you can manually find/replace all instances of:

Code: Select all

class=r

with:

Code: Select all

class="?r"?

and do the same for "g" or any others that might vary. Please note this won't work in the wildcard-only "match" parts. This only works in regex strings!

The single ? means the character to its left may or may not exist. It will match either case, and at most one character can fit in there (unlike .*?). This will stop parts of the script from breaking when the quotes around the class id are introduced or removed. I strongly suggest doing it to all instances of "r", but I'm not doing it just yet as I am still troubleshooting other parts and don't want to make regressions elsewhere.
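
A minimal sketch of the optional-quote idea, using two made-up fragments (they are illustrations, not captured Google markup):

```tcl
# Hypothetical fragments: the same heading with and without quotes around the class.
set unquoted {<h3 class=r>}
set quoted   {<h3 class="r">}

# "? makes each quote optional, so one pattern covers both forms.
set pattern {class="?r"?>}

puts [regexp -- $pattern $unquoted]   ;# 1
puts [regexp -- $pattern $quoted]     ;# 1

# The old pattern only matches the unquoted form.
puts [regexp -- {class=r>} $quoted]   ;# 0
```

One caveat worth testing before relying on this: Tcl's re_syntax documentation says a branch takes its greedy or non-greedy preference from the first quantified atom in it, so a leading "? (greedy) in front of later .*? parts may change how much the overall pattern matches.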

Before:

Code: Select all

<~TommyTom> !g average penis length
<~TTBot> 1,410,000 results
<~TommyTom> !g test
<~TTBot> 3,410,000,000 results | Test Your Awareness: Do The Test - YouTube @ http://www.youtube.com/watch?v=Ahg6qcgoay4
<~TommyTom> !g test pdf
<~TTBot> 1,820,000,000 results

As you can see, you get no results, or only one (usually videos, it seems).

After:

Code: Select all

<~TommyTom> !g average penis length
<~TTBot> 1,410,000 results | Human penis size - Wikipedia, the free encyclopedia <di @ http://en.wikipedia.org/wiki/Human_penis_size | Human penis size - Wikipedia, the free encyclopedia <di @ http://en.wikipedia.org/wiki/Human_penis_size | Human penis size - Wikipedia, the free encyclopedia <di @ http://en.wikipedia.org/wiki/Human_penis_size
<~TommyTom> !g test
<~TTBot> 3,410,000,000 results | Test.com Web Based Testing and Certification Software v2.0 @ http://test.com/ | Test.com Web Based Testing and Certification Software v2.0 @ http://test.com/ | Test.com Web Based Testing and Certification Software v2.0 @ http://test.com/
<~TommyTom> !g test pdf
<~TTBot> 1,820,000,000 results | [PDF] PDF Test Page www.educati @ http://www.education.gov.yk.ca/pdf/pdf-test.pdf | [PDF] PDF Test Page www.educati @ http://www.education.gov.yk.ca/pdf/pdf-test.pdf | [PDF] PDF Test Page www.educati @ http://www.education.gov.yk.ca/pdf/pdf-test.pdf

I think there are some truncated <div>s in there, so they don't get stripped out. Probably need some cleanup code BEFORE the desc truncation.
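
A rough sketch of what "cleanup before truncation" could look like. The description text and the `maxLen` variable are placeholders of mine; the real script's variable names and length limit may differ.

```tcl
# Hypothetical description that still contains markup (not real Google output).
set desc {Test.com Web Based Testing</a></h3><div class="s">Certification Software</div>}
# Assumption: maxLen stands in for whatever length limit the script uses.
set maxLen 30

# 1. Replace every complete tag with a space.
regsub -all {<[^>]*>} $desc { } desc
# 2. Collapse runs of whitespace and trim the ends.
regsub -all {\s+} $desc { } desc
set desc [string trim $desc]
# 3. Only now truncate, so no tag can be cut in half.
set desc [string range $desc 0 [expr {$maxLen - 1}]]

puts $desc
```

Doing the tag stripping first means the truncation can never leave a half-open "<di" behind, which is what seems to be happening in the output above.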

=================
Time fix

~line 1512
find:

Code: Select all

        # time:
        } elseif {[string match "*src=\"http://www.google.com/chart*chc=localtime*" $html] == 1} { 
          regexp -nocase -- {src="http://www.google.com/chart\?chs=.*?chc=localtime.*?><td valign=[a-z]+>(.+?)</table>} $html - desc
          regsub -- {<br>} $desc ". " desc
          regsub -all {<.*?>} $desc "" desc
          regsub -- {chc=localtime} $html {} html

replace with:

Code: Select all

        # time:
        } elseif {[string match "*class=\"g tpo\"*class=\"s rbt\"*class=obcontainer*" $html] == 1} { 
          regexp -nocase -- {class="g tpo".*?class="s rbt".*?class=obcontainer.*?<table.*?<td.*?>(.+?)</table>} $html - desc
          regsub -- {<br>} $desc ". " desc 
          regsub -all {<.*?>} $desc "" desc
          regsub -- {class="g tpo".*?class="s rbt".*?class=obcontainer.*?<table.*?<td.*?>.+?</table>} $html {} html

Before:

Code: Select all

<~TommyTom> !g time in new york
<~TTBot> 11,400,000,000 results | Current time in New York, United States - daylight savings time 2012 ... @ http://24timezones.com/world_directory/current_new_york_time.php | Current time in New York, United States - daylight savings time 2012 ... @ http://24timezones.com/world_directory/current_new_york_time.php | Current time in New York, United States - daylight savings time 2012 ... @
<~TTBot> http://24timezones.com/world_directory/current_new_york_time.php

After:

Code: Select all

<~TommyTom> !g time in new york
<~TTBot> 3:38am Thursday (EST) - Time in New York, NY

Since it's plain-text now (no URLs or images), I removed the cleanup code.

Be careful with this one because I don't know if those class IDs will change or get reused. I tried to match on "time in" as well, even with the bold tags, but it wouldn't match, so it's not as solid a match as I would like. Wish they had put some kind of image URL to match on...

======
Going to bed now. Was looking into the weird "apple" search result (probably has to do with it showing the map of an apple store) and also the "define:" area. Need to figure out what match is being triggered for apple (probably just going into the wrong area because of all the "answers" and the ad(s)). It would be extremely helpful if I could see what "define:" output should look like (old logs or old posts, if anyone has any), as I don't quite get the code in there and don't recall what it looks like (plus, it's been broken since I found this script, so I dunno if it's changed).

Edit:
Fixed the regsub in time: to allow to get "long answer" if you have that option set (default is short).

Trixar_za · Post by **Trixar_za** » Thu Feb 23, 2012 9:59 am

The trouble with the replacement of class=r and class=g is that using ? like that only works with regex and not string matches (which use glob). That could potentially break the script. The second problem is that some of the regexes search for the whole thing, so class="?r> wouldn't work; it needs to be class="?r"?>. Just a thought

Well done on the rest of it though
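
Both points can be checked in a tclsh with a made-up fragment (an illustration, not real Google markup):

```tcl
# Hypothetical fragment with a quoted class attribute.
set html {<h3 class="r">}

# Glob (string match): ? means "exactly one character", not "optional".
puts [string match {*class="?r*} $html]      ;# 0
# Regex: ? makes the preceding quote optional.
puts [regexp -- {class="?r} $html]           ;# 1

# Without the trailing "? a pattern that ends in > still fails on quoted input.
puts [regexp -- {class="?r>} $html]          ;# 0
puts [regexp -- {class="?r"?>} $html]        ;# 1
```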

tommytom · Post by **tommytom** » Thu Feb 23, 2012 3:51 pm

Yes, you are right. That's why I didn't do it. What I really meant was "add it manually/automatically to each one, properly"; I assumed the "properly" part and didn't check into it. I realized the "?r" situation later (I will go strike out my previous post).

You could probably build a regex s&r in certain editors, but that's still asking for trouble. If it ain't broke, don't fix it. I guess just add the "?r"? type stuff when it comes up (a new one that changed and got broken by this minute problem).

Also, to add to the crap it would have broken doing a s&r, it would break class="r blah". Well, it would still match, but there is no need for the ? in there since it will always have quotes because of the space (AFAIK).

Sooooo many things broken still. "population of", "define:", "apple" (and others like it), etc. Seems like they are all falling into a "short answer" trap and only showing one result or none at all.

I put my old weather fix into my script just to have weather working (bit lackluster and I undid some stuff I shouldn't have, but I gotta learn how it is supposed to look.. no one has posted a log/ss for me to see).

Edit:
I revised it. And you are absolutely correct about the regular wildcard match string (glob?). I used class=r myself and didn't even think of this. Would have broken them.

tommytom · Post by **tommytom** » Fri Feb 24, 2012 12:01 am

"more answers" fix

@line ~1426
Find this:

Code: Select all

        # more answers
        } elseif {[string match "*\{google.rrep('answersrep'*" $html]} {
          regexp -- {<div id=res class=med role=main>.*?<h3 class=r>(.*?)</h3>} $html - desc

Change to:

Code: Select all

        # more answers
        } elseif {[string match "*\{google.rrep('answersrep'*" $html]} {
          regexp -- {class=\"g answers.+?>.+?class=\"?r\"?>(.+?)<} $html - desc

before:

Code: Select all

<~TommyTom> !g shrek release date
(desc variable error in console. No output in IRC.)

after:

Code: Select all

<~TommyTom> !g shrek release date
<~TTBot> Best guess for Shrek Release Date is May 18, 2001

Not sure if it's supposed to be that plain. No bolding. In Opera, the date is bold.

tommytom · Post by **tommytom** » Fri Feb 24, 2012 4:43 pm

I refixed the "time:" section in this post: http://forum.egghelp.org/viewtopic.php?p=98848#98848

Please recopy that part.
I left the stripping in (don't think it's needed) and put that last regsub back in and replaced it with the new regex that finds it.

Didn't realize at the time that the regsub is to strip the match criteria so that the next loop(s) will find the "long answer" search results (normal results, not answers).
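
A small sketch of that extract-then-strip loop, to make the idea concrete. The HTML, the pattern, and the fixed loop count are all stand-ins of mine, not code from incith-google.

```tcl
# Hypothetical HTML with two results (not real Google markup).
set html {<h3 class="r"><a href="http://one.example/">One</a></h3><h3 class="r"><a href="http://two.example/">Two</a></h3>}
set pattern {class="?r"?><a href="([^"]+)"[^>]*>([^<]+)</a>}

set results {}
# Assumption: a fixed count of 3 stands in for the script's result loop.
for {set i 0} {$i < 3} {incr i} {
  if {![regexp -- $pattern $html - link desc]} { break }
  lappend results "$desc @ $link"
  # Remove the first match so the next pass sees the following result.
  # If this pattern ever stops matching, every pass returns the same result.
  regsub -- $pattern $html "" html
}
puts [join $results " | "]
```

The regexp and the regsub use the same pattern here on purpose: when the two drift apart, the stripping step can silently stop working.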

Honestly, I think this is inefficient. Could you not set a $answerFound variable or something so that the second loop will not go into that section?

You could skip all the elseifs as well making the code more optimal.

Each regex/wildcard match update will only require updating that one line, not the regsub at the end as well to strip it. Each of those would just have "set answerFound 1" at the end and never have to edit that part again.

(pseudo code.. don't know TCL that well)

Code: Select all

set answerFound 0
if {!$answerFound} {
  # do answer stuff in here
  # $someAnswerRegex is a placeholder for any one of the answer patterns
  if {[regexp -- $someAnswerRegex $html - desc]} {
    # blah blah
    set answerFound 1
  }
} else {
  # regular search results here
}

Not familiar enough with this code yet to say if that is possible, but it should be with some restructuring.
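
For what it's worth, here is one way the flag idea could look as runnable Tcl. This is only a sketch under my own assumptions: the patterns, the HTML, and the two-pass loop are invented for illustration and do not come from the script.

```tcl
# Hypothetical patterns and HTML; none of this is taken from incith-google itself.
set answerPattern {class="answer">([^<]+)<}
set resultPattern {class="r"><a href="([^"]+)">([^<]+)</a>}
set html {<div class="answer">May 18, 2001</div><h3 class="r"><a href="http://one.example/">One</a></h3>}

set output {}
set answerFound 0
for {set i 0} {$i < 2} {incr i} {
  if {!$answerFound && [regexp -- $answerPattern $html - desc]} {
    lappend output $desc
    # The flag replaces the regsub that used to strip the answer's match criteria.
    set answerFound 1
  } elseif {[regexp -- $resultPattern $html - link desc]} {
    lappend output "$desc @ $link"
    regsub -- $resultPattern $html "" html
  }
}
puts [join $output " | "]
```

The first pass takes the answer branch and sets the flag; the second pass skips it and falls through to the regular results, without having to strip anything from the answer section.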

speechles · Post by **speechles** » Fri Feb 24, 2012 11:49 pm

<~TommyTom> !g average penis length
<~TTBot> 1,410,000 results | Human penis size - Wikipedia, the free encyclopedia <di @ http://en.wikipedia.org/wiki/Human_penis_size | Human penis size - Wikipedia, the free encyclopedia <di @ http://en.wikipedia.org/wiki/Human_penis_size | Human penis size - Wikipedia, the free encyclopedia <di @ http://en.wikipedia.org/wiki/Human_penis_size
<~TommyTom> !g test
<~TTBot> 3,410,000,000 results | Test.com Web Based Testing and Certification Software v2.0 @ http://test.com/ | Test.com Web Based Testing and Certification Software v2.0 @ http://test.com/ | Test.com Web Based Testing and Certification Software v2.0 @ http://test.com/
<~TommyTom> !g test pdf
<~TTBot> 1,820,000,000 results | [PDF] PDF Test Page www.educati @ http://www.education.gov.yk.ca/pdf/pdf-test.pdf | [PDF] PDF Test Page www.educati @ http://www.education.gov.yk.ca/pdf/pdf-test.pdf | [PDF] PDF Test Page www.educati @ http://www.education.gov.yk.ca/pdf/pdf-test.pdf

This is not fixing anything. Notice it's the same 3 results. Instead of clogging my thread with all this FIX-THIS-PART bullshit, please CREATE YOUR OWN THREAD. I plan on fixing this script correctly, myself. Hence this thread should not contain a flood of fixes by anyone else. Does this make it clear? Do we all understand? I've had some work issues, health issues, and other real life things happening in my life. Please take it upon yourself to create yourself a new thread and stop cluttering mine with this junk...

Tommytom, thanks for the effort but it is not correct, breaks the multi-language features, and appears rough and messy. Worst of all, it is not based on the latest google which I have not made public yet. It uses webby's encoders to correct encodings. And it fixes quite a few issues. The more people tamper with my thread, the longer I shall delay releasing that version here.

When I release fixes, It will be a complete script. If you want to continue this "edit here", "find this"... Please do it in ANOTHER thread... Please GO BACK and REMOVE your posts from my thread. Re-create them in your own. This makes my thread look like utter SH!T having all this bullsh!t....

You said yourself, you don't understand the code and are making shots in the dark at fixing it. I know the code, I have a debug version which makes this easy. I know the script like the back of my hand.

Can a moderator please remove all of these posts, this one included, any that were made after this date --> Posted: Sun Feb 05, 2012 12:05 am

Thanks to that moderator aplenty.

spithash · Post by **spithash** » Sat Feb 25, 2012 12:28 am

speechles is right. this doesn't fix anything. it's just a mess up. it will only show the same result 3 times.

I don't want to be rude to anybody but, I will say this clearly: patience is gold.

there are reasons some things don't happen when we want them to happen. most of the time it means that if it happens when we want them to happen, disaster comes along. despite that we always, but always we have to double think.

why is this script one of the most wanted?
the answer is that because it's so awesome that everyone is using it.

another question is that, why don't we all let the coder himself fix it when he is ready? BECAUSE WE'RE ALLL TRYING TO STEAL SOME OF HIS GLORY/AND/OR RUIN THE SCRIPT. seriously, guys, I would never speak to you like this but it came to a point that nobody is patient to things that are nothing to us but a hobby

I wish I didn't offend anybody.

peace.

PS: I personally admire some people's effort on fixing stuff, but as speechles said, keep it out of here. hence this is the official thread for the script.

Post by **nml375** » Sat Feb 25, 2012 12:40 pm

Moderated: Split from original thread UNOFFICIAL incith-google 2.0.0c (Sep9,2o11).

/NML_375

tommytom · Post by **tommytom** » Sun Feb 26, 2012 10:57 am

Man, you guys are pretty ungrateful.
Some guy tries to help fix things and you boot him out.

I guess I will just leech then.

I'm not fixing it perfectly, sure, but I am giving hotfixes for those that actually want the script to work (even partially), not 90-99% broken.

I don't see why you have an unreleased script if it is so perfect/better than my fixes. Kinda rude yourself to keep something like that to yourself if it works.

Anyways, if anyone wants me to continue to share, PM or reply.
Doubt I will be back.

Edit: Thanks for the feedback on the duplicated results (didn't notice in my haste), but I still don't like the attitude. I would have made it better, but NVM.

xREVx · Post by **xREVx** » Sun Feb 26, 2012 11:21 am

No need to walk away like that, tommytom. This is a forked thread so we should be allowed to do whatever we want here.

This could be the thread where people get fixes faster

I'm grateful for what he has been doing, I've been using the script for quite a while now but I also agree he's pretty arrogant sometimes, and maybe because he thinks he's so great he's also too afraid of making mistakes, and that could be why he takes ages to release a fix...

I've shared my quick fix with the only intention of helping the community, because I thought it would be a bad attitude on my part if I kept the fix to myself while there are many others using the script and needing the fix.

Anyways, please keep doing it. Also, as a suggestion, maybe it'd be easier for everyone if you hosted the whole tcl file and just said what you've changed in it

tommytom · Post by **tommytom** » Sun Feb 26, 2012 11:59 am

Well, my fixes are experimental and only to get parts of it working. I have edited my copy so much, I will eventually have to take a clean copy and put my own fixes back in. I don't want to share this mangled copy I have.

I will make a fully edited beta script or something later maybe but my script has some custom bug fixes only for my editor (notepad++) that no one else would need.

Do you have a better suggestion for a syntax highlighting editor (not many for TCL, I'm sure)? notepad++ isn't that great (for TCL anyways). elseif isn't colored and unclosed regex quotes break the coloring (until another quote is found, the lines between are colorless as if the whole thing is a quoted text). I currently have to add ;#" at the end of problem regex lines to fix the lines after them.
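
For anyone curious, the workaround looks roughly like this (the input and pattern are made up for the example):

```tcl
# Hypothetical input for the example.
set html {<a href=http://example.com/>}
# The pattern below contains a single double-quote-free match, but imagine one
# unbalanced double quote in it; the trailing ;# starts a Tcl comment, and the
# quote inside that comment closes the "string" for the editor's highlighter.
regexp -- {href=([^>]+)>} $html - link ;#"
puts $link
```

To Tcl, everything after ;# is just a comment, so the extra quote has no effect on how the script runs.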

I'm more of a utilitarian. I get it working, nothing more.
If I have to break a few things to get it working partially (over not at all), then I will. However, I don't want to turn that into a collection of these and call it some special script. If speechles wants to take my regexes or something to fix the main one "properly" without breaking the other languages, he can (I don't care about those myself: we have google translate script(s), and mostly everyone speaks/reads English, and 100% do in my channel). My goal was to get the important parts working, even partially, and speechles could take SOMETHING out of it if he hadn't fixed it already.

That said, if you want to take a clean copy (pre-fork), apply the fixes, and share it, then feel free. I really don't care. I'm fixing it for myself (and my channel) and posting how I did it. If someone has a better way of doing it, I don't care. Do it and share it if you like.