Normal search fix
Search for this around line 1647 (the exact line number may vary in your copy)
and then modify this:
Code:
# regular search
} else {
if {![regexp -- {class=g(?!b).*?<a href="(.+?)".*?>((?!<).+?)</a>} $html - link desc]} {
if {[regexp -- {class=r.*?<a href="(.+?)".*?>((?!<).+?)</a>} $html - link desc ]} {
regsub -- {class=r.*?<a href=".+?".*?>(?!<).+?</a>} $html "" html
}
} else {
regsub -- {class=g(?!b).*?<a href=".+?".*?>.+?</a>} $html "" html
To this:
Code:
# regular search
} else {
if {![regexp -- {class="?g(?!b).*?<a href="(.+?)".*?>((?!<).+?)</a>} $html - link desc]} {
if {[regexp -- {class="?r.*?<a href="(.+?)".*?>((?!<).+?)</a>} $html - link desc ]} {
regsub -- {class="?r.*?<a href=".+?".*?>(?!<).+?</a>} $html "" html
}
} else {
regsub -- {class="?g(?!b).*?<a href=".+?".*?>.+?</a>} $html "" html
Alternatively, you can manually find/replace all instances of:
class=r
with:
class="?r
and do the same for "g" or any others that might vary. Please note this won't work in the wildcard-only "string match" parts. It only works in regex strings!
The single ? means the character to its left (here, the quote) may or may not exist. It will match either case, and at most one character can fit in there (unlike .*?). This stops parts of the script from breaking whenever the quotes around the class id are introduced or removed. I strongly suggest doing it to all instances of "r", but I'm not doing it just yet as I am still troubleshooting other parts and don't want to cause regressions elsewhere.
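Here is a small standalone sketch of the difference, in case it helps. The two HTML snippets and the proc name probe are made up for illustration (they are not taken from the script or from Google's real output). It checks the old regex, the new regex, and what happens if you try the same trick in a string match glob pattern, where ? means "exactly one character" rather than "optional".

```tcl
# Hypothetical snippets standing in for the two markup variants.
set unquoted {<h3 class=r><a href="http://example.com/">Example</a></h3>}
set quoted   {<h3 class="r"><a href="http://example.com/">Example</a></h3>}

# probe is a made-up helper: returns {old new glob} as 0/1 flags.
proc probe {html} {
    # Old regex: only matches when there is no quote after class=
    set old [regexp -- {class=r} $html]
    # New regex: "? makes the quote optional (zero or one occurrence)
    set new [regexp -- {class="?r} $html]
    # Glob pattern: ? is a one-character wildcard here, not "optional",
    # so this demands a literal quote, any one character, then r
    set glob [string match {*class="?r*} $html]
    return [list $old $new $glob]
}

puts "unquoted: [probe $unquoted]"
puts "quoted:   [probe $quoted]"
```

Only the new regex matches both variants. The glob version matches neither of these two snippets, which is why the string match lines need a different approach.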
Before:
Code:
<~TommyTom> !g average penis length
<~TTBot> 1,410,000 results
<~TommyTom> !g test
<~TTBot> 3,410,000,000 results | Test Your Awareness: Do The Test - YouTube @ http://www.youtube.com/watch?v=Ahg6qcgoay4
<~TommyTom> !g test pdf
<~TTBot> 1,820,000,000 results
As you can see, you get no results, or only one (usually videos, it seems).
After:
Code:
<~TommyTom> !g average penis length
<~TTBot> 1,410,000 results | Human penis size - Wikipedia, the free encyclopedia <di @ http://en.wikipedia.org/wiki/Human_penis_size | Human penis size - Wikipedia, the free encyclopedia <di @ http://en.wikipedia.org/wiki/Human_penis_size | Human penis size - Wikipedia, the free encyclopedia <di @ http://en.wikipedia.org/wiki/Human_penis_size
<~TommyTom> !g test
<~TTBot> 3,410,000,000 results | Test.com Web Based Testing and Certification Software v2.0 @ http://test.com/ | Test.com Web Based Testing and Certification Software v2.0 @ http://test.com/ | Test.com Web Based Testing and Certification Software v2.0 @ http://test.com/
<~TommyTom> !g test pdf
<~TTBot> 1,820,000,000 results | [PDF] PDF Test Page www.educati @ http://www.education.gov.yk.ca/pdf/pdf-test.pdf | [PDF] PDF Test Page www.educati @ http://www.education.gov.yk.ca/pdf/pdf-test.pdf | [PDF] PDF Test Page www.educati @ http://www.education.gov.yk.ca/pdf/pdf-test.pdf
I think some <div> tags are getting cut off mid-tag by the description truncation (note the stray "<di" in the output), so they don't get stripped out. It probably needs some cleanup code BEFORE the desc truncation.
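Something along these lines might do it. This is only a sketch: clean_desc, the sample string and the 20-character limit are all made up for illustration, and I haven't wired it into the script's actual truncation code. The idea is just to strip complete tags first and cut the string afterwards, so a half-cut tag can never be left behind.

```tcl
# clean_desc is a hypothetical helper, not part of the script.
proc clean_desc {desc maxlen} {
    # Remove complete tags while they are still intact
    regsub -all -- {<[^>]*>} $desc "" desc
    # Collapse runs of whitespace left behind by the removed tags
    regsub -all -- {\s+} $desc " " desc
    # Only now truncate to the wanted length
    return [string range [string trim $desc] 0 [expr {$maxlen - 1}]]
}

# Made-up description with an embedded tag
set raw {PDF Test Page <div class="s">www.education.gov.yk.ca</div>}

# Truncating first cuts the tag in half, so the strip cannot match it
set bad [string range $raw 0 19]
regsub -all -- {<[^>]*>} $bad "" bad

puts "truncate first: $bad"
puts "clean first:    [clean_desc $raw 20]"
```

With truncation first, the leftover "<div c" fragment has no closing > and survives the strip, which looks like what is happening in the output above.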
=================
Time fix
~line 1512
find:
Code:
# time:
} elseif {[string match "*src=\"http://www.google.com/chart*chc=localtime*" $html] == 1} {
regexp -nocase -- {src="http://www.google.com/chart\?chs=.*?chc=localtime.*?><td valign=[a-z]+>(.+?)</table>} $html - desc
regsub -- {<br>} $desc ". " desc
regsub -all {<.*?>} $desc "" desc
regsub -- {chc=localtime} $html {} html
replace with:
Code:
# time:
} elseif {[string match "*class=\"g tpo\"*class=\"s rbt\"*class=obcontainer*" $html] == 1} {
regexp -nocase -- {class="g tpo".*?class="s rbt".*?class=obcontainer.*?<table.*?<td.*?>(.+?)</table>} $html - desc
regsub -- {<br>} $desc ". " desc
regsub -all {<.*?>} $desc "" desc
regsub -- {class="g tpo".*?class="s rbt".*?class=obcontainer.*?<table.*?<td.*?>.+?</table>} $html {} html
Before:
Code:
<~TommyTom> !g time in new york
<~TTBot> 11,400,000,000 results | Current time in New York, United States - daylight savings time 2012 ... @ http://24timezones.com/world_directory/current_new_york_time.php | Current time in New York, United States - daylight savings time 2012 ... @ http://24timezones.com/world_directory/current_new_york_time.php | Current time in New York, United States - daylight savings time 2012 ... @
<~TTBot> http://24timezones.com/world_directory/current_new_york_time.php
After:
Code:
<~TommyTom> !g time in new york
<~TTBot> 3:38am Thursday (EST) - Time in New York, NY
Since it's plain-text now (no URLs or images), I removed the cleanup code.
Be careful with this one because I don't know if those class IDs will change or get reused. I tried to match on "time in" as well, even with the bold tags, but it wouldn't match, so it's not as solid a match as I would like. I wish they had included some kind of image URL to match on...
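If you want to try the time match outside the bot, here is a rough standalone version. The proc name extract_time and the sample HTML are invented for testing (the sample is NOT Google's real markup, just something shaped to hit the same class names). It returns an empty string when either the glob guard or the regex fails, so a caller could fall through to the regular search if the class IDs ever change.

```tcl
# extract_time is a hypothetical wrapper around the patterns above.
proc extract_time {html} {
    # Cheap glob guard first, same pattern as the elseif test
    if {![string match "*class=\"g tpo\"*class=\"s rbt\"*class=obcontainer*" $html]} {
        return ""
    }
    # Then pull the table cell contents out with the regex
    if {![regexp -nocase -- {class="g tpo".*?class="s rbt".*?class=obcontainer.*?<table.*?<td.*?>(.+?)</table>} $html - desc]} {
        return ""
    }
    regsub -- {<br>} $desc ". " desc       ;# first line break becomes ". "
    regsub -all -- {<.*?>} $desc "" desc   ;# drop any remaining tags
    return $desc
}

# Made-up sample shaped like the markup the patterns expect
set sample {<li class="g tpo"><div class="s rbt"><div class=obcontainer><table><tr><td><b>3:38am</b> Thursday (EST)<br>Time in New York, NY</table></div></div></li>}

puts "match:    [extract_time $sample]"
puts "no match: '[extract_time {<li class="g">nothing here</li>}]'"
```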
======
Going to bed now. I was looking into the weird "apple" search result (probably caused by it showing the map of an Apple store) and also the "define:" area. I need to figure out which match is being triggered for apple (it's probably just going into the wrong area because of all the "answers" and the ad(s)). It would also be extremely helpful if I could see what "define:" output should look like (old logs or old posts, if anyone has any), as I don't quite get the code in there and don't recall what it looks like (plus, it's been broken since I found this script, so I dunno if it's changed).
Edit:
Fixed the regsub in the time: section so that you can get the "long answer" if you have that option set (default is short).