
UNOFFICIAL incith-google 2.1x (Nov 30, 2012)

MellowB
Voice
Posts: 24
Joined: Wed Jan 23, 2008 6:02 am
Location: Germany

Post by MellowB »

speechles wrote: This should work... The problem with adding wikimedia sites is that the entire site becomes your 'country'. So to add, say, 'thissite.com/wiki' to the encode strings with UTF-8 encoding, it is as simple as adding the line below to encode_strings:
thissite.com/wiki:utf-8
I did actually try this already, but it is not working.
Using it with wiki.theppn.org - try searching for Utada Hikaru, for example. It's still giving me gibberish for the Japanese name in the output.
speechles wrote: You will be surprised to know it already does this. You have !translate (Google's version of it) included with this script. Type !help translate in a channel where the script is running.
<speechles> !help translate
-sp33chy- --> Bot triggers available:
-sp33chy- !tr,!trans,!translate region@region <text> with 1 results.
<speechles> !tr en@fr hello france, this is english translated to french.
<sp33chy> Google says: (en->fr) bonjour la France, c'est Anglais traduit en français.
Oh gawd, haha, I feel pretty stupid now. I've actually been using this for some time then... lol. Thought it was done by some older script that I still have running; didn't realize that it was actually your script doing this... haha. Well, thanks for clearing that up for me then. xD
On the keyboard of life, always keep one finger on the ESC key.
Renegade
Voice
Posts: 10
Joined: Sat May 24, 2008 12:40 am

Post by Renegade »

speechles wrote:This fixes the problem with calculations/conversions/etc
http://ereader.kiczek.com/incith-google-v1.98e.tcl 269 KB (276,224 bytes)
If problems persist, clear your web cache and re-get this file. There is no version update for this fix since it was so simple.
Thank you very, very much...works fine again :D
speechles wrote:For Google safe_search anomalies, I will need to check all Google-related site queries. There may be spots where I've removed this (for debugging purposes) and forgot to put it back in. I'm thinking video may allow content of a sexual nature even with safe_search on because of this (to debug some sections the query line is temporarily changed). Over this coming weekend I will make safe_search fully compliant on ALL Google-related sites (including YouTube), to address that very issue.
I actually meant the results on Google's side, not the script. Sometimes, Google returns results that are not PG-13 even with safe search on - be it that the query is too scientific, or that certain trigger words are missing on the page. That's not fixable by you, of course, it's just another reason the blacklist might be helpful.

As for the blacklist, I was actually not thinking of doing it based on the result - just refusing service for certain words. Like, if somebody does "!wiki anal sex", it replies "Sorry, I do not return results for that query." Just a blacklist for search terms.
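Just to make the idea concrete, here is a rough sketch of the kind of search-term blacklist I mean (purely hypothetical - the proc name and patterns are made up, this is not code from the script):

Code: Select all

set blacklist {anal* sex*}

# refuse service if any word of the query matches a blacklisted pattern
proc query_blacklisted {query patterns} {
  foreach word [split $query] {
    foreach pat $patterns {
      if {[string match -nocase $pat $word]} { return 1 }
    }
  }
  return 0
}

# query_blacklisted "anal sex" $blacklist -> 1, so the bot would reply
# "Sorry, I do not return results for that query."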

Lastly, I understand your reasons for not implementing the other searches. Of course my reason for requesting them was that I didn't want to install additional scripts, but if you would get in trouble for that, it's just not worth it. It's easy enough to install the other scripts.

Again, thank you very much for your help, and your work in general. The script is a favorite among the users. :D
(Especially the !locate trigger, for some reason...I added an extra trigger !stalk for it now.)
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

MellowB wrote:
speechles wrote: This should work... The problem with adding wikimedia sites is that the entire site becomes your 'country'. So to add, say, 'thissite.com/wiki' to the encode strings with UTF-8 encoding, it is as simple as adding the line below to encode_strings:
thissite.com/wiki:utf-8
I did actually try this already, but it is not working.
Using it with wiki.theppn.org - try searching for Utada Hikaru, for example. It's still giving me gibberish for the Japanese name in the output.
<speechles> !wm .wiki.theppn.org Utada Hikaru
<sp33chy> Utada Hikaru | Utada Hikaru (‡0Ò«ë) is one of Japan's most successful artists of all time. Her debut album, First Love, is the best-selling album ever in Japan with over 7.65 million copies sold in Japan alone. She has sold over 41 million records worldwide (with over 34 million in home nation Japan). Moreover, 3 of her albums are in the Top 10 best-selling album of all time in Japan (#1, #4, #8),
<sp33chy> making her one of the most indefinitely successful and popular singers in J-pop history. She is bilingual as she was raised in both New York and Tokyo. Utada Hikaru is also known in the west under her English language project name 'Utada'. Utada also sang the Kingdom Hearts themes, Hikari / Simple and Clean and the theme songs for @ http://wiki.theppn.org/Utada_Hikaru
For me it works (keep in mind nothing is associated with this site in my encode_strings; I'm using eggdrop's standard encoding for this), but for the Japanese I get gibberish. This is unavoidable at the moment, until I learn more about the apparent UTF-8 problem with eggdrop and how to handle multiple transcodings within the same page.

The problem is that the script can only transcode to a single encoding, while UTF-8 (supposedly) supports them all. The work-around for the eggdrop UTF-8 problem is to transcode from/to exact encodings. This works, but it cannot work when pages mix several languages. If someone can enlighten me on how to recognize embedded language changes before transcoding from UTF-8, I could regsub-inject encoding markers, which the bot could then use when rendering output to handle multiple encodings within the same output the way UTF-8 naturally does. If this all sounds very complicated, believe me, it is.
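To illustrate the single-encoding limitation, here is a toy example at a tclsh prompt (not the script's code, just the core encoding commands): decoding the same bytes with the wrong single encoding is exactly what produces that kind of gibberish.

Code: Select all

# toy example: utf-8 bytes decoded with one wrong, fixed encoding
set bytes [encoding convertto utf-8 "ヒカル"]
puts [encoding convertfrom utf-8 $bytes]      ;# ヒカル (correct)
puts [encoding convertfrom iso8859-1 $bytes]  ;# gibberish, the same kind of problem as above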

@Renegade, how about something other than a blacklist, which would simply tell the user 'Please use appropriate language in this channel.'? (Eventually that message would be spammed so often it would be just as bad as what was there before.) How about instead we build an aversion vocabulary? It would consist of something like this:

Code: Select all

# would you like to use vocabulary aversion?
# this will replace swear words with more appropriate words
# and the query returned will be aversion free.
# 0 disables, anything else enables
#----------
variable aversion_vocabulary 1

# set your aversion vocabulary below if desired:
# remember to enable, keep the setting above at 1.
#----------
variable aversion {
fork*:math
anal:true
sex:love
ass:toe
dick:nose
faggot:friend
butthole:ear
bitch:woman
bastard:man
cock:rooster
c--unt:lobster ; #remove the two -- forum puts [censored] if i don't add those, remove this comment and ; as well
etc:etc
lol:lmao
be:creative
etc:etc
}
What would occur is that the script checks this vocabulary against the input, for all triggers. When a word is in the aversion list, it is replaced with its matched word (delimited by the :). So a query of !w anal sex would reveal the wiki entry for 'true love'. This is much more appropriate for children to see and read right after some jackass tries to make a query of anal sex appear. I will also allow wildcards within the aversion list (fork*, if spelt like the real swear word, would catch forking, forked, any variant, and turn it into math), so you won't need to list every variant. Expect this to make the next revision of the script.
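For the curious, the wildcard part needs nothing exotic - Tcl's string match already does glob-style matching, so a check like the one below (just an illustration, not the script's actual code) is enough to catch the variants:

Code: Select all

# glob-style matching catches the variants of a wildcard entry (illustration only)
foreach word {forking forked pitchfork} {
  puts "$word -> [string match -nocase fork* $word]"
}
# forking -> 1, forked -> 1, pitchfork -> 0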
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

Code: Select all

    # would you like to use vocabulary aversion?
    # this will replace swear words with more appropriate words
    # and the query returned will be aversion free.
    # 0 disables, anything else enables
    #----------
    variable aversion_vocabulary 1
    
    # set your aversion vocabulary below if desired:
    # remember to enable, keep the setting above at 1.
    #----------
    variable aversion {
      fork:nice
      anal:true
      sex:love
      "analsex:true love"
      analsecks:truelove
    }
<speechles> !g fork
<sp33chy> 38,700,000 Results | NICE Systems - NICE Systems Home Pag @ http://www.nice.com/ | Welcome to the National Institute for He @ http://www.nice.org.uk/ | Nice - Wikipedia, the free encyclopedi @ http://en.wikipedia.org/wiki/Nice | The Nice programming language @ http://nice.sourceforge.net/
<speechles> !w anal sex
<sp33chy> True love | True love may refer to: In fiction: True Love (film). True Love (play), a play by Charles L. Mee. True Love (short story), by Isaac Asimov. True Love (video game). The Truelove, a novel by Patrick O'Brian. In music: True Love (Crystal Gayle album). "True Love" (Elliott Smith song), an unreleased song which was intended to be on the album From a Basement on the Hill. "True Love" (Fumiya
<sp33chy> Fujii song), a 1993 song by Fumiya Fujii. "True Love" (Lil' Romeo song). True Love (Pat Benatar album), an album by Pat Benatar. "True Love" (Soldiers of Jah Army song). "True Love" (song), a 1956 song by Cole Porter from the musical High Society. True Love (Toots & the Maytals album). Retrieved from "http://en.wikipedia.org @ http://en.wikipedia.org/wiki/True_love
<speechles> !w analsex
<sp33chy> True love | True love may refer to: In fiction: True Love (film). True Love (play), a play by Charles L. Mee. True Love (short story), by Isaac Asimov. True Love (video game). The Truelove, a novel by Patrick O'Brian. In music: True Love (Crystal Gayle album). "True Love" (Elliott Smith song), an unreleased song which was intended to be on the album From a Basement on the Hill. "True Love" (Fumiya
<sp33chy> Fujii song), a 1993 song by Fumiya Fujii. "True Love" (Lil' Romeo song). True Love (Pat Benatar album), an album by Pat Benatar. "True Love" (Soldiers of Jah Army song). "True Love" (song), a 1956 song by Cole Porter from the musical High Society. True Love (Toots & the Maytals album). Retrieved from "http://en.wikipedia.org @ http://en.wikipedia.org/wiki/True_love
<speechles> !w analsecks
<sp33chy> Clarissa Oakes | Clarissa Oakes (titled The Truelove in the U.S.A.), (1993) is an historical novel set during the Napoleonic Wars written by Patrick O'Brian. It again features the duo, "Lucky" Captain Jack Aubrey and his friend and companion Stephen Maturin. @ http://en.wikipedia.org/wiki/Clarissa_Oakes
Get the new script here, or use the link on the first page. Vocabulary aversion is fully functional, and it works across the entire engine, meaning all queries. Enjoy. ;)

Note: you also have full wildcard matching and a few other tricks you can perform. Take the below for example:

Code: Select all

faggot:friend
dick*:silly
"hell:a hot place"
*fork*:math
All of these would be valid entries. Hopefully you understand how they work in these combinations. Keep in mind that only the part after the : can be more than one word; this is what the word is replaced with. When using spaces you must use quotes; otherwise they are optional. For those curious, here is how the aversion is made possible.

Code: Select all

    # Vocabulary Aversion
    # This converts swear words into appropriate words for IRC
    # this is rather rudimentary, is probably a better way to do this but meh..
    #
    proc vocabaversion {text} {
      set newtext ""
      foreach element [split $text] {
        set violation 0
        foreach vocabulary $incith::google::aversion {
          set swear [lindex [split $vocabulary :] 0]
          set avert [join [lrange [split $vocabulary :] 1 end]]
          if {[string match -nocase "$swear" $element]} {
            append newtext "$avert "
            set violation 1
            break
          }
        }
        if {$violation == 0} { append newtext "$element " }
      }
      return $newtext
    } 
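And a quick way to try it outside the bot (the namespace and aversion variable are normally provided by the script itself; this standalone snippet just mimics them):

Code: Select all

    # standalone test of vocabaversion, mimicking the script's namespace variable
    namespace eval incith::google {
      variable aversion {
        fork*:math
        sex:love
        "hell:a hot place"
      }
    }
    puts [vocabaversion "what the fork is hell"]
    # -> what the math is a hot place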
Renegade
Voice
Posts: 10
Joined: Sat May 24, 2008 12:40 am

Post by Renegade »

Not what I originally had in mind, but just as good, if not better - thank you very much :D
(Tested and works.)

Code: Select all

variable version "incith:google-1.9.8e"
Just thought I'd point that out ;)


One more thing, though - after outputting the excerpt from a wikimedia page, the script outputs the URL of that page. Can you tell me how that URL is retrieved? 'cause it displays the wrong one for us, and I realized that it's a fault with our configuration - since I don't have admin access to the wiki, though, I can't just search the config for the wrong path - I have to tell the admin which variable is wrong.

P.S.: Love the "anal sex" -> "true love" conversion :lol:
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

Renegade wrote:Not what I originally had in mind, but just as good, if not better - thank you very much :D
(Tested and works.)

Code: Select all

variable version "incith:google-1.9.8e"
Just thought I'd point that out ;)
Doh! I thought I changed all of those to an f, but sometimes, humans being human, we fail, heh. I assure you it's 1.9.8f even though it will putlog itself as 1.9.8e.
Renegade wrote:One more thing, though - after outputting the excerpt from a wikimedia page, the script outputs the URL of that page. Can you tell me how that URL is retrieved? 'cause it displays the wrong one for us, and I realized that it's a fault with our configuration - since I don't have admin access to the wiki, though, I can't just search the config for the wrong path - I have to tell the admin which variable is wrong.
The way it arrives at the final URL is pretty simple.

Code: Select all

        set query "http://${country}/index.php?title=Special%3ASearch&search=${input}&fulltext=Search"
The script begins here. Depending on what we get back, it will attempt to go further.

Code: Select all

      # see if our direct result is available and if so, lets take it
      regexp -- {<div id="contentSub"><p>.*?<a href="(.+?)".*?title} $html - match
      if {[string match -nocase "*action=edit*" $match]} { set match "" }
      # otherwise we only care about top result
      # this is the _only_ way to parse mediawiki, sorry.
      if {$match == ""} {
        if {![regexp -- {<li><a href="((?!http).+?)"} $html - match]} { regexp -- {<li style.*?><a href="(.+?)"} $html - match} 
      }
      if {[string match -nocase "*/wiki*" $country]} {
        regsub -- {/wiki} $country {} country
      }
      ... continued below ...
This part here is the direction parser. From here we will either get a direct match, or we will simply use the most relevant top result. There is redirect traversal as well, below this part, but these parts are the primary method for determining the final URL displayed. If these fail, it attempts to read the 'no results found' message to display (see below). If it can't read that, it looks for any server error message displayed. If it can't find anything I've mentioned above, only then will it display a wikimedia error stating that it is unable to parse that URL.

Code: Select all

      ... continued from above ...
      # at this point we can tell if there was any match, so let's not even bother
      # going further if there wasn't a match, this pulls the 'no search etc' found.
      # this can be in any language.
      if {$match == ""} {
        # these are for 'no search results' or similar message
        # these can be in any language.
        if {[regexp -- {</form>.*?<p>(.+?)(<p><b>|</p><hr)} $html - match]} { regsub -all -- {<(.+?)>} $match {} match } 
        if {$match == ""} {
          if {[regexp -- {<div id="contentSub">(.+?)<form id=} $html - match]} {
            regsub -- { <a href="/wiki/Special\:Allpages.*?</a>} $match "." match
            regsub -- {<div.*?/div>} $match "" match
            regsub -- {\[Index\]} $match "" match
            regsub -- {<span.*?/span>} $match "" match
          } 
        }
        # this is our last error catch, this can grab the
        # 'wikimedia cannot search at this time' message
        # this can be in any language.
        if {[string len $match] < 3} { regexp -- {<center><b>(.+?)</b>} $html - match }
        if {$match == ""} {
          regsub -all -- { } $results {_} results
          if {$results != ""} { set results "#${results}" } 
          return "\002Wikimedia Error:\002 Unable to parse for: \002${input}\002 @ ${query}${results}"
        }
        # might be tags since we allowed any language here we cut them out
        regsub -all -- {<(.+?)>} $match {} match
        return "[descdecode ${match}]"
      }
This is the smart error control described above.
Renegade wrote:P.S.: Love the "anal sex" -> "true love" conversion :lol:
I thought you would. This way it baffles those who attempt these types of queries (filth mouths): they get a reply to their query attempt, but not quite the one they expected. This can be fully customized for any language as well; if special characters are used within the aversion array, they simply need to be escaped to prevent Tcl errors.
MellowB
Voice
Posts: 24
Joined: Wed Jan 23, 2008 6:02 am
Location: Germany

Post by MellowB »

speechles wrote:
<speechles> !wm .wiki.theppn.org Utada Hikaru
<sp33chy> Utada Hikaru | Utada Hikaru (‡0Ò«ë) is one of Japan's most successful artists of all time. Her debut album, First Love, is the best-selling album ever in Japan with over 7.65 million copies sold in Japan alone. She has sold over 41 million records worldwide (with over 34 million in home nation Japan). Moreover, 3 of her albums are in the Top 10 best-selling album of all time in Japan (#1, #4, #8),
<sp33chy> making her one of the most indefinitely successful and popular singers in J-pop history. She is bilingual as she was raised in both New York and Tokyo. Utada Hikaru is also known in the west under her English language project name 'Utada'. Utada also sang the Kingdom Hearts themes, Hikari / Simple and Clean and the theme songs for @ http://wiki.theppn.org/Utada_Hikaru
For me it works (keep in mind nothing is associated with this site in my encode_strings; I'm using eggdrop's standard encoding for this), but for the Japanese I get gibberish. This is unavoidable at the moment, until I learn more about the apparent UTF-8 problem with eggdrop and how to handle multiple transcodings within the same page.

The problem is that the script can only transcode to a single encoding, while UTF-8 (supposedly) supports them all. The work-around for the eggdrop UTF-8 problem is to transcode from/to exact encodings. This works, but it cannot work when pages mix several languages. If someone can enlighten me on how to recognize embedded language changes before transcoding from UTF-8, I could regsub-inject encoding markers, which the bot could then use when rendering output to handle multiple encodings within the same output the way UTF-8 naturally does. If this all sounds very complicated, believe me, it is.
Yes well, that's the exact same gibberish that I get, and it is only the case with the !wm trigger. I still believe that the !wm trigger does not really work with the utf-8/encode_strings variables, and I think so because of this:
[05:05:28] <MellowB> !wiki Utada Hikaru
[05:05:31] <Cocco> Hikaru Utada | Hikaru Utada (宇多田 ヒカル, Utada Hikaru^?, born January 19, 1983), also known by her fans as Hikki (ヒッキー, Hikkī^?), is a third culture singer-songwriter, arranger and record producer in Japan and the US. She is well-known internationally for her two theme song contributions to Square Enix's Kingdom Hearts video game series. Utada's first official Japanese album First
^That's what I get if I use the normal !wiki with the English Wikipedia set as default. It works perfectly fine; Japanese text is displayed correctly.
[05:08:50] <MellowB> !wm .en.wikipedia.org Utada Hikaru
[05:08:52] <Cocco> Hikaru Utada | Hikaru Utada (‡0 Ò«ë, Utada Hikaru^?, born January 19, 1983), also known by her fans as Hikki (Òíü, Hikk+^?), is a third culture singer-songwriter, arranger and record producer in Japan and the US. She is well-known internationally for her two theme song contributions to Square Enix's Kingdom Hearts video game series. Utada's first official Japanese album First Love became the
^And this is what I get if I search for the exact same thing, on the exact same page, using the !wm trigger. And yes, I did add en.wikipedia.org to the encode_strings section set to utf-8 (the exact string is "en.wikipedia.org:utf-8"). It shows the same gibberish as when I search on the other page that I want to use.

Really, please look into it again and check whether there really is no problem with the trigger, as I still believe that it is not using the encode_strings settings at all.
Sorry to be so annoying about this, but I would really like to use some other wikis with that !wm trigger, and currently that is only possible in a rather limited way. I would be glad if you could either fix it or enlighten me as to why it will not work.
On the keyboard of life, always keep one finger on the ESC key.
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

MellowB wrote:
[05:05:28] <MellowB> !wiki Utada Hikaru
[05:05:31] <Cocco> Hikaru Utada | Hikaru Utada (宇多田 ヒカル, Utada Hikaru^?, born January 19, 1983), also known by her fans as Hikki (ヒッキー, Hikkī^?), is a third culture singer-songwriter, arranger and record producer in Japan and the US. She is well-known internationally for her two theme song contributions to Square Enix's Kingdom Hearts video game series. Utada's first official Japanese album First
^That's what I get if I use the normal !wiki with the English Wikipedia set as default. It works perfectly fine; Japanese text is displayed correctly.
[05:08:50] <MellowB> !wm .en.wikipedia.org Utada Hikaru
[05:08:52] <Cocco> Hikaru Utada | Hikaru Utada (‡0 Ò«ë, Utada Hikaru^?, born January 19, 1983), also known by her fans as Hikki (Òíü, Hikk+^?), is a third culture singer-songwriter, arranger and record producer in Japan and the US. She is well-known internationally for her two theme song contributions to Square Enix's Kingdom Hearts video game series. Utada's first official Japanese album First Love became the
^And this is what I get if I search for the exact same thing, on the exact same page, using the !wm trigger. And yes, I did add en.wikipedia.org to the encode_strings section set to utf-8 (the exact string is "en.wikipedia.org:utf-8"). It shows the same gibberish as when I search on the other page that I want to use.

Really, please look into it again and check whether there really is no problem with the trigger, as I still believe that it is not using the encode_strings settings at all.
Wow, you are 100% absolutely correct. I forgot that long ago, when adding these, I added a special condition to both that was really only disclosed in regard to the Serbian language (Serbian Wikipedia supports a Latin equivalent). But to support this semi-disclosed extended feature (easter egg! yay!), the Wikipedia output encoding was extended, and I forgot to extend Wikimedia's encoding to support the same functionality correctly (it does now though ;)). So let me explain what this easter egg is, and confidently say that if you re-get the v1.9.8f script, Wikipedia/Wikimedia will mirror each other (at least in regard to output; functionality is slightly different, which is why they are separated).
!wiki .sr@sr-el serbia
Notice the @[encoding] after the initial site. This is regional switching. Wikipedia expects a URL configuration to go hand in hand with this (these are special cases where other dialects go together with this country), so it only works when it should as far as Wikipedia goes. You cannot turn the English page into Russian encoding unless Wikipedia supports this.
<speechles> !wiki .sr@sr-el serbia
<sp33chy> ..snipped stuff.. @ http://sr.wikipedia.org/sr-el/Serbia
<speechles> !wiki .sr serbia
<sp33chy> ..snipped stuff.. @ http://sr.wikipedia.org/Serbia
For Wikipedia, the rule is there to protect this URL correlation. But for Wikimedia... this same rule does not apply.
!wm .en.wikipedia.org@ja Utada Hikaru
This will force the page to be interpreted explicitly as Japanese (if you have the encoding for Japanese set for ja in encode_strings), regardless of what your default encoding for that site is set to. Hopefully you understand. I added this functionality to address situations where multiple encodings per page mangle results; it is a quasi work-around. You can add custom encodings this way to reference them outside the normal ones Wikipedia uses.
!wm .yoursite.com/wiki@custom term

Code: Select all

custom:cp1251
Any use of 'custom' as the region with !wikimedia would encode as cp1251 in this case.
!w [.country[@region]] search term[#subtag]
!wm .yoursite.com[/wiki][@region] search term[#subtag]
If a region is used, it overrides the country and yoursite encodings. This should hopefully explain it all.. keke :)

Last of all, as I remember things... heh. You can also embed the default site with a region, so for those in Serbia, or for your website:
variable wikimedia_site "yoursite.com/wiki@custom"
variable wiki_country "sr@sr-el"
This will force the default to override the setting within encode_strings as well, allowing you to get to the exact page and encoding you wish as the default.
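Purely as an illustration of what the @region notation boils down to (this is not the script's actual code; the helper proc and the fallback encoding below are made up), the spec is split on @ and the region is looked up among encode_strings-style site:encoding pairs:

Code: Select all

# hypothetical sketch: resolve a "site@region" spec against site:encoding pairs
set encode_strings {
  en.wikipedia.org:utf-8
  custom:cp1251
}

proc resolve_spec {spec pairs} {
  set parts  [split $spec "@"]
  set site   [lindex $parts 0]
  set region [lindex $parts 1]
  set enc "iso8859-1"  ;# placeholder default, stands in for the site/country setting
  foreach entry $pairs {
    if {$region ne "" && [string equal -nocase [lindex [split $entry ":"] 0] $region]} {
      set enc [lindex [split $entry ":"] 1]
    }
  }
  return [list $site $enc]
}

# resolve_spec "yoursite.com/wiki@custom" $encode_strings -> {yoursite.com/wiki cp1251}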
MellowB
Voice
Posts: 24
Joined: Wed Jan 23, 2008 6:02 am
Location: Germany

Post by MellowB »

Fantastic, thanks for clearing that one up and all the explaining.
Works like a charm now. :D
Glad that you are so supportive with your scripts and update/help out in such a quick and unproblematic way; much appreciated!
On the keyboard of life, always keep one finger on the ESC key.
Renegade
Voice
Posts: 10
Joined: Sat May 24, 2008 12:40 am

Post by Renegade »

Okay, through an army of putlogs, I was able to pinpoint my problem.
I would rather not give up the site's location here, so I'll use placeholders.
The problem is as follows:
Our wiki is at site.tld/wikidir
The script's final link goes to site.tld/wikidir/wikidir/index.php?title=result
As you can see, wikidir is repeated. For the record, I consciously picked "wikidir", not "wiki", because the wiki folder is not /wiki/

The reason for this seems to be as follows:
$wikimedia_site is set to site.tld/wikidir

Now,

Code: Select all

regexp -- {<div id="contentSub"><p>.*?<a href="(.+?)".*?title} $html - match
reads out the anchor URL at the top and puts it into $match - however, that anchor's href is /wikidir/index.php?title=result. Since it begins with a /, it works fine on the web server, resolving from the root of the site. However, in the script, we do

Code: Select all

set country "${incith::google::wikimedia_site}"
setting $country to site.tld/wikidir

So we have
$country = site.tld/wikidir
$match = /wikidir/index.php?title=result


and then do

Code: Select all

      # we assume here we found another page to traverse in our search.
      if {![string match "*http*" $match]} { set query "http://${country}${match}" }
$query = http://site.tld/wikidir/wikidir/index.php?title=result

And finally, we do

Code: Select all

set link $query
and
use $link in $output, before we return $output.



I believe the root of the problem is the assumption that the wikimedia software is installed in the root of the domain, thus not accounting for folders in $wikimedia_site. Would it be possible to implement a fix for that? Maybe something as simple as checking for a slash in $wikimedia_site and then only taking the part before it?


Edit: If you're thinking "wth? he should have an 'unable to parse' error way earlier!", you're probably right. But in preparation for the last update of the mediawiki software, the server admin moved /wikidir to /wikidir/wikidir - in other words, there's an outdated but functional wiki at that location. Searching for everything that existed before the update works as expected. Everything that was added after the update is obviously not in the outdated backup, and gives an "unable to parse" error - that's how I originally found the problem.
Had our beloved :roll: admin not moved the backup there, I probably would've noticed this problem right from the start. -_-
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

Renegade wrote:Okay, through an army of putlogs, I was able to pinpoint my problem.
I would rather not give up the site's location here, so I'll use placeholders.
The problem is as follows:
Our wiki is at site.tld/wikidir
The script's final link goes to site.tld/wikidir/wikidir/index.php?title=result
As you can see, wikidir is repeated. For the record, I consciously picked "wikidir", not "wiki", because the wiki folder is not /wiki/

The reason for this seems to be as follows:
$wikimedia_site is set to site.tld/wikidir

Now,

Code: Select all

regexp -- {<div id="contentSub"><p>.*?<a href="(.+?)".*?title} $html - match
reads out the anchor URL at the top and puts it into $match - however, that anchor's href is /wikidir/index.php?title=result. Since it begins with a /, it works fine on the web server, resolving from the root of the site. However, in the script, we do

Code: Select all

set country "${incith::google::wikimedia_site}"
setting $country to site.tld/wikidir

So we have
$country = site.tld/wikidir
$match = /wikidir/index.php?title=result
But you're missing an important step in this logic trace, something that normally catches this.

Code: Select all

      if {[string match -nocase "*/wiki*" $country]} {
        regsub -- {/wiki} $country {} country
      }
Our friend, mr. regsub, normally takes the $country and strips '/wiki' from it, which makes these types of collisions rare (unless the mediawiki page is overly customized, with a custom subdir not named 'wiki'...). But as a work-around, I can also easily split by /, take the last lindex of country, check whether that is the first lindex of match, and if so remove it from country. Conversely, I would add a variable to keep the old behavior for those who don't modify their subdirectory from /wiki. This allows them to keep their entries doubled if they happen to occur like this; explicitly removing duplicates can cause problems depending on how wikimedia is installed on their machine, since some pages require this duality. I'll have something to support this very shortly. :wink:

Code: Select all

# Wikimedia URL detection
# remove double entries from urls? would remove a '/wikisite' from this
# type of url @ http://yoursite.com/wikisite/wikisite/search_term
# if you have issues regarding url capture with wikimedia, enable this.
# /wiki/wiki/ problems are naturally averted, a setting of 0 already
# stops these type.
# --------
variable wiki_domain_detect 1
Renegade
Voice
Posts: 10
Joined: Sat May 24, 2008 12:40 am

Post by Renegade »

Read the part you quoted again - "For the record, I consciously picked "wikidir", not "wiki", because the wiki folder is not /wiki/" ;)

Like I said, I traced the whole process via putlogs, and it definitely progresses the way I outlined. If the directory strip only acts on /wiki, it's absolutely logical it fails on our /wikidir.
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

Renegade wrote:Read the part you quoted again - "For the record, I consciously picked "wikidir", not "wiki", because the wiki folder is not /wiki/" ;)

Like I said, I traced the whole process via putlogs, and it definitely progresses the way I outlined. If the directory strip only acts on /wiki, it's absolutely logical it fails on our /wikidir.
Like I said, I wrote the thing. Why do I need to re-read what you said? I completely understand what happens; the logic doesn't need to be traced for me. I have intimate knowledge of what every line does without seeing it run; my mind is the interpreter... heh. And with that, I easily fixed the script. :D
Here it is!

Code: Select all

      # this will strip double domain entries from our country if it exists
      # on our anchor.
      if {$incith::google::wiki_domain_detect != 0} {
        if {[string match -nocase [lindex [split $country "/"] end] [lindex [split $match "/"] 1]]} {
          set country [join [lrange [split $country "/"] 0 end-1] "/"]
        }
      } elseif {[string match -nocase "*/wiki*" $country]} {
       regsub -- {/wiki} $country {} country
      }
This compares the last piece of the URL in $country to the first piece of $match (since match starts with /, using lindex 0 gives us an empty string, which is no good, so we use 1). If they are identical, the script removes the last piece from $country to compensate. Otherwise, if you have disabled domain detection, it simply uses the old behavior, removing doubled /wiki instances.
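To see it with Renegade's placeholder values, tracing the snippet above by hand:

Code: Select all

    # tracing the check with the placeholder values from earlier in the thread
    set country "site.tld/wikidir"
    set match   "/wikidir/index.php?title=result"
    puts [lindex [split $country "/"] end]   ;# wikidir
    puts [lindex [split $match "/"] 1]       ;# wikidir (index 0 is empty because $match starts with /)
    # they are identical, so the doubled piece is dropped from $country
    set country [join [lrange [split $country "/"] 0 end-1] "/"]
    puts "http://${country}${match}"         ;# http://site.tld/wikidir/index.php?title=result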

Note: I'm aware !news and !local do not return results with certain query combinations (rare). Google has recently added another template to their display engine which the script cannot parse, yet.. :)
Renegade
Voice
Posts: 10
Joined: Sat May 24, 2008 12:40 am

Post by Renegade »

I hate to point it out, but...

Code: Select all

variable version "incith:google-1.9.8f"
:wink:

Anyway, tested and works fine - not that I ever doubted that. Thank you very much :)


Independently of that, I was not suggesting you re-read what I said because I doubted your grasp of your own code, but because your post a) implied I made a mistake in my tracing, even though I saw the state of both variables in my partyline and know it went that way, and b) you said "(unless the mediawiki page is overly customized, with a custom subdir not named 'wiki'...)", while my original post readily acknowledged that we have a custom wiki directory - implying you had overlooked that.

I was, at no point, in doubt over your abilities, or questioning your skills - I just wanted to point out that you were basing your rebuttal on the wrong assumption (that I was trying to parse a /wiki/ directory), and that I already admitted the special case of a custom wiki directory applies.

You're doing great work - I have no reason to insult or doubt you.
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

Renegade wrote:Independently of that, I was not suggesting you re-read what I said because I doubted your grasp of your own code, but because your post a) implied I made a mistake in my tracing, even though I saw the state of both variables in my partyline and know it went that way, and b) you said "(unless the mediawiki page is overly customized, with a custom subdir not named 'wiki'...)", while my original post readily acknowledged that we have a custom wiki directory - implying you had overlooked that.
Wasn't it obvious that regsub removes a /wiki from country? Why would it do this? Why at that particular point? To cure the exact scenario you describe above, only it will NOT work on custom wiki subdirectories. Now it can do this by checking and stripping rather than by assumption.

What you need to realize as well is that there is no clear documentation on any of this wikipedia/wikimedia stuff; you figure it out as you go along. I had to develop a lot of the routines on my own, without the use of any API or query functions. Browsing is preferred since it doesn't tie the request to a specific user (key-authenticated sessions aren't required), which is much easier in the public's hands. So the script was written as if it were a browser itself, attempting to 'read' the site. It is very difficult to do this elegantly, especially for any language and any wikipedia page. The problem is compounded when you add in the #subtag and table-of-contents abilities. Functions such as the transcode and encode_strings handling can be useful for other scripters as well: if others have problems using eggdrop with UTF-8 and foreign languages, I suggest they have a look at this script and use the functions I've created to handle it. Sharing is caring.
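As a rough illustration of the general technique (this is not the script's actual transcode proc; the helper below and its name are made up for the example), the idea is to fetch the page as raw bytes and convert them from the per-site encoding, so eggdrop ends up with correctly decoded strings:

Code: Select all

# hypothetical sketch of per-site transcoding, not the script's actual code
package require http

proc fetch_as {url enc} {
  # grab the raw bytes, then convert them from the site's encoding
  set tok [http::geturl $url -binary 1]
  set raw [http::data $tok]
  http::cleanup $tok
  return [encoding convertfrom $enc $raw]
}

# usage: set html [fetch_as "http://en.wikipedia.org/wiki/Main_Page" utf-8]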

Note: I've got a horrible flu thing going on, allergies or something. So if I come off upset for some reason, it's not because of you; it's because I feel like 5hit.