I've had similar problems with the translate function, and clumsily tried adding convertto to it to solve some language encoding problems, but it's far from foolproof.. heh
<sp33chy> Currently: can't read " 5?C1;8:0 !@18X0 X5 :>=B8=5=B0;=0 4@6020 :>X0 A5 =0;078 C XC3>8AB>G=>X 2@>?8 (=0 0;:0=A:>;C>AB@2C) 8 C A@54Z>X 2@>?8 (0=>=A:>X =878X8). # A0AB02C 5?C1;8:5 !@18X5 AC 8 425 0CB>=>:@0X8=5 >X2>48=0 8 >A>2> 8 5B>E8X0. 5?C1;8:0 !@18X0 X5 453 =0@>40 8 A28E 4@C38E 3@0R0=0 :>X8 C Z>X 6825, 70A=>20=0 =0 45X ?@82@548, ?>HB>20ZC YC4A:8E ?@020 8 2;04028=8 ?@020. !@18X0 A": no such variable
djevrek wrote:
Code: Select all
<djevrek> !w .sr Srbija
<Grgo> !@18X0 | 5?C1;8:0 !@18X0 X5 :>=B8=5=B0;=0 4@6020 :>X0 A5 =0;078 C XC3>8AB>G=>X 2@>?8 (=0 0;:0=A:>;C>AB@2C) 8 C A@54Z>X 2@>?8 (0=>=A:>X =878X8). # A0AB02C 5?C1;8:5 !@18X5 AC 8 425 0CB>=>:@0X8=5 >X2>48=0 8 >A>2> 8 5B>E8X0. 5?C1;8:0 !@18X0 X5 453 =0@>40 8 A28E 4@C38E 3@0R0=0 :>X8 C Z>X 6825, 70A=>20=0 =0 45X ?@82@548, ?>HB>20ZC YC4A:8E ?@020 8 2;04028=8 ?@020. !@18X0 A @ http://sr.wikipedia.org/wiki/Srbija
I think I've found an easy way to remedy this, if the output below looks correct to you. cp1251 is the Windows Cyrillic code page, which covers Serbian Cyrillic.
<speechles> !w .sr Srbija
<sp33chy> Ñðáè¼à | Ðåïóáëèêà Ñðáè¼à ¼å êîíòèíåíòàëíà äðæàâà êî¼à ñå íàëàçè ó ¼óãîèñòî÷íî¼ Åâðîïè (íà Áàëêàíñêîì ïîëóîñòðâó) è ó ñðåäœî¼ Åâðîïè (Ïàíîíñêî¼ íèçè¼è). Ó ñàñòàâó Ðåïóáëèêå Ñðáè¼å ñó è äâå àóòîíîìíå ïîêðà¼èíå Âî¼âîäèíà è Êîñîâî è Ìåòîõè¼à. Ðåïóáëèêà Ñðáè¼à ¼å äåìîêðàòñêà äðæàâà ñðïñêîã íàðîäà è ñâèõ äðóãèõ ãðààíà êî¼è ó œî¼ æèâå, çàñíîâàíà íà äåìîêðàòñêèì íà÷åëèìà, òðæèøí @ http://sr.wikipedia.org/wiki/Srbija
<speechles> !w .sr Srbija#toc
<sp33chy> Ñðáè¼à | ToC: Ãåîãðàôè¼à; Èñòîðè¼à; Òåðèòîðè¼àëíà îðãàíèçàöè¼à; Ãðàäîâè; Äåìîãðàôè¼à; Íàðîäè è íàöèîíàëíå ìàœèíå; £åçèê; Âåðîèñïîâåñò; Äðæàâíè ñèìáîëè; Ïîëèòèêà; Ïðàâîñóå; Ïðàâà ãðààíà; Åêîíîìè¼à; Òóðèçàì; Ñàîáðàžà¼; Êóëòóðà; Ëèêîâíå óìåòíîñòè; Ñðåäœè âåê; Ìîäåðíî äîáà; œèæåâíîñò; Ìóçèêà; Êëàñè÷íà ìóçèêà; Ïîçîðèøòå è ôèëì; Ñâåòñêà êóëòóðíà áàøòèíà ÓÍÅÑÊÎ-à ó Ñðáè¼è; Ôåñòèâàëè; Ðàçâî¼ íàóêå è âèñîêîã øêîëñòâà;
<sp33chy> Îáðàçîâàœå; Ïðàçíèöè; Âèäè ¼îø; Ãàëåðè¼à ñëèêà; Ðåôåðåíöå; Ñïîšàøœå âåçå; Âëàäà; Îñòàëî @ http://sr.wikipedia.org/wiki/Srbija#toc
<speechles> !w .sr Srbija#Ãåîãðàôè¼à
<sp33chy> Ñðáè¼à | Ãåîãðàôè¼à Ñðáè¼à ñå íàëàçè íà Áàëêàíó - ðåãèîíó ¼óãîèñòî÷íå Åâðîïå (îêî 80% òåðèòîðè¼å) è ó Ïàíîíñêî¼ íèçè¼è - ðåãèîíó ñðåäœå Åâðîïå (îêî 20% òåðèòîðè¼å). Íî, ãåîãðàôñêè, à è êëèìàòñêè, ¼åäíèì äåëîì ñå óáðà¼à è ó ìåäèòåðàíñêå çåìšå. Óêóïíà äóæèíà ãðàíèöà ñà îêîëíèì çåìšàìà èçíîñè 2.027 km. Äóæèíà ãðàíèöà ïî äðæàâàìà ñóñåäèìà èçíîñè: Àëáàíè¼à 115 km, Áîñíà è Õåðöå @
<sp33chy> http://sr.wikipedia.org/wiki/Srbija#.D0 ... 1.98.D0.B0 [1 Redirect(s)]
Code: Select all
set html [encoding convertto "cp1251" $html]
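To make it clearer what that convertto call does, here is a minimal sketch. It assumes a Tcl build that ships the cp1251 encoding, and the variable names are only for illustration. The word is written with \u escapes so the script file's own encoding doesn't matter.

```tcl
# "Srbija" in Serbian Cyrillic, written with \u escapes.
set word "\u0421\u0440\u0431\u0438\u0458\u0430"

# Convert Tcl's internal Unicode string into cp1251 bytes.
set bytes [encoding convertto cp1251 $word]

# Show the raw bytes as hex: one byte per Cyrillic letter.
binary scan $bytes H* hex
puts $hex

# Reading the same bytes as Latin-1 gives six accented Latin characters,
# which is what a Western-configured client ends up displaying.
set as_latin1 [encoding convertfrom iso8859-1 $bytes]
puts [string length $as_latin1]
```

The bytes themselves are fine; whether they show up as Cyrillic or as accented Latin letters depends on the code page the client uses to display them.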
djevrek wrote:
No, this doesn't look good. This is not proper Serbian. Try comparing it with the online Wikipedia page at the links above. The proper encoding would be UTF-8. Can you try it and post the result here, please? If it's OK with UTF-8, can you tell me what you changed so I can change it too, or tell me when we can expect a new version of the script?

Put what here? I'm telling you, if I choose UTF-8 it appears exactly the same as using standard eggdrop unicode, no difference at all. I don't know why, it just does. So the trick I used above with convertto "cp1251" is working; it just doesn't look right on my American English mIRC 6.12 client (which is what I pasted). It would have looked right to any Serbian in the channel who saw it, you see.

So give me some time to make a list of "wikipedia country:country encoding". It will be a big list. Then I'll either use a giant case statement or a list with a foreach; I haven't decided yet. But it won't be soon (unless soon means a week; yes, it may take that long), as this list takes time to build. Rome wasn't built in a day, and this script is large in scope, and complicated, and best of all.. FREE.
Btw, this is why I chose "cp1251" to represent Serbian. Even though it's not 100% correct, it's the best possible encoding for eggdrop.
Currently: unknown encoding "iso-8859-5"
Currently: while executing
Currently: "encoding convertto "iso-8859-5" $html"
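One way to avoid that error is to check the name against `encoding names` before converting. The sketch below is only an illustration: safe_convertto is a made-up helper, not part of the script. Note also that Tcl spells its ISO 8859 encodings without the hyphen after "iso" (for example iso8859-5), which is why the hyphenated name is rejected.

```tcl
# Hypothetical helper: convert only if Tcl knows the encoding name,
# otherwise return the text untouched instead of raising an error.
proc safe_convertto {name text} {
    if {[lsearch -exact [encoding names] $name] != -1} {
        return [encoding convertto $name $text]
    }
    return $text
}

set html "\u0421\u0440\u0431\u0438\u0458\u0430"

# "iso-8859-5" is not a name Tcl recognises, so the text comes back unchanged.
set unchanged [safe_convertto "iso-8859-5" $html]

# Tcl's own spelling of the same character set.
set converted [safe_convertto "iso8859-5" $html]
```

Falling back to the unconverted text is just one possible choice; logging the unknown name would also be reasonable.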
Code: Select all
set html [encoding convertto "utf-8" $html]
Code: Select all
regsub -all " " $html " " html
regsub -all ";;>" $html "" html
}
set html [encoding convertto "utf-8" $html]
set match ""
djevrek wrote:
I did that, but nothing much happens. I just get the same old weird characters, not what I want to see. OK, for now I will wait for you to find some other way to fix this.
P.S. I really don't know anything about Tcl, but maybe something from here (http://www.google.com/codesearch?hl=en& ... tnG=Search) can help you with this problem.

Okay, let's go over this, result #2 specifically:
share/dotlrn0/packages/acs-tcl/tcl/html-email-procs.tcl - 13 identical
# convert text to charset
set encoding [ns_encodingforcharset $charset]
if {[lsearch [encoding names] $encoding] != -1} {
    set html_body [encoding convertto $encoding $html_body]
    set text_body [encoding convertto $encoding $text_body]
} else {

This has potential, but like I said, I need to make a list, because using the bot with utf-8 and expecting multilingual greatness is broken or something. So...?
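For the list idea, a lookup table keyed by wiki language code is probably the simplest shape. This is a rough sketch only: the wiki_encoding array, the encoding_for_lang proc, and the language/encoding pairs in it are assumptions for illustration, not the list the script will actually ship with.

```tcl
# Illustrative table only: wiki language code -> Tcl encoding name.
array set wiki_encoding {
    sr cp1251
    ru cp1251
    el cp1253
    tr cp1254
}

# Hypothetical lookup: fall back to utf-8 for languages that are not in
# the table, or whose encoding this Tcl build does not provide.
proc encoding_for_lang {lang} {
    global wiki_encoding
    if {[info exists wiki_encoding($lang)]} {
        set name $wiki_encoding($lang)
        if {[lsearch -exact [encoding names] $name] != -1} {
            return $name
        }
    }
    return "utf-8"
}

set html "\u0421\u0440\u0431\u0438\u0458\u0430"
set html [encoding convertto [encoding_for_lang "sr"] $html]
```

An array keeps this usable on older Tcl versions that lack dict, and adding a language is a one-line change.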
Code: Select all
<c0nv1ct> !google weather eindhoven
<risponditore> Weather for Eindhoven, Netherlands: 63°F, Wind: SE at 4 mph, Humidity: 82%, <div style="padding:5px;float:left" align=center>Sun <img style="border:1px solid #bbc;margin-bottom:2px" src="/images/weather/mostly_sunny.gif" alt="Mostly Sunny" title="Mostly Sunny" width=40 height=40 border=0> <nobr>84°F | 66°F</nobr>
Code: Select all
<c0nv1ct> !google weather amsterdam
<risponditore> Weather for Amsterdam, Netherlands: 63°F, Clear, Wind: SE at 9 mph, Humidity: 88%
c0nv1ct wrote:
Thanks for the wikimedia addition! #sabayon on freenode appreciates your work.
The only problem I've noticed is the weather parsing for some cities.

That one is pretty easy to fix, but it requires a kludge rather than a real fix, because the method used to detect weather results is a bit clumsy.
Code: Select all
# weather!
} elseif {[string match "*/images/weather/*" $html] == 1} {
regexp -- {<p.*?class=e>.*?<td><div.*?>(.+?)</div>.*?<td><div.*?>(.+?)<.*?>(.+?)<.*?>(.+?)<.*?>(.+?)</div>.*?</table>} $html - w1 w2 w3 w4 w5
regsub -- {<p.*?class=e>(.*?)</table} $html {} html
if {[string match "*<*" $w5]} {
set w5 ""
} else {
set w5 ", ${w5}"
}
set desc "$w1\: $w2, $w3, $w4$w5"
regsub -all -- {&deg;} $desc {°} desc
set link ""
regsub -all -- {weather} $input {} input
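As a more general fallback for the weather case, one option is to strip whatever HTML is left in the description before it goes to the channel. A rough sketch, where strip_tags is a made-up helper name and the sample string is a shortened, ASCII-only version of the Eindhoven output rather than anything taken from the script:

```tcl
# Hypothetical fallback: drop any leftover HTML tags and tidy whitespace.
proc strip_tags {text} {
    # Remove anything that looks like a tag.
    regsub -all -- {<[^>]*>} $text {} text
    # Collapse the runs of whitespace the removed tags leave behind.
    regsub -all -- {\s+} $text { } text
    return [string trim $text]
}

set raw {Humidity: 82%, <div align=center>Sun <img src="/images/weather/mostly_sunny.gif" alt="Mostly Sunny"> <nobr>84F | 66F</nobr>}
set clean [strip_tags $raw]
puts $clean
```

This is still a kludge, since it hides the forecast markup instead of parsing it, but it keeps raw tags out of the channel when the regexp above doesn't match cleanly.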
Elfriede wrote:
Does anybody have an idea what I'm doing wrong?
Edit: I'm using v1.9.6 - July 27th, 2oo7
Code: Select all
# what to use to seperate results, set this to "\n" and it will output each result
# on a line of its own. the seperator will be removed from the end of the last result.
variable seperator " | "
Sidenote: I'm still working on that big ol' Wikipedia list (the country:encoding pairs). It will be finished soon; it's just tedious doing it all by hand. I should have something to show by this coming weekend, and it should make Serbians happy.
<speechles> !g mirc
<sp33chy> 12,300,000 Results
<sp33chy> mIRC - An Internet Relay Chat program @ http://www.mirc.com/
<sp33chy> Download mIRC or the mIRC FAQ. @ http://www.mirc.com/get.html
<sp33chy> #mIRC-DALnet Resource Center @ http://www.mirc.org/
<sp33chy> - mIRC Scripting Network - mIRC Scri @ http://www.mirc.net/
Code: Select all
# what to use to seperate results, set this to "\n" and it will output each result
# on a line of its own. the seperator will be removed from the end of the last result.
variable seperator "\n"
And yes, I've rehashed.
|22:04:02| <~User> !g mirc
|22:04:02| <&Fantc> 12,400,000 Results | mIRC - An Internet Relay Chat program @ http://www.mirc.com/ | Download mIRC or the mIRC FAQ. @ http://www.mirc.com/get.html | #mIRC-DALnet Resource Center @ http://www.mirc.org/ | - mIRC Scripting Network - mIRC Scri @ http://www.mirc.net/
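For reference, this is roughly how a separator setting like that is expected to behave, going only by the comment in the config. It is a sketch under that assumption: format_results is a made-up proc, the sample results are shortened, and the script's real output code may work differently.

```tcl
# Hypothetical helper: join results with the configured separator and
# return the list of lines that would be sent to the channel.
proc format_results {results seperator} {
    set output ""
    foreach r $results {
        append output $r $seperator
    }
    # Remove the separator left over after the last result.
    set cut [expr {[string length $output] - [string length $seperator]}]
    set output [string range $output 0 [expr {$cut - 1}]]
    # A "\n" separator yields one line per result; " | " yields one line.
    return [split $output "\n"]
}

set results [list "mIRC @ http://www.mirc.com/" "mIRC FAQ @ http://www.mirc.com/get.html"]
set one_line [format_results $results " | "]
set per_line [format_results $results "\n"]
puts [llength $one_line]
puts [llength $per_line]
```

If the bot still prints everything on one line after a rehash, it may be worth checking that the edited file is the one actually being loaded, since the " | " in the output matches the default value.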