Problem with special characters ² ³ and °

Help for those learning Tcl or writing their own scripts.
arfer
Master
Posts: 436
Joined: Fri Nov 26, 2004 8:45 pm
Location: Manchester, UK

Post by arfer »

With reference to my post above, I have since done much reading and muttering under my breath in an attempt to understand the difference between a ³ copied and pasted from, say, the Windows Character Map and a ³ generated from the hex notation \xB3. I am none the wiser. Even binary scanning the characters shows them to have the same underlying value.
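
One way to compare the two forms is to dump the bytes each string holds. The sketch below is only an assumption about how such a check could look, not the scan actually used for this post, and the helper name hexdump is made up for illustration. It shows that a string holding the single character U+00B3 and a string holding that character's UTF-8 encoding differ under binary scan, even though both can end up displayed as ³.

```tcl
# Hypothetical helper: return the bytes of a string as lowercase hex.
proc hexdump {s} {
    binary scan $s H* hex
    return $hex
}

# One character, code point U+00B3.
set asChar "\u00B3"

# The same character encoded as UTF-8: two bytes, C2 B3.
set asUtf8 [encoding convertto utf-8 $asChar]

puts [hexdump $asChar]   ;# b3
puts [hexdump $asUtf8]   ;# c2b3
```

If a pasted ³ reaches the interpreter as the two-byte form, a pattern or string containing it is not the same as one built from the escape, which may be where the mismatch comes from.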

At least I am 50% happy in that I have a solution. Simply build up the regsub pattern from BOTH an explicit copy/paste of the characters themselves AND their hex equivalents.

use --> [regsub -all -- {[°²³\xB0\xB2\xB3]} $varname "\\\\&"]

This is the result when I build up the variable using the Windows Character Map for the special characters :-

% set mytest "at xy³ + y² temperature is 14°"
at xy³ + y² temperature is 14°
% return [regsub -all -- {[°²³\xB0\xB2\xB3]} $mytest "\\\\&"]
at xy\³ + y\² temperature is 14\°

This is the result when I build up the variable using hex notation for the special characters :-

% set mytest "at xy\xB3 + y\xB2 temperature is 14\xB0"
at xy³ + y² temperature is 14°
% return [regsub -all -- {[°²³\xB0\xB2\xB3]} $mytest "\\\\&"]
at xy\³ + y\² temperature is 14\°

Works for both.
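
The same substitution can be written so that it does not depend on how the script file itself is encoded. This is a minimal sketch rather than a tested eggdrop script: the proc name escape_specials is hypothetical, and the pattern uses only \u escapes, which Tcl resolves to the characters °, ² and ³ before regsub sees them.

```tcl
# Hypothetical helper: put a backslash in front of every degree sign,
# superscript two or superscript three in the given text.
proc escape_specials {text} {
    # Double quotes let Tcl turn each \uXXXX into the real character,
    # so no non-ASCII byte has to appear in the script file.
    set pattern "\[\u00B0\u00B2\u00B3\]"
    # The replacement inserts one literal backslash before the match.
    return [regsub -all -- $pattern $text "\\\\&"]
}

puts [escape_specials "at xy\u00B3 + y\u00B2 temperature is 14\u00B0"]
```

Whether the output displays correctly still depends on the encoding of whatever is showing it.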

My guess is that this is pretty much the dirty solution the original poster found.

Please, please, somebody explain what the difference is so that I may sleep peacefully.

/me wanders off threatening a terrible revenge on all descendants of Charles Babbage.
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

It probably has more to do with encodings. Using eggdrop v1.6.17 and Tcl 8.4:
<speechles> .tcl set mytest "at xy³ + y² temperature is 14°"
<bot> Tcl: at xy? + y? temperature is 14?
<speechles> .tcl set testing [regsub -all -- {[°²³]} $mytest "\\\\&"]
<bot> Tcl: at xy\? + y\? temperature is 14\?
This is how it works as iso8859-1: those characters aren't represented correctly, so they get the question-mark treatment. But apparently it has worked, because the escapes are properly placed. If we instead use "utf-8"...
<speechles> .tcl set mytest [encoding convertto "utf-8" "at xy³ + y² temperature is 14°"]
<bot> Tcl: at xy³ + y² temperature is 14°
<speechles> .tcl set testing [regsub -all -- {[°²³]} $mytest "\\\\&"]
<sp33chy> Tcl: at xy³ + y² temperature is 14°
Fails, but I can see them clearly...
The workaround, of course, is to use binary/octal/decimal/hex notation when referencing these characters, or to use the correct encoding to begin with...
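
A rough sketch of the "correct encoding to begin with" route: decode the UTF-8 bytes into characters first, match with a pattern written in escape notation, and encode again only at the end. The variable names are made up for illustration, and the raw string here is just a stand-in for text arriving from elsewhere; this has not been checked against eggdrop's own input handling.

```tcl
# Stand-in for text that arrived as raw UTF-8 bytes (assumption for the demo).
set raw [encoding convertto utf-8 "y\u00B2 at 14\u00B0"]

# Decode the bytes into real characters before matching.
set text [encoding convertfrom utf-8 $raw]

# Pattern written with escapes, so the class holds exactly three characters.
set escaped [regsub -all -- "\[\u00B0\u00B2\u00B3\]" $text "\\\\&"]

# Encode again only if the consumer expects UTF-8 bytes.
set out [encoding convertto utf-8 $escaped]
```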

When eggdrop finally supports utf-8 and Latin charsets aren't confused with iso8859-1 representations, all this stuff will probably not need workarounds any longer.

Post by arfer »

Works fine for me using partyline Tcl.

My guess is you are not using a utf-8 compliant IRC client or you are in the bot's partyline via telnet.

Should work in DCC CHAT within mIRC or XChat (I'm using mIRC) providing they are set to display utf-8 by default.

Post by speechles »

arfer wrote:Works fine for me using partyline Tcl.

My guess is you are not using a utf-8 compliant IRC client or you are in the bot's partyline via telnet.

Should work in DCC CHAT within mIRC or XChat (I'm using mIRC) providing they are set to display utf-8 by default.
Why does it matter what my IRC client does? You aren't seeing the bigger picture. The encoding of the string you are trying to regsub and the encoding of the pattern you are regsubbing with both matter.
<speechles> .tcl set mytest [encoding convertto "utf-8" "at xy³ + y² temperature is 14°"]
<bot> Tcl: at xy³ + y² temperature is 14°
<speechles> .tcl set mytest2 [encoding convertto "utf-8" "\[°²³\]"]
<sp33chy> Tcl: [°²³]
<speechles> .tcl set testing [regsub -all -- "$mytest2" $mytest "\\\\&"]
<sp33chy> Tcl: at xy\Â\³ + y\Â\² temperature is 14\Â\°
This is meant to demonstrate working outside of the bot's internal encoding or system encoding. You can get it to work, you just have to be explicit.
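
The doubled escapes in that last line can be reproduced outside the bot. The sketch below assumes a plain tclsh and uses invented variable names. Once both the text and the bracket expression are converted to UTF-8, each of these three special characters becomes two bytes (\xC2 followed by the original code), the class ends up listing \xC2 as a member, and so each byte of a pair is matched and escaped on its own.

```tcl
set text  "at xy\u00B3 + y\u00B2 temperature is 14\u00B0"
set class "\[\u00B0\u00B2\u00B3\]"

# Encode both subject and pattern as UTF-8 byte strings.
set textBytes  [encoding convertto utf-8 $text]
set classBytes [encoding convertto utf-8 $class]

# Every byte that appears in the class gets its own backslash.
set result [regsub -all -- $classBytes $textBytes "\\\\&"]

puts [string length $text]       ;# 30 characters
puts [string length $textBytes]  ;# 33 bytes
```

Keeping either side as characters while the other is bytes gives the failed match shown earlier; keeping both as characters gives one escape per symbol.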

If you check out the unofficial incith google script, you can see this issue causes problems in several places and has numerous workarounds.

Post by arfer »

Sorry, my mistake.

The bot's partyline seems incapable of interpreting/displaying UTF-8 characters by default. No amount of encoding seems to change that, as your post confirms.

My original posts used a public-command Tclsh and so were made through a bot channel in mIRC; hence my solution works, because mIRC displays UTF-8 by default.

My solution would likewise work in a TCL script not confined to display within a non UTF-8 environment.

Post by speechles »

arfer wrote:My solution would likewise work in a TCL script not confined to display within a non UTF-8 environment.
When eggdrop v1.6.20 is released, eggdrop should become a workable utf-8 environment finally. At that time I'll probably update my irc client, until then... work-around is the name of the game. ^_~