Stripping out a character

bras
Voice
Posts: 7
Joined: Fri Feb 03, 2006 8:38 am

Post by bras »

Hi,

I'm writing a script, but I'm having trouble removing a character from a piece of text. The text is «» TIME «»
I know that « is character code 171 and » is 187, but I don't know how to represent them in a replacevar procedure. I tried:
set echo [replacevar $echo "\0171" ""]
set echo [replacevar $echo "\0187" ""]

Obviously that didn't work :( Could anyone help me?
demond
Revered One
Posts: 3073
Joined: Sat Jun 12, 2004 9:58 am
Location: San Francisco, CA
Contact:

Post by demond »

\0171 and \0187 are invalid character escapes, they should be \253 and \273 (since 171 decimal is 253 octal and 187 is 273)

Code: Select all

string map {\253 {} \273 {}} $str
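
As a quick check of the arithmetic above, here is a minimal sketch, assuming a plain tclsh (the variable names are made up for illustration). It also shows why "\0171" cannot work: Tcl reads at most three octal digits after the backslash, so "\0171" is the character \017 followed by a literal 1.

```tcl
# Decimal character codes for « and », as given in the question.
set codes {171 187}

# format %o converts a decimal number to its octal digits.
foreach code $codes {
    puts "$code decimal = [format %o $code] octal"
}

# "\0171" is not one character: \017 is consumed, then a literal "1" follows.
puts "length of \\0171: [string length "\0171"]"

# \253 and \273 are single characters, so string map can remove them.
set str "\253\273 TIME \253\273"
set cleaned [string trim [string map {\253 {} \273 {}} $str]]
puts $cleaned
```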
connection, sharing, dcc problems? click <here>
before asking for scripting help, read <this>
use the [code] tag when posting logs, code
bras
Voice
Posts: 7
Joined: Fri Feb 03, 2006 8:38 am

Post by bras »

Hi demond, thanks very much for taking the time to help me. You were right about the codes, but I don't know why I don't seem to be able to strip them out. Here is what I'm doing:

Code: Select all

bind pubm "m|m" *\00312TIME* dotime

proc replacevar {strin what withwhat} {
        set output $strin
        set replacement $withwhat
        set cutpos 0
        while { [string first $what $output] != -1 } {
                set cutstart [expr [string first $what $output] - 1]
                set cutstop  [expr $cutstart + [string length $what] + 1]
                set output [string range $output 0 $cutstart]$replacement[string range $output $cutstop end]
        }
        return $output
}

proc dotime { nick host handle channel text } {
set text [split $text]
set time [lrange $text 5 end]
set echo $time               
set echo [replacevar $echo "\253" ""]
set echo [replacevar $echo "\273" ""]
    putserv "PRIVMSG #newsnet :$echo"
}
I don't know why the replacevar proc is not working for these characters. It has always worked for me. An example of the text I'm stripping them from would be:

In Rio de Janeiro : 23h 12m 30s «» TIME «»

What I want is only the time, which is not always in this format, that's why I'm trying to work with what is between : and «
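
One possible way to take "what is between : and «" is sketched below with string first and string range. It is only an illustration: the proc name extract_time is hypothetical, and it assumes the first ":" in the line is the separator and that « arrives as the single character \253.

```tcl
# Return the text between the first ":" and the first « (\253), trimmed.
# Returns an empty string when either marker is missing or out of order.
proc extract_time {line} {
    set start [string first ":" $line]
    set stop  [string first "\253" $line]
    if {$start < 0 || $stop < 0 || $stop <= $start} {
        return ""
    }
    return [string trim [string range $line [expr {$start + 1}] [expr {$stop - 1}]]]
}

set sample "In Rio de Janeiro : 23h 12m 30s \253\273 TIME \253\273"
puts [extract_time $sample]
```

Because only the first ":" is used as the separator, a time written with colons (for example 23:12:30) would still come through whole. Any color codes sitting between the two markers would be kept as well.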

Would you have any idea why I can't strip out « and » ?

Thanks again!
demond
Revered One
Posts: 3073
Joined: Sat Jun 12, 2004 9:58 am
Location: San Francisco, CA
Contact:

Post by demond »

Get rid of that [replacevar] proc. Tcl has a built-in command for replacing strings within a string, called [string map] (there is also [string replace], of course, but it doesn't suit what you need).
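
To illustrate the difference between the two commands, here is a small sketch, assuming a plain tclsh and a made-up sample string. [string map] replaces every occurrence of each key wherever it appears, while [string replace] works on a fixed index range, so you would already need to know where the characters are.

```tcl
set str "\253\273 TIME \253\273"

# string map: remove every « and », wherever they occur.
set mapped [string map {\253 {} \273 {}} $str]

# string replace: remove the characters at indices 0..1 only.
set replaced [string replace $str 0 1]

puts "original: [string length $str] chars"
puts "mapped:   [string length $mapped] chars"
puts "replaced: [string length $replaced] chars"
```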
bras
Voice
Posts: 7
Joined: Fri Feb 03, 2006 8:38 am

Post by bras »

I tried string map too; it didn't work either.
set data [string map {"\273" ""} $time]
I can remove everything else except those signs, and I can't understand why. Thanks anyway for your patience, demond.
demond
Revered One
Posts: 3073
Joined: Sat Jun 12, 2004 9:58 am
Location: San Francisco, CA
Contact:

Post by demond »

really?

Code: Select all

% set a foo\273bar
foo?bar
% string map {\273 {}} $a
foobar
spock
Master
Posts: 319
Joined: Thu Dec 12, 2002 8:40 pm

Post by spock »

try \xAB and \xBB

actually f*** that, if demond's suggestion doesn't work then mine won't either (PEBKAC)
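
A quick sketch of why the two suggestions are equivalent, assuming a plain tclsh: the hex escapes \xAB and \xBB name the same characters as the octal escapes \253 and \273.

```tcl
set hexForm "\xAB\xBB"
set octForm "\253\273"

# 1 means both spellings produced the same two-character string.
puts [string equal $hexForm $octForm]

# scan %c reports the numeric code of a character.
scan "\xAB" %c code
puts $code
```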
photon?
bras
Voice
Posts: 7
Joined: Fri Feb 03, 2006 8:38 am

Post by bras »

Yep... neither worked for me...

I found out that it's happening because there are color escapes near the characters I'm working with. It's not \003 though... Are there any other ways to end a color escape besides \003, and if so, which?
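
On the question itself: besides \003, the mIRC-style reset code \017 (Ctrl+O) also ends color, along with all other formatting. A hedged sketch of stripping the usual control codes before looking for « and » follows. The proc name strip_irc_codes and the sample line are made up for illustration, and it assumes the common codes \002 (bold), \003 (color), \017 (reset), \026 (reverse) and \037 (underline).

```tcl
# Remove mIRC-style formatting codes from a line of text.
proc strip_irc_codes {text} {
    # \003 optionally followed by a foreground and background number.
    regsub -all {\003(\d{1,2}(,\d{1,2})?)?} $text {} text
    # Single-character codes: bold, reset, reverse, underline.
    regsub -all {[\002\017\026\037]} $text {} text
    return $text
}

set line "\00312TIME\003 \0034,1red\017 plain \002bold\002"
puts [strip_irc_codes $line]
```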
bras
Voice
Posts: 7
Joined: Fri Feb 03, 2006 8:38 am

Post by bras »

Just to show what I'm talking about... forgot about the image
demond
Revered One
Posts: 3073
Joined: Sat Jun 12, 2004 9:58 am
Location: San Francisco, CA
Contact:

Post by demond »

you simply don't know your codes

print them out with:

Code: Select all

foreach c [split $str {}] {binary scan $c H2 x; putlog "$c \\x$x"}
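
For trying the same idea outside the bot, here is a minimal standalone sketch. It assumes a plain tclsh, so it returns the codes instead of calling the Eggdrop-only putlog. The proc name dump_codes is hypothetical, and it uses scan %c rather than binary scan, so it reports the full code of each character.

```tcl
# Return a list with one \xNN entry per character of the string.
proc dump_codes {str} {
    set out {}
    foreach c [split $str {}] {
        scan $c %c code
        lappend out [format "\\x%02X" $code]
    }
    return $out
}

# Sample input: "A", the « character, and the color code \003.
puts [join [dump_codes "A\253\003"] " "]
```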
awyeah
Revered One
Posts: 1580
Joined: Mon Apr 26, 2004 2:37 am
Location: Switzerland
Contact:

Post by awyeah »

Actually, he's right. I was working on this today, researched the topic for 2-3 hours, and tested it on my bot.

The only codes that can be removed, stripped, or detected in a string or list are in the following range:

Code: Select all

In octal: \300-\377
In hexadecimal: \xC0-\xFF
I tried everything from regexp and regsub to string map, but the codes in the range:

Code: Select all

In octal: \200-\277
In hexadecimal: \x80-\xBF
were not detected in any way. I ran some tests to confirm this; here is one of them.

In this one I use the whole range (128 characters, as you can see), and for regexp matching I used \200-\277 and \300-\377. All of them should be detected, but only \300-\377 were.

Code: Select all

<awyeah> .tcl string length "€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ"
<adapter> Tcl: 128

<awyeah> !test "€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ"
<adapter> Remaining: "€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿"
I also tried substituting with regsub and string map; they gave similar results.

So my conclusion, after spending the whole afternoon on this, is:

In the character range:

Code: Select all

octal: \200-\277 and \300-\377
hexadecimal: \x80-\xFF
Only the range:

Code: Select all

In octal: \300-\377
In hexadecimal: \xC0-\xFF
is detectable.
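
For comparison, here is a sketch of the same check in a plain tclsh, outside Eggdrop, with the string built from character codes rather than received from IRC. In that setting regexp and regsub match the whole range, which suggests (as an interpretation, not something tested on the bot above) that the limit depends on how the text reaches the interpreter rather than on regexp itself.

```tcl
# Build a string holding every character from code 128 to 255.
set all ""
for {set code 128} {$code <= 255} {incr code} {
    append all [format %c $code]
}
puts "total characters: [string length $all]"

# Count the matches in the lower half of the range, \200-\277.
set lowCount [regexp -all {[\200-\277]} $all]
puts "matched in \\200-\\277: $lowCount"

# Strip the full range and see what is left.
regsub -all {[\200-\377]} $all {} remaining
puts "remaining after regsub: [string length $remaining]"
```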
Last edited by awyeah on Tue Jul 10, 2007 8:25 pm, edited 1 time in total.
·­awyeah·

==================================
Facebook: jawad@idsia.ch (Jay Dee)
PS: Guys, I don't accept script helps or requests personally anymore.
==================================
awyeah
Revered One
Posts: 1580
Joined: Mon Apr 26, 2004 2:37 am
Location: Switzerland
Contact:

Post by awyeah »

Follow up of my previous post. For testing:

In partyline I got this:

Code: Select all

<awyeah> .tcl string map {"Š" "" "Œ" "" "Ž" "" "œ" "" "ž" "" "Ÿ" ""} "werŠŒytyrtŽewreœtrwežrwetertŸfg"
<adapter> Tcl: werytyrtewretrwerwetertfg

<awyeah> .tcl string match "*Œ*" "werŠytyrtŽewreœtrwežrwetertŸfg"
<adapter> Tcl: 0

<awyeah> .tcl string match "*Œ*" "werŠŒytyrtŽewreœtrwežrwetertŸfg"
<adapter> Tcl: 1
This indicates everything works correctly in the partyline.
Now see what happens when I load the Tcl script into the bot and test.

For this proc (Tcl script loaded into the bot):

Code: Select all

bind pub - !test testing

proc testing {n u h c t} {
 set i [string map {"\x8A" "" "\x8C" "" "\x8E" "" "\x9C" "" "\x9E" "" "\x9F" ""} $t]
 putserv "PRIVMSG #adapter :String map: $i"
 if {[string match -nocase "*\x8C*" $t] || [string match -nocase "*\x9E*" $t]} {
 putserv "PRIVMSG #adapter :Match found"
 } else {
 putserv "PRIVMSG #adapter :No match found"
 }
}
and for the same string, I got these results:

Code: Select all

<awyeah> !test "werŠŒytyrtŽewreœtrwežrwetertŸfg"
<adapter> String map: "werŠŒytyrtŽewreœtrwežrwetertŸfg"
<adapter> No match found
This means there is definitely something wrong.
I also checked with this proc:

Code: Select all

bind pub - !test testing

proc testing {n u h c t} {
 set i [string map {"Š" "" "Œ" "" "Ž" "" "œ" "" "ž" "" "Ÿ" ""} $t]
 putserv "PRIVMSG #adapter :String map: $t"
 if {[string match -nocase "*Œ*" $t]} {
 putserv "PRIVMSG #adapter :Match found"
 } else {
 putserv "PRIVMSG #adapter :No match found"
 }
}
It also gave me the same result as above:

Code: Select all

<awyeah> !test "werŠŒytyrtŽewreœtrwežrwetertŸfg"
<adapter> String map: "werŠŒytyrtŽewreœtrwežrwetertŸfg"
<adapter> No match found
Furthermore, from what I've read, there may be two causes in this case:

1) http://www.ascii.cl/htmlcodes.htm << this page notes that characters in the range \x80-\xBF (or \200-\277) are NOT defined in the HTML 4 standard
2) From: /eggdrop/docs/known-problems
* High-bit characters are being filtered from channel names. This is a
fault of the Tcl interpreter, and not Eggdrop. The Tcl interpreter
filters the characters when it reads a file for interpreting. Update
your Tcl to version 8.1 or higher.

* Version 8.1 of Tcl doesn't support unicode characters, for example, è.
If those characters are handled in a script as text, you run into errors.
Eggdrop can't handle these errors at the moment.
However, strange as it may seem, my shell provider has Tcl 8.4, patched up to 8.4.11.

I think these two are the underlying problems that keep me from achieving my aim. If anyone has comments on my conclusion, please follow up on my post.

Thanks,
JD
awyeah
Revered One
Posts: 1580
Joined: Mon Apr 26, 2004 2:37 am
Location: Switzerland
Contact:

Post by awyeah »

Actually, I got it, in fact. It's quite easy: I read up today on the different character encodings and tested a few. The two main ones that apply here are cp1252 and iso8859-1.

With cp1252 in the proc below, the characters were not stripped completely: some were removed and some weird characters were left behind, as you can see in the output.

Code: Select all

bind pub - !test testing

proc testing {n u h c t} {
 regsub -all {[\200-\377]} [encoding convertfrom cp1252 $t] {} a
 putserv "privmsg #adapter :CP1252: $a"
 regsub -all {[\200-\377]} [encoding convertfrom iso8859-1 $t] {} b
 putserv "privmsg #adapter :ISO8859-1: $b"
}
With iso8859-1, everything was stripped completely, as I wanted; see the results below. :)

Code: Select all

<awyeah> !test "df€fdgdf„fg…d†dsderyrt‡ˆdfŠ‹ŒŽdf‘’ertdfse“”•–—˜™š›œerftždsŸ trydsrt¡¢£¤¥¦sdf§¨©ª«¬­rtyrt®¯°dsf±fsd²³´µ¶·¸¹º»dsfsd¼½¾¿ÀÁÂdfsdtrysdfsdtytrrtÄÅÆjhÇÈmjhmÉÊËmkhjrtÌÍÎÏÐÑmkhjÓk,hÔÕjhØÙ,klÛÜuiÝÞhjßàákhjâãkuytiuyikåweæçèfsewrdêëdsìíîïsdfðdfsffsfsósdôsdfsddstyfrtsdö÷øùúûsdfsdüýþÿ"

<adapter> CP1252: "df¬fdgdffg&d dsderyrt!Ædf`9R}dfertdfse"Ü"a:Serft~dsxtrydsrtsdfrtyrtdsffsddsfsddfsdtrysdfsdtytrrtjhmjhmmkhjrtmkhjk,hjh,kluihjkhjkuytiuyikwefsewrddssdfdfsffsfssdsdfsddstyfrtsdsdfsd"

<adapter> ISO8859-1: "dffdgdffgddsderyrtdfdfertdfseerftdstrydsrtsdfrtyrtdsffsddsfsddfsdtrysdfsdtytrrtjhmjhmmkhjrtmkhjk,hjh,kluihjkhjkuytiuyikwefsewrddssdfdfsffsfssdsdfsddstyfrtsdsdfsd"
Hence, to handle the complete range \200-\377 (\x80-\xFF), you need to convert the text inside the proc with [encoding convertfrom iso8859-1].
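
A small sketch of why the two encodings behave differently, assuming a plain tclsh with the standard encoding files installed; the sample bytes are made up. iso8859-1 maps every byte 0x80-0xFF to the character with the same code, so the regsub range covers all of them. cp1252 maps some bytes in 0x80-0x9F to characters above \377 (for example 0x80 becomes the euro sign, U+20AC), which the range \200-\377 does not match.

```tcl
# Five raw bytes: "A", 0x80, 0xAB («), 0xBB (»), "Z".
set bytes [binary format H* 4180abbb5a]

set latin1 [encoding convertfrom iso8859-1 $bytes]
set cp1252 [encoding convertfrom cp1252 $bytes]

# Same pattern as in the proc above, applied to both conversions.
regsub -all {[\200-\377]} $latin1 {} a
regsub -all {[\200-\377]} $cp1252 {} b

puts "iso8859-1 leaves [string length $a] characters"
puts "cp1252 leaves [string length $b] characters"

# Show the code that byte 0x80 turned into under cp1252.
scan [string index $cp1252 1] %c code
puts [format "0x80 under cp1252 is U+%04X" $code]
```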

Mission successful!