This is the new home of the egghelp.org community forum.
All data has been migrated (including user logins/passwords) to a new phpBB version.


For more information, see this announcement post. Click the X in the top right-corner of this box to dismiss this message.

help replacing some extended ascii characters

Old posts that have not been replied to for several years.
b
blotter45
Voice
Posts: 17
Joined: Mon Feb 28, 2005 8:37 am

help replacing some extended ascii characters

Post by blotter45 »

hiya guys ;)

got a quick question, and I didn't find an answer for it from searching here.

I need to take a text string, and replace ascii characters (or all or them would be fine too) into their ascii hex equivalents, so it can be sent to a php script, for example:

set t [string map {"§" "%A7"} $t]
set t [string map {"®" "%AE"} $t]

that method (and regsub) works fine for normal ascii characters, but not extended ascii characters, like the 2 shown above. And without that conversion, query strings with extended ascii characters keep failing.

Here's a sample query that gets sent:

Code: Select all

http://domain.com/scs/info.php?username=§ent§
but I would need that to get changed to this before being sent:

Code: Select all

http://domain.com/scs/info.php?username=%A7ent%A7
Can somebody help me with a proc to replace ALL extended ascii chars in a string into their ascii hex equivalents?
User avatar
user
 
Posts: 1452
Joined: Tue Mar 18, 2003 9:58 pm
Location: Norway

Post by user »

http://tcl.tk/man/tcl8.5/TclCmd/http.htm#M28

Code: Select all

proc makeUrlencodeMap {} {
	for {set i 0} {$i<256} {incr i} {
		set c [format %c $i]
		if {![string match \[a-zA-Z0-9\] $c]} {
			set tmp($c) %[format %.2X $i]
		}
	}
	array set tmp {" " +   \n %0D%0A}
	array get tmp
}
set urlencodeMap [makeUrlencodeMap]

proc urlencode string {
	string map $::urlencodeMap $string
}
EDIT: fixed a bug that came with the loop i stole from my http package :P (2.3)
Last edited by user on Mon Feb 28, 2005 12:35 pm, edited 1 time in total.
Have you ever read "The Manual"?
b
blotter45
Voice
Posts: 17
Joined: Mon Feb 28, 2005 8:37 am

Post by blotter45 »

thanks for the quick response m8!

I tried the method you posted, but I get tcl errors when I use it:

Code: Select all

proc p_userinfo { n u h c t } {
  global scripturl ui_pass ui_salt
  if {[userinfo:auth $c]} {
    if { [llength [split $t]] < "1" } {
      putquick "PRIVMSG $c :\0034\002\[UINFO\]:\002 USAGE: \002!userinfo <username>\002\003"
      return 0
    }
    
    set ft [join [list $t]]
    set t $ft
   
    set t [http::formatQuery $t]
    set agent "Mozilla"
    set query "$scripturl?querytype=userinfo&username=$t&password=$ui_pass&salt=$ui_salt"
    set page [http::config -useragent $agent]
    set page [http::geturl $query]
    set lines [split [::http::data $page] \n]
    set result [lindex $lines 1]
   
    if { [lrange "$result" 0 0] == "baduser" } {
      putquick "PRIVMSG $c :\0034\002\[UINFO\]:\002 invalid username: \002$ft\002\003"
      putlog "\0034\002\[UINFO\]:\002 ($c) \002$n\002 requested userinfo on invalid username: \002$ft\002\003"
    }
    if { [lrange "$result" 0 0] == "gooduser" } {
      putserv "PRIVMSG $c :[string range [lindex $lines 1] 9 end]"
      putserv "PRIVMSG $c :[lindex $lines 2]"
      putlog "\0033\002\[UINFO\]:\002 ($c) \002$n\002 received userinfo on username: \002$ft\002\003"
    }
  }
}
and it produces this error when called with a username with extended ascii chars (like §ent§):

Code: Select all

Tcl error [p_userinfo]: can't read "formMap(ᄃ)": no such element in array
but it still works with normal usernames
b
blotter45
Voice
Posts: 17
Joined: Mon Feb 28, 2005 8:37 am

Post by blotter45 »

sorry I didn't see that code you pasted until after I typed that last reply.

but when I added your code, and then used this in that proc of mine above:

set t [urlencode $t]

the script doesn't change the string before it gets sent (ie, §ent§ remains the same and gets sent that way)
User avatar
user
&nbsp;
Posts: 1452
Joined: Tue Mar 18, 2003 9:58 pm
Location: Norway

Post by user »

blotter45 wrote:the script doesn't change the string before it gets sent (ie, §ent§ remains the same and gets sent that way)
How are you checking this?
Have you ever read "The Manual"?
b
blotter45
Voice
Posts: 17
Joined: Mon Feb 28, 2005 8:37 am

Post by blotter45 »

I just add a line to putlog the value of $query to the partyline so I can see what is being sent out, and it shows that §ent§ gets sent out unaltered.
b
blotter45
Voice
Posts: 17
Joined: Mon Feb 28, 2005 8:37 am

Post by blotter45 »

sorry it took so long to get back again, but my cable was out most of the day..

ok, here is the latest code I'm using, and it represents all the additions from user (thanks again for your help m8):

Code: Select all

proc makeUrlencodeMap {} {
   for {set i 0} {$i<256} {incr i} {
      set c [format %c $i]
      if {![string match \[a-zA-Z0-9\] $c]} {
         set tmp($c) %[format %.2X $i]
      }
   }
   array set tmp {" " +   \n %0D%0A}
   array get tmp
}
set urlencodeMap [makeUrlencodeMap]

proc urlencode string {
   string map $::urlencodeMap $string
}

proc p_userinfo { n u h c t } {
  global scripturl ui_pass ui_salt
  if {[userinfo:auth $c]} {
    if { [llength [split $t]] < "1" } {
      putquick "PRIVMSG $c :\0034\002\[UINFO\]:\002 USAGE: \002!userinfo <username>\002\003"
      return 0
    }
    
    set ft [join [list $t]]
    set t "$ft"
    
    set t [urlencode $t]

    set agent "Mozilla"
    set query "$scripturl?querytype=userinfo&username=$t&password=$ui_pass&salt=$ui_salt"
    set page [http::config -useragent $agent]
    set page [http::geturl $query]
    set lines [split [::http::data $page] \n]
    set result [lindex $lines 1]
    set output [string range [lindex $lines 1] 9 end]
   
    if { [lrange "$result" 0 0] == "baduser" } {
      putlog "\[UINFO\] sent query: $query"
      putquick "PRIVMSG $c :\0034\002\[UINFO\]:\002 invalid username: \002$ft\002\003"
      putlog "\0034\002\[UINFO\]:\002 ($c) \002$n\002 requested userinfo on invalid username: \002$ft\002\003"
    }
    if { [lrange "$result" 0 0] == "gooduser" } {
      putlog "\[UINFO\] sent query: $query"
      putserv "PRIVMSG $c :[string range [lindex $lines 1] 9 end]"
      putserv "PRIVMSG $c :[lindex $lines 2]"
      putlog "\0033\002\[UINFO\]:\002 ($c) \002$n\002 received userinfo on username: \002$ft\002\003"
    }
  }
}
when I call that p_userinfo proc (with !uinfo username), here's some examples of the http queries that gets sent out:

!uinfo somedude

Code: Select all

[UINFO] sent query: http://www.domain.com/scs/test.php?querytype=userinfo&username=somedude&password=48573434&salt=fuuwy
!uinfo some dude

Code: Select all

[UINFO] sent query: http://www.domain.com/scs/test.php?querytype=userinfo&username=some+dude&password=48573434&salt=fuuwy
!uinfo §ome dude§

Code: Select all

[UINFO] sent query: http://www.domain.com/scs/test.php?querytype=userinfo&username=§ome+dude§&password=48573434&salt=fuuwy
!uinfo ®en

Code: Select all

[UINFO] sent query: http://www.domain.com/scs/test.php?querytype=userinfo&username=®en&password=48573434&salt=fuuwy
So, extended ascii characters still are not getting changed to hex before they are sent out :(

ie, ® should get changed into %AE, and § should get changed into %A7
b
blotter45
Voice
Posts: 17
Joined: Mon Feb 28, 2005 8:37 am

Post by blotter45 »

Just keeping this thread alive ;)

Does anybody have any ideas about getting certain extended ascii chars to translate?
User avatar
^DooM^
Owner
Posts: 772
Joined: Tue Aug 26, 2003 5:40 pm
Location: IronForge
Contact:

Post by ^DooM^ »

try this proc for urlencode

Code: Select all

proc urlencode { text } { 
  
    regsub -all {\%} $text "%25" text 
    set i [string length $text] 
    
    # Loop through and replace all non alpha / numeric characters
    # with their relevant %{HEX ascii code} values.
    while {([regexp -nocase {[^a-z0-9%]} $text toh])&&!($i < 0 )} { 
        scan "$toh" %c dec 
        regsub -all "\\${toh}" $text "%[format %X $dec]" text 
        incr i -1 
    } 

    if {$i < 0} {putlog "Infinite loop something is very wrong: $text"} 
    return $text 
} 
The lifecycle of a noob is complex. Fledgling noobs gestate inside biometric pods. Once a budding noob has matured thru gestation they climb out of their pod, sit down at a PC, ask a bunch of questions that are clearly in the FAQ, The Noob is born
b
blotter45
Voice
Posts: 17
Joined: Mon Feb 28, 2005 8:37 am

Post by blotter45 »

^DooM^ wrote:try this proc for urlencode

Code: Select all

proc urlencode { text } { 
  
    regsub -all {\%} $text "%25" text 
    set i [string length $text] 
    
    # Loop through and replace all non alpha / numeric characters
    # with their relevant %{HEX ascii code} values.
    while {([regexp -nocase {[^a-z0-9%]} $text toh])&&!($i < 0 )} { 
        scan "$toh" %c dec 
        regsub -all "\\${toh}" $text "%[format %X $dec]" text 
        incr i -1 
    } 

    if {$i < 0} {putlog "Infinite loop something is very wrong: $text"} 
    return $text 
} 
heya thanks for the input m8 ;)

I tried your proc, but when extended ascii characters are found in the trigger to call the p_userinfo proc, I get this error:

!uinfo §ent§

Code: Select all

Tcl error [p_userinfo]: couldn't compile regular expression pattern: invalid escape \ sequence
but when using only normal ascii characters, your proc works fine.
User avatar
^DooM^
Owner
Posts: 772
Joined: Tue Aug 26, 2003 5:40 pm
Location: IronForge
Contact:

Post by ^DooM^ »

got me beat :roll: maybe just ban users with extended chars in there nicks :lol: what you cant see wont hurt you :P
The lifecycle of a noob is complex. Fledgling noobs gestate inside biometric pods. Once a budding noob has matured thru gestation they climb out of their pod, sit down at a PC, ask a bunch of questions that are clearly in the FAQ, The Noob is born
User avatar
user
&nbsp;
Posts: 1452
Joined: Tue Mar 18, 2003 9:58 pm
Location: Norway

Post by user »

sounds like an encoding bug...what is your system encoding?

Code: Select all

putlog "System encoding:  [encoding system]"
try this proc:

Code: Select all

proc slowUrlEncode str {
	set out ""
	foreach c [split $str ""] {
		if {[string match {[A-Za-z0-9]} $c]} {
			append out $c
		} {
			append out [switch [string bytelen $c] {
				1 {format %%%.2X [scan $c %c]}
				2 {eval format %%%.2X%%%.2X [scan [encoding convertto utf-8 $c] %c%c]}
				3 {eval format %%%.2X%%%.2X%%%.2X [scan [encoding convertto utf-8 $c] %c%c%c]}
			}]
		}
	}
	set out
}
Have you ever read "The Manual"?
O
Ofloo
Owner
Posts: 953
Joined: Tue May 13, 2003 1:37 am
Location: Belguim
Contact:

Post by Ofloo »

% time "urlEncode $url" 100
203 microseconds per iteration

% time "slowUrlEncode $url" 100
379 microseconds per iteration
%

here a simple proc that is bit faster then the one above ..

Code: Select all

proc urlEncode {url} {
  if {[regexp {(.*)\?(.*)} $url -> pre sur]} {
    set result ${pre}?
    foreach {x} [split $sur {}] {
      if {[regexp {[^a-zA-Z0-9=%]} $x]} {
        append result "%[format %02x [scan $x %c]]"
      } else {
      	append result $x
      }
    }
    return $result
  }
  return $url
}
Last edited by Ofloo on Wed Mar 02, 2005 3:10 pm, edited 1 time in total.
XplaiN but think of me as stupid
b
blotter45
Voice
Posts: 17
Joined: Mon Feb 28, 2005 8:37 am

Post by blotter45 »

user wrote:sounds like an encoding bug...what is your system encoding?

Code: Select all

putlog "System encoding:  [encoding system]"
try this proc:

Code: Select all

proc slowUrlEncode str {
	set out ""
	foreach c [split $str ""] {
		if {[string match {[A-Za-z0-9]} $c]} {
			append out $c
		} {
			append out [switch [string bytelen $c] {
				1 {format %%%.2X [scan $c %c]}
				2 {eval format %%%.2X%%%.2X [scan [encoding convertto utf-8 $c] %c%c]}
				3 {eval format %%%.2X%%%.2X%%%.2X [scan [encoding convertto utf-8 $c] %c%c%c]}
			}]
		}
	}
	set out
}
awesome!!! that seems to work well, thanks alot m8! btw, my encoding is: iso8859-1
O
Ofloo
Owner
Posts: 953
Joined: Tue May 13, 2003 1:37 am
Location: Belguim
Contact:

Post by Ofloo »

Code: Select all

proc urlEncode {url} {
  if {[regexp {(.*)\?(.*)} $url -> pre sur]} {
    set result ${pre}?
    foreach {x} [split $sur {}] {
      if {[regexp {[^a-zA-Z0-9=%]} $x]} {
        append result "%[format %02x [scan $x %c]]"
      } else {
      	append result $x
      }
    }
    return $result
  }
  return $url
}
sorry edited it for got a %
XplaiN but think of me as stupid
Locked