
split ||

arcane
Master
Posts: 280
Joined: Thu Jan 30, 2003 9:18 am
Location: Germany


Post by arcane »

hi
anyone a solution for this:

i've got a string "a||b||c||d" and i want to split it into a, b, c and d. now my problem is that tcl won't split by "||". it just splits by "|" and gives me too many parts.

Code:

set test "a||b||c||d"
set length [llength [split $test "||"]]
length is now "7".

Code:

set test "a|b|c|d"
set length [llength [split $test "||"]]
length is now "4".

i've tried everything i could think of (split "\\||", split {||}, split "\|\|"...). none of them worked. can you help me?
user
 
Posts: 1452
Joined: Tue Mar 18, 2003 9:58 pm
Location: Norway

Post by user »

Check the manual.
'split' splits on ALL the chars specified in the second argument (if any).
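A quick way to see this in tclsh (a minimal sketch; the sample string is made up for illustration):

```tcl
# split treats its second argument as a set of single characters,
# not as one multi-character separator.
set s "a||b,c"

# "||" is the same character set as "|", so each pipe separates on its
# own and the two adjacent pipes produce an empty element between them.
puts [split $s "||"]   ;# a {} b,c
puts [split $s "|"]    ;# a {} b,c

# Every listed character is a separator: here both "|" and ",".
puts [split $s "|,"]   ;# a {} b c
```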

Code:

regexp -all -inline {[^|]+} $yourString
would return a list like you want, but doesn't care how many |'s there are between the "elements".
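A short sketch of the regexp approach, showing that it collapses runs of pipes and never returns empty elements (the sample strings are illustrative):

```tcl
# Match every run of characters that are NOT pipes. Because "|", "||"
# and "||||" all look the same to this pattern, it is not an exact
# "split on ||" and it never yields empty elements.
set yourString "a||b||c||d"
puts [regexp -all -inline {[^|]+} $yourString]   ;# a b c d

# Single and doubled pipes are indistinguishable:
puts [regexp -all -inline {[^|]+} "a|b||||c"]    ;# a b c
```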
Another solution is replacing || with some single char not used anywhere else in your content (using 'string map' or 'regsub') and then split by that char.
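The replace-then-split idea can be sketched like this. The proc name splitOn is hypothetical, and NUL (\x00) is assumed never to occur in the input:

```tcl
# Hypothetical helper: map each occurrence of the multi-character
# separator to one placeholder character, then split on that character.
proc splitOn {str by} {
    set placeholder \x00
    # string map scans left to right and replaces whole, non-overlapping
    # occurrences of $by, so each "||" becomes exactly one placeholder.
    split [string map [list $by $placeholder] $str] $placeholder
}

puts [splitOn "a||b||c||d" "||"]   ;# a b c d
puts [splitOn "a||b||" "||"]       ;# a b {}
```

A lone "|" is left alone (splitOn "a|b||c" "||" gives a|b and c), which is what the original question asks for.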
ppslim
Revered One
Posts: 3914
Joined: Sun Sep 23, 2001 8:00 pm
Location: Liverpool, England

Post by ppslim »

Code:

proc chunk {in chars} {
  # No separator present: return the whole input as a one-element list.
  if {[string first $chars $in] < 0} { return [list $in] }
  set temp [list]
  set chunks 0
  set chunke 0
  # Collect each piece between successive occurrences of $chars.
  while {[set chunke [string first $chars $in $chunks]] != -1} {
    lappend temp [string range $in $chunks [expr {$chunke - 1}]]
    set chunks [expr {$chunke + [string length $chars]}]
  }
  # Append the remainder only if it is non-empty.
  if {[string length [string range $in $chunks end]]} {
    lappend temp [string range $in $chunks end]
  }
  return $temp
}
Similar to split, but it splits on the whole sequence of characters (as one chunk), like you asked.
% set a "123,@.456,@.789,@.abc,@.def,>.ghi,@.jklm"
123,@.456,@.789,@.abc,@.def,>.ghi,@.jklm

% chunk $a ",@."
123 456 789 abc def,>.ghi jklm

% chunk $a ",>."
123,@.456,@.789,@.abc,@.def ghi,@.jklm
user
Posts: 1452
Joined: Tue Mar 18, 2003 9:58 pm
Location: Norway

Slight bug

Post by user »

This check:
ppslim wrote: if {[string length [string range $in $chunks end]]} {
will lead to invalid results if the last chars of the string are the chars you're "splitting" by. (It should result in an empty element at the end.)
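For reference, this is what the built-in split does with separators at the ends of a string, which is the behaviour a split-like proc would be expected to reproduce (single-character separator used for simplicity):

```tcl
# Every separator ends an element, so a trailing separator yields a
# final empty element and a leading one yields an initial empty element.
puts [split "a,b," ","]             ;# a b {}
puts [split ",a,b" ","]             ;# {} a b
puts [llength [split "a,b," ","]]   ;# 3
```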

Here's a rewrite of ppslim's proc that should produce results more like the original split:

Code:

proc chop {str {by " "}} {
	set l [string length $by]
	set i 0
	set j 0
	while {[set j [string first $by $str $i]]>-1} {
		lappend out [string range $str $i [expr {$j-1}]]
		set i [expr {$j+$l}]
	}
	if {$i<=[string len $str]} {
		lappend out [string range $str $i end]
	}
	set out
}
EDIT: I still think

Code:

proc chop {str {by "  "} {re \0}} {
  split [string map [list $by $re] $str] $re
}
is better (at least for text received from irc) :)
strikelight
Owner
Posts: 708
Joined: Mon Oct 07, 2002 10:39 am

Re: Slight bug

Post by strikelight »

user wrote:
EDIT: I still think

Code:

proc chop {str {by "  "} {re \0}} {
  split [string map [list $by $re] $str] $re
}
is better (at least for text received from irc) :)
It most definitely is better for ANY text... not only because of code size, but also cpu time wise.. the previous implementation would render approximately O(8n) instructions whereas the second one only renders about O(3) instructions ... so if the text was 128 chars long, the first one would be issuing about 384 instructions (worst case scenario) to the processor, as opposed to the mere 3 instructions sent out by the shorter proc. Although I would have used \x81 instead of \0 myself ;)
stdragon
Owner
Posts: 959
Joined: Sun Sep 23, 2001 8:00 pm

Post by stdragon »

Just to nitpick, you're assuming that the operations in question have the same penalty time-wise, but that's wrong. If you think about it, "string map" and "split" both cycle through the entire string searching. Both procs are O(n).
strikelight
Owner
Posts: 708
Joined: Mon Oct 07, 2002 10:39 am

Post by strikelight »

stdragon wrote:Just to nitpick, you're assuming that the operations in question have the same penalty time-wise, but that's wrong. If you think about it, "string map" and "split" both cycle through the entire string searching. Both procs are O(n).
I was referring to the 'string map' versus the proc initially proposed by ppslim, which uses a while loop, as well as many other functions, which is obviously going to require a larger O notation.
And if you are referring to the proc which does use both split and string map and my calculation of big-O, you will see the word 'about' in my estimation (which is what O notation is).. It would be O(n+n) = O(2n) then.. and even then it's probably less, because when you think about it, if you were to use regsub in place of string map, you would find it takes longer in practical tests.. so assuming the regsub would be O(n), then string map < O(n) .. Nitpick nitpicked.
user
Posts: 1452
Joined: Tue Mar 18, 2003 9:58 pm
Location: Norway

Re: Slight bug

Post by user »

strikelight wrote:It most definitely is better for ANY text...
Not if the text can contain any char. Then it's useless.
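To illustrate the objection, here is a small sketch: a trimmed copy of the string map version (named chop3, following the naming used later in this post, with the placeholder fixed to NUL) fed a string that already contains the placeholder character:

```tcl
# Trimmed copy of the string map version; the placeholder is NUL.
proc chop3 {str by} {
    set re \0
    split [string map [list $by $re] $str] $re
}

# Input that already contains a NUL before the first "||".
set text "a\0b||c"
# The intended result is two elements ("a<NUL>b" and "c"), but the
# pre-existing NUL is split on as well, giving three: a, b, c.
puts [llength [chop3 $text ||]]
```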
strikelight wrote:the previous implementation would render approximately O(8n) instructions whereas the second one only renders about O(3) instructions ... so if the text was 128 chars long, the first one would be issuing about 384 instructions (worst case scenario) to the processor, as opposed to the mere 3 instructions sent out by the shorter proc.
By "instructions" I assume you mean command invocations, and counting them, like stdragon said, makes little sense.

Why think when we've got "time"? :P I named the three procs from this thread in the order they were posted and timed them:

Code:

set a "ab||cde||fghi||jklmn||opqrst||uvwxyz0||12345678||"

foreach cmd {chop1 chop2 chop3} {
  puts "$cmd: [time [list $cmd $a ||] 10000]"
}
Result:
chop1: 377 microseconds per iteration
chop2: 118 microseconds per iteration
chop3: 36 microseconds per iteration
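The timing loop above assumes chop1, chop2 and chop3 already exist. A self-contained version, bundling lightly tidied copies of the three procs from this thread, might look like the following. It also prints element counts, since on this input (which ends in "||") chop1 returns one element fewer than the other two. Absolute timings will differ from the figures above depending on machine and Tcl version:

```tcl
# chop1: ppslim's chunk proc (drops a trailing empty element)
proc chop1 {in chars} {
    if {[string first $chars $in] < 0} { return [list $in] }
    set temp [list]
    set chunks 0
    while {[set chunke [string first $chars $in $chunks]] >= 0} {
        lappend temp [string range $in $chunks [expr {$chunke - 1}]]
        set chunks [expr {$chunke + [string length $chars]}]
    }
    if {[string length [string range $in $chunks end]]} {
        lappend temp [string range $in $chunks end]
    }
    return $temp
}

# chop2: user's loop-based rewrite (keeps a trailing empty element)
proc chop2 {str {by " "}} {
    set l [string length $by]
    set i 0
    while {[set j [string first $by $str $i]] > -1} {
        lappend out [string range $str $i [expr {$j - 1}]]
        set i [expr {$j + $l}]
    }
    if {$i <= [string length $str]} {
        lappend out [string range $str $i end]
    }
    set out
}

# chop3: map the separator to NUL, then split on NUL
proc chop3 {str {by " "}} {
    set re \0
    split [string map [list $by $re] $str] $re
}

set a "ab||cde||fghi||jklmn||opqrst||uvwxyz0||12345678||"
foreach cmd {chop1 chop2 chop3} {
    set n [llength [$cmd $a ||]]
    puts "$cmd: $n elements, [time [list $cmd $a ||] 10000]"
}
```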
strikelight wrote:Although I would have used \x81 instead of \0 myself ;)
WHY?
\x81 can be sent via irc, \0 can't. (unless it's encoded in a ctcp iirc) That's my reason for using \0.
strikelight
Owner
Posts: 708
Joined: Mon Oct 07, 2002 10:39 am

Re: Slight bug

Post by strikelight »

user wrote:
strikelight wrote:It most definitely is better for ANY text...
Not if the text can contain any char. Then it's useless.
Hence the \x81 further on..
user wrote:
strikelight wrote:the previous implementation would render approximately O(8n) instructions whereas the second one only renders about O(3) instructions ... so if the text was 128 chars long, the first one would be issuing about 384 instructions (worst case scenario) to the processor, as opposed to the mere 3 instructions sent out by the shorter proc.
By "instructions" I assume you mean command invocations, and counting them, like stdragon said, makes little sense.
O-notation is largely used in computer science... To call it senseless is pure ignorance. I suggest researching "O Notation" on google.
user
Posts: 1452
Joined: Tue Mar 18, 2003 9:58 pm
Location: Norway

Re: Slight bug

Post by user »

strikelight wrote:Hence the \x81 further on..
I still don't get it.
strikelight wrote:O-notation is largely used in computer science... To call it senseless is pure ignorance. I suggest researching "O Notation" on google.
I didn't call O-notation senseless. What I meant is that it's very inaccurate when used on the uncompiled tcl code.
stdragon
Owner
Posts: 959
Joined: Sun Sep 23, 2001 8:00 pm

Post by stdragon »

strikelight wrote:
stdragon wrote:Just to nitpick, you're assuming that the operations in question have the same penalty time-wise, but that's wrong. If you think about it, "string map" and "split" both cycle through the entire string searching. Both procs are O(n).
I was referring to the 'string map' versus the proc initially proposed by ppslim, which uses a while loop, as well as many other functions, which is obviously going to require a larger O notation.
I was referring to those two procs too. Ppslim's does not require a bigger O notation, because "string map" is itself a function that uses a loop and many other functions. It is not a constant-time function. So the two procs are basically the same in terms of efficiency -- although ppslim's is slower overall because it's implemented in tcl instead of C. That doesn't change its O value.
strikelight wrote: And if you are referring to the proc which does use both split and string map and my calculation of big-O, you will see the word 'about' in my estimation (which is what O notation is).. It would be O(n+n) = O(2n) then.. and even then it's probably less, because when you think about it, if you were to use regsub in place of string map, you would find it takes longer in practical tests.. so assuming the regsub would be O(n), then string map < O(n) .. Nitpick nitpicked.
O(8n) = O(2n) = O(n) (you ignore constants). That's why I said both are O(n).

Just to clear this up: the purpose of big-O notation is to estimate the change in something like memory usage or running time relative to a change in input (n). In this case we're talking about string length. So if you double the string length, an O(n) algorithm will take double the time to finish. You can see that (2n) / (n) = 2, twice the time. If you have O(8n), you get (8 * 2n) / (8 * n) = 2 (same as O(n)). If you have an O(n^2) algorithm, you get ((2n)^2) / (n^2) = 4, which means it takes 4 times as long when you double the input.
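A rough way to check the doubling argument empirically is to time one O(n) split (string map plus split, wrapped in a hypothetical helper named splitOn) on inputs of growing size. Timings are noisy and small inputs are dominated by fixed overhead, so expect ratios near 2 rather than exactly 2:

```tcl
# Hypothetical helper: O(n) split on a multi-character separator.
proc splitOn {str by} {
    set re \0
    split [string map [list $by $re] $str] $re
}

set prev ""
foreach n {10000 20000 40000} {
    # n fields of "abcd", each followed by "||"
    set s [string repeat "abcd||" $n]
    # time returns "<number> microseconds per iteration"; keep the number
    set us [lindex [time {splitOn $s ||} 20] 0]
    if {$prev eq ""} {
        puts [format "n=%d: %.1f us" $n $us]
    } else {
        puts [format "n=%d: %.1f us (x%.2f vs previous)" $n $us [expr {double($us) / $prev}]]
    }
    set prev $us
}
```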

Also, what does comparing regsub and string map have to do with anything? Ppslim's original proc didn't use regsub so I don't see where that comparison is going.. but even so, it's wrong, because regsub is not always O(n), it can be higher, like O(n^2) for certain operations, or lower, like O(1) for other operations.

nitpicked nitpick nitpick nitpicked :)
strikelight
Owner
Posts: 708
Joined: Mon Oct 07, 2002 10:39 am

Post by strikelight »

stdragon wrote:
O(8n) = O(2n) = O(n) (you ignore constants). That's why I said both are O(n).
Yes, this is true.. However, because tcl execution is quite slow in comparison to compiled languages, the coefficients become increasingly harder to simply discard.
stdragon wrote: Also, what does comparing regsub and string map have to do with anything? Ppslim's original proc didn't use regsub so I don't see where that comparison is going.. but even so, it's wrong, because regsub is not always O(n), it can be higher, like O(n^2) for certain operations, or lower, like O(1) for other operations.
I brought it up as a comparison for the short 'chop' procedure...
i.e. regsub -all $by $str $re str in place of the string map...

So either (from timed results):
a) regsub does extra work to do the same thing as done by the string map (> O(n))
b) string map doesn't take an iterative-search approach to implementing its changes (< O(n))
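One caveat when trying the regsub variant: its pattern is a regular expression, and an unescaped | means alternation, so a literal "||" pattern is three empty branches that match the empty string. Below is a sketch with the separator hard-coded and escaped (the proc name chopRe is hypothetical), plus a side-by-side timing whose numbers will vary by machine:

```tcl
# regsub variant of the placeholder approach. The pipes are escaped so
# the pattern matches the literal two-character sequence "||".
proc chopRe {str} {
    set re \0
    regsub -all {\|\|} $str $re str
    split $str $re
}

puts [chopRe "a||b||c||d"]   ;# a b c d

# Compare against string map on the same input.
set s [string repeat "abcd||" 1000]
puts "regsub:     [time {chopRe $s} 100]"
puts "string map: [time {split [string map [list || \0] $s] \0} 100]"
```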

Nitpick nitpicked nitpicked nitpick nitpick nitpicked :o