After some further messing and testing around, 3 candidates remained:
BARE: a single bare line regexp, not (yet) generical for any channel

(stdragon)
SPDY: a speedy string comparison proc (egghead/stdragon)
SPLT: a good old line splitting and word by word checking (strikelight, minor mod by egghead)
First some differences are highlighted in what each proc considers GOOD/BAD:
Code: Select all
Testing: BARE
BARE GOOD join egghead
BARE GOOD join #egghead
BARE GOOD join #egghead!
BARE GOOD join#egghead
BARE GOOD join #egghead for fun
BARE GOOD #egghead
BARE GOOD #egghead!!!aaaaaaa
BARE BAD #egghead2
BARE GOOD #egghead-xxxxx
Testing: SPDY
SPDY GOOD join egghead
SPDY GOOD join #egghead
SPDY BAD join #egghead!
SPDY GOOD join#egghead
SPDY GOOD join #egghead for fun
SPDY GOOD #egghead
SPDY BAD #egghead!!!aaaaaaa
SPDY BAD #egghead2
SPDY BAD #egghead-xxxxx
Testing: SPLT
SPLT GOOD join egghead
SPLT GOOD join #egghead
SPLT BAD join #egghead!
SPLT BAD join#egghead
SPLT GOOD join #egghead for fun
SPLT GOOD #egghead
SPLT BAD #egghead!!!aaaaaaa
SPLT BAD #egghead2
SPLT BAD #egghead-xxxxx
Next the procs are tested against a list of 20 identical strings, each string being "hello #egghead join #xxx for great fun"
Code: Select all
Testing 20 strings WITH channel names (10000 times)
BARE: 855 microseconds per iteration (average of 42 microseconds per line)
SPDY: 741 microseconds per iteration (average of 37 microseconds per line)
SPLT: 807 microseconds per iteration (average of 40 microseconds per line)
BARE: 919 microseconds per iteration (average of 45 microseconds per line)
SPDY: 758 microseconds per iteration (average of 37 microseconds per line)
SPLT: 806 microseconds per iteration (average of 40 microseconds per line)
Several tests indicated that the SPDY and SPLT are the fastest, but the differences between the 3 are minor.
Next the procs were tested against 20 identical strings without any channel name in it ("hello world, this is a line"):
Code: Select all
Testing 20 strings WITHOUT channel names (10000 times)
BARE: 364 microseconds per iteration (average of 18 microseconds per line)
SPDY: 235 microseconds per iteration (average of 11 microseconds per line)
SPLT: 753 microseconds per iteration (average of 37 microseconds per line)
BARE: 366 microseconds per iteration (average of 18 microseconds per line)
SPDY: 301 microseconds per iteration (average of 15 microseconds per line)
SPLT: 750 microseconds per iteration (average of 37 microseconds per line)
In this case SPLT has some difficulties to catch up, because it has to check each word in the line up to the last one for the presence of a channel name. These differences can be amplified by adding more words to the line.
Finally, the procs are tested against 20 identical lines containing 8 times the valid channelname as words in it ("#egghead #egghead etc...):
Code: Select all
Testing 20 strings WITH channel names (1000 times)
BARE: 2221 microseconds per iteration (average of 111 microseconds per line)
SPDY: 2283 microseconds per iteration (average of 114 microseconds per line)
SPLT: 1250 microseconds per iteration (average of 62 microseconds per line)
BARE: 2646 microseconds per iteration (average of 132 microseconds per line)
SPDY: 2275 microseconds per iteration (average of 113 microseconds per line)
SPLT: 1230 microseconds per iteration (average of 61 microseconds per line)
In this extreme case the SPLT proc clearly is the fastest. These results can be further amplified by adding the channel name a couple of times more. In such case the SPLT will remain the fastest, followed by BARE and SPDY as last one.
Based on the above results both the SPDY and the SPLT are good solutions. If the proc has to be used to check general IRC conversation, then the SPDY proc is preferred over the SPLT proc. In that case the SPLT proc can be upgraded, by adding the line "if {[string first {#} $string] == -1 } { return 0 }" as the first line in the proc.
It will be very interesting to see how the results compare with a proc using an [lsearch -all -inline] (available in Tcl 8.4) followed by a channelname check.
Finally, as mentioned by stdragon, note that there are no checks for valid word separators in the SPDY and LIST proc, there is some in the BARE proc.
Adding such check, which is left as a trivial exercise for the interested reader, will raise the timing results of the SPDY and the LIST procs in an identical way, since they perform the channel name checks in the same way.
The procs:
Code: Select all
#---------------------------------------------------------------------
# one line regexp by stdragon
#---------------------------------------------------------------------
proc stringisbadBARE { channel string } {
if { [regexp -nocase {#(?!egghead(\W|$))} $string] } { return 1 }
return 0
}
#---------------------------------------------------------------------
# split lines and test by strikelight (slightly modified)
#---------------------------------------------------------------------
proc stringisbadSPLT { channel string } {
# if {[string first {#} $string] == -1 } { return 0 }
foreach word [split $string] {
if {![string match "*#*" $word]} { continue }
if {[string compare -nocase $word $channel] == 0 } { continue }
return 1
}
return 0
}
#---------------------------------------------------------------------
# review occurances of # in string (egghead/stdragon)
#---------------------------------------------------------------------
proc stringisbadSPDY {channel string } {
set idx 0
while {[set idx [string first {#} $string $idx]] != -1} {
set space [string first { } $string $idx]
if {$space == -1} {set space end} {incr space -1}
set word [string range $string $idx $space]
if {[string compare -nocase $channel $word ]} {return 1}
set idx $space
}
return 0
}