
suggestion

caesar
Mint Rubber
Posts: 3776
Joined: Sun Oct 14, 2001 8:00 pm
Location: Mint Factory

Post by caesar »

Hi,
I'm going to run some trivia on a channel and I have about 11000 questions in a file. Some of them are repeated, and I want to remove the *clones* and keep only one copy of each. I'd also like the questions that are too long to be removed. Any ideas? I was thinking of getting the *line* length and removing the line if it's longer than *x*, or something like that.
Once the game is over, the king and the pawn go back in the same box.
arcane
Master
Posts: 280
Joined: Thu Jan 30, 2003 9:18 am
Location: Germany

Post by arcane »

hehe... I would write myself a short C++ program to do this :D

Post by caesar »

Heh, good for you. I don't know C* at all :)
De Kus
Revered One
Posts: 1361
Joined: Sun Dec 15, 2002 11:41 am
Location: Germany

hmmm

Post by De Kus »

If the whole file is read into a list, you would only need two nested foreach loops that check each line against every other line. This could take a while, but it's a one-time script, so who cares? :) As for the length... isn't there a command that counts the length of a line!? Just include that check ^-^.

Post by arcane »

For clone detection:
I think it saves time if you first sort the lines and then compare each line only with the next one...
C++: if you're interested in a Windows app, I could help you.
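
The sort-then-compare idea above could be sketched in Tcl roughly as follows. This is only an illustration: `dedupe_sorted` is a made-up helper name, the sample questions are invented, and sorting changes the order of the questions in the output.

```tcl
# Hypothetical helper: drop exact duplicate lines by sorting first,
# then comparing each line only with the one just before it.
proc dedupe_sorted {lines} {
    set result [list]
    set first 1
    set prev ""
    foreach line [lsort $lines] {
        # After sorting, identical lines sit next to each other,
        # so one comparison per line is enough.
        if {$first || ![string equal $line $prev]} {
            lappend result $line
        }
        set prev $line
        set first 0
    }
    return $result
}

# Made-up sample data
puts [dedupe_sorted [list "b?" "a?" "b?" "c?" "a?"]]
```

For exact duplicates, `lsort -unique` should give the same result in a single call; the explicit loop is shown only to make the "check only the next line" step visible.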
ppslim
Revered One
Posts: 3914
Joined: Sun Sep 23, 2001 8:00 pm
Location: Liverpool, England

Post by ppslim »

This is a small Tcl application. It doesn't work on eggdrop, as it's designed for the command line.

Code: Select all

#!/usr/bin/tclsh

proc out {text} {
  puts stdout $text
}

proc usage {} {
  global argv0
  out "Usage: $argv0 \[InputFile\] \[OutputFile\] <MaxLen>"
  out "\[InputFile\] : File to filter for duplicate or long entries"
  out "\[OuputFile\] : File to create with filtered output"
  out "<MaxLen> : If lines should be a maximum length, specify the limit here (optional). 0 or not given disables this"
  exit
}

if {$argc < 2} { usage }
set in [lindex $argv 0]
set out [lindex $argv 1]
set max 0
if {$argc >= 3} { set max [lindex $argv 2] }
if {[regexp -- {[^0-9]} $max]} {
  out "Invalid max length given"
  usage
}

if {(![file exits $in]) || ([file exists $out])} {
  out "Input file doesn't exist, or output is allready there"
  usage
}

set fp [open $in r]
set buf [read $fp]
set outbuf [list]
close $fp

out "Scanning file"

set idx 1
foreach line $buf {
  if {[lsearch -exact [lrange $buf $idx end] $line] >= 0} {
    incr idx
    continue
  }
  if {($max) && ([string length $line] > $max)} {
    incr idx
    continue
  }
  lappend outbuf $line
  incr idx
}
out "Writting output buffer"
se fp [oepn $out w]
foreach a $outbuf {
  puts $fp $a
}
close $fp
out "Output complete. Discarded [expr [llength $buf] - [llength $outbuf]] record(s)"

Whether this works, and how fast it runs, is another matter.

However, there is one small issue. It doesn't account for different letter cases in duplicates.

For example:
This is a test by PPSlim!
This is a test by ppslim!
The above two are treated as different.
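
One possible way to handle the letter-case issue, sketched under assumptions: compare a lowercased copy of each line while keeping the original text for output. `dedupe_nocase` is a hypothetical helper, not part of the script above, and unlike that script it keeps the first occurrence of a duplicate rather than the last.

```tcl
# Hypothetical helper: keep the first occurrence of each line and
# treat lines that differ only in letter case as duplicates.
proc dedupe_nocase {lines} {
    array set seen {}
    set result [list]
    foreach line $lines {
        # The lowercased copy is used only as the lookup key.
        set key [string tolower $line]
        if {[info exists seen($key)]} {
            continue
        }
        set seen($key) 1
        lappend result $line
    }
    return $result
}

set sample [list "This is a test by PPSlim!" "This is a test by ppslim!" "Another line"]
foreach line [dedupe_nocase $sample] {
    puts $line
}
```

Because each line is looked up in an array instead of being searched for in the rest of the list, this should also scale better on a file of about 11000 lines.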

Post by caesar »

Hi, thank you for your kind help on this. I've run into a problem:

Code: Select all

[irc@delta irc]$ tclsh sort.tcl trivia.wri result.wri 0
bad option "exits": must be atime, attributes, channels, copy, delete, dirname, executable, exists, extension, isdirectory, isfile, join, lstat, mtime, mkdir, nativename, owned, pathtype, readable, readlink, rename, rootname, size, split, stat, tail, type, volumes, or writable
    while executing
"file exits $in"
    (file "sort.tcl" line 1)
I'll look into the error right away.

Post by caesar »

I've located the *exists* checks and they seem to be correct. I've also uncommented them and got another error:

Code: Select all

[irc@delta irc]$ tclsh sort.tcl trivia.wri result.wri 0
Scanning file
list element in quotes followed by ";" instead of space
    while executing
"foreach line $buf { 
if {[lsearch -exact [lrange $buf $idx end] $line] >= 0} { 
incr idx 
continue 
} 
if {($max) && ([string length $line] > $max)} {..."
    (file "sort.tcl" line 39)
Also, I've noticed a little typing mistake:

Code: Select all

se fp [oepn $out w]
should be:

Code: Select all

set fp [open $out w]

Post by caesar »

Also, I've noticed that it stops when it encounters a ';' or a '?' in the *in* file, and I guess there are more..

Post by ppslim »

Having had more time to work on it now, though I still haven't tested it, I've made a few major changes.

Code: Select all

#!/usr/bin/tclsh

proc out {text} {
  puts stdout $text
}

proc usage {} {
  global argv0
  out "Usage: $argv0 \[InputFile\] \[OutputFile\] <MaxLen>"
  out "\[InputFile\] : File to filter for duplicate or long entries"
  out "\[OuputFile\] : File to create with filtered output"
  out "<MaxLen> : If lines should be a maximum length, specify the limit here (optional). 0 or not given disables this"
  exit
}

if {$argc < 2} { usage }
set in [lindex $argv 0]
set out [lindex $argv 1]
set max 0
if {$argc >= 3} { set max [lindex $argv 2] }
if {[regexp -- {[^0-9]} $max]} {
  out "Invalid max length given"
  usage
}

if {(![file exists $in]) || ([file exists $out])} {
  out "Input file doesn't exist, or output is allready there"
  usage
}

set fp [open $in r]
set buf [split [read $fp] \n]
set outbuf [list]
close $fp

out "Scanning file"

set idx 1
foreach line $buf {
  if {[lsearch -exact [lrange $buf $idx end] $line] >= 0} {
    incr idx
    continue
  }
  if {($max) && ([string length $line] > $max)} {
    incr idx
    continue
  }
  lappend outbuf $line
  incr idx
}
out "Writting output buffer"
se fp [open $out w]
foreach a $outbuf {
  puts $fp $a
}
close $fp
out "Output complete. Discarded [expr [llength $buf] - [llength $outbuf]] record(s)"

I had a nasty brain fart, and I think I am getting the flu (I feel rotten, and the rest of the family claims they are ill).

I had made major "just woken" spelling errors and forgot to create a list out of the incoming text.
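
A small sketch of why creating a list out of the incoming text matters: raw file contents are just a string, and Tcl list commands can reject that string when it contains quotes followed by characters such as ';'. The sample text below is invented for the demonstration.

```tcl
# Made-up text standing in for the contents of the question file.
set text "What is \"IRC\"; a protocol?\nSecond question?"

# Treating the raw text as a list fails: Tcl tries to parse the quoted
# word and rejects the ';' that directly follows the closing quote.
if {[catch {llength $text} err]} {
    puts "raw text is not a valid list: $err"
}

# Splitting on newlines yields a proper list, one element per line.
set lines [split $text \n]
puts "[llength $lines] lines"
```

With the `split` in place, `foreach` and `lsearch` see each line as a single element regardless of the punctuation it contains.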

Post by caesar »

A little typo:

Code: Select all

se fp [open $out w] 
should be:

Code: Select all

set fp [open $out w] 
And it seems to be working fine. Thank you.