
eggdrop listen

Locked
egghead
Master
Posts: 481
Joined: Mon Oct 29, 2001 8:00 pm

Post by egghead »

Hello,

I'm trying to write a tcl script that will read information POSTed by a WWW browser (Netscape, IE, Lynx).
POST information is submitted by browsers in two parts: a head part and a body part, separated by an empty line. What I observe is that with most browsers, the tcl script only gets the head part and not the body.

Shown below is an isolated example.

The following TCL is loaded on an eggdrop (1.6.6, 1.3.28 and 1.1.5 tested on SUN-OS):

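# listen on port 4000; observescript is called with the idx of each new connection
listen 4000 script observescript
proc observescript { idx } {
   # hand all further input on this idx to observeproc
   control $idx observeproc
   return 0
}
# called once per line received; an empty text means the connection was dropped
proc observeproc { idx text } {
   putlog "OBSERVE: ($idx) $text"
   return 0
}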

The following piece of HTML is loaded into lynx:

<html><head></head><body>
<form method=post action="http://vhost.at.shell:4000">
<input name=nick type=text size=20>
</form></body></html>

where vhost.at.shell should be modified to reflect the shell/vhost the bot is using.

Upon typing something in the form and submitting it, the following is OBSERVEd:

[16:33] OBSERVE: (9) POST / HTTP/1.0
[16:33] OBSERVE: (9) Host: vhost.at.shell:4000
[16:33] OBSERVE: (9) Accept: text/html, text/plain, text/sgml, text/x-sgml, application/x-wais-source, application/html, */*;q=0.001
[16:33] OBSERVE: (9) Accept-Encoding: gzip, compress
[16:33] OBSERVE: (9) Accept-Language: en
[16:33] OBSERVE: (9) Pragma: no-cache
[16:33] OBSERVE: (9) Cache-Control: no-cache
[16:33] OBSERVE: (9) User-Agent: Lynx/2.7
[16:33] OBSERVE: (9) Referer: file://localhost/test.html
[16:33] OBSERVE: (9) Content-type: application/x-www-form-urlencoded
[16:33] OBSERVE: (9) Content-length: 9

and then the connection HANGS :sad: In other words, only the head part is read from the socket.
On some browsers (Netscape, certain versions of IE) everything works fine. On other browsers (Lynx, certain versions of IE) it doesn't work.
Similar behaviour is observed with TCL "socket" in combination with "gets". With "read" it seems to work fine. Nevertheless, I want to stick to eggdrop's listen/control approach for various reasons.

I would appreciate any comments/help, as it is a killer for a TCL I'm writing. Is it a bug or is there a way to modify eggdrop source? Once again, I want to stick with the listen/control mechanism of eggdrop.

Thanks.
ppslim
Revered One
Posts: 3914
Joined: Sun Sep 23, 2001 8:00 pm
Location: Liverpool, England

Post by ppslim »

You will not be able to stick with eggdrop's listen and control socket functions.

The HTTP RFC does not require a CRLF after the POST data. Thus, to retrieve the data, you need to retrieve a string of a set length. This can only be done with the "read" command in TCL (the HTTP RFC states that the value of the Content-Length header should be used as the length to read).

"gets" retrieves data from the input buffer up to a CRLF, whereas "read" can retrieve a string of any length, as long as the length is specified.
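A minimal sketch of the gets/read split described above, in plain Tcl rather than eggdrop's listen/control: read header lines with gets until the empty line, then read exactly Content-length bytes with read. The proc name read_post_request is hypothetical, and it assumes a blocking channel (a blocking read inside eggdrop would stall the bot), so treat it as an illustration of the technique rather than drop-in bot code.

```tcl
# Hypothetical helper: read one HTTP request from a *blocking* channel.
# Headers are line-based and end with an empty line; the body is exactly
# Content-length bytes and is not required to end with CRLF.
proc read_post_request {chan} {
    # Binary mode: no end-of-line translation, so byte counts stay exact.
    fconfigure $chan -translation binary
    set request [string trimright [gets $chan] "\r"]
    set headers [dict create]
    while {[gets $chan line] >= 0} {
        set line [string trimright $line "\r"]
        if {$line eq ""} break   ;# empty line: end of the header block
        if {[regexp {^([^:]+):\s*(.*)$} $line -> name value]} {
            # Header names are case-insensitive (Content-length vs Content-Length).
            dict set headers [string tolower $name] $value
        }
    }
    set body ""
    if {[dict exists $headers content-length]} {
        set len [dict get $headers content-length]
        if {[string is integer -strict $len] && $len > 0} {
            # read stops after $len bytes (or at EOF); gets would wait for a newline.
            set body [read $chan $len]
        }
    }
    return [list $request $headers $body]
}
```

Any bytes a client appends after the body stay unread in the channel, so Content-length, not a trailing CRLF, decides where the body ends.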
egghead
Master
Posts: 481
Joined: Mon Oct 29, 2001 8:00 pm

Post by egghead »

PPSLIM, tnx for the response. If I understand correctly, eggdrop waits until it gets a CRLF and then sends the line to the script. Because some browsers (apparently) don't send a CRLF at the end of the Entity-Body, and some others (apparently) do, eggdrop keeps waiting for it and keeps the connection hanging.

RFC1945 states in section 7.2 that the "Entity-Body = *OCTET".
Then section 2.2 states that "the end-of-line marker within an Entity-Body is defined by its associated media type" (which is text/plain).
Although the above leaves me a bit clueless whether to expect a CRLF or not, I would expect that a browser sends a CRLF after the Entity-Body is sent.

Now my question is:
- What do browsers send at the end of an Entity-Body? Nothing?
- Isn't there a way to modify the form/post to force a browser to send a CRLF? One way is to use another encoding, but I've found that different browsers encode differently, which is a bit difficult to anticipate in a TCL, especially if the encoding is broken over several lines.
- The thing is that I want to stick to listen/control. I certainly don't want to rely on "Content-Length" as provided by browsers for using in a "read" with TCL sockets.

Any help/comments are greatly appreciated, as the TCL I'm writing uses handing over "control" extensively and I don't like the headache of rewriting it :smile:
ppslim
Revered One
Posts: 3914
Joined: Sun Sep 23, 2001 8:00 pm
Location: Liverpool, England

Post by ppslim »

TCL sockets are going to be the only trustworthy method of getting the full header implementation.

If I have to read the HTTP RFC one more time, I will go insane, so the information here is from memory.

Each item in an HTTP header ends with an <EOL> marker (CRLF on Windows, just one character on Linux; which one, I'll be damned). Once all headers are sent, a single <EOL> marker is sent on its own.
E.g.:
Content-type: text/plain<EOL>
Content-length: 23<EOL>
<EOL>
This is where the web server will usually stop parsing, leaving anything else in the input buffer on hold. This is where a server module, or some other form of CGI interface, will read the data.
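The layout above can be made concrete with a small pure-Tcl sketch (split_request is a hypothetical name) that splits a raw request buffer at the first empty line. It assumes CRLF line endings, which HTTP uses regardless of the operating system.

```tcl
# Hypothetical helper: split a raw HTTP request at the empty line that ends
# the header block. Returns {headerLines remainder}; the remainder is the
# (possibly incomplete) entity body plus anything the client appended.
proc split_request {raw} {
    set sep [string first "\r\n\r\n" $raw]
    if {$sep < 0} {
        return -code error "header block not complete yet"
    }
    set head [string range $raw 0 [expr {$sep - 1}]]
    set rest [string range $raw [expr {$sep + 4}] end]
    # One list element per header line (the request line comes first).
    set lines [split [string map [list "\r\n" "\n"] $head] "\n"]
    return [list $lines $rest]
}
```

The remainder only holds the complete body once Content-length bytes have arrived; over a socket the data may come in several pieces.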

Note: Eggdrop's connect handling system will not send blank lines to the script, as this would break the system eggdrop has set up. In an eggdrop connect script, receiving a blank line means the connection has been closed, not that a blank line arrived. Another reason to use TCL sockets.

Apart from the reference you have already quoted from the RFC, there is nothing stating whether an input entity should be <EOL> terminated. The content-type should be used, yes, but in the case of text/plain no <EOL> is needed, as there is no definition of how plain text is terminated, or whether it needs to be terminated at all.

That is why browsers send different requests to the server (some terminated, others not).

There is no way to force a browser to use one method.

This leaves the content-length as the only reliable way to know how much to read. There is nothing in the RFC stating that a browser can't tack junk onto the end of the entity body (I have used this method in a previous project to beat a server cache). As such, you have to read using content-length, as it will return the true, RFC-compliant entity.

I doubt it is worth changing the code for eggdrop's scripted sockets, as you will generally break them altogether. What are you going to tell eggdrop to wait for before calling the script? If nothing, you're going to have a mess on your hands, with unpredictable data being sent to the script.

To sum up
There is no way to change the method a browser uses to send its entity. There is no way to get a non-<EOL>-terminated string of data from the socket. As such, eggdrop's scripted sockets are both unreliable and unsuited to the job.

As such, TCL sockets are the only way of reading ahead of a <EOL>.

Good luck
egghead
Master
Posts: 481
Joined: Mon Oct 29, 2001 8:00 pm

Post by egghead »

PPSLIM, again tnx for the reply. The <EOL> you mention, in reality is a <CRLF>. Nevertheless, your comments appear to be correct.

By searching on "HTTP POST CRLF", several links came up with some background on this "problem".
http://www.google.com/search?q=HTTP+POST+CRLF

It seems that in the early days the CERN webservers needed the CRLF after the posted body to be able to run certain scripts, which is similar to eggdrop's way of handling the data.
http://httpd.apache.org/docs/misc/known ... blems.html
Some clients still send a CRLF after the POSTed body. Some other clients don't.

There are even some hilarious examples of the consequences, e.g.:
http://support.microsoft.com/support/kb ... 3/2/98.asp

It appears that for the listen/control/putidx scheme, a "getidx" command would be really handy (unless it breaks something) for reading bytes based on the "Content-length".

Is it feasible and doable, to have a "getidx idx nbytes" read command in the eggdrop environment, equivalent to the "putidx idx" command and equivalent to the read command of TCL itself?

Do eggdrop coders read this forum? I'm interested to hear their comments. Or should I take this to the eggheads mailing list?

Anyway, I'm trying to stick to the control/listen idea of eggdrop, as it gives me the advantage of having "control" and "dcclist", which allows easy checking for open connections.

[ This Message was edited by: egghead on 2001-11-12 15:49 ]
stdragon
Owner
Posts: 959
Joined: Sun Sep 23, 2001 8:00 pm

Post by stdragon »

This may help you out... send the "Connection: close" response header right away, and the client should close the connection after the data is transferred. That will trigger your callback once with the last line, and then again with eof.

Your idea for getdcc or getidx sounds feasible. You could do it with a module.
egghead
Master
Posts: 481
Joined: Mon Oct 29, 2001 8:00 pm

Post by egghead »

BBcode not working?

[ This Message was edited by: egghead on 2002-01-06 12:18 ]
egghead
Master
Posts: 481
Joined: Mon Oct 29, 2001 8:00 pm

Post by egghead »

stdragon, thanks for the suggestion. It could be an option (although a tricky one :wink: ), if the webpage returned by the bot didn't depend on the POST'ed information.
In my case the returned webpage depends on the posting :sad:

I'm still torn between two options:
  • use <form method=post enctype="multipart/form-data"> which will include the CRLF
  • use TCL Sockets
The first option requires writing a parser for the encoded info. Both IE and Netscape apply a similar kind of encoding, Lynx uses a different one, however, which makes parsing the information a tad difficult.
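Whichever option wins, the POSTed body still has to be decoded. A multipart parser is more involved; for comparison, here is a minimal sketch for the form's default encoding, application/x-www-form-urlencoded (the Content-type in the Lynx log of the first post). The proc names url_decode and parse_form are hypothetical, and %XX escapes are turned into single characters as-is, with no character-set conversion.

```tcl
# Hypothetical helpers for application/x-www-form-urlencoded bodies,
# e.g. "nick=egg+head&msg=a%26b".
proc url_decode {s} {
    # '+' stands for a space in form data.
    set s [string map {+ " "} $s]
    set out ""
    set len [string length $s]
    for {set i 0} {$i < $len} {incr i} {
        set c [string index $s $i]
        set hex [string range $s [expr {$i + 1}] [expr {$i + 2}]]
        if {$c eq "%" && [string length $hex] == 2 && [string is xdigit -strict $hex]} {
            # %XX: two hex digits give one character code.
            scan $hex %x code
            append out [format %c $code]
            incr i 2
        } else {
            # Plain character, or a '%' not followed by two hex digits.
            append out $c
        }
    }
    return $out
}

proc parse_form {body} {
    set fields [dict create]
    foreach pair [split $body &] {
        if {$pair eq ""} continue
        # Split on the first '=' only; a missing '=' gives an empty value.
        set eq [string first = $pair]
        if {$eq < 0} {
            dict set fields [url_decode $pair] ""
        } else {
            set name  [url_decode [string range $pair 0 [expr {$eq - 1}]]]
            set value [url_decode [string range $pair [expr {$eq + 1}] end]]
            dict set fields $name $value
        }
    }
    return $fields
}
```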

As for the second option (TCL sockets) I started with the following code:

set sock [socket -server runsock 4000]

proc runsock {sock address clientport} {
   fconfigure $sock -blocking 0
   fconfigure $sock -buffering none
   fileevent $sock readable [list infohandler $sock]
}
proc infohandler {sock} {
   if {[eof $sock]} {
      putlog "Connection $sock closed"
      close $sock
      # stop here: the channel no longer exists
      return
   }
   # in non-blocking mode gets returns -1 when no complete line is buffered yet
   if {[gets $sock text] < 0} { return }
   putlog "OBSERVED: $text"
}

Once a browser submits its info (see the html code in the initial posting), it is OBSERVED that text is coming in line by line. However the rate of the OBSERVED lines is rather low, say 1 line per 200 ms. Which would take about 2 seconds for 10 lines (roughly the number of lines of an HTTP header from a client).

The question is: is there a setting in TCL or in eggdrop which can increase this speed? For example 1 line per 20 ms?

Of course, it is also possible to include a while loop in the proc "infohandler", with a gets or read within the loop. (Compare for example "motherhen" :smile: ) For now, I want to avoid using this option.

Motherhen: http://home.dal.net/stdragon/eggdrop

[ This Message was edited by: egghead on 2002-01-06 12:19 ]
ppslim
Revered One
Posts: 3914
Joined: Sun Sep 23, 2001 8:00 pm
Location: Liverpool, England

Post by ppslim »

Tcl's fileevent command provides a way to listen to connections without blocking eggdrop. However, after each fileevent callback you have to wait one pass of eggdrop's own event loop before the fileevent command triggers again.

Changing values in eggdrop could leave you with more timeouts on network connections and problems in a few other tasks, and is a path you should not really follow.

For speed, you would have to use some form of loop in your script to read multiple lines from the socket.

Another option would be to add an extra call to "Tcl_DoOneEvent" within the eggdrop source, which would double the number of script events handled per pass of eggdrop's event loop. However, I am no programmer, and have not looked into the downsides of this.
stdragon
Owner
Posts: 959
Joined: Sun Sep 23, 2001 8:00 pm

Post by stdragon »

You are better off using a loop. That is the way tcl's channels are meant to be used, because more than 1 line is transmitted at a time, and you should read all available lines in each fileevent.
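A sketch of that pattern with plain Tcl sockets: each fileevent drains every complete header line already buffered (gets returning -1 without eof means fblocked, i.e. only a partial line is waiting), then switches to read for exactly Content-length body bytes. The proc names, the global array reqstate and the variable http_result are hypothetical. Inside eggdrop the bot's main loop services the events; in a standalone tclsh you need vwait.

```tcl
# Hypothetical non-blocking POST reader for Tcl sockets (socket -server).
# Per-connection state lives in the global array reqstate.
proc http_accept {sock addr port} {
    global reqstate
    fconfigure $sock -blocking 0 -translation binary
    set reqstate($sock,phase)   headers
    set reqstate($sock,headers) {}
    set reqstate($sock,length)  0
    set reqstate($sock,body)    ""
    fileevent $sock readable [list http_readable $sock]
}

proc http_readable {sock} {
    global reqstate
    # Header phase: read every complete line already buffered, not just one.
    while {$reqstate($sock,phase) eq "headers"} {
        if {[gets $sock line] < 0} {
            if {[eof $sock]} {
                http_finish $sock
            }
            # Otherwise [fblocked $sock] is true: only part of a line so far.
            return
        }
        set line [string trimright $line "\r"]
        if {$line eq ""} {
            set reqstate($sock,phase) body
        } else {
            lappend reqstate($sock,headers) $line
            if {[regexp -nocase {^content-length:\s*(\d+)} $line -> n]} {
                set reqstate($sock,length) [scan $n %d]
            }
        }
    }
    # Body phase: ask only for the bytes still missing; no CRLF is expected.
    set need [expr {$reqstate($sock,length) - [string length $reqstate($sock,body)]}]
    if {$need > 0} {
        append reqstate($sock,body) [read $sock $need]
        set need [expr {$reqstate($sock,length) - [string length $reqstate($sock,body)]}]
    }
    if {$need <= 0 || [eof $sock]} {
        http_finish $sock
    }
}

proc http_finish {sock} {
    global reqstate http_result
    set body $reqstate($sock,body)
    array unset reqstate $sock,*
    fileevent $sock readable {}
    # Minimal HTTP/1.0 reply, then close the connection.
    catch {
        fconfigure $sock -blocking 1
        puts -nonewline $sock "HTTP/1.0 200 OK\r\nContent-type: text/plain\r\nConnection: close\r\n\r\nreceived: $body"
        flush $sock
    }
    catch {close $sock}
    set http_result $body
}
```

In a bot the listening socket would be opened once, e.g. socket -server http_accept 4000 as in egghead's earlier code, and no vwait is needed.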

Also, about "Connection: close"... the browser will still send the posted information. It will just close the connection when it finishes, instead of keeping it open for a new request.
egghead
Master
Posts: 481
Joined: Mon Oct 29, 2001 8:00 pm

Post by egghead »

Connection: close
To test the trick of sending the Connection: close, the following code is used (feel free to shoot at it :smile: ):

listen 4000 script observescript
proc observescript { idx } {
   control $idx observeprocA
   return 0
}
proc observeprocA { idx text } {
   putlog "OBSERVE A: ($idx) $text"
   if {$text == ""} { return 1 }
   # putidx appends a newline, so the trailing \r completes each CRLF
   putidx $idx "HTTP/1.0 200 OK\r"
   putidx $idx "Allow: GET, HEAD, POST\r"
   putidx $idx "Server: Eggdrop\r"
   putidx $idx "From: egghead\r"
   putidx $idx "Content-type: text/plain\r"
   putidx $idx "Connection: close\r"
   # empty line: end of the response headers
   putidx $idx "\r"
   control $idx observeprocB
   return 0
}
proc observeprocB { idx text } {
   putlog "OBSERVE B: ($idx) $text"
   if {$text == ""} {
      putlog "OBSERVE B: connection closed"
      return 1
   } else {
      putidx $idx "BOUNCING: $text\r"
      return 0
   }
}

After receiving the first request lines, "observeprocA" sends the Connection: close.
The remaining lines from the client are received by "observeprocB".

Both Netscape 4.7 and IE 5.5 indeed send the POSTed information :cool: However, IE 5.5 doesn't send it right after a fresh start of IE; only after a "stop" and resend does the info become available.
Lynx and Netscape 6.2 however continue to refuse :evil:
Either the trick doesn't work for Lynx, or the above code should be modified to fool Lynx a bit more. Anyway, I'm keeping this option as a reasonable backup option.

TCL sockets:
Conclusion from ppslim's post is that the limiting factor for the loop is the internal eggdrop loop, not the TCL loop (if a gets is done on each FILEEVENT trigger, i.e. there are as many FILEEVENT triggers as there are lines coming to the socket).
Conclusion from stdragon's post is that on a FILEEVENT all available lines must be read from the socket. Although I don't see why all info should be read immediately. There is a buffer anyway (?) Why not read a line every 50 ms controlled by a timer?

Unfortunately, I haven't found any good and thorough reference on the WWW on what is allowed and what not. Any good reference is highly appreciated.
heh.. I'm not a TCP/networking egghead, unless some drinks get involved in the networking :smile:
One question for example: if data is read from the socket at a slow pace (for whatever reason, say 1 line per 500 ms), does the socket buffer overflow?

Two cases of loops were tested to read the data from the socket on a fileevent:
  • A proc which reads one (1) line. This proc is called every X milliseconds using TCL's "after".
  • A while loop with a GETS, until all data is read or until the socket is exhausted.
The second option works ok, but I want to avoid it for now.
The first option it seems, however, doesn't work when the X milliseconds is set to a low value (say 50 ms). It seems that again the bottleneck is the "loop" time it takes eggdrop to evaluate. Is that a correct conclusion? Is there a way (apart from messing with the internals of eggdrop) to circumvent this?

[ This Message was edited by: egghead on 2002-01-06 17:18 ]
stdragon
Owner
Posts: 959
Joined: Sun Sep 23, 2001 8:00 pm

Post by stdragon »

proc observeprocA { idx text } {
   putlog "OBSERVE A: ($idx) $text"
   if {$text == ""} { return 1 }
   putidx $idx "HTTP/1.0 200 OK\r"
   putidx $idx "Allow: GET, HEAD, POST\r"
   putidx $idx "Server: Eggdrop\r"
   putidx $idx "From: egghead\r"
   putidx $idx "Content-type: text/plain\r"
   putidx $idx "Connection: close\r"
   putidx $idx "\r"
   control $idx observeprocB
   return 0
}
proc observeprocB { idx text } {
   putlog "OBSERVE B: ($idx) $text"
   if {$text == ""} {
      putlog "OBSERVE B: connection closed"
      return 1
   } else {
      putidx $idx "BOUNCING: $text\r"
      return 0
   }
}

What happens if you get rid of the 'return 1' part? Also, maybe you could try HTTP 1.1 instead of 1.0, and/or keep-alive connections.
Quote: "Conclusion from ppslim's post is that the limiting factor for the loop is the internal eggdrop loop, not the TCL loop (if a gets is done on each FILEEVENT trigger, i.e. there are as many FILEEVENT triggers as there are lines coming to the socket)."
That's true. The normal timeout on eggdrop's main loop is 1 second. If there is activity on any connection, like the server or a dcc chat, it ends early, so you may see very sporadic response times ranging from near-instant to 1 second.

Quote: "Conclusion from stdragon's post is that on a FILEEVENT all available lines must be read from the socket. Although I don't see why all info should be read immediately."
I didn't mean to say it *must* be done like that; I'm just suggesting you do it that way because it is faster and more efficient.

Quote: "There is a buffer anyway (?) Why not read a line every 50 ms controlled by a timer?"
The lowest guaranteed timer resolution is 1 second. Anything less than that becomes rather random, since the main loop uses a 1-second timeout interval.

Quote: "Unfortunately, I haven't found any good and thorough reference on the WWW on what is allowed and what not. Any good reference is highly appreciated."
You're allowed to do it any way. It's just "supposed" to be done the way I said; that's why there are commands like "fblocked" and "eof" to clarify the situation when "gets" returns -1 in your fileevent handler.

Quote: "One question for example: if data is read from the socket at a slow pace (for whatever reason, say 1 line per 500 ms), does the socket buffer overflow?"
No. Eventually it will fill up and stop accepting data until you read some of it, but it won't overflow or lose any data. TCP guarantees that for you (flow control, retransmission of lost packets, etc.).

Quote: "The first option it seems, however, doesn't work when the X milliseconds is set to a low value (say 50 ms). It seems that again the bottleneck is the "loop" time it takes eggdrop to evaluate. Is that a correct conclusion? Is there a way (apart from messing with the internals of eggdrop) to circumvent this?"
To your first question: yes, that is a correct conclusion. To the second: yes, there are some ugly hacks you could do that wouldn't involve editing the source code. For instance, if you have another bot, connect them via the botnet. Then run a script on the second bot that continuously sends garbage over the network. That will force eggdrop's inner loop to spin constantly, eliminating most of the delay.