[SOLVED] Back for More - Socket timeout help needed

Post by **nml375** » Mon Mar 02, 2009 1:05 pm

gets does not care much about the output, other than expecting a complete line.
In some cases, read would probably be a better option, as long as it is implemented properly.

The biggest issue with your code is the lack of a readable fileevent.

Post by **dj-zath** » Mon Mar 02, 2009 2:34 pm

actually, do you think I need more fileevents than the ones I'm using?

here's a piece of the ACTUAL code..

this is working.. I found what was causing the erratic output: silly me forgot to set the timers back to reasonable limits (I was hitting the RAC twice a second!)

Code:


#### Get Rac Info ####

proc    RacSkt {} {
        global RacIP RacPort DetO DetAP DetPY DetPL MetaPY MetaPL
        if {![info exists DetO]} {set DetO "offair.gif"; set DetAP "offair.gif"; set DetPY "offair.gif"; set DetPL "offair.gif"; set MetaPY "Not Available"; set MetaPL "Not Available"; return 0;};

        if {[catch {set RacSock [socket -async $RacIP $RacPort]; fconfigure $RacSock -blocking 0;}]} {
                set DetO "offair.gif";
                set DetPY "offair.gif";
                set DetPL "offair.gif";
                set MetaPY "Not Available";
                set MetaPL "Not Available";
                return 0;
        } else {
                fileevent $RacSock writable [list RacNfo-A $RacSock];
        };
}

proc    RacNfo-A {RacSock} {
        global DetO RacIP RacPort RacInfo
        set DetO "onair.gif";
        flush $RacSock;
        puts $RacSock "GET /x/playing.cgi HTTP/1.0\r\nUser-Agent: Mozilla\r\nHOST $RacIP:$RacPort\r\n\r\n";
        flush $RacSock;
        set T 0; while {($T <= 100000)&&(![eof $RacSock])} {set RacInfo [gets $RacSock];};
        flush $RacSock;
        close $RacSock;
        if {[catch {set RacSock [socket -async $RacIP $RacPort]; fconfigure $RacSock -blocking 0; }]} {
                return 0;
        } else {
                fileevent $RacSock writable [list RacNfo-B $RacSock]
        };
}

proc    RacNfo-B {RacSock} {
        global RacIP RacPort RacInfo PlayInfo DetO DetAP DetPY DetPL MetaPY MetaPL
        flush $RacSock;
        puts $RacSock "GET /x/playlist.cgi HTTP/1.0\r\nUser-Agent: Mozilla\r\nHOST $RacIP:$RacPort\r\n\r\n";
        flush $RacSock;
        set T 0; while {($T <= 100000)&&(![eof $RacSock])} {set PlayInfo [gets $RacSock];};
        flush $RacSock;
        close $RacSock;
        if {($RacInfo == "")||($PlayInfo == "")} {return 0;};

#### Rac Chunk Parsers ####

parsers go here (there are a lot of them, too)

}

...and here's the part I'm still working on....

Code:

#### Get Cast Info ####

proc    CastSkt {} {
        global CastIP CastPort AdmLogin AdmPass DetI DetR DetS DetH DetL DetC DetDJ Mp3HC Mp3LC CamC Mp3HP Mp3LP CamP MetaT MetaS MetaDJ
        if {![info exists DetH]} {set DetI "offair.gif"; set DetR "offair.gif"; set DetS "offair.gif"; set DetH "offair.gif"; set DetL "offair.gif"; set DetC "offair.gif"; set DetDJ "offair.gif"; set Mp3HC "N/A"; set Mp3LC "N/A"; set CamC "N/A"; set Mp3HP "N/A"; set Mp3LP "N/A"; set CamP "N/A"; set MetaT "Not Available"; set MetaS "Not Available";set MetaDJ "Not Available"; return 0;};

        if {[catch {set CastSock [socket -async $CastIP $CastPort]; ##fconfigure $CastSock -blocking 0;}]} {
                set DetI "offair.gif";
                set DetR "offair.gif";
                set DetS "offair.gif";
                set DetH "offair.gif";
                set DetL "offair.gif";
                set DetC "offair.gif";
                set DetDJ "offair.gif";
                set Mp3HC "N/A";
                set Mp3LC "N/A";
                set CamC "N/A";
                set Mp3HP "N/A";
                set Mp3LP "N/A";
                set CamP "N/A";
                set MetaT "Not Available";
                set MetaS "Not Available";
                set MetaDJ "Not Available";
                return 0;
        } else {
                fileevent $CastSock writable [list CastNfo $CastSock];
        };
}

proc    CastNfo {CastSock} {
        global CastIP CastPort AdmLogin AdmPass CastInfo DetM DetI DetR DetS DetH DetL DetC DetDJ Mp3HC Mp3LC CamC Mp3HP Mp3LP CamP MetaT MetaS MetaDJ
        set DetM "onair.gif";
        flush $CastSock;
        puts $CastSock "GET /Cast.xsl HTTP/1.0\r\nUser-Agent: Mozilla\r\nHOST $AdmLogin:$AdmPass@$CastIP:$CastPort\r\n\r\n";
        flush $CastSock;
        set T 0; while {($T <= 100000)&&(![eof $CastSock])} {set CastInfo "[read $CastSock]";};
        flush $CastSock;
        close $CastSock;
        if {$CastInfo == ""} {return 0;};

#### Cast Chunk Parsers ####

again, more parsers (and a lot of them)

}

Post by **nml375** » Mon Mar 02, 2009 10:33 pm

Well, if you check my code example, you'll see that I remove the "writable" fileevent once the request has been sent, and instead I create a new "readable" one that is responsible for reading the response.
Basically, getResponse will be called whenever there is data to be read on that socket. The way you're doing it means there is a risk you actually read before the remote server has sent a response, and hence get an empty result.
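A minimal sketch of that writable-then-readable pattern, written for plain tclsh 8.6 so it can be tried outside eggdrop. The names `startRequest`, `onWritable`, `onReadable`, `::buffer` and `::result` are hypothetical (this is not the page 2 example itself); inside eggdrop you would putlog the outcome instead of storing it. The socket is switched to `-translation binary` so the explicit `\r\n` sequences go out unchanged, because Tcl sockets otherwise turn every `\n` into `\r\n` on output.

```tcl
# Open an async connection and wait for it to become writable.
# Errors creating the channel propagate to the caller.
proc startRequest {host port path} {
    set sock [socket -async $host $port]
    fconfigure $sock -blocking 0 -translation binary
    fileevent $sock writable [list onWritable $sock $host $port $path]
    return $sock
}

# Runs once the connect attempt has finished, successfully or not.
proc onWritable {sock host port path} {
    fileevent $sock writable {}          ;# one-shot: stop writable events
    set err [fconfigure $sock -error]
    if {$err ne ""} {
        catch {close $sock}
        set ::result [list error $err]
        return
    }
    if {[catch {
        puts -nonewline $sock "GET $path HTTP/1.0\r\nHost: $host:$port\r\n\r\n"
        flush $sock
    } msg]} {
        catch {close $sock}
        set ::result [list error $msg]
        return
    }
    set ::buffer($sock) ""
    # From now on, only react when the server has actually sent something.
    fileevent $sock readable [list onReadable $sock]
}

# Called whenever data (or EOF) is available; never blocks.
proc onReadable {sock} {
    if {[catch {read $sock} data]} {
        catch {close $sock}
        set ::result [list error $data]
        unset ::buffer($sock)
        return
    }
    append ::buffer($sock) $data
    if {[eof $sock]} {
        close $sock
        set ::result [list ok $::buffer($sock)]
        unset ::buffer($sock)
    }
}
```

The read side never calls `gets` or `read` speculatively: it only runs when Tcl reports the socket readable, and EOF (the server closing its end) is what marks the response as complete.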

Post by **dj-zath** » Fri Mar 06, 2009 3:35 am

hi buddy!

yeah, I see what you're saying about the "read before write".. but I think we've gone way beyond the initial problem...

So, let's start over... back to basics...

In spite of all this, I'm still getting eggdrop stalling and the occasional "drop to prompt" with a "broken pipe" error.. now, I have chk scripts that will re-load eggdrop once they detect it has crashed.. so that, in itself, isn't the problem...

I have gone back to simple connection tests.. how simple? note the following:

Code:


proc    CastSkt {} {
        global CastIP CastPort CastSock
        if {([catch {set CastSock [socket $CastIP $CastPort];}])} {
                putlog {Cast Socket NOT Detected...};
        } else {
                putlog {Cast Socket Detected!};
        };

}

when the "cast" server (on localhost) is present, "Cast Socket Detected!" appears on the console.. quickly and reliably, as it's SUPPOSED to.

when the "cast" server is not present, "Cast Socket NOT Detected..." appears on the console, again quickly and reliably.

Now it's Rac's turn...

Code:

proc    RacSkt {} {
        global RacIP RacPort RacSock
        if {([catch {set RacSock [socket $RacIP $RacPort];}])} {
                putlog {Rac Socket NOT Detected...};
        } else {
                putlog {Rac Socket Detected!};
        };

}

Now things are not going so great... at first, it appears to be working: the proper response "Rac Socket NOT Detected..." displays, as the machine hasn't been connected for a while (about 2 hours at first)

So, then I start up the RAC and it detects it.. and "Rac Socket Detected!" appears on the console.

now, I turn OFF the RAC and then things get REALLY screwy.. the example above will continue to display "Rac Socket Detected!" and, at this point, eggdrop stops responding for about 5 minutes or so.

if I allow this cycle to continue, between the long pauses I see (for example) "timer spun 5 mins, was x, now x" errors.

Okay, so now I try it with non-blocking mode set:

Code:


proc    CastSkt {} {
        global CastIP CastPort CastSock
        if {([catch {set CastSock [socket -async $CastIP $CastPort]; fconfigure $CastSock -blocking off;}])} {
                putlog {Cast Socket NOT Detected...};
        } else {
                putlog {Cast Socket Detected!};
        };

}

proc    RacSkt {} {
        global RacIP RacPort RacSock
        if {([catch {set RacSock [socket -async $RacIP $RacPort]; fconfigure $RacSock -blocking off;}])} {
                putlog {Rac Socket NOT Detected...};
        } else {
                putlog {Rac Socket Detected!};
        };

}

this time, it doesn't halt, so to speak... but now it reports back

(With both servers offline):

Cast Socket NOT Detected...
Rac Socket Detected!

Cast Socket NOT Detected...
Rac Socket Detected!

Cast Socket NOT Detected...
Rac Socket Detected!

(since the initial start of the Rac server, it hasn't reported the socket NOT detected even with the RAC completely offline and disconnected)

Code:


proc    RacSkt {} {
        global RacIP RacPort RacSock
        if {([catch {set RacSock [socket -async $RacIP $RacPort]; fconfigure $RacSock -blocking off;}])} {
                putlog {Rac Socket NOT Detected...};
        } else {
                putlog {Rac Socket Detected!};
                fileevent $RacSock readable [list RacNfo $RacSock]
        };

}

proc     RacNfo {RacSock} {
           flush $RacSock
           close $RacSock
}

At this point, I see a flood of "broken pipe" errors, and the procs that "invoke" them eventually stop responding, in which case eggdrop drops straight to the prompt.

Now your examples didn't work either- because they work with the ASSUMPTION that the basic connection handlers follow protocol- which these do -NOT-

Further tests also show that, on the RAC's side of things, once the connection is broken/disconnected for ANY reason, that socket is completely NON-RESPONSIVE and will REMAIN that way until the ISP's router times it out and CLOSES it.. that time is STILL undetermined and unknown.. it appears to be around 2 HOURS

in closing, it just seems that eggie/TCL simply doesn't know WHAT to do with a DEAD socket that reports alive and well..

unless I can get past this simple "go/no go" I don't think it matters what else occurs afterwards...

-DjZ-

Post by **nml375** » Fri Mar 06, 2009 3:35 pm

Once again, when using asynchronous connections, a socket is created regardless of whether the connection failed, succeeded, or is still pending. The only time it will not create a socket is if the socket command is unable to make sense of the address/port, or the underlying network layer is completely unavailable.

In this case, a good/bad socket says absolutely nothing about whether we can reach the server or not. It just tells us whether the socket could be created (not whether the connection could be established).
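To illustrate, a small sketch (plain tclsh 8.6; `probe`, `probeDone` and the two-argument callback are hypothetical names): `socket -async` hands back a channel right away, and whether the connect actually worked is only known once the channel turns writable, where `fconfigure $sock -error` returns the connect error, or an empty string on success. The socket is put in non-blocking mode so querying it never waits.

```tcl
# Start a non-blocking connect; the outcome is delivered to $callback later
# as "connected {}" or "failed <message>".
proc probe {host port callback} {
    if {[catch {socket -async $host $port} sock]} {
        # Reached when the channel cannot be created at all (e.g. an address
        # that can't be resolved), or when the connect fails before it could
        # go into the background.
        {*}$callback failed $sock
        return
    }
    fconfigure $sock -blocking 0
    fileevent $sock writable [list probeDone $sock $callback]
}

proc probeDone {sock callback} {
    fileevent $sock writable {}
    # Writable means "the connect attempt finished", not "it succeeded".
    set err [fconfigure $sock -error]
    catch {close $sock}
    if {$err eq ""} {
        {*}$callback connected {}
    } else {
        {*}$callback failed $err
    }
}
```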

As for your tests... Use the full features of catch, as illustrated in my examples.

man n catch wrote:...
If the varName argument is given, then the variable it names is set to the result of the script evaluation. When the return code from the script is 1
(TCL_ERROR), the value stored in varName is an error message. When the return code from the script is 0 (TCL_OK), the value stored in resultVarName is
the value returned from script.
...

That is, use this instead when creating your socket:

Code:

...
if {[catch {socket -async $CastIP $CastPort} CastSock]} {
 putlog "Cast Socket not created: $CastSock"
 putlog "reason: $::errorInfo"
} else {
 putlog "Cast Socket $CastSock created"
}

This will post the channel identifier (what you like to call "socket") when the command is successful, and post the full details on why the command failed when it's not. Not using this info makes your tests pretty much useless, as it tells us absolutely nothing as to what is going on.

As for stale channel identifiers, my code did not implement any timeout code. As such, we are currently relying on the underlying network layer to change the state of the channel to something like connection failed (and thus triggering tcl to close the channel identifier). Of course, since you keep on trying without any timeout or cleanup-code, you'll eventually run out of resources, pretty much causing the issues you describe.

A simple timeout implementation: start a timer (preferably using after) that just closes the channel identifier after sufficient time has passed. This in turn will destroy any references, including fileevents, to this channel identifier.
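A sketch of such a timeout, again plain tclsh 8.6 with hypothetical names (`connectWithTimeout`, `onConnect`, `onTimeout`). The detail that matters is keeping the `after` id so the timer can be cancelled when the attempt finishes first; if the timer wins instead, closing the channel also deletes its fileevent handlers, so `onConnect` is never called afterwards.

```tcl
# Errors creating the channel propagate to the caller.
proc connectWithTimeout {host port ms callback} {
    set sock [socket -async $host $port]
    fconfigure $sock -blocking 0
    # Hard timeout: give up if the attempt hasn't finished within $ms ms.
    set timer [after $ms [list onTimeout $sock $callback]]
    fileevent $sock writable [list onConnect $sock $timer $callback]
}

proc onConnect {sock timer callback} {
    after cancel $timer              ;# the attempt finished, drop the timeout
    fileevent $sock writable {}
    set err [fconfigure $sock -error]
    if {$err ne ""} {
        catch {close $sock}
        {*}$callback failed $err
        return
    }
    {*}$callback connected $sock     ;# caller now owns (and must close) $sock
}

proc onTimeout {sock callback} {
    # Closing the channel also discards any fileevent handlers on it.
    catch {close $sock}
    {*}$callback timeout {}
}
```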

I have no clue what you are talking about with this:

dj-zath wrote:Now your examples didn't work either- because they work with the ASSUMPTION that the basic connection handlers follow protocol- which these do -NOT-

Are you trying to say that your services do not use http transactions?

There is one issue: some http responses do not end the data with a trailing newline, meaning we will not get the last line when using gets. Using read (properly) when reading the data will solve this issue, yet I still recommend using gets while retrieving the headers.
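One way to do that split, sketched in plain Tcl: header lines are consumed with `gets` (they always end in CRLF), then whatever follows the blank line is collected with `read`, so a body without a trailing newline isn't lost. It assumes `$sock` is already connected, non-blocking, left at Tcl's default `auto` input translation, and that the request has been sent; `watchResponse`, `onResponse` and the `::phase`/`::headers`/`::body`/`::responseDone` arrays are hypothetical names.

```tcl
# Hypothetical per-socket state kept in global arrays.
proc watchResponse {sock} {
    set ::phase($sock) headers
    set ::headers($sock) {}
    set ::body($sock) ""
    fileevent $sock readable [list onResponse $sock]
}

proc onResponse {sock} {
    if {$::phase($sock) eq "headers"} {
        # In non-blocking mode gets returns -1 until a complete line is buffered.
        while {[gets $sock line] >= 0} {
            if {$line eq ""} {
                set ::phase($sock) body   ;# blank line ends the headers
                break
            }
            lappend ::headers($sock) $line
        }
    }
    if {$::phase($sock) eq "body"} {
        # The body may lack a final newline, so take whatever is there.
        append ::body($sock) [read $sock]
    }
    if {[eof $sock]} {
        close $sock
        set ::responseDone($sock) 1
    }
}
```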

Post by **dj-zath** » Thu Mar 12, 2009 5:24 am

nml375 wrote:As such, we are currently relying on the underlying network layer to change the state of the channel to something like connection failed (and thus triggering tcl to close the channel identifier)

yes.. THAT'S the problem!

the underlying network layer is 1.. reporting connections to machines that aren't there.. and 2.. reporting them intermittently

I gather from all this that what I [need] to do here is NOT going to be that simple or straightforward.. using the http package or not... I have even implemented your examples in the best possible way I can.. it still "hoses" the eggie (and now it seems the whole SERVER, with icecast and the ircd, is being affected by it as well... since, as you said, it's running out of resources...)

I may have to consider running PHP for this instead of TCL.. call me an idiot... (though you've probably reached that conclusion already!)

thanks for the help in any case.

I'll now reload my AOL disc to reload the net! <smirk>

-DjZ-
:/:/

Post by **nml375** » Thu Mar 12, 2009 11:54 am

You know, in the very same paragraph I pointed out the reason we're currently stuck there (and also hinted at how to get around it). Namely, I didn't add a timeout function, which I believe I mentioned earlier as well...

In any case, it would be something as trivial as adding something like this in the appropriate place:

Code:

#Straight after we've initiated the connect and set up the first fileevent, add this:
after 30000 [list safeClose $socket]
...
#Also make sure our safeClose proc exists
proc safeClose {token} {
 catch {close $token}
}

This should add nicely to my posted code back on page 2. It uses a hard timeout of 30 seconds, after which it will terminate the connection regardless of its current state. If you intend to make requests more frequently than that, I'd suggest you cut down on the timeout.

Post by **dj-zath** » Mon Mar 16, 2009 4:24 am

hi again...

a couple things to update ya...

first.. couldn't I just use:

Code:

after 30000 [catch {close $socket}]

??

of course I can't! because the connection has to be made beforehand and that's where it's getting locked up!

fileevents aren't quite working either.. although "writable" reports ALL the time, "readable" doesn't report AT ALL! (or it's intermittent at best).

and even in async mode.. it's STILL hanging on the initial connection...

Also,

Someone wanted to try accessing the RAC connection for S&Gs, using python.. and he's having the same problems.. either no connection then connection to nowhere and/or connection but no data and a complete lockup/stall/crash of the script.. even in python, connection error-control is totally worthless!

I've been wondering myself if, in fact, it's the onboard NIC on the server's motherboard that's causing the issues.. (the server the eggie resides on) The RAC is on a windows box at the end of a WiMax (wireless) connection.. the NIC on the server is an Intel PRO/1000 MT DA (onboard). maybe someone else out there might know of any issues with using this NIC with an eggie/TCL? tests to localhost all work properly and

I have NO PROBLEMS with error-control or connection timeout/stalling issues with the icecast server running on the same box, though there are issues if I try to run with "gets" instead of "read": "gets" will render nothing.. I believe this is caused by icecast sending XSL output (and likely omitting any \r\n's). Which reminds me.. does anyone out there know how I can set up basic http auth to icecast?? I have tried NML's suggestion but it isn't accepting it.. icecast wants login:password@admin.server.com for login.. and TCL thinks there's a "syntax error"

NML:
How would you like the opportunity to plug your code (your code, your way) into the RAC connection and try to reach the RAC? perhaps you'll better understand the problem(s) once you experience them yourself.. thing is, I'll need to know when you're doing it so I can "connect and disconnect" when the tests require it...

just let me know..

-DjZ-

Post by **nml375** » Mon Mar 16, 2009 2:36 pm

dj-zath wrote:hi again...

a couple things to update ya...

first.. couldn't I just use:
Code:
after 30000 [catch {close $socket}]
??

The problem here is that you'd execute the catch {close $socket} command instantly, rather than delaying it 30 seconds.
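A tiny demonstration of the difference (plain tclsh 8.5+; `mark` and `::events` are hypothetical). With brackets, the command is substituted immediately and `after` only receives its result; in the `after 30000 [catch {close $socket}]` case that result is `0` or `1`, which `after` would then try to run as a command 30 seconds later. Wrapping the command in `[list ...]` passes the command itself, to be run when the timer fires.

```tcl
set events {}
proc mark {what} {
    lappend ::events $what
    return ""    ;# empty result, so the bracketed case schedules nothing harmful
}

# Wrong: [mark ...] runs right now; after only gets its (empty) result.
after 100 [mark "bracketed call"]
mark "first after returned"

# Right: [list ...] passes the command itself; it runs ~100 ms later.
after 100 [list mark "listed call"]
mark "second after returned"

after 200 {set ::done 1}
vwait ::done
puts $events  ;# prints: {bracketed call} {first after returned} {second after returned} {listed call}
```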

dj-zath wrote:of course I can't! because the connection has to be made beforehand and that's where it's getting locked up!

Please re-read my previous post; it'll tell you exactly where to insert the correct code.. That is, _after_ you've initiated the (async) connection...

dj-zath wrote:fileevents aren't quite working either.. although "writable" reports ALL the time, "readable" doesn't report AT ALL! (or it's intermittent at best).

Could you post a showcase of this, including the code you were using?

dj-zath wrote:and even in async mode.. it's STILL hanging on the initial connection...

Then you are doing something wrong; async by nature will return instantly, regardless of the state of the socket (or connection).

dj-zath wrote:Also,

Someone wanted to try accessing the RAC connection for S&Gs, using python.. and he's having the same problems.. either no connection then connection to nowhere and/or connection but no data and a complete lockup/stall/crash of the script.. even in python, connection error-control is totally worthless!

I'm no expert in python, but as far as I can guess, python uses the very same underlying networking layer..

dj-zath wrote:I've been wondering myself if, in fact, it's the onboard NIC on the server's motherboard that's causing the issues.. (the server the eggie resides on) The RAC is on a windows box at the end of a WiMax (wireless) connection.. the NIC on the server is an Intel PRO/1000 MT DA (onboard). maybe someone else out there might know of any issues with using this NIC with an eggie/TCL? tests to localhost all work properly and

Eggdrop doesn't ever see your NIC...
It only sees the C API provided by the C standard library, which in its turn interacts with the system kernel. No userspace program should, generally, be able to access the underlying hardware. The only possible exceptions would be advanced debug/troubleshooting/firmware-update software, which would then require root permissions.
In essence, anything that would work with php should work with eggdrop. Just remember that eggdrop is a single-threaded application, and blocking operations will prevent it from handling other events until the block ends.
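A rough illustration of that single-threaded behaviour in plain tclsh 8.5+: a timer due in 50 ms cannot fire while a script is busy (a busy-wait stands in here for a blocking connect or gets), so it fires late. This is plausibly the same effect behind the eggdrop "timer spun" messages mentioned earlier in this thread, though that link is an inference, not something the thread confirms.

```tcl
set start [clock milliseconds]
after 50 {set ::firedAt [clock milliseconds]}

# Stand-in for a blocking call: spin for 500 ms without ever returning
# to the event loop, so the timer above cannot be serviced.
while {[clock milliseconds] - $start < 500} {}

vwait ::firedAt   ;# only now does the event loop get a chance to run
puts "timer due after 50 ms actually fired after [expr {$::firedAt - $start}] ms"
```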

dj-zath wrote:I have NO PROBLEMS with error-control or connection timeout/stalling issues with the icecast server running on the same box, though there are issues if I try to run with "gets" instead of "read": "gets" will render nothing.. I believe this is caused by icecast sending XSL output (and likely omitting any \r\n's). Which reminds me.. does anyone out there know how I can set up basic http auth to icecast?? I have tried NML's suggestion but it isn't accepting it.. icecast wants login:password@admin.server.com for login.. and TCL thinks there's a "syntax error"

"gets" returning nothing: correct. The solution here is to use gets while retrieving the headers, and then switch over to read.
"syntax error": tcl knows nothing about the http protocol, hence it won't report an http syntax error. Http servers (such as icecast) do know http, and will issue such messages if you do not follow protocol.
Regarding http://user:pass@host:port/path/file URIs: with http, the "Basic" authentication scheme is suggested, where the user and password are sent as a request header, and not included in the URI when sending the request.
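A minimal sketch of building such a request, assuming Tcl 8.6 (where `binary encode base64` is built in; older versions would need tcllib's base64 package instead). `basicAuthRequest` is a hypothetical helper that would replace the `HOST $AdmLogin:$AdmPass@...` line used earlier in this thread; the path, port and credentials in the usage line are placeholders, not anything Icecast documents.

```tcl
# Build an HTTP/1.0 GET request that carries Basic credentials in a header
# instead of in the URI. Send it on a socket set to -translation binary,
# since the CRLFs are already spelled out here.
proc basicAuthRequest {host port path user pass} {
    set cred [binary encode base64 [encoding convertto utf-8 "$user:$pass"]]
    return "GET $path HTTP/1.0\r\nHost: $host:$port\r\nAuthorization: Basic $cred\r\n\r\n"
}

# Usage (placeholder values):
puts -nonewline [basicAuthRequest 127.0.0.1 8000 /Cast.xsl admin secret]
```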

dj-zath wrote:NML:
How would you like the opportunity to plug your code (your code, your way) into the RAC connection and try to reach the RAC? perhaps you'll better understand the problem(s) once you experience them yourself.. thing is, I'll need to know when you're doing it so I can "connect and disconnect" when the tests require it...

just let me know..

-DjZ-

Actually, your network issues are rather easily replicated with a WinXP system with the firewall enabled. Just try to connect to an unused port, and the connection is silently dropped rather than refused. What I find very strange is how you manage to freeze your application for several minutes. My 2.6 kernel will drop the connection attempt within 30 seconds without any kind of custom timeout code.

I am working on a simplified rewrite of the http-package to remove the nasty blocking behaviour during connection, although progress is slow due to other interests and projects.
I'll post when I get some further work done...

Post by **dj-zath** » Tue Mar 17, 2009 10:16 am

I have to admit...

I have been lacking sleep lately..

and I have mis-represented the problems I have been experiencing...

so, let me try once again to explain whats been happening..

first off, that guy trying to use Python DID get it to work.. he said it's working "as expected" and I can see him hitting the RAC... he says he's seeing a lot of "no route to host" errors and lots of timeouts... but python is simply timing out or reporting "no route to host".

what I'm seeing is TCL having trouble negotiating the handshake of the RAC connection.. and this is before the connection is established and before any of our fconfigures or ANY other config/behaviors kick in. python sees this too, but merely ignores it, I'm assuming.. (or treats it as a "no route to host"). In either case, it's at this point that TCL seems to be having issues.. My ISP has admitted to me that they do "tamper" with the connection protocols to hamper/prevent bit-torrent and other peer-2-peer file-sharing, however they won't disclose to me exactly what they are doing to the protocols. To me, it appears they "proxy" the incoming connections and then "redistribute" them, similar to what a NAT does.. (you gotta understand, this ISP is GOOFY: the entire TOWN I'm in IS on the LAN and IPs are assigned on a per-pay/per-machine basis; but besides that, it's a strange situation nevertheless, and since they are the ONLY ISP I have access to, I have to deal with it.)

To sum it up "Rural college town ISP" need I say more??

(God, I sure DO miss Chicago sometimes!)

-DjZ-

Post by **nml375** » Tue Mar 17, 2009 11:29 am

I'm using a mobile unit at the moment, so I'll be brief..

Python and tcl use the very same calls to the system kernel in order to establish the connection. Neither tcl nor eggdrop is having issues negotiating the connection. It's your kernel that's having the issues..
The only big difference between using a separate python process and a tcl script within eggdrop's process is which process gets blocked.

The messages 'No Route to Host', 'Remote Host unreachable', etc. originate from the IP stack within the kernel; they're not generated by php, tcl, python, etc...

The whole point of these coding exercises has been to avoid blocking mode in eggdrop, nothing else.

Yes, your ISP has some awkward packet filtering; even so, simply dropping SYN packets is nothing new, and properly written software has been able to cope with that for a long time. In most cases, by using proper non-blocking/asynchronous code and proper timeouts, along with either polling or some kind of event dispatcher.

Post by **dj-zath** » Thu Mar 19, 2009 2:59 am

I agree!

it seems that I have a kernel issue, if nothing else..

I will tell you that fileevent didn't work, not even the fileevents in YOUR code examples..

fileevent $Sock readable [?proc? $Sock]

never triggers, except maybe once every 500 'hits' or so..

fileevent $Sock writable [?proc? $Sock]

always fired, even if the connection was never completed!

(of course, then comes the "broken pipe" error followed by an immediate drop to prompt)

but I guess that doesn't count, if I'm stupid enough to call a connection a socket, even if it was merely done for clarity...

no matter, I seem to have knocked the subnet offline (again) with all of this.. The ISP sent me an email stating they are capping me at 5 G's now... yes, I somehow took them down for over 12 hours?? don't ask HOW... I don't have a clue!

now, back to the main thread...

The guy testing Python said "it worked as expected and the RAC didn't go mental", meaning he didn't see anything wrong with the reading of the RAC except clusters of "no route to host" etc... (he's running Linux of some flavor)

yes, I have already begun to think it's a NIC or kernel issue... perhaps BSD 6x isn't the thing to use... who knows.

a 5G/month cap isn't good for anything streaming-related (unless maybe I stream at dial-up speeds..) well.. I'll pull my CoLo and be DONE with all of it...

yeah, I know, quite a defeatist attitude.. but that's how I feel about all of this at the moment!

-DjZ-
:/:/

Post by **dj-zath** » Mon Mar 23, 2009 12:04 pm

okay, I spent another night screwing around with things..

first the CODE (totally rewritten/derived from NML-375's examples):

Code:


proc RunLoop1 {} {RacSkt; utimer 10 [list RunLoop1]; return 1;};
proc RunLoop2 {} {HeartBeat; utimer 10 [list RunLoop2]; return 1;};

proc    RacSkt {} {
        global RacIP RacPort
        if {([catch {set RacSock [socket -async $RacIP $RacPort]; fconfigure $RacSock -blocking 0 -buffersize 10;}])||([catch {set PlaySock [socket -async $RacIP $RacPort]; fconfigure $PlaySock -blocking 0 -buffersize 10;}])} {
                catch {close $RacSock};
                catch {close $PlaySock};
                return 0;
        } else {
                set CA [after 1800 [list close $RacSock];];
                set CB [after 1800 [list close $PlaySock];];
                fileevent $RacSock writable [list RacNfo $RacSock $CA];
                fileevent $PlaySock writable [list PlayNfo $PlaySock $CB];
        };
}

proc    RacNfo {RacSock CA} {
        global RacIP RacPort RacInfo
        putlog {Rac Test}
        if {([catch {after cancel $CA; flush $RacSock; puts $RacSock "GET /x/playing.cgi HTTP/1.0\r\nUser-Agent: Mozilla\r\nHOST $RacIP:$RacPort\r\n\r\n\r\n\r\n"; flush $RacSock; while {(![eof $RacSock])} {set RacInfo [gets $RacSock];}; catch {flush $RacSock}; catch {close $RacSock};}])} {
        set RacInfo "";
        return 0;
        };
}

proc    PlayNfo {PlaySock CB} {
        global RacIP RacPort PlayInfo
        putlog {Play Test}
        if {([catch {after cancel $CB; flush $PlaySock; puts $PlaySock "GET /x/playlist.cgi HTTP/1.0\r\nUser-Agent: Mozilla\r\nHOST $RacIP:$RacPort\r\n\r\n\r\n\r\n"; flush $PlaySock; while {(![eof $PlaySock])} {set PlayInfo [gets $PlaySock];}; catch {flush $PlaySock}; catch {close $PlaySock};}])} {
        set PlayInfo "";
        return 0;
        };
}


proc    HeartBeat {} {
          putlog {Still alive...};

}

now the test conditions:
Eggdrop started with the RAC NOT connected: (Radio off)

[10:37] Still alive...
[10:37] Still alive...
[10:37] Still alive...
[10:37] Still alive...
[10:37] Still alive...
[10:37] Still alive...
[10:37] Still alive...

..then the RAC is connected (radio on)

[10:37] Still alive...
[10:38] Still alive...
[10:38] Still alive...
[10:38] Still alive...
[10:38] Rac Test <-- connection detected
[10:38] Play Test
[10:38] Rac Test
[10:38] Still alive...
[10:38] Play Test
[10:38] Still alive...
[10:38] Rac Test
[10:38] Play Test
[10:38] Still alive...

..and now the RAC is disconnected once more.. (Radio off)

[10:38] Rac Test <-notice the script is still detecting a DEAD connection!
[10:38] Play Test
[10:39] Still alive...
[10:39] Rac Test
Here's where the Eggdrop freezes... completely non-responsive!
[10:45] timer: drift (lastmin=40, now=45)
[10:45] timer: drift (lastmin=41, now=45)
[10:45] timer: drift (lastmin=42, now=45)
[10:45] timer: drift (lastmin=43, now=45)
[10:45] timer: drift (lastmin=44, now=45)
[10:45] (!) timer drift -- spun 5 minutes <- you asked where the 5 mins comes from? its RIGHT THERE!
[10:45] net: eof!(write) socket 4 (Broken pipe,32)
[10:45] Writing user file...
[10:45] * EXIT
> <- Eggdrop exits and drops to prompt

This is [one of] the error conditions I've been trying to convey to you.. (NML-375)

conclusion: FAIL!

DjZ

Post by **nml375** » Mon Mar 23, 2009 1:06 pm

Uff.. not to be rude, but that code is hard to read.

Your code, will indeed hang your eggdrop in certain conditions. This is the result of a combination of several issues in your code.

I took the liberty of re-indenting your code, and pasting one of the procs here:

Code:

proc    RacNfo {RacSock CA} {
  global RacIP RacPort RacInfo
  putlog {Rac Test}
  if {
    ([catch {
      after cancel $CA
      flush $RacSock
      puts $RacSock "GET /x/playing.cgi HTTP/1.0\r\nUser-Agent: Mozilla\r\nHOST $RacIP:$RacPort\r\n\r\n\r\n\r\n"
      flush $RacSock
      while {(![eof $RacSock])} {
        set RacInfo [gets $RacSock]
      }
      catch {flush $RacSock}
      catch {close $RacSock}
    }])
  } {
    set RacInfo ""
    return 0
  }
}

First, don't nest 20 commands into one catch!
This is bound to blow up in your face. Especially if you include vital clean up code in the very same catch.

If there is any error within that first catch, you'll never reach the line where you close the socket. Hence the socket will remain active. Since you catch everything in that proc, it will never throw an error, and the fileevent will remain valid.

Second, you never disable the fileevent. This means your RacNfo proc will be called over and over and over for as long as the socket remains writeable.

Third, you use gets/read in a proc that's triggered on writable, not readable. Hence you have no idea whatsoever whether there is anything there to read. This can mess with the fblocked and eof status commands.
Also remember that on sockets, there will not be an eof condition until the remote end closes the connection.

All in all, whenever an async connection fails, flush will throw an error. As the close commands are in the same catch script, and the whole script is terminated, the socket remains open and the fileevent stays active. This, in turn, keeps triggering the writable event:

A channel is considered to be writable if at least one byte of data can be written to the underlying file or device without blocking, or if an error condition is present on the underlying file or device.
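Regarding the first point (cleanup code sharing a catch with the code that can fail), one way to keep the cleanup from being skipped is to wrap only the risky I/O and close the channel from the error handler itself. A sketch using Tcl 8.6's `try`; `sendRequest` is a hypothetical name, and on success the socket is deliberately left open so a readable handler can take over.

```tcl
proc sendRequest {sock request} {
    try {
        puts -nonewline $sock $request
        flush $sock
    } on error {msg} {
        # Cleanup lives here, so it runs even though the I/O above failed.
        # Closing also removes any fileevent handlers still set on $sock,
        # so the writable event cannot keep re-triggering.
        catch {close $sock}
        return -code error "request on $sock failed: $msg"
    }
    return
}
```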

A cleanup of your code, which should work (although I did not test it), follows... You'll end up with the response headers as well in RacInfo, so I guess you'll want to do some cleanup once the request is completed...

Code:

proc RacNfo {socket after} {
 set ::RacInfo ""
 after cancel $after

 # Sockets translate \n into \r\n on output by default, so plain \n is
 # enough here; puts adds the final newline that ends the headers.
 if {[catch {
  puts $socket "GET /x/playing.cgi HTTP/1.0\nUser-Agent: Mozilla\nHost: $::RacIP:$::RacPort\n"
  flush $socket
 } status]} {
  putlog "Error in $socket: $status"
  catch {close $socket}
  return
 }

 fileevent $socket writable ""
 fileevent $socket readable [list RacReadNfo $socket]
}

proc RacReadNfo {socket} {
 # Relies on the socket having been set to -blocking 0 when it was opened.
 if {[eof $socket]} {
  putlog "Socket $socket is eof, closing..."
  close $socket
  return
 }

 append ::RacInfo [read $socket]
}

Oh, one more thing. That code of yours could be considered "hammering", as it will make a new connection every 10 seconds, regardless of the status of the previous one. Since your cleanup code is flawed, in some cases you may very well end up with numerous "pending" connections, filling up the address translation table in your NAT router (which, btw, is the only reasonable way I could see one system downing a complete subnet these days).
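A sketch of one way to avoid that: keep a flag so a new attempt is only started once the previous one has finished or timed out. Plain tclsh 8.6; `pollRac`, `racConnected`, `racFinish`, `::racBusy` and `::racTimer` are hypothetical names, and the actual request/response handling is left out. In eggdrop, the periodic call would come from the existing `utimer` loop.

```tcl
set ::racBusy 0
set ::racTimer ""

# Called periodically; returns 1 if a new attempt was started, 0 if skipped.
proc pollRac {host port} {
    if {$::racBusy} {
        return 0                     ;# previous attempt still pending
    }
    if {[catch {socket -async $host $port} sock]} {
        return 0                     ;# could not even create the channel
    }
    fconfigure $sock -blocking 0
    set ::racBusy 1
    # Hard deadline for the whole exchange (connect + request + response).
    set ::racTimer [after 8000 [list racFinish $sock]]
    fileevent $sock writable [list racConnected $sock]
    return 1
}

proc racConnected {sock} {
    fileevent $sock writable {}
    if {[fconfigure $sock -error] ne ""} {
        racFinish $sock
        return
    }
    # A real version would send the request here and install a readable
    # handler that calls racFinish once the response is complete.
    racFinish $sock
}

# Single exit point: release the channel and allow the next attempt.
proc racFinish {sock} {
    after cancel $::racTimer         ;# harmless if the timer already fired
    catch {close $sock}
    set ::racBusy 0
}
```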

Post by **dj-zath** » Mon Mar 23, 2009 2:10 pm

yes!
I have to nest the whole thing, otherwise it FLOODS the shell with pipe errors and returns to the prompt almost immediately!

fileevents don't work correctly either..

fileevent readable NEVER works and fileevent writable ALWAYS works, even if the connection isn't!

this makes it UNRELIABLE and causes errors and crashes the damn thing!

sorry for the "hard to read" word-wrap; I'm mostly blind, so I have to use long tabs to SEE the lines.. but it doesn't paste well in the window...

so, what the BEEEEEP am I missing here?