Added the sleep back in to TestAuthorizationTimeout which got removed in #493.
This was causing the test to fail intermittently, though it would
usually pass on Travis CI or with the -race flag since these slowed it
down enough.
When TLS and authorization is enabled, the authorization timeout can
fire during the TLS handshake, causing the server to write the
authorization timeout error string into the client socket, injecting
what becomes bad data into the TLS handshake. This creates misleading
errors on the client such as tls: oversized record received with length
21024.
This moves the authorization timeout scheduling to after the TLS
handshake to avoid the race. This should be safe since TLS has its own
handshake timeout. Added a unit test that fails with the old behavior
and passes with the new. LMK if you can think of a better way to test
this.
Fixes#432
This happens sometimes, and the latest occurence was today:
https://github.com/nats-io/java-nats/issues/96
When it happens, there is no error but subscribers would not receive
anything, etc...
This PR uses the fact that clients set the field Lang in the CONNECT
protocol that ROUTEs do not. I have checked that all Apcera supported
clients do set Lang in the CONNECT protocol.
If we plan to add Lang for routes, we need to find another field or
use a new one, in which case that would work only for new clients
(that would need to be updated).
With this change, when the server accepts a connection on the route
port and detects that this protocol field is present, it now closes
the client connection.
The nice thing is that newer clients, when incorrectly connecting
to the route port, get from the route's INFO the list of client URLs,
which means that on the initial connect error, they are able to
subsequently connect to the proper client port, so it is transparent
to the user (which may or may not be a good thing). However, it is not
guaranteed because if the client is not setting NoRandomize to true,
the client URL is added but the array shuffled, so it is possible that
the client library does not find the correct port in the connect loop.
Run GetTLSConnectionState on a non-tls connection (in a dedicated test)
and a tls connection.
Because initializing the TLS connection in the tests is non-trivial,
I hijacked the TestTLSCloseClientConnection test.
Clients that will be at the ClientProtoInfo protocol level (or above)
will now receive an asynchronous INFO protocol when the server
they connect to adds a *new* route. This means that when the cluster
adds a new server, all clients in the cluster should now be notified
of this new addition.
Ensure that all socket writes are protected with deadlines.
For connection Close(), also use deadlines since in case of TLS,
the Close() will send an alert (do a write) if the handshake was
completed. If the peer is not reading, this would cause the Close()
to hang.
* There was a race during unsubscribe()
* 'go test -race' reports a race in TestSetLogger test. This one could be ignored since we normally invoke SetLogger only on server startup. That being said, Travis failed when I tried to submit a PR for the fix of the unsubscribe race. So proposing to fix the logger too.
race condition:
- a client is closed at the same time as an incoming SUB message occurs.
- the subscription is added to the srv.sl even though the socket is
closed.
- the connection cleanup has already run, so the bad state is never
corrected
- now messages may be forwarded to a client without a connection
- messages will not be forwarded to a router that needs it now, because
processMsg assumes the router already received it