This is the result of flapping tests in go-nats that were caused
by a defect (see PR https://github.com/nats-io/go-nats/pull/348).
However, during debugging, I realize that there were also things
that were not quite right in the server side. This change should
make it the notification of cluster topology changes to clients
more robust.
When a server accepts a route, it will keep track of that server
`connectURLs` array. However, if the server was creating a route
to that other server at the same time, it will promote the route
as a solicited one. The content of that array was not transfered,
which means that on a disconnect, it was possible that the cluster
topology change was not properly sent to clients.
Until now, a server would only notify clients of servers that join
the cluster. More than that, a server would send ot its clients only
information if new servers were added.
This PR changes this by sending to clients that support async INFO
the list of URLs for all servers in the cluster any time that there
is a change (joining or leaving the cluster).
As of now, clients will not be affected by the change (and will not
take benefit of this: removing servers from their server pool). This
will be addressed in each supported client once this is merged.
- Renamed GitHash to gitCommit as per discussions with Ivan
- Set gitHash to 'not set' if not set (as in the case of a local build)
# Conflicts:
# server/server.go
[FIX] Docker images were injecting the compile time variable `github.com/nats-io/gnatsd/version.GITCOMMIT`, however this is not referenced nor exposed anywhere.
[FIX] Docker images were injecting the compile time variable `github.com/nats-io/gnatsd/version.GITCOMMIT`, however this is not referenced nor exposed anywhere.
When the option Cluster.NoAdvertise is false, a server will send
an INFO protocol message to its client when a server has joined
the cluster.
Previously, the protocol would be sent only if the
joining server's "client URLs" (the addresses where clients connect
to) were new. It will now be sent regardless if the server joins
(for the first time) or rejoins the cluster.
Clients are still by default invoking the DiscoveredServersCB callback
only if they themselves detect that new URLs were added. A separate
PR may be filled to client libraries repo to be able to invoke
the callback anytime an async INFO protocol is received.
Based on @madgrenadier PR #597.
If server requires TLS and clients are connecting, and a Connz
request is made while clients are still in TLS Handshake, the
call to tls.Conn.ConnectionState() would block for the duration
of the handshake. This would cause the overall http request to
take too long.
We will now not try to gather TLSVersion and TLSCipher from a
client that is still in TLS handshake.
Resolves#600
The http servers for those two were recently modified to set
a ReadTimeout and WriteTimeout. The WriteTimeout specifically
caused issues for Profiling since it is common to ask sampling
of several seconds. Pprof code would reject the request if it
detected that http server's WriteTimeout was more than sampling
in request.
For monitoring, any situation that would cause the monitoring code
to take more than 2 seconds to gather information (could be due
to locking, amount of objects to return, time required for sorting,
etc..) would also cause cURL to return empty response or WebBrowser
to fail to display the page.
Resolves#600
There were some cases where override would not work. Any command
line parameter that would be set to the type default value (false
for boolean, "" for string, etc) would not be taken into account.
I moved all the flags parsing and options configuration into
a new function, which may help reduce code duplication in
NATS Streaming.
The other advantage of moving this in a function is that it
can now be unit tested.
I am also removing call to `RemoveSelfReference()` which attempted
to remove a route to self, which has been already solved at runtime
with detecting and ignoring a route to self.
This function would be invoked only when routes were defined in
the configuration file, not in the command line parameter.
Removing this call also solves an user issue (#577)
Resolves#574Resolves#577
- Created a setter for the closed flag.
- Check if route is closed under lock and set a boolean if so,
so we don't check c.route outside of c's mutex.
- Ensure that we do not create a route on shutdown, which would
leave a connection hanging (was seen in some config reload tests).
gnatsd currently uses a global logger. This can cause some problems
(especially around the config-reload work), but global variables are
also just an anti-pattern in general. The current behavior is
particularly surprising because the global logger is configured through
calls to the Server.
This addresses issue #500 by removing the global logger and making it a
field on Server.
In case one creates a server instance with New() and then starts
the http server manually (s.StartHTTPMonitoring()), calling
s.Shutdown() would not stop the http server because Shutdown()
would return without doing anything if `running` was not true.
This boolean was set to true only in `s.Start()`.
Also added StartMonitoring() to perform the options check and
selectively start http or https server to replace individual calls.
This is useful for NATS Streaming server that will now be able
to call s.StartMonitoring() without having to duplicate code
about options checks and http server code.
This is related to PR #481
When TLS and authorization is enabled, the authorization timeout can
fire during the TLS handshake, causing the server to write the
authorization timeout error string into the client socket, injecting
what becomes bad data into the TLS handshake. This creates misleading
errors on the client such as tls: oversized record received with length
21024.
This moves the authorization timeout scheduling to after the TLS
handshake to avoid the race. This should be safe since TLS has its own
handshake timeout. Added a unit test that fails with the old behavior
and passes with the new. LMK if you can think of a better way to test
this.
Fixes#432
The RunServer() function (and the various variants)
call Server.Start() in a go-routine, but do not return until
it has verified that the server is ready to accept connections.
To do so, it use GetListenEndpoint() to get a suitable connect
address (replacing "0.0.0.0" or "::" with localhost - important
on Windows). It then creates a raw TCP connection to ensure the
server is started, repeating the process in case of failure up
to 10 seconds.
This PR replaces this with a function that checks that client
listener, and route listener if configured, are set. This removes
the need to get a connect address and create test tcp connections.
The reason for this change is that NATS Streaming when starting
the NATS Server (unless configured to connect to a remote one)
calls RunServerWithAuth(), which when getting "localhost" from
GetListenEndpoint(), would fail trying to resolve it. This happened
for the NATS Streaming Docker image built with Go 1.7+.