The grace period used to be hardcoded at 10 seconds.
This option allows the user to configure the amount of time the
server will wait before initiating the closing of client connections.
Note that the grace period needs to be strictly lower than the overall
lame_duck_duration. The server deducts the grace period from that
overall duration and spreads the closing of connections during
that time.
For instance, if there are 1000 connections and the lame duck
duration is set to 30 seconds and grace period to 10, then
the server will use 30-10 = 20 seconds to spread the closing
of those 1000 connections, so say roughly 50 clients per second.
Resolves#1459.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This is related to #1408.
Make sure that we close the websocket "accept loop" if configured
before proceeding with the lame duck mode.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- A race test may have consumed a lot of fds going in TIME_WAIT
that could cause some issues for other tests
- Missing defer filestore.Stop() that would leave flushLoop()
routines
- A defer for the from server in a LeafNode test
- Rework [Re]ConnectErrorReports that was failing often for me
locally (probably due to exhaustion of fds - too many TIME_WAIT).
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Server was incorrectly processing a queue subscription removal
as both a plain sub and queue sub, which may have resulted in
drop of interest even when some queue subs remained.
Resolves#1421
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This will ensure that there is no race where clients are accepted
after the LDM INFO notification.
Also add to the test to make sure that we don't send INFO when
routes are disconnected due to internal closing of connections
during the shutdown process.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Also send an INFO to routes so that the remotes can remove the
LDM's server client URLs and notify their own clients of this
change.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When no tag was set, os.Getenv("TRAVIS_TAG") would return empty string.
Travis now set TRAVIS_TAG to `''`. So check for both.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Updated all tests that use "async" clients.
- start the writeLoop (this is in preparation for changes in the
server that will not do send-in-place for some protocols, such
as PING, etc..)
- Added missing defers in several tests
- fixed an issue in client.go where test was wrong possibly causing
a panic.
- Had to skip a test for now since it would fail without server code
change.
The next step will be ensure that all protocols are sent through
the writeLoop and that the data is properly flushed on close (important
for -ERR for instance).
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Running test suite on a Windows VM, I notice several failures.
Updated the compute of the RTT to be at least 1ns. I think that
this is just an issue with the VM I am running, but that change
will have no impact for normal situations (since setting the rtt
to the very minimum duration (1ns) instead of 0) and will prevent
some tests from failing.
Because of those same timer granularity issues, I had to add some
delays between some actions in order for time.Sub()/Since() to
actually report something more than 0.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This adds a new config option server_name that
when set will be exposed in varz, events and more
as a descriptive name for the server.
If unset though the server_name will default to the pk
Signed-off-by: R.I.Pienaar <rip@devco.net>
Also fix for #1071 in that we need to check remote gateways TLS
config even if main gateway section is not configured with TLS.
Related to #1071
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This has an effect only on connections created by the server,
so routes and gateways (explicit and implicit).
Make sure that an explicit warning is printed if the insecure
property is set, but otherwise allow it.
Resolves#1062
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Changed the introduced new option and added a new one. The idea
is to be able to differentiate between never connected and reconnected
event. The never connected situation will be logged at first attempt
and every hour (by default, configurable).
However, once connected and if trying to reconnect, will report every
attempts by default, but this is configurable too.
These two options are supported for config reload.
Related to #1000
Related to #1001Resolves#969
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This is a continuation of #1000. Added a configuration to specify
the number of attempts at which the repeated error is reported.
The algo is now to print only the 1st attempt and when current
attempt % <this config param> == 0.
Resolves#969
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This applies to routes, gateways and leaf node connections.
The failed attempts will be printed at the first, after the first
minute and then every hour.
The connect/error statements now include the attempt number.
Note that in debug mode, all attempts are traced, so you may get
double trace (one for debug, one for info/error).
Resolves#969
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Based on @softkbot PR #913.
Removed the command line parameter, which then removes the need for Options.Cluster.TLSInsecure.
Added a test with config reload.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
There is a possibility that a partial write results in data
not being sent in a timely fashion to a subscription.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This is not complete solution and is a bit hacky but is a start
to be able to have service import work at least in some basic
cases.
Also fixed a bug where replySub would not be removed from
connection's list of subs after delivery.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- Solve RS+ with wildcards
- Solve issue with messages not send to remote gateways queue subs
if there was a qsub on local server.
- Made rcache a perAccountCache since it is now used by routes and
gateways
- Order outbound gateways only on RTT updates
- Print a server's gateway name on startup
- Augment/add some tests
- Update TLS handling: when connecting, use hostname for ServerName
if url is not IP, otherwise use a hostname that we saved when
parsing/adding URLs for the remote gateway.
- Send big buffer in chunks if needed.
- Add caching for qsubs match
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This will allow to signal multiple servers at once to go in
that mode and not have their client reconnect to one of the
servers in the group.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Start the lame duck mode in a go routine in the signal handler
because I think we want to be able to shutdown the server while
in that mode.
Kept the closing as a loop in the lameDuckMode() function (did
not use a timer).
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When receiving SIGUSR2 signal (or -sl ldm) the server stops
accepting new clients, closes routes connections and spread the
closing of client connections based on a config lame duck duration
(default is 30sec). This will help preventing a storm of client
reconnect when a server needs to be shutdown.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- Increate WriteDeadline test that otherwise could cause a client
connect to fail
- Check failed NumRoutes() with retry
- Check that subs are propagated in route permissions test
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Use pending bytes as slow consumer trigger, so reintroduce max_pending.
Improve latency with inplace flush calls when appropriate. Utilize simple
time budget for readLoop routine.
Signed-off-by: Derek Collison <derek@nats.io>
Until now, a server would only notify clients of servers that join
the cluster. More than that, a server would send ot its clients only
information if new servers were added.
This PR changes this by sending to clients that support async INFO
the list of URLs for all servers in the cluster any time that there
is a change (joining or leaving the cluster).
As of now, clients will not be affected by the change (and will not
take benefit of this: removing servers from their server pool). This
will be addressed in each supported client once this is merged.