The grace period used to be hardcoded at 10 seconds.
This option allows the user to configure the amount of time the
server will wait before initiating the closing of client connections.
Note that the grace period needs to be strictly lower than the overall
lame_duck_duration. The server deducts the grace period from that
overall duration and spreads the closing of connections during
that time.
For instance, if there are 1000 connections and the lame duck
duration is set to 30 seconds and grace period to 10, then
the server will use 30-10 = 20 seconds to spread the closing
of those 1000 connections, so say roughly 50 clients per second.
Resolves#1459.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
There is a race between the time the processing of a subscription
and the init/send of subscriptions when accepting a leaf node
connection that may cause internally a subscription's subject
to be counted many times, which would then prevent the send of
an LS- when the subscription's interest goes away.
Imagine this sequence of events, each side represents a "thread"
of execution:
```
client readLoop leaf node readLoop
----------------------------------------------------------
recv SUB foo 1
sub added to account's sublist
recv CONNECT
auth, added to acc.
updateSmap
smap["foo"]++ -> 1
no LS+ because !allSubsSent
init smap
finds sub in acc sl
smap["foo"]++ -> 2
sends LS+ foo
allSubsSent == true
recv UNSUB 1
updateSmap
smap["foo"]-- -> 1
no LS- because count != 0
----------------------------------------------------------
```
Equivalent result but with slightly diffent execution:
```
client readLoop leaf node readLoop
----------------------------------------------------------
recv SUB foo 1
sub added to account's sublist
recv CONNECT
auth, added to acc.
init smap
finds sub in acc sl
smap["foo"]++ -> 1
sends LS+ foo
allSubsSent == true
updateSmap
smap["foo"]++ -> 2
no LS+ because count != 1
recv UNSUB 1
updateSmap
smap["foo"]-- -> 1
no LS- because count != 0
----------------------------------------------------------
```
The approach for the fix is delay the creation of the smap
until we actually initialize the map and send the subs on processing
of the CONNECT.
In the meantime, as soon as the LN connection is registered
and available in updateSmap, we check that smap is nil or
not. If nil, we do nothing.
In "init smap" we keep track of the subscriptions that have been
added to smap. This map will be short lived, just enough to
protect against races above.
In updateSmap, when smap is not nil, we need to checki, if we
are adding, that the subscription has not already been handled.
The tempory subscription map will be ultimately emptied/set to
nil with the use of a timer (if not emptied in place when
processing smap updates).
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This also introduces the ability to have flow control inbound for restoring a stream.
If the system detects a reply subject it will respond with a nil payload.
For the last EOF message if a reply is present it will respond with a stream info response or error.
Signed-off-by: Derek Collison <derek@nats.io>
In master, this this what happens when a connection is closed
and server runs with `-D`
```
[95023] 2020/06/03 14:55:28.395532 [DBG] 127.0.0.1:54077 - cid:2 - Client connection created
[95023] 2020/06/03 14:55:29.164118 [DBG] 127.0.0.1:54077 - cid:2 - Client connection closed: Client Closed
[95023] 2020/06/03 14:55:29.164191 [DBG] 127.0.0.1:54077 - cid:2 - Client connection closed
```
Notice the trace of connection closed with the reason, and then
the bare connection closed statement.
This double trace was introduced by mistake during the JS branch
work (dd116fcfd4 (diff-853eb184ac73cf9597d7833f6b89e9c9R3547))
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This is related to #1408.
Make sure that we close the websocket "accept loop" if configured
before proceeding with the lame duck mode.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When the logfile_size_limit option is specified, the logfile will
be automatically rotated. However, if user still sends the re-open
signal (SIGUSR1), the log file will then no longer apply the
size limit.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- A race test may have consumed a lot of fds going in TIME_WAIT
that could cause some issues for other tests
- Missing defer filestore.Stop() that would leave flushLoop()
routines
- A defer for the from server in a LeafNode test
- Rework [Re]ConnectErrorReports that was failing often for me
locally (probably due to exhaustion of fds - too many TIME_WAIT).
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>