There is a race between the time the processing of a subscription
and the init/send of subscriptions when accepting a leaf node
connection that may cause internally a subscription's subject
to be counted many times, which would then prevent the send of
an LS- when the subscription's interest goes away.
Imagine this sequence of events, each side represents a "thread"
of execution:
```
client readLoop leaf node readLoop
----------------------------------------------------------
recv SUB foo 1
sub added to account's sublist
recv CONNECT
auth, added to acc.
updateSmap
smap["foo"]++ -> 1
no LS+ because !allSubsSent
init smap
finds sub in acc sl
smap["foo"]++ -> 2
sends LS+ foo
allSubsSent == true
recv UNSUB 1
updateSmap
smap["foo"]-- -> 1
no LS- because count != 0
----------------------------------------------------------
```
Equivalent result but with slightly diffent execution:
```
client readLoop leaf node readLoop
----------------------------------------------------------
recv SUB foo 1
sub added to account's sublist
recv CONNECT
auth, added to acc.
init smap
finds sub in acc sl
smap["foo"]++ -> 1
sends LS+ foo
allSubsSent == true
updateSmap
smap["foo"]++ -> 2
no LS+ because count != 1
recv UNSUB 1
updateSmap
smap["foo"]-- -> 1
no LS- because count != 0
----------------------------------------------------------
```
The approach for the fix is delay the creation of the smap
until we actually initialize the map and send the subs on processing
of the CONNECT.
In the meantime, as soon as the LN connection is registered
and available in updateSmap, we check that smap is nil or
not. If nil, we do nothing.
In "init smap" we keep track of the subscriptions that have been
added to smap. This map will be short lived, just enough to
protect against races above.
In updateSmap, when smap is not nil, we need to checki, if we
are adding, that the subscription has not already been handled.
The tempory subscription map will be ultimately emptied/set to
nil with the use of a timer (if not emptied in place when
processing smap updates).
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This is a breaking change and will not be able to restore consumer's from a filestore when upgraded.
We are getting close to settling on the API an once that happens we will not be introducnig any
breaking changes.
Signed-off-by: Derek Collison <derek@nats.io>
Updated all tests that use "async" clients.
- start the writeLoop (this is in preparation for changes in the
server that will not do send-in-place for some protocols, such
as PING, etc..)
- Added missing defers in several tests
- fixed an issue in client.go where test was wrong possibly causing
a panic.
- Had to skip a test for now since it would fail without server code
change.
The next step will be ensure that all protocols are sent through
the writeLoop and that the data is properly flushed on close (important
for -ERR for instance).
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
In updateRouteSubscriptionMap(), when a queue sub is added/removed,
the code locks the account and then the route to send the update.
However, when a route is accepted and the subs are sent, the
opposite (locking wise) occurs. The route is locked, then the account.
This lock inversion is possible because a route is registered (added
to the server's map) and then the subs are sent.
Use a special lock to protect the send, but don't hold the acc.mu
lock while getting the route's lock.
The tests that were created for the original missed queue updates
issue, namely TestClusterLeaksSubscriptions() and
TestQueueSubWeightOrderMultipleConnections() pass with this change.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When a leaf or route connection is created, set the first ping
timer to fire at 1sec, which will allow to compute the RTT
reasonably soon (since the PingInterval could be user configured
and set much higher).
For Route in PR #1101, I was sending the PING on receiving the
INFO which required changing bunch of tests. Changing that to
also use the first timer interval of 1sec and reverted changes
to route tests.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Added the RTT field to each route reported in routez.
Ensure that when a route is accepted, we send a PING to compute
the first RTT and don't have to wait for the ping timer to fire.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
If each server has a long list of subscriptions, when the route
is established, sending this list could result in each server
treating the peer as a slow consumer, resulting in a reconnect,
etc..
Also bumping the fan-in threshold for route connections.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>