This was found while investigating a recently flapping test. The test
was not checking the correct server for the leafnode connection, but
it uncovered the following bug:
When a leafnode connection is solicited, the read/write loops are
started. Then, the connection lock is released and several
functions are invoked to register the connection with an account
and add it to the server's leafs map.
The problem is that the readLoop (for instance) could get a read
error and close the connection *before* that registration code
executes, which would leave a closed connection incorrectly
registered.
This could be fixed either by delaying the start of the read/write
loops until after the registration is done or, as in this PR, by
checking the connection's close status after registration and, if
it is closed, manually undoing the registration with the
account/leafs map.
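A minimal sketch of the "register, then re-check" approach, not the
server's actual code. The `conn` and `server` types and the `register`
method are hypothetical stand-ins. It also assumes that a close
happening *after* registration is cleaned up by the normal close path,
so only the "closed before registration" window needs the extra check:

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for a leafnode connection.
type conn struct {
	mu     sync.Mutex
	id     uint64
	closed bool
}

func (c *conn) isClosed() bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.closed
}

// close marks the connection as closed, as the readLoop would on a read error.
func (c *conn) close() {
	c.mu.Lock()
	c.closed = true
	c.mu.Unlock()
}

// server is a hypothetical stand-in that only holds the leafs map.
type server struct {
	mu    sync.Mutex
	leafs map[uint64]*conn
}

// register adds the connection to the leafs map, then re-checks the close
// status. It returns false if the registration had to be undone.
func (s *server) register(c *conn) bool {
	s.mu.Lock()
	s.leafs[c.id] = c
	s.mu.Unlock()

	if c.isClosed() {
		// The connection was closed before we registered it: undo.
		s.mu.Lock()
		delete(s.leafs, c.id)
		s.mu.Unlock()
		return false
	}
	return true
}

func main() {
	s := &server{leafs: make(map[uint64]*conn)}
	c := &conn{id: 1}
	c.close() // simulate the readLoop closing the connection early
	fmt.Println(s.register(c), len(s.leafs))
}
```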
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
There is a race between the processing of a client subscription
and the init/send of subscriptions when accepting a leaf node
connection. It may cause a subscription's subject to be counted
more than once internally, which would then prevent the send of
an LS- when the subscription's interest goes away.
Imagine this sequence of events, where each side represents a
"thread" of execution:
```
client readLoop                         leaf node readLoop
----------------------------------------------------------
recv SUB foo 1
sub added to account's sublist

                                        recv CONNECT
                                        auth, added to acc.

updateSmap
smap["foo"]++ -> 1
no LS+ because !allSubsSent

                                        init smap
                                        finds sub in acc sl
                                        smap["foo"]++ -> 2
                                        sends LS+ foo
                                        allSubsSent == true

recv UNSUB 1
updateSmap
smap["foo"]-- -> 1
no LS- because count != 0
----------------------------------------------------------
```
Equivalent result, but with a slightly different execution:
```
client readLoop                         leaf node readLoop
----------------------------------------------------------
recv SUB foo 1
sub added to account's sublist

                                        recv CONNECT
                                        auth, added to acc.

                                        init smap
                                        finds sub in acc sl
                                        smap["foo"]++ -> 1
                                        sends LS+ foo
                                        allSubsSent == true

updateSmap
smap["foo"]++ -> 2
no LS+ because count != 1

recv UNSUB 1
updateSmap
smap["foo"]-- -> 1
no LS- because count != 0
----------------------------------------------------------
```
The approach for the fix is to delay the creation of the smap
until we actually initialize the map and send the subs on processing
of the CONNECT.
In the meantime, as soon as the LN connection is registered
and available in updateSmap, we check whether smap is nil.
If it is nil, we do nothing.
In "init smap" we keep track of the subscriptions that have been
added to smap. This map will be short-lived, just long enough to
protect against the races above.
In updateSmap, when smap is not nil and we are adding, we need to
check that the subscription has not already been handled.
The temporary subscription map will ultimately be emptied/set to
nil with the use of a timer (if not emptied in place when
processing smap updates).
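The bookkeeping described above can be sketched as follows. This is a
simplified, single-goroutine model under stated assumptions, not the
server code: the `leaf` and `sub` types, the `tsub` field name and the
`sent` slice are hypothetical, and locking, queue subscriptions and the
cleanup timer are left out.

```go
package main

import "fmt"

// sub is a hypothetical subscription: an ID plus a subject.
type sub struct {
	id      int
	subject string
}

// leaf is a hypothetical model of the leafnode connection state.
type leaf struct {
	smap map[string]int   // subject -> interest count; nil until initSmap
	tsub map[int]struct{} // subs already counted by initSmap (short-lived)
	sent []string         // protocol lines that would have been sent
}

// initSmap creates smap from the account's current subscriptions and
// remembers each of them in tsub.
func (l *leaf) initSmap(accSubs []sub) {
	l.smap = make(map[string]int)
	l.tsub = make(map[int]struct{})
	for _, s := range accSubs {
		l.tsub[s.id] = struct{}{}
		l.smap[s.subject]++
		if l.smap[s.subject] == 1 {
			l.sent = append(l.sent, "LS+ "+s.subject)
		}
	}
}

// updateSmap applies delta (+1 on SUB, -1 on UNSUB) for a subscription.
func (l *leaf) updateSmap(s sub, delta int) {
	if l.smap == nil {
		// Not initialized yet: initSmap will find the sub in the account.
		return
	}
	if delta > 0 {
		if _, seen := l.tsub[s.id]; seen {
			// Already counted by initSmap: do not count it twice.
			delete(l.tsub, s.id)
			return
		}
	}
	n := l.smap[s.subject]
	// Only the 0 -> 1 and 1 -> 0 transitions are sent to the other side.
	update := n == 0 || n+delta <= 0
	n += delta
	if n > 0 {
		l.smap[s.subject] = n
	} else {
		delete(l.smap, s.subject)
	}
	if update {
		if n > 0 {
			l.sent = append(l.sent, "LS+ "+s.subject)
		} else {
			l.sent = append(l.sent, "LS- "+s.subject)
		}
	}
}

func main() {
	foo := sub{id: 1, subject: "foo"}

	// Second interleaving from above: init smap runs before updateSmap.
	l := &leaf{}
	l.initSmap([]sub{foo})
	l.updateSmap(foo, +1) // skipped: already handled by initSmap
	l.updateSmap(foo, -1) // count drops to 0, so LS- is sent
	fmt.Println(l.sent)
}
```

With either interleaving the subject is counted once, so the UNSUB
brings the count back to zero and the LS- goes out.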
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The server was incorrectly processing a queue subscription removal
as both a plain sub and a queue sub, which may have resulted in
a drop of interest even when some queue subs remained.
Resolves #1421
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
If a leafnode connection is accepted but the server is shut down
before the connection is fully registered, the shutdown would
stall because the read and write loop goroutines would not be
stopped.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Say a server in cluster A accepts a connection from a server in
cluster B.
The gateway is implicit, in that A does not have a configured
remote gateway to B.
Then the server in B is shut down, which A detects, and A initiates
a single reconnect attempt (since the gateway is implicit and the
number of reconnect retries is not set).
While this happens, a new server in B is started and connects
to A. If that happens before the initial reconnect attempt
has failed, A will register that new inbound connection and will
not attempt to solicit because it already has a remote entry for
gateway B.
At this point, when the reconnect to the old server B fails, the
remote GW entry is removed, and A will not create an outbound
connection to the new B server.
We fix that by checking if there is a registered inbound connection
when we get to the point of removing the remote on a failed implicit
reconnect. If there is one, we try the reconnection again.
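The decision taken on a failed implicit reconnect can be sketched
roughly like this. The `gwState` type, its fields and the
`onImplicitReconnectFailed` function are hypothetical and only model
the check described above; they are not the server's data structures.

```go
package main

import "fmt"

// gwState is a hypothetical model of server A's gateway bookkeeping.
type gwState struct {
	remotes map[string]struct{} // remote gateway entries, by gateway name
	inbound map[string]int      // registered inbound connections per gateway
}

// onImplicitReconnectFailed is called when the reconnect to an implicit
// gateway has failed. It reports whether the caller should try to
// reconnect again instead of giving up.
func (g *gwState) onImplicitReconnectFailed(name string) bool {
	if g.inbound[name] > 0 {
		// A server of that gateway connected to us in the meantime:
		// keep the remote entry and retry the outbound connection.
		return true
	}
	// Nobody from that gateway is connected: drop the remote entry.
	delete(g.remotes, name)
	return false
}

func main() {
	g := &gwState{
		remotes: map[string]struct{}{"B": {}},
		inbound: map[string]int{"B": 1}, // the new B server already connected
	}
	fmt.Println(g.onImplicitReconnectFailed("B"), len(g.remotes))
}
```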
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Fixes #1372 by updating the s.sys.account pointer.
This issue also showed that accounts were unnecessarily reloaded.
This happened because account imports were not copied and thus
deepEqual detected a difference where there was none.
This was addressed by making the copy less shallow.
Furthermore, deepEqual detected a difference when it compared
slices that had been appended to while iterating over a map.
This was fixed by sorting before comparison.
Noticed that Account.clients stored an unnecessary pointer.
Removed duplicated code in systemAccount.
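To illustrate the slice comparison issue: Go does not guarantee map
iteration order, so two slices built by ranging over equal maps can
hold the same elements in a different order, and reflect.DeepEqual
then reports a difference. A minimal sketch of the "sort before
comparing" fix, with hypothetical helper names (`keysOf`, `sameKeys`):

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// keysOf collects the keys of m into a slice. The order follows map
// iteration and can differ from one call to the next.
func keysOf(m map[string]struct{}) []string {
	out := make([]string, 0, len(m))
	for k := range m {
		out = append(out, k)
	}
	return out
}

// sameKeys compares the key sets of two maps through slices, sorting
// both slices first so that iteration order cannot cause a mismatch.
func sameKeys(a, b map[string]struct{}) bool {
	ka, kb := keysOf(a), keysOf(b)
	sort.Strings(ka)
	sort.Strings(kb)
	return reflect.DeepEqual(ka, kb)
}

func main() {
	a := map[string]struct{}{"foo": {}, "bar": {}, "baz": {}}
	b := map[string]struct{}{"baz": {}, "foo": {}, "bar": {}}
	fmt.Println(sameKeys(a, b))
}
```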
Signed-off-by: Matthias Hanel <mh@synadia.com>
Currently, when using TLS-based authentication, any domain components
that are present in the cert are omitted, since Go's
ToRDNSequence does not include them:
202c43b2ad/src/crypto/x509/pkix/pkix.go (L226-L245)
This commit adds support for including the domain components when
present, roughly following the order suggested at:
https://tools.ietf.org/html/rfc2253
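As a rough sketch of how the domain components can be recovered, one
can walk the raw attributes in pkix.Name.Names and pick those whose
type is the domainComponent OID (0.9.2342.19200300.100.1.25). The
helper names below (`domainComponents`, `dcSuffix`, `exampleName`) are
hypothetical, and the DC values are simply kept in the order found in
Names, which may not match the ordering this commit implements:

```go
package main

import (
	"crypto/x509/pkix"
	"encoding/asn1"
	"fmt"
	"strings"
)

// oidDomainComponent is the attribute type for "DC".
var oidDomainComponent = asn1.ObjectIdentifier{0, 9, 2342, 19200300, 100, 1, 25}

// domainComponents returns the DC values found in name.Names, in the
// order in which they appear there.
func domainComponents(name pkix.Name) []string {
	var dcs []string
	for _, atv := range name.Names {
		if !atv.Type.Equal(oidDomainComponent) {
			continue
		}
		if s, ok := atv.Value.(string); ok {
			dcs = append(dcs, s)
		}
	}
	return dcs
}

// dcSuffix formats the domain components as "DC=a,DC=b".
func dcSuffix(name pkix.Name) string {
	dcs := domainComponents(name)
	parts := make([]string, 0, len(dcs))
	for _, dc := range dcs {
		parts = append(parts, "DC="+dc)
	}
	return strings.Join(parts, ",")
}

// exampleName builds a name the way a parsed certificate subject would
// look, with the DC attributes only present in Names (made-up values).
func exampleName() pkix.Name {
	return pkix.Name{
		CommonName: "user",
		Names: []pkix.AttributeTypeAndValue{
			{Type: asn1.ObjectIdentifier{2, 5, 4, 3}, Value: "user"},
			{Type: oidDomainComponent, Value: "example"},
			{Type: oidDomainComponent, Value: "com"},
		},
	}
}

func main() {
	fmt.Println(dcSuffix(exampleName()))
}
```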
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
If the connection is marked as closed while sending the INFO, the
connection would not be removed from the internal map, which would
cause it to be shown in the monitoring list of open connections.
Resolves #1384
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This configuration option refers to a configured user that is used
when the connection provides no credentials.
Signed-off-by: Matthias Hanel <mh@synadia.com>
If a node in the cluster goes away, an async INFO is sent to
inbound gateway connections so they get a chance to update their
list of remote gateway URLs. The same happens when a node is added
to the cluster.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This gives the close reason directly in the log without having to
get that information from the monitoring endpoint. Here is an
example of a route closed due to the remote side not replying to
PINGs:
```
[INF] 127.0.0.1:53839 - rid:2 - Router connection closed: Stale Connection
```
Without this change, the log statement would have been:
```
[INF] 127.0.0.1:53839 - rid:2 - Router connection closed
```
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Make use of some existing test helpers and add checkFor in some
places since accounting updates may not be instantaneous.
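The idea behind such a helper is to poll a condition until it holds or
a deadline passes, instead of asserting once. The sketch below is a
stand-in written for illustration: the `waitFor` name, its signature
and `errNotReady` are assumptions, and the actual checkFor helper is
not shown in this commit message.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
	"time"
)

// errNotReady is a hypothetical error for a condition that does not hold yet.
var errNotReady = errors.New("not ready yet")

// waitFor calls f until it returns nil or totalWait has elapsed, sleeping
// for the given duration between attempts. It returns the last error from f.
func waitFor(totalWait, sleep time.Duration, f func() error) error {
	deadline := time.Now().Add(totalWait)
	for {
		err := f()
		if err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return err
		}
		time.Sleep(sleep)
	}
}

func main() {
	// Simulate an accounting value that is updated asynchronously.
	var numConns int64
	go func() {
		time.Sleep(20 * time.Millisecond)
		atomic.StoreInt64(&numConns, 1)
	}()

	err := waitFor(time.Second, 5*time.Millisecond, func() error {
		if atomic.LoadInt64(&numConns) != 1 {
			return errNotReady
		}
		return nil
	})
	fmt.Println(err)
}
```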
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Setup:
    B <- GW -> C
   /            \
  v              v
 A                D
Leafnode connections are created from B to A and from C to D. The
remotes on B and C have the option "Hub: true".
The replier connects to D and listens on "service". The requestor
connects to A and sends the request on "service". The reply does
not make it back to A.
If the requestor on A, instead of calling Request(), first creates
a subscription on an inbox, waits a little (a few hundred
milliseconds), then publishes the request on "service" with that
inbox as the reply subject, the reply makes it back to A.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>