Commit Graph

1462 Commits

Author SHA1 Message Date
Ivan Kozlovic
fa9de548b1 [FIXED] LeafNode solicit failure race could leave conn registered
This was found due to a recent test that was flapping. The test
was not checking the correct server for leafnode connection, but
that uncovered the following bug:

When a leafnode connection is solicited, the read/write loops are
started. Then, the connection lock is released and several
functions invoked to register the connection with an account and
add to the connection leafs map.
The problem is that the readloop (for instance) could get a read
error and close the connection *before* the above said code
executes, which would lead to a connection incorrectly registered.

This could be fixed either by delaying the start of read/write loops
after the registration is done, or like in this PR, check the
connection close status after registration, and if closed, manually
undoing the registration with account/leafs map.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-08-27 15:04:40 -06:00
Ivan Kozlovic
7e22004c3a [FIXED] Unsubscribe may not be propagated through a leaf node
There is a race between the time the processing of a subscription
and the init/send of subscriptions when accepting a leaf node
connection that may cause internally a subscription's subject
to be counted many times, which would then prevent the send of
an LS- when the subscription's interest goes away.

Imagine this sequence of events, each side represents a "thread"
of execution:
```
client readLoop                         leaf node readLoop
----------------------------------------------------------
recv SUB foo 1
sub added to account's sublist

                                         recv CONNECT
                                     auth, added to acc.

updateSmap
smap["foo"]++ -> 1
no LS+ because !allSubsSent

                                         init smap
                                    finds sub in acc sl
                                    smap["foo"]++ -> 2
                                        sends LS+ foo
                                    allSubsSent == true

recv UNSUB 1
updateSmap
smap["foo"]-- -> 1
no LS- because count != 0
----------------------------------------------------------
```
Equivalent result but with slightly diffent execution:
```
client readLoop                         leaf node readLoop
----------------------------------------------------------
recv SUB foo 1
sub added to account's sublist

                                         recv CONNECT
                                     auth, added to acc.

                                         init smap
                                    finds sub in acc sl
                                    smap["foo"]++ -> 1
                                        sends LS+ foo
                                    allSubsSent == true

updateSmap
smap["foo"]++ -> 2
no LS+ because count != 1

recv UNSUB 1
updateSmap
smap["foo"]-- -> 1
no LS- because count != 0
----------------------------------------------------------
```

The approach for the fix is delay the creation of the smap
until we actually initialize the map and send the subs on processing
of the CONNECT.
In the meantime, as soon as the LN connection is registered
and available in updateSmap, we check that smap is nil or
not. If nil, we do nothing.

In "init smap" we keep track of the subscriptions that have been
added to smap. This map will be short lived, just enough to
protect against races above.

In updateSmap, when smap is not nil, we need to checki, if we
are adding, that the subscription has not already been handled.
The tempory subscription map will be ultimately emptied/set to
nil with the use of a timer (if not emptied in place when
processing smap updates).

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-08-27 14:48:42 -06:00
Ivan Kozlovic
fe31ab3796 [FIXED] Possible removal of interest on queue subs with leaf nodes
Server was incorrectly processing a queue subscription removal
as both a plain sub and queue sub, which may have resulted in
drop of interest even when some queue subs remained.

Resolves #1421

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-08-27 14:39:19 -06:00
Ivan Kozlovic
87c00e3e8f [FIXED] Possible stall on shutdown with leafnode setup
If a leafnode connection is accepted but the server is shutdown
before the connection is fully registered, the shutdown would
stall because read and write loop go routine would not be
stopped.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-08-27 14:38:22 -06:00
Ivan Kozlovic
df676b7c63 [FIXED] Race condition during implicit Gateway reconnection
Say server in cluster A accepts a connection from a server in
cluster B.
The gateway is implicit, in that A does not have a configured
remote gateway to B.
Then the server in B is shutdown, which A detects and initiate
a single reconnect attempt (since it is implicit and if the
reconnect retries is not set).
While this happens, a new server in B is restarted and connects
to A. If that happens before the initial reconnect attempt
failed, A will register that new inbound and do not attempt to
solicit because it has already a remote entry for gateway B.
At this point when the reconnect to old server B fails, then
the remote GW entry is removed, and A will not create an outbound
connection to the new B server.

We fix that by checking if there is a registered inbound when
we get to the point of removing the remote on a failed implicit
reconnect. If there is one, we try the reconnection.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-08-27 14:36:54 -06:00
Ivan Kozlovic
6e3401eb81 [FIXED] Allow response permissions to work across accounts
This is a port of https://github.com/nats-io/nats-server/pull/1487

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-08-27 14:33:53 -06:00
Ivan Kozlovic
bf0930ee76 Merge pull request #1394 from nats-io/release_2_1_7
Release v2.1.7
2020-05-14 11:33:56 -06:00
Ivan Kozlovic
b329215af3 Release v2.1.7
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-05-13 19:40:49 -06:00
Ivan Kozlovic
54e014070f Merge pull request #1392 from guilherme-santos/master
[ADDED] base path for monitoring endpoints
2020-05-13 16:28:57 -06:00
Guilherme Santos
25858cba0b Implement basePath for monitoring endpoints 2020-05-13 23:29:11 +02:00
Matthias Hanel
d486f6ab9b Move reset of internal client to after the account sublist was moved.
This does not avoid the race condition, but makes it less likely to
trigger in unit tests.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-05-13 15:52:29 -04:00
Ivan Kozlovic
09a58cbada Merge pull request #1387 from nats-io/reload
[FIXED] Unnecessary account reloads
2020-05-12 16:09:39 -06:00
Matthias Hanel
04b81abdde [FIXED] default_permissions apply to nkey users as well
Fixes 1390

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-05-12 17:13:25 -04:00
Matthias Hanel
11c0669ae2 [FIXES] Unnecessary account reloads and pointer to old accounts
Fixes #1372 by updating s.sys.account pointer.

This issue also showed that accounts are unnecessarily reloaded.
This happened because account imports were not copied and thus,
deepEqual detected a difference were none was.
This was addressed by making the copy less shallow.

Furthermore did deepEqual detects a difference when it compared
slices that were appended to while processing a map.
This was fixed by sorting before comparison.

Noticed that Account.clients stored an unnecessary pointer.
Removed duplicated code in systemAccount.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-05-11 21:51:41 -04:00
Waldemar Quevedo
9a2d095885 Add support to match domainComponent (DC) in RDNSequence with TLS Auth
Currently when using TLS based authentication, any domain components
that could be present in the cert will be omitted since Go's
ToRDNSequence is not including them:

202c43b2ad/src/crypto/x509/pkix/pkix.go (L226-L245)

This commit adds support to include the domain components in case
present, also roughly following the order suggested at:
https://tools.ietf.org/html/rfc2253

Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2020-05-11 17:41:11 -07:00
Ivan Kozlovic
46f880bc52 [FIXED] Early closed connection may linger in the server
If the connection is marked as closed while sending the INFO, the
connection would not be removed from the internal map, which would
cause it to be shown in the monitoring list of opened connections.

Resolves #1384

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-05-08 12:01:15 -06:00
Ivan Kozlovic
81fabde729 Merge pull request #1362 from nats-io/monitoring_as_systemevents
[ADDED] Making monitoring endpoints available via system services.
2020-05-07 11:37:50 -06:00
Ivan Kozlovic
1ab26dfa0f Merge pull request #1381 from nats-io/validateOnReload
[FIXED] on reload, check error conditions checked in validateOptions
2020-05-06 15:50:03 -06:00
Matthias Hanel
0eae40070b [FIXED] on reload, check error conditions checked in validateOptions
Fixes #1378 by calling validateOptions on reload
Add missing comment to validateOptions

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-05-06 17:38:28 -04:00
Ivan Kozlovic
7e60769f9e Merge pull request #1377 from nats-io/subsz
[FIXED] subsz monitoring endpoint did not account for accounts.
2020-05-06 14:06:12 -06:00
Matthias Hanel
136feb9bc6 [FIXEd] subsz monitoring endpoint did not account for accounts.
Fixes  #1371 and #1357 by adding up stats and collecting subscriptions
from all accounts.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-05-06 15:48:51 -04:00
Matthias Hanel
14c716052d Making monitoring endpoints available via system services.
Available via $SYS.REQ.SERVER.%s.%s and $SYS.REQ.SERVER.PING.%s
Last token is the endpoint name.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-05-04 13:31:50 -04:00
Matthias Hanel
b074c941ae Add a no_auth_user
This configuration allows to refer to a configured user to be used when
the connection provides no credentials.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-05-02 15:59:06 -04:00
Ivan Kozlovic
fef94759ab [FIXED] Update remote gateway URLs when node goes away in cluster
If a node in the cluster goes away, an async INFO is sent to
inbound gateway connections so they get a chance to update their
list of remote gateway URLs. Same happens when a node is added
to the cluster.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-04-20 13:48:47 -06:00
Ivan Kozlovic
2ec00d86ed Replaced %v with %s so String() is not needed
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-04-15 17:52:44 -06:00
Ivan Kozlovic
be00ea96cf [IMPROVED] Added close reason in the connection close log statement
This gives the close reason directly in the log without having to
get that information from the monitoring endpoint. Here is an
example of a route closed due to the remote side not replying to
PINGs:

```
[INF] 127.0.0.1:53839 - rid:2 - Router connection closed: Stale Connection
```

Without this change, the log statement would have been:
```
[INF] 127.0.0.1:53839 - rid:2 - Router connection closed
```

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-04-15 15:36:54 -06:00
Ivan Kozlovic
c54f41acd6 Fixed flapper test
Make sure that the subscription on "service" is registered on
server A.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-04-14 13:27:17 -06:00
Derek Collison
aff10aa16b Fix for #1344
Signed-off-by: Derek Collison <derek@nats.io>
2020-04-14 09:26:35 -07:00
Derek Collison
dc55356096 Have events look at whether or not a leaf is a hub, regardless of solicit
Signed-off-by: Derek Collison <derek@nats.io>
2020-04-13 15:25:21 -07:00
Derek Collison
6fa7f1ce82 Have hub role sent to accepting side and adapt to be a spoke
Signed-off-by: Derek Collison <derek@nats.io>
2020-04-13 15:18:42 -07:00
Ivan Kozlovic
439dca67c8 Test for interest propagation with GWs and Hub leafnodes
Setup:

  B <- GW -> C
 /            \
v              v
A              D

Leafnodes are created from B to A and C to D. The remotes on B and
C have the option "Hub: true".

The replier connects to D and listens to "service". The requestor
connects to "A" and sends the request on "service". The reply does
not make it back to A.
If the requestor on A, instead of calling Request(), first creates
a subscription on an inbox, wait a little bit (few 100s ms), then
publishes the request on "service" with that inbox for the reply
subject, the reply makes it back to A.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-04-13 12:47:18 -06:00
Derek Collison
2b1fe8f261 Merge pull request #1337 from nats-io/service-account-leaf-test
[FIXED] Service across accounts and leaf nodes
2020-04-10 17:38:07 -07:00
Derek Collison
ef85a1b836 Fix for #1336
Signed-off-by: Derek Collison <derek@nats.io>
2020-04-10 17:30:03 -07:00
Ivan Kozlovic
7218da1cbc Merge pull request #1338 from nats-io/reduce_loop_errors
LeafNode: delay connect even when loop detected by accepting side
2020-04-10 18:04:11 -06:00
Ivan Kozlovic
b200368e52 LeafNode: delay connect even when loop detected by accepting side
If the loop is detected by a server accepting the leafnode connection,
an error is sent back and connection is closed.
This change ensures that the server checks an -ERR for "Loop detected"
and then set the connect delay, so that it does not try to reconnect
right away.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-04-10 16:44:16 -06:00
Derek Collison
090abc939d Fix for stream imports and leafnodes, #1332
Signed-off-by: Derek Collison <derek@nats.io>
2020-04-10 10:36:20 -07:00
Derek Collison
84841a35bb Merge pull request #1334 from nats-io/leafnode_bug
Fix for bug when requestor and leafnode on same server.
2020-04-10 08:51:31 -07:00
Derek Collison
e843a27bba When a responder was on a leaf node and the requestor was connected to the same server as the leafnode we did not propagate the service reply wildcard properly. This fixes that.
Signed-off-by: Derek Collison <derek@nats.io>
2020-04-10 08:35:09 -07:00
Ivan Kozlovic
34eb5bda31 [ADDED] Deny import/export options for LeafNode remote configuration
This will allow a leafnode remote connection to prevent unwanted
messages to be received, or prevent local messages to be sent
to the remote server.

Configuration will be something like:
```
leafnodes {
  remotes: [
    {
      url: "nats://localhost:6222"
      deny_imports: ["foo.*", "bar"]
      deny_exports: ["baz.*", "bat"]
    }
  ]
}
```

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-04-09 18:55:44 -06:00
Derek Collison
d70426d3f6 Do prefix test to avoid read lock if possible
Signed-off-by: Derek Collison <derek@nats.io>
2020-04-09 08:48:04 -07:00
Derek Collison
699502de8f Detection for loops with leafnodes.
We need to send the unique LDS subject to all leafnodes to properly detect setups like triangles.
This will have the server who completes the loop be the one that detects the error soley based on
its own loop detection subject.

Otehr changes are just to fix tests that were not waiting for the new LDS sub.

Signed-off-by: Derek Collison <derek@nats.io>
2020-04-08 20:00:40 -07:00
Derek Collison
743c6ec774 Merge pull request #1327 from nats-io/leaf_staggered
This commit allows new servers in a supercluster to be informed of accounts with leafnodes.
2020-04-08 10:57:49 -07:00
Derek Collison
82f585d83a Updated to also resend leafnode connect on GW connect via first INFO
Signed-off-by: Derek Collison <derek@nats.io>
2020-04-08 09:55:19 -07:00
Derek Collison
43fbe0ffed This commit allows new servers ina supercluster to be informed of accounts with active leafnode connections.
This is needed to put those accounts into interest only mode for inbound gateway connections. Also added code
to make sure we were doing proper account tracking and would track the global account as well, which used to
be excluded.

Fixes #977

Signed-off-by: Derek Collison <derek@nats.io>
2020-04-07 16:22:15 -07:00
Ivan Kozlovic
76e8e1c9b0 [ADDED] Leafnode remote's Hub option
This allows a node that creates a remote LeafNode connection to
act as it was the hub (of the hub and spoke topology). This is
related to subscription interest propagation. Normally, a spoke
(the one creating the remote LN connection) will forward only
its local subscriptions and when receiving subscription interest
would not try to forward to local cluster and/or gateways.
If a remote has the Hub boolean set to true, even though the
node is the one creating the remote LN connection, it will behave
as if it was accepting that connection.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-04-07 13:42:55 -06:00
Ivan Kozlovic
a19dcbee2d Release v2.1.6
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-03-31 10:29:06 -06:00
Ivan Kozlovic
e63fc5f195 Merge pull request #1318 from nats-io/monitoring
[ADDED] Option to include subscription details in monitoring responses
2020-03-30 15:51:51 -06:00
Matthias Hanel
8a75418386 Using AccountResolver url from operator jwt.
If resolver is specified separately, it takes precedence.
nsc push automatically adds /accounts. That's why its added here too.
Operator jwt reload is not supported and is not taken into account.
On startup the AccountResolver url is checked and exits if it can't be
reached.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-03-30 15:43:32 -04:00
Matthias Hanel
30ba333663 Adding an option to include subscription details in monitoring responses.
Applies to routez and connz and closed connections.
Enable by specifying subs=detail

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-03-23 12:25:51 -04:00
Ivan Kozlovic
b46a011c87 Merge pull request #1316 from nats-io/fix_1313
Add TLS 1.3 (and new ciphers) in the tlsVersion output
2020-03-19 17:30:51 -06:00