If the loop is detected by a server accepting the leafnode connection,
an error is sent back and connection is closed.
This change ensures that the server checks an -ERR for "Loop detected"
and then set the connect delay, so that it does not try to reconnect
right away.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This will allow a leafnode remote connection to prevent unwanted
messages to be received, or prevent local messages to be sent
to the remote server.
Configuration will be something like:
```
leafnodes {
remotes: [
{
url: "nats://localhost:6222"
deny_imports: ["foo.*", "bar"]
deny_exports: ["baz.*", "bat"]
}
]
}
```
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
We need to send the unique LDS subject to all leafnodes to properly detect setups like triangles.
This will have the server who completes the loop be the one that detects the error soley based on
its own loop detection subject.
Otehr changes are just to fix tests that were not waiting for the new LDS sub.
Signed-off-by: Derek Collison <derek@nats.io>
This allows a node that creates a remote LeafNode connection to
act as it was the hub (of the hub and spoke topology). This is
related to subscription interest propagation. Normally, a spoke
(the one creating the remote LN connection) will forward only
its local subscriptions and when receiving subscription interest
would not try to forward to local cluster and/or gateways.
If a remote has the Hub boolean set to true, even though the
node is the one creating the remote LN connection, it will behave
as if it was accepting that connection.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This is in addition to checking if the own subscription comes back.
The duplicated lds subscription must come from a different client.
Added unit tests.
Also prefixed lds with '$' to mark it as system subject going forward.
This moves the loop detection check past other checks.
These checks should not trigger in cases where a loop is initially detected.
Fixes#1305
Signed-off-by: Matthias Hanel <mh@synadia.com>
Fixed#1296, by altering client state on reload
Detect a trace level change on reload and update all clients.
To avoid data races, read client.trace while holding the lock,
pass the value into functionis that trace while not holding the lock.
Delete unused client.debug.
Signed-off-by: Matthias Hanel <mh@synadia.com>
First, the test should be done only for the initial INFO and only
for solicited connections. Based on the content of INFO coming
from different "listen ports", use the CID and LeafNodeURLs for
the indication that we are connected to the proper port.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
For issue #1256, we cleared the possibly saved tlsName on Hanshake failure.
However, this meant that for normal use cases, if a reconnect failed for
any reason we would not be able to reconnect if it is an IP until we get
back to the URL that contained the hostname.
We now clear only if the handshake error is of x509.HostnameError type,
which include errors such as:
```
"x509: Common Name is not a valid hostname: <x>"
"x509: cannot validate certificate for <x> because it doesn't contain any IP SANs"
"x509: certificate is not valid for any names, but wanted to match <x>"
"x509: certificate is valid for <x>, not <y>"
```
Applied the same logic to solicited gateway connections, and fixed the fact
that the tlsConfig should be cloned (since we set the ServerName).
I have also made a change for leafnode connections similar to what we are
doing for gateway connections, which is to use the saved tlsName only if
tlsConfig.ServerName is empty, which may not be the case for users that
embed NATS Server and pass directly tls configuration. In other words,
if the option TLSConfig.ServerName is not empty, always use this value.
Relates to #1256
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- All writes will now be done by the writeLoop, unless when the
writeLoop has not been started yet (likely in connection init).
- Slow consumers for non CLIENT connections will be reported but
not failed. The idea is that routes, gateway, etc.. connections
should stay connected as much as possible. However if a flush
operation times out and no data at all has been written, the
connection will be closed (regardless of type).
- Slow consumers due to max pending is only for CLIENT connections.
This allows sending of SUBs through routes, etc.. to not have
to be chunked.
- The backpressure to CLIENT connections is increased (up to 1sec)
based on the sub's connection pending bytes level.
- Connection is flushed on close from the writeLoop as to not block
the "fast path".
Some tests have been fixed and adapted since now closeConnection()
is not flushing/closing/removing connection in place.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This is needed for mapped gateway replies. We had used an extra
token when implementing the new prefix, but it was then removed,
but the leafnode subscription on _GR_.*.*.*.> was not updated.
We now subscribe on _GR_.>
There was a test that was passing because we were using inboxes
that caused the pattern to match. Replaced with single token
reply so that it would have caught this bug.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- New prefix that includes origin server for the request
- Mapping done if request is service import or requestor has
recent subscription
- Subscription considered recent if less than 250ms
- Destination server strip GW prefix before giving to client
and restore when getting a reply on that subject
- Mapping removed aftert 250ms
- Server rejects client publish on "$GNR." (the new prefix)
- Cluster and server hash are now 8 chars long and from base 62
alphabets
- Mapped replies need to be sent to leafnode servers due to race
(cluster B sends RS+ on GW inbound then RMSG on outbound, the
RS+ may be processed later and cluster A may have given message
to LN before RS+ on reply subject. So LN needs to accept the
mapped reply but will strip to give to client and reassemble
before sending it back)
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- Make "lds." a constant
- Create remote's get/reset functions for loop delay
- Bump loop delay to 30 seconds
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This is achieved by subscribing to a unique subject. If the LS+
protocol is coming back for the same subject on the same account,
then this indicates a loop.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Added a way to specify which account an accepted leafnode connection
should be bound to when using simple auth (user/password).
Singleton:
```
leafnodes {
port: ...
authorization {
user: leaf
password: secret
account: TheAccount
}
}
```
With above configuration, if a soliciting server creates a LN connection
with url: `nats://leaf:secret@host:port`, then the accepting server
will bind the leafnode connection to the account "TheAccount". This account
need to exist otherwise the connection will be rejected.
Multi:
```
leafnodes {
port: ...
authorization {
users = [
{user: leaf1, password: secret, account: account1}
{user: leaf2, password: secret, account: account2}
]
}
}
```
With the above, if a server connects using `leaf1:secret@host:port`, then
the accepting server will bind the connection to account `account1`.
If user/password (either singleton or multi) is defined, then the connecting
server MUST provide the proper credentials otherwise the connection will
be rejected.
If no user/password info is provided, it is still possible to provide the
account the connection should be associated with:
```
leafnodes {
port: ...
authorization {
account: TheAccount
}
}
```
With the above, a connection without credentials will be bound to the
account "TheAccount".
If credentials are used (jwt, nkey or other), then the server will attempt
to authenticate and if successful associate to the account for that specific
user. If the user authentication fails (wrong password, no such user, etc..)
the connection will be also rejected.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This is the first pass at introducing exported services to the system account for generally debugging of blackbox systems.
The first service reports number of subscribers for a given subject. The payload of the request is the subject, and optional queue group, and can contain wildcards.
Signed-off-by: Derek Collison <derek@nats.io>
Defaults to 1sec but will be opts.PingInterval if value is lower.
All non client connections invoked this function for the first
PING.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When a leaf or route connection is created, set the first ping
timer to fire at 1sec, which will allow to compute the RTT
reasonably soon (since the PingInterval could be user configured
and set much higher).
For Route in PR #1101, I was sending the PING on receiving the
INFO which required changing bunch of tests. Changing that to
also use the first timer interval of 1sec and reverted changes
to route tests.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- On startup, verify that local account in leafnode (if specified
can be found otherwise fail startup).
- At runtime, print error and continue trying to reconnect.
Will need to decide a better approach.
- When using basic auth (user/password), it was possible for a
solicited Leafnode connection to not use user/password when
trying an URL that was discovered through gossip. The server
now saves the credentials of a configured URL to use with
the discovered ones.
Updated RouteRTT test in case RTT does not seem to be updated
because getting always the same value.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Take into account tracking of response maps that are created and do proper cleanup.
Also fixes#1089 which was discovered while working on this.
Signed-off-by: Derek Collison <derek@nats.io>
When a cluster of servers are having routes to each other, there
is a chance that the list of leafnode URLs maintained on each
server is not complete. This would result in LN servers connecting
to this cluster to not get the full list of possible URLs the
server could reconnect to.
Also fixed a DATA RACE that appeared when running the updated
TestLeafNodeInfoURLs test. Fixed the race and added specific
test that easily demonstrated the race: TestLeafNodeNoRaceGeneratingNonce
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Also fix for #1071 in that we need to check remote gateways TLS
config even if main gateway section is not configured with TLS.
Related to #1071
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When a solicited leafnode comes from multiple servers that themselves are a cluster, cycles were formed.
This change allows solicited leafnodes to behave similar to gateways in that each server of a cluster
is expected to have a solicted leafnode per destination account and cluster.
We no longer forward subscription interest or messages to a cluster from a server that has a solicited leafnode.
Signed-off-by: Derek Collison <derek@nats.io>
One could craft a PUB protocol to cause server to panic. This can
happen if the size in the PUB protocol overruns an int32.
(note that if authorization is enabled, the user would need to
authenticate first, limiting the impact).
Thank you to Aviv Sasson and Ariel Zelivansky from Twistlock
for the security report!
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>