Commit Graph

53 Commits

Author SHA1 Message Date
Derek Collison
aff10aa16b Fix for #1344
Signed-off-by: Derek Collison <derek@nats.io>
2020-04-14 09:26:35 -07:00
Derek Collison
ef85a1b836 Fix for #1336
Signed-off-by: Derek Collison <derek@nats.io>
2020-04-10 17:30:03 -07:00
Matthias Hanel
e8ce738808 Test of service across accounts and leaf node. Tests #1336
Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-04-10 15:55:10 -04:00
Derek Collison
f9d9ac193a Use prefix to make sure we use right subject
Signed-off-by: Derek Collison <derek@nats.io>
2020-04-10 10:49:05 -07:00
Derek Collison
090abc939d Fix for stream imports and leafnodes, #1332
Signed-off-by: Derek Collison <derek@nats.io>
2020-04-10 10:36:20 -07:00
Derek Collison
e843a27bba When a responder was on a leaf node and the requestor was connected to the same server as the leafnode we did not propagate the service reply wildcard properly. This fixes that.
Signed-off-by: Derek Collison <derek@nats.io>
2020-04-10 08:35:09 -07:00
Derek Collison
699502de8f Detection for loops with leafnodes.
We need to send the unique LDS subject to all leafnodes to properly detect setups like triangles.
This will have the server who completes the loop be the one that detects the error soley based on
its own loop detection subject.

Otehr changes are just to fix tests that were not waiting for the new LDS sub.

Signed-off-by: Derek Collison <derek@nats.io>
2020-04-08 20:00:40 -07:00
Derek Collison
82f585d83a Updated to also resend leafnode connect on GW connect via first INFO
Signed-off-by: Derek Collison <derek@nats.io>
2020-04-08 09:55:19 -07:00
Derek Collison
43fbe0ffed This commit allows new servers ina supercluster to be informed of accounts with active leafnode connections.
This is needed to put those accounts into interest only mode for inbound gateway connections. Also added code
to make sure we were doing proper account tracking and would track the global account as well, which used to
be excluded.

Fixes #977

Signed-off-by: Derek Collison <derek@nats.io>
2020-04-07 16:22:15 -07:00
Matthias Hanel
6f77a54118 [FIXED] loop detection by checking for duplicate lds subscriptions
This is in addition to checking if the own subscription comes back.
The duplicated lds subscription must come from a different client.
Added unit tests.
Also prefixed lds with '$' to mark it as system subject going forward.

This moves the loop detection check past other checks.
These checks should not trigger in cases where a loop is initially detected.

Fixes #1305

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-03-17 19:06:35 -04:00
Matthias Hanel
68efc95a60 Modifying unit test error message to hint at ulimit -n possibly being too low
Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-03-04 14:30:35 -05:00
Ivan Kozlovic
47b08335a4 [FIXED] Reset of tlsName only for x509.HostnameError
For issue #1256, we cleared the possibly saved tlsName on Hanshake failure.
However, this meant that for normal use cases, if a reconnect failed for
any reason we would not be able to reconnect if it is an IP until we get
back to the URL that contained the hostname.

We now clear only if the handshake error is of x509.HostnameError type,
which include errors such as:
```
"x509: Common Name is not a valid hostname: <x>"
"x509: cannot validate certificate for <x> because it doesn't contain any IP SANs"
"x509: certificate is not valid for any names, but wanted to match <x>"
"x509: certificate is valid for <x>, not <y>"
```

Applied the same logic to solicited gateway connections, and fixed the fact
that the tlsConfig should be cloned (since we set the ServerName).

I have also made a change for leafnode connections similar to what we are
doing for gateway connections, which is to use the saved tlsName only if
tlsConfig.ServerName is empty, which may not be the case for users that
embed NATS Server and pass directly tls configuration. In other words,
if the option TLSConfig.ServerName is not empty, always use this value.

Relates to #1256

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-01-28 13:16:38 -07:00
Derek Collison
643e73c0c5 Fix for #1256, mixed IP and DNS for cluster and TLS with leafnodes
Signed-off-by: Derek Collison <derek@nats.io>
2020-01-22 11:25:09 -08:00
Ivan Kozlovic
947798231b [UPDATED] TCP Write and SlowConsumer handling
- All writes will now be done by the writeLoop, unless when the
  writeLoop has not been started yet (likely in connection init).
- Slow consumers for non CLIENT connections will be reported but
  not failed. The idea is that routes, gateway, etc.. connections
  should stay connected as much as possible. However if a flush
  operation times out and no data at all has been written, the
  connection will be closed (regardless of type).
- Slow consumers due to max pending is only for CLIENT connections.
  This allows sending of SUBs through routes, etc.. to not have
  to be chunked.
- The backpressure to CLIENT connections is increased (up to 1sec)
  based on the sub's connection pending bytes level.
- Connection is flushed on close from the writeLoop as to not block
  the "fast path".

Some tests have been fixed and adapted since now closeConnection()
is not flushing/closing/removing connection in place.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-12-31 15:06:27 -07:00
Ivan Kozlovic
1b2754475b Refactor async client tests
Updated all tests that use "async" clients.
- start the writeLoop (this is in preparation for changes in the
  server that will not do send-in-place for some protocols, such
  as PING, etc..)
- Added missing defers in several tests
- fixed an issue in client.go where test was wrong possibly causing
  a panic.
- Had to skip a test for now since it would fail without server code
  change.

The next step will be ensure that all protocols are sent through
the writeLoop and that the data is properly flushed on close (important
for -ERR for instance).

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-12-12 11:58:24 -07:00
Ivan Kozlovic
63138509f7 Tune some code/test for Windows
Running test suite on a Windows VM, I notice several failures.
Updated the compute of the RTT to be at least 1ns. I think that
this is just an issue with the VM I am running, but that change
will have no impact for normal situations (since setting the rtt
to the very minimum duration (1ns) instead of 0) and will prevent
some tests from failing.

Because of those same timer granularity issues, I had to add some
delays between some actions in order for time.Sub()/Since() to
actually report something more than 0.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-21 14:32:46 -07:00
Ivan Kozlovic
977c290bf2 [FIXED] Handling of split buffer for LEAF messages
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-18 11:55:18 -07:00
Derek Collison
07253c0517 Merge pull request #1196 from nats-io/daisy
Allow interest propagation with daisy-chained leafnodes
2019-11-17 17:46:23 -08:00
Derek Collison
07da68ce56 Allow interest propagation with daisy chained leafnodes
Signed-off-by: Derek Collison <derek@nats.io>
2019-11-17 17:35:20 -08:00
Ivan Kozlovic
e0bc81d0ed Make the Leafnode internal sub on _GR_.>
This is needed for mapped gateway replies. We had used an extra
token when implementing the new prefix, but it was then removed,
but the leafnode subscription on _GR_.*.*.*.> was not updated.
We now subscribe on _GR_.>
There was a test that was passing because we were using inboxes
that caused the pattern to match. Replaced with single token
reply so that it would have caught this bug.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-17 17:37:09 -07:00
Derek Collison
3330820502 Fixed a bug where we leaked service imports. Also prior this would have leaked subscriptions as well.
Signed-off-by: Derek Collison <derek@nats.io>
2019-11-14 13:29:17 -08:00
Ivan Kozlovic
aa843945c9 Work on Gateways reply mapping
- New prefix that includes origin server for the request
- Mapping done if request is service import or requestor has
  recent subscription
- Subscription considered recent if less than 250ms
- Destination server strip GW prefix before giving to client
  and restore when getting a reply on that subject
- Mapping removed aftert 250ms
- Server rejects client publish on "$GNR." (the new prefix)
- Cluster and server hash are now 8 chars long and from base 62
  alphabets
- Mapped replies need to be sent to leafnode servers due to race
  (cluster B sends RS+ on GW inbound then RMSG on outbound, the
  RS+ may be processed later and cluster A may have given message
  to LN before RS+ on reply subject. So LN needs to accept the
  mapped reply but will strip to give to client and reassemble
  before sending it back)

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-06 16:06:49 -07:00
Ivan Kozlovic
cbbc21ac25 Some update to leafnode subscription handling
- Send all subs in place if smap is small
- Skip sending update until after sendAllLeafSubs() is done

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-10-30 20:01:49 -06:00
Ivan Kozlovic
279cab2aaf [FIXED] Detect loop between LeafNode servers
This is achieved by subscribing to a unique subject. If the LS+
protocol is coming back for the same subject on the same account,
then this indicates a loop.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-10-29 16:14:35 -06:00
Derek Collison
52430c304a System level services for debugging.
This is the first pass at introducing exported services to the system account for generally debugging of blackbox systems.
The first service reports number of subscribers for a given subject. The payload of the request is the subject, and optional queue group, and can contain wildcards.

Signed-off-by: Derek Collison <derek@nats.io>
2019-09-17 09:37:35 -07:00
Derek Collison
7989118c3f First pass latency tracking for exported services
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-30 10:52:48 -07:00
Ivan Kozlovic
7ca8723942 [FIXED] Some Leafnode issues
- On startup, verify that local account in leafnode (if specified
  can be found otherwise fail startup).
- At runtime, print error and continue trying to reconnect.
  Will need to decide a better approach.
- When using basic auth (user/password), it was possible for a
  solicited Leafnode connection to not use user/password when
  trying an URL that was discovered through gossip. The server
  now saves the credentials of a configured URL to use with
  the discovered ones.

Updated RouteRTT test in case RTT does not seem to be updated
because getting always the same value.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-23 14:08:07 -06:00
Ivan Kozlovic
77c63dbce1 Fix flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-20 17:07:22 -06:00
Alberto Ricart
273e5af0a8 Fixed an issue where the leaf authentication was not checking for account/signers, so user JWTs signed by a signer failed authentication. 2019-07-17 16:03:55 -04:00
Ivan Kozlovic
0873b46f67 [FIXED] LeafNode urls may be missing in INFO sent to LN connections
When a cluster of servers are having routes to each other, there
is a chance that the list of leafnode URLs maintained on each
server is not complete. This would result in LN servers connecting
to this cluster to not get the full list of possible URLs the
server could reconnect to.

Also fixed a DATA RACE that appeared when running the updated
TestLeafNodeInfoURLs test. Fixed the race and added specific
test that easily demonstrated the race: TestLeafNodeNoRaceGeneratingNonce

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-07-12 19:15:30 -06:00
Derek Collison
a795920dc3 Report authorization error and use TLS hostname for IPs on leafnodes.
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-12 13:57:16 -07:00
Derek Collison
951ae49100 Prevent multiple solicited leafnodes from forming cycles.
When a solicited leafnode comes from multiple servers that themselves are a cluster, cycles were formed.
This change allows solicited leafnodes to behave similar to gateways in that each server of a cluster
is expected to have a solicted leafnode per destination account and cluster.

We no longer forward subscription interest or messages to a cluster from a server that has a solicited leafnode.

Signed-off-by: Derek Collison <derek@nats.io>
2019-07-10 20:16:47 -07:00
Derek Collison
10d4f1ab7a Convert leafnode solicited remotes to array
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-10 11:53:34 -07:00
Derek Collison
a61d32a82c Test for staggered leafnodes and sub/pub. Verifies fix for #1066
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-10 09:57:43 -07:00
Derek Collison
49707317a1 Make sure we route responses across leafnodes
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-08 16:20:40 -07:00
Derek Collison
100d0d2b02 Use default port for leafnode remote if not specified
Signed-off-by: Derek Collison <derek@nats.io>
2019-06-29 17:50:21 -07:00
Derek Collison
d1a782e014 Messages not distributed evenly when sourced from leafnode.
When messages came from a leafnode there were not being distributed evenly to the destination cluster.

Signed-off-by: Derek Collison <derek@nats.io>
2019-06-11 20:37:49 -07:00
Ivan Kozlovic
ed1901c792 Update go.mod to satisfy v2 requirements
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-06-03 19:45:47 -06:00
Derek Collison
2a8e630bf1 Fix for leafnode and dq selection over GWs
Signed-off-by: Derek Collison <derek@nats.io>
2019-06-01 16:43:54 -07:00
Derek Collison
adba6dc023 Add in leafnode bound account events for accounting
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-31 16:58:27 -07:00
Derek Collison
3cf6f6a5d2 Bug fix for service import with leafnodes and gws
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-31 11:22:02 -07:00
Ivan Kozlovic
ce1e6defab Fix flappers
- TestSystemAccountConnectionUpdatesStopAfterNoLocal: I believe that
  the check on number of notifications was wrong. Since we did not
  consume the ones for the connect, the expected count after the
  disconnect is 8 instead of 4.

- Possible fix GW tests complaining about number of outbound/inbound
  I think that it may be possible that connection does not succeed
  right away (remote to fully started, etc) and due to dial timeout
  and reconnect attempt delay, I suspect that when given a max time
  of 1sec to complete, it may not be enough.
  Quick change for now is to override to 2secs for now in the
  wait helpers. If that proves conclusive, we could remove the
  timeout given to these helpers.

- TestGatewaySendAllSubsBadProtocol: used a t.Fatalf() in checkFor
  instead of return fmt.Errorf().

- TestLeafNodeResetsMSGProto: this test is not about change to
  interest mode only, so to avoid possible mix of protos, delay
  a bit creation of gateway after creation of leaf node.

- Some defer s.Shutdown() were missing

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-26 17:17:08 -06:00
Derek Collison
d7140a0fd1 Update for client rename
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-10 15:11:30 -07:00
Derek Collison
acfe372d63 Changes for rename from gnatsd -> nats-server
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-06 15:04:24 -07:00
Derek Collison
bed56ab9cc Make sure remotes send existing sub interest
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-02 17:05:02 -07:00
Derek Collison
5292ec1598 Various fixes, init smap for leafnodes with gateways too
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-02 14:22:51 -07:00
Derek Collison
1d736ccc61 Make sure we use correct MSG prefix when mixing between leafnodes and routes.
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-01 15:08:20 -07:00
Ivan Kozlovic
dce9d672c1 Fixed panic with leafnode and gateway when no interest registered
Say there are 2 clusters, A and B. A client connects to A and
publishes messages on an account that B has no interest in.
Then a leaf node server connects to B (using same account than
the no-interest is for). Cluster B will ask cluster
A to switch to interest mode only for leaf node account. This
would cause a panic.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-01 13:40:17 -06:00
Derek Collison
2ec3eaeaa9 Leafnode account based connections limits
Signed-off-by: Derek Collison <derek@nats.io>
2019-04-25 14:40:59 -07:00
Derek Collison
bfef3bd5a6 Fix for service import processing across routes for leaf nodes
Signed-off-by: Derek Collison <derek@nats.io>
2019-04-17 14:37:09 -07:00