Commit Graph

49 Commits

Author SHA1 Message Date
Ivan Kozlovic
34eb5bda31 [ADDED] Deny import/export options for LeafNode remote configuration
This will allow a leafnode remote connection to prevent unwanted
messages to be received, or prevent local messages to be sent
to the remote server.

Configuration will be something like:
```
leafnodes {
  remotes: [
    {
      url: "nats://localhost:6222"
      deny_imports: ["foo.*", "bar"]
      deny_exports: ["baz.*", "bat"]
    }
  ]
}
```

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-04-09 18:55:44 -06:00
Derek Collison
699502de8f Detection for loops with leafnodes.
We need to send the unique LDS subject to all leafnodes to properly detect setups like triangles.
This will have the server who completes the loop be the one that detects the error soley based on
its own loop detection subject.

Otehr changes are just to fix tests that were not waiting for the new LDS sub.

Signed-off-by: Derek Collison <derek@nats.io>
2020-04-08 20:00:40 -07:00
Matthias Hanel
6f77a54118 [FIXED] loop detection by checking for duplicate lds subscriptions
This is in addition to checking if the own subscription comes back.
The duplicated lds subscription must come from a different client.
Added unit tests.
Also prefixed lds with '$' to mark it as system subject going forward.

This moves the loop detection check past other checks.
These checks should not trigger in cases where a loop is initially detected.

Fixes #1305

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-03-17 19:06:35 -04:00
Ivan Kozlovic
156bf7b381 Updates based on code review
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-02-19 16:52:41 -07:00
Ivan Kozlovic
8e4b449119 Fixed flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-02-19 13:19:08 -07:00
Ivan Kozlovic
947798231b [UPDATED] TCP Write and SlowConsumer handling
- All writes will now be done by the writeLoop, unless when the
  writeLoop has not been started yet (likely in connection init).
- Slow consumers for non CLIENT connections will be reported but
  not failed. The idea is that routes, gateway, etc.. connections
  should stay connected as much as possible. However if a flush
  operation times out and no data at all has been written, the
  connection will be closed (regardless of type).
- Slow consumers due to max pending is only for CLIENT connections.
  This allows sending of SUBs through routes, etc.. to not have
  to be chunked.
- The backpressure to CLIENT connections is increased (up to 1sec)
  based on the sub's connection pending bytes level.
- Connection is flushed on close from the writeLoop as to not block
  the "fast path".

Some tests have been fixed and adapted since now closeConnection()
is not flushing/closing/removing connection in place.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-12-31 15:06:27 -07:00
Ivan Kozlovic
1b2754475b Refactor async client tests
Updated all tests that use "async" clients.
- start the writeLoop (this is in preparation for changes in the
  server that will not do send-in-place for some protocols, such
  as PING, etc..)
- Added missing defers in several tests
- fixed an issue in client.go where test was wrong possibly causing
  a panic.
- Had to skip a test for now since it would fail without server code
  change.

The next step will be ensure that all protocols are sent through
the writeLoop and that the data is properly flushed on close (important
for -ERR for instance).

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-12-12 11:58:24 -07:00
Ivan Kozlovic
b78ca2f63b Fixes for system events
- Call flushOutbound() for SYSTEM connections
- Flush in place in internalSendLoop when sending the shutdown event
- Fix some tests:
  - missing defer client connection Close()
  - ensure subs are registered and messages received before shutdown
    of leafnode server to check disconnected event's stats.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-12-04 20:55:55 -07:00
Ivan Kozlovic
63138509f7 Tune some code/test for Windows
Running test suite on a Windows VM, I notice several failures.
Updated the compute of the RTT to be at least 1ns. I think that
this is just an issue with the VM I am running, but that change
will have no impact for normal situations (since setting the rtt
to the very minimum duration (1ns) instead of 0) and will prevent
some tests from failing.

Because of those same timer granularity issues, I had to add some
delays between some actions in order for time.Sub()/Since() to
actually report something more than 0.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-21 14:32:46 -07:00
Derek Collison
07ab23af0d Need return when acc not found
Signed-off-by: Derek Collison <derek@nats.io>
2019-11-17 09:24:34 -08:00
Ivan Kozlovic
3e1728d623 [FIXED] Some accounts locking issues
- Risk of deadlock when checking if issuer claim are trusted. There
  was a RLock() in one thread, then a request for Lock() in another
  that was waiting for RLock() to return, but the first thread was
  then doing RLock() which was not acquired because this was blocked
  by the Lock() request (see e2160cc571)

- Use proper account/locking mode when checking if stream/service
  exports/signer have changed.

- Account registration race (regression from https://github.com/nats-io/nats-server/pull/890)

- Move test from #890 to "no race" test since only then could it detect
  the double registration.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-16 16:59:38 -07:00
Ivan Kozlovic
12eb1f5b00 [ADDED] Server name in the RouteStat for statsz
Add the remote server name for a route in the statsz event

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-10-25 16:34:07 -06:00
Ivan Kozlovic
150d47cab3 [FIXED] Locking issue around account lookup/updates
Ensure that lookupAccount does not hold server lock during
updateAccount and fetchAccount.
Updating the account cannot have the server lock because it is
possible that during updateAccountClaims(), clients are being
removed, which would try to get the server lock (deep down in
closeConnection/s.removeClient).
Added a test that would have show the deadlock prior to changes
in this PR.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-09-17 18:48:23 -06:00
Derek Collison
52430c304a System level services for debugging.
This is the first pass at introducing exported services to the system account for generally debugging of blackbox systems.
The first service reports number of subscribers for a given subject. The payload of the request is the subject, and optional queue group, and can contain wildcards.

Signed-off-by: Derek Collison <derek@nats.io>
2019-09-17 09:37:35 -07:00
Derek Collison
7989118c3f First pass latency tracking for exported services
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-30 10:52:48 -07:00
Derek Collison
b3f6997bc0 Make sure to flush
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-11 17:37:07 -07:00
Derek Collison
d027ff7efd Add leafnode usage test
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-11 17:30:01 -07:00
Ivan Kozlovic
7f2620904c Fixed setting timer for account connection updates
The timer was not set with the proper variable, which caused the
check to always think that a new timer should be created, which
would lead to more and more timers being created which translated
to updates being sent more and more frequently.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-29 14:28:26 -06:00
Ivan Kozlovic
ce1e6defab Fix flappers
- TestSystemAccountConnectionUpdatesStopAfterNoLocal: I believe that
  the check on number of notifications was wrong. Since we did not
  consume the ones for the connect, the expected count after the
  disconnect is 8 instead of 4.

- Possible fix GW tests complaining about number of outbound/inbound
  I think that it may be possible that connection does not succeed
  right away (remote to fully started, etc) and due to dial timeout
  and reconnect attempt delay, I suspect that when given a max time
  of 1sec to complete, it may not be enough.
  Quick change for now is to override to 2secs for now in the
  wait helpers. If that proves conclusive, we could remove the
  timeout given to these helpers.

- TestGatewaySendAllSubsBadProtocol: used a t.Fatalf() in checkFor
  instead of return fmt.Errorf().

- TestLeafNodeResetsMSGProto: this test is not about change to
  interest mode only, so to avoid possible mix of protos, delay
  a bit creation of gateway after creation of leaf node.

- Some defer s.Shutdown() were missing

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-26 17:17:08 -06:00
Ivan Kozlovic
48c3f7f846 Fixed some flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-24 09:53:35 -06:00
Derek Collison
d7140a0fd1 Update for client rename
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-10 15:11:30 -07:00
Derek Collison
bacb73a403 First pass at leaf nodes. Basic functionality working, including gateways.
What is not completed:
1. TLS
2. config to bind local account.
3. Info updates for solicitor to track topology changes like a client.
4. CONNECT sent after INFO for nonce authroization.
5. Authorization
6. Services and Streams tests.
7. config file parsing.

Signed-off-by: Derek Collison <derek@nats.io>
2019-03-25 08:54:47 -07:00
Derek Collison
c5510d616e Remove delay for global statsz, bump RC version
Signed-off-by: Derek Collison <derek@nats.io>
2019-02-12 15:17:29 -08:00
Ivan Kozlovic
2e9fe694d6 Fixed possible race when looking/registering an account
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-01-31 09:25:40 -07:00
Derek Collison
cc5873cd72 Added start time to Statsz from server.
Added in more debug for imports processing.
Changed subs reporting for Statsz.

Signed-off-by: Derek Collison <derek@nats.io>
2018-12-19 13:19:00 -08:00
Derek Collison
2ab23ca307 Make public for tooling
Signed-off-by: Derek Collison <derek@nats.io>
2018-12-08 18:33:23 -08:00
Derek Collison
a92ef0252c Should not send disconnect events on account $G.
Converted to authorization error events on different subject.
Add cluster name if gateways are configured and pass in INFO to clients.

Signed-off-by: Derek Collison <derek@nats.io>
2018-12-08 16:07:02 -08:00
Derek Collison
c83d7f8851 Support server ping for statusz
Signed-off-by: Derek Collison <derek@nats.io>
2018-12-07 08:42:01 -08:00
Derek Collison
4bea6e0002 Test conditional fix
Signed-off-by: Derek Collison <derek@nats.io>
2018-12-06 15:32:29 -08:00
Derek Collison
bdb42fab54 Don't erase nonce on reload
Signed-off-by: Derek Collison <derek@nats.io>
2018-12-06 15:09:14 -08:00
Derek Collison
7b0f2426fa Internal clients aren't weighed against limits
Signed-off-by: Derek Collison <derek@nats.io>
2018-12-06 14:23:59 -08:00
Derek Collison
18bca5603f Added server version and cluster name to statsz.
Fixed account connection accounting sending after local connections is 0.

Signed-off-by: Derek Collison <derek@nats.io>
2018-12-06 10:57:39 -08:00
Derek Collison
c0932df182 test update
Signed-off-by: Derek Collison <derek@nats.io>
2018-12-06 09:02:47 -08:00
Derek Collison
b9aa2a3da4 Enforce account limits on system account too
Signed-off-by: Derek Collison <derek@nats.io>
2018-12-06 08:37:22 -08:00
Derek Collison
2d54fc3ee7 Account lookup failures, account and client limits, options reload.
Changed account lookup and validation failures to be more understandable by users.
Changed limits to be -1 for unlimited to match jwt pkg.

The limits changed exposed problems with options holding real objects causing issues with reload tests under race mode.
Longer term this code should be reworked such that options only hold config data, not real structs, etc.

Signed-off-by: Derek Collison <derek@nats.io>
2018-12-05 14:25:40 -08:00
Derek Collison
53c70e6ce1 Use atomic.Load
Signed-off-by: Derek Collison <derek@nats.io>
2018-12-04 09:09:27 -08:00
Derek Collison
f9912700c8 Rebase from master
Signed-off-by: Derek Collison <derek@nats.io>
2018-12-04 08:48:40 -08:00
Derek Collison
760507222a Added statsz support
Signed-off-by: Derek Collison <derek@nats.io>
2018-12-04 08:23:33 -08:00
Ivan Kozlovic
2618d39a36 Allow system messages to cross gateways.
Removed the code getting matching subscriptions and trying
to exclude non internal interest since as soon as there is
routing and/or gateway, it is likely that server would end-up
generating the payload and sending. May need to revisit.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-12-03 20:59:32 -07:00
Derek Collison
b2d8421e21 Make tests a bit more stable
Signed-off-by: Derek Collison <derek@nats.io>
2018-12-03 15:45:06 -08:00
Derek Collison
a2ec546850 Remove newest only
Signed-off-by: Derek Collison <derek@nats.io>
2018-12-03 06:32:32 -08:00
Derek Collison
4b6982dbcd Add timeout test for URL account resolver
Signed-off-by: Derek Collison <derek@nats.io>
2018-12-03 05:40:34 -08:00
Derek Collison
f4f3d3baf1 Updates for operator based configurations.
Added update to parse and load operator JWTs.
Changed to add in signing keys from operator JWT to list of trusted keys.
Added URL account resolver.
Added account claim updates by system messages.

Signed-off-by: Derek Collison <derek@nats.io>
2018-12-02 20:34:33 -08:00
Derek Collison
222814609d Add in server shutdowns
Signed-off-by: Derek Collison <derek@nats.io>
2018-12-01 17:30:22 -08:00
Derek Collison
4b1e5358bc Don't hold server lock when placing outbound items on sendq
Needed to change some things around but think this is close.

Signed-off-by: Derek Collison <derek@nats.io>
2018-12-01 16:48:57 -08:00
Derek Collison
744795ead5 Allow servers to send system events.
Specifically this is to support distributed tracking of number of account connections across clusters.
Gateways may not work yet based on attempts to only generate payloads when we know there is outside interest.

Signed-off-by: Derek Collison <derek@nats.io>
2018-12-01 13:54:25 -08:00
Derek Collison
10eb491898 Better serializing of server info
Signed-off-by: Derek Collison <derek@nats.io>
2018-11-29 15:22:04 -08:00
Derek Collison
8d05ed82ff Make sure sub is processed
Signed-off-by: Derek Collison <derek@nats.io>
2018-11-29 12:29:50 -08:00
Derek Collison
574fd62e01 Allow servers to send and receive messages directly
Signed-off-by: Derek Collison <derek@nats.io>
2018-11-29 12:15:08 -08:00