Commit Graph

152 Commits

Author SHA1 Message Date
Ivan Kozlovic
5fc9e0e1cc [FIXED] Gateway URLs gossip and /varz report issues
- When detecting duplicate route, it was possible that a server
would lose track of the peer's gateway URL, which would prevent
it from gossiping that URL to inbound gateway connections
- When a server has gateways enabled and has as a remote its
own gateway, the monitoring endpoint `/varz` would include it
but without the "urls" array.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-10-28 12:05:30 -06:00
Ivan Kozlovic
0bd38bd424 [FIXED] Monitoring: /varz gateway URLs not always updated
When servers leave a cluster and their gateway URLs was not in
the remote cluster's configuration, it is possible that their
gateway URL do not disappear from the list of URLs in the `/varz`
monitoring endpoint.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-10-26 13:11:06 -06:00
Matthias Hanel
29a6367889 incorporating review comments.
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-09-21 17:13:53 -04:00
Matthias Hanel
9911b37b0c [added] value to JS stats showing memory used from accounts with reservations
[fixed] reservations accounting issue on reload introduced by:
commit: bfb726e8e9
clearResources appeared to have been a workaround and broke
reload for non global accounts

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-09-21 16:35:24 -04:00
Matthias Hanel
5b9d20871d Exposing reserved memory in jsz/varz
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-09-16 18:55:50 -04:00
Ivan Kozlovic
a025ce7472 Set defaultServerOptions port to -1 for random
Updated some tests based on this change but also missing defer
connection close or server shutdown.

Fixed how the OCSP run go routine would shutdown, which would
never complete because grWG was not decremented by this go routine
prior to invoking s.Shutdown()

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-09-02 14:22:56 -06:00
Derek Collison
944dd248c4 Fix for tests
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-14 17:39:51 -07:00
Derek Collison
b3f9166b4f [FIXED] Getting varz from the http endoint saw Subscriptions always double for each fetch.
Resolves part of #2170

Signed-off-by: Derek Collison <derek@nats.io>
2021-05-03 18:43:07 -07:00
Matthias Hanel
4430a55eed [added] leaf deny exports/imports to varz monitoring (#2159)
* [added] leaf deny exports/imports to varz monitoring

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-04-26 16:34:09 -04:00
Jaime Piña
27e9628c3a Run gofmt -s to simplify code 2021-04-09 15:18:06 -07:00
Jaime Piña
d929ee1348 Check errors when removing test directories and files
Currently in tests, we have calls to os.Remove and os.RemoveAll where we
don't check the returned error. This hides useful error messages when
tests fail to run, such as "too many open files".

This change checks for more filesystem related errors and calls t.Fatal
if there is an error.
2021-04-07 11:09:47 -07:00
Derek Collison
43b9017b74 Merge pull request #1953 from nats-io/api
JetStream API Changes
2021-03-02 19:46:00 -07:00
Matthias Hanel
c50ee2a1c6 [Changed] all times exposed will be computed in UTC (#1943)
This also applies to times that end up in that json.
Where applicable moved time.Now() to where it is used.
Moved calls to .UTC() to where time is created it that time is converted
later anyway.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-03-02 21:37:42 -05:00
Derek Collison
4f7fbefc7c In clustered JetStream we need to move API calls out of routes/gateways/leafnodes path.
This moves from explicit imports and subscriptions to one wildcard subscription and a single wildcard export.

Signed-off-by: Derek Collison <derek@nats.io>
2021-03-02 17:54:41 -08:00
Matthias Hanel
10154c5388 [added] system_account to varz/accounts and is_system to accountz (#1898)
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-02-08 15:58:53 -05:00
Derek Collison
c11a733502 Broken test for non MarshalIndent
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-07 05:08:22 -08:00
Matthias Hanel
7b7543d298 [added] jsz nats and http monitoring endpoint for jetstream (#1881)
The new endpoints are /jsz on http and "$SYS.REQ.SERVER.PING.JSZ" and "$SYS.REQ.SERVER.%s.JSZ".
$SYS.REQ.ACCOUNT.%s.JSZ will only return info for the particular account

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-02-05 18:46:04 -05:00
Matthias Hanel
f487429d9e incorporated comments
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-01-29 13:25:02 -05:00
Matthias Hanel
2a34f0daee [added] field to varz output containing the operator jwt/claim
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-01-29 12:32:40 -05:00
Matthias Hanel
c9e0eb6c3a [added] cluster/gateway/leafnode tls required/verify/timeout to varz (#1854)
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-01-28 14:08:58 -05:00
Matthias Hanel
9081646109 [added] support for tags and filter ping monitoring requests by tags (#1832)
fixes #1588

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-01-21 21:16:09 -05:00
Ivan Kozlovic
f50c655e75 [FIXED] Monitoring endpoint connz?auth=true show incorrect user
Only the user (from username/password connection method) was reported
in this monitoring endpoint. Will now report proper nkey, public key,
etc..

Resolves #1799

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-11 12:59:05 -07:00
Ivan Kozlovic
df9d5f5fd9 Accepting route warns if remote server has same name
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-10-08 17:59:33 -06:00
Matthias Hanel
4ff7b280f4 Avoid unnecessary CONNS subscription
Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-10-05 18:25:51 -04:00
Matthias Hanel
d501a811b8 [Added] filtering by account to leafz and exposing this as per acc subj
On the monitoring endpoint /leafz specify ?acc=<account id>

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-09-24 17:23:36 -04:00
Matthias Hanel
634ce9f7c8 [Adding] Accountz monitoring endpoint and INFO monitoring req subject
Returned imports/exports are formated like jwt exports imports, even if
they originating account is from config.

Fixes #1604

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-09-23 16:22:48 -04:00
Matthias Hanel
d086a39b64 Add filtering by name and cluster to PING events
On cluster name change, reset internalSendLoop so it picks up the
changed name.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-06-16 18:26:35 -04:00
Derek Collison
146d8f5dcb Updates based on feedback, sped up some slow tests
Signed-off-by: Derek Collison <derek@nats.io>
2020-06-12 17:26:43 -07:00
Derek Collison
c1ffd48638 Add account details to subsz.
Also allow ability to filter based on account.

Signed-off-by: Derek Collison <derek@nats.io>
2020-06-05 05:53:01 -07:00
aricart
e7590f3065 jwt2 testbed 2020-06-01 18:00:13 -04:00
Derek Collison
05e38ae527 Merge branch 'master' into sys-acc 2020-06-01 11:53:14 -07:00
Derek Collison
2bd7553c71 System Account on by default.
Most of the changes are to turn it off for tests that were watching subscriptions and such.

Signed-off-by: Derek Collison <derek@nats.io>
2020-05-29 17:56:45 -07:00
Ivan Kozlovic
44e78a1fb6 Fixed some tests
- A race test may have consumed a lot of fds going in TIME_WAIT
that could cause some issues for other tests
- Missing defer filestore.Stop() that would leave flushLoop()
routines
- A defer for the from server in a LeafNode test
- Rework [Re]ConnectErrorReports that was failing often for me
locally (probably due to exhaustion of fds - too many TIME_WAIT).

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-05-29 17:47:08 -06:00
Guilherme Santos
25858cba0b Implement basePath for monitoring endpoints 2020-05-13 23:29:11 +02:00
Matthias Hanel
136feb9bc6 [FIXEd] subsz monitoring endpoint did not account for accounts.
Fixes  #1371 and #1357 by adding up stats and collecting subscriptions
from all accounts.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-05-06 15:48:51 -04:00
Derek Collison
699502de8f Detection for loops with leafnodes.
We need to send the unique LDS subject to all leafnodes to properly detect setups like triangles.
This will have the server who completes the loop be the one that detects the error soley based on
its own loop detection subject.

Otehr changes are just to fix tests that were not waiting for the new LDS sub.

Signed-off-by: Derek Collison <derek@nats.io>
2020-04-08 20:00:40 -07:00
Matthias Hanel
30ba333663 Adding an option to include subscription details in monitoring responses.
Applies to routez and connz and closed connections.
Enable by specifying subs=detail

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-03-23 12:25:51 -04:00
Ivan Kozlovic
1b2754475b Refactor async client tests
Updated all tests that use "async" clients.
- start the writeLoop (this is in preparation for changes in the
  server that will not do send-in-place for some protocols, such
  as PING, etc..)
- Added missing defers in several tests
- fixed an issue in client.go where test was wrong possibly causing
  a panic.
- Had to skip a test for now since it would fail without server code
  change.

The next step will be ensure that all protocols are sent through
the writeLoop and that the data is properly flushed on close (important
for -ERR for instance).

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-12-12 11:58:24 -07:00
Ivan Kozlovic
63138509f7 Tune some code/test for Windows
Running test suite on a Windows VM, I notice several failures.
Updated the compute of the RTT to be at least 1ns. I think that
this is just an issue with the VM I am running, but that change
will have no impact for normal situations (since setting the rtt
to the very minimum duration (1ns) instead of 0) and will prevent
some tests from failing.

Because of those same timer granularity issues, I had to add some
delays between some actions in order for time.Sub()/Since() to
actually report something more than 0.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-21 14:32:46 -07:00
Derek Collison
f0f807f99a After speaking with Ivan we are taking a better approach for initial RTT.
Ivan had the idea of using the CONNECT to establish a first estimate of RTT
without additional PING/PONGs.

Signed-off-by: Derek Collison <derek@nats.io>
2019-10-31 14:01:55 -07:00
Ivan Kozlovic
279cab2aaf [FIXED] Detect loop between LeafNode servers
This is achieved by subscribing to a unique subject. If the LS+
protocol is coming back for the same subject on the same account,
then this indicates a loop.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-10-29 16:14:35 -06:00
R.I.Pienaar
bcf96fa1de Allows a descriptive server_name to be set
This adds a new config option server_name that
when set will be exposed in varz, events and more
as a descriptive name for the server.

If unset though the server_name will default to the pk

Signed-off-by: R.I.Pienaar <rip@devco.net>
2019-10-17 18:51:19 +02:00
Ivan Kozlovic
cd4b8d3fad [ADDED] /leafz endpoint
Resolves #1061

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-26 12:00:24 -06:00
Ivan Kozlovic
cd9f898eb0 Made a server's helper to set first ping timer
Defaults to 1sec but will be opts.PingInterval if value is lower.
All non client connections invoked this function for the first
PING.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-26 10:21:43 -06:00
Ivan Kozlovic
90d592e163 Leaf and Route RTT
When a leaf or route connection is created, set the first ping
timer to fire at 1sec, which will allow to compute the RTT
reasonably soon (since the PingInterval could be user configured
and set much higher).

For Route in PR #1101, I was sending the PING on receiving the
INFO which required changing bunch of tests. Changing that to
also use the first timer interval of 1sec and reverted changes
to route tests.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-26 09:34:17 -06:00
Ivan Kozlovic
7ca8723942 [FIXED] Some Leafnode issues
- On startup, verify that local account in leafnode (if specified
  can be found otherwise fail startup).
- At runtime, print error and continue trying to reconnect.
  Will need to decide a better approach.
- When using basic auth (user/password), it was possible for a
  solicited Leafnode connection to not use user/password when
  trying an URL that was discovered through gossip. The server
  now saves the credentials of a configured URL to use with
  the discovered ones.

Updated RouteRTT test in case RTT does not seem to be updated
because getting always the same value.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-23 14:08:07 -06:00
Ivan Kozlovic
89dd13f134 [ADDED] RTT in routez's route info
Added the RTT field to each route reported in routez.
Ensure that when a route is accepted, we send a PING to compute
the first RTT and don't have to wait for the ping timer to fire.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-20 14:16:07 -06:00
Derek Collison
10d4f1ab7a Convert leafnode solicited remotes to array
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-10 11:53:34 -07:00
Ivan Kozlovic
5478eaf01e Added /gatewayz endpoint
Such endpoint will list the gateway/cluster name, address and port
then list of outbound/inbound connections.
For each remote gateway there will be at most one outbound connection.
There can be 0 or more inbound connections for the same remote
gateway.

For each of these outbound/inbound connection, the connection info
similar to Connz is reported. Optionally, one can include the
interest mode/stats for each account.

Here are possible options:

* No specific options

http://host:port/gatewayz

* Limit to specific remote gateway, say name "B":

http://host:port/gatewayz/gw_name=B

* Include accounts (default limit to 1024 accounts)

http://host:port/gatewayz/accs=1

* Specific limit, say 200 (note accs=1 in this case is optional)

http://host:port/gatewayz/accs=1&accs_limit=200

* Specific account, say "acc_1". Note that accs=1 is not required then

http://host:port/gatewayz/acc_name=acc_1

* Above options can be mixed: specific remote gateway (B), with 100
  accounts reported

http://host:port/gatewayz/gw_name=B&accs_limit=200

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-28 12:41:09 -06:00
Ivan Kozlovic
ce1e6defab Fix flappers
- TestSystemAccountConnectionUpdatesStopAfterNoLocal: I believe that
  the check on number of notifications was wrong. Since we did not
  consume the ones for the connect, the expected count after the
  disconnect is 8 instead of 4.

- Possible fix GW tests complaining about number of outbound/inbound
  I think that it may be possible that connection does not succeed
  right away (remote to fully started, etc) and due to dial timeout
  and reconnect attempt delay, I suspect that when given a max time
  of 1sec to complete, it may not be enough.
  Quick change for now is to override to 2secs for now in the
  wait helpers. If that proves conclusive, we could remove the
  timeout given to these helpers.

- TestGatewaySendAllSubsBadProtocol: used a t.Fatalf() in checkFor
  instead of return fmt.Errorf().

- TestLeafNodeResetsMSGProto: this test is not about change to
  interest mode only, so to avoid possible mix of protos, delay
  a bit creation of gateway after creation of leaf node.

- Some defer s.Shutdown() were missing

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-26 17:17:08 -06:00