
245 Commits

Author SHA1 Message Date
Ivan Kozlovic
50c3986863 [FIXED] JetStream stream catchup issues
- A stream could become leader when it should not, causing
messages to be lost.
- A catchup could stall because the server sending data
could bail out of the runCatchup routine but still send
the EOF signal.
- Deadlock with monitoring of Jsz

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-12 16:05:12 -06:00
Ivan Kozlovic
9e6f965913 [ADDED] LeafNode min_version new option
If set, a server configured to accept leafnode connections will
reject a remote server whose version is below that value. Note
that servers prior to v2.8.0 do not send their version in the
CONNECT protocol, which means that, once this option is set, any
server older than v2.8.0 will be rejected.

Configuration example:
```
leafnodes {
    port: 7422
    min_version: 2.8.0
}
```
The option is a string and can have the "v" prefix:
```
min_version: "v2.9.1"
```
Note that although a suffix such as `-beta` is accepted, only the
major, minor, and update numbers are used for the version comparison.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-06 18:40:33 -06:00
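A minimal Go sketch of the comparison rule described in this commit. The helpers `parseVersion` and `versionAtLeast` are hypothetical names, not the nats-server implementation: the sketch strips an optional `v` prefix, ignores any `-suffix`, compares major, minor, and update numerically, and rejects a remote that sends no version, as pre-v2.8.0 servers do.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion (hypothetical helper) accepts an optional "v" prefix, drops any
// "-suffix" such as "-beta", and returns the major, minor and update numbers.
func parseVersion(s string) ([3]int, error) {
	var v [3]int
	s = strings.TrimPrefix(s, "v")
	if i := strings.IndexByte(s, '-'); i >= 0 {
		s = s[:i]
	}
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("invalid version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("invalid version %q: %w", s, err)
		}
		v[i] = n
	}
	return v, nil
}

// versionAtLeast (hypothetical helper) reports whether remote >= minVersion.
// An empty remote version, as sent by servers prior to v2.8.0, is rejected.
func versionAtLeast(remote, minVersion string) bool {
	if remote == "" {
		return false
	}
	rv, err := parseVersion(remote)
	if err != nil {
		return false
	}
	mv, err := parseVersion(minVersion)
	if err != nil {
		return false
	}
	for i := range rv {
		if rv[i] != mv[i] {
			return rv[i] > mv[i]
		}
	}
	return true // equal versions satisfy the minimum
}

func main() {
	fmt.Println(versionAtLeast("2.9.1-beta", "v2.9.1")) // true: the suffix is ignored
	fmt.Println(versionAtLeast("2.7.4", "2.8.0"))       // false
	fmt.Println(versionAtLeast("", "2.8.0"))            // false: no version sent
}
```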
Ivan Kozlovic
14f54b8dd7 [ADDED] Monitoring: MQTT and Websocket blocks in /varz endpoint
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-04 10:11:55 -06:00
Ivan Kozlovic
34650e9dd5 Fixed data race and some flappers
Data race that has been seen:
```
Read at 0x00c00134bec0 by goroutine 159:
  github.com/nats-io/nats-server/v2/server.(*client).msgHeaderForRouteOrLeaf()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:2935 +0x254
  github.com/nats-io/nats-server/v2/server.(*client).processMsgResults()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4364 +0x2147
(...)
Previous write at 0x00c00134bec0 by goroutine 201:
  github.com/nats-io/nats-server/v2/server.(*Server).addRoute()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:1475 +0xdb4
  github.com/nats-io/nats-server/v2/server.(*client).processRouteInfo()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:641 +0x1704
```

Also fixed some flappers and removed uses of `s.js` in Jsz monitoring,
since `js` has already been captured there.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-31 10:05:34 -06:00
R.I.Pienaar
4c4aa3e87f skips jsz on non js machines when leader only requested
This fixes a regression introduced in 055703f4fa
that caused panics in management tooling.

Signed-off-by: R.I.Pienaar <rip@devco.net>
2022-03-31 12:02:09 +02:00
Samuel Torres
9868bb71a7 Add logs to healthcheck handler
Kubernetes probes neither use nor log the response body of health
endpoints. This means that if, for some reason, a NATS node running in
Kubernetes enters a Not Ready state, we have no way to know why
other than to access the cluster, call the /healthz endpoint
manually, and look at the error.

This change adds an error log so we can observe what is going wrong with
a nats node that is not ready.

Signed-off-by: Samuel Torres <samuel.torres@form3.tech>
2022-03-30 14:14:22 +01:00
Matthias Hanel
0c5f3688a7 [ADDED] Tiered limits and fix limit issues on updates (#2945)
* Adding tiered limits and fix limit issues on updates

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-28 20:47:54 -04:00
R.I.Pienaar
055703f4fa ensures the cluster info in jsz is sent from the leader only
The data from other nodes is usually wrong, which can be quite
confusing for users, so we now only send it when we are the leader.

Signed-off-by: R.I.Pienaar <rip@devco.net>
2022-03-25 18:27:35 +01:00
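A hypothetical sketch combining this change with the later guard from 4c4aa3e87f: only the leader attaches cluster info, and a server without JetStream (a nil `js`) returns nothing for a leader-only request instead of being dereferenced. The types and `jszInfo` are simplified stand-ins, not the real Jsz structures.

```go
package main

import "fmt"

// Hypothetical, simplified types; not the nats-server structures.
type ClusterInfo struct {
	Name   string
	Leader string
}

type JSInfo struct {
	Meta *ClusterInfo // meta cluster info, only filled in by the leader
}

type jetStream struct {
	isLeader bool
	cluster  ClusterInfo
}

// jszInfo returns nil when this node should not answer a leader-only request,
// including servers where JetStream is not enabled (js == nil).
func jszInfo(js *jetStream, leaderOnly bool) *JSInfo {
	if js == nil {
		if leaderOnly {
			return nil // guard first: dereferencing js here would panic
		}
		return &JSInfo{}
	}
	if leaderOnly && !js.isLeader {
		return nil
	}
	info := &JSInfo{}
	if js.isLeader {
		ci := js.cluster // copy so callers cannot mutate internal state
		info.Meta = &ci
	}
	return info
}

func main() {
	fmt.Println(jszInfo(nil, true) == nil) // true: non-JetStream server is skipped
	leader := &jetStream{isLeader: true, cluster: ClusterInfo{Name: "C1", Leader: "S1"}}
	fmt.Println(jszInfo(leader, true).Meta.Name) // C1
}
```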
Ivan Kozlovic
a23b1b73ef Merge pull request #2931 from nats-io/ipq_changes
Changes to IPQueues
2022-03-17 19:13:02 -06:00
Ivan Kozlovic
c3da392832 Changes to IPQueues
Removed the warnings. Instead, queues are registered/unregistered in
a sync.Map and can be inspected through an undocumented monitoring
page.
Added the notion of "in progress", which is the number of messages
that have been pop()'ed. When recycle() is invoked, this count
goes down.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-17 17:53:06 -06:00
Derek Collison
fa098f1af0 Show version on main monitoring page with link to source
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-17 11:04:11 -07:00
Ivan Kozlovic
2c0f5046f1 Merge pull request #2923 from nats-io/gw_detect_duplicate_srv_name
[CHANGED] Gateway: Detect duplicate names between clusters
2022-03-17 10:57:08 -06:00
Derek Collison
287b567b1c Add consumer check to healthz and allow to be called directly
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-16 20:52:31 -07:00
Ivan Kozlovic
63c750e295 [CHANGED] Gateway: Detect duplicate names between clusters
Gateway connection will be closed and error reported if a remote
has a name that is a duplicate of the local cluster.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-15 15:00:13 -06:00
Matthias Hanel
d0c183106a Fixed lock inversion by not using account lock to get the name
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-07 21:22:41 -05:00
Derek Collison
037e3c6bbe Spiffied up monitoring landing page a bit
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-05 09:18:07 -08:00
Ivan Kozlovic
7f81f2d4c6 Merge pull request #2816 from nats-io/revocation-issue-442
Fix jwt based user/activation token revocation and granularity
2022-01-25 13:42:14 -07:00
Matthias Hanel
c5cc81bc1d Fix collectRevocations
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-01-25 15:02:47 -05:00
Matthias Hanel
274ec6db65 incorporate comments
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-01-25 14:53:03 -05:00
Matthias Hanel
fa12d096cd Fix jwt based user/activation token revocation and revocation granularity
User and activation tokens did not honor the JWT value for all (`*`) on
connect.

Activation tokens were not re-evaluated when the export revoked a key.
In part this is a consistency measure, so that servers that already have
an account and servers that don't behave the same way.

In the JWT, activation token revocations are stored per export.
The server stored them per account, thus effectively merging
revocations. Now they are stored per export inside the server too.

fixes nats-io/nsc/issues/442

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-01-25 13:48:12 -05:00
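A sketch of per-export revocation storage, under these assumptions: a revocation maps a public key to a Unix time, tokens issued at or before that time are revoked, and the `*` key revokes every key (the "all" value this commit refers to). The types are hypothetical; the point is that a key revoked on one export stays valid on another.

```go
package main

import "fmt"

// all mirrors the "*" key that revokes every public key.
const all = "*"

// revocations maps a revoked public key to a Unix time; tokens issued at or
// before that time are considered revoked. (Hypothetical, simplified model.)
type revocations map[string]int64

type export struct {
	subject string
	revoked revocations // stored per export, not merged at the account level
}

type account struct {
	exports map[string]*export // keyed by export subject
}

func (e *export) isRevoked(key string, issuedAt int64) bool {
	if t, ok := e.revoked[all]; ok && issuedAt <= t {
		return true
	}
	t, ok := e.revoked[key]
	return ok && issuedAt <= t
}

func main() {
	acc := &account{exports: map[string]*export{
		"svc.a": {subject: "svc.a", revoked: revocations{"UKEY1": 1000}},
		"svc.b": {subject: "svc.b", revoked: revocations{}},
	}}
	fmt.Println(acc.exports["svc.a"].isRevoked("UKEY1", 900)) // true
	fmt.Println(acc.exports["svc.b"].isRevoked("UKEY1", 900)) // false: revocations are per export
}
```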
Derek Collison
6486cd8fc8 Added in /healthz endpoint for health and liveness probes in environments like k8s.
Currently this returns a 200 and `{ "status": "ok" }` iff all configured ports are open
and, if JetStream is configured, we have contact with the metaleader and the cluster and all streams are up to date.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-24 19:30:10 -08:00
Matt Stephenson
59cc0f0015 Add source and mirror info to stream monitoring
2022-01-21 12:44:42 -08:00
Derek Collison
52da55c8c6 Implement overflow placement for JetStream streams.
This allows stream placement to overflow to adjacent clusters.
We also do more balanced placement based on resources (store or mem). We can continue to expand this as well.
We also introduce an account requirement that stream configs contain a MaxBytes value.

We now track account limits and server limits more distinctly, and do not reserve server resources based on account limits themselves.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-06 19:33:08 -08:00
Klaudiusz Fabryczny
b2b33110e2 FIX: Fix broken link to monitoring documentation
2021-12-30 14:04:12 +01:00
R.I.Pienaar
1146e66f30 fixes a nil panic on jsz
It appears that getPublicConsumers() produces a list
of consumers, and between the time the list is made
and the time Info() is called, the ephemeral consumer
was removed.

Signed-off-by: R.I.Pienaar <rip@devco.net>
2021-12-13 11:51:33 +01:00
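The race reads as a snapshot-then-lookup problem. A minimal sketch with hypothetical types: names are snapshotted under the lock, a consumer may be removed before it is looked up, and the loop skips the nil result instead of dereferencing it.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical, simplified types for illustration only.
type consumerInfo struct{ Name string }

type consumer struct{ name string }

// info is safe to call on a nil *consumer and then returns nil.
func (c *consumer) info() *consumerInfo {
	if c == nil {
		return nil
	}
	return &consumerInfo{Name: c.name}
}

type stream struct {
	mu        sync.Mutex
	consumers map[string]*consumer
}

// consumerNames takes a snapshot; consumers may be removed right after.
func (s *stream) consumerNames() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	names := make([]string, 0, len(s.consumers))
	for n := range s.consumers {
		names = append(names, n)
	}
	return names
}

func (s *stream) lookup(name string) *consumer {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.consumers[name]
}

func (s *stream) remove(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.consumers, name)
}

// collectInfos skips consumers that disappeared between snapshot and lookup.
func collectInfos(s *stream, names []string) []*consumerInfo {
	var out []*consumerInfo
	for _, n := range names {
		ci := s.lookup(n).info()
		if ci == nil {
			continue // e.g. an ephemeral consumer removed after the snapshot
		}
		out = append(out, ci)
	}
	return out
}

func main() {
	s := &stream{consumers: map[string]*consumer{
		"durable": {name: "durable"},
		"eph":     {name: "eph"},
	}}
	names := s.consumerNames()
	s.remove("eph") // simulates the ephemeral going away concurrently
	fmt.Println(len(collectInfos(s, names))) // 1
}
```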
Ivan Kozlovic
5fc9e0e1cc [FIXED] Gateway URLs gossip and /varz report issues
- When detecting duplicate route, it was possible that a server
would lose track of the peer's gateway URL, which would prevent
it from gossiping that URL to inbound gateway connections
- When a server has gateways enabled and lists its own gateway
as a remote, the monitoring endpoint `/varz` would include it
but without the "urls" array.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-10-28 12:05:30 -06:00
dtest1
0937b848cd fix go doc: DenyRules
2021-10-11 23:19:56 +08:00
Derek Collison
8223275c44 On cold start in mixed mode if the js servers were not > non-js we could stall.
Signed-off-by: Derek Collison <derek@nats.io>
2021-09-27 16:59:42 -07:00
Ivan Kozlovic
0411ba0c03 Changed ClientID to MQTTClient and client_id to mqtt_client
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-09-09 14:34:54 -06:00
Ivan Kozlovic
49024a0353 [ADDED] Monitoring: ClientID (for MQTT clients) on connection events
ClientID has been added to various monitoring objects. Also, added
the ability to filter connections on `client_id`.

On auth violation, the proper code was not invoked, which meant
that no disconnect event (with auth reason) would be published.

Resolves #2270

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-09-09 13:34:46 -06:00
Ivan Kozlovic
80ebf2d7b2 Add a comment to explain that we want to make a copy of the config
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-08-31 15:11:32 -06:00
Ivan Kozlovic
9f2e3d335b [FIXED] JetStream: possible deadlock due to lock inversion
The lock order is jetStream->Server, not the other way around. There
were a few places where lock inversion could have caused a deadlock.

Also, a change made recently to solve a deadlock was causing
a race that is demonstrated with TestJetStreamRaceOnRAFTCreate.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-08-30 16:16:56 -06:00
Matthias Hanel
41a253dabb fix daisy chained leaf node subject propagation issue. (#2468)
fixes #2448 

initLeafNodeSmapAndSendSubs did not pick up enough local subscriptions.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-08-25 18:10:09 -04:00
Matthias Hanel
7f1833ab1a Adding counter for number of failed logons due to pinned accounts
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-08-23 18:56:56 -04:00
Derek Collison
75ae7c6032 When an account asks for connz, only client and leaf connections should be returned by default.
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-15 11:04:23 -07:00
Derek Collison
9f50b66ad7 Break once we see interest
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-13 10:41:26 -07:00
Derek Collison
10167b1bcf Added in ability for normal accounts to access scoped connz info.
Added in client kind and sub type for clients.
Added in ability to filter connections based on matching subject interest.

Signed-off-by: Derek Collison <derek@nats.io>
2021-08-13 10:19:12 -07:00
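Filtering connections by subject interest comes down to matching each subscription against a subject and stopping at the first hit, as the "Break once we see interest" commit above does. This standalone sketch approximates NATS wildcard matching (`*` matches one token, `>` matches one or more trailing tokens); the `conn` type and the function names are hypothetical, and the server's own matching code is not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// subjectMatches reports whether a subscription subject, which may use the
// NATS wildcards "*" (one token) and ">" (one or more trailing tokens),
// matches a literal subject.
func subjectMatches(sub, subject string) bool {
	st := strings.Split(sub, ".")
	lt := strings.Split(subject, ".")
	for i, t := range st {
		if t == ">" {
			return len(lt) > i
		}
		if i >= len(lt) || (t != "*" && t != lt[i]) {
			return false
		}
	}
	return len(st) == len(lt)
}

// conn is a hypothetical, simplified connection record.
type conn struct {
	cid  uint64
	subs []string
}

// filterByInterest keeps connections with at least one matching subscription.
func filterByInterest(conns []conn, subject string) []conn {
	var out []conn
	for _, c := range conns {
		for _, s := range c.subs {
			if subjectMatches(s, subject) {
				out = append(out, c)
				break // interest found; no need to check the remaining subs
			}
		}
	}
	return out
}

func main() {
	conns := []conn{
		{cid: 1, subs: []string{"orders.*", "orders.new"}},
		{cid: 2, subs: []string{"billing.>"}},
	}
	for _, c := range filterByInterest(conns, "orders.new") {
		fmt.Println("cid", c.cid) // cid 1 (listed once despite two matching subs)
	}
}
```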
Derek Collison
ceebc3ae07 When checking limits we would check total ask against the server limits if limits were not set.
We were also dynamically setting account limits based on a single server limit.

Signed-off-by: Derek Collison <derek@nats.io>
2021-06-12 10:27:43 -07:00
Matthias Hanel
2caf2303f2 [adding] jetstream info to statsz (#2269)
* [adding] jetstream info to statsz

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-06-10 11:54:56 -04:00
Derek Collison
6e17b7a303 Fix for #2213
We do not want to report consumers that were created for the purpose of sources or mirrors.

Signed-off-by: Derek Collison <derek@nats.io>
2021-05-12 07:51:53 -07:00
Derek Collison
b3f9166b4f [FIXED] Getting varz from the HTTP endpoint showed Subscriptions doubling on each fetch.
Resolves part of #2170

Signed-off-by: Derek Collison <derek@nats.io>
2021-05-03 18:43:07 -07:00
Matthias Hanel
4430a55eed [added] leaf deny exports/imports to varz monitoring (#2159)
* [added] leaf deny exports/imports to varz monitoring

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-04-26 16:34:09 -04:00
R.I.Pienaar
e24e54c5a3 ensure varz subscriptions consider all accounts
Signed-off-by: R.I.Pienaar <rip@devco.net>
2021-04-07 11:52:09 +02:00
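Both varz subscription fixes (b3f9166b4f and this one) come down to how the count is computed. As a hedged sketch with hypothetical types: recomputing the total from scratch on every fetch, summed over all accounts, cannot accumulate across requests. This illustrates the general pattern only; it is not the actual change.

```go
package main

import "fmt"

// Hypothetical, simplified model of accounts and their subscription counts.
type account struct {
	name string
	subs uint32
}

type varz struct {
	Subscriptions uint32
}

// fillVarz recomputes Subscriptions from scratch on every call and sums
// across all accounts, instead of adding to a value left from a prior fetch.
func fillVarz(v *varz, accounts []*account) {
	var total uint32
	for _, a := range accounts {
		total += a.subs
	}
	v.Subscriptions = total
}

func main() {
	accs := []*account{{name: "ACC_A", subs: 3}, {name: "ACC_B", subs: 2}}
	v := &varz{}
	fillVarz(v, accs)
	fillVarz(v, accs) // a second fetch must not double the count
	fmt.Println(v.Subscriptions) // 5
}
```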
Derek Collison
61771e88f8 In operator mode with JetStream we want to load accounts that have stable storage.
Also, if an account was registered but not JetStream enabled, update it instead of returning an error.

Signed-off-by: Derek Collison <derek@nats.io>
2021-03-20 06:53:13 -07:00
Derek Collison
8eefff2b3b Make sure the jetstream accounts use the name as the key to the map.
This prevents possible double adds under reload or restart scenarios.

Signed-off-by: Derek Collison <derek@nats.io>
2021-03-18 17:29:26 -07:00
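A hypothetical sketch of why the name makes a better map key: after a reload or restart the same account may come back as a new object, and keying by `Name` turns a second registration into an in-place update rather than a duplicate entry (similar in spirit to the update-rather-than-error behavior in 61771e88f8). The types and `enableAccount` are illustrative only.

```go
package main

import "fmt"

// Hypothetical, simplified types.
type Account struct{ Name string }

type jsAccount struct{ account *Account }

type jetStream struct {
	accounts map[string]*jsAccount // keyed by account name
}

// enableAccount registers a, or updates the existing entry with the same name.
func (js *jetStream) enableAccount(a *Account) *jsAccount {
	if jsa, ok := js.accounts[a.Name]; ok {
		jsa.account = a // same name after reload/restart: update, don't add
		return jsa
	}
	jsa := &jsAccount{account: a}
	js.accounts[a.Name] = jsa
	return jsa
}

func main() {
	js := &jetStream{accounts: make(map[string]*jsAccount)}
	js.enableAccount(&Account{Name: "ACC"})
	js.enableAccount(&Account{Name: "ACC"}) // e.g. a new *Account after a reload
	fmt.Println(len(js.accounts))          // 1
}
```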
Matthias Hanel
2a2adb76bc Suppress varz jetstream output if not enabled
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-03-15 16:03:41 -04:00
Derek Collison
e84f845afd Avoid lock inversions.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-14 18:17:53 -07:00
Derek Collison
6241ef2d41 fix deadlock
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-13 18:04:46 -05:00
Derek Collison
43b9017b74 Merge pull request #1953 from nats-io/api
JetStream API Changes
2021-03-02 19:46:00 -07:00
Matthias Hanel
25ef6b0f0d Merge pull request #1952 from nats-io/goland-lint
Fixed linter issues
2021-03-02 21:43:04 -05:00