nats-server

mirror of https://github.com/gogrlx/nats-server.git synced 2026-04-02 03:38:42 -07:00

Author	SHA1	Message	Date
Waldemar Quevedo	286a1632ca	Use monotonic time for measuring time internally Signed-off-by: Waldemar Quevedo <wally@nats.io>	2023-05-12 08:27:46 -07:00
Derek Collison	d107ba3549	Under certain scenarios we have witnessed healthz() calls that never return healthy due to a stream or consumer being missing or stopped. This will now allow the healthy call to attempt to restart those assets. Signed-off-by: Derek Collison <derek@nats.io>	2023-04-28 17:11:08 -07:00
Derek Collison	cae91b8cad	In single server mode healthz could mistake a snapshot staging directory during a restore as an account. If the restore took a long time, stalled, or was aborted, this would cause healthz to fail. Signed-off-by: Derek Collison <derek@nats.io>	2023-04-24 22:14:04 -07:00
Waldemar Quevedo	d12152c48f	Add server name / remote server name to routez Signed-off-by: Waldemar Quevedo <wally@nats.io>	2023-04-14 12:47:00 -07:00
Derek Collison	c16915bff4	For checking the health of jetstream, do not hold the lock as we traverse the streams and consumers. Signed-off-by: Derek Collison <derek@nats.io>	2023-04-06 11:56:55 -07:00
Derek Collison	59175c491f	Fix for a datarace Signed-off-by: Derek Collison <derek@nats.io>	2023-04-03 14:46:57 -07:00
Derek Collison	ff3f102cdd	Fix for datarace in healthcheck Signed-off-by: Derek Collison <derek@nats.io>	2023-04-02 16:30:13 -07:00
Derek Collison	4b8229ee42	Do not hold js lock for health check, use healthy not current for meta Signed-off-by: Derek Collison <derek@nats.io>	2023-04-02 03:52:54 -07:00
Derek Collison	027f2e42c8	Remove snapshot of cores and maxprocs Signed-off-by: Derek Collison <derek@nats.io>	2023-03-17 15:09:50 -07:00
Jeremy Saenz	26f241cb62	Updated LEAFZ names to use remoteServer name/id and added is_spoke	2023-02-28 18:09:24 -08:00
Jeremy Saenz	9d4a603aaf	Update LEAFZ to include leafnode server/connection name	2023-02-28 14:20:18 -08:00
Waldemar Quevedo	74b703549d	Add raft query parameter to /jsz to include raft group info Signed-off-by: Waldemar Quevedo <wally@nats.io>	2023-02-27 05:42:11 -08:00
Neil Twigg	68961ffedd	Refactor `ipQueue` to use generics, reduce allocations	2023-02-21 14:50:09 +00:00
Neil Twigg	83932b4be6	Don't mark a clustered stream as unhealthy if making forward progress, add `TestJetStreamClusterCurrentVsHealth`	2023-01-26 16:57:34 +00:00
Derek Collison	2aeb5e2c5a	Update snapshots to numCores and maxProcs after maxprocs.Set() Signed-off-by: Derek Collison <derek@nats.io>	2023-01-20 11:30:43 -08:00
Derek Collison	713f632fa7	If a stream's meta was not properly written but the file existed, we could re-add the stream but a subsequent restart would lose the stream again. Also added in healthz for single server systems to make sure all stream directories resulted in recovered streams. Signed-off-by: Derek Collison <derek@nats.io>	2022-12-29 20:08:56 -08:00
Byron Ruth	1477b675ff	Add back existing HealthzOptions.JSEnabled field This fixes a backwards incompat change for library usage as well as using the healthz NATS API which depends on the JSON payload. Signed-off-by: Byron Ruth <byron@nats.io>	2022-12-26 08:45:22 -05:00
Byron Ruth	566d1adfa7	Fix /healthz?js-enabled=true behavior When js-enabled is set to true, the condition was only checked if the `getJetStream()` call returned `nil`. However, if it was non-nil, all remaining checks were executed, including assessing the health of the assets (streams and consumers). This change addresses two issues: - Switch to use `js.isEnabled()` which will check whether the value is nil OR `js.disabled = true` which can occur if the subsystem is temporarily disabled (insufficient resources). - Correctly exit the check after the assertion and before meta and asset checks are performed. In addition, the option has been renamed to `js-enabled-only` to align with the `js-server-only` naming. The previous `js-enabled` name still works, but is mapped to this new option. A warning is emitted noting the previous option is deprecated. Fix #3703 Signed-off-by: Byron Ruth <b@devel.io>	2022-12-10 07:34:32 -05:00
Raymond	4d8964e57b	Added stream created timestamp to stream detail	2022-11-17 13:59:58 +01:00
Ivan Kozlovic	170ff49837	[ADDED] JetStream: peer (the hash of server name) in statsz/jsz A request to `$SYS.REQ.SERVER.PING.JSZ` would now return something like this: ``` ... "meta_cluster": { "name": "local", "leader": "A", "peer": "NUmM6cRx", "replicas": [ { "name": "B", "current": true, "active": 690369000, "peer": "b2oh2L6w" }, { "name": "Server name unknown at this time (peerID: jZ6RvVRH)", "current": false, "offline": true, "active": 0, "peer": "jZ6RvVRH" } ], "cluster_size": 3 } ``` Note the "peer" field following the "leader" field that contains the server name. The new field is the node ID, which is a hash of the server name. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-09-16 15:31:37 -06:00
Waldemar Quevedo	46d73eddae	js: add per account reserved mem/store bytes Signed-off-by: Waldemar Quevedo <wally@nats.io>	2022-09-06 16:43:10 -07:00
Ivan Kozlovic	03ac1f256f	Update based on code review - Change finger_prints to cert_sha256 and use hex.EncodeToString - Add spki_sha256 for RawSubjectPublicKeyInfo with hex.EncodeToString Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-24 14:16:37 -06:00
Ivan Kozlovic	d2784589a0	Change json tag name to finger_prints Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-22 12:40:20 -06:00
Ivan Kozlovic	951b7c38f6	[ADDED] Monitoring: TLS Peer Certificates in Connz when auth is on Add basic peer certificates information in /connz endpoint when the "auth" option is provided. Resolves #3317 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-22 11:48:49 -06:00
Ivan Kozlovic	5d3ee8ebf4	[FIXED] Gateway: possible panic if monitor endpoint inspected too soon The monitoring http server is started early and the gateway setup (when configured) may not be fully ready when the `/gatewayz` endpoint is inspected and could cause a panic. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-17 13:30:58 -06:00
Ivan Kozlovic	a4bf4e87f6	Merge pull request #3326 from mfaizanse/health_endpoint_params Added param options to /healthz endpoint	2022-08-09 08:49:22 -06:00
Muhammad Faizan	1634f33de7	Added param options to /healthz endpoint	2022-08-09 08:32:54 +02:00
Derek Collison	2120be6476	nit: Cap stats Signed-off-by: Derek Collison <derek@nats.io>	2022-08-05 07:52:23 -07:00
Matthias Hanel	d53d2d0484	[Added] account specific monitoring endpoint(s) (#3250) Added http monitoring endpoint /accstatz It responds with a list of statz for all accounts with local connections the argument "unused=1" can be provided to get statz for all accounts This endpoint is also exposed as nats request under: This monitoring endpoint is exposed via the system account. $SYS.REQ.ACCOUNT..STATZ Each server will respond with connection statistics for the requested account. The format of the data section is a list (size 1) identical to the event $SYS.ACCOUNT.%s.SERVER.CONNS which is sent periodically as well as on connect/disconnect. Unless requested by options, server without the account, or server where the account has no local connections, will not respond. A PING endpoint exists as well. The response format is identical to $SYS.REQ.ACCOUNT..STATZ (however the data section will contain more than one account, if they exist) In addition to general filter options the request takes a list of accounts and an argument to include accounts without local connections (disabled by default) $SYS.REQ.ACCOUNT.PING.STATZ Each account has a new system account import where the local subject $SYS.REQ.ACCOUNT.PING.STATZ essentially responds as if the importing account name was used for $SYS.REQ.ACCOUNT..STATZ The only difference between requesting ACCOUNT.PING.STATZ from within the system account and an account is that the latter can only retrieve statz for the account the client requests from. Also exposed the monitoring /healthz via the system account under $SYS.REQ.SERVER..HEALTHZ $SYS.REQ.SERVER.PING.HEALTHZ No dedicated options are available for these. HEALTHZ also accepts general filter options. Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-07-12 21:50:32 +02:00
Derek Collison	e6479dafd2	Close leafnode connection when same cluster name detected Signed-off-by: Derek Collison <derek@nats.io>	2022-06-30 15:34:22 -07:00
Ivan Kozlovic	5261d98781	[ADDED] Monitoring: Routez's individual route has now more info Added Start, LastActivity, Uptime and Idle that we normally have in a Connz for non route connections. This info can be useful to determine if a route is recent, etc.. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-05-18 13:18:53 -06:00
Ivan Kozlovic	5050092468	[FIXED] JetStream: possible lock inversion When updating usage, there is a lock inversion in that the jetStream lock was acquired while under the stream's (mset) lock, which is not correct. Also, updateUsage was locking the jsAccount lock, which again, is not really correct since jsAccount contains streams, so it should be jsAccount->stream, not the other way around. Removed the locking of jetStream to check for clustered state since js.clustered is immutable. Replaced using jsAccount lock to update usage with a dedicated lock. Originally moved all the update/limit fields in jsAccount to new structure to make sure that I would see all code that is updating or reading those fields, and also all functions so that I could make sure that I use the new lock when calling these. Once that work was done, and to reduce code changes, I put the fields back into jsAccount (although I grouped them under the new usageMu mutex field). Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-05-02 09:50:32 -06:00
Derek Collison	f702e279ab	Fix for a consumer recovery issue. Also update healthz to check all assets that are assigned, not just running. Signed-off-by: Derek Collison <derek@nats.io>	2022-04-26 19:22:19 -07:00
Ivan Kozlovic	50c3986863	[FIXED] JetStream stream catchup issues - A stream could become leader when it should not, causing messages to be lost. - A catchup could stall because the server sending data could bail out of the runCatchup routine but still send the EOF signal. - Deadlock with monitoring of Jsz Signed-off-by: Ivan Kozlovic <ivan@synadia.com> Signed-off-by: Derek Collison <derek@nats.io> Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-04-12 16:05:12 -06:00
Ivan Kozlovic	9e6f965913	[ADDED] LeafNode `min_version` new option If set, a server configured to accept leafnode connections will reject a remote server whose version is below that value. Note that servers prior to v2.8.0 are not sending their version in the CONNECT protocol, which means that anything below 2.8.0 would be rejected. Configuration example: ``` leafnodes { port: 7422 min_version: 2.8.0 } ``` The option is a string and can have the "v" prefix: ``` min_version: "v2.9.1" ``` Note that although suffix such as `-beta` would be accepted, only the major, minor and update are used for the version comparison. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-04-06 18:40:33 -06:00
Ivan Kozlovic	14f54b8dd7	[ADDED] Monitoring: MQTT and Websocket blocks in `/varz` endpoint Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-04-04 10:11:55 -06:00
Ivan Kozlovic	34650e9dd5	Fixed data race and some flappers Data race that has been seen: ``` Read at 0x00c00134bec0 by goroutine 159: github.com/nats-io/nats-server/v2/server.(client).msgHeaderForRouteOrLeaf() /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:2935 +0x254 github.com/nats-io/nats-server/v2/server.(client).processMsgResults() /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4364 +0x2147 (...) Previous write at 0x00c00134bec0 by goroutine 201: github.com/nats-io/nats-server/v2/server.(Server).addRoute() /home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:1475 +0xdb4 github.com/nats-io/nats-server/v2/server.(client).processRouteInfo() /home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:641 +0x1704 ``` Also fixed some flappers and removed use of `s.js.` since we have already captured `js` in Jsz monitoring. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-03-31 10:05:34 -06:00
R.I.Pienaar	4c4aa3e87f	skips jsz on non js machines when leader only requested This is a regression introduced in `055703f4fa` that leads to panics in management tooling Signed-off-by: R.I.Pienaar <rip@devco.net>	2022-03-31 12:02:09 +02:00
Samuel Torres	9868bb71a7	Add logs to healthcheck handler Kubernetes probes neither use nor log the response body of health endpoints. This means that if for some reason a nats node running in Kubernetes enters a Not Ready state, we won't have a way to know why other than to manually access the cluster, call the /healthz endpoint and see the error. This change adds an error log so we can observe what is going wrong with a nats node that is not ready. Signed-off-by: Samuel Torres <samuel.torres@form3.tech>	2022-03-30 14:14:22 +01:00
Matthias Hanel	0c5f3688a7	[ADDED] Tiered limits and fix limit issues on updates (#2945) * Adding tiered limits and fix limit issues on updates Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-03-28 20:47:54 -04:00
R.I.Pienaar	055703f4fa	ensures the cluster info in jsz is sent from the leader only The data from other nodes are usually wrong, this can be quite confusing for users so we now only send it when we are the leader Signed-off-by: R.I.Pienaar <rip@devco.net>	2022-03-25 18:27:35 +01:00
Ivan Kozlovic	a23b1b73ef	Merge pull request #2931 from nats-io/ipq_changes Changes to IPQueues	2022-03-17 19:13:02 -06:00
Ivan Kozlovic	c3da392832	Changes to IPQueues Removed the warnings, instead have a sync.Map where they are registered/unregistered and can be inspected with an undocumented monitor page. Added the notion of "in progress" which is the number of messages that have been pop()'ed. When recycle() is invoked this count goes down. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-03-17 17:53:06 -06:00
Derek Collison	fa098f1af0	Show version on main monitoring page with link to source Signed-off-by: Derek Collison <derek@nats.io>	2022-03-17 11:04:11 -07:00
Ivan Kozlovic	2c0f5046f1	Merge pull request #2923 from nats-io/gw_detect_duplicate_srv_name [CHANGED] Gateway: Detect duplicate names between clusters	2022-03-17 10:57:08 -06:00
Derek Collison	287b567b1c	Add consumer check to healthz and allow to be called directly Signed-off-by: Derek Collison <derek@nats.io>	2022-03-16 20:52:31 -07:00
Ivan Kozlovic	63c750e295	[CHANGED] Gateway: Detect duplicate names between clusters Gateway connection will be closed and error reported if a remote has a name that is a duplicate of the local cluster. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-03-15 15:00:13 -06:00
Matthias Hanel	d0c183106a	Fixed lock inversion by not using account lock to get the name Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-03-07 21:22:41 -05:00
Derek Collison	037e3c6bbe	Spiffied up monitoring landing page a bit Signed-off-by: Derek Collison <derek@nats.io>	2022-03-05 09:18:07 -08:00
Ivan Kozlovic	7f81f2d4c6	Merge pull request #2816 from nats-io/revocation-issue-442 Fix jwt based user/activation token revocation and granularity	2022-01-25 13:42:14 -07:00

278 Commits