nats-server

mirror of https://github.com/gogrlx/nats-server.git synced 2026-04-14 10:10:42 -07:00

Author	SHA1	Message	Date
Derek Collison	9e9a9a082b	When restoring a filestore with no key generator but it was encrypted, fail to restore. Signed-off-by: Derek Collison <derek@nats.io>	2023-07-11 16:27:50 -07:00
Derek Collison	2e2ac33920	[IMPROVED] When R1 consumers were recreated with the same name when they became inactive. (#4216) When consumers were R1 and the same name was reused, server restarts could try to clean up old ones and affect the new ones. These changes allow consumer names to be reused more effectively during server restarts. Signed-off-by: Derek Collison <derek@nats.io>	2023-06-05 14:04:53 -07:00
Derek Collison	df5df3ce99	Panic fixes (#4214) Resolves panics in the code. ### Changes proposed in this pull request: - This PR fixes some of the panics in the code	2023-06-05 13:02:05 -07:00
Derek Collison	4ac45ff6f3	When consumers were R1 and the same name was reused, server restarts could try to clean up old ones and affect the new ones. These changes allow consumer names to be reused more effectively during server restarts. Signed-off-by: Derek Collison <derek@nats.io>	2023-06-05 12:48:18 -07:00
Derek Collison	dee532495d	Make sure to process extended purge operations correctly when being replayed on a restart. Signed-off-by: Derek Collison <derek@nats.io>	2023-06-03 17:49:45 -07:00
Derek Collison	238282d974	Fix some data races detected in internal testing Signed-off-by: Derek Collison <derek@nats.io>	2023-06-03 13:58:15 -07:00
Artem Seleznev	27a8b96ee3	Various panic fixes Signed-off-by: Artem Seleznev <seleznyov.artyom@gmail.com>	2023-06-02 13:19:22 +03:00
Derek Collison	7e3f3f4908	Make health checks more consistent with stream health checks. Check for closed state on leader change for consumers. Signed-off-by: Derek Collison <derek@nats.io>	2023-05-18 08:18:53 -07:00
Derek Collison	a8d7d3886e	Make sure to delete the stream assignment node here Signed-off-by: Derek Collison <derek@nats.io>	2023-05-17 16:19:39 -07:00
Derek Collison	f3553791b1	Updates to stream reset logic. 1. When catching up, do not retry forever, and reset cluster state if needed. 2. When checking whether a stream is healthy, check for node drift. 3. When restarting a stream, make sure the current node is stopped. Signed-off-by: Derek Collison <derek@nats.io>	2023-05-17 13:14:33 -07:00
Derek Collison	a06e1c9b43	Make sure to also stop nodes when dealing with consumers after a stream restart Signed-off-by: Derek Collison <derek@nats.io>	2023-05-16 13:16:47 -07:00
Derek Collison	3752a6c500	Make sure to stop the node on a consumer restart if still running Signed-off-by: Derek Collison <derek@nats.io>	2023-05-16 12:49:46 -07:00
Derek Collison	b0340ce598	Make sure to wait properly until we believe we are caught up to enable direct gets. Signed-off-by: Derek Collison <derek@nats.io>	2023-05-16 11:02:06 -07:00
Derek Collison	5e029d08d5	For older R1 streams created by previous servers, we could have no cluster for the stream assignment group, which would prevent scale-up with newer servers. The stream will now inherit the cluster if one can be detected from placement tags or the client's cluster designation. Signed-off-by: Derek Collison <derek@nats.io>	2023-05-10 17:59:28 -07:00
Derek Collison	b44beb4b54	Make sure to update peer set and remove old peers after new leader takes over Signed-off-by: Derek Collison <derek@nats.io>	2023-05-09 15:15:02 -07:00
Ivan Kozlovic	840c264f45	Clean up use of s.opts and fix some lock (deadlock/inversion) issues. One should not access s.opts directly but instead use s.getOpts(). Also, the server lock needs to be released when performing an account lookup (since the lookup may acquire the server lock). A function was calling s.LookupAccount under the client lock, which technically creates a lock inversion. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2023-05-03 14:09:02 -06:00
Derek Collison	b27ce6de80	Add checks for JetStream shutting down in a few more places. Add a helper method. Signed-off-by: Derek Collison <derek@nats.io>	2023-04-29 11:27:18 -07:00
Derek Collison	4eb4e5496b	Do a health check on startup once we have processed existing state. Also do health checks in a separate goroutine. Signed-off-by: Derek Collison <derek@nats.io>	2023-04-29 09:36:35 -07:00
Derek Collison	fac5658966	If we fail to create a consumer, make sure to clean up any raft nodes in the meta layer and to shut down the consumer if it was created before we encountered the error. Signed-off-by: Derek Collison <derek@nats.io>	2023-04-29 08:15:33 -07:00
Derek Collison	546dd0c9ab	Make sure we can recover an underlying node being stopped. Do not return healthy if the node is closed, and wait a bit longer for forward progress. Signed-off-by: Derek Collison <derek@nats.io>	2023-04-29 07:42:23 -07:00
Derek Collison	85f6bfb2ac	Check healthz periodically Signed-off-by: Derek Collison <derek@nats.io>	2023-04-28 17:58:45 -07:00
Derek Collison	d107ba3549	Under certain scenarios we have witnessed healthz() calls that never return healthy because a stream or consumer is missing or stopped. The healthz call will now attempt to restart those assets. Signed-off-by: Derek Collison <derek@nats.io>	2023-04-28 17:11:08 -07:00
Neil Twigg	e30ea34625	Add op type to `panic`s Signed-off-by: Neil Twigg <neil@nats.io>	2023-04-27 11:38:52 +01:00
Derek Collison	83293f86ff	Reduce threshold for compressing messages during a catchup Signed-off-by: Derek Collison <derek@nats.io>	2023-04-25 19:01:06 -07:00
Derek Collison	3c964a12d7	Migration could be delayed because leadership was transferred while the new leader was still paused. Also check more quickly, but slow down if the required state is not there yet. Signed-off-by: Derek Collison <derek@nats.io>	2023-04-25 18:58:49 -07:00
Derek Collison	f6195a5ee3	A stream could have a complicated state with interior deletes. This is a simpler way to determine whether we need to consider a snapshot, and it uses much less time, CPU, and memory. Signed-off-by: Derek Collison <derek@nats.io>	2023-04-18 19:11:49 -07:00
Derek Collison	093564f7e0	The meta layer should snapshot if any outstanding entries are present, regardless of hash. Fixes the flapping test [TestJetStreamClusterDeleteAndRestoreAndRestart], which skipped the snapshot because the hash was unchanged even though there were entries that would erase stream data. Signed-off-by: Derek Collison <derek@nats.io>	2023-04-13 20:37:00 -07:00
Derek Collison	cc77d662bb	Make sure to process consumer entries on recovery in case state was not committed. Also sync other consumers when taking over as leader, but there is no need to process snapshots when we are in fact the leader. Signed-off-by: Derek Collison <derek@nats.io>	2023-04-13 18:40:17 -07:00
Derek Collison	2f4677d29e	Delay a bit longer if we are not the actual leader; this helps very large stream reports avoid possible duplicates. Signed-off-by: Derek Collison <derek@nats.io>	2023-04-12 12:36:47 -07:00
Derek Collison	3b9cf1e381	Needed to do more in a separate goroutine to avoid deadlock Signed-off-by: Derek Collison <derek@nats.io>	2023-04-08 18:43:58 -07:00
Derek Collison	35bb7c1737	Pool CommittedEntries as well with a ReturnToPool() that will also recycle the Entry. Needs to integrate with upper layers Signed-off-by: Derek Collison <derek@nats.io>	2023-04-08 11:34:10 -07:00
Derek Collison	d02d59534f	Fix data race Signed-off-by: Derek Collison <derek@nats.io>	2023-04-07 07:18:30 -07:00
Derek Collison	c16915bff4	For checking the health of jetstream, do not hold the lock as we traverse the streams and consumers. Signed-off-by: Derek Collison <derek@nats.io>	2023-04-06 11:56:55 -07:00
Neil Twigg	03a5a4deaf	Possibly de-race `sysRequest` Signed-off-by: Neil Twigg <neil@nats.io>	2023-04-04 10:30:59 +01:00
Derek Collison	b0c3cf0dbd	Only apply consumer entries if not recovering Signed-off-by: Derek Collison <derek@nats.io>	2023-04-03 17:22:50 -07:00
Derek Collison	59175c491f	Fix for a data race Signed-off-by: Derek Collison <derek@nats.io>	2023-04-03 14:46:57 -07:00
Derek Collison	9dd727034a	Make sure to not stop raft layer when we detect we are already running the monitor Signed-off-by: Derek Collison <derek@nats.io>	2023-04-03 14:46:47 -07:00
Derek Collison	ff3f102cdd	Fix for data race in healthcheck Signed-off-by: Derek Collison <derek@nats.io>	2023-04-02 16:30:13 -07:00
Derek Collison	e6447c982a	Protect against concurrent creation of streams and consumers. Also make sure we have exited monitoring routines when doing resets for both streams and consumers. Signed-off-by: Derek Collison <derek@nats.io>	2023-04-02 14:29:52 -07:00
Derek Collison	58ca525b3b	Process replicated acks regardless of store update. Delay, but still step down. Signed-off-by: Derek Collison <derek@nats.io>	2023-04-02 03:53:16 -07:00
Derek Collison	a8bd2793d5	Fix concurrent map bug on preAcks. Use monitor check for streams like consumers. Make sure to stop raft layer if exiting monitorConsumer early. Allow consumers to force a snapshot on leadership change. Signed-off-by: Derek Collison <derek@nats.io>	2023-04-02 03:53:11 -07:00
Derek Collison	ad5bb366a0	Updates to preAcks when multiple consumers are present but mutually exclusive (filtered). Signed-off-by: Derek Collison <derek@nats.io>	2023-03-31 10:43:28 -07:00
Derek Collison	937ef0d2a6	Improvements to preAcks. Better handling of multiple consumers so as to not delete too early. Signed-off-by: Derek Collison <derek@nats.io>	2023-03-30 20:29:15 -07:00
Derek Collison	ade0e9d295	Snapshot meta for this function to use in case it gets removed out from underneath us. Signed-off-by: Derek Collison <derek@nats.io>	2023-03-29 16:51:17 -07:00
Derek Collison	c77872b519	Update server/jetstream_cluster.go Pre-allocate Co-authored-by: Neil <neil@nats.io>	2023-03-29 15:29:38 -07:00
Derek Collison	2b89fea9b0	Double check here if the jetstream cluster was shutdown when we released the lock Signed-off-by: Derek Collison <derek@nats.io>	2023-03-29 14:46:49 -07:00
Derek Collison	6c3e64b83b	Always make sure the cluster and meta raft node are available when needed Signed-off-by: Derek Collison <derek@nats.io>	2023-03-29 13:56:04 -07:00
Derek Collison	71af150448	General improvements to interest based stream processing when acks arrive before the actual msgs. 1. If we are retention based, make sure our consumers are running before entering into monitorStream logic. 2. If we skip messages and are interest based, make sure we check for a preAck state. 3. On finalization of recovery for consumers have them check against the interest based stream. 4. Do not process ack state updates if consumer is closed and shutting down. 5. When processing final state for a stream after upper layer catchup, check all attached consumers for ack skew. 6. During catchup of stream messages consult preAck state and skip messages as needed. Signed-off-by: Derek Collison <derek@nats.io>	2023-03-29 12:43:53 -07:00
Derek Collison	ed9de4b0a1	Improved publisher performance on interest-based streams in some clusters with asymmetric network latency. In such clusters, if a node in an R3 was replicating a consumer and its parent stream but was the leader of neither, and the path from the stream leader was faster than the path from the consumer leader, a replicated ack could arrive before the message itself. In this case we used to forward a delete-message request to the stream leader, which would then replicate it to all stream replicas. This caused extra work that could increase publish times for clients connected to the slow node. Signed-off-by: Derek Collison <derek@nats.io>	2023-03-20 20:53:45 -07:00
Derek Collison	5a16f98427	Fixed an off by one bug that under certain circumstances could cause large consumer replica states. This could lead to instability in the system. The bug would manifest in replicated consumers when certain messages could be acked out of order, and, the pending list would never go to zero. Signed-off-by: Derek Collison <derek@nats.io>	2023-03-19 10:41:59 -07:00

465 Commits