The system will allow an update to a stream, and subsequently all attached consumers, to be placed in another cluster either directly or via tag placement.
The meta layer will scale the underlying peerset appropriately to straddle the two clusters for both the stream and consumers, taking into account the consumer type.
Control will then pass to the current leaders of the assets who will monitor the catchup status of the new peers.
(Note we can optimize this later to only traverse once across a GW for any given asset, but for now this is simpler)
Once the original leaders have determined the assets are synched, they will pass leadership to a member of the new peerset.
Once the new leader has been elected, it will forward a request for the meta layer to shrink the peerset by removing the old peers.
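A rough sketch of this flow, with hypothetical names (MetaLayer, Leader and moveAsset are illustrative stand-ins, not the actual server internals):
```go
package sketch

import "time"

// MetaLayer and Leader are hypothetical stand-ins for the meta layer and
// the current asset leader described above.
type MetaLayer interface {
	ExpandPeerSet(asset, targetCluster string) ([]string, error)
	ShrinkPeerSet(asset string, keep []string) error
}

type Leader interface {
	CaughtUp(peers []string) bool
	TransferLeadership(to string) error
}

// moveAsset sketches the flow: expand the peer set to straddle both
// clusters, wait for the new peers to catch up, hand leadership to one of
// them, then shrink the peer set to drop the old peers.
func moveAsset(meta MetaLayer, leader Leader, asset, targetCluster string) error {
	newPeers, err := meta.ExpandPeerSet(asset, targetCluster)
	if err != nil {
		return err
	}
	for !leader.CaughtUp(newPeers) {
		time.Sleep(250 * time.Millisecond)
	}
	if err := leader.TransferLeadership(newPeers[0]); err != nil {
		return err
	}
	return meta.ShrinkPeerSet(asset, newPeers)
}
```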
Signed-off-by: Derek Collison <derek@nats.io>
Also fixed a bug where we were incorrectly not spinning up the monitoring loop for a stream when going from 3->1->3.
Signed-off-by: Derek Collison <derek@nats.io>
Previously we would rely more heavily on Go's garbage collector since when we loaded a block for an underlying stream we would pass references upward to avoid copies.
Now we always copy when passing back to the upper layers, which allows us to not only expire our cache blocks but also pool and reuse them.
The upper layers also had changes made to allow the pooling layer at that level to interoperate with the storage layer optionally.
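A rough illustration of the pattern (not the actual filestore code), assuming a fixed block size and a sync.Pool for recycling:
```go
package sketch

import "sync"

// blockSize is illustrative; the real store uses its own sizing.
const blockSize = 64 * 1024

var blockPool = sync.Pool{
	New: func() any { return make([]byte, blockSize) },
}

// loadBlock obtains a reusable buffer from the pool (reading it from disk
// is elided here).
func loadBlock() []byte {
	return blockPool.Get().([]byte)
}

// readMsg copies the message region out of the cached block, so upper
// layers never hold a reference into pooled memory and the block can be
// expired or reused right away.
func readMsg(blk []byte, off, ln int) []byte {
	msg := make([]byte, ln)
	copy(msg, blk[off:off+ln])
	blockPool.Put(blk) // safe: the caller only ever sees the copy
	return msg
}
```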
Also fixed some flappers and a bug where de-dupe might not be reformed correctly.
Signed-off-by: Derek Collison <derek@nats.io>
Removed the warnings; instead, have a sync.Map where they are
registered/unregistered and can be inspected with an undocumented
monitor page.
Added the notion of "in progress", which is the number of messages
that have been pop()'ed. When recycle() is invoked this count
goes down.
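A simplified sketch of the idea; the names queue, push, pop and recycle mirror the description above but are not the server's actual ipQueue implementation:
```go
package sketch

import (
	"sync"
	"sync/atomic"
)

// queues is a registry a monitoring endpoint could walk to inspect every
// live queue (a stand-in for the undocumented monitor page).
var queues sync.Map

type queue struct {
	name       string
	mu         sync.Mutex
	elts       []any
	inProgress int64 // messages handed out by pop() and not yet recycled
}

func newQueue(name string) *queue {
	q := &queue{name: name}
	queues.Store(name, q)
	return q
}

func (q *queue) push(e any) {
	q.mu.Lock()
	q.elts = append(q.elts, e)
	q.mu.Unlock()
}

// pop hands out everything queued so far and counts it as "in progress".
func (q *queue) pop() []any {
	q.mu.Lock()
	elts := q.elts
	q.elts = nil
	q.mu.Unlock()
	atomic.AddInt64(&q.inProgress, int64(len(elts)))
	return elts
}

// recycle is called once the caller is done with a pop()'ed batch, which
// brings the "in progress" count back down.
func (q *queue) recycle(elts []any) {
	atomic.AddInt64(&q.inProgress, -int64(len(elts)))
}
```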
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Also had to change all references from `path.` to `filepath.` when
dealing with files, so that it works properly on Windows.
Also fixed lots of tests to defer the shutdown of the server
after the removal of the storage, and fixed some config file
directories to use the single quote `'` to surround the file path,
again to work on Windows.
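For reference, the difference the `path` vs `filepath` change addresses (directory names here are just examples):
```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
)

func main() {
	// path always joins with '/', which breaks file handling on Windows.
	fmt.Println(path.Join("storage", "acc", "streams"))
	// filepath uses the OS-specific separator ('\' on Windows).
	fmt.Println(filepath.Join("storage", "acc", "streams"))
}
```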
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- Limit IPQueue logging
- Add a rand per raft node. This is because in some situations when
running 3 servers at the same time, we would end up with identical
inboxes for different subjects on the different nodes, which would
cause panics (see the sketch after this list).
- Move the creation of internal subscriptions after the tracking
of the node and its peers.
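A sketch of the kind of per-node randomization that avoids identical inboxes across servers started at the same time; the helper and subject layout are illustrative, not the actual NRG code:
```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newNodeRandom returns a per-node random suffix so two servers starting
// at the same time do not generate identical reply inboxes.
func newNodeRandom() string {
	b := make([]byte, 8)
	_, _ = rand.Read(b)
	return hex.EncodeToString(b)
}

func main() {
	// Illustrative inbox layout only.
	inbox := fmt.Sprintf("$NRG.reply.%s", newNodeRandom())
	fmt.Println(inbox)
}
```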
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Some operations could cause the route to block due to a lock being
held during store operations. On macOS, having lots of streams/consumers
and restarting the cluster would cause lots of concurrent IO that
would cause the lock to be held for too long, causing head-of-line
blocking in the processing of messages from a route.
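The general remedy, sketched below under the assumption of a simple store guarded by a mutex (not the actual server code), is to snapshot what is needed under the lock and perform the slow IO with the lock released:
```go
package sketch

import "sync"

type msgStore struct {
	mu      sync.Mutex
	pending [][]byte
}

// storeMsg copies the message under the lock but performs the (potentially
// slow) disk write outside of it, so the route processing path is not
// blocked behind IO.
func (s *msgStore) storeMsg(msg []byte, writeToDisk func([]byte) error) error {
	s.mu.Lock()
	buf := append([]byte(nil), msg...) // copy while holding the lock
	s.pending = append(s.pending, buf)
	s.mu.Unlock()

	// Slow IO happens with the lock released.
	return writeToDisk(buf)
}
```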
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
If a node fell behind, then when catching up with the rest of the
cluster it was possible for a lot of append entries to accumulate,
and the server would print warnings such as:
```
[WRN] RAFT [jZ6RvVRH - S-R3F-CQw2ImK6] <some number> append entries pending
```
It would then continuously print the following warning:
```
AppendEntry failed to be placed on internal channel
```
When that happens, this node would always be shown as being the
same number of operations behind (using `nats s info`) if there are
no new messages added to the stream, or an increasing number of
operations behind if there is still activity.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Along a leaf node connection, unless the system account is shared AND the JetStream domain name is identical, the default JetStream traffic (without a domain set) will be denied.
As a consequence, any client that wants to access a domain other than the one of the server it is connected to must specify that domain name.
Affected by this change are setups where a leaf node had no local JetStream OR the server the leaf node connected to had no local JetStream; in other words, one of the two accounts connected via a leaf node remote has JetStream disabled.
The side that does not have JetStream enabled will lose JetStream access, and its clients must set `nats.Domain` manually.
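For example, with the Go client, an application on the side without JetStream would select the other side's domain explicitly (the URL and the domain name "hub" are just examples):
```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect("nats://leaf:4222") // example URL
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// Explicitly target the JetStream domain on the other side of the
	// leaf node connection ("hub" is an example domain name).
	js, err := nc.JetStream(nats.Domain("hub"))
	if err != nil {
		log.Fatal(err)
	}
	_ = js
}
```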
For workarounds on how to restore the old behavior, look at:
https://github.com/nats-io/nats-server/pull/2693#issuecomment-996212582
New config values added:
`default_js_domain` is a mapping from account to domain, settable when JetStream is not enabled in an account.
`extension_hint` is a hint for a non-clustered server to start in clustered mode (and be usable to extend).
`js_domain` is a way to set the JetStream domain to use for MQTT.
Signed-off-by: Matthias Hanel <mh@synadia.com>
When a consumer is configured with the "meta-only" option, and the
stream was backed by a memory store, a memory corruption could
happen, causing the application to receive corrupted headers.
Also replaced most usages of `append(a[:0:0], a...)` for making
copies. That idiom was based on this wiki:
https://github.com/go101/go101/wiki/How-to-efficiently-clone-a-slice%3F
But since Go 1.15, it is actually faster to call make+copy instead.
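For reference, the two clone idioms side by side:
```go
package main

import "fmt"

func main() {
	a := []byte("some message data")

	// Old idiom: clone via append onto a zero-length, zero-capacity slice.
	clone1 := append(a[:0:0], a...)

	// Since Go 1.15, make+copy is generally faster.
	clone2 := make([]byte, len(a))
	copy(clone2, a)

	fmt.Println(string(clone1), string(clone2))
}
```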
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When encountering errors for sequence mismatches that were benign we were returning an error and not processing the rest of the entries.
This would lead to more severe sequence mismatches later on that would cause stream resets.
Also added code to deal with server restarts and the clfs fixup states which should have been reset properly.
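In spirit, the change is to skip benign mismatches and keep applying the remaining entries rather than returning early; a simplified illustration (not the actual stream code):
```go
package sketch

import "log"

type entry struct {
	seq uint64
	msg []byte
}

// applyEntries keeps processing the batch even when an individual entry
// has a benign sequence mismatch (e.g. it was already applied), instead of
// returning an error and abandoning the rest of the entries.
func applyEntries(entries []entry, lastSeq uint64, apply func(entry) error) (uint64, error) {
	for _, e := range entries {
		if e.seq <= lastSeq {
			// Benign: we already have this sequence, skip and continue.
			log.Printf("skipping already applied seq %d", e.seq)
			continue
		}
		if err := apply(e); err != nil {
			return lastSeq, err
		}
		lastSeq = e.seq
	}
	return lastSeq, nil
}
```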
Signed-off-by: Derek Collison <derek@nats.io>
When we stored a message in the raft layer in a wrong position (state corrupt), we would panic, leaving the message there.
On restart we would truncate the WAL and try to repair, but we truncated to the wrong index of the bad entry.
This change also includes additional changes to truncateWAL and also reduces the conditional for panic on storeMsg.
Signed-off-by: Derek Collison <derek@nats.io>
Also, if we do not have room, trap add peer and process there.
Fixed a bug that would treat ephemerals same as durables during remapping after peer removal.
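Conceptually, the distinction being fixed looks like the check below (illustrative only; the real remapping logic is more involved): ephemerals are bound to their current server and should not be remapped like durables.
```go
package main

import "fmt"

type consumerConfig struct {
	Durable string // empty for ephemeral consumers
}

// shouldRemap reports whether a consumer should be remapped to a new peer
// after a peer removal: durables are remapped, ephemerals are not.
// (Illustrative check only.)
func shouldRemap(cfg consumerConfig) bool {
	return cfg.Durable != ""
}

func main() {
	fmt.Println(shouldRemap(consumerConfig{Durable: "orders"})) // true
	fmt.Println(shouldRemap(consumerConfig{}))                  // false
}
```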
Signed-off-by: Derek Collison <derek@nats.io>
When processing service imports we would swap out the accounts during processing.
With the addition of internal subscriptions and internal clients publishing in JetStream we had an issue with the wrong account being used.
This was specific to delayed pull subscribers trying to unsubscribe due to a max of 1 while other JetStream API calls were running concurrently.
Also fixed a case where a raft group was trying to catch up a consumer, the log was empty, we did have a snapshot, but the requested sequence was the first sequence.
Signed-off-by: Derek Collison <derek@nats.io>
If a subject in the system account's leafnode deny_imports matches `$NRG.>`,
then JetStream is explicitly disconnected and the server can become
leader.
Signed-off-by: Matthias Hanel <mh@synadia.com>
1. When in mixed mode and only running the global account, we will now check the account for JS.
2. Added code to decrease the cluster set size if we guessed wrong in mixed mode setup.
Signed-off-by: Derek Collison <derek@nats.io>