nats-server

mirror of https://github.com/gogrlx/nats-server.git synced 2026-04-17 11:24:44 -07:00

Author	SHA1	Message	Date
Derek Collison	dbfa47f9b1	Improve state preservation for consumers, specifically DeliverNew variants when no activity has been present. Signed-off-by: Derek Collison <derek@nats.io>	2022-03-16 20:55:14 -07:00
Ivan Kozlovic	b4128693ed	Ensure file path is correct during stream restore. Also had to change all references from `path.` to `filepath.` when dealing with files, so that it works properly on Windows. Also fixed lots of tests to defer the shutdown of the server until after the removal of the storage, and fixed some config file directories to use the single quote `'` to surround the file path, again to work on Windows. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-03-09 13:31:51 -07:00
Derek Collison	3216eb5ee5	When a consumer has no state we are now compacting the log, but were not snapshotting. This caused issues on leader change and losing quorum. Signed-off-by: Derek Collison <derek@nats.io>	2022-03-09 07:21:25 -05:00
Derek Collison	b759ff481f	Some users reporting checksums don't match and "no message cache" on recovery. Signed-off-by: Derek Collison <derek@nats.io>	2022-03-04 11:50:15 -08:00
Derek Collison	1b5f651c22	Fixed bug that would not recover a stream after non-clean shutdown with deleted messages. Signed-off-by: Derek Collison <derek@nats.io>	2022-03-04 10:48:10 -08:00
Derek Collison	2942d012f6	Merge pull request #2878 from nats-io/key_file_leak Cleanup key files when removing message blocks.	2022-02-17 13:26:41 -07:00
Derek Collison	330a40009c	Cleanup key files when removing message blocks. Signed-off-by: Derek Collison <derek@nats.io>	2022-02-17 11:33:41 -08:00
Derek Collison	4efce40bbd	Small improvements to send performance to a full stream. Cleaned up some locking and if fifo make index updates lazy like writeMsgRecord. Signed-off-by: Derek Collison <derek@nats.io>	2022-02-17 05:39:27 -08:00
Derek Collison	68104d7cf3	During a filestore snapshot we generate the fss files but were not cleaning them up if the block was deleted before a server restart. https://gist.github.com/nekufa/010185dfb59261f222a0042d3a7d2a1c Signed-off-by: Derek Collison <derek@nats.io>	2022-02-09 17:12:08 -08:00
Derek Collison	0cc7302be9	A stream name is tied to its identity and can not be changed on a restore. Signed-off-by: Derek Collison <derek@nats.io>	2022-02-09 12:38:45 -08:00
Derek Collison	c13a84cf44	Fixed a bug that would calculate the first sequence of a filteredPending incorrectly. Also added in more optimized version to select the first matching message in a message block for LoadNextMsg. Signed-off-by: Derek Collison <derek@nats.io>	2022-02-08 13:29:38 -08:00
Derek Collison	d50febeeff	Improved sparse consumers replay time. When a stream has multiple subjects and a consumer filters the stream to a small and spread out list of messages the logic would do a linear scan looking for the next message for the filtered consumer. This CL allows the store layer to utilize the per subject info to improve the times. Signed-off-by: Derek Collison <derek@nats.io>	2022-02-07 17:26:32 -08:00
Derek Collison	5da0453964	Add in NumSubjects to StreamInfo Signed-off-by: Derek Collison <derek@nats.io>	2022-02-02 08:51:13 -08:00
Derek Collison	6a3cf0f71e	Added in ability to get number of subjects from StreamInfo, and optionally details per subject on how many messages each subject has. This can also be filtered, meaning you can filter out the subjects when asking for details. Signed-off-by: Derek Collison <derek@nats.io>	2022-02-02 08:51:13 -08:00
Derek Collison	8fce45dfa7	Under certain scenarios the pending for a consumer could appear to get stuck. Under the covers we were calculating pending per msg block incorrectly when a single message existed beyond the requested sequence. Signed-off-by: Derek Collison <derek@nats.io>	2022-02-01 12:17:08 -08:00
Derek Collison	fa814f7cee	Fixed behavior for when MaxMsgsPerSubject is set and DiscardNew is also set. Signed-off-by: Derek Collison <derek@nats.io>	2022-01-31 08:36:37 -08:00
Derek Collison	8815072e34	Fix flapping test Signed-off-by: Derek Collison <derek@nats.io>	2022-01-30 14:54:24 -08:00
Derek Collison	275d42628b	Fix for #2828. The original design of the consumer and the subsequent store did not allow updates. Now that we do, we need to store the new config into our storage layer. Signed-off-by: Derek Collison <derek@nats.io>	2022-01-30 09:45:05 -08:00
Derek Collison	579bf336ad	Allow NAK to take a delay parameter to delay redelivery for a certain amount of time. Signed-off-by: Derek Collison <derek@nats.io>	2022-01-24 14:57:28 -08:00
Derek Collison	d07000cde0	If we detect negative deleted, adjust Signed-off-by: Derek Collison <derek@nats.io>	2022-01-24 10:52:46 -08:00
Derek Collison	420a2ef514	When rebuilding the complete state need to do this in a go routine. We did this properly above but forgot this one. Signed-off-by: Derek Collison <derek@nats.io>	2022-01-12 20:19:45 -08:00
Derek Collison	c5fbb63614	JetStream ephemeral consumers could create a situation where the server would exhaust the OS thread limit - default 10k. Under certain situations a large number of consumers that are racing to update state or delete their stores during a delete would start taking up OS threads due to blocking disk IO. When this happened and there were a bunch of Go routines becoming runnable the Go runtime would create extra OS threads to fill in the runnable pool and would exhaust the max thread setting. This code places a channel as a simple semaphore to limit the number of disk IO blocking OS threads. Signed-off-by: Derek Collison <derek@nats.io>	2021-12-29 07:05:34 -08:00
Derek Collison	b7c61cd0bf	Stabilize filestore to eliminate sporadic errPartialCache errors under certain situations. Related to #2732. The filestore would release a msgBlock lock while trying to load a cache block if it thought it needed to flush pending data. With async false, this should be very rare but was possible after careful inspection. I constructed an artificial test with sleeps throughout the filestore code to reproduce. It involved having 2 Go routines that were through and waiting on the last msg block, and another one that was writing. After the write, but before we flushed after releasing the lock we would also artificially sleep. This would lead to the second read seeing the cache load was already in progress and return no error. If the load was for a sequence before the current write sequence, and async was false, the cache fseq would be higher than what was requested. This would cause the errPartialCache to be returned. Once returned to the consumer level in loopAndGather, it would exit that Go routine and the consumer would cease to function. This change removed the unlock of a msgBlock to perform a flush, ensuring that two cacheLoads would not yield the errPartialCache. I also updated the consumer in the case this does happen in the future to not exit the loopAndGather Go routine. Signed-off-by: Derek Collison <derek@nats.io>	2021-12-27 09:54:02 -08:00
Derek Collison	89b94ae650	Improved selectMsgBlock with lots of messages. Also have fetchMsg return hint about clearing cache. Signed-off-by: Derek Collison <derek@nats.io>	2021-12-22 17:45:12 -08:00
Ivan Kozlovic	7c3c9ef1ee	[FIXED] JetStream: stream first/last sequence possibly reset A low-level Filestore issue would cause a new block to be created when the last block was empty, but the index for the new block would not be forced to be written on disk. The observed issue could be that with a stream with a WorkQueue retention policy, its first/last sequence values could be reset after a pull subscriber would have consumed all messages and the server was restarted without a clean shutdown. This would cause the pull subscriber to "stall" until enough new messages are sent to reach a stream sequence that catches up with the consumer's view of the stream first sequence prior to the restart. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-12-20 19:08:08 -07:00
Ivan Kozlovic	1b8878138a	[FIXED] JetStream: panic "could not decode consumer snapshot" Resolves #2720 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-12-08 12:22:03 -07:00
Ivan Kozlovic	9f30bf00e0	[FIXED] Corrupted headers receiving from consumer with meta-only. When a consumer is configured with "meta-only" option, and the stream was backed by a memory store, a memory corruption could happen causing the application to receive corrupted headers. Also replaced most usages of `append(a[:0:0], a...)` to make copies. This was based on this wiki: https://github.com/go101/go101/wiki/How-to-efficiently-clone-a-slice%3F But since Go 1.15, it is actually faster to call make+copy instead. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-12-01 10:50:15 -07:00
Derek Collison	e65f3d4a30	[FIXED #2706] - Only utilize full state with deleted details when really needed. Otherwise fast state will suffice. Signed-off-by: Derek Collison <derek@nats.io>	2021-11-29 10:50:28 -08:00
Derek Collison	63c4c23cae	Needed to undo since we already recorded Signed-off-by: Derek Collison <derek@nats.io>	2021-11-18 14:09:52 -08:00
Derek Collison	7e615a1de9	Handle skip msgs better, do not update mb stats, clear erased bit Signed-off-by: Derek Collison <derek@nats.io>	2021-11-18 13:59:29 -08:00
Derek Collison	14469ccfc8	Fix for #2662. Upon server restart a server would set the check expiration to the configured amount vs delta of next to expire. Signed-off-by: Derek Collison <derek@nats.io>	2021-11-01 18:04:37 -07:00
Derek Collison	3a14a984fc	Fix for a bug that did not properly decode redelivered state for consumers from a filestore. This also caused state abnormalities in a user's setup so added code to clean up bad state as needed. Signed-off-by: Derek Collison <derek@nats.io>	2021-10-28 08:33:48 -07:00
Derek Collison	cc4f802e09	Optimize compaction under heavy KV use Signed-off-by: Derek Collison <derek@nats.io>	2021-10-26 08:39:22 -07:00
Derek Collison	06168083c7	Fix for #2622. We were not escaping the top level iterator across message blocks when calculating when to break due to keep > 0. Signed-off-by: Derek Collison <derek@nats.io>	2021-10-14 09:25:21 -07:00
Derek Collison	075e8c9070	Make sure wp is > len(cache.buf) Signed-off-by: Derek Collison <derek@nats.io>	2021-09-22 14:46:31 -07:00
Derek Collison	de851e513f	Fix for #2548. Replicated durable consumers that were backed by a memory store were bypassing snapshotting which also did compaction of the raft WAL. This change adapts for memory store backed consumers by compacting the raft WAL directly on snapshot logic. Signed-off-by: Derek Collison <derek@nats.io>	2021-09-21 08:02:11 -07:00
Derek Collison	4283358dcd	Improvements to writeIndexInfo logic and managing open FDs. Also hold lock while doing sync and optionally close FDs if idle. Signed-off-by: Derek Collison <derek@nats.io>	2021-09-19 11:45:16 -07:00
Derek Collison	7a4c904761	Improvements to cache management. Signed-off-by: Derek Collison <derek@nats.io>	2021-09-18 15:21:12 -07:00
Derek Collison	620b56e12f	During compaction the cache may not be loaded completely if msg block was lmb (active writing). This could lead to the filtered subject state being incorrect. Signed-off-by: Derek Collison <derek@nats.io>	2021-09-13 14:36:50 -07:00
Derek Collison	f75371022d	Fix for issue #2488. When we triggered a filestore msg block compact we were not properly dealing with interior deletes. Subsequent lookups past the skipped messages would cause an error and stop delivering messages. Signed-off-by: Derek Collison <derek@nats.io>	2021-09-09 09:53:22 -07:00
Derek Collison	2b2c4ba4a6	Bump Go test timeout Signed-off-by: Derek Collison <derek@nats.io>	2021-09-07 08:20:54 -07:00
Derek Collison	29eaa9c614	Fixed bug that could lead to perceived message loss. Under load and pressure from concurrent publishing and consuming with multiple consumers the filestore would return a partial or no cache error to the upper layers. For consumers this could result in us skipping a stream sequence when we should not. This change stabilizes the filestore and removes the flush state for msg blocks. I also found some bugs that did not track last sequence properly after snapshots / restore. Signed-off-by: Derek Collison <derek@nats.io>	2021-09-05 16:36:23 -07:00
Derek Collison	4b97f98d18	Merge pull request #2467 from nats-io/slow_encrypt Do not use crypto rand for nonce generation.	2021-08-25 14:09:27 -07:00
Derek Collison	ba4937f04e	The slowdown was due to trying to expire messages without a proper index info. So now we read and encrypt index info in place as well. Signed-off-by: Derek Collison <derek@nats.io>	2021-08-25 13:22:18 -07:00
Derek Collison	4a6f1b4819	Do not use crypto rand for nonce generation. Crypto rand is not needed for nonce generation and could drain entropy. Signed-off-by: Derek Collison <derek@nats.io>	2021-08-24 12:51:13 -07:00
Derek Collison	752fd295a5	Consumer num pending fixes for multiple matches and merging. Signed-off-by: Derek Collison <derek@nats.io>	2021-08-24 07:52:29 -07:00
Derek Collison	12c912d7f4	Only compact when msg is not first. Make sure compact works with snapshots. Signed-off-by: Derek Collison <derek@nats.io>	2021-08-20 06:47:53 -07:00
Derek Collison	ea040b77ef	Updates based on feedback Signed-off-by: Derek Collison <derek@nats.io>	2021-08-19 19:04:36 -07:00
Derek Collison	d349edeeb6	When a JetStream stream was used as a KV, there could be times where we have lots of file storage unused. This change introduces utilization, better interior block deletes, and individual block compaction when we are below 50% utilization of the block. Signed-off-by: Derek Collison <derek@nats.io>	2021-08-19 18:24:41 -07:00
Derek Collison	a5afa86790	Merge pull request #2453 from nats-io/encrypt-checks Add in additional checks for failures during filestore encryption.	2021-08-17 14:55:41 -07:00

201 Commits
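
Commit 9f30bf00e0 above mentions two ways of copying a byte slice: the `append(a[:0:0], a...)` idiom and make+copy. The sketch below only shows that both produce a copy that is independent of the original buffer. The function names cloneAppend and cloneMakeCopy and the sample header bytes are hypothetical, and no performance claim is made here beyond what the commit message itself states.

```go
package main

import "fmt"

// cloneAppend copies b with the append idiom: b[:0:0] has zero length and
// zero capacity, so append must allocate a new backing array.
func cloneAppend(b []byte) []byte {
	return append(b[:0:0], b...)
}

// cloneMakeCopy copies b by allocating a slice of the same length and
// copying the contents into it.
func cloneMakeCopy(b []byte) []byte {
	c := make([]byte, len(b))
	copy(c, b)
	return c
}

func main() {
	hdr := []byte("NATS/1.0\r\n\r\n") // sample bytes for illustration only
	a, c := cloneAppend(hdr), cloneMakeCopy(hdr)

	// Mutating the original must not affect either copy.
	hdr[0] = 'X'
	fmt.Println(string(a[:4]), string(c[:4]), string(hdr[:4]))
}
```

Holding on to a slice that still points into a shared buffer, rather than a copy like these, is the kind of aliasing that can surface as corrupted data once the buffer is reused.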