When a downstream stream uses retention modes that delete messages, fall back to a time-based start time for the new source consumers.
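A rough sketch of the idea (the stand-in types below mirror the shape of the server's consumer config fields; the helper and resume logic are illustrative, not the server's actual code, and assume `import "time"`):
```
// Stand-ins mirroring the shape of the server's consumer configuration.
type RetentionPolicy int

const (
	LimitsPolicy RetentionPolicy = iota
	InterestPolicy
	WorkQueuePolicy
)

type ConsumerConfig struct {
	OptStartSeq  uint64     // resume by stream sequence
	OptStartTime *time.Time // resume by time
}

// sourceConsumerStart resumes by sequence only when retention never
// deletes messages; otherwise the saved sequence may already be gone,
// so we fall back to a time-based start.
func sourceConsumerStart(r RetentionPolicy, resumeSeq uint64, last time.Time) ConsumerConfig {
	if r == LimitsPolicy {
		return ConsumerConfig{OptStartSeq: resumeSeq + 1}
	}
	return ConsumerConfig{OptStartTime: &last}
}
```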
Signed-off-by: Derek Collison <derek@nats.io>
Data race that has been seen:
```
Read at 0x00c00134bec0 by goroutine 159:
github.com/nats-io/nats-server/v2/server.(*client).msgHeaderForRouteOrLeaf()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:2935 +0x254
github.com/nats-io/nats-server/v2/server.(*client).processMsgResults()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4364 +0x2147
(...)
Previous write at 0x00c00134bec0 by goroutine 201:
github.com/nats-io/nats-server/v2/server.(*Server).addRoute()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:1475 +0xdb4
github.com/nats-io/nats-server/v2/server.(*client).processRouteInfo()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:641 +0x1704
```
Also fixed some flappers and removed use of `s.js.` since we have
already captured `js` in Jsz monitoring.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Also fixed a bug that could cause memory-based replicated consumers to stop working after snapshots and server restarts.
The snapshot logic would allow non-state-changing updates to continuously grow the Raft logs. We were also too conservative about when we snapshotted and why.
Also added the ability for FileStore.Compact() to reclaim space in the block file from the head of the last changed block.
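A minimal sketch of what reclaiming head space can look like (illustrative, not the actual FileStore code; standard library only):
```
import "os"

// compactHead rewrites a block file keeping only the bytes from off
// onward, then atomically swaps it into place, reclaiming head space.
func compactHead(fname string, off int64) error {
	buf, err := os.ReadFile(fname)
	if err != nil {
		return err
	}
	if off <= 0 || off >= int64(len(buf)) {
		return nil // nothing to reclaim
	}
	tmp := fname + ".tmp"
	if err := os.WriteFile(tmp, buf[off:], 0640); err != nil {
		return err
	}
	return os.Rename(tmp, fname)
}
```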
Signed-off-by: Derek Collison <derek@nats.io>
Under contention on the head write block, the system could perform worse memory-wise than simply relying on the Go runtime.
We also held references to the subjects of messages that bloated memory.
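The subject issue is the classic Go subslice gotcha; a hedged illustration:
```
// Illustrative only: keeping a subslice of a block buffer pins the whole
// backing array in memory.
func subjectOf(block []byte, off, ln int) string {
	// BAD:  subj := block[off : off+ln] // retains the entire block
	// GOOD: the string conversion copies just the subject bytes, so the
	// large block buffer can be freed or reused.
	return string(block[off : off+ln])
}
```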
Signed-off-by: Derek Collison <derek@nats.io>
Previously we relied more heavily on Go's garbage collector: when we loaded a block for an underlying stream we would pass references upward to avoid copies.
Now we always copy when passing back to the upper layers, which allows us to not only expire our cache blocks but also pool and reuse them.
The upper layers were also changed so that their pooling layer can optionally interoperate with the storage layer.
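A sketch of the copy-then-recycle pattern described above (buffer size and names are illustrative):
```
import "sync"

// Pool of block buffers; pooling pointers avoids an extra allocation on Put.
var blkPool = sync.Pool{
	New: func() any {
		b := make([]byte, 64*1024)
		return &b
	},
}

// loadMsg hands the upper layers their own copy, so no references into
// the cached block escape upward.
func loadMsg(blk []byte, off, ln int) []byte {
	out := make([]byte, ln)
	copy(out, blk[off:off+ln])
	return out
}

// releaseBlock can now safely recycle the buffer instead of leaving it
// for the garbage collector.
func releaseBlock(blk *[]byte) {
	blkPool.Put(blk)
}
```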
Also fixed some flappers and a bug where the de-dupe state might not be re-formed correctly.
Signed-off-by: Derek Collison <derek@nats.io>
Also had to change all references from `path.` to `filepath.` when
dealing with files, so that it works properly on Windows.
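The distinction, for reference:
```
import (
	"path"
	"path/filepath"
)

p1 := path.Join("store", "msgs")     // always "store/msgs", on every OS
p2 := filepath.Join("store", "msgs") // `store\msgs` on Windows
```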
Also fixed lots of tests to defer the shutdown of the server
after the removal of the storage, and fixed some config file
directories to use the single quote `'` to surround the file path,
again to work on Windows.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Under the covers we were calculating pending per msg block incorrectly when a single message existed beyond the requested sequence.
Signed-off-by: Derek Collison <derek@nats.io>
This allows a consumer to have exponential backoffs vs static AckWait and MaxDeliver.
When BackOff is set it will override AckWait to BackOff[0], and MaxDeliver will be len(BackOff)+1.
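For example, with the nats.go client (assuming a connected JetStreamContext `js`; stream and durable names are illustrative):
```
_, err := js.AddConsumer("ORDERS", &nats.ConsumerConfig{
	Durable:   "worker",
	AckPolicy: nats.AckExplicitPolicy,
	// BackOff[0] takes the place of AckWait, and MaxDeliver becomes
	// len(BackOff)+1, i.e. 4 delivery attempts here.
	BackOff: []time.Duration{time.Second, 5 * time.Second, 30 * time.Second},
})
```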
Signed-off-by: Derek Collison <derek@nats.io>
A low-level Filestore issue would cause a new block to be created
when the last block was empty, but the index for the new block
would not be forced to be written on disk.
The observed issue was that a stream with a WorkQueue
retention policy could have its first/last sequence values reset
after a pull subscriber had consumed all messages and
the server was restarted without a clean shutdown.
This would cause the pull subscriber to "stall" until enough
new messages are sent to reach a stream sequence that catches
up with the consumer's view of the stream first sequence prior
to the restart.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
We were not escaping the top-level iterator across message blocks when calculating when to break due to `keep > 0`.
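The fix is essentially a labeled break; a sketch with illustrative names:
```
outer:
	for _, mb := range blocks {
		for seq := mb.first; seq <= mb.last; seq++ {
			if remaining <= keep {
				// A plain break only escaped this inner loop; the label
				// also escapes the top-level iterator over message blocks.
				break outer
			}
			remaining--
		}
	}
```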
Signed-off-by: Derek Collison <derek@nats.io>
When we triggered a filestore msg block compact we were not properly dealing with interior deletes.
Subsequent lookups past the skipped messages would cause an error and stop delivering messages.
Signed-off-by: Derek Collison <derek@nats.io>
Under load and pressure from concurrent publishing and consuming with multiple consumers, the filestore would
return a partial-cache or no-cache error to the upper layers. For consumers this could result in us skipping a stream sequence when we should not.
This change stabilizes the filestore and removes the flush state for msg blocks. I also found some bugs where we did not track the last sequence properly
after snapshots / restore.
Signed-off-by: Derek Collison <derek@nats.io>
This change was made in a previous PR with this commit:
9405b77e46
After some discussions, we agreed that the original approach
was best, so we are using a dedicated object, SequenceInfo, for ConsumerInfo.
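For reference, the dedicated object has roughly this shape in the server package:
```
type SequenceInfo struct {
	Consumer uint64     `json:"consumer_seq"`
	Stream   uint64     `json:"stream_seq"`
	Last     *time.Time `json:"last_active,omitempty"`
}
```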
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This change introduces utilization tracking, better interior block deletes, and individual block compaction when we are below 50% utilization of the block.
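The 50% trigger reduces to a simple check like this (illustrative):
```
// shouldCompact reports whether a block's live bytes have fallen below
// half of what the block occupies on disk.
func shouldCompact(liveBytes, diskBytes uint64) bool {
	return diskBytes > 0 && liveBytes*2 < diskBytes
}
```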
Signed-off-by: Derek Collison <derek@nats.io>
During normal operation and quick restarts the number of expired messages per cycle is manageable and correct.
However, if a server is shut down for quite a long time and many messages have expired, this process is too slow.
This commit introduces an optimized expiration tailored for startup vs running state.
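A sketch of what a startup fast path can look like (names are illustrative, not the actual code): since blocks are ordered oldest-first, whole blocks past MaxAge can be dropped in one step, with per-message expiry only at the boundary:
```
cutoff := time.Now().Add(-maxAge)
for _, mb := range blocks {
	if mb.lastTime.Before(cutoff) {
		removeBlock(mb) // every message in this block has expired
		continue
	}
	expireMsgsInBlock(mb, cutoff) // boundary block: per-message expiry
	break                         // remaining blocks are newer
}
```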
Signed-off-by: Derek Collison <derek@nats.io>
1. We were holding open FDs longer than we should for consumers, causing issues with open FD limits. We now do not hold them open and cap updates a bit better (see the sketch after this list).
2. When doing a stream delete, consumer delete was repeating a lot of work that was not necessary, causing longer delays. This has been optimized a bit; more improvements remain to be made.
3. We cover all of JS under a single export, but that was also trapping GetNext for pull-based consumers, and since this is a no-op (it is handled at the user account level) we were creating a lot of garbage service import responses and reverse map entries that had to be garbage collected. We have a fix in to avoid this but are still looking for a better one.
4. We still had some lingering references to all exports vs the single JS export.
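A sketch of the open-on-demand pattern from item 1 (helper name is illustrative):
```
import "os"

// writeState opens the consumer state file only for the duration of a
// write, instead of keeping an FD open per consumer.
func writeState(fname string, buf []byte) error {
	f, err := os.OpenFile(fname, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0640)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.Write(buf)
	return err
}
```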
Signed-off-by: Derek Collison <derek@nats.io>
We had issues of instability and incorrect behavior during concurrent operations.
This CL optimizes expiring msgs to be more efficient and now holds the lock until completion.
Compact will also now hold the top-level lock through completion.
Signed-off-by: Derek Collison <derek@nats.io>
Currently in tests, we have calls to os.Remove and os.RemoveAll where we
don't check the returned error. This hides useful error messages when
tests fail to run, such as "too many open files".
This change checks for more filesystem-related errors and calls t.Fatal
if there is an error.
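For example, a helper along these lines (name illustrative):
```
import (
	"os"
	"testing"
)

// removeDir fails the test loudly instead of silently ignoring filesystem
// errors such as "too many open files".
func removeDir(t *testing.T, dir string) {
	t.Helper()
	if err := os.RemoveAll(dir); err != nil {
		t.Fatalf("Error removing directory %q: %v", dir, err)
	}
}
```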
Currently, temporary test files and directories are written in lots of
different paths within the OS's temp dir. This makes it hard to know
which files are from nats-server and which are unrelated. This in turn
makes it hard to clean up nats-server test files.
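One way to consolidate them (a sketch; helper and prefix are illustrative) is to create every test directory under a recognizable prefix in the OS temp dir:
```
// createDir groups all nats-server test artifacts under one prefix so
// they are easy to identify and clean up.
func createDir(t *testing.T, prefix string) string {
	t.Helper()
	dir, err := os.MkdirTemp(os.TempDir(), prefix)
	if err != nil {
		t.Fatalf("Error creating temp directory: %v", err)
	}
	return dir
}

// Usage in a test: dir := createDir(t, "nats-server-test-")
```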