Commit Graph

214 Commits

Author SHA1 Message Date
Ivan Kozlovic
50c3986863 [FIXED] JetStream stream catchup issues
- A stream could become leader when it should not, causing
messages to be lost.
- A catchup could stall because the server sending data
could bail out of the runCatchup routine but still send
the EOF signal.
- Deadlock with monitoring of Jsz

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-12 16:05:12 -06:00
Derek Collison
e7ff38a4ca Add consumerMemStore impl to allow proper replication of state.
Resolves #3006

Signed-off-by: Derek Collison <derek@nats.io>
2022-04-10 08:01:13 -07:00
Derek Collison
ef9728997d During recovery check our guess on the last block.
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-05 19:20:31 -07:00
Derek Collison
ab5e2344e0 When loading blocks in, use len(mb.fss) to determine if we can use sfilter optimization.
Also check fs.lmb when the stream config is updated.

Signed-off-by: Derek Collison <derek@nats.io>
2022-04-05 18:49:21 -07:00
Ivan Kozlovic
371ce36712 [IMPROVED] Stream with multiple subjects and consumer with filter
This is more of a regression introduced in v2.7.3 (with PR #2848).
When the store has a list of subjects, finding the next message
to deliver would go through the subjects map and have to match
to find out if it is a subset (if the filter had a wildcard).
In situations where there were lots of subjects (for instance 1
message per subject), but the consumer did not filter on anything
specific, then this processing was becoming slow.

We now check that if the stream has a single subject (even with
wildcard) and the consumer filters on that exact subject, then
we can do a linear scan. We also do a linear scan if the number
of messages in the block is 1/2 the number of subjects in the
subjects map.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-05 18:19:17 -06:00
Derek Collison
607858f213 Improved consumer snapshot logic in clustered mode and disk usage.
Also fixed a bug that could cause memory based replicated consumers to no longer work after snapshots and server restarts.

The snapshot logic would allow non-state changing updates to continuously grow the raft logs. We also were too conservative on when we snapshotted and why.
Also added in ability to have FileStore.Compact() reclaim space from the block file from the head of last changed block.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-29 18:02:49 -07:00
Derek Collison
780d4c0dd8 Merge pull request #2960 from nats-io/mem_pool
Additional improvements to memory pooling and management.
2022-03-28 17:10:16 -07:00
Derek Collison
bd0a0b28c7 When recycling blocks we could potentially place partials into a tier. This would possibly cause the load code to thrash since it would not be big enough for a full block and we would need to recycle again and make a new one.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-28 16:46:46 -07:00
Ivan Kozlovic
f82eda30aa Fix map init
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-28 17:46:01 -06:00
Ivan Kozlovic
909c6754cb Changed subjString to accept a byte slice
This may prevent memory copies when not necessary. Also fixed a bug
there that would check twice if there was only 1 subject and that
subject did not match (say configured subject is foo.* and key is
foo.bar).

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-28 17:37:28 -06:00
Derek Collison
5e5aab378e Additional improvements to memory pooling and management. Also logic fix for firstMatching that did unnecessary work when matching all.
During contention to the head write blk, the system could perform worse memory wise compared to simple go runtime.
Also had some references for the subject of messages bloating memory.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-28 10:15:23 -07:00
Derek Collison
04d4f08e8c Under heavy contention skip combined with remove could result in index being stamped with underflow for number of messages.
We had a report of a panic on server restart with 2.8.0-beta.1. The panic was trying to malloc the size of a load block based off of the number of messages we thought the block had from the index.
Before, SkipMsg would decrement and when we added the record via writeMsgRecord we would add it back in. However we did release the lock, meaning other things could run.
If in between the decrement, say to 0 (we did protect against underflow there), then a remove and subsequent writeIndexInfo would stamp and underflow.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-26 11:05:38 -07:00
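The panic described in commit 04d4f08e8c comes down to an unsigned counter wrapping around and then being used as an allocation size. The sketch below only illustrates that failure mode and a guarded decrement; `blockStats` and `removeMsg` are hypothetical names and are not taken from the filestore code, and this is not the actual fix.

```go
package main

import (
	"fmt"
	"math"
)

// blockStats is a hypothetical stand-in for the per-block message count
// that ends up stamped into the index.
type blockStats struct {
	msgs uint64
}

// removeMsg decrements the count only when it is non-zero, so the value
// can never wrap around. It reports whether a decrement happened.
func (b *blockStats) removeMsg() bool {
	if b.msgs == 0 {
		return false
	}
	b.msgs--
	return true
}

func main() {
	// An unguarded decrement of an unsigned zero wraps to the maximum value.
	var msgs uint64
	msgs--
	fmt.Println(msgs == math.MaxUint64) // true

	// The guarded version leaves the counter at zero.
	b := &blockStats{}
	fmt.Println(b.removeMsg(), b.msgs) // false 0
}
```

A count of that size, if trusted on restart to size a buffer, would explain an allocation failure like the one reported.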
Derek Collison
ef8f543ea5 Improve memory usage through JetStream storage layer.
Previously we would rely more heavily on Go's garbage collector since when we loaded a block for an underlying stream we would pass references upward to avoid copies.
Now we always copy when passing back to the upper layers which allows us to not only expire our cache blocks but pool and reuse them.

The upper layers also had changes made to allow the pooling layer at that level to interoperate with the storage layer optionally.

Also fixed some flappers and a bug where de-dupe might not be reformed correctly.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 17:45:15 -06:00
Derek Collison
dbfa47f9b1 Improve state preservation for consumers, specifically DeliverNew variants when no activity has been present.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-16 20:55:14 -07:00
Ivan Kozlovic
b4128693ed Ensure file path is correct during stream restore
Also had to change all references from `path.` to `filepath.` when
dealing with files, so that it works properly on Windows.

Fixed also lots of tests to defer the shutdown of the server
after the removal of the storage, and fixed some config files
directories to use the single quote `'` to surround the file path,
again to work on Windows.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-09 13:31:51 -07:00
Derek Collison
3216eb5ee5 When a consumer has no state we are now compacting the log, but were not snapshotting.
This caused issues on leader change and losing quorum.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-09 07:21:25 -05:00
Derek Collison
b759ff481f Some users reporting checksums don't match and "no message cache" on recovery.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-04 11:50:15 -08:00
Derek Collison
1b5f651c22 Fixed bug that would not recover a stream after non-clean shutdown with deleted messages.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-04 10:48:10 -08:00
Derek Collison
2942d012f6 Merge pull request #2878 from nats-io/key_file_leak
Cleanup key files when removing message blocks.
2022-02-17 13:26:41 -07:00
Derek Collison
330a40009c Cleanup key files when removing message blocks.
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-17 11:33:41 -08:00
Derek Collison
4efce40bbd Small improvements to send performance to a full stream.
Cleaned up some locking and if fifo make index updates lazy like writeMsgRecord.

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-17 05:39:27 -08:00
Derek Collison
68104d7cf3 During a filestore snapshot we generate the fss files but were not cleaning them up if the block was deleted before a server restart.
https://gist.github.com/nekufa/010185dfb59261f222a0042d3a7d2a1c

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-09 17:12:08 -08:00
Derek Collison
0cc7302be9 A stream name is tied to its identity and can not be changed on a restore.
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-09 12:38:45 -08:00
Derek Collison
c13a84cf44 Fixed a bug that would calculate the first sequence of a filteredPending incorrectly.
Also added in more optimized version to select the first matching message in a message block for LoadNextMsg.

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-08 13:29:38 -08:00
Derek Collison
d50febeeff Improved sparse consumers replay time.
When a stream has multiple subjects and a consumer filters the stream to a small and spread out list of messages the logic would do a linear scan looking for the next message for the filtered consumer.
This CL allows the store layer to utilize the per subject info to improve the times.

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-07 17:26:32 -08:00
Derek Collison
5da0453964 Add in NumSubjects to StreamInfo
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-02 08:51:13 -08:00
Derek Collison
6a3cf0f71e Added in ability to get number of subjects from StreamInfo, and optionally details per subject on how many messages each subject has.
This can also be filtered, meaning you can filter out the subjects when asking for details.

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-02 08:51:13 -08:00
Derek Collison
8fce45dfa7 Under certain scenarios the pending for a consumer could appear to get stuck.
Under the covers we were calculating pending per msg block incorrectly when a single message existed beyond the requested sequence.

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-01 12:17:08 -08:00
Derek Collison
fa814f7cee Fixed behavior for when MaxMsgsPerSubject is set and DiscardNew is also set.
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-31 08:36:37 -08:00
Derek Collison
8815072e34 Fix flapping test
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-30 14:54:24 -08:00
Derek Collison
275d42628b Fix for #2828. The original design of the consumer and the subsequent store did not allow updates.
Now that we do, we need to store the new config into our storage layer.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-30 09:45:05 -08:00
Derek Collison
579bf336ad Allow NAK to take a delay parameter to delay redelivery for a certain amount of time.
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-24 14:57:28 -08:00
Derek Collison
d07000cde0 If we detect negative deleted, adjust
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-24 10:52:46 -08:00
Derek Collison
420a2ef514 When rebuilding the complete state need to do this in a go routine.
We did this properly above but forgot this one.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-12 20:19:45 -08:00
Derek Collison
c5fbb63614 JetStream ephemeral consumers could create a situation where the server would exhaust the OS thread limit - default 10k.
Under certain situations a large number of consumers that are racing to update state or delete their stores during a delete
would start taking up OS threads due to blocking disk IO. When this happened and there were a bunch of Go routines becoming
runnable the Go runtime would create extra OS threads to fill in the runnable pool and would exhaust the max thread setting.

This code places a channel as a simple semaphore to limit the number of disk IO blocking OS threads.

Signed-off-by: Derek Collison <derek@nats.io>
2021-12-29 07:05:34 -08:00
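Commit c5fbb63614 describes using a channel as a simple semaphore to cap the number of goroutines blocked in disk IO. The general pattern can be sketched as below. The names `ioSem` and `runLimited` and the limit of 4 are assumptions for illustration and do not come from the server code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// ioSem is a counting semaphore built from a buffered channel.
// The channel capacity is the maximum number of concurrent holders.
type ioSem chan struct{}

func newIOSem(n int) ioSem { return make(ioSem, n) }

func (s ioSem) acquire() { s <- struct{}{} } // blocks once n holders exist
func (s ioSem) release() { <-s }

// runLimited starts one goroutine per task, runs work under the semaphore,
// and returns the highest number of goroutines seen inside work at once.
func runLimited(tasks, limit int, work func()) int64 {
	sem := newIOSem(limit)
	var wg sync.WaitGroup
	var cur, peak int64
	for i := 0; i < tasks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem.acquire()
			defer sem.release()
			n := atomic.AddInt64(&cur, 1)
			// Record the peak concurrency observed so far.
			for {
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			work() // stands in for the blocking disk IO
			atomic.AddInt64(&cur, -1)
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	peak := runLimited(64, 4, func() {})
	fmt.Println("peak within limit:", peak >= 1 && peak <= 4)
}
```

Goroutines waiting on the channel are parked by the Go scheduler rather than blocked in a system call, so they do not each hold an OS thread while they wait.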
Derek Collison
b7c61cd0bf Stabilize filestore to eliminate sporadic errPartialCache errors under certain situations. Related to #2732
The filestore would release a msgBlock lock while trying to load a cache block if it thought it needed to flush pending data.
With async false, this should be very rare but was possible after careful inspection.

I constructed an artificial test with sleeps throughout the filestore code to reproduce.
It involved having 2 Go routines that were through and waiting on the last msg block, and another one that was writing.
After the write, but before we flushed after releasing the lock we would also artificially sleep.
This would lead to the second read seeing the cache load was already in progress and return no error.
If the load was for a sequence before the current write sequence, and async was false, the cache fseq would be higher than what was requested.
This would cause the errPartialCache to be returned.

Once returned to the consumer level in loopAndGather, it would exit that Go routine and the consumer would cease to function.

This change removed the unlock of a msgBlock to perform a flush, ensuring that two cacheLoads would not yield the errPartialCache.

I also updated the consumer in the case this does happen in the future to not exit the loopAndGather Go routine.

Signed-off-by: Derek Collison <derek@nats.io>
2021-12-27 09:54:02 -08:00
Derek Collison
89b94ae650 Improved selectMsgBlock with lots of messages. Also have fetchMsg return hint about clearing cache.
Signed-off-by: Derek Collison <derek@nats.io>
2021-12-22 17:45:12 -08:00
Ivan Kozlovic
7c3c9ef1ee [FIXED] JetStream: stream first/last sequence possibly reset
A low-level Filestore issue would cause a new block to be created
when the last block was empty, but the index for the new block
would not be forced to be written on disk.

The observed issue could be that with a stream with a WorkQueue
retention policy, its first/last sequence values could be reset
after a pull subscriber would have consumed all messages and
the server was restarted without a clean shutdown.
This would cause the pull subscriber to "stall" until enough
new messages are sent to reach a stream sequence that catches
up with the consumer's view of the stream first sequence prior
to the restart.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-20 19:08:08 -07:00
Ivan Kozlovic
1b8878138a [FIXED] JetStream: panic "could not decode consumer snapshot"
Resolves #2720

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-08 12:22:03 -07:00
Ivan Kozlovic
9f30bf00e0 [FIXED] Corrupted headers receiving from consumer with meta-only
When a consumer is configured with "meta-only" option, and the
stream was backed by a memory store, a memory corruption could
happen causing the application to receive corrupted headers.

Also replaced most of usage of `append(a[:0:0], a...)` to make
copies. This was based on this wiki:
https://github.com/go101/go101/wiki/How-to-efficiently-clone-a-slice%3F

But since Go 1.15, it is actually faster to call make+copy instead.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-01 10:50:15 -07:00
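The two slice-cloning idioms mentioned in commit 9f30bf00e0 can be compared side by side. Both return a copy that does not share memory with the source, which is the property that matters when a buffer is reused after its contents are handed out. The helper names `cloneAppend` and `cloneCopy` are made up for this sketch.

```go
package main

import "fmt"

// cloneAppend copies a by appending onto a zero-length, zero-capacity
// slice, which forces a fresh backing array for non-empty input.
func cloneAppend(a []byte) []byte {
	return append(a[:0:0], a...)
}

// cloneCopy allocates the destination explicitly and copies into it.
func cloneCopy(a []byte) []byte {
	b := make([]byte, len(a))
	copy(b, a)
	return b
}

func main() {
	src := []byte("NATS/1.0\r\n")
	c1, c2 := cloneAppend(src), cloneCopy(src)

	// Mutating the source must not affect either clone.
	src[0] = 'X'
	fmt.Println(string(c1[:4]), string(c2[:4])) // NATS NATS
}
```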
Derek Collison
e65f3d4a30 [FIXED #2706] - Only utilize full state with deleted details when really needed. Otherwise fast state will suffice.
Signed-off-by: Derek Collison <derek@nats.io>
2021-11-29 10:50:28 -08:00
Derek Collison
63c4c23cae Needed to undo since we already recorded
Signed-off-by: Derek Collison <derek@nats.io>
2021-11-18 14:09:52 -08:00
Derek Collison
7e615a1de9 Handle skip msgs better, do not update mb stats, clear erased bit
Signed-off-by: Derek Collison <derek@nats.io>
2021-11-18 13:59:29 -08:00
Derek Collison
14469ccfc8 Fix for #2662.
Upon server restart a server would set the check expiration to the configured amount vs delta of next to expire.

Signed-off-by: Derek Collison <derek@nats.io>
2021-11-01 18:04:37 -07:00
Derek Collison
3a14a984fc Fix for a bug that did not properly decode redelivered state for consumers from a filestore.
This also caused state abnormalities in a user's setup so added code to clean up bad state as needed.

Signed-off-by: Derek Collison <derek@nats.io>
2021-10-28 08:33:48 -07:00
Derek Collison
cc4f802e09 Optimize compaction under heavy KV use
Signed-off-by: Derek Collison <derek@nats.io>
2021-10-26 08:39:22 -07:00
Derek Collison
06168083c7 Fix for #2622.
We were not escaping the top level iterator across message blocks when calculating when to break due to keep > 0.

Signed-off-by: Derek Collison <derek@nats.io>
2021-10-14 09:25:21 -07:00
Derek Collison
075e8c9070 Make sure wp is > len(cache.buf)
Signed-off-by: Derek Collison <derek@nats.io>
2021-09-22 14:46:31 -07:00
Derek Collison
de851e513f Fix for #2548
Replicated durable consumers that were backed by a memory store were bypassing snapshotting which also did compaction of the raft WAL.
This change adapts for memory store backed consumers by compacting the raft WAL directly on snapshot logic.

Signed-off-by: Derek Collison <derek@nats.io>
2021-09-21 08:02:11 -07:00
Derek Collison
4283358dcd Improvements to writeIndexInfo logic and managing open FDs.
Also hold lock while doing sync and optionally close FDs if idle.

Signed-off-by: Derek Collison <derek@nats.io>
2021-09-19 11:45:16 -07:00