
181 Commits

Author SHA1 Message Date
Derek Collison
420a2ef514 When rebuilding the complete state need to do this in a go routine.
We did this properly above but forgot this one.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-12 20:19:45 -08:00
Derek Collison
c5fbb63614 JetStream ephemeral consumers could create a situation where the server would exhaust the OS thread limit - default 10k.
Under certain conditions, a large number of consumers racing to update state or delete their stores during a delete
would start taking up OS threads due to blocking disk IO. When this happened and there were a bunch of Go routines becoming
runnable, the Go runtime would create extra OS threads to fill in the runnable pool and would exhaust the max thread setting.

This code places a channel as a simple semaphore to limit the number of disk IO blocking OS threads.

Signed-off-by: Derek Collison <derek@nats.io>
2021-12-29 07:05:34 -08:00
Derek Collison
b7c61cd0bf Stabilize filestore to eliminate sporadic errPartialCache errors under certain situations. Related to #2732
The filestore would release a msgBlock lock while trying to load a cache block if it thought it needed to flush pending data.
With async false, this should be very rare, but careful inspection showed it was possible.

I constructed an artificial test with sleeps throughout the filestore code to reproduce.
It involved two Go routines reading and waiting on the last msg block, and another one that was writing.
After the write, but before the flush that followed releasing the lock, we would also artificially sleep.
This would lead to the second read seeing that a cache load was already in progress and returning no error.
If the load was for a sequence before the current write sequence, and async was false, the cache fseq would be higher than what was requested.
This would cause the errPartialCache to be returned.

Once returned to the consumer level in loopAndGather, it would exit that Go routine and the consumer would cease to function.

This change removed the unlock of a msgBlock to perform a flush, ensuring that two cacheLoads would not yield the errPartialCache.

I also updated the consumer in the case this does happen in the future to not exit the loopAndGather Go routine.

Signed-off-by: Derek Collison <derek@nats.io>
2021-12-27 09:54:02 -08:00
Derek Collison
89b94ae650 Improved selectMsgBlock when there are lots of messages. Also have fetchMsg return a hint about clearing the cache.
Signed-off-by: Derek Collison <derek@nats.io>
2021-12-22 17:45:12 -08:00
Ivan Kozlovic
7c3c9ef1ee [FIXED] JetStream: stream first/last sequence possibly reset
A low-level Filestore issue would cause a new block to be created
when the last block was empty, but the index for the new block
would not be forced to be written on disk.

The observed issue could be that with a stream with a WorkQueue
retention policy, its first/last sequence values could be reset
after a pull subscriber would have consumed all messages and
the server was restarted without a clean shutdown.
This would cause the pull subscriber to "stall" until enough
new messages are sent to reach a stream sequence that catches
up with the consumer's view of the stream first sequence prior
to the restart.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-20 19:08:08 -07:00
Ivan Kozlovic
1b8878138a [FIXED] JetStream: panic "could not decode consumer snapshot"
Resolves #2720

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-08 12:22:03 -07:00
Ivan Kozlovic
9f30bf00e0 [FIXED] Corrupted headers receiving from consumer with meta-only
When a consumer is configured with "meta-only" option, and the
stream was backed by a memory store, a memory corruption could
happen causing the application to receive corrupted headers.

Also replaced most usages of `append(a[:0:0], a...)` for making
copies. This was based on this wiki:
https://github.com/go101/go101/wiki/How-to-efficiently-clone-a-slice%3F

But since Go 1.15, it is actually faster to call make+copy instead.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-01 10:50:15 -07:00
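The two cloning idioms compared in 9f30bf00e0 look like this side by side. The function names are hypothetical, and the sketch only shows that both produce an independent copy, so a consumer can mutate or retain header bytes without aliasing the store's buffer; it does not measure the speed difference the commit refers to.

```go
package main

import "fmt"

// cloneAppend is the append-based idiom the commit moved away from.
func cloneAppend(a []byte) []byte {
	return append(a[:0:0], a...)
}

// cloneCopy allocates exactly len(a) bytes and copies into them.
// A nil input stays nil, matching the behavior of cloneAppend.
func cloneCopy(a []byte) []byte {
	if a == nil {
		return nil
	}
	c := make([]byte, len(a))
	copy(c, a)
	return c
}

func main() {
	hdr := []byte("NATS/1.0")
	c := cloneCopy(hdr)
	c[0] = 'X' // mutating the clone leaves the original untouched
	fmt.Println(string(hdr), string(c)) // NATS/1.0 XATS/1.0
}
```

Since Go 1.20 the standard library also offers bytes.Clone for the same purpose.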
Derek Collison
e65f3d4a30 [FIXED #2706] - Only utilize full state with deleted details when really needed. Otherwise fast state will suffice.
Signed-off-by: Derek Collison <derek@nats.io>
2021-11-29 10:50:28 -08:00
Derek Collison
63c4c23cae Needed to undo since we already recorded
Signed-off-by: Derek Collison <derek@nats.io>
2021-11-18 14:09:52 -08:00
Derek Collison
7e615a1de9 Handle skip msgs better, do not update mb stats, clear erased bit
Signed-off-by: Derek Collison <derek@nats.io>
2021-11-18 13:59:29 -08:00
Derek Collison
14469ccfc8 Fix for #2662.
Upon restart, a server would set the expiration check to the full configured amount instead of the delta until the next message expires.

Signed-off-by: Derek Collison <derek@nats.io>
2021-11-01 18:04:37 -07:00
Derek Collison
3a14a984fc Fix for a bug that did not properly decode redelivered state for consumers from a filestore.
This also caused state abnormalities in a user's setup so added code to clean up bad state as needed.

Signed-off-by: Derek Collison <derek@nats.io>
2021-10-28 08:33:48 -07:00
Derek Collison
cc4f802e09 Optimize compaction under heavy KV use
Signed-off-by: Derek Collison <derek@nats.io>
2021-10-26 08:39:22 -07:00
Derek Collison
06168083c7 Fix for #2622.
We were not escaping the top level iterator across message blocks when calculating when to break due to keep > 0.

Signed-off-by: Derek Collison <derek@nats.io>
2021-10-14 09:25:21 -07:00
Derek Collison
075e8c9070 Make sure wp is > len(cache.buf)
Signed-off-by: Derek Collison <derek@nats.io>
2021-09-22 14:46:31 -07:00
Derek Collison
de851e513f Fix for #2548
Replicated durable consumers that were backed by a memory store were bypassing snapshotting which also did compaction of the raft WAL.
This change adapts for memory store backed consumers by compacting the raft WAL directly in the snapshot logic.

Signed-off-by: Derek Collison <derek@nats.io>
2021-09-21 08:02:11 -07:00
Derek Collison
4283358dcd Improvements to writeIndexInfo logic and managing open FDs.
Also hold lock while doing sync and optionally close FDs if idle.

Signed-off-by: Derek Collison <derek@nats.io>
2021-09-19 11:45:16 -07:00
Derek Collison
7a4c904761 Improvements to cache management.
Signed-off-by: Derek Collison <derek@nats.io>
2021-09-18 15:21:12 -07:00
Derek Collison
620b56e12f During compaction the cache may not be loaded completely if the msg block was the lmb (last msg block, still being actively written).
This could lead to the filtered subject state being incorrect.

Signed-off-by: Derek Collison <derek@nats.io>
2021-09-13 14:36:50 -07:00
Derek Collison
f75371022d Fix for issue #2488.
When we triggered a filestore msg block compact we were not properly dealing with interior deletes.
Subsequent lookups past the skipped messages would cause an error and stop delivering messages.

Signed-off-by: Derek Collison <derek@nats.io>
2021-09-09 09:53:22 -07:00
Derek Collison
2b2c4ba4a6 Bump Go test timeout
Signed-off-by: Derek Collison <derek@nats.io>
2021-09-07 08:20:54 -07:00
Derek Collison
29eaa9c614 Fixed bug that could lead to perceived message loss.
Under load and pressure from concurrent publishing and consuming with multiple consumers the filestore would
return a partial or no cache error to the upper layers. For consumers this could result in us skipping a stream sequence when we should not.

This change stabilizes the filestore and removes the flush state for msg blocks. I also found some bugs where the last sequence was not tracked properly
after snapshot / restore.

Signed-off-by: Derek Collison <derek@nats.io>
2021-09-05 16:36:23 -07:00
Derek Collison
4b97f98d18 Merge pull request #2467 from nats-io/slow_encrypt
Do not use crypto rand for nonce generation.
2021-08-25 14:09:27 -07:00
Derek Collison
ba4937f04e The slowdown was due to trying to expire messages without a proper index info.
So now we read and encrypt index info in place as well.

Signed-off-by: Derek Collison <derek@nats.io>
2021-08-25 13:22:18 -07:00
Derek Collison
4a6f1b4819 Do not use crypto rand for nonce generation.
Crypto rand is not needed for nonce generation and could drain entropy.

Signed-off-by: Derek Collison <derek@nats.io>
2021-08-24 12:51:13 -07:00
Derek Collison
752fd295a5 Consumer num pending fixes for multiple matches and merging.
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-24 07:52:29 -07:00
Derek Collison
12c912d7f4 Only compact when msg is not first.
Make sure compact works with snapshots.

Signed-off-by: Derek Collison <derek@nats.io>
2021-08-20 06:47:53 -07:00
Derek Collison
ea040b77ef Updates based on feedback
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-19 19:04:36 -07:00
Derek Collison
d349edeeb6 When a JetStream stream was used as a KV, there could be times when a lot of file storage was unused.
This change introduces utilization tracking, better interior block deletes, and individual block compaction when a block is below 50% utilization.

Signed-off-by: Derek Collison <derek@nats.io>
2021-08-19 18:24:41 -07:00
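The 50% rule from d349edeeb6 amounts to comparing live bytes against the bytes a block occupies. A minimal sketch follows; the msgBlock fields here are hypothetical stand-ins, and how the real filestore accounts for deleted records is not shown.

```go
package main

import "fmt"

// msgBlock is a simplified stand-in; field names are hypothetical.
type msgBlock struct {
	liveBytes  uint64 // bytes belonging to messages that are still present
	totalBytes uint64 // bytes the block occupies, including deleted records
}

// utilization is the fraction of the block's bytes that are still live.
func (mb *msgBlock) utilization() float64 {
	if mb.totalBytes == 0 {
		return 1 // an empty block has nothing to reclaim
	}
	return float64(mb.liveBytes) / float64(mb.totalBytes)
}

// shouldCompact applies the 50% threshold described in the commit.
func (mb *msgBlock) shouldCompact() bool {
	return mb.utilization() < 0.5
}

func main() {
	mb := &msgBlock{liveBytes: 30, totalBytes: 100}
	fmt.Println(mb.utilization(), mb.shouldCompact()) // 0.3 true
}
```

Compacting per block keeps the rewrite cost proportional to one block rather than the whole stream, which matters for KV workloads that overwrite keys often.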
Derek Collison
a5afa86790 Merge pull request #2453 from nats-io/encrypt-checks
Add in additional checks for failures during filestore encryption.
2021-08-17 14:55:41 -07:00
Derek Collison
a7cf0ad985 Add in additional checks for failures during filestore encryption.
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-17 14:08:50 -07:00
Derek Collison
6871d1240b When we expired all messages on a restart we did not properly set up the lmb.
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-17 13:45:50 -07:00
Derek Collison
b517229c32 Fix for #2417
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-06 14:44:00 -07:00
Derek Collison
398ae95a4a Various bug fixes and improvements to filestore consumer stores.
Improved behavior around clustered persistent consumers.

Signed-off-by: Derek Collison <derek@nats.io>
2021-08-03 22:17:49 -07:00
Derek Collison
22af59f758 Merge pull request #2387 from nats-io/expire-on-start-fixes
[IMPROVED] Server restart time with many expired JetStream messages.
2021-07-30 12:49:50 -07:00
Derek Collison
4e92b0ed6e When a server was restarting, if a stream had a MaxAge and there was a very large number of messages to expire, this would take too long.
During normal operation and quick restarts the number of expired messages per cycle is manageable and correct.
However if a server is shutdown for quite a long time and many messages have expired this process is too slow.

This commit introduces an optimized expiration tailored for startup vs running state.

Signed-off-by: Derek Collison <derek@nats.io>
2021-07-30 12:48:47 -07:00
Derek Collison
9b0158daf9 Allow delivery policy of DeliverLastPerSubject, which is helpful for scoped watchers for K/V.
Signed-off-by: Derek Collison <derek@nats.io>
2021-07-28 12:49:02 -07:00
Derek Collison
960c45df81 Use of sync.Pool for filestore could cause msg corruption.
Signed-off-by: Derek Collison <derek@nats.io>
2021-07-06 08:41:01 -07:00
Derek Collison
c2c146c9f2 Fix for #2329.
When we created a filestore we would figure out if we should track by subject based on stream config.
This would cause bad results when a stream was updated to multiple subjects or wildcards.
This change tightens when and what we track but turns it on all the time now.

Signed-off-by: Derek Collison <derek@nats.io>
2021-06-30 19:10:31 -07:00
Derek Collison
35f6be2056 If dirty flag set always write state out
Signed-off-by: Derek Collison <derek@nats.io>
2021-06-29 12:25:28 -07:00
Derek Collison
99fed910f0 Improvements to large numbers of JetStream R1 consumers per stream.
1. We were holding open FDs longer than we should for consumers, causing issues with open FD limits. We no longer hold them open, and we cap updates a bit better.

2. When doing a stream delete, consumer delete was repeating a lot of work that was not necessary, causing longer delays. This has been optimized a bit, with more improvements still to be made.

3. We cover all JS under a single export, but that was also trapping GetNext for pull based consumers. Since this was a no-op (it is handled at the user account level), we were creating a lot of garbage service import responses and reverse map entries that had to be garbage collected. We have a fix in to avoid this but are still looking for a better one.

4. Still had some lingering references to all exports vs single JS export.

Signed-off-by: Derek Collison <derek@nats.io>
2021-06-29 05:45:55 -07:00
Derek Collison
bb84ef7d91 Added ability to match based on last expected sequence per subject.
Signed-off-by: Derek Collison <derek@nats.io>
2021-06-28 10:57:50 -07:00
Derek Collison
c0e47966ab Added in Stream get last message by subject.
This is to aid in K/V overlay for simple Get ops vs creating a full blown consumer.

Signed-off-by: Derek Collison <derek@nats.io>
2021-06-24 13:21:39 -07:00
Derek Collison
6bbc29281c Make sure to return tmp bufs to pool when we can
Signed-off-by: Derek Collison <derek@nats.io>
2021-06-22 16:11:21 -07:00
Derek Collison
b3753aba1b Improvements to filtered purge and general memory use for filestore.
We optimized the filtered purge to skip msgBlks that are not in play.
Also optimized msgBlock buffer usage by using two sync.Pools to enhance reuse.

Signed-off-by: Derek Collison <derek@nats.io>
2021-06-22 15:47:26 -07:00
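Reusing msgBlock buffers through two sync.Pools, as described in b3753aba1b, can be sketched with one pool per size class. The sizes, pool names, and helpers are assumptions for this illustration. The comment on putBuf also reflects the hazard behind 960c45df81: a pooled buffer must not be handed out again while something still references it.

```go
package main

import (
	"fmt"
	"sync"
)

// Two size classes; the sizes are arbitrary values chosen for this sketch.
const (
	smallBufSize = 64 * 1024
	largeBufSize = 1024 * 1024
)

// Each pool stores pointers to empty slices of one fixed capacity.
var (
	smallPool = sync.Pool{New: func() any { b := make([]byte, 0, smallBufSize); return &b }}
	largePool = sync.Pool{New: func() any { b := make([]byte, 0, largeBufSize); return &b }}
)

// getBuf returns an empty buffer with room for at least sz bytes.
func getBuf(sz int) []byte {
	switch {
	case sz <= smallBufSize:
		return (*smallPool.Get().(*[]byte))[:0]
	case sz <= largeBufSize:
		return (*largePool.Get().(*[]byte))[:0]
	default:
		return make([]byte, 0, sz) // too big to pool
	}
}

// putBuf returns b to the pool matching its capacity. The caller must not
// touch b afterwards: reusing a buffer that is still referenced elsewhere
// would let two owners overwrite each other's data.
func putBuf(b []byte) {
	b = b[:0]
	switch cap(b) {
	case smallBufSize:
		smallPool.Put(&b)
	case largeBufSize:
		largePool.Put(&b)
	}
	// Any other capacity (for example after append grew it) is left to the GC.
}

func main() {
	b := getBuf(1000)
	b = append(b, "hello"...)
	fmt.Println(len(b), cap(b)) // 5 65536
	putBuf(b)
}
```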
Derek Collison
7739eae45a Merge pull request #2302 from nats-io/js-encryption
JetStream Encryption at Rest
2021-06-22 10:35:02 -07:00
Derek Collison
2e145196b8 Fix for extended purge by sequence.
Signed-off-by: Derek Collison <derek@nats.io>
2021-06-22 07:38:30 -07:00
Derek Collison
bf6335dff9 Add in ability to have encrypted JetStream filestores.
This supports XChaCha20-Poly1305 for Seal and Open and ChaCha20 for our message blocks which use highway hashes and sequence numbers for authenticity.
We support snapshot and restore as well.

Signed-off-by: Derek Collison <derek@nats.io>
2021-06-21 19:28:10 -07:00
Derek Collison
89d930fd0f Updates and fixes to PurgeEx
Signed-off-by: Derek Collison <derek@nats.io>
2021-06-20 10:34:27 -07:00
Derek Collison
9398c3ca28 Allow for more advanced purge operations that filter by subject, specify the sequence or number of messages to keep.
Signed-off-by: Derek Collison <derek@nats.io>
2021-06-19 07:04:44 -07:00
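The three purge options named in 9398c3ca28 (subject filter, sequence, and number of messages to keep) can be modeled over an in-memory message list. This sketch makes its own assumptions: subjects match exactly with no wildcards, Sequence means "purge matching messages below this sequence", and Sequence takes precedence over Keep. The purgeRequest type is hypothetical and is not the server's request format.

```go
package main

import "fmt"

type msg struct {
	seq  uint64
	subj string
}

// purgeRequest loosely mirrors the three options named in the commit.
type purgeRequest struct {
	Subject  string // "" matches every subject (exact match only here)
	Sequence uint64 // if > 0, purge matching messages with seq < Sequence
	Keep     int    // if > 0, keep only the newest Keep matching messages
}

// purge returns the surviving messages and the number removed.
// msgs must be ordered by sequence.
func purge(msgs []msg, req purgeRequest) ([]msg, int) {
	matches := func(m msg) bool { return req.Subject == "" || m.subj == req.Subject }

	// Count matches first so Keep can be applied to the newest ones.
	total := 0
	for _, m := range msgs {
		if matches(m) {
			total++
		}
	}
	var out []msg
	seen := 0
	for _, m := range msgs {
		if !matches(m) {
			out = append(out, m) // other subjects are never touched
			continue
		}
		seen++
		remove := true
		switch {
		case req.Sequence > 0:
			remove = m.seq < req.Sequence
		case req.Keep > 0:
			remove = seen <= total-req.Keep
		}
		if !remove {
			out = append(out, m)
		}
	}
	return out, len(msgs) - len(out)
}

func main() {
	msgs := []msg{{1, "a"}, {2, "b"}, {3, "a"}, {4, "a"}, {5, "b"}}
	out, n := purge(msgs, purgeRequest{Subject: "a", Keep: 1})
	fmt.Println(n, out) // 2 [{2 b} {4 a} {5 b}]
}
```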