Under certain situations large number of consumers that are racing to update state or delete their stores during a delete
would start taking up OS threads due to blocking disk IO. When this happened and their were a bunch of Go routines becoming
runnable the Go runtime would create extra OS threads to fill in the runnable pool and would exhaust the max thread setting.
This code places a channel as a simple semaphore to limit the number of disk IO blocking OS threads.
Signed-off-by: Derek Collison <derek@nats.io>
The filestore would release a msgBlock lock while trying to load a cache block if it thought it needed to flush pending data.
With async false, this should be very rare but was possible after careful inspection.
I constructed an artificial test with sleeps throughout the filestore code to reproduce.
It involved having 2 Go routines that were through and waiting on the last msg block, and another one that was writing.
After the write, but before we flushed after releasing the lock we would also artificially sleep.
This would lead to the second read seeing the cache load was already in progress and return no error.
If the load was for a sequence before the current write sequence, and async was false, the cache fseq would be higher than what was requested.
This would cause the errPartialCache to be returned.
Once returned to the consumer level in loopAndGather, it would exit that Go routine and the consumer would cease to function.
This change removed the unlock of a msgBlock to perform and flush, ensuring that two cacheLoads would not yield the errPartialCache.
I also updated the consumer in the case this does happen in the future to not exit the loopAndGather Go routine.
Signed-off-by: Derek Collison <derek@nats.io>
A low-level Filestore issue would cause a new block to be created
when the last block was empty, but the index for the new block
would not be forced to be written on disk.
The observed issue could be that with a stream with a WorkQueue
retention policy, its first/last sequence values could be reset
after a pull subscriber would have consumed all messages and
the server was restarted without a clean shutdown.
This would cause the pull subscriber to "stall" until enough
new messages are sent to reach a stream sequence that catches
up with the consumer's view of the stream first sequence prior
to the restart.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When a consumer is configured with "meta-only" option, and the
stream was backed by a memory store, a memory corruption could
happen causing the application to receive corrupted headers.
Also replaced most of usage of `append(a[:0:0], a...)` to make
copies. This was based on this wiki:
https://github.com/go101/go101/wiki/How-to-efficiently-clone-a-slice%3F
But since Go 1.15, it is actually faster to call make+copy instead.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Upon server restart a server would set the check expiration to the configured amount vs delta of next to expire.
Signed-off-by: Derek Collison <derek@nats.io>
We were not escaping the top level iterator across message blocks when calculating when to break due to keep > 0.
Signed-off-by: Derek Collison <derek@nats.io>
Replicated durable consumers that were backed by a memory store were bypassing snapshotting which also did compaction of the raft WAL.
This change adapts for memory store backed consumers by compacting the raft WAL directly on snapshot logic.
Signed-off-by: Derek Collison <derek@nats.io>
When we triggered a filestore msg block compact we were not properly dealing with interior deletes.
Subsequent lookups past the skipped messages would cause an error and stop delivering messages.
Signed-off-by: Derek Collison <derek@nats.io>
Under load and pressure from concurrent publishing and consuming with multiple consumers the filestore would
return a partial or no cache error to the upper layers. For consumers this could result in us skipping a stream sequence when we should not.
This change stabilizes the filestore and removes the flush state for msg blocks. I also found some bugs that did not track last sequence properly
after snapshots / restore.
Signed-off-by: Derek Collison <derek@nats.io>
This change introduces utilization, better interior block deletes, and individual block compaction when we are below 50% utilization of the block.
Signed-off-by: Derek Collison <derek@nats.io>
During normal operation and quick restarts the number of expired messages per cycle is manageable and correct.
However if a server is shutdown for quite a long time and many messages have expired this process is too slow.
This commit introduces an optimized expiration tailored for startup vs running state.
Signed-off-by: Derek Collison <derek@nats.io>
When we created a filestore we would figure out if we should track by subject based on stream config.
This would cause bad results when a stream was updated to multiple subjects or wildcards.
This change tightens when and what we track but turns it on all the time now.
Signed-off-by: Derek Collison <derek@nats.io>
1. We were holding open FDs longer than we should for consumers causing issues with open FD limits. We now do not hold them open and cap updates a bit better.
2. When doing a stream delete, consumer delete was repeating alot of work that was not necessary, causing longer delays. This has been optimized a bit, still more improvements to be made.
3. We cover all JS under a single export, but that was also trapping GetNext for pull based consumers, and since this was a no-op (is handled at user account level) we were creating alot of garbage service import responses and reverse map entries that had to be garbage collected. We have a fix in to avoind this but still looking for a better one.
4. Still had some lingering references to all exports vs single JS export.
Signed-off-by: Derek Collison <derek@nats.io>
We optimized the filtered purge to skip msgBlks that are not in play.
Also optimized msgBlock buffer usage by using two sync.Pools to enhance reuse.
Signed-off-by: Derek Collison <derek@nats.io>
This supports XChaChaPoly1305 for Seal and Open and ChaCha20 for our message blocks which use highway hashes and sequence numbers for authenticity.
We support snapshot and restore as well.
Signed-off-by: Derek Collison <derek@nats.io>