130 Commits

Author SHA1 Message Date
Derek Collison
daacbf5580 Added optimized store NumPending() call.
Optimized and fixed a bug in filestore filteredPending().
Optimized memstore FilteredState().

Added comprehensive tests for NumPending() and FilteredState().

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-25 17:26:26 -08:00
Derek Collison
24c2f3b452 Improved performance of subjects details for stream info.
This version avoids all disk IO in the filestore version.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-24 17:22:18 -08:00
Derek Collison
3bc0af70d0 Only update per subject information if we know we have an update.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-13 20:12:35 +02:00
Derek Collison
0da2a150cc Make sure we adjust per subject info when doing a Compact().
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-10 07:21:02 +02:00
Neil Twigg
9e8a5bfa3b File store subtests
2023-02-08 09:55:49 +00:00
Derek Collison
9c02be2409 Various fixes for snapshots.
Due to a bug, in rare circumstances we could write an empty snapshot for applied == 0. This would cause spinning at the raft layer.

1. Allow Truncate() to also properly do a reset of the store when only the terms mismatched.
2. During testing fixed memstore truncate and also made sure per subject info was also cleaned up.
3. Then added fix to detect a bad snapshot on initialization and remove.
4. Do not allow snapshots for applied == 0.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-04 13:46:06 -08:00
Derek Collison
68b4570226 Fix for filtered state for all subjects when the first sequence(s) are deleted.
Discovered doing the optimizations for interior deletes.

Signed-off-by: Derek Collison <derek@nats.io>
2023-01-21 16:01:44 -08:00
Derek Collison
d9cb1e6286 Fix for #3734
When a msg blk was not written correctly, but the idx file was, max bytes for a stream would no longer be honored since the deletion of any messages in that empty block was not being handled properly.

Signed-off-by: Derek Collison <derek@nats.io>
2022-12-31 18:50:46 -08:00
Derek Collison
a85318bc76 Under the right circumstances Compact could leave a dangling last msg block leading to error opening msg block file [""]: open : no such file or directory
When the compaction would not uplevel to a normal purge, but after completion all msg blocks were empty the mb.lmb was not cleared or reset properly.

Signed-off-by: Derek Collison <derek@nats.io>
2022-12-15 06:59:54 -08:00
Derek Collison
1a55eb5a7a Fix for condition after restart where first seq was wrong and reported zero timestamp and expiration stopped working.
Signed-off-by: Derek Collison <derek@nats.io>
2022-12-14 07:19:05 -08:00
Marco Primi
f8a030bc4a Use testing.TempDir() where possible
Refactor tests to use go built-in temporary directory utility for tests.

Also avoid binding to default port (which may be in use)
2022-12-12 13:18:44 -08:00
Derek Collison
1fa5e73177 Honor MaxMsgsPerSubject when a stream config is updated, including enforcing a lower limit.
Signed-off-by: Derek Collison <derek@nats.io>
2022-10-31 17:25:20 -07:00
Derek Collison
6128b83507 On abnormal server exit, for streams or KV with max msgs per subject set we could recover more than N msgs per subject.
This fix allows recovery of the correct state on restart when index files are missing or not current.

Signed-off-by: Derek Collison <derek@nats.io>
2022-10-26 16:00:57 -07:00
Derek Collison
d97abf0b61 On a write error we rebuild, so update accounting after flush write attempt.
If we had to flush while loading in subject info we would fail there and not properly rebuild state as we did on a failure for flushPending.

Signed-off-by: Derek Collison <derek@nats.io>
2022-10-18 11:39:42 -07:00
Ivan Kozlovic
57a3594355 [FIXED] JetStream: Purge with additional options may leave some messages
While going over message blocks, when some blocks were removed there
was a risk that a block is ignored, leaving some messages around.

Resolves #3528

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-10-06 13:21:20 -06:00
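The purge fix above (57a3594355) describes a block being ignored when other blocks are removed during the walk. The sketch below is not the filestore code. It only illustrates one common way this class of bug arises in Go (deleting from a slice while indexing forward) and a safe alternative. The `block` type and both function names are hypothetical.

```go
package main

import "fmt"

// block is a hypothetical stand-in for a message block; msgs is the
// number of messages left in it after a purge pass.
type block struct {
	id   int
	msgs int
}

// removeEmptyBuggy deletes empty blocks while indexing forward.
// After a delete, the next block slides into position i, and the
// loop's i++ then steps over it without inspecting it.
func removeEmptyBuggy(blks []block) []block {
	for i := 0; i < len(blks); i++ {
		if blks[i].msgs == 0 {
			blks = append(blks[:i], blks[i+1:]...)
		}
	}
	return blks
}

// removeEmptySafe filters in place, visiting every block exactly once.
func removeEmptySafe(blks []block) []block {
	kept := blks[:0]
	for _, b := range blks {
		if b.msgs != 0 {
			kept = append(kept, b)
		}
	}
	return kept
}

func main() {
	buggy := removeEmptyBuggy([]block{{1, 0}, {2, 0}, {3, 5}})
	safe := removeEmptySafe([]block{{1, 0}, {2, 0}, {3, 5}})
	fmt.Println(len(buggy), len(safe))
}
```

With two adjacent empty blocks, the forward-indexing variant leaves the second one behind, while the filtering variant keeps only the non-empty block.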
Derek Collison
7e1bc54389 Fix for #3848.
When a block's subject meta state was swapped out and subsequently loaded back in with only one subject present, but other messages with different subjects were added later, a filtered get could return the wrong result.

Signed-off-by: Derek Collison <derek@nats.io>
2022-09-22 04:57:05 -07:00
Ivan Kozlovic
29224c8ea9 Split more tests to speed up Travis run
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-09-09 12:45:48 -06:00
Ivan Kozlovic
b69ffe244e Fixed some tests
Code change:
- Do not start the processMirrorMsgs and processSourceMsgs go routine
if the server has been detected to be shutdown. This would otherwise
leave some go routine running at the end of some tests.
- Pass the fch and qch to the consumerFileStore's flushLoop otherwise
in some tests this routine could be left running.

Tests changes:
- Added missing defer NATS connection close
- Added missing defer server shutdown

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-09-08 11:28:23 -06:00
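The change above (b69ffe244e) passes the `fch` and `qch` channels into the flush loop so the goroutine can be stopped. A minimal sketch of that pattern follows. The signature, the `flush` callback and the `done` channel are assumptions for illustration and do not reflect the actual consumerFileStore code.

```go
package main

import "fmt"

// flushLoop waits for flush requests on fch and exits when qch is closed.
// Both channels are passed in explicitly, so the goroutine holds its own
// references and can always be told to quit. done is a hypothetical
// extra channel used here only to observe that the loop has exited.
func flushLoop(fch, qch <-chan struct{}, flush func(), done chan<- struct{}) {
	defer close(done)
	for {
		select {
		case <-fch:
			flush()
		case <-qch:
			return
		}
	}
}

func main() {
	fch := make(chan struct{})
	qch := make(chan struct{})
	done := make(chan struct{})
	flushes := 0

	go flushLoop(fch, qch, func() { flushes++ }, done)

	fch <- struct{}{} // request one flush
	close(qch)        // signal shutdown
	<-done            // wait for the loop to exit; safe to read flushes now
	fmt.Println("flushes:", flushes)
}
```

A test that defers `close(qch)` and waits on `done` will not leave this goroutine running after it returns.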
Derek Collison
a9790aa0fa Merge pull request #3409 from nats-io/cc-secure
[IMPROVED] Secure consumer create
2022-08-30 09:48:06 -07:00
Derek Collison
aa94a0bc0f New consumer create that allows elevation of stream and consumer names, and optional filter subject to the request subject.
Similar to changes in direct get allows proper security if needed for filter subject selection.

Signed-off-by: Derek Collison <derek@nats.io>
2022-08-30 09:29:38 -07:00
Derek Collison
5f0ecef6f3 When writing a msg after the fss state was expired we would count the msg twice.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-30 05:38:16 -07:00
Derek Collison
e837a255cf FSS state could skew on expire on recover with no msgs left.
Also added in sanity check on server start.

Signed-off-by: Derek Collison <derek@nats.io>
2022-08-29 17:34:28 -07:00
Derek Collison
7c1618f91c Try to dump any cached state including fss on recovery to avoid memory bloat on restart.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-24 17:41:57 -07:00
Derek Collison
ef71087d56 Fixed a bug that would not track per subject info for streams that were mirrors or sources.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-23 15:46:57 -07:00
Derek Collison
d48ccf4c5a When filestore is used for raft layer do not attempt to track subject metadata.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-17 13:46:13 -07:00
Derek Collison
827b34a77a Add support for AES cipher encryption for filestore.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-15 14:21:37 -07:00
Derek Collison
d7534dff5f Make sure when SubjectState is called we have loaded fss state.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-12 07:14:39 -05:00
Derek Collison
8c04adc009 Improvements to filestore for large KVs.
Use better indexing for lookups, we used to do simple linear scan backwards, now track first and last block.
Will expire the fss cache at will to reduce memory usage.

Signed-off-by: Derek Collison <derek@nats.io>
2022-08-09 15:51:13 -05:00
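The lookup improvement above (8c04adc009) replaces a backward linear scan with tracking of the first and last block per subject. A rough sketch of that idea, assuming a simple in-memory map, is shown here. The `subjectSpan` and `subjectIndex` names are hypothetical and are not the filestore's types.

```go
package main

import "fmt"

// subjectSpan is a hypothetical record of the first and last block
// index known to contain messages for a subject.
type subjectSpan struct {
	first, last int
}

// subjectIndex maps a subject to its block span.
type subjectIndex map[string]subjectSpan

// record notes that block blk holds a message for subj,
// widening the span if needed.
func (ix subjectIndex) record(subj string, blk int) {
	sp, ok := ix[subj]
	if !ok {
		ix[subj] = subjectSpan{first: blk, last: blk}
		return
	}
	if blk < sp.first {
		sp.first = blk
	}
	if blk > sp.last {
		sp.last = blk
	}
	ix[subj] = sp
}

// lookup returns the span for subj, so a caller can go straight to the
// last block instead of scanning every block backwards.
func (ix subjectIndex) lookup(subj string) (subjectSpan, bool) {
	sp, ok := ix[subj]
	return sp, ok
}

func main() {
	ix := subjectIndex{}
	ix.record("kv.a", 2)
	ix.record("kv.b", 3)
	ix.record("kv.a", 7)
	sp, _ := ix.lookup("kv.a")
	fmt.Println(sp.first, sp.last)
}
```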
Ivan Kozlovic
3c9a7cc6e5 Move to Go 1.19, remove io/util, fix data race and a flapper
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-05 09:55:37 -06:00
Derek Collison
717969510d Make sure to reset block encryption counter when clearing block but holding state for tracking sequences.
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-31 07:59:19 -07:00
Derek Collison
8dc1e4b6de When compact would reclaim head of block space, we needed to update block key for counter for new writes.
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-30 13:05:41 -07:00
Derek Collison
5e98263de8 General stability improvements
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-29 16:02:31 -07:00
Derek Collison
e120bb86a9 Update tests to check last seq
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-28 07:23:39 -07:00
Derek Collison
52f7765322 When msgs were expired on restart recovery we could lose track on subsequent restart of starting sequence with no additional activity.
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-23 17:15:16 -07:00
Derek Collison
3a10456e68 Short index write could lead to loss of stream sequence for empty stream
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-22 06:37:19 -07:00
Derek Collison
4291433a46 General improvements to accounting for the filestore. This is in response to tracking issue #3114.
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-12 15:43:11 -07:00
Derek Collison
b35988adf9 Remember the last timestamp by not removing last msgBlk when empty and during purge pull last timestamp forward until new messages arrive.
When a downstream stream uses retention modes that delete messages, fall back to a time-based start time for the new source consumers.

Signed-off-by: Derek Collison <derek@nats.io>
2022-05-09 09:04:19 -07:00
Ivan Kozlovic
34650e9dd5 Fixed data race and some flappers
Data race that has been seen:
```
Read at 0x00c00134bec0 by goroutine 159:
  github.com/nats-io/nats-server/v2/server.(*client).msgHeaderForRouteOrLeaf()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:2935 +0x254
  github.com/nats-io/nats-server/v2/server.(*client).processMsgResults()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4364 +0x2147
(...)
Previous write at 0x00c00134bec0 by goroutine 201:
  github.com/nats-io/nats-server/v2/server.(*Server).addRoute()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:1475 +0xdb4
  github.com/nats-io/nats-server/v2/server.(*client).processRouteInfo()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:641 +0x1704
```

Also fixed some flappers and removed use of `s.js.` since we have
already captured `js` in Jsz monitoring.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-31 10:05:34 -06:00
Derek Collison
607858f213 Improved consumer snapshot logic in clustered mode and disk usage.
Also fixed a bug that could cause memory based replicated consumers to no longer work after snapshots and server restarts.

The snapshot logic would allow non-state-changing updates to continuously grow the raft logs. We were also too conservative about when we snapshotted and why.
Also added in ability to have FileStore.Compact() reclaim space from the block file from the head of last changed block.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-29 18:02:49 -07:00
Derek Collison
bd0a0b28c7 When recycling blocks we could potentially place partials into a tier. This would possibly cause the load code to thrash since it would not be big enough for a full block and we would need to recycle again and make a new one.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-28 16:46:46 -07:00
Derek Collison
5e5aab378e Additional improvements to memory pooling and management. Also logic fix for firstMatching that did unnecessary work when matching all.
During contention for the head write blk, the system could perform worse memory-wise than the plain Go runtime.
Also had some references for the subject of messages bloating memory.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-28 10:15:23 -07:00
Derek Collison
ef8f543ea5 Improve memory usage through JetStream storage layer.
Previously we would rely more heavily on Go's garbage collector since when we loaded a block for an underlying stream we would pass references upward to avoid copies.
Now we always copy when passing back to the upper layers which allows us to not only expire our cache blocks but pool and reuse them.

The upper layers also had changes made to allow the pooling layer at that level to interoperate with the storage layer optionally.

Also fixed some flappers and a bug where de-dupe might not be reformed correctly.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 17:45:15 -06:00
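The memory change above (ef8f543ea5) trades reference passing for copying so that cache blocks can be expired, pooled and reused. The sketch below shows that trade-off with `sync.Pool`. The `blockCache` type, its methods and the fixed `blockSize` are assumptions for illustration, not the storage layer's actual design.

```go
package main

import (
	"fmt"
	"sync"
)

// blockSize is an arbitrary size chosen for this sketch.
const blockSize = 1024

// blockPool recycles fixed-size buffers. Pointers to slices are stored
// to avoid an extra allocation on every Put.
var blockPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, blockSize)
		return &b
	},
}

// blockCache is a hypothetical cached block backed by a pooled buffer.
type blockCache struct {
	buf *[]byte
	n   int // number of valid bytes in buf
}

// loadBlock copies data into a pooled buffer (truncating at blockSize).
func loadBlock(data []byte) *blockCache {
	bp := blockPool.Get().(*[]byte)
	n := copy(*bp, data)
	return &blockCache{buf: bp, n: n}
}

// fetch returns a copy of the requested range. Because the caller never
// sees the pooled buffer itself, the cache can be expired at any time.
func (c *blockCache) fetch(off, length int) []byte {
	if c.buf == nil || off < 0 || length < 0 || off+length > c.n {
		return nil
	}
	out := make([]byte, length)
	copy(out, (*c.buf)[off:off+length])
	return out
}

// expire returns the buffer to the pool for reuse.
func (c *blockCache) expire() {
	if c.buf != nil {
		blockPool.Put(c.buf)
		c.buf = nil
		c.n = 0
	}
}

func main() {
	c := loadBlock([]byte("hello world"))
	msg := c.fetch(0, 5)
	c.expire()
	_ = loadBlock([]byte("XXXXXXXXXXX")) // may reuse the same buffer
	fmt.Println(string(msg))             // msg is unaffected by the reuse
}
```

Had `fetch` returned a sub-slice of the pooled buffer instead, the later load could overwrite bytes the caller still holds, which is why copying is what makes pooling safe here.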
Ivan Kozlovic
b4128693ed Ensure file path is correct during stream restore
Also had to change all references from `path.` to `filepath.` when
dealing with files, so that it works properly on Windows.

Fixed also lots of tests to defer the shutdown of the server
after the removal of the storage, and fixed some config files
directories to use the single quote `'` to surround the file path,
again to work on Windows.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-09 13:31:51 -07:00
Derek Collison
3216eb5ee5 When a consumer has no state we are now compacting the log, but were not snapshotting.
This caused issues on leader change and losing quorum.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-09 07:21:25 -05:00
Derek Collison
8fce45dfa7 Under certain scenarios the pending for a consumer could appear to get stuck.
Under the covers we were calculating pending per msg block incorrectly when a single message existed beyond the requested sequence.

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-01 12:17:08 -08:00
Derek Collison
d486c24199 Allow a consumer to be configured with BackOffs.
This allows a consumer to have exponential backoffs vs static AckWait and MaxDeliver.
When BackOff is set it will override AckWait to BackOff[0] and MaxDeliver will be len(BackOff)+1.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-24 14:57:36 -08:00
Derek Collison
89b94ae650 Improved selectMsgBlock with lots of messages. Also have fetchMsg return hint about clearing cache.
Signed-off-by: Derek Collison <derek@nats.io>
2021-12-22 17:45:12 -08:00
Ivan Kozlovic
7c3c9ef1ee [FIXED] JetStream: stream first/last sequence possibly reset
A low-level Filestore issue would cause a new block to be created
when the last block was empty, but the index for the new block
would not be forced to be written on disk.

The observed issue could be that with a stream with a WorkQueue
retention policy, its first/last sequence values could be reset
after a pull subscriber would have consumed all messages and
the server was restarted without a clean shutdown.
This would cause the pull subscriber to "stall" until enough
new messages are sent to reach a stream sequence that catches
up with the consumer's view of the stream first sequence prior
to the restart.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-20 19:08:08 -07:00
Ivan Kozlovic
1b8878138a [FIXED] JetStream: panic "could not decode consumer snapshot"
Resolves #2720

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-08 12:22:03 -07:00
Derek Collison
3a14a984fc Fix for a bug that did not properly decode redelivered state for consumers from a filestore.
This also caused state abnormalities in a user's setup so added code to clean up bad state as needed.

Signed-off-by: Derek Collison <derek@nats.io>
2021-10-28 08:33:48 -07:00