Commit Graph

139 Commits

Author SHA1 Message Date
Derek Collison
9e9a9a082b When restoring a filestore with no key generator but it was encrypted, fail to restore.
Signed-off-by: Derek Collison <derek@nats.io>
2023-07-11 16:27:50 -07:00
Derek Collison
855e1bb14e Allow more tolerance for travis
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-23 14:24:00 -07:00
Derek Collison
2b2e22ed52 When creating a consumer on a stream with a very large number of msg blks, calculating numPending could be slow.
This aims to optimize a bit, more work to be done on streams with a very large (> 200k) number of msg blks.

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-23 14:11:56 -07:00
Neil Twigg
e879a9fa0c Test MaxMsgs and MaxMsgsPer in combination
Signed-off-by: Neil Twigg <neil@nats.io>
2023-06-19 09:45:17 +01:00
Derek Collison
9999f63853 ConsumerFileStore could encode an empty state or update an empty state on startup.
We needed to make sure at the lowest level that the state was read from disk and not depend on upper layer consumer.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-26 15:48:10 -07:00
Maurice van Veen
1e2bba4c7b Fix FirstSeq not being updated with filestore when purging subject
2023-04-12 10:46:16 +02:00
Derek Collison
c546828359 Moved long-running test to NoRace suite
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 16:56:04 -07:00
Derek Collison
182bf6cbae Bug fixes and general stability improvements.
1. If reset, ignore Applied() values that are greater than our commit.
2. Improved StepDown() by placing at back of queue if preferred.
3. Improved handling of leadership transfer during StepDown().
4. Do not store EntryLeaderTransfer records on disk.
5. Remove unneeded processing of older terms.
6. If append entry has higher term, also inherit pterm.
7. Only inherit a candidate's term if we decide to vote for them.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:46 -07:00
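Item 7 above narrows when a node adopts a higher term. A minimal Go sketch of that rule, assuming a hypothetical `node` type and `handleVoteRequest` method (neither is the nats-server raft API): the request is checked against our term, log freshness and any earlier vote first, and our term is advanced only when the vote is actually granted. Textbook Raft adopts any higher term it observes, so this is deliberately narrower.

```go
package main

// node is a hypothetical, heavily simplified Raft voter; not nats-server code.
type node struct {
	term     uint64 // current term
	vote     string // candidate voted for in term ("" if none)
	lastTerm uint64 // term of our last log entry
	lastIdx  uint64 // index of our last log entry
}

type voteRequest struct {
	candidate string
	term      uint64
	lastTerm  uint64
	lastIdx   uint64
}

// handleVoteRequest grants or denies a vote. The candidate's term is only
// inherited when the vote is granted; a denied request leaves n.term as-is.
func (n *node) handleVoteRequest(vr voteRequest) bool {
	// Stale candidate: never vote and never change our term.
	if vr.term < n.term {
		return false
	}
	// The candidate's log must be at least as up to date as ours.
	upToDate := vr.lastTerm > n.lastTerm ||
		(vr.lastTerm == n.lastTerm && vr.lastIdx >= n.lastIdx)
	// Within a single term we may vote for only one candidate.
	alreadyVoted := vr.term == n.term && n.vote != "" && n.vote != vr.candidate
	if !upToDate || alreadyVoted {
		return false
	}
	n.term = vr.term
	n.vote = vr.candidate
	return true
}
```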
Derek Collison
5a16f98427 Fixed an off by one bug that under certain circumstances could cause large consumer replica states.
This could lead to instability in the system.

The bug would manifest in replicated consumers when certain messages were acked out of order and the pending list would never go to zero.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-19 10:41:59 -07:00
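To illustrate how out-of-order acks interact with an ack floor, here is a minimal sketch using hypothetical `ackState`, `deliver` and `ack` names (not the nats-server consumer code). The floor advances only across sequences that are no longer pending and stops at the first one still outstanding; getting that boundary wrong by one is the kind of mistake that could leave entries in the pending set indefinitely.

```go
package main

// ackState is a hypothetical consumer ack tracker; not nats-server code.
type ackState struct {
	ackFloor  uint64              // every stream seq <= ackFloor is acked
	delivered uint64              // highest stream seq delivered so far
	pending   map[uint64]struct{} // delivered but not yet acked
}

func newAckState() *ackState {
	return &ackState{pending: make(map[uint64]struct{})}
}

// deliver records that seq was sent to a client and awaits an ack.
func (s *ackState) deliver(seq uint64) {
	s.pending[seq] = struct{}{}
	if seq > s.delivered {
		s.delivered = seq
	}
}

// ack removes seq from pending, then moves the floor forward over every
// sequence that is no longer pending, stopping at (not past) the first
// sequence that is still outstanding.
func (s *ackState) ack(seq uint64) {
	if seq <= s.ackFloor {
		return
	}
	delete(s.pending, seq)
	for next := s.ackFloor + 1; next <= s.delivered; next++ {
		if _, ok := s.pending[next]; ok {
			break
		}
		s.ackFloor = next
	}
}
```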
Derek Collison
daacbf5580 Added optimized store NumPending() call.
Optimized and fixed a bug in filestore filteredPending().
Optimized memstore FilteredState().

Added comprehensive tests for NumPending() and FilteredState().

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-25 17:26:26 -08:00
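For reference, the unoptimized meaning of a filtered NumPending() can be sketched as a linear scan. This assumes a hypothetical in-memory `msg` slice and helper names; the real store avoids scanning every message, which is the point of the commit. `subjectMatches` follows NATS wildcard rules: `*` matches exactly one token and a trailing `>` matches one or more tokens.

```go
package main

import "strings"

// subjectMatches reports whether subject matches a NATS-style filter.
func subjectMatches(filter, subject string) bool {
	ft := strings.Split(filter, ".")
	st := strings.Split(subject, ".")
	for i, f := range ft {
		if f == ">" {
			// ">" needs at least one remaining token in the subject.
			return len(st) > i
		}
		if i >= len(st) || (f != "*" && f != st[i]) {
			return false
		}
	}
	return len(ft) == len(st)
}

// msg is a hypothetical stored message: just a sequence and a subject.
type msg struct {
	seq     uint64
	subject string
}

// numPending counts messages at or after sseq whose subject matches filter.
// Naive O(n) baseline only; no per-subject or per-block indexing is used.
func numPending(msgs []msg, sseq uint64, filter string) uint64 {
	var n uint64
	for _, m := range msgs {
		if m.seq >= sseq && subjectMatches(filter, m.subject) {
			n++
		}
	}
	return n
}
```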
Derek Collison
24c2f3b452 Improved performance of subjects details for stream info.
This version avoids all disk IO in the filestore version.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-24 17:22:18 -08:00
Derek Collison
3bc0af70d0 Only update per subject information if we know we have an update.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-13 20:12:35 +02:00
Derek Collison
0da2a150cc Make sure we adjust per subject info when doing a Compact().
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-10 07:21:02 +02:00
Neil Twigg
9e8a5bfa3b File store subtests
2023-02-08 09:55:49 +00:00
Derek Collison
9c02be2409 Various fixes for snapshots.
Due to a bug, in rare circumstances we could write an empty snapshot for applied == 0. This would cause spinning at the raft layer.

1. Allow Truncate() to also properly reset the store when only the terms mismatched.
2. During testing fixed memstore truncate and also made sure per subject info was also cleaned up.
3. Then added fix to detect a bad snapshot on initialization and remove.
4. Do not allow snapshots for applied == 0.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-04 13:46:06 -08:00
Derek Collison
68b4570226 Fix for filtered state for all subjects when the first sequence(s) are deleted.
Discovered doing the optimizations for interior deletes.

Signed-off-by: Derek Collison <derek@nats.io>
2023-01-21 16:01:44 -08:00
Derek Collison
d9cb1e6286 Fix for #3734
When a msg blk was not written correctly, but the idx file was, max bytes for a stream would no longer be honored, since the deletion of any messages in that empty block was not being handled properly.

Signed-off-by: Derek Collison <derek@nats.io>
2022-12-31 18:50:46 -08:00
Derek Collison
a85318bc76 Under the right circumstances Compact could leave a dangling last msg block, leading to error opening msg block file [""]: open : no such file or directory
When the compaction would not uplevel to a normal purge, but after completion all msg blocks were empty the mb.lmb was not cleared or reset properly.

Signed-off-by: Derek Collison <derek@nats.io>
2022-12-15 06:59:54 -08:00
Derek Collison
1a55eb5a7a Fix for condition after restart where first seq was wrong and reported zero timestamp and expiration stopped working.
Signed-off-by: Derek Collison <derek@nats.io>
2022-12-14 07:19:05 -08:00
Marco Primi
f8a030bc4a Use testing.TempDir() where possible
Refactor tests to use go built-in temporary directory utility for tests.

Also avoid binding to default port (which may be in use)
2022-12-12 13:18:44 -08:00
Derek Collison
1fa5e73177 Honor MaxMsgsPerSubject when a stream config is updated, including enforcing a lower limit.
Signed-off-by: Derek Collison <derek@nats.io>
2022-10-31 17:25:20 -07:00
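A sketch of enforcing a lowered per-subject limit, assuming a hypothetical index that keeps each subject's stored sequences in ascending order: when the limit drops, the oldest sequences beyond it are removed for every subject and the newest are kept. Treating a non-positive limit as unlimited is an assumption of this sketch, not a statement about the server's config semantics.

```go
package main

// perSubject is a hypothetical index: subject -> ascending stored sequences.
type perSubject map[string][]uint64

// enforceMaxMsgsPer trims every subject to at most limit messages by
// dropping its oldest sequences, and returns the sequences removed.
func enforceMaxMsgsPer(idx perSubject, limit int) []uint64 {
	var removed []uint64
	if limit <= 0 {
		return removed // assumed to mean "no per-subject limit"
	}
	for subj, seqs := range idx {
		if excess := len(seqs) - limit; excess > 0 {
			removed = append(removed, seqs[:excess]...)
			idx[subj] = seqs[excess:] // keep only the newest `limit` entries
		}
	}
	return removed
}
```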
Derek Collison
6128b83507 On abnormal server exit, for streams or KV with max msgs per subject set we could recover more than N msgs per subject.
This fix allows recovery of the correct state on restart when index files are missing or not current.

Signed-off-by: Derek Collison <derek@nats.io>
2022-10-26 16:00:57 -07:00
Derek Collison
d97abf0b61 On a write error we rebuild, so update accounting after flush write attempt.
If we had to flush while loading in subject info we would fail there and not properly rebuild state as we did on a failure for flushPending.

Signed-off-by: Derek Collison <derek@nats.io>
2022-10-18 11:39:42 -07:00
Ivan Kozlovic
57a3594355 [FIXED] JetStream: Purge with additional options may leave some messages
While going over message blocks, when some blocks were removed there
was a risk that a block is ignored, leaving some messages around.

Resolves #3528

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-10-06 13:21:20 -06:00
Derek Collison
7e1bc54389 Fix for #3848.
When a block's subject meta state was swapped out and subsequently loaded back in with only one subject present, but other messages with different subjects were added later, a filtered get could return the wrong result.

Signed-off-by: Derek Collison <derek@nats.io>
2022-09-22 04:57:05 -07:00
Ivan Kozlovic
29224c8ea9 Split more tests to speed up Travis run
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-09-09 12:45:48 -06:00
Ivan Kozlovic
b69ffe244e Fixed some tests
Code change:
- Do not start the processMirrorMsgs and processSourceMsgs goroutines
if the server has been detected to be shut down. This would otherwise
leave some goroutines running at the end of some tests.
- Pass the fch and qch to the consumerFileStore's flushLoop, otherwise
in some tests this routine could be left running.

Tests changes:
- Added missing defer NATS connection close
- Added missing defer server shutdown

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-09-08 11:28:23 -06:00
Derek Collison
a9790aa0fa Merge pull request #3409 from nats-io/cc-secure
[IMPROVED] Secure consumer create
2022-08-30 09:48:06 -07:00
Derek Collison
aa94a0bc0f New consumer create that allows elevation of stream and consumer names, and optional filter subject to the request subject.
Similar to changes in direct get allows proper security if needed for filter subject selection.

Signed-off-by: Derek Collison <derek@nats.io>
2022-08-30 09:29:38 -07:00
Derek Collison
5f0ecef6f3 When writing a msg after the fss state was expired we would count the msg twice.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-30 05:38:16 -07:00
Derek Collison
e837a255cf FSS state could skew on expire on recover with no msgs left.
Also added in sanity check on server start.

Signed-off-by: Derek Collison <derek@nats.io>
2022-08-29 17:34:28 -07:00
Derek Collison
7c1618f91c Try to dump any cached state including fss on recovery to avoid memory bloat on restart.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-24 17:41:57 -07:00
Derek Collison
ef71087d56 Fixed a bug that would not track per subject info for streams that were mirrors or sources.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-23 15:46:57 -07:00
Derek Collison
d48ccf4c5a When filestore is used for raft layer do not attempt to track subject metadata.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-17 13:46:13 -07:00
Derek Collison
827b34a77a Add support for AES cipher encryption for filestore.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-15 14:21:37 -07:00
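The commit message names only an "AES cipher"; the sketch below assumes AES-GCM from Go's standard library (`crypto/aes`, `crypto/cipher`) and a hypothetical layout in which a random nonce is prefixed to each sealed block. `encryptBlock` and `decryptBlock` are illustrative names; key derivation and the actual filestore on-disk format are not shown and may differ.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
)

// newAEAD builds an AES-GCM AEAD from a 16, 24, or 32 byte key.
func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encryptBlock seals plaintext and returns nonce || ciphertext || tag.
func encryptBlock(key, plaintext []byte) ([]byte, error) {
	aead, err := newAEAD(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends to its first argument, so the nonce stays as a prefix.
	return aead.Seal(nonce, nonce, plaintext, nil), nil
}

// decryptBlock splits off the nonce and authenticates/decrypts the rest.
func decryptBlock(key, sealed []byte) ([]byte, error) {
	aead, err := newAEAD(key)
	if err != nil {
		return nil, err
	}
	ns := aead.NonceSize()
	if len(sealed) < ns {
		return nil, errors.New("sealed block too short")
	}
	return aead.Open(nil, sealed[:ns], sealed[ns:], nil)
}
```

Because GCM is authenticated, a flipped bit in a stored block makes decryption fail instead of returning corrupted message data.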
Derek Collison
d7534dff5f Make sure when SubjectState is called we have loaded fss state.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-12 07:14:39 -05:00
Derek Collison
8c04adc009 Improvements to filestore for large KVs.
Use better indexing for lookups: we used to do a simple linear scan backwards, now we track the first and last block.
Will expire the fss cache at will to reduce memory usage.

Signed-off-by: Derek Collison <derek@nats.io>
2022-08-09 15:51:13 -05:00
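The indexing change above can be sketched as a per-subject record of the first and last block holding that subject, so finding the latest message for a key jumps straight to one block instead of scanning every block backwards. The `subjState` and `blkIndex` types are hypothetical and only loosely mirror the idea; blocks are assumed to be numbered in write order and written append-only.

```go
package main

// subjState is a hypothetical per-subject summary.
type subjState struct {
	msgs uint64 // number of messages stored for the subject
	fblk uint32 // first block containing the subject
	lblk uint32 // last block containing the subject
}

// blkIndex maps each subject to its summary.
type blkIndex struct {
	subjects map[string]*subjState
}

func newBlkIndex() *blkIndex {
	return &blkIndex{subjects: make(map[string]*subjState)}
}

// record notes that a message for subj was appended to block blk.
func (bi *blkIndex) record(subj string, blk uint32) {
	ss, ok := bi.subjects[subj]
	if !ok {
		bi.subjects[subj] = &subjState{msgs: 1, fblk: blk, lblk: blk}
		return
	}
	ss.msgs++
	if blk > ss.lblk {
		ss.lblk = blk
	}
}

// lastBlock returns the only block that must be searched for the latest
// message on subj, or false if the subject is not stored at all.
func (bi *blkIndex) lastBlock(subj string) (uint32, bool) {
	ss, ok := bi.subjects[subj]
	if !ok {
		return 0, false
	}
	return ss.lblk, true
}
```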
Ivan Kozlovic
3c9a7cc6e5 Move to Go 1.19, remove io/ioutil, fix data race and a flapper
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-05 09:55:37 -06:00
Derek Collison
717969510d Make sure to reset block encryption counter when clearing block but holding state for tracking sequences.
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-31 07:59:19 -07:00
Derek Collison
8dc1e4b6de When compact would reclaim head of block space, we needed to update block key for counter for new writes.
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-30 13:05:41 -07:00
Derek Collison
5e98263de8 General stability improvements
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-29 16:02:31 -07:00
Derek Collison
e120bb86a9 Update tests to check last seq
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-28 07:23:39 -07:00
Derek Collison
52f7765322 When msgs were expired on restart recovery we could lose track on subsequent restart of starting sequence with no additional activity.
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-23 17:15:16 -07:00
Derek Collison
3a10456e68 Short index write could lead to loss of stream sequence for empty stream
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-22 06:37:19 -07:00
Derek Collison
4291433a46 General improvements to accounting for the filestore. This is in response to tracking issue #3114.
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-12 15:43:11 -07:00
Derek Collison
b35988adf9 Remember the last timestamp by not removing last msgBlk when empty and during purge pull last timestamp forward until new messages arrive.
When a downstream stream uses retention modes that delete messages, fallback to timebased start time for the new source consumers.

Signed-off-by: Derek Collison <derek@nats.io>
2022-05-09 09:04:19 -07:00
Ivan Kozlovic
34650e9dd5 Fixed data race and some flappers
Data race that has been seen:
```
Read at 0x00c00134bec0 by goroutine 159:
  github.com/nats-io/nats-server/v2/server.(*client).msgHeaderForRouteOrLeaf()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:2935 +0x254
  github.com/nats-io/nats-server/v2/server.(*client).processMsgResults()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4364 +0x2147
(...)
Previous write at 0x00c00134bec0 by goroutine 201:
  github.com/nats-io/nats-server/v2/server.(*Server).addRoute()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:1475 +0xdb4
  github.com/nats-io/nats-server/v2/server.(*client).processRouteInfo()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:641 +0x1704
```

Also fixed some flappers and removed use of `s.js.` since we have
already captured `js` in Jsz monitoring.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-31 10:05:34 -06:00
Derek Collison
607858f213 Improved consumer snapshot logic in clustered mode and disk usage.
Also fixed a bug that could cause memory based replicated consumers to no longer work after snapshots and server restarts.

The snapshot logic would allow non-state-changing updates to continuously grow the raft logs. We were also too conservative about when and why we snapshotted.
Also added in ability to have FileStore.Compact() reclaim space from the block file from the head of last changed block.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-29 18:02:49 -07:00
Derek Collison
bd0a0b28c7 When recycling blocks we could potentially place partials into a tier. This would possibly cause the load code to thrash since it would not be big enough for a full block and we would need to recycle again and make a new one.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-28 16:46:46 -07:00
Derek Collison
5e5aab378e Additional improvements to memory pooling and management. Also logic fix for firstMatching that did unnecessary work when matching all.
During contention to the head write blk, the system could perform worse memory wise compared to simple go runtime.
Also had some references for the subject of messages bloating memory.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-28 10:15:23 -07:00