This aims to optimize a bit; more work remains for streams with a very large (> 200k) number of msg blks.
Signed-off-by: Derek Collison <derek@nats.io>
We needed to make sure, at the lowest level, that the state was read from disk and did not depend on the upper-layer consumer.
Signed-off-by: Derek Collison <derek@nats.io>
1. If reset, ignore Applied() values that are greater than our commit.
2. Improved StepDown() by placing ourselves at the back of the queue if preferred.
3. Improved handling of leadership transfer during StepDown().
4. Do not store EntryLeaderTransfer records on disk.
5. Remove unneeded processing of older terms.
6. If an append entry has a higher term, also inherit pterm.
7. Only inherit a candidate's term if we decide to vote for them (see the sketch below).
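For item 7, a minimal sketch of the voting rule with simplified illustrative types (`voteRequest` and `node` here are stand-ins, not the server's actual raft structures): the candidate's higher term is adopted only once we decide to grant the vote, so a rejected candidate cannot bump our term.
```go
package main

import "fmt"

// Illustrative types only; the real raft implementation carries more state.
type voteRequest struct {
	term      uint64
	candidate string
	lastTerm  uint64
	lastIndex uint64
}

type node struct {
	term   uint64
	vote   string
	pterm  uint64
	pindex uint64
}

// processVoteRequest grants or denies a vote. Note that we only
// inherit the candidate's term on the grant path.
func (n *node) processVoteRequest(vr *voteRequest) bool {
	// Never vote for a candidate on an older term.
	if vr.term < n.term {
		return false
	}
	// Already voted for someone else this term.
	if vr.term == n.term && n.vote != "" && n.vote != vr.candidate {
		return false
	}
	// The candidate's log must be at least as up to date as ours.
	if vr.lastTerm < n.pterm ||
		(vr.lastTerm == n.pterm && vr.lastIndex < n.pindex) {
		return false
	}
	// Grant the vote, and only now inherit the candidate's term.
	n.term = vr.term
	n.vote = vr.candidate
	return true
}

func main() {
	n := &node{term: 5, pterm: 5, pindex: 100}
	ok := n.processVoteRequest(&voteRequest{term: 6, candidate: "C1", lastTerm: 5, lastIndex: 100})
	fmt.Println(ok, n.term) // true 6
}
```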
Signed-off-by: Derek Collison <derek@nats.io>
This could lead to instability in the system.
The bug would manifest in replicated consumers when certain messages could be acked out of order, and the pending list would never go to zero.
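A minimal sketch of why the ack path has to tolerate any ordering; `pendingTracker` is an illustrative stand-in, not the server's consumer code. Deleting from the pending set on every ack, rather than only advancing a floor, lets the count drain to zero regardless of order.
```go
package main

import "fmt"

// pendingTracker records delivered-but-unacked sequences.
type pendingTracker struct {
	pending map[uint64]struct{}
}

func newPendingTracker() *pendingTracker {
	return &pendingTracker{pending: make(map[uint64]struct{})}
}

func (pt *pendingTracker) deliver(seq uint64) { pt.pending[seq] = struct{}{} }

func (pt *pendingTracker) ack(seq uint64) {
	// A version that only advanced an ack floor would miss sequences
	// acked ahead of the floor; deleting from the map handles any order.
	delete(pt.pending, seq)
}

func (pt *pendingTracker) numPending() int { return len(pt.pending) }

func main() {
	pt := newPendingTracker()
	for seq := uint64(1); seq <= 3; seq++ {
		pt.deliver(seq)
	}
	// Ack out of order: 3, 1, 2.
	pt.ack(3)
	pt.ack(1)
	pt.ack(2)
	fmt.Println(pt.numPending()) // 0
}
```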
Signed-off-by: Derek Collison <derek@nats.io>
Optimized and fixed a bug in filestore filteredPending().
Optimized memstore FilteredState().
Added comprehensive tests for NumPending() and FilteredState().
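A hedged sketch of the shape such tests can take: cross-check an index-driven filtered count against a brute-force scan over all messages. The matcher and both count paths are illustrative stand-ins, not the actual NumPending()/FilteredState() APIs.
```go
package main

import (
	"strings"
	"testing"
)

// matches is a tiny matcher supporting only a trailing ">" wildcard;
// the server's subject matching is richer, this is just for the sketch.
func matches(filter, subj string) bool {
	if strings.HasSuffix(filter, ">") {
		return strings.HasPrefix(subj, strings.TrimSuffix(filter, ">"))
	}
	return filter == subj
}

// scanPending is the brute-force oracle: walk every message.
func scanPending(msgs []string, filter string) int {
	var n int
	for _, subj := range msgs {
		if matches(filter, subj) {
			n++
		}
	}
	return n
}

// indexPending is the "optimized" path: walk only unique subjects
// with precomputed counts, as a per-subject index allows.
func indexPending(index map[string]int, filter string) int {
	var n int
	for subj, count := range index {
		if matches(filter, subj) {
			n += count
		}
	}
	return n
}

func TestFilteredPendingAgree(t *testing.T) {
	msgs := []string{"orders.us.new", "orders.eu.new", "orders.us.new", "events.login"}
	index := make(map[string]int)
	for _, subj := range msgs {
		index[subj]++
	}
	for _, filter := range []string{"orders.>", "orders.us.new", "events.login", "none.>"} {
		if got, want := indexPending(index, filter), scanPending(msgs, filter); got != want {
			t.Fatalf("filter %q: got %d, want %d", filter, got, want)
		}
	}
}
```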
Signed-off-by: Derek Collison <derek@nats.io>
Due to a bug, in rare circumstances we could write an empty snapshot for applied == 0. This would cause spinning at the raft layer.
1. Allow Truncate() to also properly reset the store when only the terms were mismatched.
2. During testing, fixed memstore truncate and made sure per-subject info was also cleaned up.
3. Added a fix to detect a bad snapshot on initialization and remove it.
4. Do not allow snapshots for applied == 0 (sketched below).
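A minimal sketch of the item 4 guard, using an illustrative `raftNode` stand-in (the server's raft node carries far more state):
```go
package main

import (
	"errors"
	"fmt"
)

var errNoSnapAvailable = errors.New("no snapshot available")

// raftNode is an illustrative stand-in for the raft state machine.
type raftNode struct {
	applied uint64
}

// InstallSnapshot refuses to snapshot before anything has been applied.
// An empty snapshot at applied == 0 can truncate the log to nothing and
// leave the raft layer spinning trying to recover from it.
func (n *raftNode) InstallSnapshot(data []byte) error {
	if n.applied == 0 {
		return errNoSnapAvailable
	}
	// ... persist a snapshot covering state up to n.applied ...
	return nil
}

func main() {
	n := &raftNode{}
	fmt.Println(n.InstallSnapshot(nil)) // no snapshot available
}
```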
Signed-off-by: Derek Collison <derek@nats.io>
When a msg blk was not written correctly but the idx file was, max bytes for a stream would no longer be honored, since the deletion of messages in that empty block was not being handled properly.
Signed-off-by: Derek Collison <derek@nats.io>
When a compaction would not uplevel to a normal purge, but all msg blocks were empty after completion, mb.lmb was not cleared or reset properly.
Signed-off-by: Derek Collison <derek@nats.io>
If we had to flush while loading in subject info, we would fail there and not properly rebuild state, as we did on a failure in flushPending.
Signed-off-by: Derek Collison <derek@nats.io>
While going over message blocks, when some blocks were removed there
was a risk that a block would be ignored, leaving some messages around.
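The classic shape of this bug: removing a block from the slice while walking it by index skips the element that slides into the freed slot. An illustrative sketch, not the actual filestore code:
```go
package main

import "fmt"

type block struct{ empty bool }

// pruneBuggy increments the index even after a removal, so the block
// that shifted into slot i is never examined.
func pruneBuggy(blks []*block) []*block {
	for i := 0; i < len(blks); i++ {
		if blks[i].empty {
			blks = append(blks[:i], blks[i+1:]...)
		}
	}
	return blks
}

// pruneFixed re-checks slot i after a removal.
func pruneFixed(blks []*block) []*block {
	for i := 0; i < len(blks); {
		if blks[i].empty {
			blks = append(blks[:i], blks[i+1:]...)
			continue // re-check the block that shifted into slot i
		}
		i++
	}
	return blks
}

func main() {
	mk := func() []*block {
		return []*block{{true}, {true}, {false}, {true}}
	}
	fmt.Println(len(pruneBuggy(mk())), len(pruneFixed(mk()))) // 2 1
}
```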
Resolves #3528
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When a block's subject meta state was swapped out and subsequently loaded back in with only one subject present, but other messages with different subjects were added later, a filtered get could return the wrong result.
Signed-off-by: Derek Collison <derek@nats.io>
Code changes:
- Do not start the processMirrorMsgs and processSourceMsgs goroutines
if the server has been detected to be shut down. This would otherwise
leave some goroutines running at the end of some tests.
- Pass the fch and qch to the consumerFileStore's flushLoop, otherwise
in some tests this routine could be left running. (Both code changes
are sketched after this list.)
Test changes:
- Added missing deferred NATS connection close
- Added missing deferred server shutdown
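A hedged sketch of both code changes with heavily simplified types; the real server and consumerFileStore carry much more state:
```go
package main

import (
	"sync"
	"time"
)

type server struct {
	mu       sync.Mutex
	shutdown bool
	grWG     sync.WaitGroup
}

// startGoRoutine refuses to launch new work on a stopped server,
// which otherwise leaks goroutines at the end of tests.
func (s *server) startGoRoutine(f func()) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.shutdown {
		return false
	}
	s.grWG.Add(1)
	go f()
	return true
}

type consumerFileStore struct {
	fch chan struct{}
	qch chan struct{}
}

// flushLoop receives its channels as arguments rather than re-reading
// them from the struct, so it always observes the pair it was started with.
func (cfs *consumerFileStore) flushLoop(fch, qch chan struct{}) {
	for {
		select {
		case <-fch:
			// ... flush pending state to disk ...
		case <-qch:
			return
		}
	}
}

func main() {
	s := &server{}
	cfs := &consumerFileStore{fch: make(chan struct{}, 1), qch: make(chan struct{})}
	s.startGoRoutine(func() {
		defer s.grWG.Done()
		cfs.flushLoop(cfs.fch, cfs.qch)
	})
	cfs.fch <- struct{}{}
	time.Sleep(10 * time.Millisecond)
	close(cfs.qch)
	s.grWG.Wait()
}
```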
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Use better indexing for lookups: we used to do a simple linear scan backwards; now we track the first and last block.
We will also expire the fss cache at will to reduce memory usage.
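A minimal sketch of the indexing idea, assuming for illustration a per-subject first/last block range (the structures here are stand-ins, not the filestore's):
```go
package main

import "fmt"

type msgBlock struct {
	subjs map[string]bool
}

type store struct {
	blks []*msgBlock
	// subject -> [first block index, last block index]
	subjRange map[string][2]int
}

func (s *store) add(blk int, subj string) {
	for len(s.blks) <= blk {
		s.blks = append(s.blks, &msgBlock{subjs: map[string]bool{}})
	}
	s.blks[blk].subjs[subj] = true
	r, ok := s.subjRange[subj]
	if !ok {
		r = [2]int{blk, blk}
	} else {
		if blk < r[0] {
			r[0] = blk
		}
		if blk > r[1] {
			r[1] = blk
		}
	}
	s.subjRange[subj] = r
}

// blocksFor returns only the block range that can contain the subject,
// instead of linearly scanning all blocks backwards.
func (s *store) blocksFor(subj string) []*msgBlock {
	r, ok := s.subjRange[subj]
	if !ok {
		return nil
	}
	return s.blks[r[0] : r[1]+1]
}

func main() {
	s := &store{subjRange: map[string][2]int{}}
	s.add(0, "foo")
	s.add(3, "bar")
	s.add(5, "bar")
	fmt.Println(len(s.blocksFor("bar")), len(s.blocksFor("foo"))) // 3 1
}
```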
Signed-off-by: Derek Collison <derek@nats.io>
When a downstream stream uses retention modes that delete messages, fall back to a time-based start time for the new source consumers.
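A hedged sketch of the fallback decision with simplified stand-ins for the consumer config (the real ConsumerConfig differs): when retention can delete messages, the stored sequence may point at a message that is already gone, so the new source consumer starts from a time instead.
```go
package main

import (
	"fmt"
	"time"
)

type deliverPolicy int

const (
	deliverByStartSequence deliverPolicy = iota
	deliverByStartTime
)

type consumerConfig struct {
	policy    deliverPolicy
	startSeq  uint64
	startTime time.Time
}

// sourceConsumerConfig picks the start point for a new source consumer.
func sourceConsumerConfig(lastSeq uint64, lastTs time.Time, retentionDeletes bool) consumerConfig {
	if retentionDeletes {
		// Sequence lastSeq+1 may already be deleted; restart by time instead.
		return consumerConfig{policy: deliverByStartTime, startTime: lastTs.Add(time.Nanosecond)}
	}
	return consumerConfig{policy: deliverByStartSequence, startSeq: lastSeq + 1}
}

func main() {
	cfg := sourceConsumerConfig(42, time.Now(), true)
	fmt.Println(cfg.policy == deliverByStartTime) // true
}
```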
Signed-off-by: Derek Collison <derek@nats.io>
Data race that has been seen:
```
Read at 0x00c00134bec0 by goroutine 159:
github.com/nats-io/nats-server/v2/server.(*client).msgHeaderForRouteOrLeaf()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:2935 +0x254
github.com/nats-io/nats-server/v2/server.(*client).processMsgResults()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4364 +0x2147
(...)
Previous write at 0x00c00134bec0 by goroutine 201:
github.com/nats-io/nats-server/v2/server.(*Server).addRoute()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:1475 +0xdb4
github.com/nats-io/nats-server/v2/server.(*client).processRouteInfo()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:641 +0x1704
```
Also fixed some flappers and removed use of `s.js.` since we have
already captured `js` in Jsz monitoring.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Also fixed a bug that could cause memory-based replicated consumers to no longer work after snapshots and server restarts.
The snapshot logic would allow non-state-changing updates to continuously grow the raft logs. We were also too conservative about when we snapshotted and why.
Also added the ability for FileStore.Compact() to reclaim space in the block file from the head of the last changed block.
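A simplified sketch of the head reclamation, not the actual FileStore.Compact() implementation: the surviving tail of the block file is rewritten to the front via a temp file, returning the reclaimed leading bytes to the OS.
```go
package main

import (
	"fmt"
	"os"
)

// reclaimHead drops the first off bytes of the file at path, keeping
// only the surviving tail, via an atomic temp-file rewrite.
func reclaimHead(path string, off int64) error {
	buf, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	if off <= 0 || off > int64(len(buf)) {
		return nil // nothing to reclaim
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, buf[off:], 0640); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

func main() {
	f, _ := os.CreateTemp("", "blk")
	f.WriteString("purged-head|live-tail")
	f.Close()
	defer os.Remove(f.Name())
	reclaimHead(f.Name(), int64(len("purged-head|")))
	data, _ := os.ReadFile(f.Name())
	fmt.Println(string(data)) // live-tail
}
```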
Signed-off-by: Derek Collison <derek@nats.io>
Under contention on the head write blk, the system could perform worse memory-wise than simply relying on the Go runtime.
We also held some references to message subjects that were bloating memory.
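A sketch of the reference issue, illustrative rather than the server's hot path: retaining a subject as a sub-slice of a large read buffer keeps the entire buffer reachable for as long as the subject lives; copying just the subject bytes lets the buffer be collected.
```go
package main

import "fmt"

type cachedMsg struct {
	subj []byte
}

// cacheLeaky pins the entire read buffer via the sub-slice.
func cacheLeaky(buf []byte, start, end int) cachedMsg {
	return cachedMsg{subj: buf[start:end]} // holds all of buf alive
}

// cacheCopied retains only the bytes it needs.
func cacheCopied(buf []byte, start, end int) cachedMsg {
	subj := make([]byte, end-start)
	copy(subj, buf[start:end])
	return cachedMsg{subj: subj}
}

func main() {
	big := make([]byte, 1<<20) // imagine a 1MB read buffer
	copy(big[10:], "orders.new")
	leaky := cacheLeaky(big, 10, 20)
	copied := cacheCopied(big, 10, 20)
	fmt.Println(string(leaky.subj), string(copied.subj))
	// Once big goes out of scope, only the copied variant lets the
	// 1MB buffer actually be garbage collected.
}
```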
Signed-off-by: Derek Collison <derek@nats.io>