
6926 Commits

Author SHA1 Message Date
Derek Collison
c194047caf Bump to 2.9.16-RC.2
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-30 21:23:51 -07:00
Derek Collison
5e85889790 [IMPROVED] Improvements to preAcks. (#4006)
Better handling of multiple consumers so as to not delete messages too
early.
Better cleanup handling.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-30 21:08:34 -07:00
Derek Collison
8c0a45edf9 Make sure to lock on clearing if not removing.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-30 20:42:28 -07:00
Derek Collison
937ef0d2a6 Improvements to preAcks.
Better handling of multiple consumers so as to not delete too early.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-30 20:29:15 -07:00
Ivan Kozlovic
7862881351 Fixed some tests (#4005)
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-03-30 15:45:16 -06:00
Ivan Kozlovic
a4df4f8727 Fixed some tests
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-03-30 15:02:59 -06:00
Derek Collison
5359d2323a [FIXED] Do not allow JetStream leaders to be placed on a lameduck server. (#4002)
Set existing and any new raft assets to observer mode while in a
lameduck state.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 20:32:44 -07:00
Derek Collison
4646f4af5d Do not allow any JetStream leaders to be placed on a lameduck server
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 20:15:41 -07:00
Derek Collison
873ab0f6b9 Fix for flapping test
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 18:55:41 -07:00
Derek Collison
fbc90adf93 Bump to 2.9.16-RC.1
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 17:21:57 -07:00
Derek Collison
02702e4620 [IMPROVEMENT] General stability and bug fixes. (#3999)
This PR has general improvements and fixes to filestore, raft, and the
clustering layer.

Summary

1. Additional support for preAck handling for interest based streams
when replicated acks arrive before the message itself.
2. Better handling when checking state to determine whether to remove an
interest based message.
3. Improved StepDown() and leadership transfer handling after restarts.
4. Improved voting logic for high load systems.
5. Various improvements and fixes for filestore Compact(), which is used
heavily in the raft layer when updating snapshots and the raft wal.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 17:09:44 -07:00
Derek Collison
c546828359 Moved long running test to NoRace suite
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 16:56:04 -07:00
Derek Collison
ade0e9d295 Snapshot meta for this function to use in case it gets removed out from underneath us.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 16:51:17 -07:00
Derek Collison
9a714e7d7d Update based on review feedback
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 15:47:54 -07:00
Derek Collison
152b25c314 Update server/stream.go
Pre-allocate

Co-authored-by: Neil <neil@nats.io>
2023-03-29 15:29:51 -07:00
Derek Collison
c77872b519 Update server/jetstream_cluster.go
Pre-allocate

Co-authored-by: Neil <neil@nats.io>
2023-03-29 15:29:38 -07:00
Derek Collison
2b89fea9b0 Double check here if the jetstream cluster was shutdown when we released the lock
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 14:46:49 -07:00
Derek Collison
e274693490 On bad or corrupt message load during commit, reset WAL vs mark write error
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 14:07:14 -07:00
Derek Collison
6c3e64b83b Always make sure cluster and meta raft node available when needed
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 13:56:04 -07:00
Derek Collison
ddfa5cdfec Additional protection for bad state when rebuilding a message block
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:44:11 -07:00
Derek Collison
a9a4df859f Fix for flapping test
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:44:08 -07:00
Derek Collison
35d1a7747a Snapshots of no length can hold state as well
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:44:04 -07:00
Derek Collison
c4da37ecc7 Make sure consumer is valid and state was returned
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:44:01 -07:00
Derek Collison
e97ddcd14f Tweak tests due to changes, make test timeouts uniform.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:59 -07:00
Derek Collison
52fbac644c Since we no longer store leaderTransfers, which is proper, some tests were getting an advantage from that after server restart.
This change speeds up raft layer more to avoid timeouts.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:57 -07:00
Derek Collison
0d9f707b4b Additional tests to stress interest based streams with pull subscribers during rolling restarts.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:55 -07:00
Derek Collison
71af150448 General improvements to interest based stream processing when acks arrive before the actual msgs.
1. If we are retention based, make sure our consumers are running before entering into monitorStream logic.
2. If we skip messages and are interest based, make sure we check for a preAck state.
3. On finalization of recovery for consumers have them check against the interest based stream.
4. Do not process ack state updates if consumer is closed and shutting down.
5. When processing final state for a stream after upper layer catchup, check all attached consumers for ack skew.
6. During catchup of stream messages consult preAck state and skip messages as needed.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:53 -07:00
Derek Collison
5cabc365df General improvements around handling interest retention.
1. During ackMsg processing hold write lock to block concurrent access.
2. Check for presence of preAcks before and force removal if present.
3. Rework check for orphan msgs on startup to use checkStateForInterestStream().

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:51 -07:00
Derek Collison
e516c47a4b Improvements to consumers attached to an interest retention stream.
1. Do not process an ack if we are closed.
2. When checking for needing an ack for a given consumer, hold lock entire time.
3. During recovery and restarts we check if we need to replay acks to the parent stream.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:49 -07:00
Derek Collison
182bf6cbae Bug fixes and general stability improvements.
1. If reset, ignore Applied() calls that are greater than our commit.
2. Improved StepDown() by placing at back of queue if preferred.
3. Improved handling of leadership transfer during StepDown().
4. Do not store EntryLeaderTransfer records on disk.
5. Remove un-needed processing of older terms.
6. If append entry has higher term, also inherit pterm.
7. Only inherit a candidate's term if we decide to vote for them.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:46 -07:00
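Item 7 of 182bf6cbae ("Only inherit a candidate's term if we decide to vote for them") can be illustrated with a minimal sketch. The `voter` and `voteRequest` types and the `handleVote` method below are hypothetical and are not the nats-server raft implementation; the log comparison is the usual Raft "at least as up to date" check. Note that textbook Raft adopts any higher term it sees, so this shows the behavior the commit describes rather than the standard rule.

```go
package main

import "fmt"

// voter is a hypothetical, stripped-down raft node (not the nats-server type).
type voter struct {
	term     uint64 // current term
	votedFor string // candidate voted for in the current term, if any
	lastIdx  uint64 // index of the last log entry
	lastTerm uint64 // term of the last log entry
}

// voteRequest is a hypothetical vote request from a candidate.
type voteRequest struct {
	candidate string
	term      uint64
	lastIdx   uint64
	lastTerm  uint64
}

// handleVote reports whether the vote is granted. The candidate's term is
// adopted only on the path that grants the vote.
func (v *voter) handleVote(req voteRequest) bool {
	if req.term < v.term {
		return false // stale candidate
	}
	if req.term == v.term && v.votedFor != "" && v.votedFor != req.candidate {
		return false // already voted for someone else in this term
	}
	// The candidate's log must be at least as up to date as ours.
	upToDate := req.lastTerm > v.lastTerm ||
		(req.lastTerm == v.lastTerm && req.lastIdx >= v.lastIdx)
	if !upToDate {
		return false // v.term is intentionally left unchanged
	}
	v.term = req.term
	v.votedFor = req.candidate
	return true
}

func main() {
	v := &voter{term: 5, lastIdx: 100, lastTerm: 5}

	// A candidate with a higher term but an older log is refused,
	// and our term stays where it was.
	behind := voteRequest{candidate: "A", term: 9, lastIdx: 50, lastTerm: 4}
	fmt.Println(v.handleVote(behind), v.term)

	// A candidate with an up-to-date log gets the vote and its term.
	current := voteRequest{candidate: "B", term: 9, lastIdx: 100, lastTerm: 5}
	fmt.Println(v.handleVote(current), v.term)
}
```

Refusing to inherit the term from a candidate that loses the vote keeps a node with a stale log from pushing terms forward on its peers, which is one plausible reading of why the change helps under load.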
Derek Collison
6d4304146f Bug fixes and general stability improvements.
1. Fixed a bug that would process a removal of a message after the message block was closed.
2. Improved removal of non-existent message when we know the store is empty.
3. Improved last write index size tracking when opening the file descriptor after being closed.
4. Improved Compact() by not loading messages for last block twice.
5. Improved Compact() determination of calling purge by determining last sequence under write lock.
6. Improved Compact() by only compacting underlying message block if over certain size threshold.
7. Improved Compact() by writing the index file if needed while still holding lock, avoiding an unnecessary re-lock.
8. Improved Compact() by not calling out to upper layers on no messages being purged.
9. Fixed a bug in Compact() that would not delete members from a block's delete map.
10. Fixed a bug in reset() when a callback was not registered (raft logs) which avoided msg block cleanup.
11. Improved consumer store Update() call for when to avoid an outdated update.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:42 -07:00
Waldemar Quevedo
9cc66c0f32 Add vcsinfo when building with goreleaser (#3993)
Currently in Go, a release that is built via `go build main.go` will always be
labeled with its version as being `(devel)` (even if building from the
[git tag commit](https://github.com/golang/go/issues/50603)):

```sh
go version -m /usr/local/bin/nats-server  | grep nats-server/v2
	dep	github.com/nats-io/nats-server/v2	(devel)	

```

And in order to include the release version of the module in the binary it has to be
built using `go install`:

```sh
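go install github.com/nats-io/nats-server/v2@v2.9.15
# assumes GOBIN is unset, so the binary lands in GOPATH/bin
go version -m "$(go env GOPATH)/bin/nats-server" | grep nats-server/v2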
	path	github.com/nats-io/nats-server/v2
	mod	github.com/nats-io/nats-server/v2	v2.9.15	h1:MuwEJheIwpvFgqvbs20W8Ish2azcygjf4Z0liVu2I4c=
```

This change builds the package with `go build .`, which is going to be enough to fix the trivy / grype issues.

This also adds the `trimpath` build flag to remove the filesystem paths
from where the binary was built.

This should help reduce some of the false positives from vulnerability
scanners which are not matching with the proper version of the binary as
in #3992 with a `malformed version` warning.

Fixes #3992
2023-03-28 09:00:59 -07:00
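The version metadata discussed in 9cc66c0f32 is the build info that `go version -m` prints, and a Go program can read the same data about itself through `runtime/debug.ReadBuildInfo`. The sketch below is only an illustration: `revisionOf` and `formatBuild` are hypothetical helper names, and it assumes Go 1.18 or later, where the `Settings` field and the `vcs.revision` key are available.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// revisionOf returns the value of the "vcs.revision" build setting,
// or "" when the binary was built without VCS information.
func revisionOf(settings []debug.BuildSetting) string {
	for _, s := range settings {
		if s.Key == "vcs.revision" {
			return s.Value
		}
	}
	return ""
}

// formatBuild renders the main module path, version and revision on one line,
// substituting placeholders for missing values.
func formatBuild(path, version, revision string) string {
	if version == "" {
		version = "(unknown)"
	}
	if revision == "" {
		revision = "(none)"
	}
	return fmt.Sprintf("%s %s rev=%s", path, version, revision)
}

func main() {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		fmt.Println("no build info embedded in this binary")
		return
	}
	fmt.Println(formatBuild(info.Main.Path, info.Main.Version, revisionOf(info.Settings)))
}
```

A version of `(devel)` in this output corresponds to the case the commit message describes, where scanners cannot match the binary to a released module version.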
Waldemar Quevedo
1281ca690c Remove gomod proxy, build by installing package instead
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-03-28 08:16:33 -07:00
Waldemar Quevedo
f537b3e667 Use go mod proxy when building the release
This makes sure that the correct package version metadata
is included when inspected via `go version -m`.

Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-03-27 08:47:49 -07:00
Derek Collison
57daedafa8 Merge pull request #3986 from nats-io/neil/shutdownraftgroups
Shut down RAFT groups when disabling JetStream
2023-03-23 11:09:35 -07:00
Neil Twigg
8d5519356e Shut down RAFT groups when disabling JetStream
Signed-off-by: Neil Twigg <neil@nats.io>
2023-03-23 16:54:01 +00:00
Derek Collison
61556d90bd Merge pull request #3985 from nats-io/oor-raft
Only process out of resources condition from raft layer if err matches explicitly.
2023-03-23 08:39:15 -07:00
Derek Collison
ec89823e1c Only process out of resources condition from raft layer if err matches condition
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-23 08:13:22 -07:00
Derek Collison
cec1e15c4b Merge pull request #3983 from nats-io/pre-acks-test
Test for preAcks
2023-03-21 13:16:09 -07:00
Derek Collison
9ccd7abdf8 Test for preAcks
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-21 12:08:24 -07:00
Derek Collison
2551e6f6a8 Merge pull request #3981 from nats-io/f3-2
Improved publisher performance under some instances of asymmetric network latency clusters.
2023-03-20 21:32:30 -07:00
Derek Collison
ed9de4b0a1 Improved publisher performance under some instances of asymmetric network latency clusters on interest based streams.
Under asymmetric network latency based clusters, if a node in an R3 was replicating a consumer and the parent stream but was the leader of neither, and the path from the stream leader was faster than the consumer leader, a replicated ack could arrive before the message itself.

In this case we used to forward a delete message request to the stream leader which would then replicate that to all stream replicas, causing more work which could lead to increased publisher times on clients connected to the slow node.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-20 20:53:45 -07:00
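The situation described in ed9de4b0a1, where a replicated ack reaches a node before the message it acknowledges, can be sketched as local bookkeeping: remember the early ack, then consult it when the message finally arrives. This is a minimal sketch under stated assumptions and not the nats-server code; `preAckTracker`, `registerPreAck` and `shouldStore` are hypothetical names, and real interest based streams track considerably more state.

```go
package main

import "fmt"

// preAckTracker is a hypothetical helper that remembers acks which arrive
// for a sequence before the message itself has been replicated locally.
type preAckTracker struct {
	consumers map[string]struct{}            // consumers with interest in the stream
	preAcks   map[uint64]map[string]struct{} // seq -> consumers that acked early
}

func newPreAckTracker(consumers ...string) *preAckTracker {
	t := &preAckTracker{
		consumers: make(map[string]struct{}),
		preAcks:   make(map[uint64]map[string]struct{}),
	}
	for _, c := range consumers {
		t.consumers[c] = struct{}{}
	}
	return t
}

// registerPreAck records that consumer acked seq before the message arrived.
func (t *preAckTracker) registerPreAck(seq uint64, consumer string) {
	if _, ok := t.consumers[consumer]; !ok {
		return // ignore acks from unknown consumers
	}
	m := t.preAcks[seq]
	if m == nil {
		m = make(map[string]struct{})
		t.preAcks[seq] = m
	}
	m[consumer] = struct{}{}
}

// shouldStore is called when the message for seq finally arrives. It returns
// false only when every interested consumer has already acked the sequence,
// and in that case it also clears the pre-ack state for seq.
func (t *preAckTracker) shouldStore(seq uint64) bool {
	acked := t.preAcks[seq]
	for c := range t.consumers {
		if _, ok := acked[c]; !ok {
			return true // at least one consumer still needs the message
		}
	}
	delete(t.preAcks, seq)
	return false
}

func main() {
	t := newPreAckTracker("c1", "c2")

	t.registerPreAck(10, "c1")
	fmt.Println(t.shouldStore(10)) // c2 has not acked seq 10 yet

	t.registerPreAck(10, "c2")
	fmt.Println(t.shouldStore(10)) // both consumers acked seq 10 early
}
```

Handling the early ack locally avoids the extra round trip the commit message mentions, where a delete request was forwarded to the stream leader and replicated again to every replica.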
Derek Collison
3e0ce6e639 Merge pull request #3980 from nats-io/f3
[FIXED] Fixed an issue with consumer states growing and causing instability.
2023-03-19 11:55:26 -07:00
Derek Collison
0c1301ec14 Fix for data race
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-19 10:52:52 -07:00
Derek Collison
5a16f98427 Fixed an off by one bug that under certain circumstances could cause large consumer replica states.
This could lead to instability in the system.

The bug would manifest in replicated consumers when certain messages could be acked out of order and the pending list would never go to zero.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-19 10:41:59 -07:00
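The commit message for 5a16f98427 does not show the code involved, so the following is not the actual bug or its fix. It is a generic sketch of the invariant the message refers to: with out-of-order acks, a consumer tracks an ack floor plus a pending set, and once every delivered message is acked the pending set must drain to zero. The `ackState` type and its methods are hypothetical.

```go
package main

import "fmt"

// ackState is a hypothetical model of consumer delivery state.
type ackState struct {
	delivered uint64              // highest sequence delivered so far
	floor     uint64              // every sequence <= floor has been acked
	pending   map[uint64]struct{} // delivered but not yet acked
}

func newAckState() *ackState {
	return &ackState{pending: make(map[uint64]struct{})}
}

// deliver marks seq as delivered and awaiting an ack.
func (s *ackState) deliver(seq uint64) {
	s.pending[seq] = struct{}{}
	if seq > s.delivered {
		s.delivered = seq
	}
}

// ack removes seq from pending and advances the floor over any
// contiguous run of acked sequences, up to and including delivered.
func (s *ackState) ack(seq uint64) {
	if _, ok := s.pending[seq]; !ok {
		return // unknown or duplicate ack
	}
	delete(s.pending, seq)
	if len(s.pending) == 0 {
		s.floor = s.delivered // nothing outstanding
		return
	}
	for s.floor < s.delivered {
		if _, ok := s.pending[s.floor+1]; ok {
			break // next sequence is still waiting for its ack
		}
		s.floor++
	}
}

func main() {
	s := newAckState()
	for seq := uint64(1); seq <= 3; seq++ {
		s.deliver(seq)
	}
	// Acks arrive out of order: 2, 3, then 1.
	s.ack(2)
	s.ack(3)
	fmt.Println(s.floor, len(s.pending))
	s.ack(1)
	fmt.Println(s.floor, len(s.pending))
}
```

The boundary conditions in `ack` are where an off by one error would typically hide: comparing against the wrong side of `delivered` or `floor` can leave an entry in `pending` that no later ack ever removes.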
Derek Collison
3e8e0ea44a Merge pull request #3979 from nats-io/cores-snap
Remove snapshotting of cores and maxprocs.
2023-03-18 08:18:24 -07:00
Derek Collison
027f2e42c8 Remove snapshot of cores and maxprocs
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-17 15:09:50 -07:00
Derek Collison
f0e1585490 Fix flapping test
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-17 13:14:43 -07:00
Neil
1cfb1b0c3c Merge pull request #3978 from nats-io/neil/encfix
Don't recycle buffer more than once
2023-03-17 09:42:19 +00:00
Neil Twigg
4647e14b3e Don't recycle buffer more than once
2023-03-17 09:25:17 +00:00