
516 Commits

Author SHA1 Message Date
Derek Collison
1ccc6dbf30 Bumped inflight updates to 16 and moved one lock to a read lock (RLock).
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-03 15:01:34 -07:00
Derek Collison
dba03dbc2f Optimizations to reduce contention for high connections in a JetStream enabled account with high API usage.
Several strategies are used, listed below.

1. Checking whether a RaftNode is the leader now uses atomics.
2. Checking whether the server is the JetStream meta leader now uses an atomic.
3. Accessing the JetStream context no longer requires the server lock; it uses an atomic.Pointer.
4. Filestore syncBlocks previously held msgBlock locks during sync; it no longer does.

Signed-off-by: Derek Collison <derek@nats.io>
2023-09-30 14:52:15 -07:00
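Items 2 and 3 above replace lock-protected reads with atomics. A minimal sketch of that pattern, assuming a heavily simplified, hypothetical `server` type (the field and method names here are illustrative, not the nats-server source): the hot-path readers load the JetStream context and the meta-leader flag without touching the server mutex.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// jsContext is a hypothetical stand-in for the JetStream context.
type jsContext struct{ domain string }

// server is a hypothetical, heavily simplified server type.
type server struct {
	mu         sync.Mutex                // still guards the rest of the server state
	js         atomic.Pointer[jsContext] // swapped atomically; readers never take mu
	metaLeader atomic.Bool               // meta-leader flag, read lock-free
}

// getJetStream loads the context without acquiring s.mu.
func (s *server) getJetStream() *jsContext { return s.js.Load() }

// isMetaLeader is a lock-free check suitable for high-rate API handlers.
func (s *server) isMetaLeader() bool { return s.metaLeader.Load() }

// setMetaLeader would be called on leadership changes.
func (s *server) setMetaLeader(v bool) { s.metaLeader.Store(v) }
```

The trade-off is that writers still need whatever coordination they had before; only the read side becomes contention-free.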
Derek Collison
aeef0eff53 Add in warnings for filestore recover state if happy path fails.
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-27 16:22:15 -07:00
Derek Collison
c5b98f5c79 Make server shutdown an atomic and check inside unsubscribe to avoid unnecessary work.
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-26 17:53:58 -07:00
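A sketch of the "check shutdown inside unsubscribe" idea, assuming a hypothetical simplified `server` type: once the atomic shutdown flag is set, unsubscribe returns early because every subscription is about to be torn down anyway.

```go
package main

import "sync/atomic"

// server is a hypothetical, simplified type; only the shutdown flag matters here.
type server struct {
	shuttingDown atomic.Bool
	removed      atomic.Int64 // counts unsubscribes that actually did work
}

// unsubscribe skips the removal work once shutdown has begun.
func (s *server) unsubscribe(subject string) {
	if s.shuttingDown.Load() {
		return
	}
	_ = subject // real removal from the sublist would happen here
	s.removed.Add(1)
}

// shutdown flips the flag without taking any lock.
func (s *server) shutdown() { s.shuttingDown.Store(true) }
```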
Derek Collison
7ce47fd182 Move server running state to atomic to avoid contention at NRG layer.
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-25 11:18:15 -07:00
Derek Collison
e46f49f5d5 Make sure to issue warning on reset for bad state
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-25 09:04:54 -07:00
Waldemar Quevedo
89d33d960b Skip enabling direct gets if no commits
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-09-22 17:08:46 -07:00
Derek Collison
65e0fbfa51 Make install snapshot errors rate limited for when catching up
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-22 10:23:02 -07:00
Neil Twigg
1f9ddf2bbd Add Raft goroutine labels, tweak logging
Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-16 11:15:06 +01:00
Derek Collison
3f80348a16 Fix for data race in accessing rg.node
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-12 07:43:14 -07:00
Derek Collison
9531611feb Add in utility to detect and delete any NRG orphans.
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-11 19:15:12 -07:00
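Detecting NRG orphans can be framed as a set difference: Raft group state found on disk minus the groups that current stream and consumer assignments refer to. The sketch below assumes both inputs are already collected (how is not shown); `findOrphans` and the group names are hypothetical.

```go
package main

import "sort"

// findOrphans returns group names present on disk but not referenced by
// any current assignment. Inputs are hypothetical and pre-collected.
func findOrphans(onDisk []string, assigned map[string]bool) []string {
	var orphans []string
	for _, name := range onDisk {
		if !assigned[name] {
			orphans = append(orphans, name)
		}
	}
	sort.Strings(orphans) // stable order for logging before deletion
	return orphans
}
```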
Derek Collison
7d041da3c8 Fix for data race on clfs
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-10 11:07:27 -07:00
Neil Twigg
487f58f16e Consumers inherit limits for max_ack_pending and inactive_threshold from stream
Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-01 10:54:11 +01:00
Derek Collison
8544cb7adf Merge branch 'main' into dev
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-22 20:04:59 -07:00
Derek Collison
ddb7f9f9d5 Fix for a peer-remove of an R1 that would brick the stream.
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-22 17:45:19 -07:00
Derek Collison
bcf5da04e3 Merge branch 'main' into dev
2023-08-22 06:50:36 -07:00
Derek Collison
e5d208bf33 When moving streams, we could check too soon and land in a gap where the replica peer had not yet registered a catchup request.
This would cause us to incorrectly think the replica was caught up and drop our leadership, which would cancel any catchup requests.

Signed-off-by: Derek Collison <derek@nats.io>
2023-08-21 20:07:48 -07:00
Derek Collison
fb8525b713 Merge branch 'main' into dev
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-21 15:55:00 -07:00
Neil Twigg
c437157c1f Recover in consumer assignment when asset already existed
Signed-off-by: Neil Twigg <neil@nats.io>
2023-08-17 23:22:10 +01:00
Neil Twigg
c0636d117f Tweak consumer replica scaling, add unit test for orphaned consumer subjects
Signed-off-by: Neil Twigg <neil@nats.io>
2023-08-17 15:27:29 +01:00
Neil Twigg
d7f76da597 Allow switching from limits-based to interest-based retention in stream update
Signed-off-by: Neil Twigg <neil@nats.io>
2023-08-09 11:46:49 +01:00
Tomasz Pietrek
54fe8cb14f Fix race in consumer create
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-08-08 09:16:44 +02:00
Tomasz Pietrek
d105e68c96 Add consumer api action for create and update
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-08-07 08:28:21 +02:00
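The create/update action distinguishes intents that a single upsert call cannot. A sketch of the validation, under the assumption (not confirmed by this log) that "create" is rejected only when an existing consumer has a different config, and "update" is rejected when the consumer does not exist. All names are hypothetical.

```go
package main

import "errors"

type action int

const (
	actionCreateOrUpdate action = iota // upsert behavior
	actionCreate                       // reject if it exists with a different config
	actionUpdate                       // reject if it does not exist
)

var (
	errConsumerExists   = errors.New("consumer already exists")
	errConsumerNotFound = errors.New("consumer does not exist")
)

// checkAction validates a request given whether the consumer exists and
// whether its stored config matches the requested one.
func checkAction(a action, exists, sameConfig bool) error {
	switch a {
	case actionCreate:
		if exists && !sameConfig {
			return errConsumerExists
		}
	case actionUpdate:
		if !exists {
			return errConsumerNotFound
		}
	}
	return nil
}
```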
Derek Collison
8079495903 Merge branch 'main' into dev
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-04 10:15:35 -07:00
Derek Collison
081140ee67 When taking over make sure to sync and reset clfs for clustered streams.
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-03 10:41:10 -07:00
Derek Collison
42752ec551 Merge branch 'main' into dev
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-01 21:46:54 -07:00
Derek Collison
5c8db89506 Make sure we do not drift on accounting.
Three issues were found and resolved.

1. Purge replays after recovery could execute full purge.
2. Callback was registered without lock, which could lead to skew.
3. Cluster reset could stop stream store and recreate it, which could lead to double accounting.

Signed-off-by: Derek Collison <derek@nats.io>
2023-08-01 18:35:20 -07:00
Derek Collison
2696320207 When we encounter a bad snapshot, reset our raft state if we own it and return proper error.
Signed-off-by: Derek Collison <derek@nats.io>
2023-07-30 11:15:10 -07:00
Neil
b22cdf18fe Add support for re-encrypting streams with new key (#4296)
This adds a new `prev_key` field to the configuration file to allow
transitioning from one encryption key to another.

Signed-off-by: Neil Twigg <neil@nats.io>
2023-07-27 10:10:08 +01:00
Derek Collison
9a8f846dbb Merge branch 'main' into dev
2023-07-26 22:22:34 -07:00
R.I.Pienaar
60e67ff9a5 Report correct consumer count in paged list response
Previously the Total in paged responses would always equal the
size of the first response; this would stall paged clients after
the first page.

Now correctly sets the total so paging continues, and improves the
test to verify these aspects of the report.

Signed-off-by: R.I.Pienaar <rip@devco.net>
2023-07-27 07:52:24 +03:00
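Why the Total matters: a paging client typically stops once it has received Total items. A minimal sketch with a hypothetical `pagedList` response type shows the corrected behavior, where Total is the size of the whole list rather than of the current page.

```go
package main

// pagedList is a hypothetical, simplified paged response.
type pagedList struct {
	Total  int      // number of items across all pages
	Offset int      // index of the first item in this page
	Names  []string // items in this page only
}

func listPage(all []string, offset, limit int) pagedList {
	if offset < 0 {
		offset = 0
	}
	if offset > len(all) {
		offset = len(all)
	}
	end := offset + limit
	if end > len(all) {
		end = len(all)
	}
	// Total is len(all), not len(page); reporting the page size would make
	// the client below stop after the first page.
	return pagedList{Total: len(all), Offset: offset, Names: all[offset:end]}
}

// fetchAll pages through until it has seen Total items.
func fetchAll(all []string, limit int) []string {
	var out []string
	for {
		p := listPage(all, len(out), limit)
		out = append(out, p.Names...)
		if len(out) >= p.Total || len(p.Names) == 0 {
			return out
		}
	}
}
```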
Neil Twigg
3df08c3f89 Add support for re-encrypting streams with new key
Signed-off-by: Neil Twigg <neil@nats.io>
2023-07-26 14:04:28 +01:00
Waldemar Quevedo
bbfeb2a887 Fix typo on internal function
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-07-22 20:40:26 -07:00
Neil Twigg
1527000d1f Use crypto/rand.Read instead of math/rand.Read
As of Go 1.20, `math/rand.Read` is deprecated. In addition to that, it also
isn't recommended for use in combination with anything cryptographic.

I haven't replaced all `math/rand` with `crypto/rand` imports because there
are still some legitimate uses for the `math/rand` package in some places.

Signed-off-by: Neil Twigg <neil@nats.io>
2023-07-13 12:04:58 +01:00
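For key material, `crypto/rand.Read` fills a buffer from the operating system's cryptographically secure generator. A small sketch; `newKey` is a hypothetical helper name.

```go
package main

import "crypto/rand"

// newKey returns n random bytes from crypto/rand, which is suitable for
// cryptographic use, unlike the deprecated math/rand.Read.
func newKey(n int) ([]byte, error) {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		return nil, err
	}
	return b, nil
}
```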
Derek Collison
cda7bcd389 Merge branch 'main' into dev
2023-07-12 09:06:44 -07:00
Derek Collison
9e9a9a082b When restoring a filestore with no key generator but it was encrypted, fail to restore.
Signed-off-by: Derek Collison <derek@nats.io>
2023-07-11 16:27:50 -07:00
Derek Collison
4d7cd26956 Add in support for segmented binary stream snapshots.
Streams with many interior deletes were causing issues because the interior deletes were represented as a sorted []uint64.
This approach introduces three subtypes of delete blocks: an AVL bitmask tree, a run-length encoding, and the legacy format above.
We also take into account large interior deletes such that on receiving a snapshot we can skip things we already know about.

Signed-off-by: Derek Collison <derek@nats.io>
2023-07-03 08:41:33 -07:00
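To see why a run-length encoding helps, compare one entry per deleted sequence with one entry per contiguous range. The sketch below shows only a generic run-length encoding of a sorted, duplicate-free delete list plus a lookup; it is not the actual snapshot wire format, and it does not cover the AVL bitmask tree. `run`, `encodeRuns`, and `isDeleted` are hypothetical names.

```go
package main

import "sort"

// run covers the contiguous deleted sequences First..Last (inclusive).
type run struct{ First, Last uint64 }

// encodeRuns collapses a sorted, duplicate-free list of deleted sequence
// numbers into runs, so a long gap costs one entry instead of one per sequence.
func encodeRuns(deleted []uint64) []run {
	var runs []run
	for _, seq := range deleted {
		if n := len(runs); n > 0 && runs[n-1].Last+1 == seq {
			runs[n-1].Last = seq // extend the current run
			continue
		}
		runs = append(runs, run{First: seq, Last: seq})
	}
	return runs
}

// isDeleted binary-searches the runs for seq.
func isDeleted(runs []run, seq uint64) bool {
	i := sort.Search(len(runs), func(i int) bool { return runs[i].Last >= seq })
	return i < len(runs) && runs[i].First <= seq
}
```

For example, deletes `3, 4, 5, 9, 10, 20` encode as three runs: `3..5`, `9..10`, and `20..20`.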
Neil Twigg
d2615b76f2 Annotate CPU and goroutine profiles with account/stream/consumer info
Signed-off-by: Neil Twigg <neil@nats.io>
2023-06-20 19:02:40 +01:00
Derek Collison
f342f6a758 Merge branch 'main' into dev
2023-06-05 14:13:18 -07:00
Derek Collison
2e2ac33920 [IMPROVED] Handling of R1 consumers recreated with the same name after becoming inactive. (#4216)
When consumers were R1 and the same name was reused, server restarts
could try to clean up old ones and affect the new ones. These changes
allow consumer name reuse more effectively during server restarts.

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-05 14:04:53 -07:00
Derek Collison
df5df3ce99 Panic fixes (#4214)
Resolves panics in the code.

### Changes proposed in this pull request:

 - This PR fixes some of the panics in the code
2023-06-05 13:02:05 -07:00
Derek Collison
4ac45ff6f3 When consumers were R1 and the same name was reused, server restarts could try to clean up old ones and affect the new ones.
These changes allow consumer name reuse more effectively during server restarts.

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-05 12:48:18 -07:00
Derek Collison
30d9dfd305 Merge branch 'main' into dev
2023-06-03 18:17:28 -07:00
Derek Collison
dee532495d Make sure to process extended purge operations correctly when being replayed on a restart.
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-03 17:49:45 -07:00
Derek Collison
238282d974 Fix some data races detected in internal testing
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-03 13:58:15 -07:00
Artem Seleznev
27a8b96ee3 Various panic fixes
Signed-off-by: Artem Seleznev <seleznyov.artyom@gmail.com>
2023-06-02 13:19:22 +03:00
R.I.Pienaar
c24547eb4e Record the stream and consumer info timestamps (#4133)
This records the server time when info for streams and consumers is
created so that tools such as the nats CLI can calculate time deltas for
last ack, last delivered and so forth in the context of the server
clock.

This will help alleviate problems with client devices experiencing clock
jitter, which can show up in user interfaces as negative seconds since
last ack, etc.
2023-06-02 08:53:28 +03:00
Derek Collison
7e3f3f4908 Make health checks more consistent with stream health checks.
Check for closed state on leader change for consumers.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-18 08:18:53 -07:00
Derek Collison
a8d7d3886e Make sure to delete the stream assignment node here
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-17 16:19:39 -07:00
Derek Collison
f3553791b1 Updates to stream reset logic.
1. When catching up do not try forever and if needed reset cluster state.
2. In checking if a stream is healthy check for node drift.
3. When restarting a stream make sure the current node is stopped.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-17 13:14:33 -07:00