Commit Graph

7263 Commits

Author SHA1 Message Date
Neil Twigg
e879a9fa0c Test MaxMsgs and MaxMsgsPer in combination
Signed-off-by: Neil Twigg <neil@nats.io>
2023-06-19 09:45:17 +01:00
Derek Collison
f5b06100d3 Added in another DQ test across leafnodes. (#4250)
This test has multiple leafnode connections to different accounts and to
a shared account to make sure behavior is correct.

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-16 13:55:45 -07:00
Derek Collison
e8094c9f33 Make utility funcs helpers
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-16 12:07:21 -07:00
Derek Collison
b3f913237c Added in another DQ test across leafnodes.
This test has multiple leafnode connections to different accounts and to a shared account to make sure behavior is correct.

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-16 11:54:09 -07:00
Derek Collison
4a1b281412 [IMPROVED] High CPU and Memory usage on replicated mirrors with very high starting sequence. (#4249)
When creating replicated mirrors where the source stream had a very
large starting sequence number, the server would use excessive CPU and
memory.

This is due to the mirroring functionality trying to skip messages when
it detects a gap. In a replicated stream this puts excessive stress on
the raft system.

This step is not needed at all if the mirror stream has no messages; we
can simply jump ahead.

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-15 17:48:32 -07:00
Derek Collison
087a28a13e When creating replicated mirrors where the source stream had a very large starting sequence number, the server would use excessive CPU and memory.
This is due to the mirroring functionality trying to skip messages when it detects a gap. In a replicated stream this puts excessive stress on the raft system.
This step is not needed at all if the mirror stream has no messages; we can simply jump ahead.

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-15 17:20:15 -07:00
Derek Collison
367d857612 Bump v2.9.19-beta.1 (#4245) 2023-06-13 13:50:13 -07:00
Byron Ruth
bbf24a6d98 Bump v2.9.19-beta.1
Signed-off-by: Byron Ruth <byron@nats.io>
2023-06-13 15:48:05 -04:00
Derek Collison
166eeb243c Release v2.9.18 (#4242) 2023-06-13 12:41:10 -07:00
Byron Ruth
af805b57a4 Release v2.9.18
Signed-off-by: Byron Ruth <byron@nats.io>
2023-06-13 15:19:20 -04:00
Derek Collison
876cb6d837 [UPDATED] go mod tidy to update go.sum (#4240)
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-12 17:31:16 -07:00
Derek Collison
232294c3af go mod tidy to update go.sum
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-12 17:16:35 -07:00
Derek Collison
378d5a944e Bump client version to v1.27.0 (#4239)
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-06-12 14:11:24 -07:00
Tomasz Pietrek
13bf12ce64 Bump client version to v1.27.0
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-06-12 22:06:08 +02:00
Derek Collison
1d00ea4fa0 Bump to 2.9.18-beta.3
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-11 13:11:32 -07:00
Derek Collison
ab2ac20fe1 [UPDATED] Dependencies (#4236)
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-11 13:09:03 -07:00
Derek Collison
0980384c97 Update dependencies
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-11 12:25:44 -07:00
Derek Collison
860c481f0f [IMPROVED] Optimize statsz locking and sending in standalone mode. (#4235)
If we know we are in standalone mode, only send out statsz updates if
we know we have external interest.

Signed-off-by: Derek Collison <derek@nats.io>

Resolves: #4234
2023-06-10 20:55:43 -07:00
Derek Collison
11963e51fe Optimize statsz locking and only send if we know we have external interest.
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-10 20:25:05 -07:00
Derek Collison
aae218fe77 [IMPROVED] Only enable JetStream account updates in clustered mode. (#4233)
If we know we are in standalone mode, we do not need to run the updates
for JetStream account resources.

Signed-off-by: Derek Collison <derek@nats.io>

Resolves #4227 (Partial)
2023-06-10 17:03:11 -07:00
Derek Collison
783e9491b1 [IMPROVED] Last msg lookup (KV Get) when subject is a literal subject (#4232)
When messages were very small and the key space was very large, the
performance of last-message gets in the store layer (both file and
memory) would degrade.

If the subject is literal we can optimize and avoid sequence scans that
are needed when multiple subject states need to be considered.

Signed-off-by: Derek Collison <derek@nats.io>

Resolves #4221
2023-06-10 15:23:11 -07:00
Derek Collison
a5de25f213 Only enable JetStream account updates in clustered mode.
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-10 15:21:55 -07:00
Derek Collison
f81bc80676 Improve last msg lookup (KV Get) when subject is a literal subject.
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-10 13:16:13 -07:00
Derek Collison
8c513ad2c2 [FIXED] DQ weighted subscribers across separate leafnode connections. (#4231)
Fix for properly distributed queue requests over multiple leafnode
connections.
When a leafnode server joins two accounts in a supercluster, we want to
make sure that each connection properly takes into account the weighted
number of subscribers in each account.

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-09 17:50:17 -07:00
Derek Collison
2765e534eb Fix test and update copyright
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-09 15:09:15 -07:00
Derek Collison
ce2dcd3394 Fix for properly distributed queue requests over multiple leafnode connections.
When a leafnode server joins two accounts in a supercluster, we want to make sure that each connection properly takes into account the weighted number of subscribers in each account.

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-09 14:43:59 -07:00
Derek Collison
13b6e2613c Reduce messages in chaos tests (#4229)
For what these tests are trying to prove, an excessively large number
of messages does not appear to be required. Instead, let's drop the
count a little in the hope that they run a bit faster.

Signed-off-by: Neil Twigg <neil@nats.io>
2023-06-09 13:59:35 -07:00
Derek Collison
81154c40f5 Bump to 2.9.18-beta.2
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-09 09:29:48 -07:00
Neil Twigg
7de3568f39 Reduce messages in chaos tests
For what these tests are trying to prove, an excessively large number
of messages does not appear to be required. Instead, let's drop the
count a little in the hope that they run a bit faster.

Signed-off-by: Neil Twigg <neil@nats.io>
2023-06-09 17:07:53 +01:00
Neil
9d2ae42956 [FIXED] Performance issues with checkAckFloor with large first stream sequence. (#4226)
Bail early if new consumer, meaning stream sequence floor is 0.
Decide which linear space to scan.
Do no work if there is nothing pending; we just need to adjust, which we
do at the end.

Also realized some tests were named wrong and were not being run, or
were in the wrong file.

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-09 09:27:52 +01:00
Derek Collison
9eeffbcf56 Fix performance issues with checkAckFloor.
Bail early if new consumer, meaning stream sequence floor is 0.
Decide which linear space to scan.
Do no work if there is nothing pending; we just need to adjust, which we do at the end.

Also realized some tests were named wrong and were not being run, or were in the wrong file.

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-08 18:45:03 -07:00
Derek Collison
d3bde2c6e1 Send peer state when adding peers (#4224)
Currently `UpdateKnownPeers` doesn't send a peer state when a single
peer add operation is taking place, but it seems like this can
potentially race when there are lots of changes to the replica count
happening in rapid succession. Sending the peer state in all cases seems
to fix this issue and, so far in my testing, fixes the failground stream
update replicas test.

Signed-off-by: Neil Twigg <neil@nats.io>
2023-06-08 09:31:49 -07:00
Neil Twigg
6d9955d212 Send peer state when adding peers
Signed-off-by: Neil Twigg <neil@nats.io>
2023-06-08 15:25:18 +01:00
Derek Collison
c85db15a2c Update Go to 1.19.10 (#4223) 2023-06-08 04:39:20 -07:00
Byron Ruth
3ca99adcca Update Go to 1.19.10
Signed-off-by: Byron Ruth <byron@nats.io>
2023-06-08 07:10:27 -04:00
Derek Collison
779978d817 Extended replay leafnode test to confirm mirror functionality
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-07 14:01:43 -07:00
Derek Collison
822ad00d50 Bump to 2.9.18-beta.1
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-05 14:14:35 -07:00
Derek Collison
2e2ac33920 [IMPROVED] When R1 consumers were recreated with the same name when they became inactive. (#4216)
When consumers were R1 and the same name was reused, server restarts
could try to clean up old ones and affect the new ones. These changes
allow consumer names to be reused more effectively during server
restarts.

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-05 14:04:53 -07:00
Derek Collison
df5df3ce99 Panic fixes (#4214)

Resolves panics in the code.

### Changes proposed in this pull request:

 - This PR fixes some of the panics in the code
2023-06-05 13:02:05 -07:00
Derek Collison
4ac45ff6f3 When consumers were R1 and the same name was reused, server restarts could try to clean up old ones and affect the new ones.
These changes allow consumer names to be reused more effectively during server restarts.

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-05 12:48:18 -07:00
Nikita Mochalov
5141b87dff Refactor code 2023-06-05 22:42:28 +03:00
Nikita Mochalov
4c181bc99a Use sentinel error 2023-06-05 22:41:09 +03:00
Nikita Mochalov
f71c49511b Fix client panic on absent server field 2023-06-05 15:27:45 +03:00
Derek Collison
64e3bf82ed Fix PurgeEx replay with sequence & keep succeeds (#4213)
PR https://github.com/nats-io/nats-server/pull/4212 fixed the issue I
reported in https://github.com/nats-io/nats-server/issues/4196.

However, I believe there might be a bug when both `sequence` and `keep`
are set during recovery.
In the `PurgeEx` the following check is done (for both `filestore.go`
and `memstore.go`):
```go
	if sequence > 1 && keep > 0 {
		return 0, ErrPurgeArgMismatch
	}
```

The `TestJetStreamClusterPurgeExReplayAfterRestart` also triggers this
case, meaning that during the test this error is returned but it
succeeds because the purge was already performed. Is this intended
behaviour?

To elaborate a bit more, I believe the following happens:
- when running the purge normally it will properly run the `keep` (since
it's not combined with `sequence` yet)
- when replaying the purge though, the `sequence` is added to the
`keep`, which errors out in the above if

This means that during normal operation all will be well, but purges
with `keep` will be ignored upon replay.

I'm proposing to remove the `sequence > 1 && keep > 0` check and the
subsequent error, which, for reference, was introduced in
https://github.com/nats-io/nats-server/pull/3121.
Hoping this ensures that during recovery, purges that haven't executed
yet will still be executed.

An alternative approach, which wouldn't remove the error: don't allow
combining `sequence` and `keep` normally, and only allow it during
recovery. This would preserve the current behaviour and still correctly
apply `sequence+keep` during recovery. However, I'm not sure it's
possible to know whether we're in "recovery mode" from within `PurgeEx`.

Resolves https://github.com/nats-io/nats-server/issues/4196
2023-06-04 13:29:53 -07:00
Maurice van Veen
132567de39 Fix PurgeEx replay with sequence & keep succeeds 2023-06-04 11:56:28 +02:00
Derek Collison
e1f8064e9e [FIXED] Make sure to process extended purge operations correctly when being replayed. (#4212)
This is an extension to the excellent work by @MauriceVanVeen and his
original PR #4197 to fully resolve for all use cases.

Signed-off-by: Derek Collison <derek@nats.io>

Resolves #4196
2023-06-03 18:12:22 -07:00
Derek Collison
dee532495d Make sure to process extended purge operations correctly when being replayed on a restart.
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-03 17:49:45 -07:00
Derek Collison
eb09ddd73a [FIXED] Killed server on restart could render encrypted stream unrecoverable (#4210)
When a server was killed on restart before an encrypted stream was
recovered, the keyfile was removed, which could cause the stream to be
unrecoverable.

We only needed to delete the key file when converting ciphers and right
before we add the stream itself.

Signed-off-by: Derek Collison <derek@nats.io>

Resolves #4195
2023-06-03 17:36:10 -07:00
Derek Collison
449b429b58 [FIXED] Data races detected in internal testing (#4211)
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-03 16:20:56 -07:00
Derek Collison
238282d974 Fix some data races detected in internal testing
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-03 13:58:15 -07:00