The cause was that we were not accounting for gaps, since `mb.first.seq` can move. The expected behavior is to always return a valid index and mb when seq falls within the range from first to last, inclusive.
The panic could orphan held locks for filestore, consumer and possibly stream.
Signed-off-by: Derek Collison <derek@nats.io>
The default timeout for JetStream API calls is 10s, so if we determine
that we are the leader but the stream info endpoint has not yet
registered with the server we are connected to, the stream info call
could fail and exhaust the whole `checkFor` window, since we would stay
in a single call for 10s.
The fix is to override the timeout so the `checkFor` loops can make
multiple attempts.
Signed-off-by: Derek Collison <derek@nats.io>
Increased the retry interval in `TestMQTTQoS2RetriesPublish` to 100ms, and in `TestMQTTQoS2RetriesPubRel` to 50ms.
A smaller value caused another PUBLISH to be fired while the test was still processing the final QoS 2 flow. Also reduced the number of retries we wait for, to make the test a little quicker.
Tracing the connect (ack?) read times in `TestMQTTSubPropagation` showed
that they come in the 2-3s range during normal execution and occasionally
exceed the 4s timeout.
I am not sure exactly why MQTT CONNECT takes so long, but as the name of
the test suggests, it may have to do with session propagation in a
cluster.
This should hopefully de-flake `TestFileStoreNumPendingLargeNumBlks` a
bit. The new `require_LessThan` helper function also produces a more
meaningful log line telling us what the bad values were if the test
fails.
Signed-off-by: Neil Twigg <neil@nats.io>
We do not expect this condition at all, but have seen it in the wild, so
this will detect and clean it up. It will run on the same timer (2m) as
the general health checks run from `monitorCluster()`.
Signed-off-by: Derek Collison <derek@nats.io>
`TestMQTTQoS2InflightMsgsDeliveredAfterUnsubscribe` was flaky on
TravisCI.
At this point the test mainly illustrates a design issue with MQTT 3.1.1
QoS support, and much of it is disabled since it would be failing anyway.
Disabling the test until it can be fully enabled.
Resolves #4479
In the AuthCallout server-config system, a spurious error message would
be generated on server reload in the edge case where the AuthCallout
user is an NKEY and both User and NKEY account members are defined in
the config.
Calls to `filteredStateLocked` can mutate the per-subject state by
recalculating the first sequences. Doing so under a read-only lock can
lead to race conditions, so switch the call sites out to write locks
instead.
Signed-off-by: Neil Twigg <neil@nats.io>
The `streamSnapshot` function was not holding the filestore lock while
resetting and using the hasher. This created a data race and also
resulted in flakes in `TestFileStoreSnapshot`.
Signed-off-by: Neil Twigg <neil@nats.io>
- Use Go 1.21 in nightlies
- Both rand.Seed and rand.Read are deprecated, remove their use to
fix staticcheck errors
```
server/client.go:95:2: SA1019: rand.Seed has been deprecated since Go 1.20 and an alternative
has been available since Go 1.0: As of Go 1.20 there is no reason to call Seed with a random value.
Programs that call Seed with a known value to get a specific sequence of results should use
New(NewSource(seed)) to obtain a local random generator. (staticcheck)
server/jetstream_test.go:20399:2: SA1019: rand.Read has been deprecated since Go 1.20
because it shouldn't be used: For almost all use cases, crypto/rand.Read is more appropriate. (staticcheck)
```