Repeated calls to `scheduleSetSourceConsumerRetry` could end up creating
multiple timers for the same source, which would eventually schedule
even more timers, resulting in runaway CPU usage. This PR instead
bounds this to one timer per source per stream.
Signed-off-by: Neil Twigg <neil@nats.io>
This adds a new `waitForAccount` test helper that ensures that an
account exists across the cluster, and updates
`TestJetStreamClusterAccountPurge` to use it after submitting new JWTs.
This should prevent `require no error, but got: nats: JetStream not
enabled for account` errors.
Signed-off-by: Neil Twigg <neil@nats.io>
This unprotected access meant the cache could be flushed, after which a
subsequent writeMsgRecord could see an offset greater than the slot
value. That cannot happen when the lock is held, since we load the
cache properly at the beginning of the function.
Signed-off-by: Derek Collison <derek@nats.io>
Resolves #4529
Increases the retry interval in TestMQTTQoS2RetriesPublish to 100ms,
and in TestMQTTQoS2RetriesPubRel to 50ms.
A smaller value caused another PUBLISH to be fired while the test was
still processing the final QoS2 flow. Also reduced the number of
retries we wait for, to make the test a little quicker.
When the number of blocks was > 32 and we used the new binary search in
NumPending(), we could return -1, nil. If the sequence is within the
inclusive range from first to last, this should always return a valid
index and mb.
The reason we could return -1 was that we were not accounting for gaps,
since mb.first.seq can move ahead as first messages are removed. The
resulting panic could orphan held locks for the filestore, consumer and
possibly stream, which would lock up a system, leading to memory growth
and unstable behavior.
Signed-off-by: Derek Collison <derek@nats.io>
The default timeout for JetStream API calls is 10s, so in the case
where we determine that we are the leader, but the stream info endpoint
has not yet registered with the server we are connected to, the stream
info call could fail and exhaust the whole checkFor budget, since we
would stay in a single call for 10s.
The fix is to override the timeout so the checkFor loops can make
multiple attempts.
Signed-off-by: Derek Collison <derek@nats.io>
Tracing the connect (ack?) read times in `TestMQTTSubPropagation` showed
that they fall in the 2-3s range during normal execution, and it appears
that they occasionally exceed the 4s timeout.
I am not sure exactly why MQTT CONNECT takes such a long time, but as
the name of the test suggests, perhaps it has to do with session
propagation in a cluster.
This should hopefully de-flake `TestFileStoreNumPendingLargeNumBlks` a
bit. By adding a new `require_LessThan` helper function, we will also
produce a more meaningful log line that tells us what the bad values
were if the test fails.
Signed-off-by: Neil Twigg <neil@nats.io>