For some older R1 streams created by previous server versions, the
stream assignment group could have no cluster, which would prevent
scale-up with newer servers.
This change inherits the cluster, when detected as absent, from the
placement tags or the client's cluster designation.
Signed-off-by: Derek Collison <derek@nats.io>
When doing a leadership transfer, step down as soon as we know we have
sent the EntryLeaderTransfer entry.
Delaying could allow something to be sent from the old leader which
would cause the new leader to bail on being a candidate even though it
would have gotten all the votes.
Signed-off-by: Derek Collison <derek@nats.io>
Added a leafnode lock to allow better traversal without copying large
leafnodes in a single hub account.
Signed-off-by: Derek Collison <derek@nats.io>
In #1943 we adopted `UTC()` in some timestamps, but an unintended side
effect is that it strips the monotonic clock reading (e5646b23de), so
subtracting times in other areas of the code can be prone to clock
skew.
This would impact only cases with accounts defined in configuration file
(as opposed to operator mode). During the configuration reload, new
accounts and sublists were created to later be replaced with existing
ones. That left a window of time during which a subscription could have
been added to (or attempted to be removed from) the "wrong" sublist.
This could lead to route subscriptions seemingly not being forwarded.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
I have seen cases, possibly due to the previous configuration reload
issue that missed subscriptions in the sublist because of the sublist
swap, where we would attempt to remove subscriptions in a batch but
some were not present. I would have expected all present subscriptions
to still be removed, even if the call overall returned an error.
This is now fixed and a test has been added demonstrating that
even on error, we remove all subscriptions that were present.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
One should not access s.opts directly but instead use s.getOpts().
Also, server lock needs to be released when performing an account
lookup (since this may result in server lock being acquired).
A function was calling s.LookupAccount under the client lock, which
technically creates a lock inversion situation.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When removing a msg requires loading the msg block and incurring IO,
unlock the fs lock to avoid stalling other activity on other blocks,
e.g. removing and adding msgs at the same time.
Signed-off-by: Derek Collison <derek@nats.io>
This can happen when we reset a stream internally and the stream had a
prior snapshot.
Also make sure to always release resources back to the account
regardless of whether the store is still present.
Signed-off-by: Derek Collison <derek@nats.io>
When a fleet of leafnodes is isolated (not routed but using the same
cluster), we can do better at optimizing how we update the other
leafnodes: since they are all in the same cluster and we know we are
isolated, we can skip those updates.
We can improve this further in 2.10.
Signed-off-by: Derek Collison <derek@nats.io>
Under certain scenarios we have witnessed a healthz() call that would
never return healthy due to a stream or consumer being missing or
stopped.
This will now allow the healthz() call to attempt to restart those
assets.
We will also periodically call this in clustered mode from the
monitorCluster routine.
Signed-off-by: Derek Collison <derek@nats.io>