Maybe that is the place it could be set rather than in NewServer(), but
I want to minimize the risk of breaking something this close to 2.9.0.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
A test TestJetStreamClusterLeafNodeSPOFMigrateLeaders was added at
some point that needed the remotes to stop (re)connecting. It made
use of existing leafNodeEnabled that was used for GW/Leaf interest
propagation races to disable the reconnect, but that may not be
the best approach since it could affect users embedding servers
and adding leafnodes "dynamically".
So this PR introduces a boolean specific to that test.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The interest moved across the leafnode would be for the mapping, and not the actual qsub.
So on receipt, if we detect that we are mapped and do not have a queue filter present, we make sure to ignore it.
This will allow queue subscriber processing on the local server that received the message from the leafnode.
Signed-off-by: Derek Collison <derek@nats.io>
'Chaos' is a new group of tests that validate behavior in the presence
of random failures.
Overview:
- Introduce a 'Chaos Monkey' controller which can unleash a monkey
against a test cluster.
- Introduce a monkey of type 'ClusterBouncer' which stops and restarts
nodes according to some configuration.
- Add 2 example tests that ensure a cluster can survive some number of
nodes bouncing.
- Configure the build to skip chaos tests unless explicitly requested
- Add some test utility functions
If a client with a given client ID is connected and while connected
another client tries to reuse the same client ID, the spec says that
the old client must be closed and the new one accepted.
To keep this flapping from happening all the time, the server rejects
new clients that try to connect at a very fast pace.
The server was closing such a misbehaving client after a one-second
delay (to prevent an immediate reconnect if the client library does
that) but was not blocking the read loop. The compounding issue was
that if the misbehaving client is REALLY misbehaving and does not wait
for the CONNACK before sending more protocols (for instance SUB), the
server would panic because the client was not fully configured.
To prevent that, the server will now "block" this misbehaving client
in its readLoop before closing the connection, preventing processing
of possible protocols that follow the CONNECT.
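
The idea can be sketched roughly as follows. This is a simplified
illustration, not the server's code: the conn type, its fields, the
tooFast argument and the delay value are all hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// conn is a hypothetical, stripped-down client connection.
type conn struct {
	flapping  bool     // set when a CONNECT reuses a client ID too quickly
	closed    bool     // set once the server has closed the connection
	processed []string // protocols that were actually handled
}

// handle processes a single protocol. A CONNECT arriving too fast only
// flags the connection; it is not processed any further.
func (c *conn) handle(proto string, tooFast bool) {
	if proto == "CONNECT" && tooFast {
		c.flapping = true
		return
	}
	c.processed = append(c.processed, proto)
}

// readLoop feeds protocols to handle. Once the connection is flagged,
// the loop itself stalls for the delay and then closes, so protocols
// that follow the CONNECT are never processed.
func (c *conn) readLoop(protos []string, tooFast bool, delay time.Duration) {
	for _, p := range protos {
		c.handle(p, tooFast)
		if c.flapping {
			time.Sleep(delay) // block in the read loop, not elsewhere
			c.closed = true
			return
		}
	}
}

func main() {
	c := &conn{}
	// The client sends SUB right after CONNECT without waiting for CONNACK.
	c.readLoop([]string{"CONNECT", "SUB"}, true, 10*time.Millisecond)
	fmt.Println(c.closed, len(c.processed)) // true 0
}
```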
Resolves #3313
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
A message block is checking the filestore's cfg.Subjects to see
if it can "intern" the subject or not. The problem is that this
is done under the message block's lock, but not the filestore's.
However, during a stream configuration update, the filestore's
cfg field is switched to a new one, causing the datarace.
By making sure we do the switch under every message block's lock,
we remove the data race (that could be reproduced by running the
test TestJetStreamClusterMoveCancel with -count=10).
We investigated the use of a string interning library but it
showed a little performance degradation that this approach does
not suffer from.
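
A minimal sketch of the locking pattern, assuming simplified stand-in
types (streamConfig, msgBlock, fileStore and their methods here are
hypothetical and much smaller than the real ones): readers look at the
config while holding only their block's lock, and the update takes
every block's lock before swapping the pointer.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical, simplified stand-ins for the filestore types.
type streamConfig struct {
	Subjects []string
}

type msgBlock struct {
	mu sync.RWMutex
	fs *fileStore
}

type fileStore struct {
	mu   sync.Mutex
	cfg  *streamConfig
	blks []*msgBlock
}

func newFileStore(n int, subjects []string) *fileStore {
	fs := &fileStore{cfg: &streamConfig{Subjects: subjects}}
	for i := 0; i < n; i++ {
		fs.blks = append(fs.blks, &msgBlock{fs: fs})
	}
	return fs
}

// numSubjects reads fs.cfg while holding only the block's lock.
func (mb *msgBlock) numSubjects() int {
	mb.mu.RLock()
	defer mb.mu.RUnlock()
	return len(mb.fs.cfg.Subjects)
}

// updateConfig swaps the config while holding every block's lock, so
// no block can be reading fs.cfg during the switch.
func (fs *fileStore) updateConfig(cfg *streamConfig) {
	fs.mu.Lock()
	defer fs.mu.Unlock()
	for _, mb := range fs.blks {
		mb.mu.Lock()
	}
	fs.cfg = cfg
	for _, mb := range fs.blks {
		mb.mu.Unlock()
	}
}

func main() {
	fs := newFileStore(3, []string{"foo"})
	var wg sync.WaitGroup
	for _, mb := range fs.blks {
		wg.Add(1)
		go func(mb *msgBlock) {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				_ = mb.numSubjects()
			}
		}(mb)
	}
	fs.updateConfig(&streamConfig{Subjects: []string{"foo", "bar"}})
	wg.Wait()
	fmt.Println(fs.blks[0].numSubjects()) // 2
}
```

Running a sketch like this with `go run -race` is one way to check
that the reads and the switch no longer conflict.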
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- Send snapshot only if leader
- When processing snapshot, start with a smaller inactivity interval
that will double up to 10sec or use 10sec directly once we get a
message. Reason for that is that it is possible that the request
for snapshot is sent while the leader has not yet set up the subscription
that receives the requests (or subscription has not fully reached the
cluster).
- Don't remember snapfile on err.
- Do not consider current if we have not had any activity.
- Stabilize stream scale up under active heavy publishing.
- Due to the publish pressure, move the check for followers' direct subs spinning up until after we stop publishing.
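
The inactivity interval described in the second item can be sketched
as below. This is only an illustration: nextInactivity and the
one-second starting value are assumptions, and only the 10sec cap
comes from the description above.

```go
package main

import (
	"fmt"
	"time"
)

// maxInactivity is the 10sec cap mentioned above.
const maxInactivity = 10 * time.Second

// nextInactivity returns the interval to use for the next wait. It
// jumps straight to the cap once a message has been received, and
// otherwise doubles the current interval up to the cap.
func nextInactivity(cur time.Duration, gotMsg bool) time.Duration {
	if gotMsg {
		return maxInactivity
	}
	next := cur * 2
	if next > maxInactivity {
		next = maxInactivity
	}
	return next
}

func main() {
	d := time.Second // hypothetical starting interval
	for i := 0; i < 5; i++ {
		fmt.Println(d) // 1s, 2s, 4s, 8s, 10s
		d = nextInactivity(d, false)
	}
}
```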
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>