This could happen when a consumer had not sent anything to the
attached NATS subscription and there was a consumer leader
stepdown or a server restart.
Signed-off-by: Derek Collison <derek@nats.io>
This enables lightweight distribution of messages to a very large number of NATS subscribers.
We add metadata as headers that allow for gap detection, which enables an initial value (via JetStream, possibly KV) plus real-time NATS core updates, all globally ordered.
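As a minimal sketch, here is what gap detection could look like on the subscriber side; the header name "Nats-Sequence" and the subject are illustrative assumptions, not the actual metadata keys:

```go
package main

import (
	"log"
	"strconv"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	var expected uint64
	// "Nats-Sequence" is a hypothetical stand-in for the gap-detection
	// metadata header described above.
	_, err = nc.Subscribe("updates.>", func(m *nats.Msg) {
		seq, perr := strconv.ParseUint(m.Header.Get("Nats-Sequence"), 10, 64)
		if perr != nil {
			return // no sequence metadata on this message
		}
		if expected != 0 && seq != expected+1 {
			// Gap detected: recover the missed range from the backing
			// JetStream stream (or KV) to preserve global ordering.
			log.Printf("gap: expected %d, got %d", expected+1, seq)
		}
		expected = seq
	})
	if err != nil {
		log.Fatal(err)
	}
	select {} // keep the subscriber running
}
```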
Signed-off-by: Derek Collison <derek@nats.io>
By inlining election timeout updates we doubled the lock contention and most likely introduced head-of-line blocking issues for routes under heavy load.
Also slowed down heartbeats, given how many assets are being deployed in our user ecosystem, and moved the normal follower-to-candidate timing further out, similar to the lost-quorum timing.
Note that the happy path transfer will still be very quick.
Signed-off-by: Derek Collison <derek@nats.io>
- Remove code coverage from Travis and add it to a GitHub Action
that will be run as a nightly.
- Use build tags to exclude some tests, such as the "norace" or
JS tests. Since "go test" does not support "negative" regexes, there
is no other way (see the sketch below).
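For illustration, the build-tag approach looks roughly like this; the tag name skip_js_tests is an assumption for the sketch:

```go
//go:build !skip_js_tests
// +build !skip_js_tests

package server

import "testing"

// This file is compiled out entirely when building with
// "go test -tags skip_js_tests", which excludes whole test groups
// even though "go test -run" has no negative regex support.
func TestJetStreamExample(t *testing.T) {
	t.Log("runs only when the skip_js_tests tag is not set")
}
```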
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This is a continuation of PR #3060, but extends to clustering.
Verified with a manual test that a mirror created with v2.7.4 has
the duplicates window set, and that on restart with main the server
would still complain about the use of dedup in cluster mode. The
mirror stream was recovered but showed as R1.
With this fix, a restart of the cluster - with existing data -
will properly recover the stream as an R3, and messages that
were published while in the bad state are synchronized.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Matthias Hanel <mh@synadia.com>
* [Fixed] On a queue sub, a consumer's delivery subject was not changed
to the original publish subject the stream received.
The code added is a copy of what regular subs do.
* [Fixed] Subject renaming for leaf node connections as well.
Also updated the multi-server test to cover queue and non-queue scenarios.
Signed-off-by: Matthias Hanel <mh@synadia.com>
* [Fixed] Limits enforcement issues: stream create had checks that
stream restore did not have. Moved the code into the commonly used
function checkStreamCfg. Also introduced (clustered/non-clustered)
StreamLimitsCheck functions to perform checks specific to the
clustered and non-clustered data structures. Stream config and
limits/reservations are now validated before receiving all the data,
so an invalid request fails right away.
* [Added] Features related to limits:
  - A JetStream limit "max_request_batch" to limit the fetch batch
    size (see the sketch below).
  - Shortened the max name length from 256 to 255, the more common
    file name limit.
  - Added a check for loops in cyclic source stream configurations.
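A minimal configuration sketch for the new limit; the placement under the jetstream limits block and the value shown are assumptions to illustrate the option, so verify against your server version:

```
jetstream {
    limits {
        # Reject pull/fetch requests asking for more than this many
        # messages in a single batch.
        max_request_batch: 256
    }
}
```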
Signed-off-by: Matthias Hanel <mh@synadia.com>
- Wait for some sort of routing to be in place before starting
the Raft run loop.
- Remove the use of a lock in apiDispatch that was not necessary but
could have caused a route to block, causing memory growth, etc.
Unrelated: renamed some tests so that they start with TestJetStream
(and TestJetStreamCluster for cluster tests), fixed some flappers,
and ensured that tests that change Raft timeouts put them back
to the default values on exit.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
During an elected stepdown and transfer, allow the new leader to take over before we step down.
We could receive a leader change, so make sure to also check the migration state.
Signed-off-by: Derek Collison <derek@nats.io>
* [Added] max_ha_assets to limit placement on servers with too many HA assets.
A server running more than max_ha_assets Raft nodes will not be used to
place new streams, and placement fails if not enough free servers can be found.
Durable consumer creation on such a server will fail as well, since its peer
set is bound to the same set as its stream.
This also avoids updating placement where no new placement is needed,
which is the case when, on update, placement tags get removed.
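A minimal configuration sketch, assuming the new option sits alongside the other JetStream limits in the server config; placement and value are illustrative:

```
jetstream {
    limits {
        # Servers already hosting more than this many Raft (HA) assets
        # are excluded when placing new streams and their consumers.
        max_ha_assets: 500
    }
}
```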
Signed-off-by: Matthias Hanel <mh@synadia.com>
- A stream could become leader when it should not, causing
messages to be lost.
- A catchup could stall because the server sending data
could bail out of the runCatchup routine but still send
the EOF signal.
- Deadlock with monitoring of Jsz
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Basically, a gateway subject propagation issue could be hidden behind a leaf
node.
Also changed the error text for when this was the case.
Signed-off-by: Matthias Hanel <mh@synadia.com>
Stream move adjusts the replication factor, so this needs testing.
However, adjusting the replication factor while moving is illegal, so
that combined case does not need to be tested.
In order to support one-off configurations, added the same modification
callout to the super cluster setup as is used with the cluster setup.
Signed-off-by: Matthias Hanel <mh@synadia.com>
This broke cross-account functionality. Ported the test from the
Go client that showed the failure after PR #2997 was merged.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The system will allow an update to a stream, and subsequently all attached consumers, to be placed in another cluster either directly or via tag placement.
The meta layer will scale the underlying peerset appropriately to straddle the two clusters for both the stream and consumers, taking into account the consumer type.
Control will then pass to the current leaders of the assets who will monitor the catchup status of the new peers.
(Note we can optimize this later to only traverse once across a GW for any given asset, but for now this is simpler)
Once the original leaders have determined the assets are synced, they will pass leadership to a member of the new peerset.
Once the new leader has been elected, it will forward a request for the meta layer to shrink the peerset by removing the old peers.
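For illustration, such a move can be initiated from a client by updating the stream's placement. A minimal sketch using the nats.go JetStream API; the stream name "ORDERS" and the cluster "east" are hypothetical:

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Fetch the current config so the update changes only placement.
	si, err := js.StreamInfo("ORDERS")
	if err != nil {
		log.Fatal(err)
	}

	// Re-pinning the stream to another cluster triggers the move: the
	// meta layer expands the peer set across both clusters, the new
	// peers catch up, leadership transfers, and the old peers are
	// then removed.
	cfg := si.Config
	cfg.Placement = &nats.Placement{Cluster: "east"}
	if _, err := js.UpdateStream(&cfg); err != nil {
		log.Fatal(err)
	}
}
```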
Signed-off-by: Derek Collison <derek@nats.io>
* Add a config modification callback to createJetStreamCluster,
named createJetStreamClusterAndModHook, allowing the generated config to
be altered prior to server start (see the sketch below).
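A hypothetical usage sketch from a test; the hook's signature and the setting appended here are assumptions based on the description above, not the actual API:

```go
// Hypothetical sketch: the mod hook receives the generated config and
// returns an altered version before the servers are started.
func TestClusterWithModHook(t *testing.T) {
	c := createJetStreamClusterAndModHook(t, "TEST", 3,
		func(serverName, clusterName, storeDir, conf string) string {
			// Append a one-off setting for this test run.
			return conf + "\nmax_payload: 4MB\n"
		})
	defer c.shutdown()
	// ... exercise the cluster ...
}
```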
Signed-off-by: Matthias Hanel <mh@synadia.com>
An example was a "consumer info" request with a consumer name that
had tokens, which is illegal. This resulted in the request being
dropped in apiDispatch() because there was no interest.
The server will now return a "bad request" error in such a case
(see the sketch below).
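For illustration, a client call that previously timed out now gets an error back. A minimal sketch with the nats.go client; the stream and consumer names are hypothetical:

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// "bad.name" contains a token separator, which is illegal in a
	// consumer name. The request used to be dropped in apiDispatch(),
	// leaving the client to time out; the server now replies with a
	// "bad request" error instead.
	if _, err := js.ConsumerInfo("ORDERS", "bad.name"); err != nil {
		log.Printf("consumer info rejected: %v", err)
	}
}
```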
Resolves #2995
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>