Commit Graph

345 Commits

Author SHA1 Message Date
Derek Collison
e08f6d863d Allow for republish to be headers only
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-30 12:05:17 -07:00
Derek Collison
5592315e89 Suppress consumer create and R1 stream update advisories on server restart.
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-30 09:58:35 -07:00
Ivan Kozlovic
53e3c53d96 [FIXED] JetStream: consumer with deliver new may miss messages
This could happen when a consumer had not sent anything to the
attached NATS subscription and there was a consumer leader
step down or server restart.

Signed-off-by: Derek Collison <derek@nats.io>
2022-05-23 12:01:48 -06:00
Derek Collison
ef3eea4d73 Speed up raft for tests
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-18 16:28:58 -07:00
Derek Collison
938caf9963 Merge pull request #3131 from nats-io/consumer-memory-override
Make sure consumer store is memory based when selected.
2022-05-18 13:04:57 -07:00
Derek Collison
41cca8d6c4 Allow proper mix and match of consumer stores and stream stores.
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-18 12:51:48 -07:00
Derek Collison
1f4d4cf0ed Merge pull request #3132 from nats-io/raft-stable
With use cases bringing us more data I wanted to suggest these changes.
2022-05-18 09:30:04 -07:00
Derek Collison
906eb332fc Make sure consumer store is memory based when selected
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-17 18:48:27 -07:00
Derek Collison
c166c9b199 Enable republishing of messages once stored in a stream.
This enables lightweight distribution of messages to very large number of NATS subscribers.
We add in metadata as headers that allows for gap detection which enables initial value (via JetStream, maybe KV) and realtime NATS core updates but all globally ordered.

Signed-off-by: Derek Collison <derek@nats.io>
2022-05-17 15:18:54 -07:00
Derek Collison
50be0a6599 Allow explicit configuration of consumer's replica count and allow a consumer to force memory storage.
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-16 19:03:56 -07:00
Derek Collison
ccd2290355 With use cases bringing us more data I wanted to suggest these changes.
With inlining election timeout updates we double the lock contention and most likely introduced head of line issues for routes under heavy load.
Also slowing down heartbeats with so many assets being deployed in our user ecosystem, also moved the normal follower to candidate timing further out, similar to the lost quorum.
Note that the happy path transfer will still be very quick.

Signed-off-by: Derek Collison <derek@nats.io>
2022-05-15 09:55:22 -07:00
Derek Collison
f702e279ab Fix for a consumer recovery issue.
Also update healthz to check all assets that are assigned, not just running.

Signed-off-by: Derek Collison <derek@nats.io>
2022-04-26 19:22:19 -07:00
Ivan Kozlovic
06ff4b2b29 Split JS cluster and super clusters tests and compile only on 1.16
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-26 16:24:05 -06:00
Ivan Kozlovic
0e2ab5eeea Changes to tests that run on Travis
- Remove code coverage from Travis and add it to a GitHub Action
that will be run as a nightly.
- Use tag builds to exclude some tests, such as the "norace" or
JS tests. Since "go test" does not support "negative" regexs, there
is no other way.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-26 14:11:31 -06:00
Ivan Kozlovic
363849bf3a [FIXED] JetStream: Mirrors would fail to be recovered
This is a continuation of PR #3060, but extends to clustering.

Verified with manual test that a mirror created with v2.7.4 has
the duplicates window set and on restart with main would still
complain about use of dedup in cluster mode. The mirror stream
was recovered but showing as R1.
With this fix, a restart of the cluster - with existing data -
will properly recover the stream as an R3 and messages that
were published while in a bad state are synchronized.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Matthias Hanel mh@synadia.com
2022-04-21 10:59:23 -06:00
Ivan Kozlovic
b9463b322f [FIXED] JetStream: stream mirror issues in mixed mode clusters
Similar to PR #3061

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-20 23:21:15 -06:00
Matthias Hanel
254c970876 Fix subject renaming for leaf connections and queue subs (#3062)
* [fix] on queue sub, a consumers  delivery subject, was not changed

to the original publish subject the stream received
the code added is a copy of what regular subs do

* [fixed] subject renaming for leaf node connections as well

also updated multi server test to test for queue and non queue scenarios

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-20 19:23:21 -04:00
Derek Collison
9f9732ad97 When we polled too quickly on migration we could check before catchup logic had even kicked in.
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-20 16:56:18 +01:00
Ivan Kozlovic
af1a80d7c4 Merge pull request #3047 from nats-io/move-back
If we moved a stream back to a cluster we were once at, the stream would be empty.
2022-04-18 13:46:05 -06:00
Matthias Hanel
79b4374d01 [Fixed] limits enforcement issues (#3046)
* [Fixed] limits enforcement issues

stream create had checks that stream restore did not have.
Moved code into commonly used function checkStreamCfg.
Also introduced (cluster/non clustered) StreamLimitsCheck functions to
perform checks specific to clustered /non clustered data structures.

Checking for valid stream config and limits/reservations before
receiving all the data. Now fails the request right away.

Added a jetstream limit "max_request_batch" to limit fetch batch size

Shortened max name length from 256 to 255, more common file name limit

Added check for loop in cyclic source stream configurations

features related to limits

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-18 01:53:48 -04:00
Derek Collison
19392ceebb If we moved a stream back to a cluster we were once at, the stream would appear to have no messages.
The old raft node assignment would prevent proper catchup.

Signed-off-by: Derek Collison <derek@nats.io>
2022-04-17 20:36:18 -07:00
Matthias Hanel
7752a5becc Fixed mixed mode server without JS dropping js export on jwt update (#3044)
* Fixed mixed mode server without JS dropping JS export on jwt update

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-16 15:09:36 -04:00
Derek Collison
55f8982e33 Fix flapping test
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-16 05:08:12 -07:00
Derek Collison
c9cb27228e Merge pull request #3041 from nats-io/mirror-move
Raft improvements
2022-04-15 13:38:53 -07:00
Derek Collison
2a8b123706 Don't quickly declare lost quorum after scale up
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-15 13:28:34 -07:00
Derek Collison
10c877d942 Make sure if we recreate something after deleting that we do not wipe valid state
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-15 12:22:10 -07:00
Derek Collison
12730af5c2 Make sure mirror re-syncs after origin is moved.
Speed up mirror and sources heartbeats.

Signed-off-by: Derek Collison <derek@nats.io>
2022-04-15 07:01:55 -07:00
Ivan Kozlovic
bd61d51a1c [IMPROVED] JetStream: reduce unnecessary leader election
- Wait of some sort of routing to be in place before starting
the raft run loop
- Remove use of lock in apiDispatch that was not necessary but
could have cause a route to block, causing memory growth, etc..

Unrelated rename of some tests so that they start with TestJetStream
and TestJetStreamCluster for cluster tests, fixed some flappers
and ensure that tests that change RAFT timeouts put them back
to default values on exit.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-14 10:47:14 -06:00
Derek Collison
9748925f13 Improvements to stream and consumer move.
During elected stepdown and transfer allow the new leader to take over before we stepdown.
We could receive a leader change, so make sure to also check migration state.

Signed-off-by: Derek Collison <derek@nats.io>
2022-04-14 07:27:29 -07:00
Matthias Hanel
ec3f9258af [Adding] max_ha_assets to limit placement on server with more ha assets (#3032)
* [Adding] max_ha_assets to limit placement on server with more ha assets

server running more than max_ha_assets #raft nodes will not be used to
place new streams and fail if not enough free server can be found.
Durable Consumer creation on such server will fail as their peer size is
bound to the same set as their stream.

This also avoids updating placement where no new placement is needed.
This is the case when, on update, placement tags get removed. 

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-14 01:53:41 -04:00
Derek Collison
3c0bced76e Move test to no race, rename others
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-12 16:23:36 -07:00
Ivan Kozlovic
50c3986863 [FIXED] JetStream stream catchup issues
- A stream could become leader when it should not, causing
messages to be lost.
- A catchup could stall because the server sending data
could bail out of the runCatchup routine but still send
the EOF signal.
- Deadlock with monitoring of Jsz

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-12 16:05:12 -06:00
Derek Collison
b7718e2b7a First pass support for stream alternates
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-11 18:47:19 -06:00
Derek Collison
04cce6df68 Merge pull request #3020 from nats-io/move-updates
[IMPROVED] Raft layer for general stability and leader election.
2022-04-11 17:33:13 -07:00
Matthias Hanel
02d25cc640 [FIXED] Consumer deliver subject incorrect when imported and crossing gateway (#3025)
followup to #3017

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-11 20:27:25 -04:00
Derek Collison
3ed1ecc032 Remove old code
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-11 12:00:29 -07:00
Matthias Hanel
13e5ab10bd fix js nex interest check where leaf node masked gw subj propagation (#3016)
basically a gw subject propagation issue could be hidden behind a leaf
node.
also change error text when this was the case

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-11 14:04:09 -04:00
Derek Collison
95f3a3f919 Resolved conflicts with main
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-11 06:24:47 -07:00
Derek Collison
c3612b57c7 Fixes for some flapping tests
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-10 13:02:03 -07:00
Derek Collison
37cbac99e7 Improvements to the raft layer for general stability and support of scale up and down and asset move.
Also fixed a bug that would allow a leadership transfer when catching up.

Signed-off-by: Derek Collison <derek@nats.io>
2022-04-10 08:59:39 -07:00
Derek Collison
2510d671de Skip flapper for now, will fix in separate PR
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-09 11:55:04 -07:00
Derek Collison
cd7f16f28a Tweak timing for test to prevent flapping
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-09 11:13:49 -07:00
Derek Collison
331c2faaa6 When using a stream import for a push consumer's messages, if the message crossed a route we dropped the delivered subject.
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-09 06:42:22 -07:00
Derek Collison
3663d595fc Disallow moving a stream that is already being moved
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-07 17:09:55 -07:00
Matthias Hanel
5662141932 Adding unique_tag to ensure matching tags are not used twice (#3011)
Allows to not place a stream in the same availability zone twice.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-07 18:11:00 -04:00
Matthias Hanel
2db7d9fe2f unit test to make sure tiered limits and stream moves work together (#3007)
This needs testing because stream move adjusts the replication factor

Because adjusting replication factor and moving is illegal, this case
does not need to be tested

In order to support one off configurations, added same modification
callout to super cluster as is used with cluster

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-05 18:11:04 -04:00
Ivan Kozlovic
9b5797f63c Undo sending bad request on no-interest in apiDispatch
This broke cross-account functionality. Ported the test from the
Go client that showed the failure after PR#2997 was merged.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-05 08:51:28 -06:00
Derek Collison
7e38ebcb6e Allow assets such as streams and their associated consumers to migrate between clusters.
The system will allow an update to a stream, and subsequently all attached consumers, to be placed in another cluster either directly or via tag placement.
The meta layer will scale the underlying peerset appropriately to straddle the two clusters for both the stream and consumers, taking into account the consumer type.
Control will then pass to the current leaders of the assets who will monitor the catchup status of the new peers.
(Note we can optimize this later to only traverse once across a GW for any given asset, but for now this is simpler)
Once the original leaders have determined the assets are synched it will pass leadership to a member of the new peerset.
Once the new leader has been elected, it will forward a request for the meta layer to shrink the peerset by removing the old peers.

Signed-off-by: Derek Collison <derek@nats.io>
2022-04-04 18:28:36 -07:00
Matthias Hanel
b7bc842c8b Add a config modification callback to createJetStreamCluster (#2998)
* Add a config modification callback to createJetStreamCluster

named createJetStreamClusterAndModHook allowing the generated config to
be altered prior to server start

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-04 17:39:58 -04:00
Ivan Kozlovic
29ea280fe7 [FIXED] JetStream: send "bad request" response for malformed API requests
An example was a "consumer info" request with a consumer name that
had tokens, which is illegal. This results in the request being
dropped in apiDispatch() because there was no interest.
The server will now return a "bad request" error in such case.

Resolves #2995

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-04 11:25:55 -06:00