Commit Graph

112 Commits

Author SHA1 Message Date
Ivan Kozlovic
7ff0ea449a Fixed issues with leafnode compression negotiation
When a server would send an asynchronous INFO to a remote server
it would incorrectly contain compression information that could
cause issues with one side thinking that the connection should
be compressed while the other side was not.

It also caused the authentication timer to be incorrectly set
which would cause a disconnect.

Signed-off-by: Ivan Kozlovic <ijkozlovic@gmail.com>
2023-06-09 13:20:44 -06:00
Derek Collison
a1f03513d8 Merge branch 'main' into dev 2023-06-09 09:29:13 -07:00
Derek Collison
9eeffbcf56 Fix performance issues with checkAckFloor.
Bail early if new consumer, meaning stream sequence floor is 0.
Decide which linear space to scan.
Do no work if nothing is pending and we only need to adjust the floor, which we do at the end.

Also realized some tests were named wrong and were not being run, or were in wrong file.

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-08 18:45:03 -07:00
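
The three steps in the checkAckFloor commit message above can be sketched as follows. This is a simplified illustration, not the server's code: the `consumerState` type, its fields, and the `firstSeq` parameter are hypothetical names chosen for the example. It assumes the stream's first sequence has moved forward and that pending entries below it no longer exist.

```go
package main

import "fmt"

// consumerState is a hypothetical, simplified stand-in for a consumer's ack state.
type consumerState struct {
	ackFloor uint64              // stream sequence floor: everything at or below it is acked
	pending  map[uint64]struct{} // stream sequences delivered but not yet acked
}

// checkAckFloor moves the ack floor up to just below firstSeq (assumed to be
// the stream's first sequence) and drops pending entries for removed messages.
func (c *consumerState) checkAckFloor(firstSeq uint64) {
	// Bail early for a new consumer: a stream sequence floor of 0.
	if c.ackFloor == 0 {
		return
	}
	// Nothing to do if the floor is already at or past the stream's start.
	if firstSeq == 0 || c.ackFloor >= firstSeq-1 {
		return
	}
	// Only scan when something is pending, and pick the smaller linear space.
	if len(c.pending) > 0 {
		gap := firstSeq - c.ackFloor - 1
		if gap < uint64(len(c.pending)) {
			// Fewer sequences in the gap than pending entries: walk the gap.
			for seq := c.ackFloor + 1; seq < firstSeq; seq++ {
				delete(c.pending, seq)
			}
		} else {
			// Fewer pending entries than sequences in the gap: walk pending.
			for seq := range c.pending {
				if seq < firstSeq {
					delete(c.pending, seq)
				}
			}
		}
	}
	// The adjustment itself always happens at the end.
	c.ackFloor = firstSeq - 1
}

func main() {
	c := &consumerState{
		ackFloor: 5,
		pending:  map[uint64]struct{}{7: {}, 12: {}, 20: {}},
	}
	c.checkAckFloor(15)
	fmt.Println("ack floor:", c.ackFloor, "pending entries:", len(c.pending))
}
```

Choosing between the two loops bounds the work by the smaller of the sequence gap and the pending count, which is one plausible reading of "decide which linear space to scan".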
Derek Collison
b5c0170527 Turn off leaf compression to stabilize test for now
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-08 04:37:07 -07:00
Derek Collison
fd082ee8a5 Merge branch 'main' into dev 2023-06-07 14:31:53 -07:00
Derek Collison
779978d817 Extended replay leafnode test to confirm mirror functionality
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-07 14:01:43 -07:00
Derek Collison
f342f6a758 Merge branch 'main' into dev 2023-06-05 14:13:18 -07:00
Derek Collison
4ac45ff6f3 When consumers were R1 and the same name was reused, server restarts could try to clean up old ones and affect the new ones.
These changes allow consumer name reuse more effectively during server restarts.

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-05 12:48:18 -07:00
Derek Collison
af318be5db Merge branch 'main' into dev 2023-06-04 13:30:15 -07:00
Maurice van Veen
132567de39 Fix PurgeEx replay with sequence & keep succeeds 2023-06-04 11:56:28 +02:00
Derek Collison
30d9dfd305 Merge branch 'main' into dev 2023-06-03 18:17:28 -07:00
Derek Collison
dee532495d Make sure to process extended purge operations correctly when being replayed on a restart.
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-03 17:49:45 -07:00
Derek Collison
df901dc1aa Merge branch 'main' into dev 2023-06-02 16:45:07 -07:00
Derek Collison
1bce79750e When we were optimizing for a single cluster with a large number of leafnodes, we inadvertently broke a daisy-chained scenario where a server was a spoke and a hub with a single hub cluster.
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-02 15:16:36 -07:00
Derek Collison
7760aa5107 Merge branch 'main' into dev 2023-05-16 14:01:57 -07:00
Derek Collison
734895ae47 Fix test flapper
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-16 12:20:18 -07:00
Derek Collison
b0340ce598 Make sure to wait properly until we believe we are caught up to enable direct gets.
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-16 11:02:06 -07:00
Derek Collison
4c26cbb3de Merge branch 'main' into dev 2023-05-12 12:38:20 -07:00
Derek Collison
5e029d08d5 For older R1 streams created by previous servers we could have no cluster for the stream assignment group which would prevent scale up with newer servers.
This will inherit cluster if detected from placement tags or client cluster designation.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-10 17:59:28 -07:00
Derek Collison
9fa724cd7b Merge branch 'main' into dev 2023-05-03 21:00:35 -07:00
Derek Collison
da8aeac91b Fix flapper
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-03 21:00:17 -07:00
Derek Collison
68f6b59fc7 Merge branch 'main' into dev 2023-05-03 19:51:24 -07:00
Derek Collison
21239022bd Protect against usage drift for any unforeseen reason and correct it if detected.
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-03 17:09:06 -07:00
Derek Collison
eb1eb3c49e Merge branch 'main' into dev 2023-05-01 16:29:35 -07:00
Derek Collison
f098c253aa Make sure we adjust accounting reservations when deleting a stream with any issues.
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-01 15:54:37 -07:00
Derek Collison
f5ac5a4da0 Fix for a bug that could leave a raft node running when stopping a stream.
This can happen when we reset a stream internally and the stream had a prior snapshot.

Also make sure to always release resources back to the account, even if the store is no longer present.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-01 13:22:06 -07:00
Derek Collison
0321eb6484 Merge branch 'main' into dev 2023-04-29 19:52:57 -07:00
Derek Collison
546dd0c9ab Make sure we can recover an underlying node being stopped.
Do not return healthy if the node is closed, and wait a bit longer for forward progress.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-29 07:42:23 -07:00
Derek Collison
d107ba3549 Under certain scenarios we have witnessed healthz() calls that never return healthy due to a stream or consumer being missing or stopped.
This will now allow the healthy call to attempt to restart those assets.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-28 17:11:08 -07:00
Derek Collison
d573b78aee Merge branch 'main' into dev 2023-04-26 18:42:31 -07:00
Derek Collison
7f06d6f5a7 When Jsz() was asked for consumer details, it would report incorrect data if the server was not the consumer leader.
This is due to the way state is maintained for leaders vs followers for consumers.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-26 15:03:15 -07:00
Derek Collison
4ebdb69daf Merge branch 'main' into dev 2023-04-26 11:34:37 -07:00
Derek Collison
c0f5b71a8f Test that makes sure that assets that have been created under a certain cluster can be upgraded to a new cluster.
This is specifically when a cluster is reconfigured and the servers are restarted with a new cluster name.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-24 20:06:20 -07:00
Derek Collison
1de9a1cf3b Merge branch 'main' into dev 2023-04-21 14:09:35 -07:00
Derek Collison
8b7c2d12aa Run a check for ack floor drift when taking over as a leader and the ack goroutine is spun up.
Also check periodically. If all is normal, this will be very cheap.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-21 11:59:35 -07:00
Derek Collison
1ae51b23a9 [ADDED] Multiple routes and ability to have per-account routes (#4001)
New configuration fields:
```
cluster {
   ...
   pool_size: 5
   accounts: ["A", "B"]
}
```

The configuration `pool_size` in the example above means that this
server will create 5 routes to a remote server, assuming that that
server has the same `pool_size` setting.

Accounts (which are not part of the `accounts[]` configuration)
are assigned a specific route in this pool, and this will be the
same route on all servers in the cluster.

Accounts that are defined in the `accounts` field will each have
a dedicated route connection. This will allow suppression of the
account name in some of the route protocols, reducing bytes transmitted
which may increase performance.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-03 15:33:46 -07:00
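
To illustrate how accounts without a dedicated route could end up on the same pooled route on every server, the sketch below hashes the account name modulo `pool_size`. The function name `routeIndexForAccount` and the choice of FNV-1a are assumptions made for this example. The commit message does not say which scheme the server uses, only that the assignment is the same on all servers in the cluster.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// routeIndexForAccount is a hypothetical helper, not part of the server API.
// It maps an account name to an index in [0, poolSize) using a hash that
// depends only on the name, so every server that uses the same poolSize
// computes the same index for the same account.
func routeIndexForAccount(account string, poolSize int) int {
	if poolSize <= 1 {
		return 0 // a single route carries every account
	}
	h := fnv.New32a()
	h.Write([]byte(account)) // hash.Hash writes never return an error
	return int(h.Sum32() % uint32(poolSize))
}

func main() {
	// Accounts "A" and "B" from the example configuration would have
	// dedicated routes; the accounts below would share the pool of 5.
	for _, acc := range []string{"C", "D", "E"} {
		fmt.Printf("account %q -> pooled route %d\n", acc, routeIndexForAccount(acc, 5))
	}
}
```

Because the index depends only on the account name and the pool size, the mapping only agrees across servers when they share the same `pool_size`, which is consistent with the assumption stated in the commit message.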
Derek Collison
7d3ec51d79 Fix for flapping test
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-03 14:46:59 -07:00
Ivan Kozlovic
105237cba8 [ADDED] Multiple routes and ability to have per-account routes
New configuration fields:
```
cluster {
   ...
   pool_size: 5
   accounts: ["A", "B"]
}
```

The configuration `pool_size` in the example above means that this
server will create 5 routes to a remote server, assuming that that
server has the same `pool_size` setting.

Accounts (which are not part of the `accounts[]` configuration)
are assigned a specific route in this pool, and this will be the
same route on all servers in the cluster.

Accounts that are defined in the `accounts` field will each have
a dedicated route connection. This will allow suppression of the
account name in some of the route protocols, reducing bytes transmitted
which may increase performance.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-03 09:32:25 -06:00
Ivan Kozlovic
a4df4f8727 Fixed some tests
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-03-30 15:02:59 -06:00
Derek Collison
4646f4af5d Do not allow any JetStream leaders to be placed on a lameduck server
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 20:15:41 -07:00
Derek Collison
02702e4620 [IMPROVEMENT] General stability and bug fixes. (#3999)
This PR has general improvements and fixes to filestore, raft, and the
clustering layer.

Summary

1. Additional support for preAck handling for interest based streams
when replicated acks arrive before the message itself.
2. Better handling when checking state to determine whether to remove an
interest based message.
3. Improved StepDown() and leadership transfer handling after restarts.
4. Improved voting logic for high load systems.
5. Various improvements and fixes for filestore Compact(), which is used
heavily in the raft layer when updating snapshots and the raft wal.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 17:09:44 -07:00
Derek Collison
182bf6cbae Bug fixes and general stability improvements.
1. If reset, ignore Applied() that are greater than our commit.
2. Improved StepDown() by placing at back of queue if preferred.
3. Improved handling of leadership transfer during StepDown().
4. Do not store EntryLeaderTransfer records on disk.
5. Remove unneeded processing of older terms.
6. If append entry has higher term, also inherit pterm.
7. Only inherit a candidate's term if we decide to vote for them.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:46 -07:00
Neil Twigg
8d5519356e Shut down RAFT groups when disabling JetStream
Signed-off-by: Neil Twigg <neil@nats.io>
2023-03-23 16:54:01 +00:00
Derek Collison
9ccd7abdf8 Test for preAcks
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-21 12:08:24 -07:00
Derek Collison
5a16f98427 Fixed an off-by-one bug that under certain circumstances could cause large consumer replica states.
This could lead to instability in the system.

The bug would manifest in replicated consumers when certain messages were acked out of order and the pending list would never go to zero.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-19 10:41:59 -07:00
Derek Collison
f0e1585490 Fix flapping test
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-17 13:14:43 -07:00
Derek Collison
5bb6f167b9 Make sure to clean up messages on a follower consumer for an interest-based stream when the consumer leader sends a state snapshot.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-15 20:11:16 -07:00
Derek Collison
8dbfbbe577 Fix test
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-15 17:23:51 -07:00
Derek Collison
5a1878b015 Fix for workqueue stream scaling up and not removing acked messages.
Make sure when scaling up streams that are workqueue or interest policy that consumers scale as well.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-13 17:13:49 -07:00
Derek Collison
724160ebac Fix flapping tests
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-28 14:30:23 -08:00