Commit Graph

244 Commits

Author SHA1 Message Date
Derek Collison
2d21bc7008 Fix data race
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-03 15:35:20 -07:00
Derek Collison
dba03dbc2f Optimizations to reduce contention for high connection counts in a JetStream-enabled account with high API usage.
The strategies are listed below.

1. Checking a RaftNode to see if it is the leader now uses atomics.
2. Checking if we are the JetStream meta leader from the server now uses an atomic.
3. Accessing the JetStream context no longer requires a server lock, uses atomic.Pointer.
4. Filestore syncBlocks would hold msgBlock locks during sync, now does not.

Signed-off-by: Derek Collison <derek@nats.io>
2023-09-30 14:52:15 -07:00
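The first three strategies in the commit above can be sketched as follows. This is a minimal illustration rather than the actual nats-server code: the types `node`, `server`, and `jsContext` and their methods are hypothetical stand-ins, and only `atomic.Bool` and `atomic.Pointer` from the Go standard library (Go 1.19+) are real APIs.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// node is a hypothetical stand-in for a Raft node.
// The leader flag is an atomic, so readers never take the node's lock.
type node struct {
	leader atomic.Bool
}

func (n *node) setLeader(v bool) { n.leader.Store(v) }
func (n *node) isLeader() bool   { return n.leader.Load() }

// jsContext is a hypothetical stand-in for the JetStream context.
type jsContext struct {
	name string
}

// server holds the JetStream context behind an atomic.Pointer,
// so lookups do not need a server-wide lock.
type server struct {
	js atomic.Pointer[jsContext]
}

// jetStream returns the current context, or nil if JetStream is not enabled.
func (s *server) jetStream() *jsContext { return s.js.Load() }

func main() {
	n := &node{}
	n.setLeader(true)

	s := &server{}
	s.js.Store(&jsContext{name: "js"})

	fmt.Println(n.isLeader(), s.jetStream() != nil)
}
```

The trade-off in a design like this is that an atomic read only gives a point-in-time answer; any decision that must stay consistent with other state still needs the lock.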
Derek Collison
f95ef63ae1 In lame duck mode, shut down JetStream at start; do not leave it running during connection drain.
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-24 14:42:59 -07:00
Neil Twigg
1f9ddf2bbd Add Raft goroutine labels, tweak logging
Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-16 11:15:06 +01:00
Derek Collison
f1bf4127c5 Merge branch 'main' into dev 2023-08-25 11:03:54 -07:00
Derek Collison
e5625b9d9b If a leader is asked for an item and we have no items left, make sure to also step-down.
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-25 10:20:07 -07:00
Derek Collison
fd50bc2918 Merge branch 'main' into dev 2023-08-24 21:10:22 -07:00
Derek Collison
2669f77190 Make sure to reset election timer on catching up
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-24 19:58:08 -07:00
Derek Collison
19eba1b8c8 Merge branch 'main' into dev 2023-06-08 09:34:41 -07:00
Neil Twigg
6d9955d212 Send peer state when adding peers
Signed-off-by: Neil Twigg <neil@nats.io>
2023-06-08 15:25:18 +01:00
Derek Collison
30d9dfd305 Merge branch 'main' into dev 2023-06-03 18:17:28 -07:00
Derek Collison
238282d974 Fix some data races detected in internal testing
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-03 13:58:15 -07:00
Derek Collison
ee87df250c Merge branch 'main' into dev 2023-05-17 19:27:58 -07:00
Derek Collison
8e825001d2 When we receive a catchup request for an item beyond our current state, we should stepdown.
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-17 17:30:35 -07:00
Derek Collison
990ac56557 Merge branch 'main' into dev 2023-05-10 15:31:54 -07:00
Derek Collison
a17357c6ae When doing a leadership transfer, step down as soon as we know we have sent the EntryLeaderTransfer entry.
Delaying could allow the old leader to send something that would cause the new leader to bail on being a candidate, even though it would have gotten all the votes.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-10 12:27:33 -07:00
Derek Collison
717afae9ef When doing a leader transfer clear vote state on leader and when non-chosen peers receive the update
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-10 07:49:22 -07:00
Derek Collison
2f2440f270 Merge branch 'main' into dev 2023-05-09 20:11:53 -07:00
Derek Collison
b9af0d0294 Only do no-leader stepdown on transfer after a delay if we are still the leader
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-09 17:19:14 -07:00
Ivan Kozlovic
311e3feb5f Merge branch 'main' into dev 2023-05-03 17:38:40 -06:00
Derek Collison
ae73f7be55 Small raft improvements.
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-02 16:44:27 -07:00
Derek Collison
0321eb6484 Merge branch 'main' into dev 2023-04-29 19:52:57 -07:00
Derek Collison
546dd0c9ab Make sure we can recover an underlying node being stopped.
Do not return healthy if the node is closed, and wait a bit longer for forward progress.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-29 07:42:23 -07:00
Derek Collison
4ebdb69daf Merge branch 'main' into dev 2023-04-26 11:34:37 -07:00
Derek Collison
3c964a12d7 Migration could be delayed due to transferring leadership while the new leader was still paused.
Also check more quickly, but slow down if the state we need is not there yet.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-25 18:58:49 -07:00
Waldemar Quevedo
d9cc8b0363 fix formatting of raft debug log
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-04-22 07:07:08 +02:00
Derek Collison
3b3fac297a Merge branch 'main' into dev 2023-04-15 14:21:39 -07:00
Derek Collison
a5f5603645 Reset our WAL on edge conditions instead of trying to recover.
Also, if we are timing out and trying to become a candidate while doing a catchup, check whether we are stalled.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-15 12:23:44 -07:00
Derek Collison
8375ab5cde Merge branch 'main' into dev 2023-04-14 16:44:25 -07:00
Derek Collison
66ca46e145 If we see another leader with same term we should step down
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-14 16:21:40 -07:00
Derek Collison
a319d24345 Merge branch 'main' into dev 2023-04-13 21:03:05 -07:00
Waldemar Quevedo
a4833d0889 Fix raft log debug reloading
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-04-13 14:57:04 -07:00
Derek Collison
808a2e8c90 On failure to send snapshot to follower, also reset, and on reset make sure to reset term
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-12 11:48:22 -07:00
Derek Collison
a92bb9fe61 Fix bad unlock which could cause crash
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-12 11:48:22 -07:00
Derek Collison
340fcc90bc Basic raft tests
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-12 11:48:22 -07:00
Derek Collison
dfeac4a214 Merge branch 'main' into dev 2023-04-09 19:31:01 -07:00
Derek Collison
80a57a3d51 Remove peers from string intern map
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-09 08:01:36 -07:00
Derek Collison
6fa55540a7 Better use of entryPool
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-09 07:48:31 -07:00
Derek Collison
35bb7c1737 Pool CommittedEntries as well with a ReturnToPool() that will also recycle the Entry. Needs to integrate with upper layers
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-08 11:34:10 -07:00
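The pooling idea in the commit above could look roughly like the sketch below. It assumes `sync.Pool` as the pooling mechanism; the `entry` and `committedEntries` types, their fields, and the constructor helpers are hypothetical. Only the `ReturnToPool()` name comes from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is a hypothetical stand-in for a single Raft log entry.
type entry struct {
	data []byte
}

// committedEntries is a hypothetical stand-in for a batch handed to upper layers.
type committedEntries struct {
	index   uint64
	entries []*entry
}

var (
	entryPool = sync.Pool{New: func() any { return &entry{} }}
	cePool    = sync.Pool{New: func() any { return &committedEntries{} }}
)

// newEntry takes an entry from the pool and fills it in.
func newEntry(data []byte) *entry {
	e := entryPool.Get().(*entry)
	e.data = data
	return e
}

// newCommittedEntries takes a batch from the pool and fills it in.
func newCommittedEntries(index uint64, entries []*entry) *committedEntries {
	ce := cePool.Get().(*committedEntries)
	ce.index, ce.entries = index, entries
	return ce
}

// ReturnToPool recycles each contained entry, then the batch itself.
// The caller must not touch ce or its entries afterwards.
func (ce *committedEntries) ReturnToPool() {
	for _, e := range ce.entries {
		e.data = nil // drop the reference so the pool does not pin the buffer
		entryPool.Put(e)
	}
	ce.index, ce.entries = 0, nil
	cePool.Put(ce)
}

func main() {
	ce := newCommittedEntries(1, []*entry{newEntry([]byte("a")), newEntry([]byte("b"))})
	fmt.Println(len(ce.entries))
	ce.ReturnToPool()
}
```

This is why the commit notes that upper layers need to integrate: whoever consumes a batch becomes responsible for calling `ReturnToPool()` once it is done with it.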
Derek Collison
3be25fdedb Do not put an appendEntryResponse back in the pool if catching up until complete
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-07 10:30:06 -07:00
Derek Collison
2ff6f18ccd Use sync.Map for peers vs internal storage for appendEntryResponses
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-07 08:16:42 -07:00
Derek Collison
1caa56a34f Use pools for appendEntries
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-07 07:38:19 -07:00
Derek Collison
3afdb99f75 Use pools for appendEntryResponses. Also use interior space for peer name from the wire
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-07 06:43:51 -07:00
Derek Collison
ff8701b724 Merge branch 'main' into dev 2023-04-06 08:37:11 -07:00
Derek Collison
e76b0b9b96 Move check for out of resources which would want a read lock out of inline processing
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-05 20:28:19 -07:00
Derek Collison
1ae51b23a9 [ADDED] Multiple routes and ability to have per-account routes (#4001)
New configuration fields:
```
cluster {
   ...
   pool_size: 5
   accounts: ["A", "B"]
}
```

The configuration `pool_size` in the example above means that this
server will create 5 routes to a remote server, assuming that that
server has the same `pool_size` setting.

Accounts (which are not part of the `accounts[]` configuration)
are assigned a specific route in this pool, and this will be the
same route on all servers in the cluster.

Accounts that are defined in the `accounts` field will each have
a dedicated route connection. This will allow suppression of the
account name in some of the route protocols, reducing bytes transmitted
which may increase performance.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-03 15:33:46 -07:00
Derek Collison
b806a8e7e7 Do not opt-out of normal processing for leadership transfers, but make sure they are only processed if explicitly new
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-03 14:46:55 -07:00
Ivan Kozlovic
105237cba8 [ADDED] Multiple routes and ability to have per-account routes
New configuration fields:
```
cluster {
   ...
   pool_size: 5
   accounts: ["A", "B"]
}
```

The configuration `pool_size` in the example above means that this
server will create 5 routes to a remote server, assuming that that
server has the same `pool_size` setting.

Accounts (which are not part of the `accounts[]` configuration)
are assigned a specific route in this pool, and this will be the
same route on all servers in the cluster.

Accounts that are defined in the `accounts` field will each have
a dedicated route connection. This will allow suppression of the
account name in some of the route protocols, reducing bytes transmitted
which may increase performance.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-03 09:32:25 -06:00
Derek Collison
58ca525b3b Process replicated ack regardless of store update. Delay but still stepdown
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 03:53:16 -07:00
Derek Collison
874b2b2e02 Hold the lock while checking health since we could update catchup state.
Do not stepdown right away when executing leadership transfer, wait for the commit.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 03:53:08 -07:00