Commit Graph

60 Commits

Author SHA1 Message Date
Derek Collison
0d29b0761a Tweaked buffered channels, moved locks for snapshots.
Also placed debug for inline processing of append entries.
This is for removal of that inline.

Signed-off-by: Derek Collison <derek@nats.io>
2021-02-28 05:16:04 -08:00
R.I.Pienaar
a4817bd7b6 extend the out of space advisory
Signed-off-by: R.I.Pienaar <rip@devco.net>
2021-02-26 11:10:05 +01:00
Derek Collison
98f98e214b Properly support memory based WALs
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-25 19:49:54 -08:00
Derek Collison
0f69e48511 Bug check err, check for out of space on catchup
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-25 18:25:16 -08:00
Derek Collison
b13ef6b9ec Track write errors. Fixed a few bugs.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-25 17:53:20 -08:00
Derek Collison
a862cc75cc Suppress raft campaigns on restart. Extend election timeout interval.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-25 04:14:14 -08:00
Derek Collison
73ba2d0b2f File writes to term and vote and peerstate were in the direct route path and could cause delays.
This moves the actual writes to a separate goroutine and also allows multiple writes to
be compressed into one write under load. We only want latest.

Signed-off-by: Derek Collison <derek@nats.io>
2021-02-24 20:47:31 -08:00
Derek Collison
78bdc34637 General stability improvements. Fixes to subscription state not cleaning up.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-24 08:44:34 -08:00
Ivan Kozlovic
1652fe62ef Updates to when to do a snapshot
Remove panic on runAsLeader when not able to subscribe (which happens
on shutdown)
Gateway name access does not need lock since it is immutable. Will
prevent deadlocks in some situations.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-02-23 19:06:07 -07:00
Derek Collison
8fe8b835fe Fixes for flapping tests
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-23 14:08:17 -08:00
Derek Collison
c39641c263 Tweak hb and election times, fix unsubscribe leak
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-23 10:57:05 -08:00
Derek Collison
fa8a74ceb5 Allow placement directives for metacontroller stepdown to allow placement to new clusters.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-19 10:55:22 -08:00
Derek Collison
9de18dfefe Removed unused function
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-18 18:35:44 -08:00
Derek Collison
048011d7f1 Split vote improvements
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-18 18:29:18 -08:00
Derek Collison
89fe3b05df various bug fixes, wal/snapshot stability
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-18 08:41:09 -08:00
Derek Collison
e21c7097f3 General stability improvements.
Original thought to move to memory based WALs was ill-advised and caused issues with stability around restarts.
Returned to file based but with async flush for the WAL itself.
Also the raft inline catchup has been improved.

Signed-off-by: Derek Collison <derek@nats.io>
2021-02-17 19:56:16 -08:00
Derek Collison
765b9ad57a Some stability improvements to raft lib and catchup stream processing.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-16 20:30:12 -08:00
Derek Collison
ddc4cc79d2 Make sure to not process AR when no longer leader
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-16 15:58:46 -08:00
Derek Collison
0dcb006968 Handle AppendEntry response inline, lower outstanding on catchup to stabilize
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-16 13:24:09 -08:00
Derek Collison
4c6e33c9c6 Restoration of streams would possibly block route and client connections.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-14 18:43:40 -08:00
Derek Collison
f0cfc187d2 Set pindex to wrong setting on snapshot restore with no WAL
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-13 06:50:50 -08:00
Derek Collison
4759560e29 Fixed raft bug on catchup logic with external snapshots
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-12 19:58:02 -08:00
Derek Collison
579737a5e1 General fixes, stability improvements
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-11 18:13:24 -08:00
Derek Collison
fa8a95a06a Improved snapshots and compactions.
Various bug fixes and stability improvements.

Signed-off-by: Derek Collison <derek@nats.io>
2021-02-11 11:16:00 -08:00
Derek Collison
92d64c2bcc Reset WAL on mismatch catchup regardless, condition ok
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-07 09:30:13 -08:00
Derek Collison
a16affedca Always reset election timeout on vote request
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-07 08:09:01 -08:00
Derek Collison
74a4c531c9 Stability improvements.
Changes to catchup logic, peer tracking, and vote responses.

Signed-off-by: Derek Collison <derek@nats.io>
2021-02-06 20:13:18 -08:00
Derek Collison
c49e3247bb Purge operations would be replayed on restart regardless of whether they had already been processed.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-04 07:04:50 -08:00
Derek Collison
a1e0f7dc1a First pass at supercluster enablement.
This allows metacontrollers to span superclusters. Also includes placement directives for streams. By default they select the request origin cluster.

Signed-off-by: Derek Collison <derek@nats.io>
2021-02-03 17:28:13 -08:00
Derek Collison
a8982c040f Suppress lost quorum processing if too close to raft node creation time.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-02 06:27:07 -08:00
Derek Collison
f3703a4b85 Make sure audit events have the proper subject regardless of where processed.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-02 05:03:20 -08:00
Derek Collison
e5c1d65fff Added in JS disable per server on reload. Also removing peers from a stream and leader stepdown for streams and consumers.
Various bug fixes, stability improvements.

Signed-off-by: Derek Collison <derek@nats.io>
2021-02-01 19:39:08 -08:00
Derek Collison
2b0717bde2 Make debug not error since we recover
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-30 14:00:26 -08:00
Derek Collison
9b20d5c888 Fixed bug on raft inline catchup when apply channel was full.
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-30 13:22:27 -08:00
Derek Collison
457ca3b9cf Suppress additional advisories on server restart and leadership changes.
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-29 15:08:22 -08:00
Derek Collison
9d4951d2bb Updated lost quorum signalling to be less fragile.
We will now alert when the old leader detects a lost quorum just as before, but also detect if a candidate is flapping and failing to get votes because of no quorum.

Signed-off-by: Derek Collison <derek@nats.io>
2021-01-28 09:27:17 -08:00
Derek Collison
8b79114168 Add in advisories for leader elected and quorum lost.
Note that quorum lost only fires if the old leader steps down.
If the leader itself fails and that causes the loss of quorum currently no advisory is sent.

Signed-off-by: Derek Collison <derek@nats.io>
2021-01-28 08:37:54 -08:00
Derek Collison
a9b8948abe Add in tracking for quorum in raft and do auto stepdown.
Also added in API responses when no leader is present for meta, streams and consumers.

Signed-off-by: Derek Collison <derek@nats.io>
2021-01-27 13:34:00 -08:00
Derek Collison
c0ae719629 Don't load entry for snapshot, fix data race
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-26 19:26:03 -08:00
Derek Collison
054319a662 Fix for split vote bug
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-26 14:59:13 -08:00
Derek Collison
3e8d295239 Make sure to not go backwards on applied or commit indexes
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-26 14:07:52 -08:00
Derek Collison
bcd38bba96 Make sure stepdown logic does not block system
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-25 19:20:10 -08:00
Derek Collison
d278996272 LDM trigger to move raft leaders
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-25 16:52:19 -08:00
Derek Collison
7eb6d07bfc On stepdown still process appendEntry
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-25 14:32:24 -08:00
Derek Collison
7d8c3eaa6e Don't pre-vote, causes flapping on split vote
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-25 13:49:20 -08:00
Derek Collison
5148bbf898 Fixes based on PR feedback, cleanup
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-25 10:04:21 -08:00
Derek Collison
7b1e84c086 Fixed raft bug that would cause entries to be missed on restart with leader HB trigger.
Also added in creation times to stream and consumer assignments to make them consistent.

Signed-off-by: Derek Collison <derek@nats.io>
2021-01-25 08:47:37 -08:00
Derek Collison
117607ef11 Fix for race and test for issue R.I. was seeing in nightly. Also fixed flappers.
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-24 21:21:02 -08:00
Derek Collison
9c858d197a Added ability to properly restore consumers from a snapshot.
This made us add forwarding proposals functionality in the raft layer.
More general cleanup and bug fixes as well.

Signed-off-by: Derek Collison <derek@nats.io>
2021-01-24 19:30:34 -08:00
Derek Collison
cad0db2aec Cleanup the consumer assignments when consumers become inactive.
This involved extending our raft implementation to forward proposals to the current leader.

Signed-off-by: Derek Collison <derek@nats.io>
2021-01-23 13:44:10 -08:00