Derek Collison
a205f8f2de
Fix for updating peers and quorum sizes.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-14 15:31:29 -07:00
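Background for the quorum-size fixes in this log (this is not code from these commits): in Raft, a quorum is a strict majority of the current peer set. It therefore has to be recomputed whenever a peer is added or removed. The Go sketch below uses hypothetical names (`quorumSize`, `peerSet`) and is not the nats-server implementation.

```go
package main

import "fmt"

// quorumSize is a hypothetical helper (not the nats-server function): the
// number of votes or acks that forms a strict majority of n peers.
func quorumSize(n int) int {
	if n < 1 {
		return 0 // no peers, no quorum
	}
	return n/2 + 1
}

// peerSet is a hypothetical, simplified peer tracker. It recomputes the
// quorum on every membership change so the two values cannot drift apart.
type peerSet struct {
	peers  map[string]struct{}
	quorum int
}

func newPeerSet(ids ...string) *peerSet {
	ps := &peerSet{peers: make(map[string]struct{})}
	for _, id := range ids {
		ps.peers[id] = struct{}{}
	}
	ps.quorum = quorumSize(len(ps.peers))
	return ps
}

func (ps *peerSet) add(id string) {
	ps.peers[id] = struct{}{}
	ps.quorum = quorumSize(len(ps.peers))
}

func (ps *peerSet) remove(id string) {
	delete(ps.peers, id)
	ps.quorum = quorumSize(len(ps.peers))
}

func main() {
	ps := newPeerSet("S1", "S2", "S3")
	fmt.Println(len(ps.peers), ps.quorum) // 3 2
	ps.add("S4")
	fmt.Println(len(ps.peers), ps.quorum) // 4 3
	ps.remove("S1")
	ps.remove("S2")
	fmt.Println(len(ps.peers), ps.quorum) // 2 2
}
```

A two-peer group still needs both peers for quorum. This is one reason small cluster sizes need special care.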
Ivan Kozlovic
5072649540
Make sure to properly add peer after failure
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-03-14 15:32:12 -06:00
Derek Collison
5f78a44191
Fixed several bugs.
1. Snapshots being installed under heavy load.
2. Missed responses during catchup, caused by a bug in the catchup channel size.
3. Various other tweaks.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-14 11:38:22 -07:00
Derek Collison
3c85df0a44
Truncate up to entry, no need for previous
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-14 05:18:52 -07:00
Derek Collison
a3a35c0ddb
Updated raft processing and dealing with remove peer.
Made sure to not remove us if we were remapped after the peer removal.
Fixed some raft behaviors.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-13 16:28:24 -05:00
Derek Collison
e1dd41e326
Do not fail to start with small cluster sizes.
Short-circuit the full wait for leaders if we were a leadership xfer of preferred.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-13 16:26:26 -05:00
Derek Collison
a4e84ad781
Merge pull request #1995 from wallyqs/cluster-size-check
raft: Fixes to cluster size check for streams
2021-03-12 05:59:51 -06:00
Waldemar Quevedo
60817932a6
raft: Fixes to cluster size check for streams
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2021-03-11 23:28:57 -08:00
Matthias Hanel
b316cccfd1
Fixed a quorum formation issue that caused truncation
When a new leader is elected, it has to give everyone a chance to reply,
so that we can observe rejections with a higher term.
The maximum election timeout is 7.5 seconds.
The new behavior of waiting for the election timeout caused unit tests
to fail, so the timeouts in those tests were raised as well.
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-03-11 19:44:47 -05:00
Derek Collison
e5e8205fac
Make sure the order of clseq as stamped also makes it to the propose chan.
However, we do not want to hold the actual stream lock.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-09 00:34:33 -06:00
Ivan Kozlovic
57af977548
Stabilized stream sources under restart
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-03-08 16:41:02 -07:00
Derek Collison
467614ea87
Make sure to check for old messages during processing.
Also changed the way we detect old messages.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-08 15:42:03 -06:00
Derek Collison
673543c180
Modified flow control for clustered mode.
Set channels into and out of RAFT layers to block.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-08 12:58:57 -06:00
Derek Collison
c22627d4c1
Make sure to look in the WAL if not found in pae
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-06 18:01:00 -06:00
Derek Collison
31df41700b
Protect against divzero
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-06 08:54:19 -08:00
Derek Collison
d31fda5dac
Added code to constrain size of WAL under most scenarios.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-06 08:38:56 -08:00
Derek Collison
63b620972d
Under heavy load, retrieving the append entry from the WAL
while trying to also send new append entries was causing contention.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-06 07:50:11 -08:00
Derek Collison
0b3c686430
Fixes for data races and some locking.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-05 17:19:51 -08:00
Derek Collison
7e2b2a1033
Allow an option for push-based consumers to have idle heartbeats delivered.
This allows an endpoint to know the consumer is still alive.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-05 05:48:00 -08:00
Derek Collison
bfb8e3432e
Move RAFT comms off internal sendq.
Move route and gateway msgs out of the fast path for inbound stream msgs.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-04 14:45:34 -08:00
Derek Collison
d7201a110b
Better handling of out-of-disk conditions.
Suppress some stream and consumer bad results since they delete the asset.
Allow rehup to re-enable JetStream.
Various bug fixes and improvements.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-03 20:12:10 -08:00
Ivan Kozlovic
0f53bf6580
Fixed data race with nodeInfo
Took the approach of storing a struct instead of a pointer. Of course,
this means that when changing the offline bool from false to true,
we need to call Store again (with the same key).
This is based on the assumption that those Load/Store calls are not too
frequent. Otherwise, we may need to use locking (and keep *nodeInfo).
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-03-03 13:28:45 -07:00
Derek Collison
49cd38c064
Enable cross account behaviors for mirrors and sources.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-02 06:36:57 -08:00
Derek Collison
e0353479ad
Progress updates could potentially block on channels, this cleans that up.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-01 13:52:49 -08:00
Derek Collison
c0729a1309
Move processing of append entry response out of route path.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-01 05:57:15 -08:00
Derek Collison
a8db1d7322
Write snapshots without lock held
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-01 04:42:11 -08:00
Derek Collison
74b416afa1
Moved back to channel handling of append entry to avoid inline processing with disk IO in route path.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-28 18:34:24 -08:00
Derek Collison
0d29b0761a
Tweaked buffered channels, moved locks for snapshots.
Also added debug output for inline processing of append entries,
in preparation for removing that inline processing.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-28 05:16:04 -08:00
R.I.Pienaar
a4817bd7b6
Extend the out-of-space advisory
Signed-off-by: R.I.Pienaar <rip@devco.net>
2021-02-26 11:10:05 +01:00
Derek Collison
98f98e214b
Properly support memory based WALs
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-25 19:49:54 -08:00
Derek Collison
0f69e48511
Bug check err, check for out of space on catchup
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-25 18:25:16 -08:00
Derek Collison
b13ef6b9ec
Track write errors. Fixed a few bugs.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-25 17:53:20 -08:00
Derek Collison
a862cc75cc
Suppress raft campaigns on restart. Extend election timeout interval.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-25 04:14:14 -08:00
Derek Collison
73ba2d0b2f
File writes for term, vote, and peerstate were in the direct route path and could cause delays.
This moves the actual writes to a separate goroutine and also allows multiple writes to
be coalesced into one write under load. We only want the latest.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-24 20:47:31 -08:00
Derek Collison
78bdc34637
General stability improvements. Fixes to subscription state not cleaning up.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-24 08:44:34 -08:00
Ivan Kozlovic
1652fe62ef
Updates to when to do a snapshot
Remove panic in runAsLeader when not able to subscribe (which happens
on shutdown).
Gateway name access does not need a lock since it is immutable. This will
prevent deadlocks in some situations.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-02-23 19:06:07 -07:00
Derek Collison
8fe8b835fe
Fixes for flapping tests
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-23 14:08:17 -08:00
Derek Collison
c39641c263
Tweak hb and election times, fix unsubscribe leak
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-23 10:57:05 -08:00
Derek Collison
fa8a74ceb5
Allow placement directives on metacontroller stepdown, enabling placement into new clusters.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-19 10:55:22 -08:00
Derek Collison
9de18dfefe
Removed unused function
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-18 18:35:44 -08:00
Derek Collison
048011d7f1
Split vote improvements
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-18 18:29:18 -08:00
Derek Collison
89fe3b05df
various bug fixes, wal/snapshot stability
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-18 08:41:09 -08:00
Derek Collison
e21c7097f3
General stability improvements.
The original idea of moving to memory-based WALs was ill-advised and caused stability issues around restarts.
Returned to file-based WALs, but with async flush for the WAL itself.
The raft inline catchup has also been improved.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-17 19:56:16 -08:00
Derek Collison
765b9ad57a
Some stability improvements to raft lib and catchup stream processing.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-16 20:30:12 -08:00
Derek Collison
ddc4cc79d2
Make sure to not process AR when no longer leader
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-16 15:58:46 -08:00
Derek Collison
0dcb006968
Handle AppendEntry response inline, lower outstanding on catchup to stabilize
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-16 13:24:09 -08:00
Derek Collison
4c6e33c9c6
Restoration of streams could block route and client connections.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-14 18:43:40 -08:00
Derek Collison
f0cfc187d2
Set pindex to wrong setting on snapshot restore with no WAL
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-13 06:50:50 -08:00
Derek Collison
4759560e29
Fixed raft bug on catchup logic with external snapshots
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-12 19:58:02 -08:00
Derek Collison
579737a5e1
General fixes, stability improvements
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-11 18:13:24 -08:00