Derek Collison
f13fa767c2
Remove the swapping of accounts during processing of service imports.
...
When processing service imports we would swap out the accounts during processing.
With the addition of internal subscriptions and internal clients publishing in JetStream we had an issue with the wrong account being used.
This was specific to delayed pull subscribers trying to unsubscribe due to max of 1 while other JetStream API calls were running concurrently.
2021-07-26 07:57:10 -07:00
Derek Collison
6eef31c0fc
Fixed peer info reports that had large last active values.
...
Also put in safety for lag going upside down as well.
Signed-off-by: Derek Collison <derek@nats.io>
2021-07-06 10:14:43 -07:00
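The "lag going upside down" safety mentioned in 6eef31c0fc can be illustrated with a minimal sketch. It assumes the lag is derived from two unsigned indexes; the function name `safeLag` and its parameters are hypothetical and are not taken from the server source. With unsigned values, a plain subtraction wraps around to a very large number whenever the peer is reported ahead of the leader, so the sketch clamps the result at zero.

```go
package main

import "fmt"

// safeLag is a hypothetical helper: it returns how far peerIndex trails
// commit. Both values are unsigned, so commit - peerIndex would wrap to a
// huge value if the peer were ever ahead; in that case we report zero.
func safeLag(commit, peerIndex uint64) uint64 {
	if peerIndex >= commit {
		return 0
	}
	return commit - peerIndex
}

func main() {
	fmt.Println(safeLag(100, 90)) // peer trails the leader
	fmt.Println(safeLag(90, 100)) // peer reported ahead: clamped
}
```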
Derek Collison
5ec0f291a6
When catching up, if the first entry matched the index but not the term, we would not update the term.
...
This would cause CPU spikes and catchup cycles that could spin.
Signed-off-by: Derek Collison <derek@nats.io>
2021-06-11 15:02:46 -07:00
Derek Collison
9ccc843382
Removing peers should wait for RemovePeer entry replication.
...
Signed-off-by: Derek Collison <derek@nats.io>
2021-05-19 18:58:19 -07:00
Derek Collison
57395bba02
Fixed bug that could cause raft group to spin trying to catchup.
...
This happened when a raft group was trying to catch up a consumer, the log was empty, and we had a snapshot, but the requested sequence was the first sequence.
Signed-off-by: Derek Collison <derek@nats.io>
2021-05-07 09:13:18 -07:00
Derek Collison
db402cc444
Under heavy load and a leader change we could warn about not processing responses.
...
This also adjusts the min election timeout to 2 seconds vs just 1 for very large networks.
Signed-off-by: Derek Collison <derek@nats.io>
2021-05-03 19:40:40 -07:00
scottf
486df98373
close tempfiles, fix path print
2021-04-22 12:47:21 -04:00
Waldemar Quevedo
c9ab7ce8a1
Fix for data race when disabling JS running out of resources
...
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2021-04-21 14:26:52 -07:00
Derek Collison
902b9dec12
Merge pull request #2131 from nats-io/updates
...
General Updates and Stability Improvements
2021-04-20 13:52:39 -07:00
Matthias Hanel
b73be52862
[fixed] only become observer if the leaf config has raft not restricted (#2125)
...
If a subject in the system account's leafnode deny_imports matches $NRG.>
then JetStream is explicitly disconnected and the server can become
leader.
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-04-19 13:10:49 -04:00
Derek Collison
1dd7e8c7d1
Increase apply channel size
...
Signed-off-by: Derek Collison <derek@nats.io>
2021-04-16 14:00:46 -07:00
Derek Collison
8e82f36c5b
Track removed peers properly
...
Signed-off-by: Derek Collison <derek@nats.io>
2021-04-14 20:29:09 -07:00
Derek Collison
cf34514f9f
Do not limit expansion of new peers
...
Signed-off-by: Derek Collison <derek@nats.io>
2021-04-14 18:47:11 -07:00
Derek Collison
755ef74855
When a cluster of leafnodes connects to a cluster or supercluster hub and they share the system account, make the leafnode servers observers.
...
Signed-off-by: Derek Collison <derek@nats.io>
2021-04-12 17:00:55 -07:00
Derek Collison
69269c5653
Merge pull request #2095 from nats-io/mixed
...
Mixed mode improvements.
2021-04-09 16:56:41 -07:00
Jaime Piña
27e9628c3a
Run gofmt -s to simplify code
2021-04-09 15:18:06 -07:00
Derek Collison
e438d2f5fa
Mixed mode improvements.
...
1. When in mixed mode and only running the global account we now will check the account for JS.
2. Added code to decrease the cluster set size if we guessed wrong in mixed mode setup.
Signed-off-by: Derek Collison <derek@nats.io>
2021-04-09 14:58:35 -07:00
Derek Collison
14a826fb60
Check for entries going negative. Shutdown in place on server exit
...
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-30 11:46:15 -07:00
Derek Collison
327d913ae1
Under rare scenarios we could fail to load, but this should not be a panic.
...
We should recover on the lines below.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-29 07:34:28 -07:00
Derek Collison
0f71c260fb
Durable consumers with R>1 had performance challenges.
...
This code changes the way we handle raft based proposals for consumers.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-26 12:53:49 -07:00
Derek Collison
5d5de5925f
Introduce a previous leader state in the raft layer to allow quicker responses when leaderless.
...
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-25 17:08:29 -07:00
Derek Collison
e53caee5e8
Enforce server limits even when dynamic limits for accounts are in play.
...
We were not properly enforcing server limits. This commit will allow a server to enforce limits but still remain functional even at the JetStream level.
Also fixed a bug for RAFT replay that could cause instability.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-25 16:06:27 -07:00
Derek Collison
a75e8f8c80
Fix for an issue with multiple restarts that showed stalled and sometimes lost streams.
...
The issue arose when state was removed from a server and the server was restarted: it would catch up properly.
However, upon cluster restart the system could exhibit strange behaviors. This was because
catchup did not properly create a meta snapshot when one was received, leaving no meta state to recover.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-22 20:06:38 -07:00
Derek Collison
0f548edcc6
Reduce sliding window for direct consumers and catchup stream windows.
...
Remove another possible wire blocking operation in raft.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-21 09:24:27 -07:00
Derek Collison
04a9d51035
Fix for data race
...
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-20 07:15:36 -07:00
Derek Collison
a205f8f2de
Fix for updating peers and quorum sizes.
...
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-14 15:31:29 -07:00
Ivan Kozlovic
5072649540
Make sure to properly add peer after failure
...
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-03-14 15:32:12 -06:00
Derek Collison
5f78a44191
Fixed several bugs.
...
1. With snapshots being installed under heavy load.
2. Running catchup and missing responses due to bug in chan size for catchup.
3. Various other tweaks.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-14 11:38:22 -07:00
Derek Collison
3c85df0a44
Truncate up to entry, no need for previous
...
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-14 05:18:52 -07:00
Derek Collison
a3a35c0ddb
Updated raft processing and dealing with remove peer.
...
Made sure to not remove us if we were remapped after the peer removal.
Fixed some raft behaviors.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-13 16:28:24 -05:00
Derek Collison
e1dd41e326
Do not fail to start with small cluster sizes.
...
Short circuit full wait for leaders if we were a leadership xfer of preferred.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-13 16:26:26 -05:00
Derek Collison
a4e84ad781
Merge pull request #1995 from wallyqs/cluster-size-check
...
raft: Fixes to cluster size check for streams
2021-03-12 05:59:51 -06:00
Waldemar Quevedo
60817932a6
raft: Fixes to cluster size check for streams
...
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2021-03-11 23:28:57 -08:00
Matthias Hanel
b316cccfd1
Fixed a quorum formation issue that caused truncation
...
When a new leader is elected it has to give everyone a chance to reply,
so that we can observe rejections with higher term.
The maximum election timeout is 7.5 seconds.
The new behavior of waiting for the election timeout caused unit tests
to fail. Hence upping the timeout there as well.
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-03-11 19:44:47 -05:00
Derek Collison
e5e8205fac
Need to make sure the order of clseq as stamped also makes it to the propose chan.
...
However we do not want to hold the actual stream lock.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-09 00:34:33 -06:00
Ivan Kozlovic
57af977548
Stabilized stream sources under restart
...
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-03-08 16:41:02 -07:00
Derek Collison
467614ea87
Make sure to check for old messages during processing.
...
Also changed the way we detect old messages.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-08 15:42:03 -06:00
Derek Collison
673543c180
Modified flow control for clustered mode.
...
Set channels into and out of RAFT layers to block.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-08 12:58:57 -06:00
Derek Collison
c22627d4c1
Make sure to look in the WAL if not found in pae
...
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-06 18:01:00 -06:00
Derek Collison
31df41700b
Protect against divzero
...
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-06 08:54:19 -08:00
Derek Collison
d31fda5dac
Added code to constrain size of WAL under most scenarios.
...
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-06 08:38:56 -08:00
Derek Collison
63b620972d
Under heavy load retrieving the append entry from the WAL
...
while trying to also send new append entries was causing contention.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-06 07:50:11 -08:00
Derek Collison
0b3c686430
Fixes for data races and some locking.
...
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-05 17:19:51 -08:00
Derek Collison
7e2b2a1033
Allow an option to push based consumers to have idle heartbeats delivered.
...
This allows an endpoint to know the consumer is still alive.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-05 05:48:00 -08:00
Derek Collison
bfb8e3432e
Move RAFT comms off internal sendq.
...
Move route and gateway msgs out of fast path for inbound stream msgs.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-04 14:45:34 -08:00
Derek Collison
d7201a110b
Better handling on out of disk.
...
Suppress some stream and consumer bad results since they delete the asset.
Allow rehup to re-enable JetStream.
Various bug fixes and improvements.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-03 20:12:10 -08:00
Ivan Kozlovic
0f53bf6580
Fixed data race with nodeInfo
...
Took the approach of storing struct instead of pointer. Of course,
when changing the offline bool from false to true, it means that
we need to call Store again (with same key).
This is based on the assumption that those Load/Store are not too
frequent. Otherwise, we may need to use locking (and keep *nodeInfo).
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-03-03 13:28:45 -07:00
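The approach described in 0f53bf6580 (store the struct by value, and call Store again with the same key when the offline flag changes) might look roughly like the sketch below. It assumes the map is a `sync.Map`, which the Load/Store wording suggests but the message does not state. The `nodeInfo` fields, the `nodes` wrapper, and its method names are hypothetical simplifications, not the server's actual types.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeInfo is a simplified, hypothetical stand-in for the real struct.
type nodeInfo struct {
	name    string
	offline bool
}

// nodes wraps a sync.Map whose values are nodeInfo structs stored by value,
// so readers always receive their own copy and never share a pointer.
type nodes struct {
	m sync.Map
}

func (n *nodes) add(key, name string) {
	n.m.Store(key, nodeInfo{name: name})
}

// markOffline loads a copy, flips the flag on the copy, and stores it back
// under the same key. It reports false if the key is unknown.
func (n *nodes) markOffline(key string) bool {
	v, ok := n.m.Load(key)
	if !ok {
		return false
	}
	ni := v.(nodeInfo) // a copy, safe to modify
	ni.offline = true
	n.m.Store(key, ni)
	return true
}

func (n *nodes) isOffline(key string) bool {
	v, ok := n.m.Load(key)
	return ok && v.(nodeInfo).offline
}

func main() {
	var ns nodes
	ns.add("n1", "server-1")
	ns.markOffline("n1")
	fmt.Println(ns.isOffline("n1"))
}
```

The Load, modify, Store sequence is free of data races but is not atomic as a whole: two concurrent updaters of the same key could overwrite each other, which is consistent with the commit's note that locking may be needed if these calls become frequent.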
Derek Collison
49cd38c064
Enable cross account behaviors for mirrors and sources.
...
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-02 06:36:57 -08:00
Derek Collison
e0353479ad
Progress updates could potentially block on channels, this cleans that up.
...
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-01 13:52:49 -08:00
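For context on e0353479ad, one common Go idiom for keeping a channel send from blocking is a `select` with a `default` case. The commit message does not say which technique was actually used, so the sketch below is only an illustration of the general idea; `trySend` and the `progress` channel are hypothetical names.

```go
package main

import "fmt"

// trySend attempts to deliver v without blocking. It reports false when the
// send cannot proceed immediately (for example, the buffer is full), in
// which case the update is dropped.
func trySend(ch chan<- uint64, v uint64) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	progress := make(chan uint64, 1)
	fmt.Println(trySend(progress, 1)) // buffer has room
	fmt.Println(trySend(progress, 2)) // buffer is full, update is dropped
}
```

Dropping an update is only acceptable when a later update supersedes it, which is typically the case for progress reports.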
Derek Collison
c0729a1309
Move processing of append entry response out of route path.
...
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-01 05:57:15 -08:00