Commit Graph

146 Commits

Author SHA1 Message Date
Derek Collison
2ac05785c3 Do not persist or snapshot consumer state after a restore.
This can lead to a data race and is not needed after being applied.

Signed-off-by: Derek Collison <derek@nats.io>
2021-04-21 18:50:38 -07:00
Derek Collison
c9c70dea33 Fix race
Signed-off-by: Derek Collison <derek@nats.io>
2021-04-21 16:17:58 -07:00
Derek Collison
3418847881 Merge pull request #2146 from nats-io/chblock
Make sure to not have the raft layer block on apply channel on exit.
2021-04-21 15:58:50 -07:00
Derek Collison
0678e649d3 Make sure to not have the raft layer block on apply channel on exit.
Signed-off-by: Derek Collison <derek@nats.io>
2021-04-21 15:52:54 -07:00
Derek Collison
50fabe261d Check for overlapping subjects on stream update.
Signed-off-by: Derek Collison <derek@nats.io>
2021-04-21 15:38:38 -07:00
Derek Collison
a181238cf0 Fix for consumer on restore being deleted
Signed-off-by: Derek Collison <derek@nats.io>
2021-04-21 06:54:54 -07:00
Derek Collison
518ff9be14 Concurrent multiple durable subscribers would cause unpredictable behaviors.
Upgraded to current Go client.

Signed-off-by: Derek Collison <derek@nats.io>
2021-04-20 19:50:24 -07:00
Derek Collison
902b9dec12 Merge pull request #2131 from nats-io/updates
General Updates and Stability Improvements
2021-04-20 13:52:39 -07:00
Derek Collison
68ddd519d2 Process upstream missing messages for mirrors better.
Signed-off-by: Derek Collison <derek@nats.io>
2021-04-19 20:15:21 -07:00
Matthias Hanel
b73be52862 [fixed] only become observer if the leaf config does not restrict raft (#2125)
If a subject in the system account's leafnode deny_imports matches $NRG.>,
then JetStream is explicitly disconnected and the server can become
leader.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-04-19 13:10:49 -04:00
Derek Collison
542adc4bc3 Make sure clseq does not fall below lseq
Signed-off-by: Derek Collison <derek@nats.io>
2021-04-18 18:47:33 -07:00
Derek Collison
6a7f3a3153 Cleanup error handling, fix deadlock in test
Signed-off-by: Derek Collison <derek@nats.io>
2021-04-16 13:56:54 -07:00
Derek Collison
f6a82a7c98 When messages were no longer available in an upstream stream a mirror could wedge and not resolve.
This fixes that scenario by detecting the situation and inserting skip msgs to catch up.

Signed-off-by: Derek Collison <derek@nats.io>
2021-04-13 11:46:03 -07:00
Derek Collison
755ef74855 When a cluster of leafnodes connects to a cluster or supercluster hub and they share the system account, make the leafnode servers observers.
Signed-off-by: Derek Collison <derek@nats.io>
2021-04-12 17:00:55 -07:00
Derek Collison
0cee993e3b When checking cluster size we need to make sure we have heard from all peers before making adjustments.
Also check back periodically.

Signed-off-by: Derek Collison <derek@nats.io>
2021-04-10 15:55:51 -07:00
Derek Collison
27d8b939b5 Updated based on comments that the one fix was actually a misconfiguration.
Signed-off-by: Derek Collison <derek@nats.io>
2021-04-09 16:49:24 -07:00
Derek Collison
e438d2f5fa Mixed mode improvements.
1. When in mixed mode and only running the global account we now will check the account for JS.
2. Added code to decrease the cluster set size if we guessed wrong in mixed mode setup.

Signed-off-by: Derek Collison <derek@nats.io>
2021-04-09 14:58:35 -07:00
Derek Collison
1ea4a430da If we fail to load an account while processing a stream assignment, send error back to metaleader.
Signed-off-by: Derek Collison <derek@nats.io>
2021-04-07 14:23:12 -07:00
Derek Collison
44ada49b16 During repeated server restarts or failures consumer state could drift between replicas.
We now make sure to sync state of the replicas when a new leader takes over. We also update ack floors regardless of detection on pending list.

Signed-off-by: Derek Collison <derek@nats.io>
2021-04-02 08:20:29 -07:00
Matthias Hanel
cd602231ac [Fixed] missing unlock and added a warning trace (#2054)
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-03-31 19:22:19 -04:00
Derek Collison
bb7a8a5f79 Introduced default max ack pending for ack explicit.
Fixed a bug that would introduce performance degradation for durable consumers with R>1.

Signed-off-by: Derek Collison <derek@nats.io>
2021-03-30 11:47:24 -07:00
Derek Collison
5a48369b4b Make sure to not delete streams on bad updates.
If an update was assigned but failed at the stream group server, we would send back the result, which would always delete the stream.

Signed-off-by: Derek Collison <derek@nats.io>
2021-03-29 07:35:30 -07:00
Derek Collison
c564b18482 Protect against negative
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-26 05:28:00 -07:00
Derek Collison
5d6fe9e4b0 Check for subject overlaps after check for pre-existing
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-25 19:00:15 -07:00
Derek Collison
5d5de5925f Introduce a previous leader state in the raft layer to allow quicker responses when leaderless.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-25 17:08:29 -07:00
Derek Collison
e53caee5e8 Enforce server limits even when dynamic limits for accounts in play.
We were not properly enforcing server limits. This commit will allow a server to enforce limits but still remain functional even at the JetStream level.
Also fixed a bug for RAFT replay that could cause instability.

Signed-off-by: Derek Collison <derek@nats.io>
2021-03-25 16:06:27 -07:00
Derek Collison
a627db9fc8 Do not request streaminfo from streams that are completely offline.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-24 10:26:09 -07:00
Derek Collison
06803dafbf Tweak seq tracking for flow control, also fixup code
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-24 09:46:54 -07:00
Derek Collison
2ed53035ed Reworked flow control for sources and mirrors.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-24 07:07:33 -07:00
Derek Collison
a75e8f8c80 Fix for an issue with multiple restarts that showed stalled and sometimes lost streams.
The issue was that when state was removed from a server and it was restarted, it would catch up properly.
However, upon cluster restart the system could exhibit strange behaviors. This was due to catchup
not properly creating a meta snapshot when one was received, leaving no meta state to recover.

Signed-off-by: Derek Collison <derek@nats.io>
2021-03-22 20:06:38 -07:00
Derek Collison
0f548edcc6 Reduce sliding window for direct consumers and catchup stream windows.
Remove another possible wire blocking operation in raft.

Signed-off-by: Derek Collison <derek@nats.io>
2021-03-21 09:24:27 -07:00
Derek Collison
faa6dc85eb Fix for flapping test
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-20 11:16:40 -07:00
Derek Collison
8eefff2b3b Make sure the jetstream accounts use the name as the key to the map.
This prevents possible double adds under reload or restart scenarios.

Signed-off-by: Derek Collison <derek@nats.io>
2021-03-18 17:29:26 -07:00
Derek Collison
ee92cc9a5b Properly print when a stream is doing out-of-band catchup. Print node banner consistently.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-14 07:29:36 -07:00
Derek Collison
cbbe6dc9c5 Make API access consistent when determining that the system is not available.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-14 06:18:04 -07:00
Derek Collison
2fa8668dd9 Only snap if needed
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-13 16:30:58 -05:00
Derek Collison
a3a35c0ddb Updated raft processing and dealing with remove peer.
Made sure to not remove us if we were remapped after the peer removal.
Fixed some raft behaviors.

Signed-off-by: Derek Collison <derek@nats.io>
2021-03-13 16:28:24 -05:00
Derek Collison
2fb2ced712 Removed unused functions
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-13 16:28:24 -05:00
Derek Collison
299f44cddf This changes our behaviors for streams and peer removals in several ways.
First we no longer try to auto-remap stream assignments on peer removals from the system.
We also now can always respond to stream info requests if at least one member is running.

Signed-off-by: Derek Collison <derek@nats.io>
2021-03-11 06:52:28 -05:00
Derek Collison
01404b3dc9 Protect against cluster and meta being gone
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-10 22:33:39 -05:00
Derek Collison
e5e8205fac Need to make sure the order of clseq as stamped also makes it to the propose chan.
However we do not want to hold the actual stream lock.

Signed-off-by: Derek Collison <derek@nats.io>
2021-03-09 00:34:33 -06:00
Derek Collison
673543c180 Modified flow control for clustered mode.
Set channels into and out of RAFT layers to block.

Signed-off-by: Derek Collison <derek@nats.io>
2021-03-08 12:58:57 -06:00
Derek Collison
d31fda5dac Added code to constrain size of WAL under most scenarios.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-06 08:38:56 -08:00
Ivan Kozlovic
4e3b79f62b monitorConsumer perform snapshot similar to monitorStream
Changed the stream min size default value back to 32MB and removed
the one for consumer since we don't use it anymore, but set the
count size to the same value as for stream (8192).

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-03-05 19:02:41 -07:00
Derek Collison
0b3c686430 Fixes for data races and some locking.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-05 17:19:51 -08:00
Derek Collison
dd8acb1a99 Fixed a bug where we were not determining clustered state and so were processing msgs from routes directly.
Cleaned up lseq and clseq code.

Signed-off-by: Derek Collison <derek@nats.io>
2021-03-05 12:00:19 -08:00
Derek Collison
7b1b9a7946 Snapshot on peer state change, e.g. removal
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-04 18:52:57 -08:00
Derek Collison
207ebd3b3d Changed stream sendq to linked list outq.
Made consumer share streams outq.

Signed-off-by: Derek Collison <derek@nats.io>
2021-03-04 17:19:50 -08:00
Derek Collison
d7201a110b Better handling on out of disk.
Suppress some stream and consumer bad results since they delete the asset.
Allow rehup to re-enable JetStream.
Various bug fixes and improvements.

Signed-off-by: Derek Collison <derek@nats.io>
2021-03-03 20:12:10 -08:00
Ivan Kozlovic
0f53bf6580 Fixed data race with nodeInfo
Took the approach of storing a struct instead of a pointer. Of course,
when changing the offline bool from false to true, it means that
we need to call Store again (with the same key).

This is based on the assumption that those Load/Store calls are not too
frequent. Otherwise, we may need to use locking (and keep *nodeInfo).

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-03-03 13:28:45 -07:00