Some operations could cause the route to block due to lock being
held during store operations. On macOS, having lots of streams/consumers
and restarting the cluster would cause lots of concurrent IO that
would cause lock to be held for too long, causing head-of-line
blocking in processing of messages from a route.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
If a node falled behind, when catching up with the rest of the
cluster, it is possible that a lot of append entries accumulate
and the server would print warnings such as:
```
[WRN] RAFT [jZ6RvVRH - S-R3F-CQw2ImK6] <some number> append entries pending
```
It would then continously print the following warning:
```
AppendEntry failed to be placed on internal channel
```
When that happens, this node would always be shown with be running the
same number of operations behind (using `nats s info`) if there are
no new messages added to the stream, or an increasing number of
operations if there is still activity.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Along a leaf node connection, unless the system account is shared AND the JetStream domain name is identical, the default JetStream traffic (without a domain set) will be denied.
As a consequence, all clients that wants to access a domain that is not the one in the server they are connected to, a domain name must be specified.
Affected from this change are setups where: a leaf node had no local JetStream OR the server the leaf node connected to had no local JetStream.
One of the two accounts that are connected via a leaf node remote, must have no JetStream enabled.
The side that does not have JetStream enabled, will loose JetStream access and it's clients must set `nats.Domain` manually.
For workarounds on how to restore the old behavior, look at:
https://github.com/nats-io/nats-server/pull/2693#issuecomment-996212582
New config values added:
`default_js_domain` is a mapping from account to domain, settable when JetStream is not enabled in an account.
`extension_hint` are hints for non clustered server to start in clustered mode (and be usable to extend)
`js_domain` is a way to set the JetStream domain to use for mqtt.
Signed-off-by: Matthias Hanel <mh@synadia.com>
When a consumer is configured with "meta-only" option, and the
stream was backed by a memory store, a memory corruption could
happen causing the application to receive corrupted headers.
Also replaced most of usage of `append(a[:0:0], a...)` to make
copies. This was based on this wiki:
https://github.com/go101/go101/wiki/How-to-efficiently-clone-a-slice%3F
But since Go 1.15, it is actually faster to call make+copy instead.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When encountering errors for sequence mismatches that were benign we were returning an error and not processing the rest of the entries.
This would lead to more severe sequence mismatches later on that would cause stream resets.
Also added code to deal with server restarts and the clfs fixup states which should have been reset properly.
Signed-off-by: Derek Collison <derek@nats.io>
When we stored a message in the raft layer in a wrong position (state corrupt), we would panic, leaving the message there.
On restart we would truncate the WAL and try to repair, but we truncated to the wrong index of the bad entry.
This change also includes additional changes to truncateWAL and also reduces the conditional for panic on storeMsg.
Signed-off-by: Derek Collison <derek@nats.io>
Also if we do not have room trap add peer and process there.
Fixed a bug that would treat ephemerals same as durables during remapping after peer removal.
Signed-off-by: Derek Collison <derek@nats.io>
When processing service imports we would swap out the accounts during processing.
With the addition of internal subscriptions and internal clients publishing in JetStream we had an issue with the wrong account being used.
This was specific to delyaed pull subscribers trying to unsubscribe due to max of 1 while other JetStream API calls were running concurrently.
When a raft group was trying to catch up a consumer but the log is empty and we do have a snapshot but the requested sequence was the first sequence.
Signed-off-by: Derek Collison <derek@nats.io>
If a subject in the system accounts leafnode deny_imports matches $NRG.>
then jetstream is explicitly disconnected and the server can become
leader.
Signed-off-by: Matthias Hanel <mh@synadia.com>
1. When in mixed mode and only running the global account we now will check the account for JS.
2. Added code to decrease the cluster set size if we guessed wrong in mixed mode setup.
Signed-off-by: Derek Collison <derek@nats.io>
We were not properly enforcing server limits. This commit will allow a server to enforce limits but still remain functional even at the JetStream level.
Also fixed a bug for RAFT replay that could cause instability.
Signed-off-by: Derek Collison <derek@nats.io>
The issue was when a state was removed from a server and restarted it would catch up properly.
However upon cluster restart the system could exhibit strange behaviors. This was due to on
catchup not properly creating a meta snapshot when one was received, leaving no meta state to recover.
Signed-off-by: Derek Collison <derek@nats.io>
1. With snapshots being installed under heavy load.
2. Running catchup and missing responses due to bug in chan size for catchup.
3. various other tweaks.
Signed-off-by: Derek Collison <derek@nats.io>