During an elected stepdown and transfer, allow the new leader to take over before we step down.
We could receive a leader change, so make sure to also check migration state.
Signed-off-by: Derek Collison <derek@nats.io>
- A stream could become leader when it should not, causing
messages to be lost.
- A catchup could stall because the server sending data
could bail out of the runCatchup routine but still send
the EOF signal.
- Deadlock with monitoring of Jsz
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The system will allow an update to move a stream, and subsequently all attached consumers, to another cluster, either directly or via tag placement.
The meta layer will scale the underlying peerset appropriately to straddle the two clusters for both the stream and consumers, taking into account the consumer type.
Control will then pass to the current leaders of the assets who will monitor the catchup status of the new peers.
(Note we can optimize this later to only traverse once across a GW for any given asset, but for now this is simpler)
Once the original leaders have determined the assets are synched, they will pass leadership to a member of the new peerset.
Once the new leader has been elected, it will forward a request for the meta layer to shrink the peerset by removing the old peers.
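From the operator's point of view this is driven by an ordinary stream config update; a minimal sketch using the nats.go JetStream API (the stream name, subjects, and cluster name below are made up for illustration):

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Changing the placement (directly by cluster name, or via tags) kicks off
	// the flow described above: the meta layer expands the peer set across both
	// clusters, the current leaders wait for the new peers to catch up, then
	// leadership is transferred and the old peers are removed.
	_, err = js.UpdateStream(&nats.StreamConfig{
		Name:     "ORDERS",
		Subjects: []string{"orders.>"},
		Replicas: 3,
		Placement: &nats.Placement{
			Cluster: "east", // target cluster; tags could be used instead
		},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```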
Signed-off-by: Derek Collison <derek@nats.io>
Some warnings, especially ones for JS limits that were printed on a
per-message basis, are now limited to ~1 per second if the content of
the warning is already found in a map.
This also applies to "client" warnings, but the client portion of the
warning is not taken into account, which helps reduce logging of
similar content coming from different clients.
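A minimal sketch of the idea (not the actual server code; names are made up): warnings are keyed by their formatted content, and an identical message is emitted at most about once per second:

```go
package main

import (
	"fmt"
	"log"
	"sync"
	"time"
)

// rateLimitedLogger emits a given warning at most about once per second,
// keyed by the formatted content of the message.
type rateLimitedLogger struct {
	mu   sync.Mutex
	seen map[string]time.Time
}

func (l *rateLimitedLogger) Warnf(format string, args ...interface{}) {
	msg := fmt.Sprintf(format, args...)
	now := time.Now()

	l.mu.Lock()
	if last, ok := l.seen[msg]; ok && now.Sub(last) < time.Second {
		l.mu.Unlock()
		return // identical warning already logged within the last second
	}
	l.seen[msg] = now
	l.mu.Unlock()

	log.Printf("[WRN] %s", msg)
}

func main() {
	l := &rateLimitedLogger{seen: make(map[string]time.Time)}
	for i := 0; i < 1000; i++ {
		l.Warnf("JetStream resource limits exceeded for account %q", "A")
	}
	// Only the first warning is printed; the duplicates are dropped.
}
```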
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Adds a unit test to cover this scenario.
Improves reporting of the correct error.
Only shows info for non-existing tiers where streams exist.
Signed-off-by: Matthias Hanel <mh@synadia.com>
* Adding server limits (max ack pending / dedupe window) to the JS config
Also shifting the consumer config check to jsConsumerCreate, since in
clustered mode this was enforced in the wrong place
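For illustration, a sketch of the client-visible effect (using the nats.go API; the names and limit value are made up, and the exact rejection behavior is an assumption based on this change): a consumer that asks for more than the server's max ack pending limit would now be refused when it is created:

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// With a server-side max ack pending limit configured, asking for more
	// than the limit would presumably be rejected at creation time (the
	// check now living in jsConsumerCreate in both modes).
	_, err = js.AddConsumer("ORDERS", &nats.ConsumerConfig{
		Durable:       "workers",
		AckPolicy:     nats.AckExplicitPolicy,
		MaxAckPending: 100_000, // above the hypothetical server limit
	})
	if err != nil {
		log.Printf("consumer rejected: %v", err)
	}
}
```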
Signed-off-by: Matthias Hanel <mh@synadia.com>
Also fixed a bug where we were incorrectly not spinning up the monitoring loop for a stream when going from 3->1->3.
Signed-off-by: Derek Collison <derek@nats.io>
This could show up when a consumer leader changes for a consumer
that had redelivered messages and, for instance, messages were inbound
on the stream.
Resolves #2912
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Previously we relied more heavily on Go's garbage collector: when we loaded a block for an underlying stream we would pass references upward to avoid copies.
Now we always copy when passing data back to the upper layers, which allows us to not only expire our cache blocks but also pool and reuse them.
The upper layers were also changed so that their pooling layer can optionally interoperate with the storage layer.
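A simplified sketch of the pattern with made-up names (not the actual filestore code): block buffers come from a pool, and message data is copied out of a block before the block is recycled:

```go
package main

import "sync"

// Block buffers are pooled instead of being left for the garbage collector.
var blkPool = sync.Pool{
	New: func() interface{} { return make([]byte, 0, 64*1024) },
}

// loadMsg copies a message out of a cached block; the caller owns the copy,
// so no references into the block's memory escape to the upper layers.
func loadMsg(blk []byte, off, ln int) []byte {
	msg := make([]byte, ln)
	copy(msg, blk[off:off+ln])
	return msg
}

// releaseBlock resets the buffer and returns it to the pool for reuse.
func releaseBlock(blk []byte) {
	blkPool.Put(blk[:0])
}

func main() {
	blk := blkPool.Get().([]byte)
	blk = append(blk, "hello world"...)
	msg := loadMsg(blk, 0, 5) // "hello", independent of blk's memory
	releaseBlock(blk)
	_ = msg
}
```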
Also fixed some test flappers and a bug where the de-dupe state might not be re-formed correctly.
Signed-off-by: Derek Collison <derek@nats.io>
This was introduced by the change for ipQueues in #2931.
The (*ipQueue).unregister() was written with protection against the
ipQueue being nil; however, mset.outq is actually not a bare
ipQueue but a jsOutQ that embeds a pointer to an ipQueue. So we
need to implement unregister() for jsOutQ as well.
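A minimal illustration of the Go pitfall involved, using simplified stand-in types:

```go
package main

type ipQueue struct{ name string }

// unregister is safe to call on a nil *ipQueue.
func (q *ipQueue) unregister() {
	if q == nil {
		return
	}
	// ... remove the queue from the registry ...
}

type jsOutQ struct {
	*ipQueue // embedded pointer
}

func main() {
	var outq *jsOutQ
	// The promoted method has to read outq.ipQueue first, so this panics
	// with a nil pointer dereference before unregister()'s own nil check
	// ever runs. Giving jsOutQ its own nil-aware method fixes it:
	//
	//   func (q *jsOutQ) unregister() {
	//       if q == nil { return }
	//       q.ipQueue.unregister()
	//   }
	outq.unregister()
}
```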
Added a test that reproduces the issue; the bug was originally found
via a flapping test (TestJetStreamLongStreamNamesAndPubAck) that failed
due to a file name being too long.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Removal of an external stream source was not working properly,
allowing messages to keep flowing after the removal, until the
server hosting the stream from which the source was removed was
restarted.
Resolves #2920
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Removed the warnings; instead, they are registered/unregistered in a
sync.Map and can be inspected via an undocumented monitor page.
Added the notion of "in progress", which is the number of messages
that have been pop()'ed. When recycle() is invoked this count goes
back down.
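A simplified sketch of the in-progress accounting with made-up names (not the actual ipQueue code):

```go
package main

import "sync"

// Entries handed out by pop() count as "in progress" until the caller
// recycles the batch; a monitor page can then report both numbers.
type queue struct {
	mu         sync.Mutex
	elts       []interface{}
	inProgress int
}

func (q *queue) push(e interface{}) {
	q.mu.Lock()
	q.elts = append(q.elts, e)
	q.mu.Unlock()
}

// pop hands the whole pending batch to the caller and counts it as in progress.
func (q *queue) pop() []interface{} {
	q.mu.Lock()
	defer q.mu.Unlock()
	elts := q.elts
	q.elts = nil
	q.inProgress += len(elts)
	return elts
}

// recycle is called once the batch has been processed; the in-progress
// count goes back down.
func (q *queue) recycle(elts []interface{}) {
	q.mu.Lock()
	q.inProgress -= len(elts)
	q.mu.Unlock()
}

func main() {
	q := &queue{}
	q.push("msg")
	batch := q.pop() // inProgress == 1
	q.recycle(batch) // inProgress == 0
}
```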
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Also had to change all references from `path.` to `filepath.` when
dealing with files, so that it works properly on Windows.
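For illustration, the difference between the two packages:

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
)

func main() {
	// path.Join always uses forward slashes, regardless of the OS:
	fmt.Println(path.Join(`C:\nats`, "jetstream", "streams"))
	// -> C:\nats/jetstream/streams

	// filepath.Join uses the OS-specific separator, which is what file
	// system paths need on Windows:
	fmt.Println(filepath.Join(`C:\nats`, "jetstream", "streams"))
	// -> C:\nats\jetstream\streams on Windows
}
```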
Also fixed lots of tests to defer the shutdown of the server
after the removal of the storage, and fixed some config file
directories to use the single quote `'` to surround the file path,
again so they work on Windows.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The "deleted" advisory was missing because the stream's send loop
was closed before the advisory was pushed to the queue to be sent.
Added tests, for both single and clustered mode, covering all stream
advisories.
Resolves #2886
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This change provides a bit better logging on startup, making it easier to map a RAFT log directory, etc., to its stream/consumer.
Signed-off-by: Derek Collison <derek@nats.io>
This should help with GC pressure; however, it may have an effect
on performance (based on some benchmarks), since calling
sync.Pool.Get/Put too often carries its own cost.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Under load, a message could be committed to the underlying store while a consumer was being created, and then num pending would be incremented again when the stream signals the consumers.
The fix remembers the last sequence of the state when we calculate sgap, and checks against it before adding in the stream code.
Signed-off-by: Derek Collison <derek@nats.io>
This allows stream placement to overflow to adjacent clusters.
We also do more balanced placement based on resources (store or mem). We can continue to expand this as well.
We also introduce an account requirement that stream configs contain a MaxBytes value.
We now track account limits and server limits more distinctly, and do not reserve server resources based on account limits themselves.
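For illustration, under such a requirement every stream config must carry a MaxBytes value; a sketch using the nats.go client (the names and size are made up):

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// With the account requirement enabled, a stream config without MaxBytes
	// would be rejected; with it set, placement has a concrete size to work
	// with when balancing and overflowing across clusters.
	_, err = js.AddStream(&nats.StreamConfig{
		Name:     "EVENTS",
		Subjects: []string{"events.>"},
		MaxBytes: 8 * 1024 * 1024 * 1024, // 8GiB cap, required by the account
	})
	if err != nil {
		log.Fatal(err)
	}
}
```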
Signed-off-by: Derek Collison <derek@nats.io>
It is actually faster to not track this at all and to generate it on the fly, and it saves lots of memory too.
When we update the stream state to include runs, etc., we will update this as well.
Signed-off-by: Derek Collison <derek@nats.io>
This will patch them on the fly during recovery: specifically, subjects with leading or trailing spaces, and mirror streams that have any subjects at all.
Signed-off-by: Derek Collison <derek@nats.io>
When a consumer is configured with the "meta-only" option and the
stream is backed by a memory store, memory corruption could occur,
causing the application to receive corrupted headers.
Also replaced most usage of `append(a[:0:0], a...)` for making
copies. That pattern was based on this wiki:
https://github.com/go101/go101/wiki/How-to-efficiently-clone-a-slice%3F
But since Go 1.15, it is actually faster to call make+copy instead.
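For reference, the two cloning patterns side by side:

```go
package main

import "fmt"

func main() {
	src := []byte("payload")

	// The full-slice-expression clone that was being replaced:
	a := append(src[:0:0], src...)

	// make+copy, which since Go 1.15 is generally the faster way to clone:
	b := make([]byte, len(src))
	copy(b, src)

	fmt.Println(string(a), string(b))
}
```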
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
1. When a snapshot did not yield actionable data, we were not setting the new last sequence when we had to readjust based on the snapshot. This could lead to followers spinning on stream resets.
2. When a stream has lots of failures by design, like the KV abstraction, clearing the clfs state would make us endlessly spin trying to reset the stream.
Signed-off-by: Derek Collison <derek@nats.io>
When encountering benign sequence mismatch errors, we were returning an error and not processing the rest of the entries.
This would lead to more severe sequence mismatches later on that would cause stream resets.
Also added code to deal with server restarts and the clfs fixup states, which should have been reset properly.
Signed-off-by: Derek Collison <derek@nats.io>
There was a bug that would erase the sync subject used for upper-level catchup of streams.
Repair at the Raft layer was ok, but if that log was compacted the repair gets kicked up to the upper layers, which would fail.
Users would see "Catchup stalled" messages repeatedly, and consumers whose leaders were attached to that replica would also stop working.
Changes were also put in to repair the corrupt state after the fact, regardless of the presence of the fix.
Signed-off-by: Derek Collison <derek@nats.io>
Calls to mset.unsubscribe() need to use the version that takes the
lock when invoked from the subscription callback or from the
go routine after the 10 seconds have elapsed.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>