If the consumer's sequence was not the same as the stream's sequence,
then redelivery would always use the first duration from the
BackOff list.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This allows a consumer to have exponential backoffs rather than a static AckWait and MaxDeliver.
When BackOff is set it will override AckWait to BackOff[0], and MaxDeliver will be len(BackOff)+1.
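A minimal sketch of how a redelivery delay could be derived from the list, assuming the delivery attempt indexes into BackOff and the last duration repeats once the list is exhausted; the helper name is illustrative, not the server's internals:

```go
package main

import (
	"fmt"
	"time"
)

// redeliveryDelay picks the delay for the Nth delivery attempt from the
// configured BackOff list, clamping to the last entry once exhausted.
func redeliveryDelay(backoff []time.Duration, deliveries int) time.Duration {
	if len(backoff) == 0 || deliveries < 1 {
		return 0 // no BackOff configured; AckWait applies instead
	}
	idx := deliveries - 1 // first delivery waits BackOff[0]
	if idx >= len(backoff) {
		idx = len(backoff) - 1
	}
	return backoff[idx]
}

func main() {
	backoff := []time.Duration{time.Second, 5 * time.Second, 30 * time.Second}
	for n := 1; n <= 4; n++ {
		fmt.Printf("attempt %d -> %v\n", n, redeliveryDelay(backoff, n))
	}
}
```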
Signed-off-by: Derek Collison <derek@nats.io>
We will only send if all peers in our group are >= 2.7.1, and we will check for updates.
When a consumer follower takes over, it will notify all pending requests that they are now invalid.
Signed-off-by: Derek Collison <derek@nats.io>
This should help with GC pressure; however, it may have an effect
on performance (based on some benchmarks), since calling
sync.Pool.Get/Put too often has its own cost.
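For context, a minimal sketch of the pooling pattern in play; the buffer type is illustrative:

```go
package main

import (
	"bytes"
	"sync"
)

// bufPool recycles scratch buffers to cut allocations and GC pressure;
// the trade-off noted above is that every Get/Put pair has its own cost.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

func process(data []byte) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	buf.Write(data)
	// ... use buf ...
	bufPool.Put(buf) // return the buffer for reuse
}

func main() { process([]byte("payload")) }
```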
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Under load we could have a message committed to the underlying store while a consumer was being created, and then have it increase num pending again when the stream signals the consumers.
This fix remembers the last sequence of the state when we calculate sgap and tests against it before adding in the stream code.
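A self-contained sketch of that guard; the types and field names are illustrative stand-ins, not the server's actual state:

```go
package main

import "fmt"

// consumer is an illustrative stand-in for the server's consumer state.
type consumer struct {
	sseq        uint64 // next stream sequence to deliver
	sgap        uint64 // num pending
	lastCounted uint64 // stream's last seq when sgap was computed
}

// setPending computes num pending and remembers how far it counted.
func (o *consumer) setPending(streamLastSeq uint64) {
	o.sgap = streamLastSeq - o.sseq + 1
	o.lastCounted = streamLastSeq
}

// signal is called when the stream commits a message at seq; sequences
// already included in the original count are not added again.
func (o *consumer) signal(seq uint64) {
	if seq > o.lastCounted {
		o.sgap++
	}
}

func main() {
	o := &consumer{sseq: 1}
	o.setPending(10) // message 10 was already committed
	o.signal(10)     // late signal for seq 10: not double-counted
	o.signal(11)
	fmt.Println(o.sgap) // 11
}
```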
Signed-off-by: Derek Collison <derek@nats.io>
Cleaned up code, made it more consistent, and utilized loopAndGather.
Allow pull consumers to have AckAll as well as AckExplicit.
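For illustration, a consumer definition the server now accepts, assuming the server package's ConsumerConfig and AckPolicy constants:

```go
package main

import (
	"fmt"

	"github.com/nats-io/nats-server/v2/server"
)

func main() {
	// A durable pull consumer (no DeliverSubject set) may now use
	// AckAll; previously only AckExplicit was accepted for pull.
	cfg := &server.ConsumerConfig{
		Durable:   "dur",
		AckPolicy: server.AckAll,
	}
	fmt.Printf("%+v\n", cfg)
}
```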
Signed-off-by: Derek Collison <derek@nats.io>
This allows stream placement to overflow to adjacent clusters.
We also do more balanced placement based on resources (store or mem). We can continue to expand this as well.
We also introduce an account requirement that stream configs contain a MaxBytes value.
We now track account limits and server limits more distinctly, and do not reserve server resources based on account limits themselves.
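As a sketch of the MaxBytes requirement at the account level, assuming the server's JetStreamAccountLimits exposes a MaxBytesRequired field; treat the names and values as illustrative:

```go
package main

import (
	"fmt"

	"github.com/nats-io/nats-server/v2/server"
)

func main() {
	// Illustrative account limits: MaxBytesRequired rejects stream
	// creation when the stream config carries no MaxBytes value.
	limits := server.JetStreamAccountLimits{
		MaxMemory:        1 << 30,  // 1 GiB memory storage
		MaxStore:         10 << 30, // 10 GiB file storage
		MaxBytesRequired: true,
	}
	fmt.Printf("%+v\n", limits)
}
```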
Signed-off-by: Derek Collison <derek@nats.io>
Under certain situations, a large number of consumers racing to update state or delete their stores during a delete
would start taking up OS threads due to blocking disk IO. When this happened and there were a bunch of Go routines becoming
runnable, the Go runtime would create extra OS threads to fill in the runnable pool and would exhaust the max thread setting.
This code places a channel as a simple semaphore to limit the number of OS threads blocked on disk IO.
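A minimal sketch of the pattern; the capacity of 8 is illustrative, not the server's actual limit:

```go
package main

import "os"

// dios acts as a counting semaphore: at most cap(dios) goroutines can
// be inside blocking disk IO at once, bounding the OS threads consumed.
var dios = make(chan struct{}, 8)

func limitedWriteFile(name string, data []byte) error {
	dios <- struct{}{}        // acquire a slot (blocks when full)
	defer func() { <-dios }() // release the slot
	return os.WriteFile(name, data, 0644)
}

func main() {
	_ = limitedWriteFile("/tmp/example.dat", []byte("state"))
}
```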
Signed-off-by: Derek Collison <derek@nats.io>
The filestore would release a msgBlock lock while trying to load a cache block if it thought it needed to flush pending data.
With async set to false, this should be very rare, but careful inspection showed it was possible.
I constructed an artificial test with sleeps throughout the filestore code to reproduce.
It involved two Go routines that were reading and waiting on the last msg block, and another one that was writing.
After the write, but before the flush that followed releasing the lock, we would also artificially sleep.
This would lead to the second read seeing that the cache load was already in progress and returning no error.
If the load was for a sequence before the current write sequence, and async was false, the cache's fseq would be higher than what was requested.
This would cause errPartialCache to be returned.
Once returned to the consumer level in loopAndGather, the Go routine would exit and the consumer would cease to function.
This change removes the unlock of the msgBlock to perform a flush, ensuring that two cache loads cannot yield errPartialCache.
I also updated the consumer so that, should this happen again in the future, it does not exit the loopAndGather Go routine.
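To make the race concrete, here is a self-contained sketch of the hazard; this is not the actual filestore code, and all names are illustrative:

```go
package main

import "sync"

// block is an illustrative stand-in for a filestore msgBlock.
type block struct {
	mu      sync.Mutex
	loading bool
	cache   []byte
}

// load mimics the old behavior: the lock is released mid-load (to allow
// a flush), so a concurrent caller observes loading == true and returns
// without error even though the cache is incomplete. That is how
// errPartialCache surfaced; holding the lock for the whole load, as the
// fix does, closes the window.
func (b *block) load() []byte {
	b.mu.Lock()
	if b.loading {
		c := b.cache // may still be nil/partial here
		b.mu.Unlock()
		return c
	}
	b.loading = true
	b.mu.Unlock() // the old unlock-to-flush created this window

	data := []byte("cached msgs") // stands in for the disk read/flush

	b.mu.Lock()
	b.cache, b.loading = data, false
	b.mu.Unlock()
	return data
}

func main() {
	b := &block{}
	go b.load()
	_ = b.load() // may observe a partial cache under the old scheme
}
```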
Signed-off-by: Derek Collison <derek@nats.io>
If the interest existed prior to the initial creation of the
consumer, the gateway "watcher" would not be started, which means
that interest moving across the super-cluster after that would
not be detected.
The watcher runs every second, and I am not sure whether that is costly,
so we may want to take a different approach: a separate interest-change
channel that would be specific to gateways. But that would mean adding
a new sublist where the interest is registered, and that sublist would
need to be updated when processing GW RSub and RUnsub.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When a consumer is configured with the "meta-only" option and the
stream is backed by a memory store, a memory corruption could
happen, causing the application to receive corrupted headers.
Also replaced most usages of `append(a[:0:0], a...)` for making
copies. That idiom was based on this wiki:
https://github.com/go101/go101/wiki/How-to-efficiently-clone-a-slice%3F
But since Go 1.15, it is actually faster to call make+copy instead.
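For reference, the two clone idioms side by side:

```go
package main

import "fmt"

// cloneAppend is the idiom from the wiki: append onto a zero-length,
// zero-capacity reslice of the source.
func cloneAppend(a []byte) []byte {
	return append(a[:0:0], a...)
}

// cloneMakeCopy is the make+copy form, which is generally faster for
// non-empty slices since Go 1.15.
func cloneMakeCopy(a []byte) []byte {
	b := make([]byte, len(a))
	copy(b, a)
	return b
}

func main() {
	a := []byte("hdr")
	fmt.Println(string(cloneAppend(a)), string(cloneMakeCopy(a)))
}
```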
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Replaced use of eventsEnabled() with EventsEnabled(), which checks
under the server lock. Also found another reference when
creating templates.
Resolves #2588
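Sketch of the locking distinction, using illustrative types rather than the real server code:

```go
package main

import "sync"

// Server is an illustrative stand-in for the real server type.
type Server struct {
	mu  sync.Mutex
	sys bool // whether the system account / events are configured
}

// eventsEnabled expects s.mu to be held by the caller.
func (s *Server) eventsEnabled() bool { return s.sys }

// EventsEnabled is the exported variant: it takes the lock itself, so
// it is safe to call without holding s.mu.
func (s *Server) EventsEnabled() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.eventsEnabled()
}

func main() {
	s := &Server{sys: true}
	_ = s.EventsEnabled()
}
```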
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Under load and pressure from concurrent publishing and consuming with multiple consumers, the filestore would
return a partial or no cache error to the upper layers. For consumers this could result in skipping a stream sequence when we should not.
This change stabilizes the filestore and removes the flush state for msg blocks. I also found some bugs where the last sequence was not
tracked properly after snapshots / restores.
Signed-off-by: Derek Collison <derek@nats.io>
When expiring requests, the server would send a 408 if interest was
still present, which can happen for pull-subscribe implementations
that maintain interest for the duration of the pull subscription.
Let's keep the 408 for when a request is "force expired", that
is, when a request was removed from the queue because the queue was
full but interest is still found.
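The rule, distilled into an illustrative helper; the names are mine, not the server's:

```go
package main

import "fmt"

// shouldSend408 captures the new rule: a 408 goes out only when a pull
// request was force-expired (pushed out of a full wait queue) while the
// requester still shows interest; plain expirations stay silent.
func shouldSend408(forceExpired, interestPresent bool) bool {
	return forceExpired && interestPresent
}

func main() {
	fmt.Println(shouldSend408(false, true)) // normal expiry: no 408
	fmt.Println(shouldSend408(true, true))  // forced out of a full queue: 408
}
```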
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This change was made in a previous PR with this commit:
9405b77e46
After some discussions, we agreed that the original approach
is best, so we are using a dedicated SequenceInfo object for ConsumerInfo.
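For reference, a sketch of the dedicated object's shape; the JSON tags follow my understanding of the server's API and should be treated as illustrative:

```go
package main

import "time"

// SequenceInfo pairs a consumer sequence with its stream sequence and
// the time of last activity, replacing the previously flattened fields;
// ConsumerInfo would then carry it for both delivered and ack floor state.
type SequenceInfo struct {
	Consumer uint64     `json:"consumer_seq"`
	Stream   uint64     `json:"stream_seq"`
	Last     *time.Time `json:"last_active,omitempty"`
}

func main() {
	_ = SequenceInfo{Consumer: 1, Stream: 1}
}
```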
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>