Commit Graph

265 Commits

Author SHA1 Message Date
Derek Collison
b38ced51b2 A true no wait pull request was not considering redeliveries.
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-30 11:52:35 -08:00
Derek Collison
275d42628b Fix for #2828. The original design of the consumer and the subsequent store did not allow updates.
Now that we do, we need to store the new config into our storage layer.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-30 09:45:05 -08:00
Derek Collison
a57bd96def Updating a push consumer to be pull would succeed but cause a panic if used.
This disallows that upgrade. We had a check in place for pull to push, but not the reverse.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-28 13:11:58 -08:00
Derek Collison
6be9925127 Update config error
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-24 15:02:41 -08:00
Derek Collison
65b168aa8b Updates based on feedback. MaxDeliver needs to be set properly now to be > len(BackOff) but if larger we will reuse last value in BackOff array.
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-24 15:02:39 -08:00
Derek Collison
bd78b1a99b Formal json version for NAK delay
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-24 15:01:52 -08:00
Derek Collison
d486c24199 Allow a consumer to be configured with BackOffs.
This allows a consumer to have exponential backoffs vs static AckWait and MaxDeliver.
When BackOff is set it will overridde AckWait to BackOff[0] and MaxDeliver will be len(BackOff)+1.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-24 14:57:36 -08:00
Derek Collison
579bf336ad Allow NAK to take a delay parameter to delay redelivery for a certain amount of time.
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-24 14:57:28 -08:00
Derek Collison
d332684322 Fixed data race and fuxed bug that we would not clear our waiting queue when a leader stepped down.
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-24 13:01:25 -08:00
Derek Collison
6fd41e5ea4 Updates based on review feedback
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-24 10:23:47 -08:00
Derek Collison
d962500827 Track reply subjects for pending pull requests across clustered consumers.
We will only send if all peers in our group are >= 2.7.1 and we will check for updates.
When a consumer follower takes over it will notify all pending requests that those requests are invalid now.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-21 16:31:59 -08:00
Derek Collison
7f572983ac The 2.7 update broke the one-shot pull consumer fetch behavior due to change and a fix to a bug that allowed it to work before.
This change tries to lock down all expected behaviors, and now does out of order timeouts for requests.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-20 18:06:29 -08:00
Ivan Kozlovic
84f6cbb760 Pooling pubMsg and jsPubMsg objects
This should help with GC pressure, however, it may have an effect
on performance (based on some benchmark). Calling sync.Pool.Get/Put
too often has a performance impact...

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-01-13 13:14:25 -07:00
Ivan Kozlovic
d74dba2df9 Replaced RAFT's append entry response channel
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-01-13 13:06:48 -07:00
Ivan Kozlovic
23ebf9d2f8 Adapted jsOutQ
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-01-13 13:05:27 -07:00
Derek Collison
103f710479 Fixed consumer info num pending bug.
Under load we could have a message committed to the underlying store when a consumer was being created and then it increase num pending again when the stream signals the consumers.
This fix just remembers the last seq of the state when we calculate sgap and test before adding in the stream code.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-12 20:03:26 -08:00
Derek Collison
32c3c9ecfb Track interest properly across accounts for pull consumers
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-12 12:16:53 -08:00
Derek Collison
279f31ecb5 Add in ability to have ephemeral pull based consumers
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-10 20:42:39 -08:00
Derek Collison
e12c8cda92 Add in ability to limit aspects of a pull request, specifically batch size and expiration.
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-10 17:29:04 -08:00
Derek Collison
5592d923c4 Updated pull consumers.
Cleaned up code, made more consistent, utilize loopAndGather.
Allow pull consumers to have AckAll as well as AckExplicit.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-10 16:59:01 -08:00
Derek Collison
52da55c8c6 Implement overflow placement for JetStream streams.
This allows stream placement to overflow to adjacent clusters.
We also do more balanced placement based on resources (store or mem). We can continue to expand this as well.
We also introduce an account requirement that stream configs contain a MaxBytes value.

We now track account limits and server limits more distinctly, and do not reserver server resources based on account limits themselves.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-06 19:33:08 -08:00
Derek Collison
c5fbb63614 JetStream ephemeral consumers could create a situation where the server would exhaust the OS thread limit - default 10k.
Under certain situations large number of consumers that are racing to update state or delete their stores during a delete
would start taking up OS threads due to blocking disk IO. When this happened and their were a bunch of Go routines becoming
runnable the Go runtime would create extra OS threads to fill in the runnable pool and would exhaust the max thread setting.

This code places a channel as a simple semaphore to limit the number of disk IO blocking OS threads.

Signed-off-by: Derek Collison <derek@nats.io>
2021-12-29 07:05:34 -08:00
Derek Collison
b7c61cd0bf Stabilize filstore to eliminate sporadic errPartialCache errors under certain situations. Related to #2732
The filestore would release a msgBlock lock while trying to load a cache block if it thought it needed to flush pending data.
With async false, this should be very rare but was possible after careful inspection.

I constructed an artificial test with sleeps throughout the filestore code to reproduce.
It involved having 2 Go routines that were through and waiting on the last msg block, and another one that was writing.
After the write, but before we flushed after releasing the lock we would also artificially sleep.
This would lead to the second read seeing the cache load was already in progress and return no error.
If the load was for a sequence before the current write sequence, and async was false, the cache fseq would be higher than what was requested.
This would cause the errPartialCache to be returned.

Once returned to the consumer level in loopAndGather, it would exit that Go routine and the consumer would cease to function.

This change removed the unlock of a msgBlock to perform and flush, ensuring that two cacheLoads would not yield the errPartialCache.

I also updated the consumer in the case this does happen in the future to not exit the loopAndGather Go routine.

Signed-off-by: Derek Collison <derek@nats.io>
2021-12-27 09:54:02 -08:00
Ivan Kozlovic
3053039ff3 [FIXED] JetStream: interest across gateways
If the interest existed prior to the initial creation of the
consumer, the gateway "watcher" would not be started, which means
that interest moving across the super-cluster after that would
not be detected.

The watcher runs every second and not sure if this is costly or
not, so we may want to go a different approach of having a separate
interest change channel that would be specific to gateways. But this
means adding a new sublist where the interest would be registered
and that sublist would need to be updated when processing GW RSub
and RUnsub?

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-16 17:20:16 -07:00
Ivan Kozlovic
9f30bf00e0 [FIXED] Corrupted headers receiving from consumer with meta-only
When a consumer is configured with "meta-only" option, and the
stream was backed by a memory store, a memory corruption could
happen causing the application to receive corrupted headers.

Also replaced most of usage of `append(a[:0:0], a...)` to make
copies. This was based on this wiki:
https://github.com/go101/go101/wiki/How-to-efficiently-clone-a-slice%3F

But since Go 1.15, it is actually faster to call make+copy instead.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-01 10:50:15 -07:00
R.I.Pienaar
cf097bfab4 Merge pull request #2717 from ripienaar/stream_valid_subjects
Stream valid subjects
2021-12-01 17:43:41 +01:00
R.I.Pienaar
4f1bfa969f ensure streams have only valid interest subjects
Signed-off-by: R.I.Pienaar <rip@devco.net>
2021-12-01 17:03:28 +01:00
Derek Collison
e65f3d4a30 [FIXED #2706] - Only utilize full state with deleted details when really needed. Otherwise fast state will suffice.
Signed-off-by: Derek Collison <derek@nats.io>
2021-11-29 10:50:28 -08:00
Derek Collison
5ead954fee [ADDED] Allow certain consumer attributes to be updated #2670, #2603
Signed-off-by: Derek Collison <derek@nats.io>
2021-11-04 13:43:11 -07:00
Derek Collison
003b6996f1 If AckWait less then restart check interval use AckWait
Signed-off-by: Derek Collison <derek@nats.io>
2021-10-28 11:00:06 -07:00
Derek Collison
3a14a984fc Fix for a bug that did not properly decode redelivered state for consumers from a filestore.
This also caused state abnormalities in a user's setup so added code to clean up bad state as needed.

Signed-off-by: Derek Collison <derek@nats.io>
2021-10-28 08:33:48 -07:00
Derek Collison
678469b40b Fix for #2644
Signed-off-by: Derek Collison <derek@nats.io>
2021-10-25 13:12:37 -07:00
Derek Collison
cbbab295ec Merge pull request #2616 from nats-io/fix_2611
[FIXED] #2611
2021-10-12 08:46:42 -07:00
Derek Collison
8951bc54f4 Fix for stream purge where purge would remove all pending messages for a filtered consumer.
Signed-off-by: Derek Collison <derek@nats.io>
2021-10-12 07:26:13 -07:00
Derek Collison
f254c110c4 Ignore the 'not filtered' shortcut if deliver last policy is set
Signed-off-by: Derek Collison <derek@nats.io>
2021-10-12 06:15:40 -07:00
Derek Collison
9efa11ba43 Don't make consumers go backwards on purge
Signed-off-by: Derek Collison <derek@nats.io>
2021-10-07 07:06:52 -07:00
Derek Collison
58e5c7c681 Allow consumers to request only headers to be delivered
Signed-off-by: Derek Collison <derek@nats.io>
2021-10-05 18:54:52 -07:00
Ivan Kozlovic
d34b8fdd7d [FIXED] JetStream: data race on shutdown
Replaced use of eventsEnabled() with EventsEnabled() that will
check under server lock. Also found another reference when
creating templates.

Resolves #2588

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-09-30 11:17:12 -06:00
Derek Collison
ebb24006c2 Direct consumers used for mirroring should not be affected by max consumer limits
Signed-off-by: Derek Collison <derek@nats.io>
2021-09-22 15:01:51 -07:00
Derek Collison
eab45b404a Fix for deadlock with stream mirrors or sources where origin is interest or workqueue policy.
Signed-off-by: Derek Collison <derek@nats.io>
2021-09-22 10:59:02 -07:00
Derek Collison
40a4d40337 Make large batch requests expire more efficiently.
Signed-off-by: Derek Collison <derek@nats.io>
2021-09-16 15:00:18 -07:00
Derek Collison
dadc3b9fae Fixed a bug when an interest retention stream with noack consumers is in clustered mode.
We were not properly propagating the ack state and proper cleanup of the stream messages.

Signed-off-by: Derek Collison <derek@nats.io>
2021-09-08 15:02:09 -07:00
Derek Collison
29eaa9c614 Fixed bug that could lead to perceived message loss.
Under load and pressure from concurrent publishing and consuming with multiple consumers the filestore would
return a partial or no cache error to the upper layers. For consumers this could result in us skipping a stream sequence when we should not.

This change stabilizes the filestore and removes the flush state for msg blocks. I also found some bugs that did not track last sequence properly
after snapshots / restore.

Signed-off-by: Derek Collison <derek@nats.io>
2021-09-05 16:36:23 -07:00
Ivan Kozlovic
44c57a5702 [FIXED] Pull requests: don't send 408 when request expires
When expiring requests, the server would send 408 if interest was
still present, which can happen for pull subscribe implementations
that maintain interest for the duration of the pull subscription.

Let's keep the 408 for when a request is "force expired", that
is, a request was removed from the queue because it queue was
full but interest is still found.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-09-01 16:57:13 -06:00
Derek Collison
57ef5fd528 Set consumer config defaults early on to avoid a race condition.
Signed-off-by: Derek Collison <derek@nats.io>
2021-09-01 13:06:08 -07:00
Ivan Kozlovic
1308c73273 [CHANGED] ConsumerInfo's SequencePair replaced with SequenceInfo
This change was made in a previous PR wit this commit:
9405b77e46

After some discussions, we agreed that the original approach
is best, so using a dedicated object SequenceInfo for ConsumerInfo.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-08-23 12:28:23 -06:00
Derek Collison
9d7123213a Keep SequencePair vs SequenceInfo
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-14 12:01:29 -07:00
Derek Collison
9405b77e46 Added in last active reporting for consumers for delivered and ack floor.
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-14 11:36:27 -07:00
Derek Collison
969cf60def Add in push bound status for consumer info.
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-14 08:35:40 -07:00
Derek Collison
c4a78d91b7 Remove PushActive
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-13 15:11:31 -07:00