Commit Graph

411 Commits

Author SHA1 Message Date
Derek Collison
087a28a13e When creating replicated mirrors where the source stream had a very large starting sequence number, the server would use excessive CPU and Memory.
This is due to the mirroring functionality trying to skip messages when it detects a gap. In a replicated stream this puts excessive stress on the raft system.
This step is not needed at all if the mirror stream has no messages; in that case we can simply jump ahead.

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-15 17:20:15 -07:00
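The jump-ahead idea in the commit above can be sketched as follows. This is only an illustrative model, not the server's code: `mirrorStore`, `skipMsg`, and `alignTo` are hypothetical names, and the `skips` counter stands in for the per-message work that a replicated stream would have to push through raft.

```go
package main

import "fmt"

// mirrorStore is a hypothetical stand-in for a mirror's message store.
type mirrorStore struct {
	msgs    uint64 // number of messages currently stored
	lastSeq uint64 // last assigned sequence number
	skips   uint64 // how many single-sequence skips were performed
}

// skipMsg burns exactly one sequence number. In a replicated stream each
// such step would be a separate proposal, which is the expensive part.
func (s *mirrorStore) skipMsg() {
	s.lastSeq++
	s.skips++
}

// alignTo prepares the store so that the next stored message receives
// sequence number next.
func (s *mirrorStore) alignTo(next uint64) {
	if next <= s.lastSeq+1 {
		return // no gap to close
	}
	if s.msgs == 0 {
		// Empty mirror: nothing to preserve, so jump ahead in one step.
		s.lastSeq = next - 1
		return
	}
	// Non-empty mirror: close the gap one sequence at a time.
	for s.lastSeq+1 < next {
		s.skipMsg()
	}
}

func main() {
	empty := &mirrorStore{}
	empty.alignTo(1000000)
	fmt.Println(empty.lastSeq, empty.skips) // 999999 0

	busy := &mirrorStore{msgs: 3, lastSeq: 3}
	busy.alignTo(10)
	fmt.Println(busy.lastSeq, busy.skips) // 9 6
}
```

With an empty store the cost is constant regardless of how large the source's starting sequence is, which is the property the commit message describes.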
Derek Collison
df5df3ce99 Panic fixes (#4214)
Resolves panics in the code.

### Changes proposed in this pull request:

 - This PR fixes some of the panics in the code
2023-06-05 13:02:05 -07:00
Nikita Mochalov
5141b87dff Refactor code
2023-06-05 22:42:28 +03:00
Derek Collison
238282d974 Fix some data races detected in internal testing
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-03 13:58:15 -07:00
Artem Seleznev
27a8b96ee3 different panic fixes
Signed-off-by: Artem Seleznev <seleznyov.artyom@gmail.com>
2023-06-02 13:19:22 +03:00
Derek Collison
21239022bd Protect against usage drift for any unforeseen reason, and correct it if detected.
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-03 17:09:06 -07:00
Derek Collison
f098c253aa Make sure we adjust accounting reservations when deleting a stream with any issues.
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-01 15:54:37 -07:00
Derek Collison
f5ac5a4da0 Fix for a bug that could leave a raft node running when stopping a stream.
This can happen when we reset a stream internally and the stream had a prior snapshot.

Also make sure to always release resources back to the account, even if the store is no longer present.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-01 13:22:06 -07:00
Derek Collison
b27ce6de80 Add in a few more places to check on jetstream shutting down.
Add in a helper method.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-29 11:27:18 -07:00
Derek Collison
db972048ce Detect when we are shutting down or if a consumer is already closed when removing a stream.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-29 11:18:10 -07:00
Derek Collison
ac27fd046a Fix data race
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-28 17:57:03 -07:00
Tomasz Pietrek
a66c67baa5 Fix stream sourcing & mirroring overlap errors
When adding or updating sources/mirrors, server was checking if the stream with
a given name exists to check for subject overlaps, among other things.
However, if sourced/mirrored stream was `External`, checks should
not be executed, as not only stream would never be found,
but also, if `External` stream had the same name as the sourcing stream,
the check would be wrongly performed against itself.

Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-04-14 21:00:11 +02:00
Derek Collison
bafd585ce4 Fix data race
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-13 18:40:06 -07:00
Derek Collison
313dd424a3 Optimize to not allocate converting strings to []byte
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-08 20:46:05 -07:00
Derek Collison
83f08999a7 Fix data race
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-07 07:30:03 -07:00
Derek Collison
2da50512e2 Optimize non-inline direct gets to not use simple go routines
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-06 07:50:57 -07:00
Derek Collison
ebe4f8957f Spelling based on review feedback
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-03 21:08:59 -07:00
Derek Collison
6b01a21965 No inline jetstream msg processing, always queue inbound
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-03 08:27:52 -07:00
Derek Collison
64b22011dc Better use of LoadAndStore based on review feedback
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 14:50:22 -07:00
Derek Collison
1fb1efd748 Make sure to remove any inflight entries when done
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 14:41:49 -07:00
Derek Collison
e6447c982a Protect against concurrent creation of streams and consumers.
Also make sure we have exited monitoring routines when doing resets for both streams and consumers.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 14:29:52 -07:00
Derek Collison
872a9e7927 Add in monitor status similar to consumer
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 03:52:59 -07:00
Derek Collison
ad5bb366a0 Updates to preAcks when multiple consumers are present but mutually exclusive (filtered).
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-31 10:43:28 -07:00
Derek Collison
8c0a45edf9 Make sure to lock on clearing if not removing.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-30 20:42:28 -07:00
Derek Collison
937ef0d2a6 Improvements to preAcks.
Better handling of multiple consumers so as to not delete too early.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-30 20:29:15 -07:00
Derek Collison
152b25c314 Update server/stream.go
Pre-allocate

Co-authored-by: Neil <neil@nats.io>
2023-03-29 15:29:51 -07:00
Derek Collison
5cabc365df General improvements around handling interest retention.
1. During ackMsg processing hold write lock to block concurrent access.
2. Check for presence of preAcks before and force removal if present.
3. Rework check for orphan msgs on startup to use checkStateForInterestStream().

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:51 -07:00
Derek Collison
ed9de4b0a1 Improved publisher performance under some instances of asymmetric network latency clusters on interest based streams.
In clusters with asymmetric network latency, if a node in an R3 was replicating a consumer and the parent stream, but was the leader of neither, but the path from the stream leader was faster than the consumer leader, a replicated ack could arrive before the message itself.

In this case we used to forward a delete message request to the stream leader which would then replicate that to all stream replicas, causing more work which could lead to increased publisher times on clients connected to the slow node.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-20 20:53:45 -07:00
Derek Collison
9f1580686a Revert behavior for JetStream published directly from client to be handled inline.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-01 04:35:52 -08:00
Derek Collison
3807441fd7 Always process inbound messages in separate execution context.
Do not duplicate work on leader, sealed and clustered state.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-28 11:45:31 -08:00
Derek Collison
b19fe508c4 Do not block routes/gws on internal stream and consumer info requests
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-28 11:17:29 -08:00
Derek Collison
2642a8c03d Optimize locking for when under heavy loads.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-27 18:56:55 -08:00
Neil Twigg
68961ffedd Refactor ipQueue to use generics, reduce allocations
2023-02-21 14:50:09 +00:00
Derek Collison
e270e9538f Do not warn if consumer replicas configured to 0
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-18 11:50:26 -08:00
Derek Collison
efa3bcc49d Parallel consumer creation could drop responses (create and info) and could also run monitorConsumer twice.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-18 05:16:05 -08:00
Derek Collison
11b0f214d0 Do not re-calculate NumPending on consumer info calls.
We noticed this was being called a lot in user environments.
When the consumer was filtered with a wildcard and the stream had a high cardinality of subjects and was falling behind, this could take a substantial amount of time.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-16 16:30:14 -08:00
Derek Collison
b611e37e95 When updating a consumer filter subject, make sure locking order is correct and that our sublist is present.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-06 21:34:48 +04:00
Derek Collison
b22ed47a26 Use fast state in case many interior deletes and small fix for staticcheck
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-04 13:55:41 -08:00
Tomasz Pietrek
46af979871 Fix current consumers not getting messages after purge
Until now, purge updated all consumers sequences
even if purge subject was only a subset of given consumer filter.
Because of that, even messages from not purged subjects were not fetched
or properly accounted for existing consumers.

Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-02-02 16:27:32 +01:00
Derek Collison
1252653c16 Merge pull request #3829 from nats-io/jarema/fix-message-after-update
Fix Consumer not getting messages after filter update
2023-01-30 19:59:32 -08:00
Derek Collison
6058056e3b Minor fixes and optimizations for snapshots.
We were snapshotting more than needed, so double check that we should be doing this at the stream and consumer level.
At the raft level, we should have always been compacting the WAL to last+1, so made that consistent. Also fixed bug that would not skip last if more items behind the snapshot.

Signed-off-by: Derek Collison <derek@nats.io>
2023-01-30 17:54:18 -08:00
Tomasz Pietrek
836848ca64 Fix Consumer not getting messages after filter update
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-01-30 20:47:17 +01:00
Derek Collison
f4e6481ce7 Allow report cycles between source streams if subjects truly form a cycle.
Signed-off-by: Derek Collison <derek@nats.io>
2023-01-27 13:03:24 -08:00
Derek Collison
f62d929018 Consumer must match replica of parent stream if interest based policy.
Signed-off-by: Derek Collison <derek@nats.io>
2023-01-23 20:16:42 -08:00
Derek Collison
f4ee6530a0 When updating a stream to Direct Gets we were not spinning up subscription endpoint properly.
Signed-off-by: Derek Collison <derek@nats.io>
2023-01-23 16:51:07 -08:00
Todd Beets
c463b398db Validate no overlapping stream subscriptions on update config (non-clustered jetstream)
2022-12-16 12:58:59 -08:00
Derek Collison
c90fe9a2fa Improve performance and latency with large number of sparse consumers.
When a stream had a large number of consumers on a server that were sparse, the signaling mechanism would do a linear scan to signal matching consumers. Usage patterns continue to shift toward consumers that are filtered and sparse, meaning a message is destined for a single consumer or a small number of consumers.

This change moves selection to a sublist that tracks only active consumer leaders for selection, which optimizes selection of consumers to signal when the number of consumers is large.

Signed-off-by: Derek Collison <derek@nats.io>
2022-12-13 15:25:55 -08:00
Derek Collison
5f7c8e21a2 Fixed issues with multiple concurrent stream create requests.
First issue was applications not getting any response.
However, there was also a more serious issue that would create multiple raft groups for each concurrent request.
The servers would only run one stream monitor loop; however, they would update the state to the new raft group's name, so on server restart the stream would be using a different raft group than existing servers.

Signed-off-by: Derek Collison <derek@nats.io>
2022-12-04 19:13:51 -08:00
Ivan Kozlovic
74a16b0097 Merge pull request #3640 from nats-io/fix_3639
[FIXED] JetStream: WorkQueue not preventing overlapping consumers
2022-11-16 17:22:35 -07:00
Ivan Kozlovic
49faba9e33 [FIXED] JetStream: WorkQueue not preventing overlapping consumers
A stream with a WorkQueue retention policy is supposed to allow
more than one consumer if they use filtered subjects, but those
subjects should not overlap.

There was an issue that if a new consumer had a filter subject
"wider" than an existing one, the error was not detected and
the new consumer was incorrectly accepted.

Resolves #3639

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-11-16 17:09:30 -07:00
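The overlap rule described in the commit above can be illustrated with a small, self-contained check. This is a sketch under assumptions, not the server's implementation: `subjectsOverlap` and `canAddFilter` are hypothetical helpers, and the subjects are assumed to be well formed (tokens separated by `.`, `*` matching exactly one token, and `>` appearing only as the last token and matching one or more tokens).

```go
package main

import (
	"fmt"
	"strings"
)

// subjectsOverlap reports whether two filter subjects could both match
// at least one common concrete subject.
func subjectsOverlap(a, b string) bool {
	ta, tb := strings.Split(a, "."), strings.Split(b, ".")
	n := len(ta)
	if len(tb) < n {
		n = len(tb)
	}
	for i := 0; i < n; i++ {
		// A trailing ">" on either side absorbs the rest of the other subject.
		if ta[i] == ">" || tb[i] == ">" {
			return true
		}
		// Two different literal tokens can never match the same subject.
		if ta[i] != tb[i] && ta[i] != "*" && tb[i] != "*" {
			return false
		}
	}
	// Without ">", both filters must have the same number of tokens.
	return len(ta) == len(tb)
}

// canAddFilter reports whether candidate can be added next to the existing
// filters without overlapping any of them, in either direction.
func canAddFilter(existing []string, candidate string) bool {
	for _, f := range existing {
		if subjectsOverlap(f, candidate) {
			return false
		}
	}
	return true
}

func main() {
	existing := []string{"orders.eu.*"}
	fmt.Println(canAddFilter(existing, "orders.>"))    // false: wider filter overlaps
	fmt.Println(canAddFilter(existing, "orders.us.*")) // true: disjoint
}
```

Because the check is symmetric, a new filter that is "wider" than an existing one is rejected just like a narrower one, which is the case the fix addresses.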