Commit Graph

190 Commits

Author SHA1 Message Date
R.I.Pienaar
60e67ff9a5 Report correct consumer count in paged list response
Previously the Total in paged responses would always equal the
size of the first response; this would stall paged clients after
the first page.

Now correctly sets the total so paging continues, and improves the
test to verify these aspects of the report.

Signed-off-by: R.I.Pienaar <rip@devco.net>
2023-07-27 07:52:24 +03:00
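A minimal sketch of why a wrong Total stalls a paging client, as described in 60e67ff9a5. The `pagedResponse` type, `listAll`, and `newServer` are hypothetical names made up for this illustration and are not the server's API; the sketch only assumes a client that stops once it has collected Total items.

```go
package main

import "fmt"

// pagedResponse is a hypothetical stand-in for a paged API reply.
type pagedResponse struct {
	Total  int      // total number of items across all pages
	Offset int      // offset of the first item in this page
	Names  []string // items in this page
}

// listAll keeps requesting pages until it has collected Total items.
func listAll(fetch func(offset int) pagedResponse) []string {
	var all []string
	for {
		resp := fetch(len(all))
		all = append(all, resp.Names...)
		// Stop when the reported total has been reached or a page is empty.
		if len(resp.Names) == 0 || len(all) >= resp.Total {
			return all
		}
	}
}

// newServer returns a fetch function over names with the given page size.
// When buggyTotal is true, Total is set to the size of the page instead of
// the full count, which mimics the behavior the commit describes as broken.
func newServer(names []string, pageSize int, buggyTotal bool) func(int) pagedResponse {
	return func(offset int) pagedResponse {
		if offset > len(names) {
			offset = len(names)
		}
		end := offset + pageSize
		if end > len(names) {
			end = len(names)
		}
		page := names[offset:end]
		total := len(names)
		if buggyTotal {
			total = len(page)
		}
		return pagedResponse{Total: total, Offset: offset, Names: page}
	}
}

func main() {
	names := []string{"c1", "c2", "c3", "c4", "c5"}
	fmt.Println(len(listAll(newServer(names, 2, true))))  // client stops after the first page
	fmt.Println(len(listAll(newServer(names, 2, false)))) // client reads every page
}
```

With the buggy total the client believes it is done after one page; with the full count it keeps paging until all items are read.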
Neil Twigg
2527e11304 Increase threshold in TestNoRaceJetStreamSlowFilteredInititalPendingAndFirstMsg
Signed-off-by: Neil Twigg <neil@nats.io>
2023-07-14 17:05:26 +01:00
Derek Collison
087a28a13e When creating replicated mirrors where the source stream had a very large starting sequence number, the server would use excessive CPU and Memory.
This is due to the mirroring functionality trying to skip messages when it detects a gap. In a replicated stream this puts excessive stress on the raft system.
This step is not needed at all if the mirror stream has no messages; in that case we can simply jump ahead.

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-15 17:20:15 -07:00
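The idea in 087a28a13e can be sketched as follows. This is an illustration under stated assumptions, not the server's code: the `mirror` type and `advanceTo` method are hypothetical, and the per-message loop stands in for whatever work each skipped sequence costs in a replicated stream.

```go
package main

import "fmt"

// mirror is a hypothetical, simplified view of a mirror stream's state.
type mirror struct {
	msgs    uint64 // number of messages currently stored
	lastSeq uint64 // last sequence assigned in the mirror
}

// advanceTo prepares the mirror so that the next stored message gets
// sequence seq. It returns how many per-message skip operations were needed.
func (m *mirror) advanceTo(seq uint64) (skips uint64) {
	if seq <= m.lastSeq+1 {
		return 0 // no gap to close
	}
	if m.msgs == 0 {
		// Empty mirror: nothing to preserve, so jump straight ahead.
		m.lastSeq = seq - 1
		return 0
	}
	// Non-empty mirror: close the gap one sequence at a time.
	for m.lastSeq+1 < seq {
		m.lastSeq++
		skips++
	}
	return skips
}

func main() {
	empty := &mirror{}
	fmt.Println(empty.advanceTo(1_000_000_000), empty.lastSeq)

	busy := &mirror{msgs: 3, lastSeq: 3}
	fmt.Println(busy.advanceTo(10), busy.lastSeq)
}
```

For an empty mirror the cost no longer depends on how large the source's starting sequence is.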
Derek Collison
9eeffbcf56 Fix performance issues with checkAckFloor.
Bail early if new consumer, meaning stream sequence floor is 0.
Decide which linear space to scan.
Do no work if there is nothing pending and we only need to adjust, which we do at the end.

Also realized some tests were named wrong and were not being run, or were in wrong file.

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-08 18:45:03 -07:00
Neil Twigg
d7ae2cbb5f Backport #4120 to main
Signed-off-by: Neil Twigg <neil@nats.io>
2023-05-09 11:24:35 +01:00
Ivan Kozlovic
95e4f2dfe1 Fixed accounts configuration reload
Issues could manifest with subscription interest not properly
propagated.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-05-03 14:35:06 -06:00
Derek Collison
c15cc0054a When a fleet of leafnodes are isolated (not routed but using same cluster) we could do better at optimizing how we update the other leafnodes.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-30 17:08:16 -07:00
Derek Collison
3340179b97 Fix flapper
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-24 22:22:27 -07:00
Derek Collison
aee73a9c77 Fix flapping test
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-08 21:58:54 -07:00
Derek Collison
ffc49b8f86 Fix flapping test and data race in test
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-08 08:13:31 -07:00
Derek Collison
07b34f707f Make sure to never process next message requests inline
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-03 20:50:01 -07:00
Derek Collison
94278e731a More tweaks to test due to slow network proxy being more accurate
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 19:57:34 -07:00
Derek Collison
5afcb6c13c Fix for flapping test, network proxy more accurate now so rtt needed to be tweaked
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 19:06:42 -07:00
Derek Collison
d5ac4d283a Fix for flapping test, can return invalid sequence as well
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 16:18:23 -07:00
Derek Collison
1fb1efd748 Make sure to remove any inflight entries when done
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 14:41:49 -07:00
Derek Collison
e6447c982a Protect against concurrent creation of streams and consumers.
Also make sure we have exited monitoring routines when doing resets for both streams and consumers.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 14:29:52 -07:00
Derek Collison
b5358fa4b3 Wait for shutdown and sleep to let state build up
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 03:53:05 -07:00
Derek Collison
ad5bb366a0 Updates to preacks when multiple consumers are present but mutually exclusive (filtered).
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-31 10:43:28 -07:00
Derek Collison
5e85889790 [IMPROVED] Improvements to preAcks. (#4006)
Better handling of multiple consumers so as to not delete messages too
early.
Better cleanup handling.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-30 21:08:34 -07:00
Derek Collison
937ef0d2a6 Improvements to preAcks.
Better handling of multiple consumers so as to not delete too early.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-30 20:29:15 -07:00
Ivan Kozlovic
a4df4f8727 Fixed some tests
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-03-30 15:02:59 -06:00
Derek Collison
873ab0f6b9 Fix for flapping test
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 18:55:41 -07:00
Derek Collison
c546828359 Moved long running test to NoRace suite
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 16:56:04 -07:00
Derek Collison
e97ddcd14f Tweak tests due to changes, make test timeouts uniform.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:59 -07:00
Derek Collison
0d9f707b4b Additional tests to stress interest based streams with pull subscribers during rolling restarts.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:55 -07:00
Derek Collison
9ccd7abdf8 Test for preAcks
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-21 12:08:24 -07:00
Derek Collison
ed9de4b0a1 Improved publisher performance under some instances of asymmetric network latency clusters on interest based streams.
Under asymmetric network latency based clusters, if a node in an R3 was replicating a consumer and the parent stream, but was the leader of neither, and the path from the stream leader was faster than the consumer leader, a replicated ack could arrive before the message itself.

In this case we used to forward a delete message request to the stream leader which would then replicate that to all stream replicas, causing more work which could lead to increased publisher times on clients connected to the slow node.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-20 20:53:45 -07:00
Derek Collison
5a16f98427 Fixed an off by one bug that under certain circumstances could cause large consumer replica states.
This could lead to instability in the system.

The bug would manifest in replicated consumers when certain messages could be acked out of order and the pending list would never go to zero.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-19 10:41:59 -07:00
Derek Collison
ebe08040e9 Attempt to fix flapper again
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-01 06:24:51 -08:00
Derek Collison
baca7bd751 Fix for test flapper
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-01 04:58:01 -08:00
Derek Collison
2642a8c03d Optimize locking for when under heavy loads.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-27 18:56:55 -08:00
Derek Collison
d347cb116a When becoming leader optionally send current snapshot to followers if caught up.
This can help sync on restarts and improve ghost ephemerals. Also added more code to suppress responses and API audits when we know we are recovering.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-23 10:30:36 -08:00
Derek Collison
2972c11be6 Improve consumer create performance.
In cases where we had a large subject space, a filestore with many msg blocks, and a filtered consumer with a wildcard filtered subject, creating a consumer could take more memory and time than we wanted.
This improvement applies when the consumer is DeliverAll: we use the upper layer in-memory psim structure to scan, staying in memory and avoiding a file read for each msg block.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-22 19:42:02 -08:00
Derek Collison
f16a7d8559 Skip test for now
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-22 15:49:48 -08:00
Derek Collison
d03d8e9d93 Having a max msgs per subject (e.g. KV) under heavy concurrent usage could skew the accounting for the underlying filestore.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-22 12:50:43 -08:00
Derek Collison
11b0f214d0 Do not re-calculate NumPending on consumer info calls.
We noticed this was being called a lot in user environments.
When the consumer was filtered with a wildcard, and the stream had a high cardinality of subjects and was falling behind, this could take a substantial amount of time.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-16 16:30:14 -08:00
Derek Collison
32b5ec16dd Fixed test to correspond to new limit of 1024.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-16 07:16:19 +04:00
Derek Collison
1e3c2810f4 Improve expireMsgs minAge calculation for when there are lots of messages to expire in each callback.
This happens when under extreme load as shown in the skipped test.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-13 18:39:39 +02:00
Derek Collison
e9a983c802 Do not let !NeedSnapshot() avoid snapshots and compaction.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-01 22:05:25 -07:00
Derek Collison
390fd02918 Updates to tests for updated Go client changes
Signed-off-by: Derek Collison <derek@nats.io>
2023-01-31 09:47:36 -08:00
Ivan Kozlovic
79ca0c1787 Move test to "norace_test.go"
The test TestJetStreamClusterConsumerListPaging was in the
jetstream_cluster_3_test.go and because of `-race` flag would
take more than 440 seconds (7+ minutes) as seen here:

https://app.travis-ci.com/github/nats-io/nats-server/jobs/593984385#L335

Without the `-race` flag, this test takes ~17 seconds.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-01-23 17:05:18 -07:00
Neil Twigg
14d0ba1c65 Fix some lint errors after move to golangci-lint
2022-12-30 20:00:08 +00:00
Derek Collison
c90fe9a2fa Improve performance and latency with large number of sparse consumers.
When a stream had a large number of consumers on a server that were sparse, the signaling mechanism would do a linear scan to signal matching consumers. Usage patterns increasingly involve consumers that are filtered and sparse, meaning a message is destined for a single consumer or a small number of consumers.

This change moves selection to a sublist that tracks only active consumer leaders for selection, which optimizes selection of consumers to signal when the number of consumers is large.

Signed-off-by: Derek Collison <derek@nats.io>
2022-12-13 15:25:55 -08:00
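The difference between a linear scan and an indexed lookup, as described in c90fe9a2fa, can be sketched like this. It is a deliberate simplification: the `consumer` and `index` types are hypothetical, and a plain map keyed by exact subject stands in for the sublist the commit mentions, so wildcard filters are not handled here.

```go
package main

import "fmt"

// consumer is a hypothetical, simplified stand-in for a stream consumer.
type consumer struct {
	name   string
	filter string // exact subject filter; wildcards are ignored in this sketch
}

// signalLinear checks every consumer, so each message costs O(consumers).
func signalLinear(consumers []*consumer, subject string) []string {
	var hit []string
	for _, c := range consumers {
		if c.filter == subject {
			hit = append(hit, c.name)
		}
	}
	return hit
}

// index maps a subject to the consumers interested in it.
type index map[string][]*consumer

// buildIndex registers each consumer under its filter subject.
func buildIndex(consumers []*consumer) index {
	ix := make(index, len(consumers))
	for _, c := range consumers {
		ix[c.filter] = append(ix[c.filter], c)
	}
	return ix
}

// signal looks up only the consumers registered for the subject.
func (ix index) signal(subject string) []string {
	var hit []string
	for _, c := range ix[subject] {
		hit = append(hit, c.name)
	}
	return hit
}

func main() {
	var consumers []*consumer
	for i := 0; i < 10000; i++ {
		consumers = append(consumers, &consumer{
			name:   fmt.Sprintf("c%d", i),
			filter: fmt.Sprintf("orders.%d", i),
		})
	}
	ix := buildIndex(consumers)
	fmt.Println(signalLinear(consumers, "orders.42"), ix.signal("orders.42"))
}
```

Both functions select the same consumers; the indexed version avoids touching the thousands of consumers that have no interest in the message.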
Marco Primi
f8a030bc4a Use testing.TempDir() where possible
Refactor tests to use Go's built-in temporary directory utility for tests.

Also avoid binding to default port (which may be in use)
2022-12-12 13:18:44 -08:00
Derek Collison
894115b82b Fix for server panic when consumer state was not decoded correctly.
The bug occurred when a timestamp for the pending state was exactly -1, which could happen depending on the timing of the redelivered pending items (which could potentially set pending.Timestamp into the future) and the timing of the encodeConsumerState call.

Minor fixes to raft.

Signed-off-by: Derek Collison <derek@nats.io>
2022-12-06 14:16:20 -08:00
Derek Collison
9f241f3322 Offload signaling to consumers when number is large.
Signed-off-by: Derek Collison <derek@nats.io>
2022-11-15 11:25:07 -08:00
Derek Collison
4dab6ce92c Fix test timing
Signed-off-by: Derek Collison <derek@nats.io>
2022-11-09 19:44:22 -08:00
Derek Collison
c6031382a1 Fix for #3499
When we deleted a consumer from an interest policy stream we would make sure to clean up any unacked messages.
However, we only based the start on the ack floor for the consumer and did not take into account the first sequence of the stream.

Signed-off-by: Derek Collison <derek@nats.io>
2022-11-05 13:56:45 -07:00
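A minimal sketch of the range selection described in c6031382a1. The function name `cleanupRange` and its parameters are hypothetical, and treating the first candidate as one past the ack floor is an assumption made for this illustration.

```go
package main

import "fmt"

// cleanupRange returns the inclusive range of stream sequences to inspect
// when a consumer is removed. The scan starts at whichever is higher: one
// past the consumer's ack floor, or the first sequence still in the stream.
func cleanupRange(ackFloor, firstSeq, lastSeq uint64) (start, end uint64, ok bool) {
	start = ackFloor + 1
	if firstSeq > start {
		// Sequences below firstSeq no longer exist, so skip them.
		start = firstSeq
	}
	if start > lastSeq {
		return 0, 0, false // nothing to clean up
	}
	return start, lastSeq, true
}

func main() {
	// A consumer that never acked anything on a stream that starts high.
	fmt.Println(cleanupRange(0, 1_000_000, 1_000_010))
	// A consumer whose ack floor is already past the stream's first sequence.
	fmt.Println(cleanupRange(50, 10, 100))
}
```

Without the clamp to the stream's first sequence, the first call would walk from sequence 1 through a million sequences that no longer exist.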
Ivan Kozlovic
170ff49837 [ADDED] JetStream: peer (the hash of server name) in statsz/jsz
A request to `$SYS.REQ.SERVER.PING.JSZ` would now return something
like this:
```
...
    "meta_cluster": {
      "name": "local",
      "leader": "A",
      "peer": "NUmM6cRx",
      "replicas": [
        {
          "name": "B",
          "current": true,
          "active": 690369000,
          "peer": "b2oh2L6w"
        },
        {
          "name": "Server name unknown at this time (peerID: jZ6RvVRH)",
          "current": false,
          "offline": true,
          "active": 0,
          "peer": "jZ6RvVRH"
        }
      ],
      "cluster_size": 3
    }
```
Note the "peer" field following the "leader" field that contains
the server name. The new field is the node ID, which is a hash of
the server name.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-09-16 15:31:37 -06:00
Derek Collison
6c97733bb8 Optimize needAck.
Signed-off-by: Derek Collison <derek@nats.io>
2022-09-14 16:25:50 -07:00