Commit Graph

5016 Commits

Author SHA1 Message Date
Ivan Kozlovic
73ed55ae5b Fixed flapper
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-28 10:55:32 -06:00
Ivan Kozlovic
8d2683a062 Fixed data race
Reverts changes made in PR#4001: 105237cba8 (diff-1322a81c43dfdd05284ae128c43d9ea51c1a3b677587686561ef6de47024e14aR1340)

Since a fix was made here: b78ec39b1f
the changes made in PR need to be reverted. The test
TestRoutePoolAndPerAccountWithServiceLatencyNoDataRace now passes.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-28 10:18:14 -06:00
Ivan Kozlovic
d6fe9d4c2d [ADDED] Support for route S2 compression
The new field `compression` in the `cluster{}` block specifies
which compression mode to use between servers.

It can be simply specified as a boolean or a string for the
simple modes, or as an object for the "s2_auto" mode where
a list of RTT thresholds can be specified.

By default, if no compression field is specified, the server
will use the s2_auto mode with default RTT thresholds of
10ms, 50ms and 100ms for the "uncompressed", "fast", "better"
and "best" modes.

```
cluster {
..
  # Possible values are "disabled", "off", "enabled", "on",
  # "accept", "s2_fast", "s2_better", "s2_best" or "s2_auto"
  compression: s2_fast
}
```

To specify a different list of thresholds for the s2_auto mode,
here is what it would look like:
```
cluster {
..
  compression: {
    mode: s2_auto
    # This means that for an RTT up to 5ms (inclusive), the
    # compression level will be "uncompressed"; above 5ms and
    # up to 15ms, the mode will switch to "s2_fast"; above 15ms
    # and up to 50ms, it will switch to "s2_better"; anything
    # above 50ms will result in the "s2_best" compression mode.
    rtt_thresholds: [5ms, 15ms, 50ms]
  }
}
```
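
As a rough illustration of the threshold comments above, the mapping from a measured RTT to a mode can be sketched in Go. This is not the server's actual code: `modeForRTT` and `exampleThresholds` are hypothetical names, and treating every threshold as an inclusive upper bound is an assumption drawn from the config comments.

```go
package main

import (
	"fmt"
	"time"
)

// exampleThresholds mirrors rtt_thresholds: [5ms, 15ms, 50ms] from the
// config example. The variable name is hypothetical.
var exampleThresholds = []time.Duration{
	5 * time.Millisecond,
	15 * time.Millisecond,
	50 * time.Millisecond,
}

// modeForRTT is a hypothetical helper, not the server's implementation.
// It returns the mode whose (inclusive) upper bound is the first
// threshold that is >= rtt, and the last mode when rtt exceeds them all.
func modeForRTT(rtt time.Duration, thresholds []time.Duration) string {
	modes := []string{"uncompressed", "s2_fast", "s2_better", "s2_best"}
	n := len(thresholds)
	if n > len(modes)-1 {
		n = len(modes) - 1 // ignore extra thresholds in this sketch
	}
	for i := 0; i < n; i++ {
		if rtt <= thresholds[i] {
			return modes[i]
		}
	}
	return modes[n]
}

func main() {
	for _, rtt := range []time.Duration{
		3 * time.Millisecond,
		10 * time.Millisecond,
		40 * time.Millisecond,
		80 * time.Millisecond,
	} {
		fmt.Printf("rtt=%v mode=%s\n", rtt, modeForRTT(rtt, exampleThresholds))
	}
}
```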

Note that the "accept" mode means that a server will accept
compression from a remote and switch to that same compression
mode, but will otherwise not initiate compression. That is,
if 2 servers are configured with "accept", then compression
will actually be "off". If one of the servers had, say, s2_fast,
then they would both use this mode.

If a server has compression mode set (other than "off") but
connects to an older server, there will be no compression between
those 2 routes.
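
The "accept" and older-server rules described above can be summarized in a small Go sketch. The function `negotiate` and its `remoteSupports` flag are hypothetical, not the server's API. Two cases are not covered by the text and are labeled as assumptions in the code: either side set to "off", and both sides configured with different explicit modes.

```go
package main

import "fmt"

// negotiate is a hypothetical helper that returns the mode the local
// server would end up using for a route. remoteSupports is false when
// the remote is an older server without route compression support.
func negotiate(local, remote string, remoteSupports bool) string {
	switch {
	case !remoteSupports:
		// Older remote server: no compression between the two.
		return "off"
	case local == "off" || remote == "off":
		// Assumption: "off" on either side disables compression.
		return "off"
	case local == "accept" && remote == "accept":
		// Neither side initiates, so compression is effectively off.
		return "off"
	case local == "accept":
		// Accept whatever the remote initiates.
		return remote
	default:
		// Remote is "accept" (it follows us), or both sides are explicit.
		// Assumption for the latter: the local server keeps its own mode.
		return local
	}
}

func main() {
	fmt.Println(negotiate("accept", "accept", true))
	fmt.Println(negotiate("accept", "s2_fast", true))
	fmt.Println(negotiate("s2_fast", "accept", true))
	fmt.Println(negotiate("s2_best", "", false))
}
```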

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-27 17:59:25 -06:00
Derek Collison
d573b78aee Merge branch 'main' into dev 2023-04-26 18:42:31 -07:00
Derek Collison
9999f63853 ConsumerFileStore could encode an empty state or update an empty state on startup.
We needed to make sure at the lowest level that the state was read from disk and not depend on upper layer consumer.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-26 15:48:10 -07:00
Derek Collison
7f06d6f5a7 When Jsz() was asked for consumer details, it would report incorrect data if the server was not the consumer leader.
This is due to the way state is maintained for leaders vs followers for consumers.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-26 15:03:15 -07:00
Derek Collison
4ebdb69daf Merge branch 'main' into dev 2023-04-26 11:34:37 -07:00
Derek Collison
83293f86ff Reduce threshold for compressing messages during a catchup
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-25 19:01:06 -07:00
Derek Collison
3c964a12d7 Migration could be delayed due to transferring leadership while the new leader was still paused.
Also check quicker but slow down if the state we need to have is not there yet.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-25 18:58:49 -07:00
Neil
08d341801f Restore outbound queue coalescing (#4093)
This PR effectively reverts part of #4084 which removed the coalescing
from the outbound queues as I initially thought it was the source of a
race condition.

Further investigation has proven that not only was that untrue (the race
actually came from the WebSocket code, all coalescing operations happen
under the client lock) but removing the coalescing also worsens
performance.

Signed-off-by: Neil Twigg <neil@nats.io>
2023-04-25 15:53:00 +01:00
Derek Collison
70b635e337 Test that makes sure that assets can be scaled after a cluster change. (#4101)
This is specifically when a cluster is reconfigured and the servers are
restarted with a new cluster name.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-25 07:45:46 -07:00
Neil Twigg
2206f9e468 Re-add coalescing to outbound queues
Originally I thought there was a race condition happening here,
but it turns out it is safe after all and the race condition I
was seeing was due to other problems in the WebSocket code.

Signed-off-by: Neil Twigg <neil@nats.io>
2023-04-25 12:15:11 +01:00
Derek Collison
e25f89dc4d Do not fail healthz in single server mode on failed snapshot restore. (#4100)
In single server mode healthz could mistake a snapshot staging
directory during a restore as an account.
If the restore took a long time, stalled, or was aborted, would cause
healthz to fail.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-24 22:49:55 -07:00
Derek Collison
47c6bfded4 Update server/jetstream_test.go
Fix spelling

Co-authored-by: Tomasz Pietrek <tomasz@nats.io>
2023-04-24 22:29:05 -07:00
Derek Collison
3340179b97 Fix flapper
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-24 22:22:27 -07:00
Derek Collison
cae91b8cad In single server mode healthz could mistake a snapshot staging directory during a restore as an account.
If the restore took a long time, stalled, or was aborted, would cause healthz to fail.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-24 22:14:04 -07:00
cui fliter
f1f5a59e9b fix some comments
Signed-off-by: cui fliter <imcusg@gmail.com>
2023-04-25 11:28:59 +08:00
Derek Collison
c0f5b71a8f Test that makes sure that assets that have been created under a certain cluster can be upgraded to a new cluster.
This is specifically when a cluster is reconfigured and the servers are restarted with a new cluster name.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-24 20:06:20 -07:00
Waldemar Quevedo
d9cc8b0363 fix formatting of raft debug log
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-04-22 07:07:08 +02:00
Derek Collison
1de9a1cf3b Merge branch 'main' into dev 2023-04-21 14:09:35 -07:00
Derek Collison
04908962a1 Swap out flate from std library for faster one from compress. (#4087)
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-21 14:02:43 -07:00
Derek Collison
50522f117d New version of flate needed more payload at best speed to kick in
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-21 13:18:25 -07:00
Derek Collison
f9f4bf5c40 Run a check for ack floor drift. (#4086)
Also periodically check. If all normal will be very cheap.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-21 12:56:53 -07:00
Derek Collison
da9a17fd68 Spelling
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-21 12:40:19 -07:00
Derek Collison
57d06abbc9 Swap out flate from std for faster one
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-21 12:12:16 -07:00
Derek Collison
8b7c2d12aa Run a check for ack floor drift when taking over as a leader and the ack go routine is spun up.
Also periodically check. If all normal will be very cheap.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-21 11:59:35 -07:00
Neil Twigg
5f884349db Remove TestClientOutboundQueueCoalesce as no longer needed
Signed-off-by: Neil Twigg <neil@nats.io>
2023-04-21 15:40:49 +01:00
Neil Twigg
2ece00b08f Buffer re-use in WebSocket code, fix race conditions
Signed-off-by: Neil Twigg <neil@nats.io>
2023-04-21 15:33:48 +01:00
Neil Twigg
bf286744dd Remove coalescing as it races with the writev syscall
Signed-off-by: Neil Twigg <neil@nats.io>
2023-04-20 23:29:36 +01:00
Derek Collison
a93fd080f0 Merge branch 'main' into dev 2023-04-19 08:53:09 -07:00
Derek Collison
4f8e7bb77c Ability to set the minimum value for a seqset if known. (#4075)
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-19 08:52:33 -07:00
Derek Collison
a45c7106b8 Only set minimum when removing first item
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-18 22:50:57 -07:00
Derek Collison
aa66c87d53 Make sure to set node count to 1
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-18 22:34:13 -07:00
Derek Collison
a744e1d5c9 Added ability to set initial minimum value for seqset when known.
We know the minimum value when creating a new filestore msgBlk.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-18 22:23:11 -07:00
Derek Collison
f6195a5ee3 A stream could have a complicated state with interior deletes.
This is a simpler way to determine if we need to consider a snapshot that involves much less time and CPU and memory.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-18 19:11:49 -07:00
Derek Collison
c43c216415 Bump to 2.9.17-beta.1
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-18 18:55:13 -07:00
Neil Twigg
85923c4342 Apply stream compression from file store config
Signed-off-by: Neil Twigg <neil@nats.io>
2023-04-18 16:31:59 +01:00
Neil Twigg
57d888eec4 Use AVL tree for consumer redeliver map
Signed-off-by: Neil Twigg <neil@nats.io>
2023-04-18 15:53:44 +01:00
Derek Collison
333d684c86 Use encoding of avl seqset for writeIndexInfo's delete map.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-17 20:17:15 -07:00
Neil Twigg
1a24e955d0 Add size to preamble, check capacity instead of length when encoding
* If we don't encode the `size`, it is lost during an encoding-decoding round-trip
* If we don't check capacity, we might reallocate needlessly instead of just growing the slice
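
The capacity point can be shown with a minimal Go sketch. The function `encodeInto` is a hypothetical stand-in for an encoder that writes into a caller-supplied buffer; it is not the code from this commit.

```go
package main

import "fmt"

// encodeInto is a hypothetical helper: it copies payload into buf and
// returns the resulting slice. It checks cap(buf), not len(buf), so a
// short slice with enough spare capacity is re-sliced instead of being
// replaced by a fresh allocation.
func encodeInto(buf, payload []byte) []byte {
	if cap(buf) < len(payload) {
		// Not enough room in the backing array: allocate a new one.
		buf = make([]byte, len(payload))
	} else {
		// Enough room: just extend the slice over the existing array.
		buf = buf[:len(payload)]
	}
	copy(buf, payload)
	return buf
}

func main() {
	buf := make([]byte, 0, 16) // length 0, capacity 16
	out := encodeInto(buf, []byte("hello"))
	fmt.Printf("%q len=%d cap=%d\n", out, len(out), cap(out))
}
```

A check against `len(buf)` would have treated the zero-length buffer in `main` as too small and allocated again, even though its backing array had room.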

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-17 20:17:12 -07:00
Derek Collison
9b9159ab70 Basic swap out of the old dmap (map[uint64]struct{}) for new avl.SequenceSet.
No other optimizations yet.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-17 20:17:10 -07:00
Derek Collison
1f6aa94405 SequenceSet is an AVL tree with variable bitmask nodes to contain large delete maps for streams.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-17 20:17:03 -07:00
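
To give a feel for the bitmask idea in the SequenceSet commit above, here is a deliberately simplified Go sketch. It is not the `avl.SequenceSet` implementation: it indexes buckets with a plain map rather than an AVL tree, uses a single 64-bit word per bucket, and every name in it (`bucketSet`, `bitsPerBucket`, and the methods) is hypothetical.

```go
package main

import "fmt"

// bitsPerBucket is an assumption for this sketch: one uint64 per bucket.
const bitsPerBucket = 64

// bucketSet stores sequences as bits inside buckets keyed by
// seq / bitsPerBucket. A map stands in for the AVL tree here.
type bucketSet struct {
	buckets map[uint64]uint64
	size    int
}

func newBucketSet() *bucketSet {
	return &bucketSet{buckets: make(map[uint64]uint64)}
}

// locate returns the bucket key and the bit mask for seq.
func locate(seq uint64) (uint64, uint64) {
	return seq / bitsPerBucket, uint64(1) << (seq % bitsPerBucket)
}

// Insert adds seq to the set; inserting twice has no extra effect.
func (s *bucketSet) Insert(seq uint64) {
	base, bit := locate(seq)
	if s.buckets[base]&bit == 0 {
		s.buckets[base] |= bit
		s.size++
	}
}

// Exists reports whether seq is in the set.
func (s *bucketSet) Exists(seq uint64) bool {
	base, bit := locate(seq)
	return s.buckets[base]&bit != 0
}

// Delete removes seq and drops the bucket once it is empty.
func (s *bucketSet) Delete(seq uint64) bool {
	base, bit := locate(seq)
	if s.buckets[base]&bit == 0 {
		return false
	}
	s.buckets[base] &^= bit
	if s.buckets[base] == 0 {
		delete(s.buckets, base)
	}
	s.size--
	return true
}

func main() {
	s := newBucketSet()
	for _, seq := range []uint64{1, 2, 3, 1000} {
		s.Insert(seq)
	}
	fmt.Println(s.size, len(s.buckets), s.Exists(2), s.Exists(4))
}
```

Compared with a `map[uint64]struct{}`, runs of nearby sequences share one bucket entry, which is the space saving the bitmask nodes aim for.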
Derek Collison
09afcee9d9 Merge branch 'main' into dev 2023-04-17 08:43:08 -07:00
Byron Ruth
202d49d069 2.9.16 release
Signed-off-by: Byron Ruth <byron@nats.io>
2023-04-17 10:05:18 -04:00
Derek Collison
9a3e0b783c Fix for a data race when setting up service import subscriptions.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-17 06:40:09 -07:00
Neil Twigg
a9aa280d06 Bump version to 2.9.16-RC.9
Signed-off-by: Neil Twigg <neil@nats.io>
2023-04-17 13:38:25 +01:00
Derek Collison
3b3fac297a Merge branch 'main' into dev 2023-04-15 14:21:39 -07:00
Derek Collison
a5f5603645 Reset our WAL on edge conditions instead of trying to recover.
Also, if we are timing out and trying to become a candidate but are doing a catchup, check if we are stalled.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-15 12:23:44 -07:00
Derek Collison
3f27b67791 Merge branch 'main' into dev 2023-04-15 10:47:29 -07:00
Derek Collison
034975e767 Fix for a regression in behavior: we needed to make sure that when we went back to 1 entry for a subject, we cleared firstNeedsUpdate.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-15 10:00:44 -07:00