Derek Collison
4eb4e5496b
Do health check on startup once we have processed existing state.
...
Also do health checks in a separate goroutine.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-29 09:36:35 -07:00
Derek Collison
fac5658966
If we fail to create a consumer, make sure to clean up any raft nodes in meta layer and to shutdown the consumer if created but we encountered an error.
...
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-29 08:15:33 -07:00
Derek Collison
546dd0c9ab
Make sure we can recover an underlying node being stopped.
...
Do not return healthy if the node is closed, and wait a bit longer for forward progress.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-29 07:42:23 -07:00
Derek Collison
85f6bfb2ac
Check healthz periodically
...
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-28 17:58:45 -07:00
Derek Collison
ac27fd046a
Fix data race
...
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-28 17:57:03 -07:00
Derek Collison
d107ba3549
Under certain scenarios we have witnessed healthz() never returning healthy due to a stream or consumer being missing or stopped.
...
This will now allow the healthy call to attempt to restart those assets.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-28 17:11:08 -07:00
Marco Primi
82eade93b4
Merge JS Chaos tests into a single file
2023-04-27 14:56:55 -07:00
Marco Primi
7908d8c05c
Merge JS benchmarks into a single file
2023-04-27 14:56:55 -07:00
Marco Primi
df552351ec
Benchmark for interest-based stream with limits
...
Measure publish throughput with different limits (MaxBytes, MaxMessages,
MaxPerSubject, MaxAge, ...)
2023-04-27 14:56:55 -07:00
Derek Collison
f972165b0e
Bump to 2.9.17-beta.2
...
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-27 14:30:19 -07:00
Derek Collison
c3b07df86f
The server's Start() used to block but no longer does. (#4111)
...
This updates tests and the function comment.
Signed-off-by: Derek Collison <derek@nats.io>
Resolves #4110
2023-04-27 09:50:03 -07:00
Derek Collison
a66ac8cb9b
The server's Start() used to block but no longer does. This updates tests and function comment.
...
Fix for #4110
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-27 06:55:03 -07:00
Neil Twigg
e30ea34625
Add op type to panics
...
Signed-off-by: Neil Twigg <neil@nats.io>
2023-04-27 11:38:52 +01:00
Derek Collison
9999f63853
ConsumerFileStore could encode an empty state or update an empty state on startup.
...
We needed to make sure at the lowest level that the state was read from disk and not depend on upper layer consumer.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-26 15:48:10 -07:00
Derek Collison
7f06d6f5a7
When Jsz() was asked for consumer details, it would report incorrect data if the server was not the consumer leader.
...
This is due to the way state is maintained for leaders vs followers for consumers.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-26 15:03:15 -07:00
Derek Collison
83293f86ff
Reduce threshold for compressing messages during a catchup
...
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-25 19:01:06 -07:00
Derek Collison
3c964a12d7
Migration could be delayed due to transferring leadership while the new leader was still paused.
...
Also check quicker but slow down if the state we need to have is not there yet.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-25 18:58:49 -07:00
Neil
08d341801f
Restore outbound queue coalescing (#4093)
...
This PR effectively reverts part of #4084 which removed the coalescing
from the outbound queues as I initially thought it was the source of a
race condition.
Further investigation has proven that not only was that untrue (the race
actually came from the WebSocket code, all coalescing operations happen
under the client lock) but removing the coalescing also worsens
performance.
Signed-off-by: Neil Twigg <neil@nats.io>
2023-04-25 15:53:00 +01:00
Derek Collison
70b635e337
Test that makes sure that assets can be scaled after a cluster change. (#4101)
...
This is specifically when a cluster is reconfigured and the servers are
restarted with a new cluster name.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-25 07:45:46 -07:00
Neil Twigg
2206f9e468
Re-add coalescing to outbound queues
...
Originally I thought there was a race condition happening here,
but it turns out it is safe after all and the race condition I
was seeing was due to other problems in the WebSocket code.
Signed-off-by: Neil Twigg <neil@nats.io>
2023-04-25 12:15:11 +01:00
Derek Collison
e25f89dc4d
Do not fail healthz in single server mode on failed snapshot restore. (#4100)
...
In single server mode healthz could mistake a snapshot staging
directory during a restore as an account.
If the restore took a long time, stalled, or was aborted, this would
cause healthz to fail.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-24 22:49:55 -07:00
Derek Collison
47c6bfded4
Update server/jetstream_test.go
...
Fix spelling
Co-authored-by: Tomasz Pietrek <tomasz@nats.io>
2023-04-24 22:29:05 -07:00
Derek Collison
3340179b97
Fix flapper
...
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-24 22:22:27 -07:00
Derek Collison
cae91b8cad
In single server mode healthz could mistake a snapshot staging directory during a restore as an account.
...
If the restore took a long time, stalled, or was aborted, this would cause healthz to fail.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-24 22:14:04 -07:00
cui fliter
f1f5a59e9b
fix some comments
...
Signed-off-by: cui fliter <imcusg@gmail.com>
2023-04-25 11:28:59 +08:00
Derek Collison
c0f5b71a8f
Test that makes sure that assets that have been created under a certain cluster can be upgraded to a new cluster.
...
This is specifically when a cluster is reconfigured and the servers are restarted with a new cluster name.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-24 20:06:20 -07:00
Waldemar Quevedo
d9cc8b0363
fix formatting of raft debug log
...
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-04-22 07:07:08 +02:00
Derek Collison
04908962a1
Swap out flate from std library for faster one from compress. (#4087)
...
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-21 14:02:43 -07:00
Derek Collison
50522f117d
New version of flate needed more payload at best speed to kick in
...
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-21 13:18:25 -07:00
Derek Collison
f9f4bf5c40
Run a check for ack floor drift. (#4086)
...
Also periodically check. If all is normal this will be very cheap.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-21 12:56:53 -07:00
Derek Collison
da9a17fd68
Spelling
...
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-21 12:40:19 -07:00
Derek Collison
57d06abbc9
Swap out flate from std for faster one
...
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-21 12:12:16 -07:00
Derek Collison
8b7c2d12aa
Run a check for ack floor drift when taking over as a leader and the ack goroutine is spun up.
...
Also periodically check. If all is normal this will be very cheap.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-21 11:59:35 -07:00
Neil Twigg
5f884349db
Remove TestClientOutboundQueueCoalesce as no longer needed
...
Signed-off-by: Neil Twigg <neil@nats.io>
2023-04-21 15:40:49 +01:00
Neil Twigg
2ece00b08f
Buffer re-use in WebSocket code, fix race conditions
...
Signed-off-by: Neil Twigg <neil@nats.io>
2023-04-21 15:33:48 +01:00
Neil Twigg
bf286744dd
Remove coalescing as it races with the writev syscall
...
Signed-off-by: Neil Twigg <neil@nats.io>
2023-04-20 23:29:36 +01:00
Derek Collison
f6195a5ee3
A stream could have a complicated state with interior deletes.
...
This is a simpler way to determine if we need to consider a snapshot that involves much less time and CPU and memory.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-18 19:11:49 -07:00
Derek Collison
c43c216415
Bump to 2.9.17-beta.1
...
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-18 18:55:13 -07:00
Byron Ruth
202d49d069
2.9.16 release
...
Signed-off-by: Byron Ruth <byron@nats.io>
2023-04-17 10:05:18 -04:00
Derek Collison
9a3e0b783c
Fix for a data race when setting up service import subscriptions.
...
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-17 06:40:09 -07:00
Neil Twigg
a9aa280d06
Bump version to 2.9.16-RC.9
...
Signed-off-by: Neil Twigg <neil@nats.io>
2023-04-17 13:38:25 +01:00
Derek Collison
a5f5603645
Reset our WAL on edge conditions instead of trying to recover.
...
Also, if we are timing out and trying to become a candidate but are doing a catchup, check if we are stalled.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-15 12:23:44 -07:00
Derek Collison
034975e767
Fix for a regression in behavior: we needed to make sure that when we went back to 1 entry for a subject we cleared firstNeedsUpdate.
...
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-15 10:00:44 -07:00
Derek Collison
66ca46e145
If we see another leader with same term we should step down
...
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-14 16:21:40 -07:00
Derek Collison
94e62cc30b
Add server name / remote server name to routez (#4054)
...
Add server name to routez info:
```js
{
  "server_id": "NAXUYR7XZFPBQPJGV5HNWIXLOCLBSA2GOQEPGP47AFY62XHWPREKCXWL",
  "server_name": "nats-burn-0",
  "now": "2023-04-14T19:47:52.057269Z",
  "num_routes": 2,
  "routes": [
    {
      "rid": 51,
      "remote_id": "NACPYGKLPWMYM7YZA3NCYQ5HL2EKGVKFKNGGE6ECD5LMO26JC3T3TARR",
      "remote_name": "nats-burn-1",
      "did_solicit": true,
      "is_configured": true,
      "ip": "127.0.0.1",
      "port": 57889,
      "start": "2023-04-14T12:47:43.475646-07:00",
      "last_activity": "2023-04-14T19:47:51.983756Z",
      "rtt": "178µs",
      "uptime": "8s",
      "idle": "0s",
      "pending_size": 0,
      "in_msgs": 210,
      "out_msgs": 154,
      "in_bytes": 31854,
      "out_bytes": 15150,
      "subscriptions": 134
    },
    {
      "rid": 53,
      "remote_id": "NCTOFS2M5IVGKYRGYOWP3Q5SQNPGCPMCIIJWOEH5SOIH3XQKRBKP7ITJ",
      "remote_name": "nats-burn-2",
      "did_solicit": true,
      "is_configured": true,
      "ip": "127.0.0.1",
      "port": 57905,
      "start": "2023-04-14T12:47:44.275914-07:00",
      "last_activity": "2023-04-14T19:47:52.022301Z",
      "rtt": "152µs",
      "uptime": "7s",
      "idle": "0s",
      "pending_size": 0,
      "in_msgs": 179,
      "out_msgs": 48,
      "in_bytes": 25629,
      "out_bytes": 7573,
      "subscriptions": 100
    }
  ]
}
```
2023-04-14 13:09:43 -07:00
Waldemar Quevedo
d12152c48f
Add server name / remote server name to routez
...
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-04-14 12:47:00 -07:00
Derek Collison
2699465596
Fix stream sourcing & mirroring overlap errors (#4052)
...
When adding or updating sources/mirrors, the server was checking if the
stream with a given name exists to check for subject overlaps, among
other things.
However, if the sourced/mirrored stream was `External`, the checks should
not be executed: not only would the stream never be found, but if the
`External` stream had the same name as the sourcing stream, the check
would be wrongly performed against itself.
cc @jnmoyne
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-04-14 12:18:04 -07:00
Tomasz Pietrek
a66c67baa5
Fix stream sourcing & mirroring overlap errors
...
When adding or updating sources/mirrors, the server was checking if the stream with
a given name exists to check for subject overlaps, among other things.
However, if the sourced/mirrored stream was `External`, the checks should
not be executed: not only would the stream never be found,
but if the `External` stream had the same name as the sourcing stream,
the check would be wrongly performed against itself.
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-04-14 21:00:11 +02:00
Derek Collison
0fe48fe91e
Use new server read locks now that we have them
...
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-14 10:11:40 -07:00
Derek Collison
89fc7e3203
Bump to 2.9.16-RC.8
...
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-13 21:04:33 -07:00