Commit Graph

6137 Commits

Author SHA1 Message Date
Derek Collison
6bc82bb4e6 Fix a data race
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-09 17:42:02 -05:00
Derek Collison
9a92d10cc9 Bump to 2.9.0-RC.1
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-09 17:29:15 -05:00
Derek Collison
9a61537b1e Merge pull request #3351 from nats-io/fs-kv
[IMPROVED] DirectGet performance and memory usage for large streams.
2022-08-09 15:27:39 -07:00
Derek Collison
8c04adc009 Improvements to filestore for large KVs.
Use better indexing for lookups: we used to do a simple linear scan backwards, now we track the first and last block.
Will expire the fss cache as needed to reduce memory usage.

Signed-off-by: Derek Collison <derek@nats.io>
2022-08-09 15:51:13 -05:00
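A minimal sketch of the indexing idea, assuming a simplified model where each subject maps to the range of block indexes that hold its messages. `subjectIndex`, `blockRange` and the block numbering are hypothetical, not the filestore's real types. A lookup for the latest message on a subject jumps to the recorded last block instead of scanning every block backwards.

```go
package main

import "fmt"

// blockRange records the first and last block (hypothetical block index)
// that hold at least one message for a given subject.
type blockRange struct {
	first, last int
}

// subjectIndex is a hypothetical per-subject index; the real filestore
// keeps different, richer state.
type subjectIndex struct {
	bySubject map[string]*blockRange
}

func newSubjectIndex() *subjectIndex {
	return &subjectIndex{bySubject: make(map[string]*blockRange)}
}

// record notes that a message for subj was stored in block blk,
// widening the subject's [first, last] block range if needed.
func (si *subjectIndex) record(subj string, blk int) {
	if r, ok := si.bySubject[subj]; ok {
		if blk < r.first {
			r.first = blk
		}
		if blk > r.last {
			r.last = blk
		}
		return
	}
	si.bySubject[subj] = &blockRange{first: blk, last: blk}
}

// lastBlock returns the block to search for the latest message on subj,
// avoiding a backwards scan over every block.
func (si *subjectIndex) lastBlock(subj string) (int, bool) {
	r, ok := si.bySubject[subj]
	if !ok {
		return 0, false
	}
	return r.last, true
}

func main() {
	si := newSubjectIndex()
	si.record("kv.a", 1)
	si.record("kv.b", 2)
	si.record("kv.a", 7)
	blk, _ := si.lastBlock("kv.a")
	fmt.Println(blk)
}
```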
Ivan Kozlovic
a4bf4e87f6 Merge pull request #3326 from mfaizanse/health_endpoint_params
Added param options to /healthz endpoint
2022-08-09 08:49:22 -06:00
Muhammad Faizan
1634f33de7 Added param options to /healthz endpoint
2022-08-09 08:32:54 +02:00
Ivan Kozlovic
e6955be82d Merge pull request #3349 from nats-io/js_cluster_stop_catchup
[IMPROVED] JetStream catchup can be aborted and better flow control
2022-08-08 18:23:43 -06:00
Ivan Kozlovic
502e5b13f7 Declare some catchup static errors
Use `var .. = errors.New()`.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-08 17:51:31 -06:00
Ivan Kozlovic
ecddb08469 [IMPROVED] JetStream catchup can be aborted and better flow control
If the leader sends messages but the follower, for any reason, aborts
or retries the snapshot process, the follower will now send the error that
caused this, and the leader can then abort the catchup instead of
waiting for its inactivity threshold of 5 seconds.

Also delay sending a batch until the number of "acks" reaches 1/2 of
the batch size or 100ms have elapsed. This helps avoid trickling of
messages. Tested with the new test
TestJetStreamSuperClusterStreamCathupLongRTT() and saw better results
in batch size, and overall time is smaller or similar but not longer.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-08 17:19:36 -06:00
Derek Collison
c4abba4ed5 Bump to 2.9.0-beta.22
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 10:33:09 -07:00
Ivan Kozlovic
5924cd6abc Merge pull request #3348 from nats-io/catchup_improvements
[IMPROVED] Catchup improvements
2022-08-08 11:28:13 -06:00
Derek Collison
06112d6885 Reset activity interval on catchup to default vs ramp up. Tweak test.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
Derek Collison
906afccb8a Make a check loop based on review feedback.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
Derek Collison
33526f4d93 Make sure empty msgs do not interfere with catchup process.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
Derek Collison
758b733d43 Attempt to improve long RTT catchup time during stream moves.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
Derek Collison
e635de7526 Additional stability improvements for catchup.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
Derek Collison
3407112292 Write lock not needed
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
Derek Collison
d54899de0a No need to hold server write lock since sendq has its own.
I noticed some contention on the server write lock while investigating a catchup bug.
In the medium term we could have a separate lock; in the longer term, formal client support in the server will alleviate this.

Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
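A sketch of why the outer lock is unnecessary: if the send queue guards its own state with its own mutex, producers can enqueue without taking the server write lock, which removes that source of contention. `sendQueue` is a hypothetical, simplified stand-in for the server's sendq.

```go
package main

import (
	"fmt"
	"sync"
)

// sendQueue guards its own state, so producers do not need to hold any
// outer (server-wide) lock to enqueue.
type sendQueue struct {
	mu   sync.Mutex
	msgs []string
}

// push appends a message under the queue's own lock.
func (q *sendQueue) push(m string) {
	q.mu.Lock()
	q.msgs = append(q.msgs, m)
	q.mu.Unlock()
}

// drain removes and returns all pending messages.
func (q *sendQueue) drain() []string {
	q.mu.Lock()
	defer q.mu.Unlock()
	out := q.msgs
	q.msgs = nil
	return out
}

func main() {
	var q sendQueue
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			q.push(fmt.Sprintf("msg-%d", i)) // no server lock taken here
		}(i)
	}
	wg.Wait()
	fmt.Println(len(q.drain()))
}
```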
Derek Collison
a5119008a5 Fix up some processing during account purge to fix flapping tests
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
Derek Collison
5a050fc10b Improve handling when a snapshot represents state we no longer have.
We would send skip messages for a sync request that was completely below our current state, but this could be more traffic than we want.
Now we only send EOF and the other side can detect the skip forward and adjust on a successful catchup.
We still send skips if we can partially fill the sync request.

Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:08 -06:00
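A hypothetical sketch of the decision described above, for a sync request covering sequences `[start, end]` when the earliest message still held is `firstSeq`. The `plan` function and its string output are illustrative only; the real catchup protocol sends messages, not strings.

```go
package main

import "fmt"

// plan lists what a leader would send for a sync request [start, end]
// when its earliest retained message is firstSeq.
func plan(start, end, firstSeq uint64) []string {
	var out []string
	switch {
	case end < firstSeq:
		// Entire request is below our state: send only EOF, no skips.
		// The follower detects the skip forward after a successful catchup.
	case start < firstSeq:
		// Partially fillable: skip what we no longer have, then send the rest.
		out = append(out, fmt.Sprintf("skip %d-%d", start, firstSeq-1))
		out = append(out, fmt.Sprintf("msgs %d-%d", firstSeq, end))
	default:
		out = append(out, fmt.Sprintf("msgs %d-%d", start, end))
	}
	return append(out, "EOF")
}

func main() {
	fmt.Println(plan(1, 50, 100))
	fmt.Println(plan(90, 120, 100))
}
```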
Ivan Kozlovic
33c4fec75f Merge pull request #3347 from nats-io/js_cluster_replicas
[IMPROVED] Replicas ordering and info regarding unknown in stream info
2022-08-08 09:51:14 -06:00
Ivan Kozlovic
d96e801825 Change the report to something like this instead:
```
Replica: Server name unknown at this time (peerID: jZ6RvVRH), outdated, OFFLINE, not seen
```
After discussing with @ripienaar, this text better conveys that
this is a transient situation.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-08 09:30:37 -06:00
Ivan Kozlovic
267e6d1958 [IMPROVED] Replicas ordering and info regarding unknown in stream info
If a cluster is brought down and then partially restarted, the
replica information about the non-restarted node would be completely
missing. The CLI could report 3 replicas but then list only the leader
and the running replicas, with nothing about the other node.
Since this node's server name is not known, this PR adds an entry
similar to this:
```
<unknown (peerID: jZ6RvVRH)>, outdated, OFFLINE, not seen
```

Also, the replicas array is now ordered, which helps when using
a watcher or repeating stream info commands, since the replicas
output will be stable.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-07 18:54:26 -06:00
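A minimal sketch of stable, placeholder-aware replica reporting, assuming a map from peer ID to known server name. The `replica` struct, `describeReplicas`, and sorting by name are assumptions for illustration; the placeholder text follows the wording adopted in d96e801825.

```go
package main

import (
	"fmt"
	"sort"
)

// replica is a hypothetical, simplified view of one entry in stream info.
type replica struct {
	Name    string
	Offline bool
}

// describeReplicas returns replica entries in a stable order. Peers whose
// server name is not yet known get a placeholder built from their peer ID.
func describeReplicas(known map[string]string, peers []string) []replica {
	var out []replica
	for _, p := range peers {
		if name, ok := known[p]; ok {
			out = append(out, replica{Name: name})
		} else {
			out = append(out, replica{
				Name:    fmt.Sprintf("Server name unknown at this time (peerID: %s)", p),
				Offline: true,
			})
		}
	}
	// Sorting makes repeated stream info output stable.
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	known := map[string]string{"p1": "n1", "p2": "n2"}
	for _, r := range describeReplicas(known, []string{"p2", "jZ6RvVRH", "p1"}) {
		fmt.Printf("%s offline=%v\n", r.Name, r.Offline)
	}
}
```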
Ivan Kozlovic
69feaf0627 Merge pull request #3340 from mprimi/chaos-test-3
Fix chaos tests build tags for Travis
2022-08-07 17:12:42 -06:00
Marco Primi
be460b7bf1 Exclude chaos tests from build by default
Before: build chaos tests unless `skip_js_chaos_tests` is set
After: exclude chaos tests unless `js_chaos_tests` is set
2022-08-05 15:20:09 -07:00
Marco Primi
815948f02f Exclude chaos tests helpers from default build
2022-08-05 15:20:09 -07:00
Marco Primi
896adace06 [FIXED] Wrong flag in Travis to exclude chaos tests
The `js_tests` build target was using the wrong tag to exclude chaos tests.
As a result, chaos tests would run as part of the default testing.
2022-08-05 15:20:09 -07:00
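The before/after behavior can be checked with the standard `go/build/constraint` package. This sketch assumes the old files carried `//go:build !skip_js_chaos_tests` and the new ones `//go:build js_chaos_tests`; the exact constraint lines in the repository may differ. With the new constraint, chaos tests compile only when the tag is passed, e.g. `go test -tags js_chaos_tests`.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// included reports whether a file with the given //go:build line would be
// compiled when exactly the listed tags are set.
func included(buildLine string, tags ...string) bool {
	expr, err := constraint.Parse(buildLine)
	if err != nil {
		panic(err)
	}
	set := make(map[string]bool, len(tags))
	for _, t := range tags {
		set[t] = true
	}
	return expr.Eval(func(tag string) bool { return set[tag] })
}

func main() {
	before := "//go:build !skip_js_chaos_tests"
	after := "//go:build js_chaos_tests"
	// Default build, no tags: old files were included, new ones are not.
	fmt.Println(included(before), included(after))
	// Opting in with the tag includes the new files.
	fmt.Println(included(after, "js_chaos_tests"))
}
```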
Matthias Hanel
52c4872666 better error when peer selection fails (#3342)
* better error when peer selection fails

It is pretty hard to diagnose what went wrong when not enough peers for
an operation were found. This change now returns counts of the reasons why
peers were discarded.

Changed the error to JSClusterNoPeers as it is a more appropriate
error for that operation. Not having enough resources is one of
the conditions for a peer not being considered, but so is having a
non-matching tag, which is why JSClusterNoPeers seems more appropriate.
In addition, JSClusterNoPeers was already used as the error after one call
to selectPeerGroup.

Example:
no suitable peers for placement: peer selection cluster 'C' with 3 peers
offline: 0
excludeTag: 1
noTagMatch: 2
noSpace: 0
uniqueTag: 0
misc: 0

Example for MQTT:
mid:12 - "mqtt" - unable to connect: create sessions stream for account "$G":
no suitable peers for placement: peer selection cluster 'MQTT' with 3 peers
        offline: 0
        excludeTag: 0
        noTagMatch: 0
        noSpace: 0
        uniqueTag: 0
        misc: 0
         (10005)

Signed-off-by: Matthias Hanel <mh@synadia.com>

* review comment

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-06 00:17:01 +02:00
Ivan Kozlovic
b927b228fc Merge pull request #3345 from nats-io/fix_flapper
Fixed flapping test
2022-08-05 15:02:43 -06:00
Ivan Kozlovic
653b739fa1 Use filepath.Join() instead of manual concatenation
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-05 14:41:23 -06:00
Ivan Kozlovic
441c09799f Fixed flapping test
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-05 14:17:43 -06:00
Ivan Kozlovic
3d68835e3c Merge pull request #3344 from nats-io/fix_ioutil
Remove io/ioutil
2022-08-05 13:29:33 -06:00
Ivan Kozlovic
88424a89ef Remove io/ioutil
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-05 13:12:13 -06:00
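As of Go 1.16, the same functionality as `io/ioutil` is provided by the `io` and `os` packages. A minimal sketch of the usual replacements; the helper `roundTrip` is hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// Common io/ioutil replacements:
//   ioutil.ReadAll   -> io.ReadAll
//   ioutil.ReadFile  -> os.ReadFile
//   ioutil.WriteFile -> os.WriteFile
//   ioutil.TempDir   -> os.MkdirTemp
//   ioutil.ReadDir   -> os.ReadDir (returns []os.DirEntry, not []fs.FileInfo)
//   ioutil.Discard   -> io.Discard

// roundTrip writes content to a temp file and reads it back.
func roundTrip(content string) (string, error) {
	dir, err := os.MkdirTemp("", "example")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "data.txt")
	if err := os.WriteFile(path, []byte(content), 0o644); err != nil {
		return "", err
	}
	b, err := os.ReadFile(path)
	return string(b), err
}

func main() {
	s, err := roundTrip("hello")
	fmt.Println(s, err)
	b, _ := io.ReadAll(strings.NewReader("stream"))
	fmt.Println(string(b))
}
```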
Ivan Kozlovic
d90854a45f Merge pull request #3341 from nats-io/go_1_19
Move to Go 1.19, remove io/ioutil, fix data race and a flapper
2022-08-05 12:49:06 -06:00
Matthias Hanel
c56f3b9fbd Adding account purge operation (#3319)
* Adding account purge operation

The new request is available for the system account.
The request subject is $JS.API.ACCOUNT.PURGE.*, with the name of the
account to purge in place of the wildcard.

Also added directory cleanup code so that servers do not
end up with empty stream directories, or account directories that
only contain streams.

Also added ACCOUNT to the leaf node domain rewrite table.

Addresses #3186 and #3306 by providing a way to
get rid of the streams for existing and non-existing accounts.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-05 18:24:19 +02:00
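A sketch of building the request subject, with the account name replacing the trailing wildcard. The validation rules are an assumption made for this example. With the nats.go client connected as a system-account user, the subject could then be passed to `nc.Request(subj, nil, timeout)`; the response format is not described here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// purgeSubjectPrefix comes from the commit message; the account name
// replaces the trailing wildcard.
const purgeSubjectPrefix = "$JS.API.ACCOUNT.PURGE."

// accountPurgeSubject builds the request subject for purging an account.
// Rejecting empty names and names with '.', '*', '>' or whitespace is an
// assumption made for this sketch.
func accountPurgeSubject(account string) (string, error) {
	if account == "" || strings.ContainsAny(account, ".*> \t\r\n") {
		return "", errors.New("invalid account name")
	}
	return purgeSubjectPrefix + account, nil
}

func main() {
	subj, err := accountPurgeSubject("ACME")
	fmt.Println(subj, err)
}
```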
Ivan Kozlovic
f208b8660d Merge pull request #3335 from nats-io/fix_3331
[ADDED] LeafNode: Support for a SignatureHandler in remote config
2022-08-05 10:20:04 -06:00
Ivan Kozlovic
3c9a7cc6e5 Move to Go 1.19, remove io/ioutil, fix data race and a flapper
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-05 09:55:37 -06:00
Derek Collison
2120be6476 nit: Cap stats
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-05 07:52:23 -07:00
Derek Collison
be54b08afd Bump to 2.9.0-beta.21
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-04 18:01:26 -07:00
Ivan Kozlovic
b80383965a Merge pull request #3338 from nats-io/fix_qunsub_leak
[FIXED] Memory leak when unsubscribing the last queue subscription
2022-08-04 18:59:35 -06:00
Derek Collison
daaaad5eaf Merge pull request #3337 from nats-io/allow-direct-default
On stream create, change AllowDirect set test on MaxMsgsPer to > 0
2022-08-04 17:46:52 -07:00
Ivan Kozlovic
b6208c775b [FIXED] Memory leak when unsubscribing the last queue subscription
A server maintains a map for the subject+queue to know the number
of members on the same group. However, on unsubscribe when we get
to the last one being unsubscribed, we were removing from the map
but then unfortunately adding back with a value of 0, which caused
a leak. If the same subscription was coming back, then this map
entry would be reused, but if it is a never coming back queue sub,
then memory could increase continously.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-04 18:42:13 -06:00
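A simplified model of the bug and fix, assuming the map is keyed by a combined subject and queue name string (the real key format may differ). The fix is to return after deleting the last member instead of writing the count back; `qgroups` is a hypothetical type.

```go
package main

import "fmt"

// qgroups tracks, per "subject queue" key, how many members are subscribed.
type qgroups map[string]int32

func (q qgroups) add(key string) { q[key]++ }

// remove decrements the member count and deletes the entry when it reaches
// zero. Writing q[key] = n after the delete would recreate the entry with 0,
// which is the leak described above.
func (q qgroups) remove(key string) {
	n := q[key] - 1
	if n <= 0 {
		delete(q, key)
		return // do not fall through and re-add with 0
	}
	q[key] = n
}

func main() {
	q := qgroups{}
	for i := 0; i < 1000; i++ {
		key := fmt.Sprintf("foo.%d workers", i)
		q.add(key)
		q.remove(key)
	}
	fmt.Println(len(q))
}
```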
Todd Beets
9f8b4461f3 change set test to > 0
2022-08-04 17:37:40 -07:00
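A minimal sketch of the new create-time default, using a hypothetical `streamConfig` with only the two relevant fields. The commit message does not say what the previous test was; the sketch only shows that with `> 0`, zero or negative values (assumed here to mean no per-subject limit) leave `AllowDirect` unchanged.

```go
package main

import "fmt"

// streamConfig is a hypothetical subset of a stream configuration.
type streamConfig struct {
	MaxMsgsPer  int64 // max messages per subject; <= 0 assumed to mean no limit
	AllowDirect bool
}

// applyCreateDefaults enables AllowDirect only when a positive per-subject
// limit is set, mirroring the "> 0" test from the commit message.
func applyCreateDefaults(cfg *streamConfig) {
	if cfg.MaxMsgsPer > 0 {
		cfg.AllowDirect = true
	}
}

func main() {
	kv := streamConfig{MaxMsgsPer: 1}
	unlimited := streamConfig{MaxMsgsPer: -1}
	applyCreateDefaults(&kv)
	applyCreateDefaults(&unlimited)
	fmt.Println(kv.AllowDirect, unlimited.AllowDirect)
}
```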
Ivan Kozlovic
7baf7bd887 [ADDED] LeafNode: Support for a SignatureHandler in remote config
This allows embedded use cases where the user does not
have the ability to use a credentials file. Instead, a signature
callback is specified and invoked when the server sends the CONNECT
protocol. The user is responsible for providing the JWT and signing
the nonce.

Resolves #3331

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-04 16:59:09 -06:00
Ivan Kozlovic
e03d84f704 Merge pull request #3333 from nats-io/leaf_connect_disabled
Use specific boolean for a leaf test instead of using leafNodeEnabled
2022-08-04 14:33:31 -06:00
Ivan Kozlovic
5bc03c7637 Update leafNodeEnabled value on Start()
Start() may be the place where it should be set instead of NewServer(),
but we want to minimize the risk of breaking something this close to 2.9.0.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-04 14:13:35 -06:00
Ivan Kozlovic
d84d9f8288 Use specific boolean for a leaf test instead of using leafNodeEnabled
The test TestJetStreamClusterLeafNodeSPOFMigrateLeaders, added at
some point, needed the remotes to stop (re)connecting. It made
use of the existing leafNodeEnabled flag, used for GW/Leaf interest
propagation races, to disable the reconnect, but that may not be
the best approach since it could affect users embedding servers
and adding leafnodes "dynamically".

So this PR introduces a boolean specific to that test.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-04 10:00:11 -06:00
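A sketch of the idea, with hypothetical field names: the reconnect check consults a dedicated test-only flag instead of `leafNodeEnabled`, so tests can stop (re)connects without changing behavior for embedders who add leafnodes dynamically.

```go
package main

import (
	"fmt"
	"sync"
)

// server is a hypothetical stand-in. leafNodeEnabled keeps its original
// role (GW/Leaf interest propagation); leafDisableConnect is test-only.
type server struct {
	mu                 sync.Mutex
	leafNodeEnabled    bool
	leafDisableConnect bool // set only by tests
}

// shouldReconnectLeaf decides whether a remote leafnode reconnect attempt
// should proceed. It does not read or modify leafNodeEnabled.
func (s *server) shouldReconnectLeaf() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return !s.leafDisableConnect
}

func main() {
	s := &server{leafNodeEnabled: true}
	fmt.Println(s.shouldReconnectLeaf())
	s.mu.Lock()
	s.leafDisableConnect = true
	s.mu.Unlock()
	fmt.Println(s.shouldReconnectLeaf(), s.leafNodeEnabled)
}
```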
Ivan Kozlovic
5df91797d6 Merge pull request #3332 from nats-io/fix_js_cluster_test_name
Fixed JS cluster prefix name for Travis run
2022-08-04 09:45:28 -06:00
Ivan Kozlovic
fe1feeba7d Fixed JS cluster prefix name for Travis run
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-04 09:09:52 -06:00
Derek Collison
3aeba043fc Bump to 2.9.0-beta.20
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-04 06:20:10 -07:00