As of Go 1.20, `math/rand.Read` is deprecated. In addition to that, it also
isn't recommended for use in combination with anything cryptographic.
I haven't replaced all `math/rand` with `crypto/rand` imports because there
are still some legitimate uses for the `math/rand` package in some places.
Signed-off-by: Neil Twigg <neil@nats.io>
The new field `compression` in the `cluster{}` block allows to
specify which compression mode to use between servers.
It can be simply specified as a boolean or a string for the
simple modes, or as an object for the "s2_auto" mode where
a list of RTT thresholds can be specified.
By default, if no compression field is specified, the server
will use the s2_auto mode with default RTT thresholds of
10ms, 50ms and 100ms for the "uncompressed", "fast", "better"
and "best" modes.
```
cluster {
..
# Possible values are "disabled", "off", "enabled", "on",
# "accept", "s2_fast", "s2_better", "s2_best" or "s2_auto"
compression: s2_fast
}
```
To specify a different list of thresholds for the s2_auto,
here is how it would look like:
```
cluster {
..
compression: {
mode: s2_auto
# This means that for RTT up to 5ms (included), then
# the compression level will be "uncompressed", then
# from 5ms+ to 15ms, the mode will switch to "s2_fast",
# then from 15ms+ to 50ms, the level will switch to
# "s2_better", and anything above 50ms will result
# in the "s2_best" compression mode.
rtt_thresholds: [5ms, 15ms, 50ms]
}
}
```
Note that the "accept" mode means that a server will accept
compression from a remote and switch to that same compression
mode, but will otherwise not initiate compression. That is,
if 2 servers are configured with "accept", then compression
will actually be "off". If one of the server had say s2_fast
then they would both use this mode.
If a server has compression mode set (other than "off") but
connects to an older server, there will be no compression between
those 2 routes.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
New configuration fields:
```
cluster {
...
pool_size: 5
accounts: ["A", "B"]
}
```
The configuration `pool_size` in the example above means that this
server will create 5 routes to a remote server, assuming that that
server has the same `pool_size` setting.
Accounts (which are not part of the `accounts[]` configuration)
are assigned a specific route in this pool, and this will be the
same route on all servers in the cluster.
Accounts that are defined in the `accounts` field will each have
a dedicated route connection. This will allow suppression of the
account name in some of the route protocols, reducing bytes transmitted
which may increase performance.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Code change:
- Do not start the processMirrorMsgs and processSourceMsgs go routine
if the server has been detected to be shutdown. This would otherwise
leave some go routine running at the end of some tests.
- Pass the fch and qch to the consumerFileStore's flushLoop otherwise
in some tests this routine could be left running.
Tests changes:
- Added missing defer NATS connection close
- Added missing defer server shutdown
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- Remove code coverage from Travis and add it to a GitHub Action
that will be run as a nightly.
- Use tag builds to exclude some tests, such as the "norace" or
JS tests. Since "go test" does not support "negative" regexs, there
is no other way.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- Add Go 1.17
- Fix go fmt from Go 1.17 (build directives)
- Download version of misspell and staticcheck instead of doing
"go get" since current staticcheck would be broken without go.mod
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Updated some tests based on this change but also missing defer
connection close or server shutdown.
Fixed how the OCSP run go routine would shutdown, which would
never complete because grWG was not decremented by this go routine
prior to invoking s.Shutdown()
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Currently in tests, we have calls to os.Remove and os.RemoveAll where we
don't check the returned error. This hides useful error messages when
tests fail to run, such as "too many open files".
This change checks for more filesystem related errors and calls t.Fatal
if there is an error.
I noticed that some consumer go routines were left running at the end
of the test suite.
It turns out that there was a race the way the consumer's qch was closed.
Since it was closed and then set to nil, it is possible that the go
routines that are started and then try to capture o.qch would actually
get qch==nil, wich then when doing a select on that nil channel would
block forever.
So we know pass the qch to the 2 go routines loopAndGatherMsgs() and
loopAndDeliverMsgs() so that when we close the channel there is
no risk of that race happening.
I do believe that there is still something that should be looked at:
it seems that a consumer's delivery loop can now be started/stopped
many times based on leadership acquired/lost. If that is the case,
I think that the consumer should wait for previous go routine to
complete before trying to start new ones.
Also moved 3 JetStream tests to the test/norace_test.go file because
they would consumer several GB of memory when running with the -race flag.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Since we were creating subs on the fly, sub.im would always be nil.
We were passing a client because it was needed in sendRouteSubOrUnSubProtos().
This PR simply fills the buffer with each account's subscriptions.
There is also no need to have subs sent from different go routine
based on some threshold. Routes are no longer subject to max pending.
Some code has been made into a function so that they can be shared
by sendSubsToRoute() and sendRouteSubOrUnSubProtos(). The function
is simply adding to given buffer the RS+/- protocol.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
There is a race between the time the processing of a subscription
and the init/send of subscriptions when accepting a leaf node
connection that may cause internally a subscription's subject
to be counted many times, which would then prevent the send of
an LS- when the subscription's interest goes away.
Imagine this sequence of events, each side represents a "thread"
of execution:
```
client readLoop leaf node readLoop
----------------------------------------------------------
recv SUB foo 1
sub added to account's sublist
recv CONNECT
auth, added to acc.
updateSmap
smap["foo"]++ -> 1
no LS+ because !allSubsSent
init smap
finds sub in acc sl
smap["foo"]++ -> 2
sends LS+ foo
allSubsSent == true
recv UNSUB 1
updateSmap
smap["foo"]-- -> 1
no LS- because count != 0
----------------------------------------------------------
```
Equivalent result but with slightly diffent execution:
```
client readLoop leaf node readLoop
----------------------------------------------------------
recv SUB foo 1
sub added to account's sublist
recv CONNECT
auth, added to acc.
init smap
finds sub in acc sl
smap["foo"]++ -> 1
sends LS+ foo
allSubsSent == true
updateSmap
smap["foo"]++ -> 2
no LS+ because count != 1
recv UNSUB 1
updateSmap
smap["foo"]-- -> 1
no LS- because count != 0
----------------------------------------------------------
```
The approach for the fix is delay the creation of the smap
until we actually initialize the map and send the subs on processing
of the CONNECT.
In the meantime, as soon as the LN connection is registered
and available in updateSmap, we check that smap is nil or
not. If nil, we do nothing.
In "init smap" we keep track of the subscriptions that have been
added to smap. This map will be short lived, just enough to
protect against races above.
In updateSmap, when smap is not nil, we need to checki, if we
are adding, that the subscription has not already been handled.
The tempory subscription map will be ultimately emptied/set to
nil with the use of a timer (if not emptied in place when
processing smap updates).
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This is a breaking change and will not be able to restore consumer's from a filestore when upgraded.
We are getting close to settling on the API an once that happens we will not be introducnig any
breaking changes.
Signed-off-by: Derek Collison <derek@nats.io>
Updated all tests that use "async" clients.
- start the writeLoop (this is in preparation for changes in the
server that will not do send-in-place for some protocols, such
as PING, etc..)
- Added missing defers in several tests
- fixed an issue in client.go where test was wrong possibly causing
a panic.
- Had to skip a test for now since it would fail without server code
change.
The next step will be ensure that all protocols are sent through
the writeLoop and that the data is properly flushed on close (important
for -ERR for instance).
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
In updateRouteSubscriptionMap(), when a queue sub is added/removed,
the code locks the account and then the route to send the update.
However, when a route is accepted and the subs are sent, the
opposite (locking wise) occurs. The route is locked, then the account.
This lock inversion is possible because a route is registered (added
to the server's map) and then the subs are sent.
Use a special lock to protect the send, but don't hold the acc.mu
lock while getting the route's lock.
The tests that were created for the original missed queue updates
issue, namely TestClusterLeaksSubscriptions() and
TestQueueSubWeightOrderMultipleConnections() pass with this change.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When a leaf or route connection is created, set the first ping
timer to fire at 1sec, which will allow to compute the RTT
reasonably soon (since the PingInterval could be user configured
and set much higher).
For Route in PR #1101, I was sending the PING on receiving the
INFO which required changing bunch of tests. Changing that to
also use the first timer interval of 1sec and reverted changes
to route tests.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Added the RTT field to each route reported in routez.
Ensure that when a route is accepted, we send a PING to compute
the first RTT and don't have to wait for the ping timer to fire.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
If each server has a long list of subscriptions, when the route
is established, sending this list could result in each server
treating the peer as a slow consumer, resulting in a reconnect,
etc..
Also bumping the fan-in threshold for route connections.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>