If a JS API request is received from a non client connection, it
was processed in its own go routine. To reduce the number of
such go routine, we were limiting the number of outstanding routines
to 4096. However, in some situations, it was possible to issue
many requests at the same time that would then cause those requests
to be dropped.
(an example was an MQTT benchmark tool that would create 5000
sessions, each with one QoS1 R1 consumer (with the use of consumer_replicas=1).
On abrupt exit of the tool, the consumers and their sessions needed
to be deleted. Since would cause fast incoming delete consumer requests
which would cause the original code to drop some of them)
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This enables lightweight distribution of messages to very large number of NATS subscribers.
We add in metadata as headers that allows for gap detection which enables initial value (via JetStream, maybe KV) and realtime NATS core updates but all globally ordered.
Signed-off-by: Derek Collison <derek@nats.io>
When updating usage, there is a lock inversion in that the jetStream
lock was acquired while under the stream's (mset) lock, which is
not correct. Also, updateUsage was locking the jsAccount lock, which
again, is not really correct since jsAccount contains streams, so
it should be jsAccount->stream, not the other way around.
Removed the locking of jetStream to check for clustered state since
js.clustered is immutable.
Replaced using jsAccount lock to update usage with a dedicated lock.
Originally moved all the update/limit fields in jsAccount to new
structure to make sure that I would see all code that is updating
or reading those fields, and also all functions so that I could
make sure that I use the new lock when calling these. Once that
works was done, and to reduce code changes, I put the fields back
into jsAccount (although I grouped them under the new usageMu mutex
field).
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Step down timing for consumers or streams.
Signals loss of leadership and sleeps before stepping down.
This makes it less likely that messages are being processed during step
down.
When becoming leader, consumer stream seqno got reset,
even though the consumer existed already.
Proper cleanup of redelivery data structures and timer
Signed-off-by: Matthias Hanel <mh@synadia.com>
* [Fixed] limits enforcement issues
stream create had checks that stream restore did not have.
Moved code into commonly used function checkStreamCfg.
Also introduced (cluster/non clustered) StreamLimitsCheck functions to
perform checks specific to clustered /non clustered data structures.
Checking for valid stream config and limits/reservations before
receiving all the data. Now fails the request right away.
Added a jetstream limit "max_request_batch" to limit fetch batch size
Shortened max name length from 256 to 255, more common file name limit
Added check for loop in cyclic source stream configurations
features related to limits
Signed-off-by: Matthias Hanel <mh@synadia.com>
- Wait of some sort of routing to be in place before starting
the raft run loop
- Remove use of lock in apiDispatch that was not necessary but
could have cause a route to block, causing memory growth, etc..
Unrelated rename of some tests so that they start with TestJetStream
and TestJetStreamCluster for cluster tests, fixed some flappers
and ensure that tests that change RAFT timeouts put them back
to default values on exit.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This broke cross-account functionality. Ported the test from the
Go client that showed the failure after PR#2997 was merged.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The system will allow an update to a stream, and subsequently all attached consumers, to be placed in another cluster either directly or via tag placement.
The meta layer will scale the underlying peerset appropriately to straddle the two clusters for both the stream and consumers, taking into account the consumer type.
Control will then pass to the current leaders of the assets who will monitor the catchup status of the new peers.
(Note we can optimize this later to only traverse once across a GW for any given asset, but for now this is simpler)
Once the original leaders have determined the assets are synched it will pass leadership to a member of the new peerset.
Once the new leader has been elected, it will forward a request for the meta layer to shrink the peerset by removing the old peers.
Signed-off-by: Derek Collison <derek@nats.io>
An example was a "consumer info" request with a consumer name that
had tokens, which is illegal. This results in the request being
dropped in apiDispatch() because there was no interest.
The server will now return a "bad request" error in such case.
Resolves#2995
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
* added max_ack_penind setting to js account limits
because of the addition, defaults now have to be set later (depend on
these new limits now)
also re-organized the code to closer track how stream create looks
Signed-off-by: Matthias Hanel <mh@synadia.com>
* Adding server limits (max ack pending/dedupe window) to js config
Also shifting consumer config check to jsConsumerCreate as in clustered
mode this was enforced in the wrong place
Signed-off-by: Matthias Hanel <mh@synadia.com>
Also fixed a bug where we were incorrectly not spining up the monitoring loop for a stream when going from 3->1->3.
Signed-off-by: Derek Collison <derek@nats.io>
startGoRoutine will execute the closed function as a go routine,
so passing copyBytes(msg) as the argument caused a race. The
copy needs to be done before startGoRoutine, as it was before
being changed in https://github.com/nats-io/nats-server/pull/2925
Here is the race observed:
```
==================
WARNING: DATA RACE
Write at 0x00c0001dd930 by goroutine 367:
runtime.racewriterange()
<autogenerated>:1 +0x29
internal/poll.ignoringEINTRIO()
/home/travis/.gimme/versions/go1.17.8.linux.amd64/src/internal/poll/fd_unix.go:582 +0x454
internal/poll.(*FD).Read()
/home/travis/.gimme/versions/go1.17.8.linux.amd64/src/internal/poll/fd_unix.go:163 +0x26
net.(*netFD).Read()
/home/travis/.gimme/versions/go1.17.8.linux.amd64/src/net/fd_posix.go:56 +0x50
net.(*conn).Read()
/home/travis/.gimme/versions/go1.17.8.linux.amd64/src/net/net.go:183 +0xb0
net.(*TCPConn).Read()
<autogenerated>:1 +0x64
github.com/nats-io/nats-server/v2/server.(*client).readLoop()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:1188 +0x8f7
github.com/nats-io/nats-server/v2/server.(*Server).createLeafNode.func1()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/leafnode.go:904 +0x5d
Previous read at 0x00c0001dd930 by goroutine 93:
runtime.slicecopy()
/home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/slice.go:284 +0x0
github.com/nats-io/nats-server/v2/server.copyBytes()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/util.go:282 +0x10b
github.com/nats-io/nats-server/v2/server.(*Server).jsStreamListRequest.func1()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_api.go:1613 +0x26
Goroutine 367 (running) created at:
github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:3017 +0x86
github.com/nats-io/nats-server/v2/server.(*Server).createLeafNode()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/leafnode.go:904 +0x1b08
github.com/nats-io/nats-server/v2/server.(*Server).startLeafNodeAcceptLoop.func1()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/leafnode.go:604 +0x4b
github.com/nats-io/nats-server/v2/server.(*Server).acceptConnections.func1()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:2122 +0x58
Goroutine 93 (running) created at:
github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:3017 +0x86
github.com/nats-io/nats-server/v2/server.(*Server).jsStreamListRequest()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_api.go:1613 +0xbf1
github.com/nats-io/nats-server/v2/server.(*Server).jsStreamListRequest-fm()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_api.go:1554 +0xcc
github.com/nats-io/nats-server/v2/server.(*jetStream).apiDispatch()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_api.go:680 +0xcf0
github.com/nats-io/nats-server/v2/server.(*jetStream).apiDispatch-fm()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_api.go:652 +0xcc
github.com/nats-io/nats-server/v2/server.(*client).deliverMsg()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3181 +0xbde
github.com/nats-io/nats-server/v2/server.(*client).processMsgResults()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4164 +0xe1e
github.com/nats-io/nats-server/v2/server.(*client).processInboundLeafMsg()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/leafnode.go:2183 +0x7eb
github.com/nats-io/nats-server/v2/server.(*client).processInboundMsg()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3498 +0xb1
github.com/nats-io/nats-server/v2/server.(*client).parse()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/parser.go:497 +0x3886
github.com/nats-io/nats-server/v2/server.(*client).readLoop()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:1228 +0x1669
github.com/nats-io/nats-server/v2/server.(*Server).createLeafNode.func1()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/leafnode.go:904 +0x5d
==================
testing.go:1152: race detected during execution of test
```
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Previously we would rely more heavily on Go's garbage collector since when we loaded a block for an underlying stream we would pass references upward to avoimd copies.
Now we always copy when passing back to the upper layers which allows us to not only expire our cache blocks but pool and reuse them.
The upper layers also had changes made to allow the pooling layer at that level to interoperate with the storage layer optionally.
Also fixed some flappers and a bug where de-dupe might not be reformed correctly.
Signed-off-by: Derek Collison <derek@nats.io>
Removed the warnings, instead have a sync.Map where they are
registered/unregistered and can be inspected with an undocumented
monitor page.
Added the notion of "in progress" which is the number of messages
that have beend pop()'ed. When recycle() is invoked this count
goes down.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
We also would hang if no stream info requests were sent during a stream list due to the asset being offline.
Signed-off-by: Derek Collison <derek@nats.io>
Also had to change all references from `path.` to `filepath.` when
dealing with files, so that it works properly on Windows.
Fixed also lots of tests to defer the shutdown of the server
after the removal of the storage, and fixed some config files
directories to use the single quote `'` to surround the file path,
again to work on Windows.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
In clustering mode, the number of consumers in stream info may be
wrong in presence of non durable consumers. Ephemeral are handled
by specific nodes. The StreamInfo response would contain only the
consumer count that the stream leader is handling.
This fix overrides the stream's state consumers count with the
number of consumers from the stream assignment record.
Resolves#2895
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This should help with GC pressure, however, it may have an effect
on performance (based on some benchmark). Calling sync.Pool.Get/Put
too often has a performance impact...
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This allows stream placement to overflow to adjacent clusters.
We also do more balanced placement based on resources (store or mem). We can continue to expand this as well.
We also introduce an account requirement that stream configs contain a MaxBytes value.
We now track account limits and server limits more distinctly, and do not reserver server resources based on account limits themselves.
Signed-off-by: Derek Collison <derek@nats.io>
Along a leaf node connection, unless the system account is shared AND the JetStream domain name is identical, the default JetStream traffic (without a domain set) will be denied.
As a consequence, all clients that wants to access a domain that is not the one in the server they are connected to, a domain name must be specified.
Affected from this change are setups where: a leaf node had no local JetStream OR the server the leaf node connected to had no local JetStream.
One of the two accounts that are connected via a leaf node remote, must have no JetStream enabled.
The side that does not have JetStream enabled, will loose JetStream access and it's clients must set `nats.Domain` manually.
For workarounds on how to restore the old behavior, look at:
https://github.com/nats-io/nats-server/pull/2693#issuecomment-996212582
New config values added:
`default_js_domain` is a mapping from account to domain, settable when JetStream is not enabled in an account.
`extension_hint` are hints for non clustered server to start in clustered mode (and be usable to extend)
`js_domain` is a way to set the JetStream domain to use for mqtt.
Signed-off-by: Matthias Hanel <mh@synadia.com>
When a consumer is configured with "meta-only" option, and the
stream was backed by a memory store, a memory corruption could
happen causing the application to receive corrupted headers.
Also replaced most of usage of `append(a[:0:0], a...)` to make
copies. This was based on this wiki:
https://github.com/go101/go101/wiki/How-to-efficiently-clone-a-slice%3F
But since Go 1.15, it is actually faster to call make+copy instead.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>