* Adding account purge operation
The new request is available for the system account.
The subject to send the request to is $JS.API.ACCOUNT.PURGE.*
With the name of the account to purge instead of the wildcard.
Also added directory cleanup code such that server do not
end up with empty streams directories and account dirs that
only contain streams
Also adding ACCOUNT to leaf node domain rewrite table
Addresses #3186 and #3306 by providing a way to
get rid of the streams for existing and non existing accounts
Signed-off-by: Matthias Hanel <mh@synadia.com>
- Send snapshot only if leader
- When processing snapshot, start with a smaller inactivity interval
that will double up to 10sec or use 10sec directly once we get a
message. Reason for that is that it is possible that the request
for snapshot is sent while the leader has not yet setup the subscription
that receives the requests (or subscription has not fully reached the
cluster).
- Don't remember snapfile on err.
- Do not consider current if we have not had any activity.
- Stabilize stream scale up under active heavy publishing.
- Due to the publish pressure move the check for followers direct subs spinning up til after we stop publishing.
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
If there are more stream names that the current limit of 1024, getting
the list of names would return them all instead of using pagination.
For "stream infos", the Total amount returned would be the API limit
instead of the actual number of streams.
Resolves https://github.com/nats-io/natscli/issues/541
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Now monitorStream waits with scaling down the stream until all
monitorConsumer have scaled down their respective consumer
Also update consumer assignment for later use in monitorConsumer
Same for stream assignment in monitorStream
Signed-off-by: Matthias Hanel <mh@synadia.com>
The `JSStreamNameExistErr` will now include in the description that
the stream exists with a different configuration, because that is
the error clients would get when trying to add a stream with a
different configuration (otherwise this is a no-op and client
don't get an error).
Since that error was used in case of restore, a new error is added
but uses the same description prefix "stream name already in use"
but adds ", cannot restore" to indicate that this is a restore
failure because the stream already exists.
Resolves#3273
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The Move/Cancel/Downscale mechanism did not take into account that
the consumer's replica count can be set independently.
This also alters peer selection to have the ability to skip
unique tag prefix check for server that will be replaced.
Say you have 3 az, and want to add another server to az:1,
in order to replace a server that is the same zone.
Without this change, uniqueTagPrefix check would filter
the server to replace with and cause a failure.
The cancel move response could not be received due to
the wrong account name.
Signed-off-by: Matthias Hanel <mh@synadia.com>
* add the ability to cancel a move in progress
Move to individual subjects for move and cancel_move
New subjects are:
$JS.API.ACCOUNT.STREAM.MOVE.*.*
$JS.API.ACCOUNT.STREAM.CANCEL_MOVE.*.*
last and second to last token are account and stream name
Signed-off-by: Matthias Hanel <mh@synadia.com>
A customer experienced and endless failure to have a stream cacthup. The current leader was being asked for a message from a snapshot that was larger then what we had, resulting in EOF which silently failed.
We now detect this and signal end of catchup and redo the bad snapshot if possible.
Signed-off-by: Derek Collison <derek@nats.io>
When increasing the replica count unique tags for already existing peers
where ignored, which could lead to bad placement
Signed-off-by: Matthias Hanel <mh@synadia.com>
Make sure when processing a peer removal that the stream assignment agrees.
When a new leader takes over it can resend a peer removal, and if the stream/consumer really was rescheduled we could remove by accident.
Also need to make sure that when we remove a stream we remove the node as part of the stream assignment.
If we didn't, if the same asset returned to this server we would not start up the monitoring loop.
Simplify migration logic in monitorStream, to be driven by leader only
Improved unit tests
Added failure when server not in peer list
Move command does not require server anymore
Signed-off-by: Matthias Hanel <mh@synadia.com>
Also added:
ability to reload tags
special tag (!jetstream) to remove peer from peer placement
$JS.API.SERVER.STREAM.MOVE subject to initiate move away from a server
This changes a detail about regular stream move as well.
Before differing cluster names where used to start/stop a transfer.
Now only the peer list and it's size relative to configured replica matter.
Once a transfer is considered completed, excess peers will be dropped
from the beginning of the list.
This allows transfers within the cluster as well.
Signed-off-by: Matthias Hanel <mh@synadia.com>
This could happen when a consumer had not sent anything to the
attached NATS subscription and there was a consumer leader
step down or server restart.
Signed-off-by: Derek Collison <derek@nats.io>
We were getting a data race checking the js.clustered field in
updateUsage() following fix for lock inversion in PR #3092.
```
=== RUN TestJetStreamClusterKVMultipleConcurrentCreate
==================
WARNING: DATA RACE
Read at 0x00c0009db5d8 by goroutine 195:
github.com/nats-io/nats-server/v2/server.(*jsAccount).updateUsage()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream.go:1681 +0x8f
github.com/nats-io/nats-server/v2/server.(*stream).storeUpdates()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/stream.go:2927 +0x1d9
github.com/nats-io/nats-server/v2/server.(*stream).storeUpdates-fm()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/stream.go:2905 +0x7d
github.com/nats-io/nats-server/v2/server.(*fileStore).removeMsg()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:2158 +0x14f7
github.com/nats-io/nats-server/v2/server.(*fileStore).expireMsgs()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:2777 +0x18f
github.com/nats-io/nats-server/v2/server.(*fileStore).expireMsgs-fm()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:2770 +0x39
Previous write at 0x00c0009db5d8 by goroutine 128:
github.com/nats-io/nats-server/v2/server.(*jetStream).setupMetaGroup()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:604 +0xfae
github.com/nats-io/nats-server/v2/server.(*Server).enableJetStreamClustering()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:514 +0x20a
github.com/nats-io/nats-server/v2/server.(*Server).enableJetStream()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream.go:400 +0x1168
github.com/nats-io/nats-server/v2/server.(*Server).EnableJetStream()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream.go:206 +0x651
github.com/nats-io/nats-server/v2/server.(*Server).Start()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:1746 +0x1804
github.com/nats-io/nats-server/v2/server.RunServer·dwrap·4269()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/server_test.go:90 +0x39
Goroutine 195 (running) created at:
time.goFunc()
/home/travis/.gimme/versions/go1.17.9.linux.amd64/src/time/sleep.go:180 +0x49
Goroutine 128 (finished) created at:
github.com/nats-io/nats-server/v2/server.RunServer()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/server_test.go:90 +0x278
github.com/nats-io/nats-server/v2/server.RunServerWithConfig()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/server_test.go:112 +0x44
github.com/nats-io/nats-server/v2/server.(*cluster).restartServer()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_helpers_test.go:1004 +0x1d5
github.com/nats-io/nats-server/v2/server.TestJetStreamClusterKVMultipleConcurrentCreate()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster_test.go:8463 +0x64b
testing.tRunner()
/home/travis/.gimme/versions/go1.17.9.linux.amd64/src/testing/testing.go:1259 +0x22f
testing.(*T).Run·dwrap·21()
/home/travis/.gimme/versions/go1.17.9.linux.amd64/src/testing/testing.go:1306 +0x47
==================
```
Running that test with adding some delay in several places also showed another race:
```
==================
WARNING: DATA RACE
Read at 0x00c00016adb8 by goroutine 160:
github.com/nats-io/nats-server/v2/server.(*fileStore).expireMsgs()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/filestore.go:2777 +0x106
github.com/nats-io/nats-server/v2/server.(*fileStore).expireMsgs-fm()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/filestore.go:2771 +0x39
Previous write at 0x00c00016adb8 by goroutine 32:
github.com/nats-io/nats-server/v2/server.(*fileStore).UpdateConfig()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/filestore.go:360 +0x1c8
github.com/nats-io/nats-server/v2/server.(*stream).update()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/stream.go:1360 +0x852
github.com/nats-io/nats-server/v2/server.(*jetStream).processClusterCreateStream()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:2704 +0x4a4
github.com/nats-io/nats-server/v2/server.(*jetStream).processStreamAssignment()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:2452 +0xad9
github.com/nats-io/nats-server/v2/server.(*jetStream).applyMetaEntries()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:1407 +0x7e4
github.com/nats-io/nats-server/v2/server.(*jetStream).monitorCluster()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:887 +0xc75
github.com/nats-io/nats-server/v2/server.(*jetStream).monitorCluster-fm()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:813 +0x39
Goroutine 160 (running) created at:
time.goFunc()
/usr/local/go/src/time/sleep.go:180 +0x49
Goroutine 32 (running) created at:
github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/server.go:3013 +0x86
github.com/nats-io/nats-server/v2/server.(*jetStream).setupMetaGroup()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:612 +0x1092
github.com/nats-io/nats-server/v2/server.(*Server).enableJetStreamClustering()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:514 +0x20a
github.com/nats-io/nats-server/v2/server.(*Server).enableJetStream()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream.go:400 +0x1168
github.com/nats-io/nats-server/v2/server.(*Server).EnableJetStream()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream.go:206 +0x651
github.com/nats-io/nats-server/v2/server.(*Server).Start()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/server.go:1746 +0x1804
github.com/nats-io/nats-server/v2/server.RunServer·dwrap·4275()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/server_test.go:90 +0x39
==================
```
Both are now addressed, either with proper locking, or with the use of an atomic in the place
where we cannot get the lock (without re-introducing the lock inversion issue).
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When updating usage, there is a lock inversion in that the jetStream
lock was acquired while under the stream's (mset) lock, which is
not correct. Also, updateUsage was locking the jsAccount lock, which
again, is not really correct since jsAccount contains streams, so
it should be jsAccount->stream, not the other way around.
Removed the locking of jetStream to check for clustered state since
js.clustered is immutable.
Replaced using jsAccount lock to update usage with a dedicated lock.
Originally moved all the update/limit fields in jsAccount to new
structure to make sure that I would see all code that is updating
or reading those fields, and also all functions so that I could
make sure that I use the new lock when calling these. Once that
works was done, and to reduce code changes, I put the fields back
into jsAccount (although I grouped them under the new usageMu mutex
field).
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When deciding to compact a file, we need to remove from the raw
bytes the empty records, otherwise, for small messages, we would
end-up calling compact() too many times.
When removing a message from the stream, in FIFO cases we would
write the index every 2 seconds at most when doing it in place,
when when dealing with out of order deletes, we would do it for
every single delete, which can be costly. We are now writing
only every 500ms for non FIFO cases.
Also fixed some unrelated code:
- Decision to install a snapshot was based on incorrect logical
expression
- In checkPending(), protect against the timer being nil which
could happen when consumer is stopped or leadership change.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Step down timing for consumers or streams.
Signals loss of leadership and sleeps before stepping down.
This makes it less likely that messages are being processed during step
down.
When becoming leader, consumer stream seqno got reset,
even though the consumer existed already.
Proper cleanup of redelivery data structures and timer
Signed-off-by: Matthias Hanel <mh@synadia.com>
This is a continuation of PR #3060, but extends to clustering.
Verified with manual test that a mirror created with v2.7.4 has
the duplicates window set and on restart with main would still
complain about use of dedup in cluster mode. The mirror stream
was recovered but showing as R1.
With this fix, a restart of the cluster - with existing data -
will properly recover the stream as an R3 and messages that
were published while in a bad state are synchronized.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Matthias Hanel mh@synadia.com
* [Fixed] limits enforcement issues
stream create had checks that stream restore did not have.
Moved code into commonly used function checkStreamCfg.
Also introduced (cluster/non clustered) StreamLimitsCheck functions to
perform checks specific to clustered /non clustered data structures.
Checking for valid stream config and limits/reservations before
receiving all the data. Now fails the request right away.
Added a jetstream limit "max_request_batch" to limit fetch batch size
Shortened max name length from 256 to 255, more common file name limit
Added check for loop in cyclic source stream configurations
features related to limits
Signed-off-by: Matthias Hanel <mh@synadia.com>