Commit Graph

341 Commits

Ivan Kozlovic
5573933034 Bump back the defaultMaxTotalCatchupOutBytes to 128MB
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-31 09:19:28 -06:00
Derek Collison
98bf861a7a Updates to stream and consumer move logic.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-30 16:11:35 -07:00
Derek Collison
56e177c329 Allow stream msgs to be compressed within the raft layer and during upper layer catchups.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-30 16:10:57 -07:00
Ivan Kozlovic
9a6a2c31ee [ADDED] JetStream: Ability to configure the per server max catchup bytes
The original values were hardcoded: 128MB per server and 32MB per
stream. The per-server limit is lowered to 32MB but is now
configurable with a new configuration parameter:
```
jetstream {
   max_catchup: 8MB
}
```

The per-stream limit was also lowered from 32MB/128,000 messages to
8MB/32,000 messages.

Tests have shown no difference in performance for fast links.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-30 13:46:13 -06:00
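A minimal sketch of the idea behind a per-server catch-up byte budget; `catchupBudget` and its fields are illustrative assumptions, not the server's actual internals.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// catchupBudget caps the bytes outstanding across catch-ups on one server.
// Hypothetical illustration of the max_catchup idea, not nats-server code.
type catchupBudget struct {
	max         int64 // e.g. 128 * 1024 * 1024 for a 128MB cap
	outstanding int64
}

// tryReserve reserves n bytes if the budget allows, returning false otherwise.
func (b *catchupBudget) tryReserve(n int64) bool {
	for {
		cur := atomic.LoadInt64(&b.outstanding)
		if cur+n > b.max {
			return false // caller should pause until acks free up space
		}
		if atomic.CompareAndSwapInt64(&b.outstanding, cur, cur+n) {
			return true
		}
	}
}

// release returns n bytes to the budget once the follower acks them.
func (b *catchupBudget) release(n int64) {
	atomic.AddInt64(&b.outstanding, -n)
}

func main() {
	b := &catchupBudget{max: 128 * 1024 * 1024}
	fmt.Println(b.tryReserve(64 * 1024 * 1024)) // true
	fmt.Println(b.tryReserve(96 * 1024 * 1024)) // false, would exceed the cap
	b.release(64 * 1024 * 1024)
	fmt.Println(b.tryReserve(96 * 1024 * 1024)) // true again
}
```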
Ivan Kozlovic
e609d12061 [FIXED] Stream info numbers may be 0 after cluster restart
This would happen after multiple replicas changes.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-30 08:49:39 -06:00
Ivan Kozlovic
8c23bfea5d Revert a change made in PR #3392
It seems to cause problems when upgrading from v2.7.4 to the main
branch.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-25 14:15:59 -06:00
Matthias Hanel
970491debc scale down happened too soon
The scale down was happening while currentCount != replicas.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-23 17:44:56 -07:00
Derek Collison
212adf5775 General improvements to clustered streams during server restart and KV/CAS scenarios.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-22 18:36:15 -07:00
Ivan Kozlovic
5663bc2fa3 Reduce length of some clustering tests
Since PR #3381, the 2 tests modified here would take twice as
long (around 245 seconds) to complete.
Talking with Matthias, he suggested using a variable instead of
a const and setting it to 0 for those 2 tests, since they don't
really need that delay.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-22 12:35:37 -06:00
Ivan Kozlovic
b1822e1b4c Some more checks for cc.meta == nil
Missed those when re-running the previous test for a longer
period of time.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-22 11:06:04 -06:00
Ivan Kozlovic
c30445657f Fixed possible panic in monitorStream
Saw this panic in code coverage run:
```
=== RUN   TestJetStreamClusterPeerExclusionTag
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x88 pc=0x8acd55]

goroutine 97850 [running]:
github.com/nats-io/nats-server/v2/server.(*jetStream).monitorStream(0xc002b94780, 0xc001ecb500, 0xc003229b00, 0x0)
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:1653 +0x495
github.com/nats-io/nats-server/v2/server.(*jetStream).processClusterCreateStream.func1()
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:2953 +0x3b
created by github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/server.go:3063 +0xa7
```

Was able to reproduce it; the reason was that `meta` was nil.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-22 09:52:05 -06:00
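To illustrate the class of fix, here is a minimal, self-contained sketch of guarding a monitoring goroutine against nil cluster state; the types are simplified stand-ins for those in the stack trace, not the server's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// Simplified stand-ins for the server types in the stack trace above.
type raftNode struct{ id string }

func (n *raftNode) ID() string { return n.id }

type jetStreamCluster struct{ meta *raftNode }

type jetStream struct {
	mu      sync.RWMutex
	cluster *jetStreamCluster
}

// monitorStream sketches the fix: snapshot cc.meta under the lock and bail
// out if it is nil (e.g. during shutdown) instead of dereferencing it.
func (js *jetStream) monitorStream() {
	js.mu.RLock()
	cc := js.cluster
	if cc == nil || cc.meta == nil {
		js.mu.RUnlock()
		fmt.Println("cluster meta not available, exiting monitor")
		return
	}
	ourID := cc.meta.ID()
	js.mu.RUnlock()
	fmt.Println("monitoring as peer", ourID)
}

func main() {
	(&jetStream{cluster: &jetStreamCluster{}}).monitorStream() // meta nil: no panic
	(&jetStream{cluster: &jetStreamCluster{meta: &raftNode{id: "jZ6RvVRH"}}}).monitorStream()
}
```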
Matthias Hanel
6bf50dbb77 induce delay prior to scale down (#3381)
This is to avoid a narrow race between adding servers and their
catching up, where they also register as current.

Also wait for all peers to be caught up.

This also avoids clearing the catchup marker once a catchup has
stalled. A stalled catchup would remove the marker, causing the
peer to register as current.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-18 13:47:40 -07:00
Matthias Hanel
9892a132e7 Improve StreamMoveInProgressError (#3376)
by adding progress indicators

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-17 15:12:32 -07:00
Derek Collison
9c9de656c6 We can't purge directories here since we're not 100% sure all state is in the snapshot.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-17 14:57:19 -07:00
Ivan Kozlovic
7de4497815 Install consumer snapshot on clean exit and few other fixes
- didRemove in applyMetaEntries() could be reset when processing
multiple entries
- change "no race" test names to include JetStream
- separate raft node leader stepdown and stop in the server
shutdown process
- in InstallSnapshot, call wal.Compact() with lastIndex+1

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-16 17:05:49 -06:00
Matthias Hanel
c6e37cf7af Fix race between stream stop and monitorStream (#3350)
* Fix race between stream stop and monitorStream

monitorCluster stops the stream; when doing so, monitorStream
needs to be stopped as well to avoid miscounting the store size.
In a test, the stop and reset of the store size happened first,
followed by storing more messages via monitorStream.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-10 19:01:21 +02:00
Ivan Kozlovic
502e5b13f7 Declare some catchup static errors
Use `var .. = errors.New()`.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-08 17:51:31 -06:00
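A short sketch of the pattern the commit describes: declaring catchup errors once as package-level sentinels rather than allocating a new error at each failure site. The error names and messages are illustrative, not the server's actual identifiers.

```go
package main

import (
	"errors"
	"fmt"
)

// Static sentinel errors, declared once with `var .. = errors.New()`.
// Names and messages are hypothetical examples.
var (
	errCatchupStalled         = errors.New("catchup stalled")
	errCatchupCorruptSnapshot = errors.New("corrupt catchup snapshot")
	errCatchupAbortedNoLeader = errors.New("catchup aborted, no leader")
)

func runCatchup(healthy bool) error {
	if !healthy {
		return errCatchupStalled // no allocation; reuses the sentinel
	}
	return nil
}

func main() {
	// Sentinels allow cheap, reliable identity checks with errors.Is.
	if err := runCatchup(false); errors.Is(err, errCatchupStalled) {
		fmt.Println("retrying after:", err)
	}
}
```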
Ivan Kozlovic
ecddb08469 [IMPROVED] JetStream catchup can be aborted and better flow control
If the leader sends messages but the follower for any reason aborts
or retries the snapshot process, the follower will now send the
error that caused this, and the leader can then abort the catchup
instead of waiting for its 5-second inactivity threshold.

Also delay the send of a batch until the number of "acks" reaches
1/2 of the batch size or 100ms has elapsed. This helps avoid
trickling of messages. Tested with the new test
TestJetStreamSuperClusterStreamCathupLongRTT(); batch sizes are
better and the overall time is smaller or similar, but not longer.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-08 17:19:36 -06:00
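A runnable sketch of the batching rule described above: hold the next batch until half of the previous one has been acked or 100ms has elapsed. The channel shape and batch sizes are assumptions for illustration, not the server's wire protocol.

```go
package main

import (
	"fmt"
	"time"
)

// sendBatches sends a few batches, delaying each subsequent send until 1/2
// of the previous batch is acked or 100ms has passed, whichever comes first.
func sendBatches(acks <-chan struct{}, batchSize int, send func(int)) {
	const maxWait = 100 * time.Millisecond
	for batch := 1; batch <= 3; batch++ {
		send(batch)
		timer := time.NewTimer(maxWait)
		acked := 0
		for acked < batchSize/2 {
			select {
			case <-acks:
				acked++
			case <-timer.C:
				acked = batchSize // stop waiting; send the next batch
			}
		}
		timer.Stop()
	}
}

func main() {
	acks := make(chan struct{}, 64)
	go func() { // simulate a follower acking messages
		for {
			time.Sleep(5 * time.Millisecond)
			acks <- struct{}{}
		}
	}()
	sendBatches(acks, 8, func(n int) { fmt.Println("sending batch", n) })
}
```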
Derek Collison
06112d6885 Reset activity interval on catchup to default vs ramp up. Tweak test.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
Derek Collison
758b733d43 Attempt to improve long RTT catchup time during stream moves.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
Derek Collison
e635de7526 Additional stability improvements for catchup.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
Derek Collison
5a050fc10b Improve handling when a snapshot represents state we no longer have.
We would send skip messages for a sync request that was completely below our current state, but this could be more traffic than we might want.
Now we only send EOF and the other side can detect the skip forward and adjust on a successful catchup.
We still send skips if we can partially fill the sync request.

Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:08 -06:00
Ivan Kozlovic
d96e801825 Change the report to something like this instead:
```
Replica: Server name unknown at this time (peerID: jZ6RvVRH), outdated, OFFLINE, not seen
```
After discussing with @ripienaar, this text better conveys
that this is a transient situation.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-08 09:30:37 -06:00
Ivan Kozlovic
267e6d1958 [IMPROVED] Replicas ordering and info regarding unknown in stream info
If a cluster is brought down and then partially restarted, the
replica information about the non-restarted node would be completely
missing. The CLI could report 3 replicas but show only the leader
and the running replicas, with nothing about the other node.
Since this node's server name is not known, this PR adds an entry
similar to this:
```
<unknown (peerID: jZ6RvVRH)>, outdated, OFFLINE, not seen
```

Also, the replicas array is now ordered, which will help when
using a watcher or repeating stream info commands: the output
will be stable with regard to the list of replicas.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-07 18:54:26 -06:00
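A small sketch of the ordering change: sorting the replicas slice by name so repeated stream info output is stable. `PeerInfo` here is a simplified stand-in for the response type.

```go
package main

import (
	"fmt"
	"sort"
)

// PeerInfo is a simplified stand-in for a replica entry in stream info.
type PeerInfo struct {
	Name    string
	Current bool
	Offline bool
}

func main() {
	replicas := []*PeerInfo{
		{Name: "S3", Current: true},
		{Name: "<unknown (peerID: jZ6RvVRH)>", Offline: true},
		{Name: "S1", Current: true},
	}
	// Sort by name so watchers and repeated commands see a stable order.
	sort.Slice(replicas, func(i, j int) bool {
		return replicas[i].Name < replicas[j].Name
	})
	for _, r := range replicas {
		fmt.Printf("%-35s current=%v offline=%v\n", r.Name, r.Current, r.Offline)
	}
}
```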
Matthias Hanel
52c4872666 better error when peer selection fails (#3342)
* better error when peer selection fails

It is pretty hard to diagnose what went wrong when not enough peers
for an operation were found. This change now returns counts of the
reasons why peers were discarded.

Changed the error to JSClusterNoPeers as it seems a more appropriate
error for that operation. Not having enough resources is one of the
conditions for a peer not being considered, but so is having a
non-matching tag, which is why JSClusterNoPeers seems more appropriate.
In addition, JSClusterNoPeers was already used as the error after one
call to selectPeerGroup.

Example:
```
no suitable peers for placement: peer selection cluster 'C' with 3 peers
offline: 0
excludeTag: 1
noTagMatch: 2
noSpace: 0
uniqueTag: 0
misc: 0
```

Example for MQTT:
```
mid:12 - "mqtt" - unable to connect: create sessions stream for account "$G":
no suitable peers for placement: peer selection cluster 'MQTT' with 3 peers
        offline: 0
        excludeTag: 0
        noTagMatch: 0
        noSpace: 0
        uniqueTag: 0
        misc: 0
         (10005)
```

Signed-off-by: Matthias Hanel <mh@synadia.com>

* review comment

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-06 00:17:01 +02:00
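A sketch of how such a discard-reason tally might be built and rendered as an error; the struct and field names are illustrative assumptions, not the server's actual types.

```go
package main

import "fmt"

// selectionReport tallies why candidate peers were discarded, mirroring the
// error output shown above. Hypothetical illustration only.
type selectionReport struct {
	cluster string
	peers   int

	offline, excludeTag, noTagMatch, noSpace, uniqueTag, misc int
}

// Error renders the counts, making selectionReport usable as an error value.
func (r *selectionReport) Error() string {
	return fmt.Sprintf(
		"no suitable peers for placement: peer selection cluster '%s' with %d peers\n"+
			"\toffline: %d\n\texcludeTag: %d\n\tnoTagMatch: %d\n"+
			"\tnoSpace: %d\n\tuniqueTag: %d\n\tmisc: %d",
		r.cluster, r.peers, r.offline, r.excludeTag, r.noTagMatch,
		r.noSpace, r.uniqueTag, r.misc)
}

func main() {
	var err error = &selectionReport{cluster: "C", peers: 3, excludeTag: 1, noTagMatch: 2}
	fmt.Println(err)
}
```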
Matthias Hanel
c56f3b9fbd Adding account purge operation (#3319)
* Adding account purge operation

The new request is available on the system account.
The subject to send the request to is $JS.API.ACCOUNT.PURGE.*,
with the name of the account to purge in place of the wildcard.

Also added directory cleanup code so that servers do not
end up with empty stream directories or account dirs that
only contain streams.

Also adds ACCOUNT to the leaf node domain rewrite table.

Addresses #3186 and #3306 by providing a way to
get rid of the streams of existing and non-existing accounts.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-05 18:24:19 +02:00
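Given the subject above, a purge request could be sent from a system-account connection roughly like this; the URL, credentials file, and account name are placeholders for your own deployment.

```go
package main

import (
	"fmt"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	// Connect as a system-account user (placeholder URL and creds path).
	nc, err := nats.Connect("nats://127.0.0.1:4222",
		nats.UserCredentials("sys.creds"))
	if err != nil {
		panic(err)
	}
	defer nc.Close()

	// Replace the wildcard in $JS.API.ACCOUNT.PURGE.* with the account
	// to purge, per the commit message above.
	resp, err := nc.Request("$JS.API.ACCOUNT.PURGE.MY_ACCOUNT", nil, 2*time.Second)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(resp.Data))
}
```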
Derek Collison
5e98263de8 General stability improvements
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-29 16:02:31 -07:00
Derek Collison
50a25881e2 Encrypt meta and raft states.
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-29 08:10:57 -07:00
Ivan Kozlovic
5786d2d9d6 Changed "return" to "continue"
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-07-27 18:23:54 -06:00
Ivan Kozlovic
88203dd5d5 Fixed a panic when consumer is closed
Panic was:
```
=== RUN   TestJetStreamClusterDelete
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xcec8fb]
goroutine 1761 [running]:
github.com/nats-io/nats-server/v2/server.(*stream).config(0x0)
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/stream.go:1192 +0x5b
github.com/nats-io/nats-server/v2/server.(*consumer).replica(0xc000101400)
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:3580 +0xea
github.com/nats-io/nats-server/v2/server.(*jetStream).monitorConsumer(0xc0001d2790, 0xc000101400, 0xc0004df0e0)
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:3733 +0xe06
github.com/nats-io/nats-server/v2/server.(*jetStream).processClusterCreateConsumer.func1()
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:3445 +0x4d
created by github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:3057 +0x85
FAIL	github.com/nats-io/nats-server/v2/server	9.911s
```

Seems to have been introduced in #3282.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-07-27 16:51:10 -06:00
Matthias Hanel
3358205de3 add implementation for consumer replica change (#3293)
* add implementation for consumer replica change

fixes #3262

also check peer list on every update

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-07-27 03:56:28 +02:00
Matthias Hanel
04ffed48b0 fix peer tracking by removing peers before scaledown (#3289)
In doRemovePeerAsLeader, the leader also records the removed peer in the removed set.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-07-26 22:01:03 +02:00
Matthias Hanel
6212087feb fix race by locking around o.isLeader (#3291)
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-07-26 21:49:04 +02:00
Ivan Kozlovic
fe370955c8 Merge pull request #3288 from nats-io/debug_test_failure
[FIXED] JetStream: Some scaling up issues
2022-07-26 08:57:17 -06:00
Ivan Kozlovic
1a6c5f1c90 [FIXED] JetStream: Some scaling up issues
- Send snapshot only if leader.
- When processing a snapshot, start with a smaller inactivity interval
  that doubles up to 10sec, or use 10sec directly once we get a
  message. The reason is that it is possible that the request for the
  snapshot is sent while the leader has not yet set up the subscription
  that receives the requests (or the subscription has not fully
  propagated through the cluster).
- Don't remember snapfile on error.
- Do not consider the peer current if we have not had any activity.
- Stabilize stream scale-up under active heavy publishing.
- Due to the publish pressure, move the check for followers' direct
  subs spinning up until after we stop publishing.

Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-07-25 18:44:18 -06:00
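A sketch of the second bullet's backoff, with assumed values: start with a short inactivity interval, double it up to a 10-second cap, and jump straight to the cap once the first message arrives.

```go
package main

import (
	"fmt"
	"time"
)

// nextInterval doubles the inactivity interval up to a 10s cap, or returns
// the cap outright once a message has been received. Values are illustrative.
func nextInterval(cur time.Duration, gotMsg bool) time.Duration {
	const max = 10 * time.Second
	if gotMsg {
		return max
	}
	if cur *= 2; cur > max {
		cur = max
	}
	return cur
}

func main() {
	iv := 625 * time.Millisecond // hypothetical starting interval
	for i := 0; i < 6; i++ {
		fmt.Println(iv)
		iv = nextInterval(iv, false)
	}
	fmt.Println("after first message:", nextInterval(iv, true))
}
```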
Ivan Kozlovic
ebeca00e20 [FIXED] JetStream/Cluster: Stream names/infos would return bad response
If there are more stream names than the current limit of 1024, getting
the list of names would return them all instead of using pagination.

For "stream infos", the Total returned would be the API limit
instead of the actual number of streams.

Resolves https://github.com/nats-io/natscli/issues/541

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-07-25 14:41:05 -06:00
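A minimal sketch of offset-based pagination as described above: each response carries at most `limit` names plus the true total, so clients can page rather than receive everything at once. Function and variable names are illustrative.

```go
package main

import "fmt"

// pageNames returns one page of names and the true total count.
func pageNames(all []string, offset, limit int) (page []string, total int) {
	total = len(all)
	if offset >= total {
		return nil, total
	}
	end := offset + limit
	if end > total {
		end = total
	}
	return all[offset:end], total
}

func main() {
	names := make([]string, 2500) // more streams than the 1024 limit
	for i := range names {
		names[i] = fmt.Sprintf("STREAM_%04d", i)
	}
	const apiLimit = 1024
	for offset := 0; ; offset += apiLimit {
		page, total := pageNames(names, offset, apiLimit)
		if len(page) == 0 {
			break
		}
		fmt.Printf("offset=%d count=%d total=%d\n", offset, len(page), total)
	}
}
```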
Matthias Hanel
5a720d4977 down scale consumer before downscale of stream (#3282)
Now monitorStream waits to scale down the stream until all
monitorConsumer routines have scaled down their respective consumers.

Also update the consumer assignment for later use in monitorConsumer;
same for the stream assignment in monitorStream.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-07-22 19:54:13 +02:00
Ivan Kozlovic
a02a617c05 Merge pull request #3280 from nats-io/fix_3273
[IMPROVED] JetStream: stream already exists error description
2022-07-21 10:53:47 -06:00
Ivan Kozlovic
1da5ecfb96 [IMPROVED] JetStream: stream already exists error description
The `JSStreamNameExistErr` will now include in the description that
the stream exists with a different configuration, because that is
the error clients would get when trying to add a stream with a
different configuration (otherwise this is a no-op and clients
don't get an error).

Since that error was also used in the restore case, a new error is
added that uses the same description prefix "stream name already in
use" but adds ", cannot restore" to indicate that the restore failed
because the stream already exists.

Resolves #3273

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-07-21 10:20:07 -06:00
Derek Collison
f2abdaeb43 Make sure to protect against mset == nil
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-21 06:53:26 -07:00
Matthias Hanel
89b5e872ac Move and cancel fixes (#3270)
The Move/Cancel/Downscale mechanism did not take into account that
the consumer's replica count can be set independently.

This also alters peer selection so it can skip the unique tag
prefix check for the server that will be replaced.
Say you have 3 AZs and want to add another server to az:1
in order to replace a server in the same zone.
Without this change, the uniqueTagPrefix check would filter out
the replacement server and cause a failure.

The cancel-move response could not be received due to
a wrong account name.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-07-18 18:42:03 +02:00
Matthias Hanel
023500e1da add the ability to cancel a move in progress (#3253)
* add the ability to cancel a move in progress

Move to individual subjects for move and cancel_move.

The new subjects are:
```
$JS.API.ACCOUNT.STREAM.MOVE.*.*
$JS.API.ACCOUNT.STREAM.CANCEL_MOVE.*.*
```

The last two tokens are the account and stream name.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-07-12 21:54:18 +02:00
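Composing the concrete subjects from the templates above; the token order (account before stream) and anything about the request payload are assumptions here, so treat this as an illustration rather than a reference.

```go
package main

import "fmt"

// Subject templates from the commit above; the two trailing tokens carry
// the account and stream name (order assumed for illustration).
const (
	moveT       = "$JS.API.ACCOUNT.STREAM.MOVE.%s.%s"
	cancelMoveT = "$JS.API.ACCOUNT.STREAM.CANCEL_MOVE.%s.%s"
)

func main() {
	account, stream := "MY_ACCOUNT", "ORDERS" // placeholders
	fmt.Printf(moveT+"\n", account, stream)
	fmt.Printf(cancelMoveT+"\n", account, stream)
}
```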
Derek Collison
85123861d4 Merge pull request #3249 from nats-io/catchup_eof
Fix for stalled catchup in endless cycle on EOF
2022-07-07 17:54:07 -07:00
Derek Collison
333e2fc2f1 Fix for stalled catchup in endless cycle on EOF trying to retrieve catchup msg.
A customer experienced an endless failure to have a stream catch up. The current leader was being asked for a message from a snapshot that was larger than what we had, resulting in an EOF which silently failed.
We now detect this and signal end of catchup and redo the bad snapshot if possible.

Signed-off-by: Derek Collison <derek@nats.io>
2022-07-07 13:42:41 -07:00
Matthias Hanel
f0ee56cf0a Fix unique_tag issue with stream replica increase
When increasing the replica count, unique tags for already existing
peers were ignored, which could lead to bad placement.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-07-07 21:22:55 +02:00
Derek Collison
c49d081341 Fix data race
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-07 09:05:50 -07:00
Matthias Hanel
70be4b77f9 fixes peer removal, simplifies move, more tests
Make sure when processing a peer removal that the stream assignment agrees.
When a new leader takes over, it can resend a peer removal, and if the stream/consumer really was rescheduled, we could remove it by accident.

Also need to make sure that when we remove a stream we remove the node as part of the stream assignment.
If we didn't, and the same asset returned to this server, we would not start up the monitoring loop.

Simplify migration logic in monitorStream to be driven by the leader only.

Improved unit tests

Added failure when server not in peer list

Move command does not require server anymore

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-07-07 03:32:13 +02:00
Derek Collison
722ae548dd Fix data race
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-06 09:11:22 -07:00
Derek Collison
47bef915ed Allow all members of a replicated stream to participate in direct access.
We will wait until a non-leader replica is current before subscribing.

Signed-off-by: Derek Collison <derek@nats.io>
2022-07-03 11:08:24 -07:00
Matthias Hanel
6bd14e1b7a removed commented out code (#3228)
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-06-29 20:31:12 +02:00