
3742 Commits

Author SHA1 Message Date
Ivan Kozlovic
1ba617bba0 Fixed data race with RAFT node election timer
Got this race:
```
==================
WARNING: DATA RACE
Read at 0x00c001c880e8 by goroutine 342:
  github.com/nats-io/nats-server/v2/server.(*raft).resetElect()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/raft.go:1525 +0x44
  github.com/nats-io/nats-server/v2/server.(*raft).resetElectionTimeout()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/raft.go:1520 +0xa4
  github.com/nats-io/nats-server/v2/server.(*raft).handleAppendEntry()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/raft.go:2537 +0x12e
  github.com/nats-io/nats-server/v2/server.(*raft).handleAppendEntry-fm()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/raft.go:2525 +0xcc
...

Previous write at 0x00c001c880e8 by goroutine 587:
  github.com/nats-io/nats-server/v2/server.(*raft).resetElect()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/raft.go:1526 +0x113
  github.com/nats-io/nats-server/v2/server.(*raft).resetElectionTimeout()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/raft.go:1520 +0xa4
  github.com/nats-io/nats-server/v2/server.(*Server).startRaftNode()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/raft.go:484 +0x20d1
  github.com/nats-io/nats-server/v2/server.(*jetStream).createRaftGroup()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:1497 +0x9ed
  github.com/nats-io/nats-server/v2/server.(*jetStream).processClusterCreateConsumer()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:3063 +0xba4
...

==================
WARNING: DATA RACE
Read at 0x00c0006671f0 by goroutine 342:
  time.(*Timer).Stop()
      /usr/local/go/src/time/sleep.go:78 +0x84
  github.com/nats-io/nats-server/v2/server.(*raft).resetElect()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/raft.go:1528 +0x58
  github.com/nats-io/nats-server/v2/server.(*raft).resetElectionTimeout()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/raft.go:1520 +0xa4
  github.com/nats-io/nats-server/v2/server.(*raft).handleAppendEntry()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/raft.go:2537 +0x12e
  github.com/nats-io/nats-server/v2/server.(*raft).handleAppendEntry-fm()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/raft.go:2525 +0xcc
...

Previous write at 0x00c0006671f0 by goroutine 587:
  time.NewTimer()
      /usr/local/go/src/time/sleep.go:92 +0xb3
  github.com/nats-io/nats-server/v2/server.(*raft).resetElect()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/raft.go:1526 +0x104
  github.com/nats-io/nats-server/v2/server.(*raft).resetElectionTimeout()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/raft.go:1520 +0xa4
  github.com/nats-io/nats-server/v2/server.(*Server).startRaftNode()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/raft.go:484 +0x20d1
  github.com/nats-io/nats-server/v2/server.(*jetStream).createRaftGroup()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:1497 +0x9ed
...
```

Looked at all places where resetElect() or resetElectionTimeout() was invoked without
holding the raft's lock, and added the locking there.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-12 18:56:28 -06:00
Ivan Kozlovic
37a3403585 Bump to version 2.8.0-beta.15
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-12 17:50:21 -06:00
Derek Collison
3c0bced76e Move test to no race, rename others
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-12 16:23:36 -07:00
Derek Collison
3bd8ee845e Fix description for Wipe
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-12 16:18:38 -07:00
Derek Collison
04db6b0935 Only wipe on certain errors and always resume
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-12 15:50:37 -07:00
Ivan Kozlovic
50c3986863 [FIXED] JetStream stream catchup issues
- A stream could become leader when it should not, causing
messages to be lost.
- A catchup could stall because the server sending data
could bail out of the runCatchup routine but still send
the EOF signal.
- Deadlock with monitoring of Jsz

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-12 16:05:12 -06:00
Derek Collison
5dfcc5e934 Fix for flapping WAL test
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-11 22:50:25 -07:00
Derek Collison
ce650937f0 Don't set domain here
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-11 20:52:22 -07:00
Matthias Hanel
0f113aa3d5 [FIXED] subject renaming with hand crafted reply subject (#3026)
Fixed by rejecting jsackprefix in reply subjects.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-11 22:32:02 -04:00
Derek Collison
aa256de55b Add in Domain to alternates
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-11 18:47:19 -06:00
Derek Collison
b7718e2b7a First pass support for stream alternates
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-11 18:47:19 -06:00
Derek Collison
0979c9f720 Bump to 2.8.0-beta.14
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-11 17:41:57 -07:00
Derek Collison
04cce6df68 Merge pull request #3020 from nats-io/move-updates
[IMPROVED] Raft layer for general stability and leader election.
2022-04-11 17:33:13 -07:00
Matthias Hanel
02d25cc640 [FIXED] Consumer deliver subject incorrect when imported and crossing gateway (#3025)
Follow-up to #3017.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-11 20:27:25 -04:00
Derek Collison
e330572cef Select next leader before truncating
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-11 17:04:29 -07:00
Derek Collison
3ed1ecc032 Remove old code
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-11 12:00:29 -07:00
Jaime Piña
cfa55281ec Refactor SystemLimitsPlacement tests (#3014) 2022-04-11 11:41:38 -07:00
Matthias Hanel
13e5ab10bd fix js nex interest check where leaf node masked gw subj propagation (#3016)
Basically, a gateway subject propagation issue could be hidden behind a leaf
node.
Also changed the error text when this was the case.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-11 14:04:09 -04:00
Derek Collison
95f3a3f919 Resolved conflicts with main
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-11 06:24:47 -07:00
Derek Collison
c3612b57c7 Fixes for some flapping tests
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-10 13:02:03 -07:00
Derek Collison
37cbac99e7 Improvements to the raft layer for general stability and support of scale up and down and asset move.
Also fixed a bug that would allow a leadership transfer when catching up.

Signed-off-by: Derek Collison <derek@nats.io>
2022-04-10 08:59:39 -07:00
Derek Collison
e7ff38a4ca Add consumerMemStore impl to allow proper replication of state.
Resolves #3006

Signed-off-by: Derek Collison <derek@nats.io>
2022-04-10 08:01:13 -07:00
Derek Collison
2510d671de Skip flapper for now, will fix in separate PR
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-09 11:55:04 -07:00
Derek Collison
cd7f16f28a Tweak timing for test to prevent flapping
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-09 11:13:49 -07:00
Derek Collison
331c2faaa6 When using a stream import for a push consumer's messages, if the message crossed a route we dropped the delivered subject.
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-09 06:42:22 -07:00
Derek Collison
3663d595fc Disallow moving a stream that is already being moved
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-07 17:09:55 -07:00
Matthias Hanel
5662141932 Adding unique_tag to ensure matching tags are not used twice (#3011)
This prevents a stream from being placed in the same availability zone twice.
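As a rough illustration of unique-tag placement, the sketch below picks peers so that no two share the same value of a tag (for example the same availability zone). Several details are assumptions and not taken from the commit: the `az:` prefix convention, the `peer` type, `selectPeers`, and the rule that untagged peers are ineligible.

```go
package main

import (
	"fmt"
	"strings"
)

// peer is a hypothetical placement candidate with its server tags.
type peer struct {
	name string
	tags []string
}

// tagValue returns p's first tag that starts with prefix (e.g. "az:").
func tagValue(p peer, prefix string) (string, bool) {
	for _, t := range p.tags {
		if strings.HasPrefix(t, prefix) {
			return t, true
		}
	}
	return "", false
}

// selectPeers picks r peers such that no two share a tag value under prefix.
// Assumption: peers without such a tag are not eligible. Returns nil when r
// peers with distinct values cannot be found.
func selectPeers(cands []peer, r int, prefix string) []string {
	used := make(map[string]bool)
	var out []string
	for _, p := range cands {
		v, ok := tagValue(p, prefix)
		if !ok || used[v] {
			continue
		}
		used[v] = true
		out = append(out, p.name)
		if len(out) == r {
			return out
		}
	}
	return nil
}

func main() {
	cands := []peer{
		{"s1", []string{"az:1"}},
		{"s2", []string{"az:1"}},
		{"s3", []string{"az:2"}},
		{"s4", []string{"az:3"}},
	}
	fmt.Println(selectPeers(cands, 3, "az:")) // [s1 s3 s4]
	fmt.Println(selectPeers(cands, 4, "az:")) // [] (only 3 distinct zones)
}
```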

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-07 18:11:00 -04:00
Ivan Kozlovic
47776bdd36 Merge pull request #3013 from nats-io/ln_min_version
[ADDED] LeafNode `min_version` new option
2022-04-07 13:57:29 -06:00
Ivan Kozlovic
b5c9583ee2 Reject configuration with value below 2.8.0
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-07 12:49:34 -06:00
Matthias Hanel
f4c2302301 fix sleep in unit test to ensure updates have propagated (#3012)
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-07 12:16:18 -04:00
Ivan Kozlovic
7fa2676353 Fixed comment typos and some rewording
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-07 09:22:51 -06:00
Ivan Kozlovic
9e6f965913 [ADDED] LeafNode min_version new option
If set, a server configured to accept leafnode connections will
reject a remote server whose version is below that value. Note
that servers prior to v2.8.0 do not send their version
in the CONNECT protocol, which means that any server below 2.8.0
would be rejected.

Configuration example:
```
leafnodes {
    port: 7422
    min_version: 2.8.0
}
```
The option is a string and can have the "v" prefix:
```
min_version: "v2.9.1"
```
Note that although a suffix such as `-beta` is accepted,
only the major, minor and update numbers are used for the version comparison.
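The comparison rules described above can be sketched as follows. The sketch assumes a `major.minor.update` format with an optional `v` prefix and an ignored `-suffix`. `parseVersion`, `versionAtLeast` and `validateMinVersion` are hypothetical helpers and not the server's functions. The 2.8.0 floor in `validateMinVersion` reflects the "Reject configuration with value below 2.8.0" commit in this log.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion extracts major, minor and update from strings such as
// "2.8.0", "v2.9.1" or "2.8.0-beta.15". An optional "v" prefix is stripped
// and any "-suffix" is ignored, since only the three numbers are compared.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	s := strings.TrimPrefix(v, "v")
	if i := strings.IndexByte(s, '-'); i >= 0 {
		s = s[:i]
	}
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version %q", v)
		}
		out[i] = n
	}
	return out, nil
}

// versionAtLeast reports whether remote >= minVer. An empty remote version
// (servers before v2.8.0 do not send one) never satisfies it.
func versionAtLeast(remote, minVer string) bool {
	r, err := parseVersion(remote)
	if err != nil {
		return false
	}
	m, err := parseVersion(minVer)
	if err != nil {
		return false
	}
	for i := 0; i < 3; i++ {
		if r[i] != m[i] {
			return r[i] > m[i]
		}
	}
	return true
}

// validateMinVersion rejects a configured min_version below 2.8.0, the first
// version that reports itself in CONNECT.
func validateMinVersion(v string) error {
	if _, err := parseVersion(v); err != nil {
		return err
	}
	if !versionAtLeast(v, "2.8.0") {
		return fmt.Errorf("min_version %q is below 2.8.0", v)
	}
	return nil
}

func main() {
	fmt.Println(versionAtLeast("v2.9.1", "2.8.0"))        // true
	fmt.Println(versionAtLeast("2.8.0-beta.15", "2.8.0")) // true: suffix ignored
	fmt.Println(versionAtLeast("", "2.8.0"))              // false: no version sent
	fmt.Println(validateMinVersion("2.7.4") != nil)       // true: rejected
}
```

The numbers are compared numerically rather than as strings, so `2.10.0` correctly ranks above `2.9.1`.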

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-06 18:40:33 -06:00
Ivan Kozlovic
c78f7f343c Add test that demonstrated the consumer filter perf degradation
This is a follow up to PR #3008.

This test fails on v2.7.4 but passes on main.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-06 09:27:56 -06:00
Ivan Kozlovic
1691e9aaf6 Bump version to 2.8.0-beta.12
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-05 20:48:22 -06:00
Derek Collison
ef9728997d During recovery check our guess on the last block.
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-05 19:20:31 -07:00
Derek Collison
ab5e2344e0 When loading blocks, use len(mb.fss) to determine if we can use the sfilter optimization.
Also check fs.lmb when the stream config is updated.

Signed-off-by: Derek Collison <derek@nats.io>
2022-04-05 18:49:21 -07:00
Ivan Kozlovic
371ce36712 [IMPROVED] Stream with multiple subjects and consumer with filter
This is more of a regression introduced in v2.7.3 (with PR #2848).
When the store has a list of subjects, finding the next message
to deliver would go through the subjects map and have to match
each entry to find out if it is a subset (if the filter had a wildcard).
In situations where there were lots of subjects (for instance, 1
message per subject) but the consumer did not filter on anything
specific, this processing became slow.

We now check whether the stream has a single subject (even with a
wildcard) and the consumer filters on that exact subject; if so,
we do a linear scan. We also do a linear scan if the number
of messages in the block is less than half the number of subjects
in the subjects map.
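The decision described above can be sketched as a single predicate. All names are hypothetical. The exact threshold is an assumption, read here as "fewer messages than half the subjects". Treating an empty filter as "match everything" is also an assumption.

```go
package main

import "fmt"

// useLinearScan decides between scanning messages directly and walking the
// per-subject map. streamSubjects is the stream's configured subject list,
// filter the consumer's filter subject, numMsgs the messages left to scan
// in the block, and numSubjects the entries in the block's subjects map.
func useLinearScan(streamSubjects []string, filter string, numMsgs, numSubjects int) bool {
	// Every message matches (assumed: also when there is no filter), so
	// matching against the subjects map would be wasted work.
	if filter == "" || (len(streamSubjects) == 1 && streamSubjects[0] == filter) {
		return true
	}
	// Few messages relative to subjects: scanning the messages directly is
	// cheaper than walking and matching every entry in the map.
	return 2*numMsgs < numSubjects
}

func main() {
	subjects := []string{"orders.>"}
	fmt.Println(useLinearScan(subjects, "orders.>", 100000, 100000)) // true
	fmt.Println(useLinearScan(subjects, "orders.eu", 1000, 10))      // false
	fmt.Println(useLinearScan(subjects, "orders.eu", 4, 10))         // true
}
```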

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-05 18:19:17 -06:00
Matthias Hanel
2db7d9fe2f unit test to make sure tiered limits and stream moves work together (#3007)
This needs testing because a stream move adjusts the replication factor.

Because adjusting the replication factor while moving is illegal, that case
does not need to be tested.

In order to support one-off configurations, added the same modification
callout to the super cluster as is used with the cluster.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-05 18:11:04 -04:00
Matthias Hanel
d9da66d67e returns -1 for new unlimited/unset limits and tests/fixes info counts (#3002)
Iterates on tiered limits.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-05 12:25:55 -04:00
Ivan Kozlovic
5f4f813c53 Bump to 2.8.0-beta.11
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-05 09:19:13 -06:00
Ivan Kozlovic
9b5797f63c Undo sending bad request on no-interest in apiDispatch
This broke cross-account functionality. Ported the test from the
Go client that showed the failure after PR #2997 was merged.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-05 08:51:28 -06:00
Derek Collison
92813d7370 Bump to 2.8.0-beta.10
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-04 19:15:05 -07:00
Derek Collison
8e6764dfb3 Merge pull request #3001 from nats-io/asset-move
Allow streams and consumers to migrate between clusters.
2022-04-04 19:13:40 -07:00
Derek Collison
7e38ebcb6e Allow assets such as streams and their associated consumers to migrate between clusters.
The system will allow an update to a stream, and subsequently all attached consumers, to be placed in another cluster either directly or via tag placement.
The meta layer will scale the underlying peerset appropriately to straddle the two clusters for both the stream and consumers, taking into account the consumer type.
Control will then pass to the current leaders of the assets who will monitor the catchup status of the new peers.
(Note: we can optimize this later to traverse a GW only once for any given asset, but for now this is simpler.)
Once the original leaders have determined that the assets are synced, they will pass leadership to a member of the new peerset.
Once the new leader has been elected, it will forward a request for the meta layer to shrink the peerset by removing the old peers.
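The move sequence above goes through three steps: expand the peerset to straddle both clusters, hand leadership to a new peer once the new peers are caught up, then shrink away the old peers. The sketch below models these peerset transitions with plain string slices. `expandPeers`, `nextLeader` and `shrinkPeers` are hypothetical. Treating "synced" as "every new peer reports caught up" is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// expandPeers returns the transitional peerset straddling both clusters:
// the sorted union of the old and new peers.
func expandPeers(oldPeers, newPeers []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, group := range [][]string{oldPeers, newPeers} {
		for _, p := range group {
			if !seen[p] {
				seen[p] = true
				out = append(out, p)
			}
		}
	}
	sort.Strings(out)
	return out
}

// nextLeader picks a new-cluster peer to receive leadership, but only once
// every new peer reports caught up (assumed criterion for "synced").
func nextLeader(newPeers []string, caughtUp map[string]bool) (string, bool) {
	if len(newPeers) == 0 {
		return "", false
	}
	for _, p := range newPeers {
		if !caughtUp[p] {
			return "", false
		}
	}
	return newPeers[0], true
}

// shrinkPeers drops the old peers from the transitional peerset.
func shrinkPeers(current, oldPeers []string) []string {
	old := make(map[string]bool, len(oldPeers))
	for _, p := range oldPeers {
		old[p] = true
	}
	var out []string
	for _, p := range current {
		if !old[p] {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	oldPeers := []string{"a1", "a2", "a3"}
	newPeers := []string{"b1", "b2", "b3"}
	all := expandPeers(oldPeers, newPeers)
	fmt.Println(all) // [a1 a2 a3 b1 b2 b3]

	caught := map[string]bool{"b1": true, "b2": true}
	_, ok := nextLeader(newPeers, caught)
	fmt.Println(ok) // false: b3 still catching up

	caught["b3"] = true
	leader, ok := nextLeader(newPeers, caught)
	fmt.Println(leader, ok)                 // b1 true
	fmt.Println(shrinkPeers(all, oldPeers)) // [b1 b2 b3]
}
```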

Signed-off-by: Derek Collison <derek@nats.io>
2022-04-04 18:28:36 -07:00
Matthias Hanel
268a29e719 fix unit test that did not fail when header was modified (#3000)
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-04 19:24:09 -04:00
Matthias Hanel
569328bc18 fix crash on shutdown with meta being nil (#2999)
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-04 17:40:19 -04:00
Matthias Hanel
b7bc842c8b Add a config modification callback to createJetStreamCluster (#2998)
* Add a config modification callback to createJetStreamCluster

The new function is named createJetStreamClusterAndModHook and allows the generated config to
be altered prior to server start.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-04 17:39:58 -04:00
Ivan Kozlovic
29ea280fe7 [FIXED] JetStream: send "bad request" response for malformed API requests
An example was a "consumer info" request with a consumer name that
had tokens, which is illegal. This resulted in the request being
dropped in apiDispatch() because there was no interest.
The server will now return a "bad request" error in such cases.
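A consumer name containing `.` adds extra tokens to the request subject, so the request matches no handler. The sketch below detects that case and reports a bad request instead of silently dropping it. It assumes the consumer info subject has the form `$JS.API.CONSUMER.INFO.<stream>.<consumer>`. `classifyConsumerInfo` is a hypothetical helper. Note that the "Undo sending bad request on no-interest in apiDispatch" commit in this log later reverted the server-side reply.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// consumerInfoPrefix is the assumed API prefix for consumer info requests.
const consumerInfoPrefix = "$JS.API.CONSUMER.INFO."

// classifyConsumerInfo expects exactly two tokens after the prefix (stream
// and consumer). A consumer name containing '.' yields extra tokens, which
// would otherwise match no handler and be dropped.
func classifyConsumerInfo(subject string) (stream, consumer string, err error) {
	if !strings.HasPrefix(subject, consumerInfoPrefix) {
		return "", "", errors.New("not a consumer info request")
	}
	toks := strings.Split(strings.TrimPrefix(subject, consumerInfoPrefix), ".")
	if len(toks) != 2 || toks[0] == "" || toks[1] == "" {
		return "", "", errors.New("bad request")
	}
	return toks[0], toks[1], nil
}

func main() {
	for _, subj := range []string{
		"$JS.API.CONSUMER.INFO.ORDERS.dur",
		"$JS.API.CONSUMER.INFO.ORDERS.my.dur",
	} {
		s, c, err := classifyConsumerInfo(subj)
		fmt.Println(s, c, err)
	}
}
```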

Resolves #2995

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-04 11:25:55 -06:00
Ivan Kozlovic
14f54b8dd7 [ADDED] Monitoring: MQTT and Websocket blocks in /varz endpoint
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-04 10:11:55 -06:00
Ivan Kozlovic
366d217f44 Some changes based on review
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-01 17:55:33 -06:00