Commit Graph

2189 Commits

Author SHA1 Message Date
Derek Collison
7c0b6faf2c We were having issues with the account being changed for the internal system client.
This changes how we handle internal messages that go through the shared internal sendq but are sent to a different account.
We will now use an internal client that is only accessible to the send loop.

Signed-off-by: Derek Collison <derek@nats.io>
2021-01-20 11:58:31 -08:00
Derek Collison
55d750733b Fix based on feedback from Ivan
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-20 11:58:31 -08:00
Derek Collison
fed4c0cce0 Race detector catches this now with Go 1.15.7
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-20 11:58:31 -08:00
Derek Collison
ff54c9dc9c Reworked snapshot and restore.
Underestimated the effort to get stream restore working properly in cluster mode.
Some good bug fixes and stability improvements.

Signed-off-by: Derek Collison <derek@nats.io>
2021-01-20 11:58:31 -08:00
Derek Collison
2e9545d587 Make snapshot API cluster aware
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-20 11:58:31 -08:00
Derek Collison
a1730f1b31 Report on RAFT group information.
This adds optional reporting to stream and consumer info when running in clustered mode.

Signed-off-by: Derek Collison <derek@nats.io>
2021-01-20 11:58:31 -08:00
Ivan Kozlovic
f5df209022 Fixed SIGSEGV when sending update for unknown stream
Will now return an error that the stream is unknown.

Resolves #1827

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-20 12:42:14 -07:00
Ivan Kozlovic
7d1a4778b8 Merge pull request #1826 from nats-io/fix_consumer_loop_delivery_exit
Fix stop of consumer's delivery loop
2021-01-20 10:34:57 -07:00
Ivan Kozlovic
a1f0117474 Fixed consumer sending to nil channel on shutdown/leader change.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-20 10:05:15 -07:00
Ivan Kozlovic
c4a284b58f Fix stop of consumer's delivery loop
I noticed that some consumer go routines were left running at the end
of the test suite.
It turns out that there was a race in the way the consumer's qch was closed.
Since it was closed and then set to nil, it is possible that the go
routines that are started and then try to capture o.qch would actually
get qch == nil; a select on that nil channel would then
block forever.

So we now pass the qch to the 2 go routines loopAndGatherMsgs() and
loopAndDeliverMsgs() so that when we close the channel there is
no risk of that race happening.

I do believe that there is still something that should be looked at:
it seems that a consumer's delivery loop can now be started/stopped
many times based on leadership acquired/lost. If that is the case,
I think that the consumer should wait for the previous go routines to
complete before trying to start new ones.

Also moved 3 JetStream tests to the test/norace_test.go file because
they would consume several GB of memory when running with the -race flag.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-19 17:39:32 -07:00
Ivan Kozlovic
42dcdd2eb2 Simplify sendSubsToRoute()
Since we were creating subs on the fly, sub.im would always be nil.
We were passing a client because it was needed in sendRouteSubOrUnSubProtos().

This PR simply fills the buffer with each account's subscriptions.
There is also no need to have subs sent from a different go routine
based on some threshold. Routes are no longer subject to max pending.

Some code has been moved into a function so that it can be shared
by sendSubsToRoute() and sendRouteSubOrUnSubProtos(). The function
simply adds the RS+/- protocol to the given buffer.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-19 14:01:43 -07:00
Derek Collison
e4bf3767f2 Only send if we deleted properly
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-17 13:48:40 -08:00
Derek Collison
78747b2414 Stability improvements around startup and restore.
We were incorrectly starting clustering before enabling accounts and restoring state.

Signed-off-by: Derek Collison <derek@nats.io>
2021-01-17 13:44:49 -08:00
Derek Collison
d653627e7a Stability improvements for split votes.
Was incorrectly resetting term and exiting on failed vote. Also was not properly stepping down when we were a candidate and saw an entry from a leader.
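
The step-down part follows the standard Raft rule: a candidate that receives an entry from a leader whose term is at least its own returns to follower, and any node adopts a higher term it sees. A minimal sketch, assuming a hypothetical `node` type carrying only a role, term, and vote; it is not the server's RAFT implementation.

```go
package main

import "fmt"

type role int

const (
	follower role = iota
	candidate
	leader
)

// node is a hypothetical, minimal Raft node carrying only what this
// example needs.
type node struct {
	role role
	term uint64
	vote string
}

// processAppendEntry handles an entry from a node claiming leadership.
// Entries from a stale term are rejected. Otherwise a higher term is
// adopted (clearing our vote) and we step down to follower.
func (n *node) processAppendEntry(leaderTerm uint64) bool {
	if leaderTerm < n.term {
		return false
	}
	if leaderTerm > n.term {
		n.term = leaderTerm
		n.vote = ""
	}
	n.role = follower
	return true
}

func main() {
	n := &node{role: candidate, term: 5, vote: "self"}
	fmt.Println(n.processAppendEntry(4), n.role == candidate) // stale leader is ignored
	fmt.Println(n.processAppendEntry(5), n.role == follower, n.term)
}
```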

Signed-off-by: Derek Collison <derek@nats.io>
2021-01-17 13:34:56 -08:00
Derek Collison
5479a8e867 Fix for segfault
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-17 13:19:11 -08:00
Derek Collison
8d568d41e4 Copy off before starting Go routine
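
The general pattern named here, sketched under assumptions: fields guarded by a mutex are copied while the lock is held, and the new goroutine uses only those copies, so later writes cannot race with it. The `stream` type and `reportAsync` method are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// stream is a hypothetical type with lock-protected state.
type stream struct {
	mu       sync.Mutex
	name     string
	subjects []string
}

// reportAsync copies the fields it needs while holding the lock and lets
// the goroutine use only those copies.
func (s *stream) reportAsync(out chan<- string) {
	s.mu.Lock()
	name := s.name
	subjects := append([]string(nil), s.subjects...) // copy the backing array too
	s.mu.Unlock()

	go func() {
		out <- fmt.Sprintf("%s:%d", name, len(subjects))
	}()
}

func main() {
	s := &stream{name: "ORDERS", subjects: []string{"orders.new", "orders.done"}}
	out := make(chan string, 1)
	s.reportAsync(out)

	s.mu.Lock()
	s.name = "RENAMED" // does not affect the goroutine's copy
	s.mu.Unlock()

	fmt.Println(<-out)
}
```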
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-17 13:05:25 -08:00
Derek Collison
fe9e45bbd2 Updates based on PR comments
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-17 12:08:35 -08:00
Derek Collison
a18a6803c1 Added support for stream and consumer lists.
This utilizes a scatter and gather approach.

Signed-off-by: Derek Collison <derek@nats.io>
2021-01-16 12:42:45 -08:00
Derek Collison
cb69df7118 Add proper support for stream update
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-16 06:29:37 -08:00
Derek Collison
b606dceb59 Stabilize restart/catchup for raft.
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-16 05:47:48 -08:00
Derek Collison
367d000314 Merge pull request #1815 from nats-io/jscfix
Routes send subscriptions by utilizing random clients from an account.
2021-01-15 18:26:38 -07:00
Ivan Kozlovic
de1bf362b1 Merge pull request #1816 from nats-io/fix_gw_hash
Fixed gateway reply mapping following changes in JetStream clustering
2021-01-15 18:22:18 -07:00
Ivan Kozlovic
1874964498 Merge pull request #1812 from nats-io/leafnode_fixes
Fixed some leafnode issues introduced from JS cluster work
2021-01-15 18:22:02 -07:00
Derek Collison
754e31a3bc Routes send subscriptions by utilizing random clients from an account.
There was a bug where the client chosen under the $SYS account could have a different account.

Signed-off-by: Derek Collison <derek@nats.io>
2021-01-15 17:02:38 -08:00
Ivan Kozlovic
ef38abe75b Fixed gateway reply mapping following changes in JetStream clustering
The changes in this commit are required to maintain backward compatibility.
Since the replies are "_G_.<gateway name hash>.<server ID hash>"
and the hashes were 6 characters long, changing the hash function
to produce 8 characters would break things.
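
To illustrate why the hash length is part of the wire format, here is a sketch in which a peer slices the gateway hash at a fixed 6-character offset. The hashing scheme (SHA-256, URL-safe base64, truncated) and the names `shortHash`, `replyPrefix`, and `gatewayHash` are assumptions for illustration, not the server's actual functions.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strings"
)

// hashLen matches the 6-character hashes described above.
const hashLen = 6

// shortHash is an assumed hashing scheme used only for illustration.
func shortHash(s string) string {
	sum := sha256.Sum256([]byte(s))
	return base64.RawURLEncoding.EncodeToString(sum[:])[:hashLen]
}

// replyPrefix builds "_G_.<gateway name hash>.<server ID hash>.".
func replyPrefix(gwName, serverID string) string {
	return "_G_." + shortHash(gwName) + "." + shortHash(serverID) + "."
}

// gatewayHash slices the gateway hash at a fixed offset, as a peer that
// assumes 6-character hashes would. An 8-character hash would not line up.
func gatewayHash(subject string) (string, bool) {
	const pre = "_G_."
	if !strings.HasPrefix(subject, pre) || len(subject) <= len(pre)+hashLen {
		return "", false
	}
	if subject[len(pre)+hashLen] != '.' {
		return "", false
	}
	return subject[len(pre) : len(pre)+hashLen], true
}

func main() {
	reply := replyPrefix("east", "server-1") + "inbox.42"
	h, ok := gatewayHash(reply)
	fmt.Println(reply, h, ok)
}
```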

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-15 17:32:04 -07:00
Ivan Kozlovic
c9bba7d1e3 Change back "server_name" to "name" for backward compatibility
The LeafNode connect protocol's Name field had json tag "name" but
was changed to "server_name" in the JetStream cluster branch.
Changing it back to "name" so that we do not have to look up
the name in different places.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-15 14:00:21 -07:00
Derek Collison
7286033a6f Merge pull request #1813 from nats-io/fixes
Fixes for data races
2021-01-15 12:18:00 -07:00
Ivan Kozlovic
0d78bce9cf Fixed some leafnode issues introduced from JS cluster work
Also fixed a flapper.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-15 12:00:34 -07:00
Ryota
91a1d9a556 Update error message with correct config value
2021-01-15 13:18:31 +00:00
Ivan Kozlovic
6f8285b1f0 Merge pull request #1806 from nats-io/latency-sharing
Fixing latency sharing which was overwritten
2021-01-14 16:57:15 -07:00
Matthias Hanel
f1af382929 Fixing latency sharing which was overwritten
Also adjusting unit test to not check for renamed values

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-01-14 18:47:44 -05:00
Ivan Kozlovic
343968067c Merge pull request #1805 from nats-io/scoped-signing-keys
[added] enforcement and usage of scoped signing keys
2021-01-14 15:24:28 -07:00
Matthias Hanel
2cb5f1b391 Fix flapping unit test and incorporate more review comments
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-01-14 16:59:57 -05:00
Derek Collison
4b84decc7f Fix for race
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-14 12:59:12 -08:00
Derek Collison
36f0dd5881 Fix data race
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-14 12:25:42 -08:00
Matthias Hanel
c14076b13f Incorporating review comments
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-01-14 15:15:20 -05:00
Matthias Hanel
2edd883a6e [added] enforcement and usage of scoped signing keys
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-01-14 14:52:54 -05:00
Ivan Kozlovic
6c4229300a Fixed service import cycle detection that broke with JS clustering
Also added a no-op error handler to some tests to silence the
error reports in the log.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-14 11:27:36 -07:00
Derek Collison
1b0e740123 Fix for race
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-14 07:28:14 -08:00
Derek Collison
9d9f0f7099 Fix for race on setting term
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-14 07:02:20 -08:00
Derek Collison
14e3319ba3 Fix for no raft node
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-14 06:59:36 -08:00
Derek Collison
ab2a645791 Fix for various flappers
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-14 06:54:08 -08:00
Derek Collison
b68d7066c4 Remove alpha banner
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-14 06:05:55 -08:00
Derek Collison
4bfe9d4c24 Fixes to PR.
Add nats to default storage directory
Fix race in raft, change leader notice
Fix test crash on failure

Signed-off-by: Derek Collison <derek@nats.io>
2021-01-14 05:56:05 -08:00
Derek Collison
37cf7584bd Merge branch 'master' into jsc
2021-01-14 02:52:35 -07:00
Derek Collison
f0cdf89c61 JetStream Clustering WIP
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-14 01:14:52 -08:00
Matthias Hanel
9c2bf8e4a9 [Added] support for jwt export response threshold
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-01-14 01:12:35 -08:00
Ivan Kozlovic
23f8e3d5b9 Fixed sublist notification
The insert notification was done based on the creation of a node
during an insert, which was wrong since the node may have already
existed while the subscription could still be all new. For instance,
suppose that there is a subscription on "foo.bar".
We register a notification of interest for "foo", which does not
fire, as expected. Then we create a subscription on "foo".
During the insert, the "foo" node already exists, so the notification
would not be sent, but it should be.
Also fixed the removal notification by having removeFromNode() return a boolean
indicating whether the subscription was the last one in that node.
However, it seems that we check for interest again in
chkForRemoveNotification(), so I am not sure that is required.
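
A sketch of the corrected rule on a toy trie, assuming literal tokens only (no wildcards or queue groups): the interest notification fires when a node's subscription count goes from 0 to 1, regardless of whether the node was just created, and removal reports whether the last subscription on that node went away. The `node`, `sublist`, and `notify` names are hypothetical simplifications; only the name `removeFromNode` comes from the message above.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical, minimal trie node: literal tokens only.
type node struct {
	next map[string]*node
	subs int
}

type sublist struct {
	root   *node
	notify map[string]chan bool // subject -> interest watcher (hypothetical)
}

func newSublist() *sublist {
	return &sublist{root: &node{next: map[string]*node{}}, notify: map[string]chan bool{}}
}

// insert adds one subscription. The notification is driven by the
// node's subscription count going from 0 to 1, not by node creation:
// "foo" may already exist as the parent of "foo.bar".
func (s *sublist) insert(subject string) {
	n := s.root
	for _, tok := range strings.Split(subject, ".") {
		child, ok := n.next[tok]
		if !ok {
			child = &node{next: map[string]*node{}}
			n.next[tok] = child
		}
		n = child
	}
	n.subs++
	if n.subs == 1 {
		if ch, ok := s.notify[subject]; ok {
			ch <- true
		}
	}
}

// removeFromNode reports whether the removed subscription was the last
// one on that node.
func (s *sublist) removeFromNode(subject string) (last bool) {
	n := s.root
	for _, tok := range strings.Split(subject, ".") {
		if n = n.next[tok]; n == nil {
			return false
		}
	}
	if n.subs == 0 {
		return false
	}
	n.subs--
	return n.subs == 0
}

// remove sends the "no interest" notification only for the last sub.
func (s *sublist) remove(subject string) {
	if s.removeFromNode(subject) {
		if ch, ok := s.notify[subject]; ok {
			ch <- false
		}
	}
}

func main() {
	s := newSublist()
	s.insert("foo.bar") // creates the "foo" node as a parent
	ch := make(chan bool, 2)
	s.notify["foo"] = ch
	s.insert("foo") // node already exists, but this is the first sub on "foo"
	s.remove("foo")
	fmt.Println(<-ch, <-ch)
}
```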

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-13 20:59:36 -07:00
Ivan Kozlovic
7b116379cb Propose going back to condition variable to notify writeLoop
This is how it was up to and including v2.1.2 (changed in v2.1.4 onward).
I added a benchmark that has 3 subscribers running and increases
the number of publishers: 1, 2, 5 and 10. This is the comparison
between the pre-PR and post-PR results:

```
benchcmp old.txt new.txt
benchmark                           old ns/op     new ns/op     delta
Benchmark___BumpPubCount_1x3-16     396           385           -2.78%
Benchmark___BumpPubCount_2x3-16     495           406           -17.98%
Benchmark___BumpPubCount_5x3-16     542           395           -27.12%
Benchmark__BumpPubCount_10x3-16     549           515           -6.19%

benchmark                           old MB/s     new MB/s     speedup
Benchmark___BumpPubCount_1x3-16     717.27       737.54       1.03x
Benchmark___BumpPubCount_2x3-16     574.31       699.02       1.22x
Benchmark___BumpPubCount_5x3-16     524.35       718.80       1.37x
Benchmark__BumpPubCount_10x3-16     517.26       551.53       1.07x
```

It is in line with what the user reported, seeing a 20% drop in performance
when going from 1 publisher to 2. But, as we can see, the difference
between the Go channel and the condition variable shrinks once the
number of publishers grows past a certain point.

I am not sure of the performance impact on other situations, so this
PR is more of a proposal than a fix.

Resolves #1786

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-12 12:24:37 -07:00
Ivan Kozlovic
0d34688c4b Merge pull request #1800 from nats-io/fix_1799
[FIXED] Monitoring endpoint `connz?auth=true` show incorrect user
2021-01-11 14:28:28 -07:00