This changes when we are sending internal messages through the shared internal sendq but to a different account.
We will now use an internal client that is only accessible to the send loop.
Signed-off-by: Derek Collison <derek@nats.io>
Underestimated the effort to get stream restore working properly in cluster mode.
Some good bug fixes and stability improvments.
Signed-off-by: Derek Collison <derek@nats.io>
I noticed that some consumer go routines were left running at the end
of the test suite.
It turns out that there was a race the way the consumer's qch was closed.
Since it was closed and then set to nil, it is possible that the go
routines that are started and then try to capture o.qch would actually
get qch==nil, wich then when doing a select on that nil channel would
block forever.
So we know pass the qch to the 2 go routines loopAndGatherMsgs() and
loopAndDeliverMsgs() so that when we close the channel there is
no risk of that race happening.
I do believe that there is still something that should be looked at:
it seems that a consumer's delivery loop can now be started/stopped
many times based on leadership acquired/lost. If that is the case,
I think that the consumer should wait for previous go routine to
complete before trying to start new ones.
Also moved 3 JetStream tests to the test/norace_test.go file because
they would consumer several GB of memory when running with the -race flag.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Since we were creating subs on the fly, sub.im would always be nil.
We were passing a client because it was needed in sendRouteSubOrUnSubProtos().
This PR simply fills the buffer with each account's subscriptions.
There is also no need to have subs sent from different go routine
based on some threshold. Routes are no longer subject to max pending.
Some code has been made into a function so that they can be shared
by sendSubsToRoute() and sendRouteSubOrUnSubProtos(). The function
is simply adding to given buffer the RS+/- protocol.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Was incorrectly resetting term and exiting on failed vote. Also was not properly stepping down when we were a candidate and saw an entry from a leader.
Signed-off-by: Derek Collison <derek@nats.io>
Those changes are required to maintain backward compatibility.
Since the replies are "_G_.<gateway name hash>.<server ID hash>"
and the hash were 6 characters long, changing to 8 the hash function
would break things.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The LeafNode connect protocol's Name field had json tag "name" but
was changed to "server_name" in the JetStream cluster branch.
Changing it back to "name" to not have to deal with different
places where to get the name from.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The insert notification was done based on the creation of a node
during an insert, which was wrong since the node may have already
existed and still the subscription could be all new. For instance,
suppose that there is a subscription on "foo.bar".
We register an notification interest for "foo", which does not
notify, which is normal. Then we create a subscription on "foo".
During the insert, "foo" node already exists so notification would
not be sent, but it should.
Fixed also removed by having removeFromNode() returning a boolean
to indicate if the subscription was the last in that node.
However, it seems that we again check for interest in
chkForRemoveNotification(), so not sure if that is required.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This is how it was up to v2.1.2 included (changed in v2.1.4 onward).
I added a benchmark that has 3 subscribers running and increase
the number of publishers: 1, 2, 5 and 10. This is the comparison
between the pre-PR and post-PR:
```
benchcmp old.txt new.txt
benchmark old ns/op new ns/op delta
Benchmark___BumpPubCount_1x3-16 396 385 -2.78%
Benchmark___BumpPubCount_2x3-16 495 406 -17.98%
Benchmark___BumpPubCount_5x3-16 542 395 -27.12%
Benchmark__BumpPubCount_10x3-16 549 515 -6.19%
benchmark old MB/s new MB/s speedup
Benchmark___BumpPubCount_1x3-16 717.27 737.54 1.03x
Benchmark___BumpPubCount_2x3-16 574.31 699.02 1.22x
Benchmark___BumpPubCount_5x3-16 524.35 718.80 1.37x
Benchmark__BumpPubCount_10x3-16 517.26 551.53 1.07x
```
It is inline with what the user reported, seeing a 20% drop in performance
when going from 1 publisher to 2. But, as we can see, the difference
between go channel and cond variable reduces with the increased number
of publishers after a certain number.
I am not sure of the performance impact on other situations, so this
PR is more of a proposal than a fix.
Resolves#1786
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>