- Save the TLS name only if not already set
- Use the passed URLs slice instead of using s.getOpts().Routes
- Enhanced the test
- Fixed an unrelated DATA RACE report
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The server was not setting "server name" in the TLS configuration
for route connections, which may lead to failed (re)connect if
the certificate does not allow for the IP and the URL did not
have the hostname, which would happen with gossip protocol.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
A request to `$SYS.REQ.SERVER.PING.JSZ` would now return something
like this:
```
...
"meta_cluster": {
"name": "local",
"leader": "A",
"peer": "NUmM6cRx",
"replicas": [
{
"name": "B",
"current": true,
"active": 690369000,
"peer": "b2oh2L6w"
},
{
"name": "Server name unknown at this time (peerID: jZ6RvVRH)",
"current": false,
"offline": true,
"active": 0,
"peer": "jZ6RvVRH"
}
],
"cluster_size": 3
}
```
Note the "peer" field following the "leader" field that contains
the server name. The new field is the node ID, which is a hash of
the server name.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
A server maintains a map for the subject+queue to know the number
of members on the same group. However, on unsubscribe when we get
to the last one being unsubscribed, we were removing from the map
but then unfortunately adding back with a value of 0, which caused
a leak. If the same subscription was coming back, then this map
entry would be reused, but if it is a never coming back queue sub,
then memory could increase continously.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Added Start, LastActivity, Uptime and Idle that we normally have
in a Connz for non route connections. This info can be useful
to determine if a route is recent, etc..
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Gateway connection will be closed and error reported if a remote
has a name that is a duplicate of the local cluster.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This was introduced when fixing #2881. The call to setFirstPingTimer
needed to be done under the client's lock.
Moved setFirstPingTimer from a server receiver to a client receiver.
The only reason it was a server receiver is because we need the
server options, but c.srv is always set when invoking this function,
so we will get the server from c.srv in that function now.
Related to #2881
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
We will only send if all peers in our group are >= 2.7.1 and we will check for updates.
When a consumer follower takes over it will notify all pending requests that those requests are invalid now.
Signed-off-by: Derek Collison <derek@nats.io>
This allows stream placement to overflow to adjacent clusters.
We also do more balanced placement based on resources (store or mem). We can continue to expand this as well.
We also introduce an account requirement that stream configs contain a MaxBytes value.
We now track account limits and server limits more distinctly, and do not reserver server resources based on account limits themselves.
Signed-off-by: Derek Collison <derek@nats.io>
- When detecting duplicate route, it was possible that a server
would lose track of the peer's gateway URL, which would prevent
it from gossiping that URL to inbound gateway connections
- When a server has gateways enabled and has as a remote its
own gateway, the monitoring endpoint `/varz` would include it
but without the "urls" array.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This is the reverse of the early work to have LNs extend a non-JS cluster.
Also have mixed mode tests as well.
Signed-off-by: Derek Collison <derek@nats.io>
* [fixed] jetstream unique server name requirement across domains
including domain in server info
adding check for cluster name in duplicate leaf node connection check
This does not address non unique domains in the same domain, say within
super cluster.
Signed-off-by: Matthias Hanel <mh@synadia.com>
* [changed] pinned certs to check the server connected to as well
on reload clients with removed pinned certs will be disconnected.
The check happens only on tls handshake now.
Signed-off-by: Matthias Hanel <mh@synadia.com>
This structure is used in ClientAuthentication, an interface
designed to let 3rd parties extend the authentication mechanisms
of the server
In order to allow those 3rd parties to create unit tests, mocks etc
we need to export this structure so it's accessible externally
Signed-off-by: R.I.Pienaar <rip@devco.net>
In a setup with a cluster of servers to which 2 different leaf nodes
attach to, and queue subs are attached to one of the leaf, if the
leaf server is restarted and reconnects to another server in the
cluster, there was a risk for an infinite message loop between
some servers in the "hub" cluster.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This could happen when a leafnode has permissions set and another
connection (client, etc..) is about to assign a message to the
leafnode while the leafnode itself is receiving messages and they
both check permissions at the same time.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Currently, we use ReadyForConnections in server tests to wait for the
server to be ready. However, when this fails we don't get a clue about
why it failed.
This change adds a new unexported method called readyForConnections that
returns an error describing which check failed. The exported
ReadyForConnections version works exactly as before. The unexported
version gets used in internal tests only.
1. When in mixed mode and only running the global account we now will check the account for JS.
2. Added code to decrease the cluster set size if we guessed wrong in mixed mode setup.
Signed-off-by: Derek Collison <derek@nats.io>
If leafnodes from a cluster were to reconnect to a server in
a different cluster, it was possible for that server to send
to the leafnodes some their own subscriptions that could cause
an inproper loop detection error.
There was also a defect that would cause subscriptions over route
for leafnode subscriptions to be registered under the wrong key,
which would lead to those subscriptions not being properly removed
on route disconnect.
Finally, during route disconnect, the leafnodes map was not updated.
This PR fixes that too.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Took the approach of storing struct instead of pointer. Of course,
when changing the offline bool from false to true, it means that
we need to call Store again (with same key).
This is based on the assumption that those Load/Store are not too
frequent. Otherwise, we may need to use locking (and keep *nodeInfo)
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This allows metacontrollers to span superclusters. Also includes placement directives for streams. By default they select the request origin cluster.
Signed-off-by: Derek Collison <derek@nats.io>
Added two options in the remote leaf node configuration
- compress, for websocket only at the moment
- ws_masking, to force remote leafnode connections to mask websocket
frames (default is no masking since it is communication between
server to server)
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Since we were creating subs on the fly, sub.im would always be nil.
We were passing a client because it was needed in sendRouteSubOrUnSubProtos().
This PR simply fills the buffer with each account's subscriptions.
There is also no need to have subs sent from different go routine
based on some threshold. Routes are no longer subject to max pending.
Some code has been made into a function so that they can be shared
by sendSubsToRoute() and sendRouteSubOrUnSubProtos(). The function
is simply adding to given buffer the RS+/- protocol.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Those changes are required to maintain backward compatibility.
Since the replies are "_G_.<gateway name hash>.<server ID hash>"
and the hash were 6 characters long, changing to 8 the hash function
would break things.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When cluster origin code was added, a server may send LS+ with
an origin cluster name in the protocol. Parsing code from a ROUTER
connection was adjusted to understand this LS+ protocol.
However, the server was also sending an LS- with origin but the
parsing code was not able to understand that. When the unsub was
for a queue subscription, this would cause the parser to error out
and close the route connection.
This PR sends an LS- without the origin in this case (so that tracing
makes sense in term of LS+/LS- sent to a route). The receiving side
then traces appropriate LS- but processes as a normal RS-.
Resolves#1751
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Returned imports/exports are formated like jwt exports imports, even if
they originating account is from config.
Fixes#1604
Signed-off-by: Matthias Hanel <mh@synadia.com>
We previously simply called DialTimeout() on a route's url when
soliciting. If it resolved to the IP of the host, it would create
a route to self, which server detects, but then would not try again
with other IPs that would have allowed to form a cluster with
other servers running on the other IPs.
This PR keeps track of local IPs + cluster port and exclude them
from the list of IPs returned by LookupHost API. This even prevent
solicitation of routes to self. Only non-local IPs will be tried.
Resolves#1586
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Inhibit Go's default TCP keepalive settings for NATS
Go 1.13 changed the semantics of the tuning parameters for TCP keepalives, including the default value. This affects all TCP listeners. The NATS protocol has its own L7 keepalive system (PING/PONG) and the Go defaults are not a good fit for some valid deployment scenarios, while Go doesn't directly expose a working API for tuning these.
Rather than add a configuration knob and pull in another dependency (with portability issues) just disable TCP keepalives for all listeners used for speaking the NATS protocol.
Change the tests so we test the same logic. Do not change HTTP monitoring, profiling, or the websocket API listeners.
Change KeepAlive on client connections too.
This change allows the removal of the connection and update of
the server state to be done "in place" but still delay the flushing
of and close of tcp connection to the writeLoop. With ref counting
we ensure that the reconnect happens after the flushing but not
before the state has been updated.
Had to fix some places where we may have called closeConnection()
from under the server lock since it now would deadlock for sure.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
We cannot call c.closeConnection() under the server lock because
closeConnection() can invoke server lock in some cases.
Created a test that should run without `-race` to reproduce the deadlock
(which it does) but sometimes would fail because cluster would not be
formed. This unconvered an issue with conflict resolution which
test TestRouteClusterNameConflictBetweenStaticAndDynamic() can reproduce
easily. The issue was that we were not updating a dynamic name with
the remote if the remote was non dynamic.
Resolves#1543
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Resolves#1532
Instead of the fetched account we create a dummy account that is
expired. Any client connecting will trigger a fetch of the actual
account jwt.
This also avoids one fetch, thus the unit test was changed to reflect
this.
Unlike other resolver the memory resolver does not depend on external
systems. It is purely based on server configuration. Therefore, fetch
can be done and not finding an account means there is a configuration issue.
If some servers in the cluster have the same connect URLs (due
to the use of client advertise), then it would be possible to
have a server sends the connect_urls INFO update to clients with
missing URLs.
Resolves#1515
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The call sendProtoNow() should not normally be used (only when
setting up a connection when the writeloop is not yet started and
server needs to send something before being able to start the
writeLoop.
Instead, code should use enqueueProto(). For this particular
case though, use queueOutbound() directly and add to the
producer's pcd map.
Also fixed other places where we were using queueOutbound() +
flushSignal() which is what enqueueProto is doing.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>