Commit Graph

1124 Commits

Author SHA1 Message Date
Ivan Kozlovic
2d4c3dd38f Added logging of account interest mode switch for gateways
Both sides will log when an account is switched to interest-only
mode. There are 2 traces (start/complete) per account.
They are logged at [INF] level.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-28 14:55:45 -06:00
Ivan Kozlovic
5478eaf01e Added /gatewayz endpoint
Such endpoint will list the gateway/cluster name, address and port
then list of outbound/inbound connections.
For each remote gateway there will be at most one outbound connection.
There can be 0 or more inbound connections for the same remote
gateway.

For each of these outbound/inbound connection, the connection info
similar to Connz is reported. Optionally, one can include the
interest mode/stats for each account.

Here are possible options:

* No specific options

http://host:port/gatewayz

* Limit to specific remote gateway, say name "B":

http://host:port/gatewayz/gw_name=B

* Include accounts (default limit to 1024 accounts)

http://host:port/gatewayz/accs=1

* Specific limit, say 200 (note accs=1 in this case is optional)

http://host:port/gatewayz/accs=1&accs_limit=200

* Specific account, say "acc_1". Note that accs=1 is not required then

http://host:port/gatewayz/acc_name=acc_1

* Above options can be mixed: specific remote gateway (B), with 100
  accounts reported

http://host:port/gatewayz/gw_name=B&accs_limit=200

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-28 12:41:09 -06:00
Ivan Kozlovic
4ed08dde07 Merge pull request #1013 from nats-io/fix_gw_qinterest_loss
Fixed loss of queue subscription interest across Gateways in some cases
2019-05-26 18:23:06 -06:00
Ivan Kozlovic
ce1e6defab Fix flappers
- TestSystemAccountConnectionUpdatesStopAfterNoLocal: I believe that
  the check on number of notifications was wrong. Since we did not
  consume the ones for the connect, the expected count after the
  disconnect is 8 instead of 4.

- Possible fix GW tests complaining about number of outbound/inbound
  I think that it may be possible that connection does not succeed
  right away (remote to fully started, etc) and due to dial timeout
  and reconnect attempt delay, I suspect that when given a max time
  of 1sec to complete, it may not be enough.
  Quick change for now is to override to 2secs for now in the
  wait helpers. If that proves conclusive, we could remove the
  timeout given to these helpers.

- TestGatewaySendAllSubsBadProtocol: used a t.Fatalf() in checkFor
  instead of return fmt.Errorf().

- TestLeafNodeResetsMSGProto: this test is not about change to
  interest mode only, so to avoid possible mix of protos, delay
  a bit creation of gateway after creation of leaf node.

- Some defer s.Shutdown() were missing

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-26 17:17:08 -06:00
Ivan Kozlovic
b325cf1e4a Fixed loss of queue subscription interest across Gateways in some cases
Suppose two servers, SA in cluster A and SB in cluster B. If SA
sends a message to SB on an account for which there is no interest
at all (account not known or no subscription), SB will send an A-
and keep track that it sent an A- for this account.

When a queue subscription is created on SB, SB will send and RS+
to A because A needs to have perfect knowledge of all queue subs
in all clusters.

If then a regular subscription is also created on SB, SB will
think that it needs to send an A+ because it had sent an A- for
this account. However, SA had an entry for this account for the
queue sub. The A+ would clear the entry in the map and would cause
SA to not send messages to SB even if they would have been a
match for the queue sub on SB.

We fix this in two ways:
- Clear the possible A- in SB when sending an RS+ for queue sub
- Processing of A-/A+ to be aware of a possible entry in the map
  due to queue subs.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-25 16:27:00 -06:00
Ivan Kozlovic
a3996cbd29 Shorten help function name
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-24 14:21:56 -06:00
Ivan Kozlovic
55597a7e8b [ADDED] URLs to cluster{} in /varz and update of gateway ones
In varz's cluster{} section, there was no URLs field. This PR adds
it and displays the routes defined in the cluster{} config section.
The value gets updated should there be a config reload following
addition/removal of an url from "routes".

If config had 1 route to "nats://127.0.0.1:1234", here is what
it would look like now:
```
"cluster": {
    "addr": "0.0.0.0",
    "cluster_port": 6222,
    "auth_timeout": 1,
    "urls": [
      "127.0.0.1:1234"
    ]
  },
```
Adding route to "127.0.0.1:4567" and doing config reload:
```
"cluster": {
    "addr": "0.0.0.0",
    "cluster_port": 6222,
    "auth_timeout": 1,
    "urls": [
      "127.0.0.1:1234",
      "127.0.0.1:4567"
    ]
  },
```
Note that due to how we handle discovered servers in the cluster,
new urls dynamically discovered will not show in above output.
This could be done, but would need some changes in how we store
things (actually in this case, new urls are not stored, just
attempted to be connected. Once they connect, they would be visible
in /routez).

For gateways, however, this PR displays the combination of the
URLs defined in config and the ones that are discovered after
a connection is made to a give cluster. So say cluster A has a single
url to one server in cluster B, when connecting to that server,
the server on A will get the list of the gateway URLs that one
can connect to, and these will be reflected in /varz. So this is
a different behavior that for routes. As explained above, we could
harmonize the behavior in a future PR.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-24 13:42:41 -06:00
Ivan Kozlovic
48c3f7f846 Fixed some flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-24 09:53:35 -06:00
Ivan Kozlovic
97ee89cc67 Check inbound GW connection connected state in parser
If the first protocol for an inbound gateway connection is not
CONNECT, reject with auth violation.

Fixes #1006

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-22 12:31:16 -06:00
Derek Collison
933f5d0df4 Add in TestGatewayServiceExportWithWildcards
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-21 15:28:42 -07:00
Derek Collison
67bb08af8b Fixes for a few flappers.
TestJWTAccountImportActivationExpires
TestGatewayServiceImportWithQueue

Signed-off-by: Derek Collison <derek@nats.io>
2019-05-21 15:12:31 -07:00
Derek Collison
ecfd1a2c85 Max flapper less so
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-21 11:59:34 -07:00
Derek Collison
8a614b49e1 Fix for reload race on global account
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-21 11:37:15 -07:00
Ivan Kozlovic
1cdc3eb41f Better randomize solicited Gateway URLs
Shuffle the array created when iterating through the gateways URLs
map since map iteration may not be well randomized with small maps.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-21 09:28:59 -06:00
Ivan Kozlovic
7272e4e317 Make the error report attempts configurable
This is a continuation of #1000. Added a configuration to specify
the number of attempts at which the repeated error is reported.
The algo is now to print only the 1st attempt and when current
attempt % <this config param> == 0.

Resolves #969

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-20 16:28:48 -06:00
Ivan Kozlovic
03930ba0e4 [UPDATED] Reduce report of failed connection attempts
This applies to routes, gateways and leaf node connections.
The failed attempts will be printed at the first, after the first
minute and then every hour.
The connect/error statements now include the attempt number.

Note that in debug mode, all attempts are traced, so you may get
double trace (one for debug, one for info/error).

Resolves #969

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-20 10:13:56 -06:00
Ivan Kozlovic
1eff7bc112 Fixed gateway test race report
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-13 11:49:29 -06:00
Derek Collison
d7140a0fd1 Update for client rename
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-10 15:11:30 -07:00
Derek Collison
da9120345c RC version bump [ci skip]
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-09 18:13:31 -07:00
Derek Collison
034a9fd1e4 Merge pull request #990 from nats-io/smap
Optimize updates for leaf node smaps.
2019-05-09 18:12:29 -07:00
Derek Collison
042e5a539a Optimize updates for leaf node smaps.
Previously we would walk all clients bound to an account to
collect the leaf nodes for updating of the subscription maps.

Signed-off-by: Derek Collison <derek@nats.io>
2019-05-09 17:25:17 -07:00
Ivan Kozlovic
c014211318 [FIXED] Changes to Varz content and fixed race conditions
----------------------------------------------------------------
Backward-incompatibility note:

Varz used to embed *Info and *Options which are other server objects.
However, Info is a struct that servers used to send protocols to other
servers or clients and its content must contain json tags since we
need to marshal those to be sent over. The problem is that it made
those fields now accessible to users calling Varz() and also visible
to the http /varz output. Some fields in Info were introduced in the
2.0 branch that clashed with json tag in Options, which made cluster{}
for instance disappear in the /varz output - because a Cluster string
in Info has the same json tag, and Cluster in Info is empty in some
cases.
For users that embed NATS and were using Server.Varz() directly,
without the use of the monitoring endpoint, they were then given
access (which was not the intent) to server internals (Info and Options).
Fields that were in Info or Options or directly in Varz that did not
clash with each other could be referenced directly, for instace, this
is you could access the server ID:

v, _ := s.Varz(nil)
fmt.Println(v.ID)

Another way would be:

fmt.Println(v.Info.ID)

Same goes for fields that were brought from embedding the Options:

fmt.Println(v.MaxConn)

or

fmt.Println(v.Options.MaxConn)

We have decided to explicitly define fields in Varz, which means
that if you previously accessed fields through v.Info or v.Options,
you will have to update your code to use the corresponding field
directly: v.ID or v.MaxConn for instance.

So fields were also duplicated between Info/Options and Varz itself
so depending on which one your application was accessing, you may
have to update your code.
---------------------------------------------------------------

Other issues that have been fixed is races that were introduced
by the fact that the creation of a Varz object (pointing to
some server data) was done under server lock, but marshaling not
being done under that lock caused races.

The fact that object returned to user through Server.Varz() also
had references to server internal objects had to be fixed by
returning deep copy of those internal objects.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-09 14:33:04 -06:00
Derek Collison
6584a9a828 lint updates
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-06 15:41:38 -07:00
Derek Collison
60978531ee gofmt -s update
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-06 15:15:11 -07:00
Derek Collison
acfe372d63 Changes for rename from gnatsd -> nats-server
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-06 15:04:24 -07:00
Derek Collison
44b01299c4 Bump to RC11 [ci skip]
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-02 17:19:02 -07:00
Derek Collison
bed56ab9cc Make sure remotes send existing sub interest
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-02 17:05:02 -07:00
Derek Collison
7ebe283601 Bump to RC10 [ci skip]
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-02 15:46:04 -07:00
Derek Collison
dacf0a4e67 Merge pull request #980 from nats-io/leafupdates
Leafnode updates
2019-05-02 15:19:57 -07:00
Derek Collison
90211e5b39 Be safer on gw and sl access
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-02 15:14:47 -07:00
Ivan Kozlovic
5e01570ad4 Fixed failed configuration reload due to present of leafnode with TLS
We don't support reload of leafnode config yet, but we need to make
sure it does not fail the reload process if nothing has been changed.
(it would fail because TLSConfig internally do change in some cases)

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-02 15:49:56 -06:00
Derek Collison
5292ec1598 Various fixes, init smap for leafnodes with gateways too
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-02 14:22:51 -07:00
Ivan Kozlovic
434501c3ed Fixed LeafNode failed to create TLS connection when CA needed
If server solicits leaf node TLS connection and needs to verify
the server certificate, it did not have the root CAs set in its
config.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-02 15:06:08 -06:00
Derek Collison
1c8d4b4b6e Make sure we are set to RMSG for send to Gateways
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-01 15:31:54 -07:00
Derek Collison
1d736ccc61 Make sure we use correct MSG prefix when mixing between leafnodes and routes.
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-01 15:08:20 -07:00
Ivan Kozlovic
dce9d672c1 Fixed panic with leafnode and gateway when no interest registered
Say there are 2 clusters, A and B. A client connects to A and
publishes messages on an account that B has no interest in.
Then a leaf node server connects to B (using same account than
the no-interest is for). Cluster B will ask cluster
A to switch to interest mode only for leaf node account. This
would cause a panic.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-01 13:40:17 -06:00
Ivan Kozlovic
de46bf5470 Pre-Release 2.0.0-RC8
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-01 11:37:48 -06:00
Waldemar Quevedo
984a59a6b0 Update json tag used for leaf node option 2019-04-30 12:09:44 -07:00
Derek Collison
2d0abd66af Bump to RC7, remove conditional panic
Signed-off-by: Derek Collison <derek@nats.io>
2019-04-25 17:09:53 -07:00
Derek Collison
17839518de Updates based on PR feedback
Signed-off-by: Derek Collison <derek@nats.io>
2019-04-25 15:47:35 -07:00
Derek Collison
2ec3eaeaa9 Leafnode account based connections limits
Signed-off-by: Derek Collison <derek@nats.io>
2019-04-25 14:40:59 -07:00
Derek Collison
26929d3e4b Fixed description
Signed-off-by: Derek Collison <derek@nats.io>
2019-04-23 18:56:27 -07:00
Derek Collison
f320f318b7 Fixed merge conflict
Signed-off-by: Derek Collison <derek@nats.io>
2019-04-23 17:28:42 -07:00
Derek Collison
bfe83aff81 Make account lookup faster with sync.Map
Signed-off-by: Derek Collison <derek@nats.io>
2019-04-23 17:13:23 -07:00
Ivan Kozlovic
9f497a6cd4 Revert to use Sublist but use the SublistNoCache version.
Remove sub from rsubs sublist when user UNSUBs.

Fix bench test that was not actually creating a SUB per request
in the Benchmark_Gateways_Requests_CreateOneSubForEach test.
Also UNSUBs older SUBs after a certain threshold to simulate
actual req/reply.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-04-23 14:13:13 -06:00
Ivan Kozlovic
41436fb787 Updates based on comments
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-04-22 20:00:21 -06:00
Ivan Kozlovic
bb4e8ae0f9 Gateways: Fix race for request reply
This addresses the following race:
- client connection creates a subscription on a reply subject
- client connection sends a request
- server sends the subscription to inbound gateway
- server sends the message to outbound gateway (those may be
  to different servers)
- receiving server sends to sub interested in request subject
- app sends reply
- its server then check for interest on the reply's subject

In interestOnly mode, there is a possibility that this server
has not received the interest on the reply subject yet and would
then drop the reply.

This PR detects above scenario and will prefix the reply subject
to identify the origin cluster if it is detected that the last
subscription from the sending connection was created less than
a second ago.
Once the destination has this prefix, the destination cluster
will always send back that message to origin cluster even if
there is no registered interest.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-04-22 20:00:21 -06:00
Derek Collison
2a7b2a9578 Merge pull request #967 from nats-io/nocache
Allow sublist cache to be disabled
2019-04-22 18:42:52 -07:00
Derek Collison
4ccfef004c Update for Ivan's suggestion on just checking s.cache since we have read lock
Signed-off-by: Derek Collison <derek@nats.io>
2019-04-22 18:35:24 -07:00
Derek Collison
da2dab92d1 Allow disabling of shared cache with new constuctor. Also share empty results.
Signed-off-by: Derek Collison <derek@nats.io>
2019-04-22 17:53:14 -07:00