Commit Graph

1151 Commits

Author SHA1 Message Date
Ivan Kozlovic
6382ba8d77 Release v2.0.0-RC19
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-06-04 09:05:00 -06:00
Ivan Kozlovic
ed1901c792 Update go.mod to satisfy v2 requirements
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-06-03 19:45:47 -06:00
Derek Collison
2a8e630bf1 Fix for leafnode and dq selection over GWs
Signed-off-by: Derek Collison <derek@nats.io>
2019-06-01 16:43:54 -07:00
Derek Collison
adba6dc023 Add in leafnode bound account events for accounting
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-31 16:58:27 -07:00
Derek Collison
d246359dc8 Merge pull request #1028 from nats-io/leaf_gw_si
Bug fix for service import with leafnodes and gws
2019-05-31 11:29:33 -07:00
Derek Collison
3cf6f6a5d2 Bug fix for service import with leafnodes and gws
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-31 11:22:02 -07:00
Ivan Kozlovic
37f4e71246 Fixed race due to use of byte slice instead of string
The go routine that is started during interest mode switch was
using the accName (which was a byte slice) instead of account,
which was a string copy of that byte slice. It meant that when
printing the notice, the underlying buffer may have be overwriten
by the readloop.

Changing accName to a string - since we were doing a copy anyway,
better change it at the function param level.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-30 18:43:01 -06:00
Ivan Kozlovic
1c193452b1 Merge pull request #1026 from nats-io/switch_gw_to_interest_only_once
Switch gateway to InterestMode only once
2019-05-30 17:24:05 -06:00
Ivan Kozlovic
37b3546e7b Switch gateway to InterestMode only once
When a leafnode connection is created, the server forces all
gateway inbound connections to switch to InterestMode. Do this only
once, regardless of how many times the LN (re)connects.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-30 17:21:15 -06:00
Derek Collison
c188cb092c Merge pull request #1025 from nats-io/lf
Cleaned up logging for leafnodes
2019-05-30 16:04:36 -07:00
Derek Collison
257b670ae2 Cleaned up logging for leafnodes
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-30 15:53:14 -07:00
Ivan Kozlovic
eeffab25db Merge pull request #1024 from nats-io/allow_top_level_unknown_fields_in_process_cfg
Added a function to allow ignoring top-level unknown config option
2019-05-30 16:40:01 -06:00
Ivan Kozlovic
451d4bb05c Change name of public function
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-30 16:25:37 -06:00
Derek Collison
da938dcb1e Cleaned up debug and fixed test
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-30 14:30:35 -07:00
Ivan Kozlovic
437e16ca71 Added a function to allow ignoring top-level unknown config option
This will be required for NATS Streaming server since streaming
allows user to have NATS and Streaming specific options in same
file.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-30 15:24:39 -06:00
Derek Collison
14ade43da9 Fix for unix time flapper
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-30 10:34:38 -07:00
Derek Collison
42a7797a50 Add chunk and total bytes to slow consumer log
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-30 09:15:20 -07:00
Derek Collison
eb6689143d Merge pull request #1020 from nats-io/bug
Fix for reloadAuthorization bugs
2019-05-29 14:20:58 -07:00
Ivan Kozlovic
44554057d1 Merge pull request #1019 from nats-io/warn_readloop_busy
Print warning if code in readloop execute for more than threshold
2019-05-29 15:16:31 -06:00
Derek Collison
a94fe78c22 Check for resolver reload to no resolver
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-29 14:00:51 -07:00
Ivan Kozlovic
7f2620904c Fixed setting timer for account connection updates
The timer was not set with the proper variable, which caused the
check to always think that a new timer should be created, which
would lead to more and more timers being created which translated
to updates being sent more and more frequently.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-29 14:28:26 -06:00
Derek Collison
874f06a212 Fix bugs on reloadAuthorization
When tls is on routes it can cause reloadAuthorization to be called.
We were assuming configured accounts, but did not copy the remote map.
This copies the remote map when transferring for configured accounts
and also handles operator mode. In operator mode we leave the accounts
in place, and if we have a memory resolver we will remove accounts that
are not longer defined or have bad claims.

Signed-off-by: Derek Collison <derek@nats.io>
2019-05-29 13:19:58 -07:00
Ivan Kozlovic
33505e4849 Print warning if code in readloop execute for more than threshold
Issue a warning in readLoop if execution of code after connection
Read() until end of for loop reaches a certain threshold.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-28 18:33:48 -06:00
Derek Collison
bd589fb20c Warn on no random client
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-28 16:17:25 -07:00
Ivan Kozlovic
66f5325cee Merge pull request #1018 from nats-io/gw_log_interest_switch
Added logging of account interest mode switch for gateways
2019-05-28 15:33:06 -06:00
Ivan Kozlovic
f5991e8a2b Merge pull request #1015 from nats-io/restore_conn_error_default_attempts_to_one
Update to connect/reconnect error reports logic
2019-05-28 14:57:29 -06:00
Ivan Kozlovic
2d4c3dd38f Added logging of account interest mode switch for gateways
Both sides will log when an account is switched to interest-only
mode. There are 2 traces (start/complete) per account.
They are logged at [INF] level.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-28 14:55:45 -06:00
Ivan Kozlovic
5478eaf01e Added /gatewayz endpoint
Such endpoint will list the gateway/cluster name, address and port
then list of outbound/inbound connections.
For each remote gateway there will be at most one outbound connection.
There can be 0 or more inbound connections for the same remote
gateway.

For each of these outbound/inbound connection, the connection info
similar to Connz is reported. Optionally, one can include the
interest mode/stats for each account.

Here are possible options:

* No specific options

http://host:port/gatewayz

* Limit to specific remote gateway, say name "B":

http://host:port/gatewayz/gw_name=B

* Include accounts (default limit to 1024 accounts)

http://host:port/gatewayz/accs=1

* Specific limit, say 200 (note accs=1 in this case is optional)

http://host:port/gatewayz/accs=1&accs_limit=200

* Specific account, say "acc_1". Note that accs=1 is not required then

http://host:port/gatewayz/acc_name=acc_1

* Above options can be mixed: specific remote gateway (B), with 100
  accounts reported

http://host:port/gatewayz/gw_name=B&accs_limit=200

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-28 12:41:09 -06:00
Ivan Kozlovic
4ed08dde07 Merge pull request #1013 from nats-io/fix_gw_qinterest_loss
Fixed loss of queue subscription interest across Gateways in some cases
2019-05-26 18:23:06 -06:00
Ivan Kozlovic
d2578f9e05 Update to connect/reconnect error reports logic
Changed the introduced new option and added a new one. The idea
is to be able to differentiate between never connected and reconnected
event. The never connected situation will be logged at first attempt
and every hour (by default, configurable).
However, once connected and if trying to reconnect, will report every
attempts by default, but this is configurable too.

These two options are supported for config reload.

Related to #1000
Related to #1001
Resolves #969

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-26 17:51:01 -06:00
Ivan Kozlovic
ce1e6defab Fix flappers
- TestSystemAccountConnectionUpdatesStopAfterNoLocal: I believe that
  the check on number of notifications was wrong. Since we did not
  consume the ones for the connect, the expected count after the
  disconnect is 8 instead of 4.

- Possible fix GW tests complaining about number of outbound/inbound
  I think that it may be possible that connection does not succeed
  right away (remote to fully started, etc) and due to dial timeout
  and reconnect attempt delay, I suspect that when given a max time
  of 1sec to complete, it may not be enough.
  Quick change for now is to override to 2secs for now in the
  wait helpers. If that proves conclusive, we could remove the
  timeout given to these helpers.

- TestGatewaySendAllSubsBadProtocol: used a t.Fatalf() in checkFor
  instead of return fmt.Errorf().

- TestLeafNodeResetsMSGProto: this test is not about change to
  interest mode only, so to avoid possible mix of protos, delay
  a bit creation of gateway after creation of leaf node.

- Some defer s.Shutdown() were missing

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-26 17:17:08 -06:00
Ivan Kozlovic
b325cf1e4a Fixed loss of queue subscription interest across Gateways in some cases
Suppose two servers, SA in cluster A and SB in cluster B. If SA
sends a message to SB on an account for which there is no interest
at all (account not known or no subscription), SB will send an A-
and keep track that it sent an A- for this account.

When a queue subscription is created on SB, SB will send and RS+
to A because A needs to have perfect knowledge of all queue subs
in all clusters.

If then a regular subscription is also created on SB, SB will
think that it needs to send an A+ because it had sent an A- for
this account. However, SA had an entry for this account for the
queue sub. The A+ would clear the entry in the map and would cause
SA to not send messages to SB even if they would have been a
match for the queue sub on SB.

We fix this in two ways:
- Clear the possible A- in SB when sending an RS+ for queue sub
- Processing of A-/A+ to be aware of a possible entry in the map
  due to queue subs.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-25 16:27:00 -06:00
Ivan Kozlovic
a3996cbd29 Shorten help function name
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-24 14:21:56 -06:00
Ivan Kozlovic
55597a7e8b [ADDED] URLs to cluster{} in /varz and update of gateway ones
In varz's cluster{} section, there was no URLs field. This PR adds
it and displays the routes defined in the cluster{} config section.
The value gets updated should there be a config reload following
addition/removal of an url from "routes".

If config had 1 route to "nats://127.0.0.1:1234", here is what
it would look like now:
```
"cluster": {
    "addr": "0.0.0.0",
    "cluster_port": 6222,
    "auth_timeout": 1,
    "urls": [
      "127.0.0.1:1234"
    ]
  },
```
Adding route to "127.0.0.1:4567" and doing config reload:
```
"cluster": {
    "addr": "0.0.0.0",
    "cluster_port": 6222,
    "auth_timeout": 1,
    "urls": [
      "127.0.0.1:1234",
      "127.0.0.1:4567"
    ]
  },
```
Note that due to how we handle discovered servers in the cluster,
new urls dynamically discovered will not show in above output.
This could be done, but would need some changes in how we store
things (actually in this case, new urls are not stored, just
attempted to be connected. Once they connect, they would be visible
in /routez).

For gateways, however, this PR displays the combination of the
URLs defined in config and the ones that are discovered after
a connection is made to a give cluster. So say cluster A has a single
url to one server in cluster B, when connecting to that server,
the server on A will get the list of the gateway URLs that one
can connect to, and these will be reflected in /varz. So this is
a different behavior that for routes. As explained above, we could
harmonize the behavior in a future PR.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-24 13:42:41 -06:00
Ivan Kozlovic
48c3f7f846 Fixed some flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-24 09:53:35 -06:00
Ivan Kozlovic
97ee89cc67 Check inbound GW connection connected state in parser
If the first protocol for an inbound gateway connection is not
CONNECT, reject with auth violation.

Fixes #1006

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-22 12:31:16 -06:00
Derek Collison
933f5d0df4 Add in TestGatewayServiceExportWithWildcards
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-21 15:28:42 -07:00
Derek Collison
67bb08af8b Fixes for a few flappers.
TestJWTAccountImportActivationExpires
TestGatewayServiceImportWithQueue

Signed-off-by: Derek Collison <derek@nats.io>
2019-05-21 15:12:31 -07:00
Derek Collison
ecfd1a2c85 Max flapper less so
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-21 11:59:34 -07:00
Derek Collison
8a614b49e1 Fix for reload race on global account
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-21 11:37:15 -07:00
Ivan Kozlovic
1cdc3eb41f Better randomize solicited Gateway URLs
Shuffle the array created when iterating through the gateways URLs
map since map iteration may not be well randomized with small maps.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-21 09:28:59 -06:00
Ivan Kozlovic
7272e4e317 Make the error report attempts configurable
This is a continuation of #1000. Added a configuration to specify
the number of attempts at which the repeated error is reported.
The algo is now to print only the 1st attempt and when current
attempt % <this config param> == 0.

Resolves #969

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-20 16:28:48 -06:00
Ivan Kozlovic
03930ba0e4 [UPDATED] Reduce report of failed connection attempts
This applies to routes, gateways and leaf node connections.
The failed attempts will be printed at the first, after the first
minute and then every hour.
The connect/error statements now include the attempt number.

Note that in debug mode, all attempts are traced, so you may get
double trace (one for debug, one for info/error).

Resolves #969

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-20 10:13:56 -06:00
Ivan Kozlovic
1eff7bc112 Fixed gateway test race report
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-13 11:49:29 -06:00
Derek Collison
d7140a0fd1 Update for client rename
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-10 15:11:30 -07:00
Derek Collison
da9120345c RC version bump [ci skip]
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-09 18:13:31 -07:00
Derek Collison
034a9fd1e4 Merge pull request #990 from nats-io/smap
Optimize updates for leaf node smaps.
2019-05-09 18:12:29 -07:00
Derek Collison
042e5a539a Optimize updates for leaf node smaps.
Previously we would walk all clients bound to an account to
collect the leaf nodes for updating of the subscription maps.

Signed-off-by: Derek Collison <derek@nats.io>
2019-05-09 17:25:17 -07:00
Ivan Kozlovic
c014211318 [FIXED] Changes to Varz content and fixed race conditions
----------------------------------------------------------------
Backward-incompatibility note:

Varz used to embed *Info and *Options which are other server objects.
However, Info is a struct that servers used to send protocols to other
servers or clients and its content must contain json tags since we
need to marshal those to be sent over. The problem is that it made
those fields now accessible to users calling Varz() and also visible
to the http /varz output. Some fields in Info were introduced in the
2.0 branch that clashed with json tag in Options, which made cluster{}
for instance disappear in the /varz output - because a Cluster string
in Info has the same json tag, and Cluster in Info is empty in some
cases.
For users that embed NATS and were using Server.Varz() directly,
without the use of the monitoring endpoint, they were then given
access (which was not the intent) to server internals (Info and Options).
Fields that were in Info or Options or directly in Varz that did not
clash with each other could be referenced directly, for instace, this
is you could access the server ID:

v, _ := s.Varz(nil)
fmt.Println(v.ID)

Another way would be:

fmt.Println(v.Info.ID)

Same goes for fields that were brought from embedding the Options:

fmt.Println(v.MaxConn)

or

fmt.Println(v.Options.MaxConn)

We have decided to explicitly define fields in Varz, which means
that if you previously accessed fields through v.Info or v.Options,
you will have to update your code to use the corresponding field
directly: v.ID or v.MaxConn for instance.

So fields were also duplicated between Info/Options and Varz itself
so depending on which one your application was accessing, you may
have to update your code.
---------------------------------------------------------------

Other issues that have been fixed is races that were introduced
by the fact that the creation of a Varz object (pointing to
some server data) was done under server lock, but marshaling not
being done under that lock caused races.

The fact that object returned to user through Server.Varz() also
had references to server internal objects had to be fixed by
returning deep copy of those internal objects.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-09 14:33:04 -06:00
Derek Collison
6584a9a828 lint updates
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-06 15:41:38 -07:00