105 Commits

Author SHA1 Message Date
Derek Collison
10d4f1ab7a Convert leafnode solicited remotes to array
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-10 11:53:34 -07:00
Ivan Kozlovic
5478eaf01e Added /gatewayz endpoint
This endpoint lists the gateway/cluster name, address and port,
then the list of outbound/inbound connections.
For each remote gateway there will be at most one outbound connection.
There can be 0 or more inbound connections for the same remote
gateway.

For each of these outbound/inbound connections, connection info
similar to Connz is reported. Optionally, one can include the
interest mode/stats for each account.

Here are possible options:

* No specific options

http://host:port/gatewayz

* Limit to specific remote gateway, say name "B":

http://host:port/gatewayz?gw_name=B

* Include accounts (default limit to 1024 accounts)

http://host:port/gatewayz?accs=1

* Specific limit, say 200 (note accs=1 in this case is optional)

http://host:port/gatewayz?accs=1&accs_limit=200

* Specific account, say "acc_1". Note that accs=1 is not required then

http://host:port/gatewayz?acc_name=acc_1

* Above options can be mixed: specific remote gateway (B), with 100
  accounts reported

http://host:port/gatewayz?gw_name=B&accs_limit=100
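
A minimal sketch of building such URLs programmatically, assuming the
options are passed as standard query parameters. The helper name
gatewayzURL and the address localhost:8222 are illustrative assumptions,
not part of this commit:

```go
package main

import (
	"fmt"
	"net/url"
)

// gatewayzURL is a hypothetical helper: it builds a /gatewayz monitoring
// URL for the given host:port and adds the supplied options as query
// parameters. url.Values.Encode sorts the parameters by key.
func gatewayzURL(hostPort string, params map[string]string) string {
	q := url.Values{}
	for k, v := range params {
		q.Set(k, v)
	}
	u := url.URL{Scheme: "http", Host: hostPort, Path: "/gatewayz", RawQuery: q.Encode()}
	return u.String()
}

func main() {
	// No specific options.
	fmt.Println(gatewayzURL("localhost:8222", nil))
	// Specific remote gateway and specific account, mixed.
	fmt.Println(gatewayzURL("localhost:8222", map[string]string{
		"gw_name":  "B",
		"acc_name": "acc_1",
	}))
}
```

Building the query with net/url avoids hand-writing the "?" and "&"
separators and escapes names that contain special characters.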

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-28 12:41:09 -06:00
Ivan Kozlovic
ce1e6defab Fix flappers
- TestSystemAccountConnectionUpdatesStopAfterNoLocal: I believe that
  the check on number of notifications was wrong. Since we did not
  consume the ones for the connect, the expected count after the
  disconnect is 8 instead of 4.

- Possible fix for GW tests complaining about number of outbound/inbound
  connections. I think that the connection may not succeed
  right away (remote not fully started, etc.) and, due to the dial
  timeout and reconnect attempt delay, I suspect that a max time
  of 1sec to complete may not be enough.
  Quick change for now is to override to 2secs in the
  wait helpers. If that proves conclusive, we could remove the
  timeout given to these helpers.

- TestGatewaySendAllSubsBadProtocol: used a t.Fatalf() in checkFor
  instead of return fmt.Errorf().

- TestLeafNodeResetsMSGProto: this test is not about change to
  interest mode only, so to avoid possible mix of protos, delay
  a bit creation of gateway after creation of leaf node.

- Some defer s.Shutdown() were missing

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-26 17:17:08 -06:00
Ivan Kozlovic
55597a7e8b [ADDED] URLs to cluster{} in /varz and update of gateway ones
In varz's cluster{} section, there was no URLs field. This PR adds
it and displays the routes defined in the cluster{} config section.
The value gets updated should there be a config reload following
addition/removal of a URL from "routes".

If config had 1 route to "nats://127.0.0.1:1234", here is what
it would look like now:
```
"cluster": {
    "addr": "0.0.0.0",
    "cluster_port": 6222,
    "auth_timeout": 1,
    "urls": [
      "127.0.0.1:1234"
    ]
  },
```
Adding route to "127.0.0.1:4567" and doing config reload:
```
"cluster": {
    "addr": "0.0.0.0",
    "cluster_port": 6222,
    "auth_timeout": 1,
    "urls": [
      "127.0.0.1:1234",
      "127.0.0.1:4567"
    ]
  },
```
Note that due to how we handle discovered servers in the cluster,
new urls dynamically discovered will not show in above output.
This could be done, but would need some changes in how we store
things (actually in this case, new urls are not stored; the server
just attempts to connect to them. Once they connect, they would be
visible in /routez).

For gateways, however, this PR displays the combination of the
URLs defined in config and the ones that are discovered after
a connection is made to a given cluster. So say cluster A has a single
url to one server in cluster B, when connecting to that server,
the server on A will get the list of the gateway URLs that one
can connect to, and these will be reflected in /varz. So this is
a different behavior than for routes. As explained above, we could
harmonize the behavior in a future PR.
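
As a rough sketch of how a monitoring client might read the new field,
assuming the /varz document has the cluster{} shape shown above. The
function name clusterURLs and the sample document are illustrative
assumptions:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clusterURLs is a hypothetical helper that extracts cluster.urls from
// a /varz JSON document. Only the fields of interest are declared;
// encoding/json ignores the rest.
func clusterURLs(varz []byte) ([]string, error) {
	var v struct {
		Cluster struct {
			URLs []string `json:"urls"`
		} `json:"cluster"`
	}
	if err := json.Unmarshal(varz, &v); err != nil {
		return nil, err
	}
	return v.Cluster.URLs, nil
}

func main() {
	// Sample document modeled on the output after the config reload.
	doc := []byte(`{"cluster": {"addr": "0.0.0.0", "cluster_port": 6222,
		"auth_timeout": 1, "urls": ["127.0.0.1:1234", "127.0.0.1:4567"]}}`)
	urls, err := clusterURLs(doc)
	fmt.Println(urls, err)
}
```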

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-24 13:42:41 -06:00
Ivan Kozlovic
48c3f7f846 Fixed some flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-24 09:53:35 -06:00
Derek Collison
d7140a0fd1 Update for client rename
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-10 15:11:30 -07:00
Ivan Kozlovic
c014211318 [FIXED] Changes to Varz content and fixed race conditions
----------------------------------------------------------------
Backward-incompatibility note:

Varz used to embed *Info and *Options which are other server objects.
However, Info is a struct that servers use to send protocols to other
servers or clients and its content must contain json tags since we
need to marshal those to be sent over. The problem is that it made
those fields now accessible to users calling Varz() and also visible
to the http /varz output. Some fields in Info were introduced in the
2.0 branch that clashed with json tag in Options, which made cluster{}
for instance disappear in the /varz output - because a Cluster string
in Info has the same json tag, and Cluster in Info is empty in some
cases.
For users that embed NATS and were using Server.Varz() directly,
without the use of the monitoring endpoint, they were then given
access (which was not the intent) to server internals (Info and Options).
Fields that were in Info or Options or directly in Varz that did not
clash with each other could be referenced directly. For instance, this
is how you could access the server ID:

v, _ := s.Varz(nil)
fmt.Println(v.ID)

Another way would be:

fmt.Println(v.Info.ID)

Same goes for fields that were brought from embedding the Options:

fmt.Println(v.MaxConn)

or

fmt.Println(v.Options.MaxConn)

We have decided to explicitly define fields in Varz, which means
that if you previously accessed fields through v.Info or v.Options,
you will have to update your code to use the corresponding field
directly: v.ID or v.MaxConn for instance.

Some fields were also duplicated between Info/Options and Varz itself,
so depending on which one your application was accessing, you may
have to update your code.
---------------------------------------------------------------
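
The json tag clash described in the note can be reproduced in isolation.
The types below are simplified, hypothetical stand-ins for Info, Options
and Varz (the real structs have many more fields). With encoding/json,
when two embedded structs at the same depth both carry the same tagged
JSON name, the field is silently dropped from the output:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-ins for the server types; illustrative only.
type Info struct {
	ID      string `json:"server_id"`
	Cluster string `json:"cluster"`
}

type ClusterOpts struct {
	Port int `json:"cluster_port"`
}

type Options struct {
	MaxConn int         `json:"max_connections"`
	Cluster ClusterOpts `json:"cluster"`
}

// OldVarz embeds both types, so "cluster" is defined twice at the
// same depth and encoding/json omits it without reporting an error.
type OldVarz struct {
	*Info
	*Options
}

// NewVarz defines its fields explicitly, so there is no clash.
type NewVarz struct {
	ID      string      `json:"server_id"`
	MaxConn int         `json:"max_connections"`
	Cluster ClusterOpts `json:"cluster"`
}

func marshalOld() string {
	b, _ := json.Marshal(OldVarz{
		Info:    &Info{ID: "abc"},
		Options: &Options{MaxConn: 10, Cluster: ClusterOpts{Port: 6222}},
	})
	return string(b)
}

func marshalNew() string {
	b, _ := json.Marshal(NewVarz{ID: "abc", MaxConn: 10, Cluster: ClusterOpts{Port: 6222}})
	return string(b)
}

func main() {
	fmt.Println(marshalOld()) // {"server_id":"abc","max_connections":10}
	fmt.Println(marshalNew()) // {"server_id":"abc","max_connections":10,"cluster":{"cluster_port":6222}}
}
```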

Other issues that have been fixed are races caused by the fact
that the creation of a Varz object (pointing to some server data)
was done under the server lock, while the marshaling was not done
under that lock.

The object returned to the user through Server.Varz() also
had references to server internal objects. This has been fixed by
returning a deep copy of those internal objects.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-09 14:33:04 -06:00
Derek Collison
fcf1cecda9 Merge pull request #737 from nats-io/route_perm2
Route permission propagation
2018-09-05 17:32:19 -07:00
Ivan Kozlovic
b15377b40c Merge pull request #736 from nats-io/fix_flappers
Fixed flapping tests
2018-09-05 18:10:50 -06:00
Ivan Kozlovic
8f480f3f42 Fixed flapping tests
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-09-05 17:22:15 -06:00
Derek Collison
2ee868ba18 Propagate route imports and exports to other connected servers
Signed-off-by: Derek Collison <derek@nats.io>
2018-09-05 16:15:31 -07:00
Derek Collison
21f29cf897 Move tests
Signed-off-by: Derek Collison <derek@nats.io>
2018-09-05 13:52:52 -07:00
Derek Collison
305d7bdf88 Allow subsz detail and test for matching subs
Signed-off-by: Derek Collison <derek@nats.io>
2018-07-01 13:02:28 -07:00
Derek Collison
cd834a36fa Added more sort options, fixed some broken ones.
Fixes #700, #701, #702

Signed-off-by: Derek Collison <derek@nats.io>
2018-06-29 17:44:01 -07:00
Derek Collison
2e0830201c Make sure closed conn accounting correct for bad clients
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-29 11:42:23 -07:00
Derek Collison
3b953ce838 Allow localhost to not be defined, only need 127.0.0.1
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-28 16:10:19 -07:00
Ivan Kozlovic
aff1dcf089 Fix some tests
Add some helpers to check on some state.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-06-27 17:26:49 -06:00
Ivan Kozlovic
f692c0ef8a Add debug info for failed RTT test
The test TestConnzRTT() failed once with "invalid duration". Adding
the original string in case of error to understand better.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-06-26 19:54:12 -06:00
Ivan Kozlovic
cb1c2e7352 Use waitForClientConnCount() in TestConnzTLSInHandshake()
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-06-26 19:42:55 -06:00
Ivan Kozlovic
c092c3d19e Wait for correct client count in TestConnzTLSInHandshake
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-06-26 18:52:56 -06:00
Derek Collison
f9f478b143 Wait for all closed connections before starting
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-26 15:14:32 -07:00
Derek Collison
4a18daed31 megacheck
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-26 15:01:23 -07:00
Derek Collison
e1058d4dd8 Make sure closed connection with options are race safe
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-26 14:45:58 -07:00
Derek Collison
0c0dd92467 cluster should be empty when not defined
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-26 10:49:18 -07:00
Derek Collison
ec8e2636de Track closed connections and reason for closing
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-25 17:56:07 -07:00
Derek Collison
17fecd4c9b Support CID in client INFO, allow filtering /connz by CID
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-21 15:23:15 -07:00
Derek Collison
7e28af236b Support for RTT - #643
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-20 20:18:59 -07:00
Derek Collison
50bb4b9a1b delivery last activity update 2018-06-04 17:45:05 -07:00
Derek Collison
955d8ee698 require 1.9 or above, bug fix in test 2018-06-04 17:45:05 -07:00
Derek Collison
3e2e8c9ce5 Fixed bug reusing test sub
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-04 17:45:05 -07:00
Derek Collison
50a99241ea Slow consumer updates and latency improvements.
Use pending bytes as slow consumer trigger, so reintroduce max_pending.
Improve latency with inplace flush calls when appropriate. Utilize simple
time budget for readLoop routine.

Signed-off-by: Derek Collison <derek@nats.io>
2018-06-04 17:45:05 -07:00
Ivan Kozlovic
9d11587e49 Add test to ensure server handles more than 1 signal
Also, try to fix flapping test
2018-04-06 17:24:41 -06:00
Ivan Kozlovic
705d8f5fe8 Fix last activity monitoring test
This test originally used only 1 connection. It was then modified
to use another connection to check the publishing effect into
the last activity. However, when polling for connz we were using
[0], but that may not necessarily match the connection we were
checking.
I don't think that there was a need for a new connection in this
test, so use a single connection.
2018-03-23 14:20:16 -06:00
Ivan Kozlovic
fb972bd0fc Remove ssl_required references 2018-03-23 13:40:10 -06:00
Derek Collison
b2a9ed97d6 Merge pull request #650 from nats-io/cncf
Move to CNCF and Apache 2 License
2018-03-16 16:23:10 -07:00
Ivan Kozlovic
4e9d785423 Capture possible error in Atoi conversion of url params 2018-03-16 10:42:52 -06:00
Derek Collison
00901acc78 Update license to Apache 2 2018-03-15 22:31:07 -07:00
Ivan Kozlovic
a3e8fba6b3 Add options for each monitoring endpoint and added Connz
Even for endpoints that currently do not need options (such as
Subsz and Varz), I added some options and a return of error so that
we don't break the API if we ever need to add options for those.
2018-03-13 10:30:46 -06:00
Ivan Kozlovic
c4470d4e68 Added Subsz and Routez and updated tests
In tests, replaced all code doing http.Get, etc.. with a readBody
helper function.
Combine server functions and monitor handler tests in one test
with a for loop to try both modes on a given test.

Still need to add Connz.
2018-03-09 20:48:43 -07:00
Tyler Treat
6c7d1dc847 Add Varz server method
Fixes #612 by adding a new Varz method to the Server. This can be used
by processes embedding gnatsd to access the Varz struct without relying
on the HTTP API. This starts with just Varz but the same pattern could
be expanded to other monitoring APIs in the future.
2018-03-09 19:37:56 -07:00
Ivan Kozlovic
5a351c56ce Merge branch 'master' into add-server-id-to-routez-connz 2018-01-30 14:59:26 -07:00
Ivan Kozlovic
6fad293a21 [FIXED] Connz would "block" for TLS clients still in TLS handshake
If server requires TLS and clients are connecting, and a Connz
request is made while clients are still in TLS Handshake, the
call to tls.Conn.ConnectionState() would block for the duration
of the handshake. This would cause the overall http request to
take too long.
We will now not try to gather TLSVersion and TLSCipher from a
client that is still in TLS handshake.

Resolves #600
2017-11-09 09:47:05 -07:00
Alberto Ricart
e0fe1247dd [FIX] #490 - Added server id to connz and routez 2017-10-24 19:36:10 -05:00
Ivan Kozlovic
56649b3273 [FIXED] Possible data race in routez when route disconnects (#540)
* [FIXED] Possible data race in routez when route disconnects

Resolves #539
2017-07-11 16:11:22 -06:00
Ivan Kozlovic
70b3b18535 Fix some tests
Fixing various tests that were failing locally when running in
parallel mode (without -p=1).
In reload_test.go, lots of nats.Conn.Close() were missing which
would require too much memory when running with `-race` mode.
2017-07-05 18:57:15 -06:00
Peter Miron
00744ff426 converted MonitorAddr and ClusterAddr to *net.TCPAddr 2017-06-12 17:40:36 -04:00
Peter Miron
5e640f099d clean up of log files. removed FatalError function to make sure I'm minimizing changes to actual server. 2017-06-11 16:20:04 -04:00
Peter Miron
da1cb9abb2 missed go fmt'ing. 2017-06-10 10:39:09 -04:00
Peter Miron
d1f38f38a2 changes to support random ports for clusters and profiler. 2017-06-10 10:35:01 -04:00
Ivan Kozlovic
a87050c546 [FIXED] Check for negative Offset and/or Limit when processing Connz
Ensure that if the offset is negative, it is set to 0. If the limit
is negative, it is set to the default value.
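
The rule above, as a small sketch. The function name and the default
value of 1024 are assumptions made for illustration:

```go
package main

import "fmt"

// defaultLimit is an assumed default list size, for illustration only.
const defaultLimit = 1024

// normalizeOffsetLimit is a hypothetical helper applying the rule
// described above: a negative offset becomes 0 and a negative limit
// becomes the default value.
func normalizeOffsetLimit(offset, limit int) (int, int) {
	if offset < 0 {
		offset = 0
	}
	if limit < 0 {
		limit = defaultLimit
	}
	return offset, limit
}

func main() {
	fmt.Println(normalizeOffsetLimit(-5, -1)) // 0 1024
	fmt.Println(normalizeOffsetLimit(10, 50)) // 10 50
}
```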

Resolves #491
2017-05-17 12:05:54 -06:00