Commit Graph

297 Commits

Author SHA1 Message Date
Derek Collison
7b1bea61e2 Merge pull request #1192 from nats-io/load_account
Do not fetch accounts on system events.
2019-11-16 18:33:23 -08:00
Derek Collison
f60266bc2e Merge pull request #1190 from nats-io/import_reply
Introduced wildcard handling of _R_ mapped replies.
2019-11-16 18:07:18 -08:00
Derek Collison
093b57ed40 Do not fetch accounts on system events.
Noticed we would lookup accounts, but would also fetch them when tracking remote connections, etc.

Signed-off-by: Derek Collison <derek@nats.io>
2019-11-16 18:05:42 -08:00
Ivan Kozlovic
0bfd03091b Clean tmp accounts map when race gets duplicate
Added check to the test to ensure that tmp map is empty.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-16 18:14:23 -07:00
Ivan Kozlovic
3e1728d623 [FIXED] Some accounts locking issues
- Risk of deadlock when checking if issuer claim are trusted. There
  was a RLock() in one thread, then a request for Lock() in another
  that was waiting for RLock() to return, but the first thread was
  then doing RLock() which was not acquired because this was blocked
  by the Lock() request (see e2160cc571)

- Use proper account/locking mode when checking if stream/service
  exports/signer have changed.

- Account registration race (regression from https://github.com/nats-io/nats-server/pull/890)

- Move test from #890 to "no race" test since only then could it detect
  the double registration.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-16 16:59:38 -07:00
Derek Collison
6ad8287bbe Introduced wildcard handling of _R_ mapped replies.
We had too much special processing, so reduced to a single wildcard
which will propagate across routes and gateways and is consistent
with gateway handling of globally routed subjects and timeouts.

Signed-off-by: Derek Collison <derek@nats.io>
2019-11-16 12:50:53 -08:00
Derek Collison
954a780421 Fix possible panic on nil sublist.
We may have the case that the account is held in tmpAccounts but does not have a sublist. When this happens if we process as RS+ and do LookupAccount and get it from the tmpAccount and before it was registered the route code could try to do an insert on the sl.

Signed-off-by: Derek Collison <derek@nats.io>
2019-11-15 17:11:52 -08:00
Ivan Kozlovic
bdf5cf63b3 Shutdown on Ctrl+C
Changed code on Windows to not use svc code if running in interactive
mode. The original code was running svc.debug.Run() which uses service
code (Execute()) but from the command line. We don't need that.

Also reduced salt on bcrypt password for a config file that started
to cause failures due to test taking too long to finish.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-14 20:05:32 -07:00
Derek Collison
3330820502 Fixed a bug where we leaked service imports. Also prior this would have leaked subscriptions as well.
Signed-off-by: Derek Collison <derek@nats.io>
2019-11-14 13:29:17 -08:00
Ivan Kozlovic
b561bde366 Alternate approach to GW reply mapping expiration
Use centralized sync map to gather *client that have GW replies.
Tested with concurrent receiving clients and perf is as good as
with timer per client but reduces need of that timer per client
object.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-11 13:36:24 -07:00
Ivan Kozlovic
8a8695d07c Backward compatibility with previous servers
Want to keep this commit separate so that we can easily remove
when we no longer want to support both prefixes.

- If this server receives a "$GR." message, it takes the subject
  and tries to process this locally. If there is no cluster race
  reply may be received ok (like before).
- If this server sends a routed reply, it detects if sending to
  an older server (then uses $GR.) or not (then uses $GNR)
- Gateway INFO has a new field that indicates if the server is
  using the new prefix.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-08 16:22:34 -07:00
Ivan Kozlovic
9b7dab0548 Updates based on code review
- Add atomic in client to skip check in processInboundClientMsg()
  if value is 0. Avoids getting the lock in fast path if not needed.
- Have a timer per client instead of the global server list that
  was expiring: noticed a lot of contention there when running
  some perf/profiling tests. The timer is also not reset for
  every timestamp that is not yet expired since this too affects
  performance. Instead fires are regular interval and cleared
  when map is empty after a cycle.
- Move processing of gw map rely on its own function (in inbound msg).
  I have verified that this is inlined same way as when code was
  directly in processInboundClientMsg.
- Use string(subj[]) for prefix detection: I have verified that
  it is actually faster.
- Builds the RMSG with appends to local buffer in handleGatewayReply()
  instead of using fmt.Sprintf().

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-08 15:56:28 -07:00
Ivan Kozlovic
aa843945c9 Work on Gateways reply mapping
- New prefix that includes origin server for the request
- Mapping done if request is service import or requestor has
  recent subscription
- Subscription considered recent if less than 250ms
- Destination server strip GW prefix before giving to client
  and restore when getting a reply on that subject
- Mapping removed aftert 250ms
- Server rejects client publish on "$GNR." (the new prefix)
- Cluster and server hash are now 8 chars long and from base 62
  alphabets
- Mapped replies need to be sent to leafnode servers due to race
  (cluster B sends RS+ on GW inbound then RMSG on outbound, the
  RS+ may be processed later and cluster A may have given message
  to LN before RS+ on reply subject. So LN needs to accept the
  mapped reply but will strip to give to client and reassemble
  before sending it back)

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-06 16:06:49 -07:00
Derek Collison
8a69c5cb71 Updates to benchmarks
Allow disabling of short first ping timer for clients.
Adjust names so that full test suite results are aligned.
Removed the account lookup, we use sync.Map but also a no-lock cache.

Signed-off-by: Derek Collison <derek@nats.io>
2019-11-02 08:04:22 -07:00
Derek Collison
f0f807f99a After speaking with Ivan we are taking a better approach for initial RTT.
Ivan had the idea of using the CONNECT to establish a first estimate of RTT
without additional PING/PONGs.

Signed-off-by: Derek Collison <derek@nats.io>
2019-10-31 14:01:55 -07:00
R.I.Pienaar
bcf96fa1de Allows a descriptive server_name to be set
This adds a new config option server_name that
when set will be exposed in varz, events and more
as a descriptive name for the server.

If unset though the server_name will default to the pk

Signed-off-by: R.I.Pienaar <rip@devco.net>
2019-10-17 18:51:19 +02:00
Derek Collison
7cb6056a94 Account support for Connz and user or account filtering
1. Accounts will show up in connection info if auth=1.
2. You can filter by user (?auth=1&user=ivan) or account (?auth=1&acc=eng)

Signed-off-by: Derek Collison <derek@nats.io>
2019-10-11 10:22:08 -07:00
Ivan Kozlovic
18a1702ba2 [ADDED] Basic auth for leafnodes
Added a way to specify which account an accepted leafnode connection
should be bound to when using simple auth (user/password).

Singleton:
```
leafnodes {
  port: ...
  authorization {
    user: leaf
    password: secret
    account: TheAccount
  }
}
```
With above configuration, if a soliciting server creates a LN connection
with url: `nats://leaf:secret@host:port`, then the accepting server
will bind the leafnode connection to the account "TheAccount". This account
need to exist otherwise the connection will be rejected.

Multi:
```
leafnodes {
  port: ...
  authorization {
    users = [
      {user: leaf1, password: secret, account: account1}
      {user: leaf2, password: secret, account: account2}
    ]
  }
}
```
With the above, if a server connects using `leaf1:secret@host:port`, then
the accepting server will bind the connection to account `account1`.

If user/password (either singleton or multi) is defined, then the connecting
server MUST provide the proper credentials otherwise the connection will
be rejected.

If no user/password info is provided, it is still possible to provide the
account the connection should be associated with:
```
leafnodes {
  port: ...
  authorization {
    account: TheAccount
  }
}
```
With the above, a connection without credentials will be bound to the
account "TheAccount".

If credentials are used (jwt, nkey or other), then the server will attempt
to authenticate and if successful associate to the account for that specific
user. If the user authentication fails (wrong password, no such user, etc..)
the connection will be also rejected.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-09-30 19:42:11 -06:00
Derek Collison
7fe47ace2b Make sure to turn latency on with a claim update
Signed-off-by: Derek Collison <derek@nats.io>
2019-09-19 14:20:35 -07:00
Ivan Kozlovic
20a925ae86 Updates to registerAccount
Make it a function that grabs server lock/unlock and invokes
registerAccountNoLock(). Use that function when already under
the server's lock.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-09-18 12:45:12 -06:00
Ivan Kozlovic
150d47cab3 [FIXED] Locking issue around account lookup/updates
Ensure that lookupAccount does not hold server lock during
updateAccount and fetchAccount.
Updating the account cannot have the server lock because it is
possible that during updateAccountClaims(), clients are being
removed, which would try to get the server lock (deep down in
closeConnection/s.removeClient).
Added a test that would have show the deadlock prior to changes
in this PR.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-09-17 18:48:23 -06:00
Derek Collison
52430c304a System level services for debugging.
This is the first pass at introducing exported services to the system account for generally debugging of blackbox systems.
The first service reports number of subscribers for a given subject. The payload of the request is the subject, and optional queue group, and can contain wildcards.

Signed-off-by: Derek Collison <derek@nats.io>
2019-09-17 09:37:35 -07:00
Ivan Kozlovic
4253b31dcf [FIXED] Circular account service import dependency
If account A imports from B and B from A, when the account A
is built, it causes B to be fetch, but since B imports from A,
A was fetch/built again in an infinite loop.

Resolves #1117

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-09-10 18:05:21 -06:00
Derek Collison
7989118c3f First pass latency tracking for exported services
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-30 10:52:48 -07:00
Ivan Kozlovic
cd4b8d3fad [ADDED] /leafz endpoint
Resolves #1061

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-26 12:00:24 -06:00
Ivan Kozlovic
cd9f898eb0 Made a server's helper to set first ping timer
Defaults to 1sec but will be opts.PingInterval if value is lower.
All non client connections invoked this function for the first
PING.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-26 10:21:43 -06:00
Ivan Kozlovic
7ca8723942 [FIXED] Some Leafnode issues
- On startup, verify that local account in leafnode (if specified
  can be found otherwise fail startup).
- At runtime, print error and continue trying to reconnect.
  Will need to decide a better approach.
- When using basic auth (user/password), it was possible for a
  solicited Leafnode connection to not use user/password when
  trying an URL that was discovered through gossip. The server
  now saves the credentials of a configured URL to use with
  the discovered ones.

Updated RouteRTT test in case RTT does not seem to be updated
because getting always the same value.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-23 14:08:07 -06:00
Derek Collison
8f5bc503e5 Add ability for cross account import services to return streams as well as singeltons.
Take into account tracking of response maps that are created and do proper cleanup.
Also fixes #1089 which was discovered while working on this.

Signed-off-by: Derek Collison <derek@nats.io>
2019-08-06 14:15:40 -07:00
Derek Collison
1d6c58074f Fix for #1065 (leaked subscribers from dq subs across routes)
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-22 17:17:43 -07:00
Ivan Kozlovic
37d08a6c56 [FIXED] Allow TLS InsecureSkipVerify again
This has an effect only on connections created by the server,
so routes and gateways (explicit and implicit).
Make sure that an explicit warning is printed if the insecure
property is set, but otherwise allow it.

Resolves #1062

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-07-12 12:10:28 -06:00
Andy Xie
c9221fd187 check for monitor server start error 2019-07-11 11:44:06 +08:00
Derek Collison
d7e5554630 Grab opts under correct lock, make cache decision more explicit
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-02 09:31:54 -07:00
Derek Collison
8168aa1f81 Allow sublist cache do be disabled globally
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-02 07:34:02 -07:00
Derek Collison
ebd4deb8b9 Stager first ping from server and suppress pings if a ping was received.
Signed-off-by: Derek Collison <derek@nats.io>
2019-06-29 15:43:15 -07:00
Derek Collison
5b42b99dc1 Allow operator to be inline JWT. Also preloads just warn on validation issues, do not stop starting or reloads.
We issue validation warnings now to the log.

Signed-off-by: Derek Collison <derek@nats.io>
2019-06-24 16:46:22 -07:00
Ivan Kozlovic
ed1901c792 Update go.mod to satisfy v2 requirements
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-06-03 19:45:47 -06:00
Derek Collison
da938dcb1e Cleaned up debug and fixed test
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-30 14:30:35 -07:00
Derek Collison
874f06a212 Fix bugs on reloadAuthorization
When tls is on routes it can cause reloadAuthorization to be called.
We were assuming configured accounts, but did not copy the remote map.
This copies the remote map when transferring for configured accounts
and also handles operator mode. In operator mode we leave the accounts
in place, and if we have a memory resolver we will remove accounts that
are not longer defined or have bad claims.

Signed-off-by: Derek Collison <derek@nats.io>
2019-05-29 13:19:58 -07:00
Ivan Kozlovic
f5991e8a2b Merge pull request #1015 from nats-io/restore_conn_error_default_attempts_to_one
Update to connect/reconnect error reports logic
2019-05-28 14:57:29 -06:00
Ivan Kozlovic
5478eaf01e Added /gatewayz endpoint
Such endpoint will list the gateway/cluster name, address and port
then list of outbound/inbound connections.
For each remote gateway there will be at most one outbound connection.
There can be 0 or more inbound connections for the same remote
gateway.

For each of these outbound/inbound connection, the connection info
similar to Connz is reported. Optionally, one can include the
interest mode/stats for each account.

Here are possible options:

* No specific options

http://host:port/gatewayz

* Limit to specific remote gateway, say name "B":

http://host:port/gatewayz/gw_name=B

* Include accounts (default limit to 1024 accounts)

http://host:port/gatewayz/accs=1

* Specific limit, say 200 (note accs=1 in this case is optional)

http://host:port/gatewayz/accs=1&accs_limit=200

* Specific account, say "acc_1". Note that accs=1 is not required then

http://host:port/gatewayz/acc_name=acc_1

* Above options can be mixed: specific remote gateway (B), with 100
  accounts reported

http://host:port/gatewayz/gw_name=B&accs_limit=200

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-28 12:41:09 -06:00
Ivan Kozlovic
d2578f9e05 Update to connect/reconnect error reports logic
Changed the introduced new option and added a new one. The idea
is to be able to differentiate between never connected and reconnected
event. The never connected situation will be logged at first attempt
and every hour (by default, configurable).
However, once connected and if trying to reconnect, will report every
attempts by default, but this is configurable too.

These two options are supported for config reload.

Related to #1000
Related to #1001
Resolves #969

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-26 17:51:01 -06:00
Ivan Kozlovic
55597a7e8b [ADDED] URLs to cluster{} in /varz and update of gateway ones
In varz's cluster{} section, there was no URLs field. This PR adds
it and displays the routes defined in the cluster{} config section.
The value gets updated should there be a config reload following
addition/removal of an url from "routes".

If config had 1 route to "nats://127.0.0.1:1234", here is what
it would look like now:
```
"cluster": {
    "addr": "0.0.0.0",
    "cluster_port": 6222,
    "auth_timeout": 1,
    "urls": [
      "127.0.0.1:1234"
    ]
  },
```
Adding route to "127.0.0.1:4567" and doing config reload:
```
"cluster": {
    "addr": "0.0.0.0",
    "cluster_port": 6222,
    "auth_timeout": 1,
    "urls": [
      "127.0.0.1:1234",
      "127.0.0.1:4567"
    ]
  },
```
Note that due to how we handle discovered servers in the cluster,
new urls dynamically discovered will not show in above output.
This could be done, but would need some changes in how we store
things (actually in this case, new urls are not stored, just
attempted to be connected. Once they connect, they would be visible
in /routez).

For gateways, however, this PR displays the combination of the
URLs defined in config and the ones that are discovered after
a connection is made to a give cluster. So say cluster A has a single
url to one server in cluster B, when connecting to that server,
the server on A will get the list of the gateway URLs that one
can connect to, and these will be reflected in /varz. So this is
a different behavior that for routes. As explained above, we could
harmonize the behavior in a future PR.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-24 13:42:41 -06:00
Derek Collison
034a9fd1e4 Merge pull request #990 from nats-io/smap
Optimize updates for leaf node smaps.
2019-05-09 18:12:29 -07:00
Derek Collison
042e5a539a Optimize updates for leaf node smaps.
Previously we would walk all clients bound to an account to
collect the leaf nodes for updating of the subscription maps.

Signed-off-by: Derek Collison <derek@nats.io>
2019-05-09 17:25:17 -07:00
Ivan Kozlovic
c014211318 [FIXED] Changes to Varz content and fixed race conditions
----------------------------------------------------------------
Backward-incompatibility note:

Varz used to embed *Info and *Options which are other server objects.
However, Info is a struct that servers used to send protocols to other
servers or clients and its content must contain json tags since we
need to marshal those to be sent over. The problem is that it made
those fields now accessible to users calling Varz() and also visible
to the http /varz output. Some fields in Info were introduced in the
2.0 branch that clashed with json tag in Options, which made cluster{}
for instance disappear in the /varz output - because a Cluster string
in Info has the same json tag, and Cluster in Info is empty in some
cases.
For users that embed NATS and were using Server.Varz() directly,
without the use of the monitoring endpoint, they were then given
access (which was not the intent) to server internals (Info and Options).
Fields that were in Info or Options or directly in Varz that did not
clash with each other could be referenced directly, for instace, this
is you could access the server ID:

v, _ := s.Varz(nil)
fmt.Println(v.ID)

Another way would be:

fmt.Println(v.Info.ID)

Same goes for fields that were brought from embedding the Options:

fmt.Println(v.MaxConn)

or

fmt.Println(v.Options.MaxConn)

We have decided to explicitly define fields in Varz, which means
that if you previously accessed fields through v.Info or v.Options,
you will have to update your code to use the corresponding field
directly: v.ID or v.MaxConn for instance.

So fields were also duplicated between Info/Options and Varz itself
so depending on which one your application was accessing, you may
have to update your code.
---------------------------------------------------------------

Other issues that have been fixed is races that were introduced
by the fact that the creation of a Varz object (pointing to
some server data) was done under server lock, but marshaling not
being done under that lock caused races.

The fact that object returned to user through Server.Varz() also
had references to server internal objects had to be fixed by
returning deep copy of those internal objects.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-09 14:33:04 -06:00
Derek Collison
6584a9a828 lint updates
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-06 15:41:38 -07:00
Derek Collison
acfe372d63 Changes for rename from gnatsd -> nats-server
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-06 15:04:24 -07:00
Derek Collison
5292ec1598 Various fixes, init smap for leafnodes with gateways too
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-02 14:22:51 -07:00
Derek Collison
2ec3eaeaa9 Leafnode account based connections limits
Signed-off-by: Derek Collison <derek@nats.io>
2019-04-25 14:40:59 -07:00
Derek Collison
bfe83aff81 Make account lookup faster with sync.Map
Signed-off-by: Derek Collison <derek@nats.io>
2019-04-23 17:13:23 -07:00