Commit Graph

78 Commits

Author SHA1 Message Date
Derek Collison
e9b9788fbe Various bug fixes, fixes for flappers
Signed-off-by: Derek Collison <derek@nats.io>
2020-05-19 14:33:06 -07:00
Derek Collison
17aca11002 Small changes to event ids, good approach though with separate lock on account
Signed-off-by: Derek Collison <derek@nats.io>
2020-05-19 14:27:45 -07:00
R.I.Pienaar
63845b8577 add type hints to service latency, use time.Time for timestamp
Signed-off-by: R.I.Pienaar <rip@devco.net>
2020-05-19 14:26:46 -07:00
Derek Collison
911e7ef35d Add additional fields to client info for latency
Signed-off-by: Derek Collison <derek@nats.io>
2020-05-19 14:26:46 -07:00
Derek Collison
a7f1bca534 Additional service latency upgrades.
We now share more information about the responder and the requestor. The requestor information by default is not shared, but can be when declaring the import.

Also fixed a bug in error handling for old request style requests that would always result in a 408 response.

Signed-off-by: Derek Collison <derek@nats.io>
2020-05-19 14:26:46 -07:00
Derek Collison
ea5e5bd364 Services rewrite #2
This contains a rewrite to the services layer for exporting and importing. The code this merges to already had a first significant rewrite that moved from special interest processing to plain subscriptions.

This code changes the prior version's dealing with reverse mapping which was based mostly on thresholds and manual pruning, with some sporadic timer usage. This version uses the jetstream branch's code that understands interest and failed deliveries. So this code is much more tuned to reacting to interest changes. It also removes thresholds and goes only by interest changes or expirations based around a new service export property, response thresholds. This allows a service provider to provide semantics on how long a response should take at a maximum.

This commit also introduces formal support for streamed and chunked response types on service exports, which send an empty message to signify EOF.

This commit also includes additions to the service latency tracking such that errors are now sent, not only successful interactions. We have added a Status field and an optional Error field to ServiceLatency.

We support the following Status codes, these are directly from HTTP.

400 Bad Request (request did not have a reply subject)
408 Request Timeout (when system detects request interest went away, old request style to make dependable)
503 Service Unavailable (no service responders running)
504 Service Timeout (The new response threshold expired)

Signed-off-by: Derek Collison <derek@nats.io>
2020-05-19 14:26:46 -07:00
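The status codes listed in commit ea5e5bd364 can be sketched as Go constants. This is only an illustration: the constant names and the `describe` helper are hypothetical and do not come from the commit, which states only the numeric codes and their meanings.

```go
package main

import "fmt"

// Hypothetical constant names; the commit message lists only the numeric
// codes and what each one means.
const (
	statusBadRequest         = 400 // request did not have a reply subject
	statusRequestTimeout     = 408 // request interest went away
	statusServiceUnavailable = 503 // no service responders running
	statusServiceTimeout     = 504 // response threshold expired
)

// describe returns the short label from the commit message for a status code.
func describe(code int) string {
	switch code {
	case statusBadRequest:
		return "Bad Request"
	case statusRequestTimeout:
		return "Request Timeout"
	case statusServiceUnavailable:
		return "Service Unavailable"
	case statusServiceTimeout:
		return "Service Timeout"
	default:
		return "Unknown"
	}
}

func main() {
	for _, code := range []int{400, 408, 503, 504} {
		fmt.Println(code, describe(code))
	}
}
```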
R.I.Pienaar
78fdeb661d move events nuid to the server struct
2020-05-19 14:24:31 -07:00
R.I.Pienaar
3182db4c3a move to events having Type not Schema
2020-05-19 14:22:53 -07:00
R.I.Pienaar
3d5397add2 use constants for the schema ids
Signed-off-by: R.I.Pienaar <rip@devco.net>
2020-05-19 14:21:27 -07:00
R.I.Pienaar
0703f266cc add schema, id and time to client connect events
This brings these to the same level as the JS events. These are the ones
I care about right now, but I will do this for the rest here in time as well
and document them in JSON schema

Signed-off-by: R.I.Pienaar <rip@devco.net>
2020-05-19 14:21:27 -07:00
Derek Collison
3ab76a6dcd Write performance tweaks
Signed-off-by: Derek Collison <derek@nats.io>
2020-05-19 14:16:03 -07:00
R.I.Pienaar
fc6d8826f5 show basic jetstream info in varz and server info
2020-05-19 14:16:03 -07:00
Derek Collison
b7b98df4ee Server limits and account reservations
Signed-off-by: Derek Collison <derek@nats.io>
2020-05-19 14:07:02 -07:00
Derek Collison
dd116fcfd4 JetStream first pass basics.
This is the first checkin for JetStream. Has some rudimentary basics working.

TODO
1. Push vs pull mode for observables. (work queues)
2. Disk/File store, memory only for now.
3. clustering code - design shaping up well.
4. Finalize account import semantics.
5. Lots of other little things.

Signed-off-by: Derek Collison <derek@nats.io>
2020-05-19 14:06:29 -07:00
Matthias Hanel
11c0669ae2 [FIXES] Unnecessary account reloads and pointer to old accounts
Fixes #1372 by updating s.sys.account pointer.

This issue also showed that accounts are unnecessarily reloaded.
This happened because account imports were not copied and thus,
deepEqual detected a difference where none existed.
This was addressed by making the copy less shallow.

Furthermore, deepEqual detected a difference when it compared
slices that were appended to while processing a map.
This was fixed by sorting before comparison.

Noticed that Account.clients stored an unnecessary pointer.
Removed duplicated code in systemAccount.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-05-11 21:51:41 -04:00
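The slice comparison problem described in commit 11c0669ae2 can be reproduced in isolation. Go randomizes map iteration order, so two slices built from the same map may hold the same elements in different orders, and `reflect.DeepEqual` then reports a difference. The sketch below sorts copies before comparing; `keysOf` and `equalSorted` are hypothetical helpers, not the server's actual code.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// keysOf collects the keys of a map. Map iteration order is not stable in Go,
// so the order of the returned slice can differ from call to call.
func keysOf(m map[string]bool) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	return keys
}

// equalSorted compares two slices regardless of element order by sorting
// copies of both before handing them to reflect.DeepEqual.
func equalSorted(a, b []string) bool {
	a = append([]string(nil), a...)
	b = append([]string(nil), b...)
	sort.Strings(a)
	sort.Strings(b)
	return reflect.DeepEqual(a, b)
}

func main() {
	m := map[string]bool{"foo": true, "bar": true, "baz": true}
	fmt.Println(equalSorted(keysOf(m), keysOf(m))) // prints "true"
}
```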
Matthias Hanel
14c716052d Making monitoring endpoints available via system services.
Available via $SYS.REQ.SERVER.%s.%s and $SYS.REQ.SERVER.PING.%s
Last token is the endpoint name.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-05-04 13:31:50 -04:00
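The two subject patterns from commit 14c716052d can be filled in with `fmt.Sprintf`. This sketch assumes the first token of the direct form is a server identifier, which the commit message does not state; the constant names, the `ABC123` id, and the `VARZ` endpoint name are hypothetical placeholders.

```go
package main

import "fmt"

// Subject patterns quoted from the commit message. The constant names are
// hypothetical.
const (
	serverDirectReqSubj = "$SYS.REQ.SERVER.%s.%s"
	serverPingReqSubj   = "$SYS.REQ.SERVER.PING.%s"
)

// directSubject builds the subject for one server. Treating the first token
// as a server id is an assumption; the last token is the endpoint name.
func directSubject(serverID, endpoint string) string {
	return fmt.Sprintf(serverDirectReqSubj, serverID, endpoint)
}

// pingSubject builds the PING form, whose last token is the endpoint name.
func pingSubject(endpoint string) string {
	return fmt.Sprintf(serverPingReqSubj, endpoint)
}

func main() {
	fmt.Println(directSubject("ABC123", "VARZ")) // placeholder id and endpoint
	fmt.Println(pingSubject("VARZ"))
}
```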
Derek Collison
dc55356096 Have events look at whether or not a leaf is a hub, regardless of solicit
Signed-off-by: Derek Collison <derek@nats.io>
2020-04-13 15:25:21 -07:00
Derek Collison
82f585d83a Updated to also resend leafnode connect on GW connect via first INFO
Signed-off-by: Derek Collison <derek@nats.io>
2020-04-08 09:55:19 -07:00
Derek Collison
43fbe0ffed This commit allows new servers in a supercluster to be informed of accounts with active leafnode connections.
This is needed to put those accounts into interest only mode for inbound gateway connections. Also added code
to make sure we were doing proper account tracking and would track the global account as well, which used to
be excluded.

Fixes #977

Signed-off-by: Derek Collison <derek@nats.io>
2020-04-07 16:22:15 -07:00
Matthias Hanel
6a1c3fc29b Moving inbound tracing to the caller (client.parse)
Tracing for outgoing operations is always done while
holding the client lock.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-03-04 17:31:18 -05:00
Matthias Hanel
f5bd07b36c [FIXED] trace/debug/sys_log reload will affect existing clients
Fixed #1296, by altering client state on reload

Detect a trace level change on reload and update all clients.
To avoid data races, read client.trace while holding the lock,
pass the value into functions that trace while not holding the lock.
Delete unused client.debug.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-03-04 13:54:15 -05:00
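The locking pattern described in commit f5bd07b36c (read the trace flag while holding the lock, then pass the value to code that runs without the lock) can be sketched as follows. The `client` type here is a simplified stand-in, and `setTrace`, `process`, and `traceOp` are hypothetical names rather than the server's functions.

```go
package main

import (
	"fmt"
	"sync"
)

// client is a simplified stand-in for the server's client type.
type client struct {
	mu    sync.Mutex
	trace bool
}

// setTrace is what a config reload would call to change the trace level.
func (c *client) setTrace(on bool) {
	c.mu.Lock()
	c.trace = on
	c.mu.Unlock()
}

// process snapshots the flag under the lock, then calls the tracing
// function without holding it.
func (c *client) process(op string) string {
	c.mu.Lock()
	trace := c.trace
	c.mu.Unlock()
	return traceOp(op, trace)
}

// traceOp never touches client state, so it cannot race with a reload.
func traceOp(op string, trace bool) string {
	if !trace {
		return ""
	}
	return "<<- [" + op + "]"
}

func main() {
	c := &client{}
	c.setTrace(true)
	fmt.Println(c.process("PING"))
}
```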
Matthias Hanel
bf952a3807 Adding option to enable tracing the system account. (default: false)
Use sys_trace option in config file or --sys_trace on the command line

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-03-01 19:42:40 -05:00
Ivan Kozlovic
b78ca2f63b Fixes for system events
- Call flushOutbound() for SYSTEM connections
- Flush in place in internalSendLoop when sending the shutdown event
- Fix some tests:
  - missing defer client connection Close()
  - ensure subs are registered and messages received before shutdown
    of leafnode server to check disconnected event's stats.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-12-04 20:55:55 -07:00
Derek Collison
07ab23af0d Need return when acc not found
Signed-off-by: Derek Collison <derek@nats.io>
2019-11-17 09:24:34 -08:00
Derek Collison
7b1bea61e2 Merge pull request #1192 from nats-io/load_account
Do not fetch accounts on system events.
2019-11-16 18:33:23 -08:00
Derek Collison
093b57ed40 Do not fetch accounts on system events.
Noticed we would lookup accounts, but would also fetch them when tracking remote connections, etc.

Signed-off-by: Derek Collison <derek@nats.io>
2019-11-16 18:05:42 -08:00
Derek Collison
6ad8287bbe Introduced wildcard handling of _R_ mapped replies.
We had too much special processing, so reduced to a single wildcard
which will propagate across routes and gateways and is consistent
with gateway handling of globally routed subjects and timeouts.

Signed-off-by: Derek Collison <derek@nats.io>
2019-11-16 12:50:53 -08:00
Derek Collison
3330820502 Fixed a bug where we leaked service imports. Also prior this would have leaked subscriptions as well.
Signed-off-by: Derek Collison <derek@nats.io>
2019-11-14 13:29:17 -08:00
Ivan Kozlovic
8a8695d07c Backward compatibility with previous servers
Want to keep this commit separate so that we can easily remove
when we no longer want to support both prefixes.

- If this server receives a "$GR." message, it takes the subject
  and tries to process this locally. If there is no cluster race, the
  reply may be received ok (like before).
- If this server sends a routed reply, it detects if sending to
  an older server (then uses $GR.) or not (then uses $GNR)
- Gateway INFO has a new field that indicates if the server is
  using the new prefix.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-08 16:22:34 -07:00
Ivan Kozlovic
9b7dab0548 Updates based on code review
- Add atomic in client to skip check in processInboundClientMsg()
  if value is 0. Avoids getting the lock in fast path if not needed.
- Have a timer per client instead of the global server list that
  was expiring: noticed a lot of contention there when running
  some perf/profiling tests. The timer is also not reset for
  every timestamp that is not yet expired since this too affects
  performance. Instead it fires at regular intervals and is cleared
  when the map is empty after a cycle.
- Move processing of gw mapped reply to its own function (in inbound msg).
  I have verified that this is inlined same way as when code was
  directly in processInboundClientMsg.
- Use string(subj[]) for prefix detection: I have verified that
  it is actually faster.
- Builds the RMSG with appends to local buffer in handleGatewayReply()
  instead of using fmt.Sprintf().

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-08 15:56:28 -07:00
Ivan Kozlovic
aa843945c9 Work on Gateways reply mapping
- New prefix that includes origin server for the request
- Mapping done if request is service import or requestor has
  recent subscription
- Subscription considered recent if less than 250ms
- Destination server strips GW prefix before giving to client
  and restores it when getting a reply on that subject
- Mapping removed after 250ms
- Server rejects client publish on "$GNR." (the new prefix)
- Cluster and server hash are now 8 chars long and from base 62
  alphabets
- Mapped replies need to be sent to leafnode servers due to race
  (cluster B sends RS+ on GW inbound then RMSG on outbound, the
  RS+ may be processed later and cluster A may have given message
  to LN before RS+ on reply subject. So LN needs to accept the
  mapped reply but will strip to give to client and reassemble
  before sending it back)

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-06 16:06:49 -07:00
Derek Collison
13f217635f Wait on requestor RTT when tracking latency.
If a client RTT for a requestor is longer than a service RTT, the requestor latency was often zero.
We now wait for the RTT (if zero) before sending out the metric.

Signed-off-by: Derek Collison <derek@nats.io>
2019-10-31 08:02:45 -07:00
Ivan Kozlovic
0da1afaf88 Fixed data race
Resolves #1176

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-10-30 20:10:37 -06:00
Ivan Kozlovic
12eb1f5b00 [ADDED] Server name in the RouteStat for statsz
Add the remote server name for a route in the statsz event

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-10-25 16:34:07 -06:00
R.I.Pienaar
bcf96fa1de Allows a descriptive server_name to be set
This adds a new config option server_name that
when set will be exposed in varz, events and more
as a descriptive name for the server.

If unset though the server_name will default to the pk

Signed-off-by: R.I.Pienaar <rip@devco.net>
2019-10-17 18:51:19 +02:00
Ivan Kozlovic
150d47cab3 [FIXED] Locking issue around account lookup/updates
Ensure that lookupAccount does not hold server lock during
updateAccount and fetchAccount.
Updating the account cannot have the server lock because it is
possible that during updateAccountClaims(), clients are being
removed, which would try to get the server lock (deep down in
closeConnection/s.removeClient).
Added a test that would have shown the deadlock prior to changes
in this PR.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-09-17 18:48:23 -06:00
Derek Collison
52430c304a System level services for debugging.
This is the first pass at introducing exported services to the system account for general debugging of blackbox systems.
The first service reports number of subscribers for a given subject. The payload of the request is the subject, and optional queue group, and can contain wildcards.

Signed-off-by: Derek Collison <derek@nats.io>
2019-09-17 09:37:35 -07:00
Derek Collison
94f143ccce Latency tracking updates.
Will now breakout the internal NATS latency to show requestor client RTT, responder client RTT and any internal latency caused by hopping between servers, etc.

Signed-off-by: Derek Collison <derek@nats.io>
2019-09-11 16:43:19 -07:00
Ivan Kozlovic
4253b31dcf [FIXED] Circular account service import dependency
If account A imports from B and B from A, when the account A
is built, it causes B to be fetched, but since B imports from A,
A was fetched/built again in an infinite loop.

Resolves #1117

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-09-10 18:05:21 -06:00
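The infinite loop described in commit 4253b31dcf arises whenever a recursive build follows imports without remembering which accounts it has already visited. The sketch below shows one common way to break such a cycle, a visited set. It is an assumption for illustration and not necessarily how the fix was implemented; all type and function names are hypothetical.

```go
package main

import "fmt"

// account is a simplified stand-in that only records which other accounts
// it imports services from.
type account struct {
	name    string
	imports []string
}

// build visits an account and everything it imports. The visited set stops
// the recursion when A imports from B and B imports from A.
func build(name string, registry map[string]*account, visited map[string]bool, order *[]string) {
	if visited[name] {
		return
	}
	visited[name] = true
	acc, ok := registry[name]
	if !ok {
		return
	}
	*order = append(*order, name)
	for _, dep := range acc.imports {
		build(dep, registry, visited, order)
	}
}

// buildAll returns the accounts built when starting from name.
func buildAll(name string, registry map[string]*account) []string {
	var order []string
	build(name, registry, map[string]bool{}, &order)
	return order
}

func main() {
	registry := map[string]*account{
		"A": {name: "A", imports: []string{"B"}},
		"B": {name: "B", imports: []string{"A"}},
	}
	fmt.Println(buildAll("A", registry)) // prints "[A B]"
}
```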
Derek Collison
7989118c3f First pass latency tracking for exported services
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-30 10:52:48 -07:00
Guangming Wang
927991321d Cleanup: fix some typos in code comment
Signed-off-by: Guangming Wang <guangming.wang@daocloud.io>
2019-08-22 21:36:37 +08:00
Derek Collison
8f5bc503e5 Add ability for cross account import services to return streams as well as singletons.
Take into account tracking of response maps that are created and do proper cleanup.
Also fixes #1089 which was discovered while working on this.

Signed-off-by: Derek Collison <derek@nats.io>
2019-08-06 14:15:40 -07:00
Ivan Kozlovic
ed1901c792 Update go.mod to satisfy v2 requirements
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-06-03 19:45:47 -06:00
Ivan Kozlovic
7f2620904c Fixed setting timer for account connection updates
The timer was not set with the proper variable, which caused the
check to always think that a new timer should be created, which
would lead to more and more timers being created which translated
to updates being sent more and more frequently.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-29 14:28:26 -06:00
Derek Collison
6584a9a828 lint updates
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-06 15:41:38 -07:00
Derek Collison
acfe372d63 Changes for rename from gnatsd -> nats-server
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-06 15:04:24 -07:00
Derek Collison
5292ec1598 Various fixes, init smap for leafnodes with gateways too
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-02 14:22:51 -07:00
Derek Collison
2ec3eaeaa9 Leafnode account based connections limits
Signed-off-by: Derek Collison <derek@nats.io>
2019-04-25 14:40:59 -07:00
Derek Collison
bfe83aff81 Make account lookup faster with sync.Map
Signed-off-by: Derek Collison <derek@nats.io>
2019-04-23 17:13:23 -07:00
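Commit bfe83aff81 mentions only that account lookup moved to `sync.Map`. A minimal sketch of that kind of lookup is shown below; the `server` and `account` types and the `register` and `lookup` methods are hypothetical stand-ins, while `Store` and `Load` are the standard library's `sync.Map` methods.

```go
package main

import (
	"fmt"
	"sync"
)

// account is a simplified stand-in for the server's account type.
type account struct {
	Name string
}

// server keeps accounts in a sync.Map so that lookups do not need the
// server lock.
type server struct {
	accounts sync.Map
}

// register stores an account under its name.
func (s *server) register(a *account) {
	s.accounts.Store(a.Name, a)
}

// lookup returns the account with the given name, if present.
func (s *server) lookup(name string) (*account, bool) {
	v, ok := s.accounts.Load(name)
	if !ok {
		return nil, false
	}
	return v.(*account), true
}

func main() {
	s := &server{}
	s.register(&account{Name: "SYS"})
	if acc, ok := s.lookup("SYS"); ok {
		fmt.Println(acc.Name)
	}
}
```

`sync.Map` is documented as being suited to read-mostly workloads, which fits a lookup table that is read far more often than it is written.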
Derek Collison
bacb73a403 First pass at leaf nodes. Basic functionality working, including gateways.
What is not completed:
1. TLS
2. config to bind local account.
3. Info updates for solicitor to track topology changes like a client.
4. CONNECT sent after INFO for nonce authorization.
5. Authorization
6. Services and Streams tests.
7. config file parsing.

Signed-off-by: Derek Collison <derek@nats.io>
2019-03-25 08:54:47 -07:00