Commit Graph

403 Commits

Author SHA1 Message Date
Derek Collison
7b1bea61e2 Merge pull request #1192 from nats-io/load_account
Do not fetch accounts on system events.
2019-11-16 18:33:23 -08:00
Derek Collison
f60266bc2e Merge pull request #1190 from nats-io/import_reply
Introduced wildcard handling of _R_ mapped replies.
2019-11-16 18:07:18 -08:00
Derek Collison
093b57ed40 Do not fetch accounts on system events.
Noticed we would lookup accounts, but would also fetch them when tracking remote connections, etc.

Signed-off-by: Derek Collison <derek@nats.io>
2019-11-16 18:05:42 -08:00
Ivan Kozlovic
3e1728d623 [FIXED] Some accounts locking issues
- Risk of deadlock when checking if issuer claims are trusted. There
  was a RLock() in one thread, then a request for Lock() in another
  that was waiting for RLock() to return, but the first thread was
  then doing RLock() which was not acquired because this was blocked
  by the Lock() request (see e2160cc571)

- Use proper account/locking mode when checking if stream/service
  exports/signer have changed.

- Account registration race (regression from https://github.com/nats-io/nats-server/pull/890)

- Move test from #890 to "no race" test since only then could it detect
  the double registration.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-16 16:59:38 -07:00
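The first bullet of 3e1728d623 describes a hazard that Go's `sync.RWMutex` documentation calls out: recursive read locking is prohibited, because a second `RLock()` on the same goroutine can block behind a pending `Lock()`. A minimal sketch of one common way to avoid it follows. The `account` type and its method names are hypothetical stand-ins, not the server's actual code: the read lock is taken once, and helpers that expect the lock to be held never touch it.

```go
package main

import (
	"fmt"
	"sync"
)

// account is a hypothetical stand-in, not the nats-server type.
type account struct {
	mu      sync.RWMutex
	signers map[string]bool
}

// isTrustedLocked assumes the caller already holds a.mu.
// It never acquires the lock itself, so it cannot re-enter RLock().
func (a *account) isTrustedLocked(issuer string) bool {
	return a.signers[issuer]
}

// IsTrusted acquires the read lock exactly once.
func (a *account) IsTrusted(issuer string) bool {
	a.mu.RLock()
	defer a.mu.RUnlock()
	return a.isTrustedLocked(issuer)
}

// AllTrusted checks several issuers under a single read lock by calling
// the "Locked" helper instead of IsTrusted, avoiding a nested RLock().
func (a *account) AllTrusted(issuers ...string) bool {
	a.mu.RLock()
	defer a.mu.RUnlock()
	for _, issuer := range issuers {
		if !a.isTrustedLocked(issuer) {
			return false
		}
	}
	return true
}

func main() {
	a := &account{signers: map[string]bool{"OP1": true, "OP2": true}}
	fmt.Println(a.AllTrusted("OP1", "OP2"))
	fmt.Println(a.AllTrusted("OP1", "OP3"))
}
```

Splitting each operation into a public method that locks and a `...Locked` helper that does not is only one option; the commit itself does not say which approach was taken.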
Derek Collison
6ad8287bbe Introduced wildcard handling of _R_ mapped replies.
We had too much special processing, so reduced to a single wildcard
which will propagate across routes and gateways and is consistent
with gateway handling of globally routed subjects and timeouts.

Signed-off-by: Derek Collison <derek@nats.io>
2019-11-16 12:50:53 -08:00
Ivan Kozlovic
bdf5cf63b3 Shutdown on Ctrl+C
Changed code on Windows to not use svc code if running in interactive
mode. The original code was running svc.debug.Run() which uses service
code (Execute()) but from the command line. We don't need that.

Also reduced salt on bcrypt password for a config file that started
to cause failures due to test taking too long to finish.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-14 20:05:32 -07:00
Derek Collison
3330820502 Fixed a bug where we leaked service imports. Prior to this fix, subscriptions would have leaked as well.
Signed-off-by: Derek Collison <derek@nats.io>
2019-11-14 13:29:17 -08:00
Ivan Kozlovic
3e5ede1d64 Relax check on reserved GW prefix for system clients
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-11 17:43:14 -07:00
Ivan Kozlovic
aa843945c9 Work on Gateways reply mapping
- New prefix that includes origin server for the request
- Mapping done if request is service import or requestor has
  recent subscription
- Subscription considered recent if less than 250ms
- Destination server strips GW prefix before giving to client
  and restores it when getting a reply on that subject
- Mapping removed after 250ms
- Server rejects client publish on "$GNR." (the new prefix)
- Cluster and server hash are now 8 chars long and from base 62
  alphabets
- Mapped replies need to be sent to leafnode servers due to race
  (cluster B sends RS+ on GW inbound then RMSG on outbound, the
  RS+ may be processed later and cluster A may have given message
  to LN before RS+ on reply subject. So LN needs to accept the
  mapped reply but will strip to give to client and reassemble
  before sending it back)

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-06 16:06:49 -07:00
Derek Collison
8a69c5cb71 Updates to benchmarks
Allow disabling of short first ping timer for clients.
Adjust names so that full test suite results are aligned.
Removed the account lookup, we use sync.Map but also a no-lock cache.

Signed-off-by: Derek Collison <derek@nats.io>
2019-11-02 08:04:22 -07:00
Derek Collison
f0f807f99a After speaking with Ivan we are taking a better approach for initial RTT.
Ivan had the idea of using the CONNECT to establish a first estimate of RTT
without additional PING/PONGs.

Signed-off-by: Derek Collison <derek@nats.io>
2019-10-31 14:01:55 -07:00
Derek Collison
13f217635f Wait on requestor RTT when tracking latency.
If a client RTT for a requestor is longer than a service RTT, the requestor latency was often zero.
We now wait for the RTT (if zero) before sending out the metric.

Signed-off-by: Derek Collison <derek@nats.io>
2019-10-31 08:02:45 -07:00
Ivan Kozlovic
cbbc21ac25 Some update to leafnode subscription handling
- Send all subs in place if smap is small
- Skip sending update until after sendAllLeafSubs() is done

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-10-30 20:01:49 -06:00
Ivan Kozlovic
fe27aec1dc Merge pull request #1170 from nats-io/fix_detect_leafnode_loop
[FIXED] Detect loop between LeafNode servers
2019-10-29 18:35:20 -06:00
Ivan Kozlovic
e3009ffb6e Fix latency test flapper.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-10-29 17:03:48 -06:00
Ivan Kozlovic
279cab2aaf [FIXED] Detect loop between LeafNode servers
This is achieved by subscribing to a unique subject. If the LS+
protocol is coming back for the same subject on the same account,
then this indicates a loop.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-10-29 16:14:35 -06:00
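The loop detection in 279cab2aaf can be sketched as follows. The `loopDetector` type, the `loopcheck.` subject prefix and the random-hex suffix are hypothetical; the commit only states that a unique subject is subscribed to per account and that seeing it come back signals a loop.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// loopDetector is a hypothetical helper, not the server's implementation.
type loopDetector struct {
	subjects map[string]string // account name -> unique detection subject
}

func newLoopDetector() *loopDetector {
	return &loopDetector{subjects: make(map[string]string)}
}

// register creates the unique subject this server subscribes to for account.
func (d *loopDetector) register(account string) (string, error) {
	buf := make([]byte, 12)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	subj := "loopcheck." + hex.EncodeToString(buf) // assumed subject format
	d.subjects[account] = subj
	return subj, nil
}

// isLoop reports whether an incoming subscription (LS+) on account carries
// the server's own detection subject, which would mean it travelled in a circle.
func (d *loopDetector) isLoop(account, subject string) bool {
	own, ok := d.subjects[account]
	return ok && own == subject
}

func main() {
	d := newLoopDetector()
	subj, err := d.register("ACC")
	if err != nil {
		panic(err)
	}
	fmt.Println(d.isLoop("ACC", "orders.new")) // ordinary interest
	fmt.Println(d.isLoop("ACC", subj))         // our own subject came back
}
```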
Derek Collison
35758ef7d4 Update the test CA and certs.
Expiration is now Oct 14 14:30:41 2029 GMT

Signed-off-by: Derek Collison <derek@nats.io>
2019-10-17 07:33:08 -07:00
Derek Collison
7cb6056a94 Account support for Connz and user or account filtering
1. Accounts will show up in connection info if auth=1.
2. You can filter by user (?auth=1&user=ivan) or account (?auth=1&acc=eng)

Signed-off-by: Derek Collison <derek@nats.io>
2019-10-11 10:22:08 -07:00
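The query strings from 7cb6056a94 can be built programmatically. This is a minimal sketch: the `connzURL` helper is hypothetical, and the base address `http://localhost:8222` assumes a locally running server with the monitoring port set to its conventional value.

```go
package main

import (
	"fmt"
	"net/url"
)

// connzURL builds a /connz URL with auth=1 and an optional user or
// account filter. It is a hypothetical helper for illustration only.
func connzURL(base, user, account string) string {
	q := url.Values{}
	q.Set("auth", "1")
	if user != "" {
		q.Set("user", user)
	}
	if account != "" {
		q.Set("acc", account)
	}
	// Encode sorts parameters by key, so the order may differ from the
	// commit message while remaining equivalent.
	return base + "/connz?" + q.Encode()
}

func main() {
	fmt.Println(connzURL("http://localhost:8222", "ivan", ""))
	fmt.Println(connzURL("http://localhost:8222", "", "eng"))
}
```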
Waldemar Quevedo
d44b0dec51 Merge pull request #1136 from nats-io/svc-latency-values
Adjust to zero negative latency values
2019-09-20 11:39:33 -05:00
Waldemar Quevedo
d0e36f3b88 Adjust to zero negative latency values
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2019-09-20 09:24:18 -07:00
Derek Collison
7fe47ace2b Make sure to turn latency on with a claim update
Signed-off-by: Derek Collison <derek@nats.io>
2019-09-19 14:20:35 -07:00
Derek Collison
0551371b31 Add in JWT support for tracking latency
Signed-off-by: Derek Collison <derek@nats.io>
2019-09-18 08:51:43 -07:00
Derek Collison
b98b75b166 Merge pull request #1127 from nats-io/sysdebug
System level services for debugging.
2019-09-17 09:45:53 -07:00
Derek Collison
52430c304a System level services for debugging.
This is the first pass at introducing exported services to the system account for general debugging of blackbox systems.
The first service reports number of subscribers for a given subject. The payload of the request is the subject, and optional queue group, and can contain wildcards.

Signed-off-by: Derek Collison <derek@nats.io>
2019-09-17 09:37:35 -07:00
Ivan Kozlovic
15201a19cd Fixed a lock inversion issue with account
In updateRouteSubscriptionMap(), when a queue sub is added/removed,
the code locks the account and then the route to send the update.
However, when a route is accepted and the subs are sent, the
opposite (locking wise) occurs. The route is locked, then the account.

This lock inversion is possible because a route is registered (added
to the server's map) and then the subs are sent.

Use a special lock to protect the send, but don't hold the acc.mu
lock while getting the route's lock.

The tests that were created for the original missed queue updates
issue, namely TestClusterLeaksSubscriptions() and
TestQueueSubWeightOrderMultipleConnections() pass with this change.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-09-13 14:30:00 -06:00
Derek Collison
94f143ccce Latency tracking updates.
Will now breakout the internal NATS latency to show requestor client RTT, responder client RTT and any internal latency caused by hopping between servers, etc.

Signed-off-by: Derek Collison <derek@nats.io>
2019-09-11 16:43:19 -07:00
Derek Collison
bb11f7bd2d Merge pull request #1111 from nats-io/latency
Track latency for exported services
2019-08-30 11:02:36 -07:00
Derek Collison
7989118c3f First pass latency tracking for exported services
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-30 10:52:48 -07:00
Ivan Kozlovic
2a8973a62b Fixed flushOutbound
With Go 1.12 (strangely was not able to reproduce with Go 1.11)
the test TestRouteNoCrashOnAddingSubToRoute() would frequently
lock up and consume all available CPUs on the machine. Running
this test with GOMAXPROCS=2 you would see server.test CPU usage
pegged at 200% (assuming you have at least 2 CPUs).
The reason was that the writeLoop was spinning because another
routine was already in flushOutbound() and stack trace would
show that it was stuck in system calls. It seems that the
writeLoop releasing the lock but grabbing it again right away
was not allowing the syscall to complete.

So decided to put the unlock/gosched/lock back in flushOutbound()
when flag is already set, but then protect the closeConnection()
with its own flag (similar to clearConnection) to not re-introduce
issue fixed in #1092.

Had to fix the benchmark test RoutedInterestGraph because after a
route is accepted, the initial PING will be sent after 1sec which
was breaking this test.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-29 12:59:27 -06:00
Ivan Kozlovic
90d592e163 Leaf and Route RTT
When a leaf or route connection is created, set the first ping
timer to fire at 1sec, which allows the RTT to be computed
reasonably soon (since the PingInterval could be user configured
and set much higher).

For Route in PR #1101, I was sending the PING on receiving the
INFO which required changing a bunch of tests. Changing that to
also use the first timer interval of 1sec and reverted changes
to route tests.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-26 09:34:17 -06:00
Ivan Kozlovic
7ca8723942 [FIXED] Some Leafnode issues
- On startup, verify that the local account in leafnode (if specified)
  can be found, otherwise fail startup.
- At runtime, print error and continue trying to reconnect.
  Will need to decide a better approach.
- When using basic auth (user/password), it was possible for a
  solicited Leafnode connection to not use user/password when
  trying an URL that was discovered through gossip. The server
  now saves the credentials of a configured URL to use with
  the discovered ones.

Updated the RouteRTT test to handle the case where the RTT does not
seem to be updated because the same value is always returned.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-23 14:08:07 -06:00
Ivan Kozlovic
2959b982ea Merge pull request #1101 from nats-io/route_rtt
[ADDED] RTT in routez's route info
2019-08-20 17:23:18 -06:00
Ivan Kozlovic
77c63dbce1 Fix flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-20 17:07:22 -06:00
Ivan Kozlovic
89dd13f134 [ADDED] RTT in routez's route info
Added the RTT field to each route reported in routez.
Ensure that when a route is accepted, we send a PING to compute
the first RTT and don't have to wait for the ping timer to fire.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-20 14:16:07 -06:00
Ivan Kozlovic
e230e7fde9 Attempt at fixing flapper again
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-15 09:06:56 -06:00
Ivan Kozlovic
fc8087daa7 Updates based on comments
- add sha256 algo
- move some mem hungry tests while running with -race to the norace
- remove GOGC=10

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-15 09:06:56 -06:00
Ivan Kozlovic
07e3db6b8e Prepare for v2.0.4 with goreleaser
Also fixed some flappers

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-15 09:06:56 -06:00
Derek Collison
507432648b flapper
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-28 07:10:37 -07:00
Derek Collison
8bfe14bbfd check response perms more often, make sure we limit memory growth
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-25 16:53:54 -07:00
Derek Collison
495a1a7ec3 Allow dynamic publish permissions based on reply subjects of received msgs
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-25 13:17:26 -07:00
Derek Collison
1d6c58074f Fix for #1065 (leaked subscribers from dq subs across routes)
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-22 17:17:43 -07:00
Alberto Ricart
273e5af0a8 Fixed an issue where the leaf authentication was not checking for account/signers, so user JWTs signed by a signer failed authentication.
2019-07-17 16:03:55 -04:00
Ivan Kozlovic
0873b46f67 [FIXED] LeafNode urls may be missing in INFO sent to LN connections
When the servers in a cluster have routes to each other, there
is a chance that the list of leafnode URLs maintained on each
server is not complete. This would result in LN servers connecting
to this cluster not getting the full list of possible URLs they
could reconnect to.

Also fixed a DATA RACE that appeared when running the updated
TestLeafNodeInfoURLs test. Fixed the race and added specific
test that easily demonstrated the race: TestLeafNodeNoRaceGeneratingNonce

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-07-12 19:15:30 -06:00
Derek Collison
18a2c357e4 Merge pull request #1072 from nats-io/handshake
Report authorization error and use TLS hostname for IPs on leafnodes.
2019-07-12 14:11:53 -07:00
Derek Collison
a795920dc3 Report authorization error and use TLS hostname for IPs on leafnodes.
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-12 13:57:16 -07:00
Ivan Kozlovic
37d08a6c56 [FIXED] Allow TLS InsecureSkipVerify again
This has an effect only on connections created by the server,
so routes and gateways (explicit and implicit).
Make sure that an explicit warning is printed if the insecure
property is set, but otherwise allow it.

Resolves #1062

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-07-12 12:10:28 -06:00
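The behaviour described in 37d08a6c56 (allow the insecure setting, but warn loudly) can be sketched as below. The `outboundTLSConfig` helper, its `warn` callback and the warning text are hypothetical; only `tls.Config.InsecureSkipVerify` is the real standard-library field.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// outboundTLSConfig builds a TLS config for connections the server itself
// creates. When insecure is set, verification is skipped but an explicit
// warning is emitted first. Hypothetical helper for illustration.
func outboundTLSConfig(insecure bool, warn func(string)) *tls.Config {
	cfg := &tls.Config{MinVersion: tls.VersionTLS12}
	if insecure {
		warn("TLS certificate verification is disabled for outbound connections")
		cfg.InsecureSkipVerify = true
	}
	return cfg
}

func main() {
	warn := func(msg string) { fmt.Println("[WRN]", msg) }
	cfg := outboundTLSConfig(true, warn)
	fmt.Println(cfg.InsecureSkipVerify)
}
```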
Derek Collison
951ae49100 Prevent multiple solicited leafnodes from forming cycles.
When a solicited leafnode comes from multiple servers that themselves are a cluster, cycles were formed.
This change allows solicited leafnodes to behave similarly to gateways in that each server of a cluster
is expected to have a solicited leafnode per destination account and cluster.

We no longer forward subscription interest or messages to a cluster from a server that has a solicited leafnode.

Signed-off-by: Derek Collison <derek@nats.io>
2019-07-10 20:16:47 -07:00
Derek Collison
10d4f1ab7a Convert leafnode solicited remotes to array
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-10 11:53:34 -07:00
Derek Collison
a61d32a82c Test for staggered leafnodes and sub/pub. Verifies fix for #1066
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-10 09:57:43 -07:00
Derek Collison
074c87d49e Merge pull request #1060 from nats-io/gr
Make sure we route responses across leafnodes
2019-07-08 17:07:57 -07:00