Commit Graph

393 Commits

Author SHA1 Message Date
Ivan Kozlovic
cbbc21ac25 Some update to leafnode subscription handling
- Send all subs in place if smap is small
- Skip sending update until after sendAllLeafSubs() is done

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-10-30 20:01:49 -06:00
Ivan Kozlovic
17a7d0d866 [FIXED] Server should not send RTT PING before sending initial PONG
As soon as server has processed a client CONNECT, it was possible
that if Connz() or other was requested, the server will send a
PING to compute the RTT. This would cause clients that expect
the first PONG as part of synchronous CONNECT logic to fail.

Make sure that we delay the first RTT ping to after sending the
first PONG, or if client does not send PING as part of the CONNECT,
after 2 seconds have elapsed since the tcp connection was accepted.

Resolves #1174

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-10-30 19:50:19 -06:00
Jaime Piña
78966fbfa4 Reduce 2019-09-27 16:38:43 -07:00
Jaime Piña
64664946e7 Add QueueSubscribe permissions.
```
users = [
  {
    user: "foo", permissions: {
      sub: {
        # Allow plain subscription foo, but only v1 groups or *.dev queue groups
        allow: ["foo", "foo v1", "foo v1.>", "foo *.dev"]

        # Prevent queue subscriptions on prod groups
        deny: ["> *.prod"]
     }
  }
]
```

Signed-off-by: Jaime Piña <jaime@synadia.com>
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2019-09-27 16:08:24 -07:00
Waldemar Quevedo
d0e36f3b88 Adjust to zero negative latency values
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2019-09-20 09:24:18 -07:00
Jaime Piña
ab24cddc06 Add latency config
Currently, the config file doesn't recognize the latency config block in
account exports. This change exposes those settings in the config file.

Signed-off-by: Jaime Piña <jaime@synadia.com>
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2019-09-18 13:20:26 -07:00
Derek Collison
52430c304a System level services for debugging.
This is the first pass at introducing exported services to the system account for generally debugging of blackbox systems.
The first service reports number of subscribers for a given subject. The payload of the request is the subject, and optional queue group, and can contain wildcards.

Signed-off-by: Derek Collison <derek@nats.io>
2019-09-17 09:37:35 -07:00
Derek Collison
94f143ccce Latency tracking updates.
Will now breakout the internal NATS latency to show requestor client RTT, responder client RTT and any internal latency caused by hopping between servers, etc.

Signed-off-by: Derek Collison <derek@nats.io>
2019-09-11 16:43:19 -07:00
Ivan Kozlovic
effa30ce4a [FIXED] MaxPending > MaxInt32 causes client to be disconnected
Changed some of client.outbound fields to int64.
Moved fields around to minimize size of struct (checked with
unsafe.Sizeof())
Checked benchmark results before/after
Added test

Resolves #1118

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-09-11 14:29:02 -06:00
Derek Collison
67470911fe Prune remote reply tracking
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-30 17:35:20 -07:00
Derek Collison
bb11f7bd2d Merge pull request #1111 from nats-io/latency
Track latency for exported services
2019-08-30 11:02:36 -07:00
Derek Collison
7989118c3f First pass latency tracking for exported services
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-30 10:52:48 -07:00
Ivan Kozlovic
2a8973a62b Fixed flushOutbound
With Go 1.12 (strangely was not able to reproduce with Go 1.11)
the test TestRouteNoCrashOnAddingSubToRoute() would frequently
locks up and consume all avail CPUs on the machine. Running
this test with GOMAXPROCS=2 you would see server.test CPU usage
pegged at 200% (assuming you have at least 2 CPUs).
The reason was that the writeLoop was spinning because another
routine was already in flushOutbound() and stack trace would
show that it was stuck in system calls. It seems that even though
the writeLoop does release the lock but grab it right away was
not allowing the syscall to complete.

So decided to put back the unlock/gosched/lock back in flushOutbound()
when flag is already set, but then protect the closeConnection()
with its own flag (similar to clearConnection) to not re-introduce
issue fixed in #1092.

Had to fix the benchmark test RoutedInterestGraph because after a
route is accepted, the initial PING will be sent after 1sec which
was breaking this test.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-29 12:59:27 -06:00
Ivan Kozlovic
2f48ad5150 Fixed subscription close
I noticed that TestNoRaceRoutedQueueAutoUnsubscribe started to
fail a lot on Travis. Running locally I could see a 45 to 50%
failures. After investigation I realized that the issue was that
we have wrongly re-used `subscription.nm` and set to -1 on unsubscribe
however, I believe that it was possible that when subscription was
closed, the server may have already picked that consumer for a delivery
which then causes nm==-1 to be bumped to 0, which was wrong.
Commenting out the subscription.close() that sets nm to -1, I could
not get the test to fail on macOS but would still get 7% failure on
Linux VM. Adding the check to see if sub is closed in deliverMsg()
completely erase the failures, even on Linux VM.

We could still use `nm` set to -1 but check on deliverMsg(), the
same way I use the closed int32 now.

Fixed some flappers.
Updated .travis.yml to failfast if one of the command in the
`script` fails. User `set -e` and `set +e` as recommended in
https://github.com/travis-ci/travis-ci/issues/1066

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-20 14:39:23 -06:00
Waldemar Quevedo
5c776d4363 Fix typo
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2019-08-13 19:59:28 -07:00
Ivan Kozlovic
c20afd4016 [FIXED] Connection could be closed twice
This was introduced in PR#930. The first commit had the route's
check if the flushOutbound() returned false, and if so would
locally unlock/lock the connection's lock. Unfortunately, this
was replaced in the second commit (a6aeed3a6b)
to the flushOutbound() function itself.
This causes the function closeConnection() to possibly unlock
the connection while calling flushOutbound(), which if the
connection is closed due to both a tls timeout for instance
and explicitly, it would result in the connection being scheduled
for a reconnect (if explicit gateway connection, possibly route).

Added defensive code in Gateway to register a unique outbound gateway.

Fixed a test that was now failing with newer Go version in which
they fixed url.Parse()

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-13 20:11:03 -06:00
Derek Collison
8f5bc503e5 Add ability for cross account import services to return streams as well as singeltons.
Take into account tracking of response maps that are created and do proper cleanup.
Also fixes #1089 which was discovered while working on this.

Signed-off-by: Derek Collison <derek@nats.io>
2019-08-06 14:15:40 -07:00
Ivan Kozlovic
b537f130cc Use goto to remove entry from cache
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-07-29 20:52:57 -06:00
Ivan Kozlovic
6fd6ac2821 Update based on comments
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-07-29 20:38:22 -06:00
Ivan Kozlovic
887e744d07 [FIXED] Reduce memory usage on routes
When a route receives a message, it uses a thread local cache to
find the account and subscriptions match for a given subject.
When not found, an entry is added to this cache. The problem is
that this cache will reference subscriptions that in turn
reference connections.
When the subscriptions/connections are closed, this thread local
cannot be purged from those closed subscriptions (since it is
thread local - no lockin is used).
The real issue is that connection's buffer was not set to nil on
close, which then could cause more than expected memory to be
still referenced. Setting the buffer to nil will help reduce the
memory being used.
When an entry is added to the cache, the cache may reach a size
that will cause the server to prune some entries. From time to
time, the cache will be scanned to look for entries that contain
only closed subscriptions and remove those.

Resolves #1082

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-07-29 17:54:21 -06:00
Derek Collison
5bec08ac6a Added support for user and activation token revocation
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-28 06:49:39 -07:00
Derek Collison
8bfe14bbfd check response perms more often, make sure we limit memory growth
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-25 16:53:54 -07:00
Derek Collison
495a1a7ec3 Allow dynamic publish permissions based on reply subjects of received msgs
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-25 13:17:26 -07:00
Derek Collison
1d6c58074f Fix for #1065 (leaked subscribers from dq subs across routes)
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-22 17:17:43 -07:00
Andy Xie
2f99b144aa add ut for tracemsg 2019-07-15 14:02:02 +08:00
Derek Collison
7a3fb4ebe0 Merge pull request #1057 from andyxning/allow_limits_to_traced_message
allow limit to traced message
2019-07-14 21:34:31 -07:00
Andy Xie
cd214fca89 allow limit to traced message 2019-07-15 11:39:00 +08:00
Derek Collison
8262082289 If we read data and have an error, still process and parse data.
This is helpful for clients who send data and close the connection.
Also helpful to process errors like auth for solicited leafnodes.

Signed-off-by: Derek Collison <derek@nats.io>
2019-07-13 05:19:35 -07:00
Derek Collison
a795920dc3 Report authorization error and use TLS hostname for IPs on leafnodes.
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-12 13:57:16 -07:00
Derek Collison
951ae49100 Prevent multiple solicited leafnodes from forming cycles.
When a solicited leafnode comes from multiple servers that themselves are a cluster, cycles were formed.
This change allows solicited leafnodes to behave similar to gateways in that each server of a cluster
is expected to have a solicted leafnode per destination account and cluster.

We no longer forward subscription interest or messages to a cluster from a server that has a solicited leafnode.

Signed-off-by: Derek Collison <derek@nats.io>
2019-07-10 20:16:47 -07:00
Derek Collison
f76a6b9a5c When a bound account's maxpayload is not the same make sure we send it to clients that can do async INFO.
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-08 15:20:23 -07:00
Derek Collison
8168aa1f81 Allow sublist cache do be disabled globally
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-02 07:34:02 -07:00
Ivan Kozlovic
156511bba7 [FIXED] Check of maxpayload could be bypassed if size overruns int32
One could craft a PUB protocol to cause server to panic. This can
happen if the size in the PUB protocol overruns an int32.

(note that if authorization is enabled, the user would need to
authenticate first, limiting the impact).

Thank you to Aviv Sasson and Ariel Zelivansky from Twistlock
for the security report!

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-07-01 15:06:08 -06:00
Derek Collison
e11a959584 Send ping when RTT update needed
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-01 11:58:06 -07:00
Derek Collison
8a3db71ad5 Updates from comments
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-01 08:47:13 -07:00
Derek Collison
ebd4deb8b9 Stager first ping from server and suppress pings if a ping was received.
Signed-off-by: Derek Collison <derek@nats.io>
2019-06-29 15:43:15 -07:00
Derek Collison
d1a782e014 Messages not distributed evenly when sourced from leafnode.
When messages came from a leafnode there were not being distributed evenly to the destination cluster.

Signed-off-by: Derek Collison <derek@nats.io>
2019-06-11 20:37:49 -07:00
Derek Collison
2a8e630bf1 Fix for leafnode and dq selection over GWs
Signed-off-by: Derek Collison <derek@nats.io>
2019-06-01 16:43:54 -07:00
Derek Collison
adba6dc023 Add in leafnode bound account events for accounting
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-31 16:58:27 -07:00
Derek Collison
3cf6f6a5d2 Bug fix for service import with leafnodes and gws
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-31 11:22:02 -07:00
Derek Collison
42a7797a50 Add chunk and total bytes to slow consumer log
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-30 09:15:20 -07:00
Ivan Kozlovic
33505e4849 Print warning if code in readloop execute for more than threshold
Issue a warning in readLoop if execution of code after connection
Read() until end of for loop reaches a certain threshold.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-28 18:33:48 -06:00
Derek Collison
67bb08af8b Fixes for a few flappers.
TestJWTAccountImportActivationExpires
TestGatewayServiceImportWithQueue

Signed-off-by: Derek Collison <derek@nats.io>
2019-05-21 15:12:31 -07:00
Derek Collison
6584a9a828 lint updates
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-06 15:41:38 -07:00
Derek Collison
5292ec1598 Various fixes, init smap for leafnodes with gateways too
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-02 14:22:51 -07:00
Derek Collison
1d736ccc61 Make sure we use correct MSG prefix when mixing between leafnodes and routes.
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-01 15:08:20 -07:00
Derek Collison
17839518de Updates based on PR feedback
Signed-off-by: Derek Collison <derek@nats.io>
2019-04-25 15:47:35 -07:00
Derek Collison
2ec3eaeaa9 Leafnode account based connections limits
Signed-off-by: Derek Collison <derek@nats.io>
2019-04-25 14:40:59 -07:00
Derek Collison
f320f318b7 Fixed merge conflict
Signed-off-by: Derek Collison <derek@nats.io>
2019-04-23 17:28:42 -07:00
Derek Collison
bfe83aff81 Make account lookup faster with sync.Map
Signed-off-by: Derek Collison <derek@nats.io>
2019-04-23 17:13:23 -07:00