Commit Graph

16 Commits

Author SHA1 Message Date
Ivan Kozlovic
1b2754475b Refactor async client tests
Updated all tests that use "async" clients.
- start the writeLoop (this is in preparation for changes in the
  server that will not do send-in-place for some protocols, such
  as PING, etc..)
- Added missing defers in several tests
- fixed an issue in client.go where test was wrong possibly causing
  a panic.
- Had to skip a test for now since it would fail without server code
  change.

The next step will be ensure that all protocols are sent through
the writeLoop and that the data is properly flushed on close (important
for -ERR for instance).

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-12-12 11:58:24 -07:00
Ivan Kozlovic
0bfd03091b Clean tmp accounts map when race gets duplicate
Added check to the test to ensure that tmp map is empty.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-16 18:14:23 -07:00
Ivan Kozlovic
3e1728d623 [FIXED] Some accounts locking issues
- Risk of deadlock when checking if issuer claim are trusted. There
  was a RLock() in one thread, then a request for Lock() in another
  that was waiting for RLock() to return, but the first thread was
  then doing RLock() which was not acquired because this was blocked
  by the Lock() request (see e2160cc571)

- Use proper account/locking mode when checking if stream/service
  exports/signer have changed.

- Account registration race (regression from https://github.com/nats-io/nats-server/pull/890)

- Move test from #890 to "no race" test since only then could it detect
  the double registration.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-16 16:59:38 -07:00
Ivan Kozlovic
aa843945c9 Work on Gateways reply mapping
- New prefix that includes origin server for the request
- Mapping done if request is service import or requestor has
  recent subscription
- Subscription considered recent if less than 250ms
- Destination server strip GW prefix before giving to client
  and restore when getting a reply on that subject
- Mapping removed aftert 250ms
- Server rejects client publish on "$GNR." (the new prefix)
- Cluster and server hash are now 8 chars long and from base 62
  alphabets
- Mapped replies need to be sent to leafnode servers due to race
  (cluster B sends RS+ on GW inbound then RMSG on outbound, the
  RS+ may be processed later and cluster A may have given message
  to LN before RS+ on reply subject. So LN needs to accept the
  mapped reply but will strip to give to client and reassemble
  before sending it back)

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-11-06 16:06:49 -07:00
Ivan Kozlovic
2f48ad5150 Fixed subscription close
I noticed that TestNoRaceRoutedQueueAutoUnsubscribe started to
fail a lot on Travis. Running locally I could see a 45 to 50%
failures. After investigation I realized that the issue was that
we have wrongly re-used `subscription.nm` and set to -1 on unsubscribe
however, I believe that it was possible that when subscription was
closed, the server may have already picked that consumer for a delivery
which then causes nm==-1 to be bumped to 0, which was wrong.
Commenting out the subscription.close() that sets nm to -1, I could
not get the test to fail on macOS but would still get 7% failure on
Linux VM. Adding the check to see if sub is closed in deliverMsg()
completely erase the failures, even on Linux VM.

We could still use `nm` set to -1 but check on deliverMsg(), the
same way I use the closed int32 now.

Fixed some flappers.
Updated .travis.yml to failfast if one of the command in the
`script` fails. User `set -e` and `set +e` as recommended in
https://github.com/travis-ci/travis-ci/issues/1066

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-20 14:39:23 -06:00
Ivan Kozlovic
07e3db6b8e Prepare for v2.0.4 with goreleaser
Also fixed some flappers

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-15 09:06:56 -06:00
Ivan Kozlovic
6fd6ac2821 Update based on comments
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-07-29 20:38:22 -06:00
Ivan Kozlovic
887e744d07 [FIXED] Reduce memory usage on routes
When a route receives a message, it uses a thread local cache to
find the account and subscriptions match for a given subject.
When not found, an entry is added to this cache. The problem is
that this cache will reference subscriptions that in turn
reference connections.
When the subscriptions/connections are closed, this thread local
cannot be purged from those closed subscriptions (since it is
thread local - no lockin is used).
The real issue is that connection's buffer was not set to nil on
close, which then could cause more than expected memory to be
still referenced. Setting the buffer to nil will help reduce the
memory being used.
When an entry is added to the cache, the cache may reach a size
that will cause the server to prune some entries. From time to
time, the cache will be scanned to look for entries that contain
only closed subscriptions and remove those.

Resolves #1082

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-07-29 17:54:21 -06:00
Ivan Kozlovic
5478eaf01e Added /gatewayz endpoint
Such endpoint will list the gateway/cluster name, address and port
then list of outbound/inbound connections.
For each remote gateway there will be at most one outbound connection.
There can be 0 or more inbound connections for the same remote
gateway.

For each of these outbound/inbound connection, the connection info
similar to Connz is reported. Optionally, one can include the
interest mode/stats for each account.

Here are possible options:

* No specific options

http://host:port/gatewayz

* Limit to specific remote gateway, say name "B":

http://host:port/gatewayz/gw_name=B

* Include accounts (default limit to 1024 accounts)

http://host:port/gatewayz/accs=1

* Specific limit, say 200 (note accs=1 in this case is optional)

http://host:port/gatewayz/accs=1&accs_limit=200

* Specific account, say "acc_1". Note that accs=1 is not required then

http://host:port/gatewayz/acc_name=acc_1

* Above options can be mixed: specific remote gateway (B), with 100
  accounts reported

http://host:port/gatewayz/gw_name=B&accs_limit=200

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-28 12:41:09 -06:00
Derek Collison
ecfd1a2c85 Max flapper less so
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-21 11:59:34 -07:00
Derek Collison
d7140a0fd1 Update for client rename
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-10 15:11:30 -07:00
Ivan Kozlovic
bb4e8ae0f9 Gateways: Fix race for request reply
This addresses the following race:
- client connection creates a subscription on a reply subject
- client connection sends a request
- server sends the subscription to inbound gateway
- server sends the message to outbound gateway (those may be
  to different servers)
- receiving server sends to sub interested in request subject
- app sends reply
- its server then check for interest on the reply's subject

In interestOnly mode, there is a possibility that this server
has not received the interest on the reply subject yet and would
then drop the reply.

This PR detects above scenario and will prefix the reply subject
to identify the origin cluster if it is detected that the last
subscription from the sending connection was created less than
a second ago.
Once the destination has this prefix, the destination cluster
will always send back that message to origin cluster even if
there is no registered interest.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-04-22 20:00:21 -06:00
Ivan Kozlovic
1817b354e3 Update tests on Travis with tweaked GC settings
Moved some tests to "no race" tests that are run separately.
Removing -v and adding -p=1.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-11-08 16:56:20 -07:00
Derek Collison
85c2edc314 Make sure to flush the sub
Signed-off-by: Derek Collison <derek@nats.io>
2018-07-02 12:10:17 -07:00
Derek Collison
bd972a9aca fixes
Signed-off-by: Derek Collison <derek@nats.io>
2018-07-02 11:46:40 -07:00
Ivan Kozlovic
9259da2d3d Moved some tests to test files that compile only in no race mode
Some tests consume too much memory when running with -race which
can cause some failures on Travis.
Moreover, those tests may not be meaningful if they are running
slow, which -race causes.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-06-26 18:52:56 -06:00