nats-server

mirror of https://github.com/gogrlx/nats-server.git synced 2026-04-15 10:40:41 -07:00

Author	SHA1	Message	Date
Derek Collison	75ae7c6032	When an account asked for connz should be client and leaf connections only by default. Signed-off-by: Derek Collison <derek@nats.io>	2021-08-15 11:04:23 -07:00
Derek Collison	f07a86c6db	Merge branch 'main' into acc-connz Signed-off-by: Derek Collison <derek@nats.io>	2021-08-14 18:13:43 -07:00
Derek Collison	cdb5a56329	Fix for flapping test Signed-off-by: Derek Collison <derek@nats.io>	2021-08-14 15:26:27 -07:00
Derek Collison	14572b080b	Fixed and moved large purge test to no race Signed-off-by: Derek Collison <derek@nats.io>	2021-08-14 13:07:46 -07:00
Derek Collison	10167b1bcf	Added in ability for normal accounts to access scoped connz info. Added in client kind and sub type for clients. Added in ability to filter connections based on matching subject interest. Signed-off-by: Derek Collison <derek@nats.io>	2021-08-13 10:19:12 -07:00
Matthias Hanel	d6de19c649	Made test more predictable by waiting for leader after leader shutdown Signed-off-by: Matthias Hanel <mh@synadia.com>	2021-08-12 12:34:15 -04:00
Derek Collison	c9875e09a0	Fix for a flapper Signed-off-by: Derek Collison <derek@nats.io>	2021-08-12 06:13:47 -07:00
Derek Collison	29536629eb	Simplified flow control, avoid stalls due to msg loss Signed-off-by: Derek Collison <derek@nats.io>	2021-08-09 20:13:17 -07:00
Derek Collison	4e92b0ed6e	When a server was restarting, if a stream had a MaxAge and there were a very large amount of messages to expire, this would take too long. During normal operation and quick restarts the number of expired messages per cycle is manageable and correct. However if a server is shutdown for quite a long time and many messages have expired this process is too slow. This commit introduces an optimized expiration tailored for startup vs running state. Signed-off-by: Derek Collison <derek@nats.io>	2021-07-30 12:48:47 -07:00
Derek Collison	f13fa767c2	Remove the swapping of accounts during processing of service imports. When processing service imports we would swap out the accounts during processing. With the addition of internal subscriptions and internal clients publishing in JetStream we had an issue with the wrong account being used. This was specific to delayed pull subscribers trying to unsubscribe due to max of 1 while other JetStream API calls were running concurrently.	2021-07-26 07:57:10 -07:00
Derek Collison	6eef31c0fc	Fixed peer info reports that had large last active values. Also put in safety for lag going upside down as well. Signed-off-by: Derek Collison <derek@nats.io>	2021-07-06 10:14:43 -07:00
Derek Collison	960c45df81	Use of sync.Pool for filestore could cause msg corruption. Signed-off-by: Derek Collison <derek@nats.io>	2021-07-06 08:41:01 -07:00
Derek Collison	63479ff8fd	Bump threshold Signed-off-by: Derek Collison <derek@nats.io>	2021-06-27 08:33:46 -07:00
Derek Collison	a27f198b83	Skip for now, covermode blows up memory and latency thresholds Signed-off-by: Derek Collison <derek@nats.io>	2021-06-23 13:50:14 -07:00
Derek Collison	225c8b4a85	Bump threshold Signed-off-by: Derek Collison <derek@nats.io>	2021-06-22 17:44:19 -07:00
Derek Collison	b3753aba1b	Improvements to filtered purge and general memory use for filestore. We optimized the filtered purge to skip msgBlks that are not in play. Also optimized msgBlock buffer usage by using two sync.Pools to enhance reuse. Signed-off-by: Derek Collison <derek@nats.io>	2021-06-22 15:47:26 -07:00
R.I.Pienaar	c6b85fd101	update for review Signed-off-by: R.I.Pienaar <rip@devco.net>	2021-06-22 08:47:08 +02:00
R.I.Pienaar	c9bf329a99	test to show slow purges Signed-off-by: R.I.Pienaar <rip@devco.net>	2021-06-21 17:01:49 +02:00
Derek Collison	6219f0381d	Test rename for no race versions Signed-off-by: Derek Collison <derek@nats.io>	2021-06-15 09:41:11 -07:00
Derek Collison	d9a0ff904c	Bump timeout threshold Signed-off-by: Derek Collison <derek@nats.io>	2021-06-15 08:53:11 -07:00
Derek Collison	08cdb2d2ea	Make filtered consumers in large mixed streams more efficient. Allow wider scoped filtered subjects. We introduce a per subject information tracking to filestore to optimize for large mux'd streams and more efficient filtered consumers. Signed-off-by: Derek Collison <derek@nats.io>	2021-06-15 04:44:05 -07:00
Derek Collison	820c76d3c8	Fix flapper Signed-off-by: Derek Collison <derek@nats.io>	2021-04-19 11:43:43 -07:00
Derek Collison	946335d62f	Increase clients and runtime Signed-off-by: Derek Collison <derek@nats.io>	2021-04-16 14:18:40 -07:00
Derek Collison	d7641b9d38	Move test to norace Signed-off-by: Derek Collison <derek@nats.io>	2021-04-16 14:00:11 -07:00
Derek Collison	adba4fde5a	Add large stress test, skipped by default Signed-off-by: Derek Collison <derek@nats.io>	2021-04-16 13:58:32 -07:00
Derek Collison	395728bab9	Allow control messages like heartbeats to pass the old sub test. Signed-off-by: Derek Collison <derek@nats.io>	2021-04-14 14:11:02 -07:00
Jaime Piña	d929ee1348	Check errors when removing test directories and files Currently in tests, we have calls to os.Remove and os.RemoveAll where we don't check the returned error. This hides useful error messages when tests fail to run, such as "too many open files". This change checks for more filesystem related errors and calls t.Fatal if there is an error.	2021-04-07 11:09:47 -07:00
Jaime Piña	6941bb3ade	Update Go client in tests	2021-03-30 13:17:34 -07:00
Derek Collison	2ed53035ed	Reworked flow control for sources and mirrors. Signed-off-by: Derek Collison <derek@nats.io>	2021-03-24 07:07:33 -07:00
Matthias Hanel	b316cccfd1	Fixed a quorum formation issue that caused truncation When a new leader is elected it has to give everyone a chance to reply, so that we can observe rejections with higher term. The maximum election timeout is 7.5 seconds. The new behavior of waiting for the election timeout caused unit tests to fail. Hence upping the timeout there as well. Signed-off-by: Matthias Hanel <mh@synadia.com>	2021-03-11 19:44:47 -05:00
Derek Collison	2b2a776411	Disable flaky tests for now Signed-off-by: Derek Collison <derek@nats.io>	2021-03-11 07:11:05 -05:00
Ivan Kozlovic	e7e756034a	Switch Gateway JS accounts to interest-only mode + some other fixes - Fixed the close of a TLS connection which starting Go 1.16 set the deadline to 5 seconds. - Fixed an issue with setHeader that was causing these error messages ``` === RUN TestServiceImportReplyMatchCycleMultiHops nats: message could not decode headers on connection [4] for subscription on "foo" --- PASS: TestServiceImportReplyMatchCycleMultiHops (0.04s) ``` - Fixed names of tests in norace_test.go since they must start with TestNoRace in order to make sure that we execute them in Travis: ``` go test -v -run=TestNoRace --failfast -p=1 ./... ``` Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-03-03 19:15:28 -07:00
Derek Collison	401484299d	Flaps with cluster size of 5 too much Signed-off-by: Derek Collison <derek@nats.io>	2021-03-03 06:34:07 -08:00
Derek Collison	09e3d26fa3	Add in support for stream mirrors and sources. Add in proper support for stream updates in clustered mode. Don't send API updates without subjects, caused GW parser errors. Stream internal loops use their own clients now. Signed-off-by: Derek Collison <derek@nats.io>	2021-02-23 10:57:27 -08:00
Derek Collison	c16f6e193d	Move JetStream direct APIs to private. Signed-off-by: Derek Collison <derek@nats.io>	2021-02-07 15:19:22 -08:00
Ivan Kozlovic	46a4969813	Moved test to ones run without `-race` and cap number of conns Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-10-22 10:11:16 -06:00
Ivan Kozlovic	0c804f5ffb	Moving TestQueueAutoUnsubscribe to norace_test.go This test has been found to cause TestAccountNATSResolverFetch to fail on macOS. We did not find the exact reason yet, but it seems that with `-race`, the queue auto-unsub test (that creates 2,000 queue subs and sends 1,000 messages) causes mem to grow to 256MB (which we know -race is memory hungry) and that may be causing interactions with the account resolver test. For now, moving it to norace_test.go, which consumes much less memory (25MB) and anyway is a better place since it would stress better the "races" of having a queue sub being unsubscribed while messages were inflight to this queue sub. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-09-29 18:06:16 -06:00
Ivan Kozlovic	22833c8d1a	Fix sysSubscribe races Made changes to processSub() to accept subscription properties, including the icb callback so that it is set prior to add the subscription to the account's sublist, which prevent races. Fixed some other racy conditions, notably in addServiceImportSub() Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-08-03 14:59:00 -06:00
Ivan Kozlovic	96ccf91566	[FIXED] Possible deadlock with solicited leafnodes when cluster conflict We cannot call c.closeConnection() under the server lock because closeConnection() can invoke server lock in some cases. Created a test that should run without `-race` to reproduce the deadlock (which it does) but sometimes would fail because cluster would not be formed. This uncovered an issue with conflict resolution which test TestRouteClusterNameConflictBetweenStaticAndDynamic() can reproduce easily. The issue was that we were not updating a dynamic name with the remote if the remote was non dynamic. Resolves #1543 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-07-30 18:45:36 -06:00
aricart	e7590f3065	jwt2 testbed	2020-06-01 18:00:13 -04:00
Derek Collison	2bd7553c71	System Account on by default. Most of the changes are to turn it off for tests that were watching subscriptions and such. Signed-off-by: Derek Collison <derek@nats.io>	2020-05-29 17:56:45 -07:00
Ivan Kozlovic	d20efffccb	Fix TestNoRaceRouteCache test Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-05-25 06:58:23 -07:00
Ivan Kozlovic	1b2754475b	Refactor async client tests Updated all tests that use "async" clients. - start the writeLoop (this is in preparation for changes in the server that will not do send-in-place for some protocols, such as PING, etc..) - Added missing defers in several tests - fixed an issue in client.go where test was wrong possibly causing a panic. - Had to skip a test for now since it would fail without server code change. The next step will be ensure that all protocols are sent through the writeLoop and that the data is properly flushed on close (important for -ERR for instance). Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-12-12 11:58:24 -07:00
Ivan Kozlovic	0bfd03091b	Clean tmp accounts map when race gets duplicate Added check to the test to ensure that tmp map is empty. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-11-16 18:14:23 -07:00
Ivan Kozlovic	3e1728d623	[FIXED] Some accounts locking issues - Risk of deadlock when checking if issuer claim are trusted. There was a RLock() in one thread, then a request for Lock() in another that was waiting for RLock() to return, but the first thread was then doing RLock() which was not acquired because this was blocked by the Lock() request (see `e2160cc571`) - Use proper account/locking mode when checking if stream/service exports/signer have changed. - Account registration race (regression from https://github.com/nats-io/nats-server/pull/890) - Move test from #890 to "no race" test since only then could it detect the double registration. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-11-16 16:59:38 -07:00
Ivan Kozlovic	aa843945c9	Work on Gateways reply mapping - New prefix that includes origin server for the request - Mapping done if request is service import or requestor has recent subscription - Subscription considered recent if less than 250ms - Destination server strip GW prefix before giving to client and restore when getting a reply on that subject - Mapping removed after 250ms - Server rejects client publish on "$GNR." (the new prefix) - Cluster and server hash are now 8 chars long and from base 62 alphabets - Mapped replies need to be sent to leafnode servers due to race (cluster B sends RS+ on GW inbound then RMSG on outbound, the RS+ may be processed later and cluster A may have given message to LN before RS+ on reply subject. So LN needs to accept the mapped reply but will strip to give to client and reassemble before sending it back) Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-11-06 16:06:49 -07:00
Ivan Kozlovic	2f48ad5150	Fixed subscription close I noticed that TestNoRaceRoutedQueueAutoUnsubscribe started to fail a lot on Travis. Running locally I could see a 45 to 50% failure rate. After investigation I realized that the issue was that we have wrongly re-used `subscription.nm` and set to -1 on unsubscribe however, I believe that it was possible that when subscription was closed, the server may have already picked that consumer for a delivery which then causes nm==-1 to be bumped to 0, which was wrong. Commenting out the subscription.close() that sets nm to -1, I could not get the test to fail on macOS but would still get 7% failure on Linux VM. Adding the check to see if sub is closed in deliverMsg() completely erases the failures, even on Linux VM. We could still use `nm` set to -1 but check on deliverMsg(), the same way I use the closed int32 now. Fixed some flappers. Updated .travis.yml to failfast if one of the commands in the `script` fails. Use `set -e` and `set +e` as recommended in https://github.com/travis-ci/travis-ci/issues/1066 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-08-20 14:39:23 -06:00
Ivan Kozlovic	07e3db6b8e	Prepare for v2.0.4 with goreleaser Also fixed some flappers Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-08-15 09:06:56 -06:00
Ivan Kozlovic	6fd6ac2821	Update based on comments Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-07-29 20:38:22 -06:00
Ivan Kozlovic	887e744d07	[FIXED] Reduce memory usage on routes When a route receives a message, it uses a thread local cache to find the account and subscriptions match for a given subject. When not found, an entry is added to this cache. The problem is that this cache will reference subscriptions that in turn reference connections. When the subscriptions/connections are closed, this thread local cannot be purged from those closed subscriptions (since it is thread local - no locking is used). The real issue is that connection's buffer was not set to nil on close, which then could cause more than expected memory to be still referenced. Setting the buffer to nil will help reduce the memory being used. When an entry is added to the cache, the cache may reach a size that will cause the server to prune some entries. From time to time, the cache will be scanned to look for entries that contain only closed subscriptions and remove those. Resolves #1082 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-07-29 17:54:21 -06:00

58 Commits