Commit Graph

74 Commits

Author SHA1 Message Date
Ivan Kozlovic
7b25755980 Adjust timing
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-09-23 09:56:42 -06:00
Ivan Kozlovic
23e8dc9902 Fix corrupt wal test that was flapping
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-09-23 09:36:46 -06:00
Derek Collison
ebb24006c2 Direct consumers used for mirroring should not be affected by max consumer limits
Signed-off-by: Derek Collison <derek@nats.io>
2021-09-22 15:01:51 -07:00
Derek Collison
eab45b404a Fix for deadlock with stream mirrors or sources where origin is interest or workqueue policy.
Signed-off-by: Derek Collison <derek@nats.io>
2021-09-22 10:59:02 -07:00
Derek Collison
052bb7ca54 Merge to fix conflicts
Signed-off-by: Derek Collison <derek@nats.io>
2021-09-21 08:16:51 -07:00
Derek Collison
de851e513f Fix for #2548
Replicated durable consumers that were backed by a memory store were bypassing snapshotting, which also did compaction of the raft WAL.
This change adapts for memory store backed consumers by compacting the raft WAL directly in the snapshot logic.

Signed-off-by: Derek Collison <derek@nats.io>
2021-09-21 08:02:11 -07:00
Derek Collison
63c242843c Avoid panic if WAL was truncated out from underneath us.
If we were the leader, step down as well.

Signed-off-by: Derek Collison <derek@nats.io>
2021-09-21 07:26:03 -07:00
Derek Collison
12bb46032c Fix RAFT WAL repair.
When we stored a message in the raft layer in a wrong position (state corrupt), we would panic, leaving the message there.
On restart we would truncate the WAL and try to repair, but we truncated to the wrong index of the bad entry.

This change also includes additional changes to truncateWAL and also reduces the conditional for panic on storeMsg.

Signed-off-by: Derek Collison <derek@nats.io>
2021-09-20 18:41:37 -07:00
Derek Collison
620b56e12f During compaction the cache may not be loaded completely if msg block was lmb (active writing).
This could lead to the filtered subject state being incorrect.

Signed-off-by: Derek Collison <derek@nats.io>
2021-09-13 14:36:50 -07:00
Derek Collison
dadc3b9fae Fixed a bug when an interest retention stream with noack consumers is in clustered mode.
We were not properly propagating the ack state or properly cleaning up the stream messages.

Signed-off-by: Derek Collison <derek@nats.io>
2021-09-08 15:02:09 -07:00
Derek Collison
29eaa9c614 Fixed bug that could lead to perceived message loss.
Under load and pressure from concurrent publishing and consuming with multiple consumers the filestore would
return a partial or no cache error to the upper layers. For consumers this could result in us skipping a stream sequence when we should not.

This change stabilizes the filestore and removes the flush state for msg blocks. I also found some bugs that did not track last sequence properly
after snapshots / restore.

Signed-off-by: Derek Collison <derek@nats.io>
2021-09-05 16:36:23 -07:00
Derek Collison
ba4937f04e The slowdown was due to trying to expire messages without proper index info.
So now we read and encrypt index info in place as well.

Signed-off-by: Derek Collison <derek@nats.io>
2021-08-25 13:22:18 -07:00
Derek Collison
4a6f1b4819 Do not use crypto rand for nonce generation.
Crypto rand is not needed for nonce generation and could drain entropy.

Signed-off-by: Derek Collison <derek@nats.io>
2021-08-24 12:51:13 -07:00
Derek Collison
3a20582ad5 Add in optional compression schemes for Accept-Encoding on server api requests.
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-23 13:06:18 -07:00
Derek Collison
12c912d7f4 Only compact when msg is not first.
Make sure compact works with snapshots.

Signed-off-by: Derek Collison <derek@nats.io>
2021-08-20 06:47:53 -07:00
Derek Collison
d349edeeb6 When a JetStream stream was used as a KV, there could be times where we have lots of file storage unused.
This change introduces utilization, better interior block deletes, and individual block compaction when we are below 50% utilization of the block.

Signed-off-by: Derek Collison <derek@nats.io>
2021-08-19 18:24:41 -07:00
Derek Collison
75ae7c6032 When an account asks for connz, the result should be client and leaf connections only by default.
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-15 11:04:23 -07:00
Derek Collison
f07a86c6db Merge branch 'main' into acc-connz
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-14 18:13:43 -07:00
Derek Collison
cdb5a56329 Fix for flapping test
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-14 15:26:27 -07:00
Derek Collison
14572b080b Fixed and moved large purge test to no race
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-14 13:07:46 -07:00
Derek Collison
10167b1bcf Added in ability for normal accounts to access scoped connz info.
Added in client kind and sub type for clients.
Added in ability to filter connections based on matching subject interest.

Signed-off-by: Derek Collison <derek@nats.io>
2021-08-13 10:19:12 -07:00
Matthias Hanel
d6de19c649 Made test more predictable by waiting for leader after leader shutdown
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-08-12 12:34:15 -04:00
Derek Collison
c9875e09a0 Fix for a flapper
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-12 06:13:47 -07:00
Derek Collison
29536629eb Simplified flow control, avoid stalls due to msg loss
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-09 20:13:17 -07:00
Derek Collison
4e92b0ed6e When a server was restarting, if a stream had a MaxAge and there was a very large number of messages to expire, this would take too long.
During normal operation and quick restarts the number of expired messages per cycle is manageable and correct.
However, if a server is shut down for quite a long time and many messages have expired, this process is too slow.

This commit introduces an optimized expiration tailored for startup vs running state.

Signed-off-by: Derek Collison <derek@nats.io>
2021-07-30 12:48:47 -07:00
Derek Collison
f13fa767c2 Remove the swapping of accounts during processing of service imports.
When processing service imports we would swap out the accounts during processing.
With the addition of internal subscriptions and internal clients publishing in JetStream we had an issue with the wrong account being used.
This was specific to delayed pull subscribers trying to unsubscribe due to max of 1 while other JetStream API calls were running concurrently.
2021-07-26 07:57:10 -07:00
Derek Collison
6eef31c0fc Fixed peer info reports that had large last active values.
Also put in safety for lag going upside down.

Signed-off-by: Derek Collison <derek@nats.io>
2021-07-06 10:14:43 -07:00
Derek Collison
960c45df81 Use of sync.Pool for filestore could cause msg corruption.
Signed-off-by: Derek Collison <derek@nats.io>
2021-07-06 08:41:01 -07:00
Derek Collison
63479ff8fd Bump threshold
Signed-off-by: Derek Collison <derek@nats.io>
2021-06-27 08:33:46 -07:00
Derek Collison
a27f198b83 Skip for now, covermode blows up memory and latency thresholds
Signed-off-by: Derek Collison <derek@nats.io>
2021-06-23 13:50:14 -07:00
Derek Collison
225c8b4a85 Bump threshold
Signed-off-by: Derek Collison <derek@nats.io>
2021-06-22 17:44:19 -07:00
Derek Collison
b3753aba1b Improvements to filtered purge and general memory use for filestore.
We optimized the filtered purge to skip msgBlks that are not in play.
Also optimized msgBlock buffer usage by using two sync.Pools to enhance reuse.

Signed-off-by: Derek Collison <derek@nats.io>
2021-06-22 15:47:26 -07:00
R.I.Pienaar
c6b85fd101 update for review
Signed-off-by: R.I.Pienaar <rip@devco.net>
2021-06-22 08:47:08 +02:00
R.I.Pienaar
c9bf329a99 test to show slow purges
Signed-off-by: R.I.Pienaar <rip@devco.net>
2021-06-21 17:01:49 +02:00
Derek Collison
6219f0381d Test rename for no race versions
Signed-off-by: Derek Collison <derek@nats.io>
2021-06-15 09:41:11 -07:00
Derek Collison
d9a0ff904c Bump timeout threshold
Signed-off-by: Derek Collison <derek@nats.io>
2021-06-15 08:53:11 -07:00
Derek Collison
08cdb2d2ea Make filtered consumers in large mixed streams more efficient.
Allow wider scoped filtered subjects.

We introduce a per subject information tracking to filestore to optimize for large mux'd streams and more efficient filtered consumers.

Signed-off-by: Derek Collison <derek@nats.io>
2021-06-15 04:44:05 -07:00
Derek Collison
820c76d3c8 Fix flapper
Signed-off-by: Derek Collison <derek@nats.io>
2021-04-19 11:43:43 -07:00
Derek Collison
946335d62f Increase clients and runtime
Signed-off-by: Derek Collison <derek@nats.io>
2021-04-16 14:18:40 -07:00
Derek Collison
d7641b9d38 Move test to norace
Signed-off-by: Derek Collison <derek@nats.io>
2021-04-16 14:00:11 -07:00
Derek Collison
adba4fde5a Add large stress test, skipped by default
Signed-off-by: Derek Collison <derek@nats.io>
2021-04-16 13:58:32 -07:00
Derek Collison
395728bab9 Allow control messages like heartbeats to pass the old sub test.
Signed-off-by: Derek Collison <derek@nats.io>
2021-04-14 14:11:02 -07:00
Jaime Piña
d929ee1348 Check errors when removing test directories and files
Currently in tests, we have calls to os.Remove and os.RemoveAll where we
don't check the returned error. This hides useful error messages when
tests fail to run, such as "too many open files".

This change checks for more filesystem related errors and calls t.Fatal
if there is an error.
2021-04-07 11:09:47 -07:00
Jaime Piña
6941bb3ade Update Go client in tests
2021-03-30 13:17:34 -07:00
Derek Collison
2ed53035ed Reworked flow control for sources and mirrors.
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-24 07:07:33 -07:00
Matthias Hanel
b316cccfd1 Fixed a quorum formation issue that caused truncation
When a new leader is elected it has to give everyone a chance to reply,
so that we can observe rejections with higher term.

The maximum election timeout is 7.5 seconds.
The new behavior of waiting for the election timeout caused unit tests
to fail. Hence upping the timeout there as well.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-03-11 19:44:47 -05:00
Derek Collison
2b2a776411 Disable flaky tests for now
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-11 07:11:05 -05:00
Ivan Kozlovic
e7e756034a Switch Gateway JS accounts to interest-only mode + some other fixes
- Fixed the close of a TLS connection, which starting with Go 1.16
sets the deadline to 5 seconds.

- Fixed an issue with setHeader that was causing these error messages:
```
=== RUN   TestServiceImportReplyMatchCycleMultiHops
nats: message could not decode headers on connection [4] for subscription on "foo"
--- PASS: TestServiceImportReplyMatchCycleMultiHops (0.04s)
```

- Fixed names of tests in norace_test.go since they must start with
TestNoRace in order to make sure that we execute them in Travis:
```
go test -v -run=TestNoRace --failfast -p=1 ./...
```

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-03-03 19:15:28 -07:00
Derek Collison
401484299d Flaps too much with cluster size of 5
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-03 06:34:07 -08:00
Derek Collison
09e3d26fa3 Add in support for stream mirrors and sources.
Add in proper support for stream updates in clustered mode.
Don't send API updates without subjects; this caused GW parser errors.
Stream internal loops use their own clients now.

Signed-off-by: Derek Collison <derek@nats.io>
2021-02-23 10:57:27 -08:00