Commit Graph

132 Commits

Author SHA1 Message Date
Derek Collison
d04763eb7d CAS operations improved, hold lock past store. Use separate lock for consumer list and storage updates.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-24 18:30:44 -07:00
Ivan Kozlovic
02ecda535c Stop the raft node to not cause test to flap.
Test TestNoRaceJetStreamClusterCorruptWAL() would start to flap
because of the snapshot on cluster shutdown. Disable the snapshot
on exit for this test by stopping the raft node before shutdown.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-16 18:44:32 -06:00
Ivan Kozlovic
7de4497815 Install consumer snapshot on clean exit and a few other fixes
- didRemove in applyMetaEntries() could be reset when processing
multiple entries
- change "no race" test names to include JetStream
- separate raft nodes leader stepdown and stop in server
shutdown process
- in InstallSnapshot, call wal.Compact() with lastIndex+1

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-16 17:05:49 -06:00
Derek Collison
8c04adc009 Improvements to filestore for large KVs.
Use better indexing for lookups, we used to do simple linear scan backwards, now track first and last block.
Will expire the fss cache at will to reduce memory usage.

Signed-off-by: Derek Collison <derek@nats.io>
2022-08-09 15:51:13 -05:00
Derek Collison
06112d6885 Reset activity interval on catchup to default vs ramp up. Tweak test.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
Derek Collison
758b733d43 Attempt to improve long RTT catchup time during stream moves.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
Ivan Kozlovic
3c9a7cc6e5 Move to Go 1.19, remove io/ioutil, fix data race and a flapper
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-05 09:55:37 -06:00
Ivan Kozlovic
fe370955c8 Merge pull request #3288 from nats-io/debug_test_failure
[FIXED] JetStream: Some scaling up issues
2022-07-26 08:57:17 -06:00
Ivan Kozlovic
1a6c5f1c90 [FIXED] JetStream: Some scaling up issues
- Send snapshot only if leader
- When processing snapshot, start with a smaller inactivity interval
  that will double up to 10sec or use 10sec directly once we get a
  message. Reason for that is that it is possible that the request
  for snapshot is sent while the leader has not yet set up the subscription
  that receives the requests (or subscription has not fully reached the
  cluster).
- Don't remember snapfile on err.
- Do not consider current if we have not had any activity.
- Stabilize stream scale up under active heavy publishing.
- Due to the publish pressure, move the check for followers' direct subs spinning up until after we stop publishing.

Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-07-25 18:44:18 -06:00
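The inactivity-interval ramp described in 1a6c5f1c90 can be sketched as follows. This is a minimal illustration, not the server's code: `startInterval`, `maxInterval`, and `nextInterval` are hypothetical names, and the 1 second starting value is an assumption (the commit only says "smaller"). Only the 10 second ceiling comes from the commit message.

```go
package main

import (
	"fmt"
	"time"
)

const (
	startInterval = 1 * time.Second  // assumed starting value, not from the commit
	maxInterval   = 10 * time.Second // ceiling named in the commit message
)

// nextInterval returns the inactivity interval to use for the next wait.
// Once a message has been seen, it jumps straight to maxInterval;
// otherwise it doubles the current interval, capped at maxInterval.
func nextInterval(cur time.Duration, gotMessage bool) time.Duration {
	if gotMessage {
		return maxInterval
	}
	if next := cur * 2; next < maxInterval {
		return next
	}
	return maxInterval
}

func main() {
	iv := startInterval
	for i := 0; i < 5; i++ {
		fmt.Println(iv)
		iv = nextInterval(iv, false)
	}
}
```

Starting small lets a follower retry its snapshot request quickly if the leader's subscription was not yet in place, while the cap keeps a genuinely idle catchup from timing out too eagerly.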
Ivan Kozlovic
ebeca00e20 [FIXED] JetStream/Cluster: Stream names/infos would return bad response
If there are more stream names than the current limit of 1024, getting
the list of names would return them all instead of using pagination.

For "stream infos", the Total amount returned would be the API limit
instead of the actual number of streams.

Resolves https://github.com/nats-io/natscli/issues/541

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-07-25 14:41:05 -06:00
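The pagination behavior that ebeca00e20 restores can be illustrated with a small sketch. The `page` type and `pageNames` function are hypothetical and do not mirror the server's API types. The point is that `Total` reports the actual number of streams while each response carries at most the API limit of 1024 names.

```go
package main

import "fmt"

const apiListLimit = 1024 // limit named in the commit message

// page is a hypothetical response shape for a paged name listing.
type page struct {
	Total  int      // actual number of streams, not the API limit
	Offset int      // index of the first name in this page
	Limit  int      // maximum names per page
	Names  []string // at most Limit names
}

// pageNames returns the slice of names starting at offset, capped at apiListLimit.
func pageNames(all []string, offset int) page {
	if offset < 0 {
		offset = 0
	}
	if offset > len(all) {
		offset = len(all)
	}
	end := offset + apiListLimit
	if end > len(all) {
		end = len(all)
	}
	return page{Total: len(all), Offset: offset, Limit: apiListLimit, Names: all[offset:end]}
}

// makeNames builds n synthetic stream names for the demo.
func makeNames(n int) []string {
	names := make([]string, n)
	for i := range names {
		names[i] = fmt.Sprintf("STREAM-%d", i)
	}
	return names
}

func main() {
	all := makeNames(2500)
	for off := 0; off < len(all); off += apiListLimit {
		p := pageNames(all, off)
		fmt.Printf("offset=%d names=%d total=%d\n", p.Offset, len(p.Names), p.Total)
	}
}
```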
Derek Collison
69f522cb9f Make sure to clean up client connection
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-06 19:29:32 -07:00
Derek Collison
f8939b40bc Do not unsubscribe from direct access on leader stepdown, only stopping.
Also wait for stream to have replicas and leader for test.

Signed-off-by: Derek Collison <derek@nats.io>
2022-07-06 16:20:12 -07:00
Derek Collison
81a9906ad9 Wait a bit longer for the direct sub
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-03 12:54:15 -07:00
Derek Collison
47bef915ed Allow all members of a replicated stream to participate in direct access.
We will wait until a non-leader replica is current to subscribe.

Signed-off-by: Derek Collison <derek@nats.io>
2022-07-03 11:08:24 -07:00
Ivan Kozlovic
4bf81420e2 [FIXED] Fast routed JetStream API requests were dropped
If a JS API request was received from a non-client connection, it
was processed in its own goroutine. To reduce the number of
such goroutines, we were limiting the number of outstanding routines
to 4096. However, in some situations, it was possible to issue
many requests at the same time that would then cause those requests
to be dropped.

(an example was an MQTT benchmark tool that would create 5000
sessions, each with one QoS1 R1 consumer (with the use of consumer_replicas=1).
On abrupt exit of the tool, the consumers and their sessions needed
to be deleted. This would cause fast incoming delete consumer requests
which would cause the original code to drop some of them)

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-05-23 11:15:55 -06:00
Derek Collison
790d643431 Consumer's num pending can now rely on the stream's store vs trying to maintain it during runtime, which could be wrong under certain conditions.
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-20 08:45:43 -07:00
Derek Collison
ef3eea4d73 Speed up raft for tests
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-18 16:28:58 -07:00
Ivan Kozlovic
cadf921ed1 [FIXED] JetStream: PullConsumer MaxWaiting==1 and Canceled requests
There was an issue with MaxWaiting==1 that was causing a request
with expiration to actually not expire. This was because processWaiting
would not pick it up because wq.rp was actually equal to wq.wp
(that is, the read pointer was equal to write pointer for a slice
of capacity of 1).

The other issue was that when reaching the maximum of waiting pull
requests, a new request would evict an old one with a "408 Request Canceled".

There is no reason for that, instead the server will first try to
find some existing expired requests (since some of the expiration
is lazily done), but if none is expired, and the queue is full,
the server will return a "409 Exceeded MaxWaiting" to the new
request, and not a "408 Request Canceled" to an old one...

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-05-03 15:17:20 -06:00
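The two behaviors described in cadf921ed1 can be sketched with a small ring buffer. This is an illustrative sketch only: `waitQueue`, `request`, `add`, `expire`, and `at` are hypothetical names, and the explicit element count `n` is one possible way to tell a full queue from an empty one when the read and write pointers coincide (always the case for capacity 1). The commit message does not say how the server resolves it.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errMaxWaiting = errors.New("409 Exceeded MaxWaiting")

// request is a hypothetical pending pull request.
type request struct {
	reply   string
	expires time.Time // zero value means no expiration
}

// waitQueue is a fixed-capacity ring buffer of pending requests.
// n counts stored requests so that rp == wp is not ambiguous.
type waitQueue struct {
	reqs      []*request
	rp, wp, n int
}

func newWaitQueue(max int) *waitQueue { return &waitQueue{reqs: make([]*request, max)} }

// at is a small helper for building deterministic timestamps.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

// expire drops requests whose expiration has passed, compacting in place.
func (wq *waitQueue) expire(now time.Time) {
	size, kept := len(wq.reqs), 0
	for i := 0; i < wq.n; i++ {
		r := wq.reqs[(wq.rp+i)%size]
		if r.expires.IsZero() || now.Before(r.expires) {
			wq.reqs[(wq.rp+kept)%size] = r
			kept++
		}
	}
	for i := kept; i < wq.n; i++ {
		wq.reqs[(wq.rp+i)%size] = nil // clear slots that are no longer used
	}
	wq.n = kept
	wq.wp = (wq.rp + kept) % size
}

// add queues r. When full, it first reaps expired requests; if the queue is
// still full, the new request is rejected and existing ones are left alone.
func (wq *waitQueue) add(r *request, now time.Time) error {
	if wq.n == len(wq.reqs) {
		wq.expire(now)
		if wq.n == len(wq.reqs) {
			return errMaxWaiting
		}
	}
	wq.reqs[wq.wp] = r
	wq.wp = (wq.wp + 1) % len(wq.reqs)
	wq.n++
	return nil
}

func main() {
	wq := newWaitQueue(1)
	fmt.Println(wq.add(&request{reply: "r1", expires: at(10)}, at(0)))
	fmt.Println(wq.add(&request{reply: "r2", expires: at(20)}, at(5)))
	fmt.Println(wq.add(&request{reply: "r2", expires: at(20)}, at(11)))
}
```

Rejecting the newcomer rather than evicting an older request keeps already-queued pulls from being canceled just because another client arrived.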
Ivan Kozlovic
d4d37e67f4 [FIXED] JetStream: file store compact and when to write index
When deciding to compact a file, we need to remove from the raw
bytes the empty records, otherwise, for small messages, we would
end-up calling compact() too many times.

When removing a message from the stream, in FIFO cases we would
write the index every 2 seconds at most when doing it in place,
but when dealing with out of order deletes, we would do it for
every single delete, which can be costly. We are now writing
only every 500ms for non FIFO cases.

Also fixed some unrelated code:
- Decision to install a snapshot was based on incorrect logical
expression
- In checkPending(), protect against the timer being nil which
could happen when the consumer is stopped or on a leadership change.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-28 12:35:19 -06:00
Ivan Kozlovic
0e2ab5eeea Changes to tests that run on Travis
- Remove code coverage from Travis and add it to a GitHub Action
that will be run as a nightly.
- Use tag builds to exclude some tests, such as the "norace" or
JS tests. Since "go test" does not support "negative" regexs, there
is no other way.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-26 14:11:31 -06:00
Ivan Kozlovic
b9463b322f [FIXED] JetStream: stream mirror issues in mixed mode clusters
Similar to PR #3061

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-20 23:21:15 -06:00
Ivan Kozlovic
df61a335c7 Merge pull request #3061 from nats-io/js_fix_stream_source
[FIXED] JetStream: stream sources issue in mixed mode clusters
2022-04-20 23:20:41 -06:00
Ivan Kozlovic
9975a38c6e [FIXED] JetStream: stream sources issue in mixed mode clusters
The main issue was that in mixed-mode, the interest through gateway
may still be in optimistic mode, which when creating the source
consumer would start delivery before we had a chance to set up
the subscription to receive those messages.

The approach is to create the subscription prior to sending
the consumer create request. Also refactored the code a bit in
the hope of making the retries more bulletproof.

We may also look at making sure that gateways are switched to
interest-mode when detecting a mixed-mode setup.

Also fixed a defect that could cause a source to be canceled
when updating a stream.

Resolves #2801

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-20 21:02:35 -06:00
Matthias Hanel
ff5d60973d introducing max_age/dupe_window minimum value of 100ms. (#3056)
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-20 13:58:19 -04:00
Ivan Kozlovic
a78ccdcb2f [FIXED] JetStream: some stream SOURCE issues
- Possibly missing some early messages from the sourced stream
- In some cancel situations, the processing of sourced messages
would no longer work

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-18 12:42:16 -06:00
Ivan Kozlovic
bd61d51a1c [IMPROVED] JetStream: reduce unnecessary leader election
- Wait for some sort of routing to be in place before starting
the raft run loop
- Remove use of lock in apiDispatch that was not necessary but
could have caused a route to block, causing memory growth, etc.

Unrelated rename of some tests so that they start with TestJetStream
and TestJetStreamCluster for cluster tests, fixed some flappers
and ensure that tests that change RAFT timeouts put them back
to default values on exit.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-14 10:47:14 -06:00
Derek Collison
3c0bced76e Move test to no race, rename others
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-12 16:23:36 -07:00
Ivan Kozlovic
50c3986863 [FIXED] JetStream stream catchup issues
- A stream could become leader when it should not, causing
messages to be lost.
- A catchup could stall because the server sending data
could bail out of the runCatchup routine but still send
the EOF signal.
- Deadlock with monitoring of Jsz

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-12 16:05:12 -06:00
Derek Collison
5dfcc5e934 Fix for flapping WAL test
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-11 22:50:25 -07:00
Derek Collison
e330572cef Select next leader before truncating
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-11 17:04:29 -07:00
Derek Collison
c3612b57c7 Fixes for some flapping tests
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-10 13:02:03 -07:00
Ivan Kozlovic
c78f7f343c Add test that demonstrated the consumer filter perf degradation
This is a follow up to PR #3008.

This test fails on v2.7.4 but passes on main.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-06 09:27:56 -06:00
Derek Collison
eb16c35016 OrderedConsumer was very conservative with slow start and small max outstanding bytes. This is increasing perf for longer rtt.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-30 05:08:36 -07:00
Derek Collison
607858f213 Improved consumer snapshot logic in clustered mode and disk usage.
Also fixed a bug that could cause memory based replicated consumers to no longer work after snapshots and server restarts.

The snapshot logic would allow non-state changing updates to continuously grow the raft logs. We also were too conservative on when we snapshotted and why.
Also added in ability to have FileStore.Compact() reclaim space from the block file from the head of last changed block.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-29 18:02:49 -07:00
Derek Collison
780d4c0dd8 Merge pull request #2960 from nats-io/mem_pool
Additional improvements to memory pooling and management.
2022-03-28 17:10:16 -07:00
Derek Collison
5e5aab378e Additional improvements to memory pooling and management. Also logic fix for firstMatching that did unnecessary work when matching all.
During contention to the head write blk, the system could perform worse memory wise compared to simple go runtime.
Also had some references for the subject of messages bloating memory.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-28 10:15:23 -07:00
Ivan Kozlovic
6ad93d9b34 Fix some flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-25 18:24:17 -06:00
Derek Collison
ef8f543ea5 Improve memory usage through JetStream storage layer.
Previously we would rely more heavily on Go's garbage collector since, when we loaded a block for an underlying stream, we would pass references upward to avoid copies.
Now we always copy when passing back to the upper layers which allows us to not only expire our cache blocks but pool and reuse them.

The upper layers also had changes made to allow the pooling layer at that level to interoperate with the storage layer optionally.

Also fixed some flappers and a bug where de-dupe might not be reformed correctly.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 17:45:15 -06:00
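The copy-then-pool idea from ef8f543ea5 can be sketched as below. This is a rough illustration under assumptions: `blkPool` and `loadMsg` are hypothetical names, the 4096-byte initial capacity is arbitrary, and the `read` callback stands in for loading a block from disk. Because the caller receives its own copy, the block buffer can go back to the pool and be overwritten without affecting messages already handed out.

```go
package main

import (
	"fmt"
	"sync"
)

// blkPool is a hypothetical pool of reusable block buffers.
var blkPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 0, 4096)
		return &b
	},
}

// loadMsg fills a pooled buffer via read, copies out the n bytes at offset
// off, and returns the buffer to the pool before handing the copy upward.
func loadMsg(read func(dst []byte) []byte, off, n int) []byte {
	bp := blkPool.Get().(*[]byte)
	blk := read((*bp)[:0])

	msg := make([]byte, n)
	copy(msg, blk[off:off+n]) // the caller owns msg; blk is free for reuse

	*bp = blk[:0]
	blkPool.Put(bp)
	return msg
}

func main() {
	first := loadMsg(func(dst []byte) []byte { return append(dst, "hello world"...) }, 6, 5)
	// A later load may reuse and overwrite the same pooled buffer.
	_ = loadMsg(func(dst []byte) []byte { return append(dst, "XXXXXXXXXXX"...) }, 0, 5)
	fmt.Printf("%s\n", first) // prints "world"
}
```

The trade-off is one extra copy per message in exchange for buffers whose lifetime the storage layer fully controls.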
Ivan Kozlovic
29ff67e2ac Tests: Replace all Ack() with AckSync() for now
For reason explained in previous commit, for tests that were
expecting the number of ack/pending to be of a certain value after
an Ack(), they would be flapping. Replaced all references and
we can go back to selectively call Ack() when AckSync() is not
needed.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-17 20:25:01 -06:00
Ivan Kozlovic
b4128693ed Ensure file path is correct during stream restore
Also had to change all references from `path.` to `filepath.` when
dealing with files, so that it works properly on Windows.

Fixed also lots of tests to defer the shutdown of the server
after the removal of the storage, and fixed some config files
directories to use the single quote `'` to surround the file path,
again to work on Windows.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-09 13:31:51 -07:00
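The `path.` versus `filepath.` distinction in b4128693ed can be shown with a short sketch. `storePath` and the example directory names are hypothetical. `path.Join` always joins with forward slashes, while `filepath.Join` uses the operating system's separator, which is what file access on Windows needs.

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
)

// sep is the OS-specific path separator as a string.
const sep = string(filepath.Separator)

// storePath builds an on-disk location using the OS-specific separator.
func storePath(elems ...string) string {
	return filepath.Join(elems...)
}

// slashPath builds a slash-separated path regardless of OS. That suits
// URLs and subjects, but not files on Windows.
func slashPath(elems ...string) string {
	return path.Join(elems...)
}

func main() {
	fmt.Println(storePath("data", "ORDERS", "1.blk"))
	fmt.Println(slashPath("data", "ORDERS", "1.blk"))
}
```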
Derek Collison
330a40009c Cleanup key files when removing message blocks.
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-17 11:33:41 -08:00
Derek Collison
68104d7cf3 During a filestore snapshot we generate the fss files but were not cleaning them up if the block was deleted before a server restart.
https://gist.github.com/nekufa/010185dfb59261f222a0042d3a7d2a1c

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-09 17:12:08 -08:00
Derek Collison
d50febeeff Improved sparse consumers replay time.
When a stream has multiple subjects and a consumer filters the stream to a small and spread out list of messages, the logic would do a linear scan looking for the next message for the filtered consumer.
This CL allows the store layer to utilize the per subject info to improve the times.

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-07 17:26:32 -08:00
Derek Collison
6a3cf0f71e Added in ability to get number of subjects from StreamInfo, and optionally details per subject on how many messages each subject has.
This can also be filtered, meaning you can filter out the subjects when asking for details.

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-02 08:51:13 -08:00
Derek Collison
6b5332249b This test was using fetch and failing if the complete batch was not filled.
This has nothing to do with the test, we just want to make sure the leader steps down and there were no low level errors on the fetch.

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-01 13:34:00 -08:00
Derek Collison
8815072e34 Fix flapping test
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-30 14:54:24 -08:00
Derek Collison
6486cd8fc8 Added in /healthz endpoint for health and liveness probes in environments like k8s.
Currently this code returns a 200 and { "status": "ok" } iff all configured ports are open
and if JetStream is configured and we have contact with the metaleader and the cluster and all streams are up to date.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-24 19:30:10 -08:00
Derek Collison
c5fbb63614 JetStream ephemeral consumers could create a situation where the server would exhaust the OS thread limit - default 10k.
Under certain situations a large number of consumers that are racing to update state or delete their stores during a delete
would start taking up OS threads due to blocking disk IO. When this happened and there were a bunch of goroutines becoming
runnable the Go runtime would create extra OS threads to fill in the runnable pool and would exhaust the max thread setting.

This code places a channel as a simple semaphore to limit the number of disk IO blocking OS threads.

Signed-off-by: Derek Collison <derek@nats.io>
2021-12-29 07:05:34 -08:00
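The channel-as-semaphore technique mentioned in c5fbb63614 can be sketched as follows. The names `ioSem` and `runLimited` and the limit of 4 are hypothetical, and `work` stands in for a blocking disk operation. A buffered channel of capacity N admits at most N holders at once, so only N goroutines can be inside the blocking section, and therefore pinning OS threads, at any time.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// ioSem is a hypothetical counting semaphore backed by a buffered channel.
type ioSem chan struct{}

func (s ioSem) acquire() { s <- struct{}{} } // blocks once cap(s) holders exist
func (s ioSem) release() { <-s }

// runLimited starts one goroutine per task but lets at most limit of them
// run work (standing in for blocking disk IO) at once. It returns the peak
// number of goroutines observed inside the guarded section.
func runLimited(limit, tasks int, work func()) int64 {
	sem := make(ioSem, limit)
	var wg sync.WaitGroup
	var cur, peak int64
	for i := 0; i < tasks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem.acquire()
			defer sem.release()

			// Track how many goroutines are inside the section right now.
			n := atomic.AddInt64(&cur, 1)
			for {
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			work()
			atomic.AddInt64(&cur, -1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	peak := runLimited(4, 100, func() { time.Sleep(time.Millisecond) })
	fmt.Println("peak concurrent blocking sections:", peak)
}
```

Goroutines waiting on the channel are parked by the Go scheduler rather than blocked in a system call, so they do not consume additional OS threads while they wait.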
Derek Collison
af4d7dbe52 Memory store tracked interior deletes for stream state, but under KV semantics this could be very large.
Actually faster to not track at all and generate on the fly. Saves lots of memory too.

When we update the stream state to include runs, etc will update this as well.

Signed-off-by: Derek Collison <derek@nats.io>
2021-12-20 17:37:16 -08:00
Derek Collison
98757253f9 Recreate client in case shutdown server was the one we were connected to
Signed-off-by: Derek Collison <derek@nats.io>
2021-11-18 14:50:22 -08:00