Commit Graph

3843 Commits

Derek Collison
b6ebe34734 Merge pull request #3121 from nats-io/issue-3114
General improvements to accounting for the filestore.
2022-05-12 16:01:25 -07:00
Derek Collison
bcecae42ac Fix for #3119
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-12 15:45:29 -07:00
Derek Collison
4291433a46 General improvements to accounting for the filestore. This in response to tracking issue #3114.
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-12 15:43:11 -07:00
Ivan Kozlovic
e304589da4 [FIXED] JetStream: Some data races
We were getting a data race when checking the js.clustered field in
updateUsage(), following the fix for lock inversion in PR #3092.
```
=== RUN   TestJetStreamClusterKVMultipleConcurrentCreate
==================
WARNING: DATA RACE
Read at 0x00c0009db5d8 by goroutine 195:
  github.com/nats-io/nats-server/v2/server.(*jsAccount).updateUsage()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream.go:1681 +0x8f
  github.com/nats-io/nats-server/v2/server.(*stream).storeUpdates()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/stream.go:2927 +0x1d9
  github.com/nats-io/nats-server/v2/server.(*stream).storeUpdates-fm()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/stream.go:2905 +0x7d
  github.com/nats-io/nats-server/v2/server.(*fileStore).removeMsg()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:2158 +0x14f7
  github.com/nats-io/nats-server/v2/server.(*fileStore).expireMsgs()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:2777 +0x18f
  github.com/nats-io/nats-server/v2/server.(*fileStore).expireMsgs-fm()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:2770 +0x39
Previous write at 0x00c0009db5d8 by goroutine 128:
  github.com/nats-io/nats-server/v2/server.(*jetStream).setupMetaGroup()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:604 +0xfae
  github.com/nats-io/nats-server/v2/server.(*Server).enableJetStreamClustering()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:514 +0x20a
  github.com/nats-io/nats-server/v2/server.(*Server).enableJetStream()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream.go:400 +0x1168
  github.com/nats-io/nats-server/v2/server.(*Server).EnableJetStream()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream.go:206 +0x651
  github.com/nats-io/nats-server/v2/server.(*Server).Start()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:1746 +0x1804
  github.com/nats-io/nats-server/v2/server.RunServer·dwrap·4269()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/server_test.go:90 +0x39
Goroutine 195 (running) created at:
  time.goFunc()
      /home/travis/.gimme/versions/go1.17.9.linux.amd64/src/time/sleep.go:180 +0x49
Goroutine 128 (finished) created at:
  github.com/nats-io/nats-server/v2/server.RunServer()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/server_test.go:90 +0x278
  github.com/nats-io/nats-server/v2/server.RunServerWithConfig()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/server_test.go:112 +0x44
  github.com/nats-io/nats-server/v2/server.(*cluster).restartServer()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_helpers_test.go:1004 +0x1d5
  github.com/nats-io/nats-server/v2/server.TestJetStreamClusterKVMultipleConcurrentCreate()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster_test.go:8463 +0x64b
  testing.tRunner()
      /home/travis/.gimme/versions/go1.17.9.linux.amd64/src/testing/testing.go:1259 +0x22f
  testing.(*T).Run·dwrap·21()
      /home/travis/.gimme/versions/go1.17.9.linux.amd64/src/testing/testing.go:1306 +0x47
==================
```

Running that test after adding some delay in several places also revealed another race:
```
==================
WARNING: DATA RACE
Read at 0x00c00016adb8 by goroutine 160:
  github.com/nats-io/nats-server/v2/server.(*fileStore).expireMsgs()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/filestore.go:2777 +0x106
  github.com/nats-io/nats-server/v2/server.(*fileStore).expireMsgs-fm()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/filestore.go:2771 +0x39

Previous write at 0x00c00016adb8 by goroutine 32:
  github.com/nats-io/nats-server/v2/server.(*fileStore).UpdateConfig()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/filestore.go:360 +0x1c8
  github.com/nats-io/nats-server/v2/server.(*stream).update()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/stream.go:1360 +0x852
  github.com/nats-io/nats-server/v2/server.(*jetStream).processClusterCreateStream()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:2704 +0x4a4
  github.com/nats-io/nats-server/v2/server.(*jetStream).processStreamAssignment()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:2452 +0xad9
  github.com/nats-io/nats-server/v2/server.(*jetStream).applyMetaEntries()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:1407 +0x7e4
  github.com/nats-io/nats-server/v2/server.(*jetStream).monitorCluster()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:887 +0xc75
  github.com/nats-io/nats-server/v2/server.(*jetStream).monitorCluster-fm()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:813 +0x39

Goroutine 160 (running) created at:
  time.goFunc()
      /usr/local/go/src/time/sleep.go:180 +0x49

Goroutine 32 (running) created at:
  github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/server.go:3013 +0x86
  github.com/nats-io/nats-server/v2/server.(*jetStream).setupMetaGroup()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:612 +0x1092
  github.com/nats-io/nats-server/v2/server.(*Server).enableJetStreamClustering()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:514 +0x20a
  github.com/nats-io/nats-server/v2/server.(*Server).enableJetStream()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream.go:400 +0x1168
  github.com/nats-io/nats-server/v2/server.(*Server).EnableJetStream()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream.go:206 +0x651
  github.com/nats-io/nats-server/v2/server.(*Server).Start()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/server.go:1746 +0x1804
  github.com/nats-io/nats-server/v2/server.RunServer·dwrap·4275()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/server_test.go:90 +0x39
==================
```

Both are now addressed, either with proper locking or with the use of an atomic in the place
where we cannot take the lock (without re-introducing the lock inversion issue).

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-05-11 19:09:24 -06:00
Ivan Kozlovic
5c3be1ee68 [FIXED] JetStream: panic processing cluster consumer create
Before PR #3099, `waitQueue.isEmpty()` returned `wq.len() == 0`,
and `waitQueue.len()` protected against the pointer being
nil (returning 0 in that case).

The change in #3099 caused `waitQueue.isEmpty()` to return `wq.n == 0`,
which means that if `wq` was nil, it would crash.

This PR restores `waitQueue.isEmpty()` to return `wq.len() == 0` and
adds the nil protection for waitQueue in `len()`, similar to
how it was prior to PR #3099.

Resolves #3117

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-05-11 11:03:50 -06:00
Ivan Kozlovic
56d06fd8eb Bump version to 2.8.3-beta.1
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-05-10 17:35:32 -06:00
Ivan Kozlovic
2ce1dc1561 [FIXED] JetStream: possible lockup due to a return prior to unlock
This would happen when a node (the current leader) receives an append
entry with a term higher than its own.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-05-10 17:11:57 -06:00
Ivan Kozlovic
17cc205293 Merge pull request #3112 from nats-io/fix_3108
[FIXED] Accounts Export/Import isolation with overlap subjects
2022-05-10 14:38:47 -06:00
Matthias Hanel
f87c7d8441 altered move unit test to test tiered/non tiered setup (#3113)
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-05-09 19:49:22 -04:00
Ivan Kozlovic
c4adf0ffed [FIXED] Accounts Export/Import isolation with overlap subjects
I tracked down this issue to have been introduced with PR #2369,
but the code also touched PR #1891 and PR #3088.

I added a test as described in issue #3108 but did not need
JetStream to demonstrate the issue. With the proposed fix, all
tests that were added in aforementioned PRs still pass, including
the new test.

Resolves #3108

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-05-09 12:59:12 -06:00
Derek Collison
88ebfdaee8 Merge pull request #3109 from nats-io/issue-3107-3069
[FIXED] Downstream sourced retention policy streams during restart have redelivered messages
2022-05-09 09:13:48 -07:00
Derek Collison
b35988adf9 Remember the last timestamp by not removing last msgBlk when empty and during purge pull last timestamp forward until new messages arrive.
When a downstream stream uses retention modes that delete messages, fallback to timebased start time for the new source consumers.

Signed-off-by: Derek Collison <derek@nats.io>
2022-05-09 09:04:19 -07:00
Derek Collison
6507cba2a9 Fix for race on recovery
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-07 12:42:56 -07:00
Derek Collison
fbc9e16253 Fix for panic due to not loaded cache during compact
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-07 09:25:32 -07:00
Ivan Kozlovic
f20fe2c2d8 Bump version to dev 2.8.3-beta
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-05-04 13:20:53 -06:00
Ivan Kozlovic
10c020ed44 Release v2.8.2
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-05-04 11:52:55 -06:00
Ivan Kozlovic
3cdbba16cb Revert "[added] support for jwt operator option DisallowBearerToken" 2022-05-04 11:11:25 -06:00
Ivan Kozlovic
12dd727310 Merge pull request #3091 from nats-io/DisallowBearerToken
[added] support for jwt operator option DisallowBearerToken
2022-05-04 10:57:22 -06:00
Derek Collison
7246edc77d Bump up default block sizes
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-04 09:46:15 -07:00
Derek Collison
7c9a2d921a Bump to 2.8.2-beta.5
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-03 16:08:58 -07:00
Ivan Kozlovic
5d90c8eac7 [IMPROVED] JetStream: check max-per-subject once
There was a case where we could check the max-per-subject
limit twice per message. That applied to streams that have
max-per-subject and also discard_new, which is what KV configures.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-05-03 16:57:26 -06:00
Derek Collison
3fef1025fe Merge pull request #3100 from nats-io/rc-improvements
Raft and cluster improvements.
2022-05-03 15:53:06 -07:00
Derek Collison
6f54b032d6 Raft and cluster improvements.
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-03 15:20:46 -07:00
Ivan Kozlovic
cadf921ed1 [FIXED] JetStream: PullConsumer MaxWaiting==1 and Canceled requests
There was an issue with MaxWaiting==1 that caused a request
with an expiration to not actually expire. This was because processWaiting
would not pick it up, since wq.rp was equal to wq.wp
(that is, the read pointer equaled the write pointer for a slice
with a capacity of 1).

The other issue was that when reaching the maximum number of waiting pull
requests, a new request would evict an old one with a "408 Request Canceled".

There is no reason for that. Instead, the server now first tries to
find existing expired requests (since some of the expiration
is done lazily); if none is expired and the queue is still full,
the server returns a "409 Exceeded MaxWaiting" to the new
request rather than a "408 Request Canceled" to an old one.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-05-03 15:17:20 -06:00
Ivan Kozlovic
c9df6374b8 [FIXED] JetStream: possible panic checking for group leader less
Got this stack:
```
goroutine 247945 [running]:
github.com/nats-io/nats-server/v2/server.(*jetStream).isGroupLeaderless(0xc004794e70, 0xc0031b0300)
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:661 +0xc2
github.com/nats-io/nats-server/v2/server.(*Server).jsMsgDeleteRequest(0xc001dc9388, 0xc003e6de30, 0xc00222b980, 0xc001454f70, {0xc000668930, 0x24}, {0xc0011dbdb8, 0x11}, {0xc000da93f0, 0xa6, ...})
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/jetstream_api.go:2335 +0x67d
github.com/nats-io/nats-server/v2/server.(*jetStream).apiDispatch.func1()
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/jetstream_api.go:716 +0x85
created by github.com/nats-io/nats-server/v2/server.(*jetStream).apiDispatch
	/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/jetstream_api.go:715 +0x5c5
```

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-05-02 13:43:40 -06:00
Ivan Kozlovic
94b9c9b406 Merge pull request #3092 from nats-io/js_lock_inversion
[FIXED] JetStream: possible lock inversion
2022-05-02 11:26:17 -06:00
Ivan Kozlovic
5050092468 [FIXED] JetStream: possible lock inversion
When updating usage, there was a lock inversion: the jetStream
lock was acquired while under the stream's (mset) lock, which is
not correct. Also, updateUsage was locking the jsAccount lock, which
again is not really correct, since jsAccount contains streams, so
the order should be jsAccount->stream, not the other way around.

Removed the locking of jetStream when checking for clustered state,
since js.clustered is immutable.

Replaced using the jsAccount lock to update usage with a dedicated lock.

I originally moved all the usage/limit fields of jsAccount into a new
structure so that I would see all code that updates or reads those
fields, and could make sure the new lock was used in all the functions
that touch them. Once that work was done, and to reduce code changes,
I put the fields back into jsAccount (although grouped under the new
usageMu mutex field).

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-05-02 09:50:32 -06:00
Matthias Hanel
c9217bad33 review comments
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-29 20:00:37 -04:00
Matthias Hanel
bd2883122e [added] support for jwt operator option DisallowBearerToken
I modified an existing data structure that already held a similar attribute.
Instead of duplicating it, this data structure now references the claim.

change 3 out of 3. Fixes #3084
corresponds to:
https://github.com/nats-io/jwt/pull/177
https://github.com/nats-io/nsc/pull/495

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-29 14:18:11 -04:00
Derek Collison
0bb7abccba Bump to 2.8.2-beta.4
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-29 09:21:45 -07:00
Derek Collison
806877ebaa Merge pull request #3090 from nats-io/qsub-deny
Combined canSubscribe and canQueueSubscribe
2022-04-29 09:20:57 -07:00
Derek Collison
c20b52251b Combined canSubscribe and canQueueSubscribe for consistency in specialized deny clause handling.
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-29 09:18:45 -07:00
Derek Collison
e0f5fcffb8 Fix for subject transforms and JetStream delivery subjects.
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-28 15:50:28 -07:00
Derek Collison
0d928c0338 Merge pull request #3085 from nats-io/small-fss-improvement
Small improvement with fss processing
2022-04-28 13:56:53 -07:00
Ivan Kozlovic
d4d37e67f4 [FIXED] JetStream: file store compact and when to write index
When deciding to compact a file, we need to subtract the empty
records from the raw bytes; otherwise, for small messages, we would
end up calling compact() too many times.

When removing a message from the stream, in FIFO cases we would
write the index at most every 2 seconds when doing it in place,
but when dealing with out-of-order deletes we would do it for
every single delete, which can be costly. We now write
only every 500ms for non-FIFO cases.

Also fixed some unrelated code:
- The decision to install a snapshot was based on an incorrect logical
expression.
- In checkPending(), protect against the timer being nil, which can
happen when the consumer is stopped or on a leadership change.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-28 12:35:19 -06:00
Derek Collison
9a96bef4c7 Small improvement with fss processing
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-28 10:23:30 -07:00
Derek Collison
2b1c3374a5 Bump to 2.8.2-beta.2
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-27 08:35:56 -07:00
Derek Collison
5368b660dc Merge pull request #3081 from nats-io/kv-mem-perf
[IMPROVED] KV memory store performance
2022-04-27 08:35:04 -07:00
Matthias Hanel
d520a27c36 [fixed] step down timing, consumer stream seqno, clear redelivery (#3079)
Step down timing for consumers or streams:
signal loss of leadership and sleep before stepping down.
This makes it less likely that messages are being processed during step
down.

When becoming leader, the consumer stream seqno was reset,
even though the consumer already existed.

Proper cleanup of redelivery data structures and timers.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-27 03:32:08 -04:00
Derek Collison
138034b3a1 For memory store KV with a history of 1, we were scanning for our next first sequence when we did not have to.
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-26 19:55:47 -07:00
Derek Collison
f702e279ab Fix for a consumer recovery issue.
Also update healthz to check all assets that are assigned, not just running.

Signed-off-by: Derek Collison <derek@nats.io>
2022-04-26 19:22:19 -07:00
Ivan Kozlovic
06ff4b2b29 Split JS cluster and super clusters tests and compile only on 1.16
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-26 16:24:05 -06:00
Ivan Kozlovic
0e2ab5eeea Changes to tests that run on Travis
- Remove code coverage from Travis and add it as a GitHub Action
that will run nightly.
- Use build tags to exclude some tests, such as the "norace" or
JS tests. Since "go test" does not support "negative" regexes, there
is no other way.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-26 14:11:31 -06:00
Leander Kohler
966d9d56f4 Add JSConsumerDeliveryNakAdvisory
The advisory `JSAdvisoryConsumerMsgNakPre` will be triggered
when a message is NAK'd.
2022-04-25 16:13:32 +02:00
Ivan Kozlovic
646b3850bf Bump version to 2.8.2-beta
Following recommendations to push to a new version as soon as
we release.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-21 15:50:08 -06:00
Ivan Kozlovic
dcfd52413e Release v2.8.1
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-21 14:28:12 -06:00
Ivan Kozlovic
99f1bdf1d8 Bump version to 2.8.1-beta.1
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-21 11:28:24 -06:00
Ivan Kozlovic
363849bf3a [FIXED] JetStream: Mirrors would fail to be recovered
This is a continuation of PR #3060, but extends it to clustering.

Verified with a manual test: a mirror created with v2.7.4 has
the duplicates window set, and on restart with main the server would
still complain about the use of dedup in cluster mode. The mirror
stream was recovered but showed as R1.
With this fix, a restart of the cluster - with existing data -
will properly recover the stream as an R3, and messages that
were published while in a bad state are synchronized.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-21 10:59:23 -06:00
Ivan Kozlovic
b9463b322f [FIXED] JetStream: stream mirror issues in mixed mode clusters
Similar to PR #3061

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-20 23:21:15 -06:00
Ivan Kozlovic
df61a335c7 Merge pull request #3061 from nats-io/js_fix_stream_source
[FIXED] JetStream: stream sources issue in mixed mode clusters
2022-04-20 23:20:41 -06:00