Use better indexing for lookups: we used to do a simple linear scan backwards, now we track the first and last block.
We also expire the fss cache as needed to reduce memory usage.
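A minimal sketch of the idea, with hypothetical types and names (not the actual filestore code): keep the first and last block per subject so a lookup only scans that bounded range instead of walking all blocks backwards.
```go
package main

import "fmt"

// psi is a hypothetical per-subject entry tracking the bounding blocks.
type psi struct {
	total       uint64 // messages for this subject
	first, last int    // first and last block index containing it
}

type store struct {
	psim map[string]*psi // per-subject index
}

// track records that a message for subj landed in block i.
func (s *store) track(subj string, i int) {
	e := s.psim[subj]
	if e == nil {
		e = &psi{first: i, last: i}
		s.psim[subj] = e
	}
	e.total++
	if i < e.first {
		e.first = i
	}
	if i > e.last {
		e.last = i
	}
}

// blockRange bounds the search for subj instead of a full backwards scan.
func (s *store) blockRange(subj string) (first, last int, ok bool) {
	if e := s.psim[subj]; e != nil {
		return e.first, e.last, true
	}
	return 0, 0, false
}

func main() {
	s := &store{psim: map[string]*psi{}}
	s.track("orders.new", 2)
	s.track("orders.new", 7)
	fmt.Println(s.blockRange("orders.new")) // 2 7 true
}
```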
Signed-off-by: Derek Collison <derek@nats.io>
A message block checks the filestore's cfg.Subjects to see
if it can "intern" the subject or not. The problem is that this
is done under the message block's lock, but not the filestore's.
However, during a stream configuration update, the filestore's
cfg field is switched to a new one, causing the data race.
By making sure we do the switch while holding all message block
locks, we remove the data race (which could be reproduced by
running the test TestJetStreamClusterMoveCancel with -count=10).
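A sketch of the fix with simplified, hypothetical types (not the real server structs): the config swap happens only while every message block's lock is held, so a block reading cfg under its own lock can never observe the switch mid-flight.
```go
package main

import "sync"

// Simplified stand-ins for illustration only.
type StreamConfig struct{ Subjects []string }

type msgBlock struct{ mu sync.Mutex }

type fileStore struct {
	mu   sync.Mutex
	cfg  StreamConfig
	blks []*msgBlock
}

// updateConfig swaps fs.cfg while holding all message block locks.
func (fs *fileStore) updateConfig(newCfg StreamConfig) {
	fs.mu.Lock()
	defer fs.mu.Unlock()
	// Acquire every block lock so no block reads cfg.Subjects mid-swap.
	for _, mb := range fs.blks {
		mb.mu.Lock()
	}
	fs.cfg = newCfg
	for _, mb := range fs.blks {
		mb.mu.Unlock()
	}
}

func main() {
	fs := &fileStore{blks: []*msgBlock{{}, {}}}
	fs.updateConfig(StreamConfig{Subjects: []string{"foo.*"}})
}
```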
We investigated the use of a string interning library, but it
showed a slight performance degradation that this approach does
not suffer from.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This could happen when a consumer had not sent anything to the
attached NATS subscription and there was a consumer leader
stepdown or a server restart.
Signed-off-by: Derek Collison <derek@nats.io>
We were getting a data race checking the js.clustered field in
updateUsage() following the fix for a lock inversion in PR #3092.
```
=== RUN TestJetStreamClusterKVMultipleConcurrentCreate
==================
WARNING: DATA RACE
Read at 0x00c0009db5d8 by goroutine 195:
github.com/nats-io/nats-server/v2/server.(*jsAccount).updateUsage()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream.go:1681 +0x8f
github.com/nats-io/nats-server/v2/server.(*stream).storeUpdates()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/stream.go:2927 +0x1d9
github.com/nats-io/nats-server/v2/server.(*stream).storeUpdates-fm()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/stream.go:2905 +0x7d
github.com/nats-io/nats-server/v2/server.(*fileStore).removeMsg()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:2158 +0x14f7
github.com/nats-io/nats-server/v2/server.(*fileStore).expireMsgs()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:2777 +0x18f
github.com/nats-io/nats-server/v2/server.(*fileStore).expireMsgs-fm()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:2770 +0x39
Previous write at 0x00c0009db5d8 by goroutine 128:
github.com/nats-io/nats-server/v2/server.(*jetStream).setupMetaGroup()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:604 +0xfae
github.com/nats-io/nats-server/v2/server.(*Server).enableJetStreamClustering()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:514 +0x20a
github.com/nats-io/nats-server/v2/server.(*Server).enableJetStream()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream.go:400 +0x1168
github.com/nats-io/nats-server/v2/server.(*Server).EnableJetStream()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream.go:206 +0x651
github.com/nats-io/nats-server/v2/server.(*Server).Start()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:1746 +0x1804
github.com/nats-io/nats-server/v2/server.RunServer·dwrap·4269()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/server_test.go:90 +0x39
Goroutine 195 (running) created at:
time.goFunc()
/home/travis/.gimme/versions/go1.17.9.linux.amd64/src/time/sleep.go:180 +0x49
Goroutine 128 (finished) created at:
github.com/nats-io/nats-server/v2/server.RunServer()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/server_test.go:90 +0x278
github.com/nats-io/nats-server/v2/server.RunServerWithConfig()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/server_test.go:112 +0x44
github.com/nats-io/nats-server/v2/server.(*cluster).restartServer()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_helpers_test.go:1004 +0x1d5
github.com/nats-io/nats-server/v2/server.TestJetStreamClusterKVMultipleConcurrentCreate()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster_test.go:8463 +0x64b
testing.tRunner()
/home/travis/.gimme/versions/go1.17.9.linux.amd64/src/testing/testing.go:1259 +0x22f
testing.(*T).Run·dwrap·21()
/home/travis/.gimme/versions/go1.17.9.linux.amd64/src/testing/testing.go:1306 +0x47
==================
```
Running that test after adding some delays in several places also showed another race:
```
==================
WARNING: DATA RACE
Read at 0x00c00016adb8 by goroutine 160:
github.com/nats-io/nats-server/v2/server.(*fileStore).expireMsgs()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/filestore.go:2777 +0x106
github.com/nats-io/nats-server/v2/server.(*fileStore).expireMsgs-fm()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/filestore.go:2771 +0x39
Previous write at 0x00c00016adb8 by goroutine 32:
github.com/nats-io/nats-server/v2/server.(*fileStore).UpdateConfig()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/filestore.go:360 +0x1c8
github.com/nats-io/nats-server/v2/server.(*stream).update()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/stream.go:1360 +0x852
github.com/nats-io/nats-server/v2/server.(*jetStream).processClusterCreateStream()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:2704 +0x4a4
github.com/nats-io/nats-server/v2/server.(*jetStream).processStreamAssignment()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:2452 +0xad9
github.com/nats-io/nats-server/v2/server.(*jetStream).applyMetaEntries()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:1407 +0x7e4
github.com/nats-io/nats-server/v2/server.(*jetStream).monitorCluster()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:887 +0xc75
github.com/nats-io/nats-server/v2/server.(*jetStream).monitorCluster-fm()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:813 +0x39
Goroutine 160 (running) created at:
time.goFunc()
/usr/local/go/src/time/sleep.go:180 +0x49
Goroutine 32 (running) created at:
github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/server.go:3013 +0x86
github.com/nats-io/nats-server/v2/server.(*jetStream).setupMetaGroup()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:612 +0x1092
github.com/nats-io/nats-server/v2/server.(*Server).enableJetStreamClustering()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:514 +0x20a
github.com/nats-io/nats-server/v2/server.(*Server).enableJetStream()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream.go:400 +0x1168
github.com/nats-io/nats-server/v2/server.(*Server).EnableJetStream()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream.go:206 +0x651
github.com/nats-io/nats-server/v2/server.(*Server).Start()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/server.go:1746 +0x1804
github.com/nats-io/nats-server/v2/server.RunServer·dwrap·4275()
/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/server_test.go:90 +0x39
==================
```
Both are now addressed, either with proper locking or with the use of an atomic in the place
where we cannot get the lock (without re-introducing the lock inversion issue).
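The atomic part of the fix, sketched with hypothetical names: the flag is written once at setup and read lock-free on the hot path, which avoids taking the lock that caused the inversion.
```go
package main

import (
	"fmt"
	"sync/atomic"
)

// jsAcct is a stand-in for the account state; only the pattern matters.
type jsAcct struct {
	clustered int32 // read with atomics on the usage hot path
}

// setClustered is called once during setup (the "previous write" side).
func (a *jsAcct) setClustered(on bool) {
	var v int32
	if on {
		v = 1
	}
	atomic.StoreInt32(&a.clustered, v)
}

// isClustered is safe to call from updateUsage without holding the lock.
func (a *jsAcct) isClustered() bool {
	return atomic.LoadInt32(&a.clustered) == 1
}

func main() {
	a := &jsAcct{}
	a.setClustered(true)
	fmt.Println(a.isClustered()) // true
}
```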
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When a downstream stream uses retention modes that delete messages, fall back to a time-based start time for the new source consumers.
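A sketch of the fallback logic, with hypothetical types (the real consumer setup differs): when retention can delete messages, the remembered sequence may no longer be valid, so the recreated source consumer starts by time instead.
```go
package main

import (
	"fmt"
	"time"
)

// RetentionPolicy stand-in; interest and work-queue retention delete messages.
type RetentionPolicy int

const (
	LimitsPolicy RetentionPolicy = iota
	InterestPolicy
	WorkQueuePolicy
)

// sourceStart picks how a recreated source consumer should start.
func sourceStart(r RetentionPolicy, nextSeq uint64, lastTS time.Time) string {
	if r == InterestPolicy || r == WorkQueuePolicy {
		// Retention deletes messages, so the tracked sequence may be
		// gone: fall back to a time-based start.
		return fmt.Sprintf("start by time %v", lastTS)
	}
	return fmt.Sprintf("start at sequence %d", nextSeq)
}

func main() {
	fmt.Println(sourceStart(InterestPolicy, 42, time.Now()))
	fmt.Println(sourceStart(LimitsPolicy, 42, time.Now()))
}
```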
Signed-off-by: Derek Collison <derek@nats.io>
There was a case where we could check the max-per-subject
limit twice per message. That would apply to streams that have
max-per-subject and also discard_new, which is what KV configures.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When deciding to compact a file, we need to subtract the empty
records from the raw bytes; otherwise, for small messages, we would
end up calling compact() too many times.
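A sketch of the adjusted decision, with assumed field names and an arbitrary threshold: only the bytes left after subtracting empty records count toward the comparison.
```go
package main

import "fmt"

// shouldCompact decides whether a block file is worth compacting.
// rbytes is the raw size of the block, dbytes the bytes consumed by
// empty (deleted) records. Names and threshold are illustrative.
func shouldCompact(rbytes, dbytes uint64) bool {
	live := rbytes - dbytes
	// Comparing against rbytes directly would trigger for small messages
	// far too often; require real reclaimable space relative to live data.
	return dbytes > 0 && dbytes >= live/2
}

func main() {
	fmt.Println(shouldCompact(1000, 100)) // false: little to reclaim
	fmt.Println(shouldCompact(1000, 600)) // true: mostly empty records
}
```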
When removing a message from the stream, in FIFO cases we would
write the index at most every 2 seconds when doing it in place,
but when dealing with out-of-order deletes, we would do it for
every single delete, which can be costly. We now write
only every 500ms for non-FIFO cases.
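A sketch of the coalescing, with hypothetical names and the intervals from the text: out-of-order deletes now flush at most every 500ms instead of on every delete.
```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const (
	fifoFlushInterval       = 2 * time.Second        // in-place (FIFO) removes
	outOfOrderFlushInterval = 500 * time.Millisecond // out-of-order removes
)

type indexWriter struct {
	mu        sync.Mutex
	lastWrite time.Time
}

// maybeWriteIndex coalesces index writes instead of writing per delete.
func (w *indexWriter) maybeWriteIndex(fifo bool, write func()) {
	w.mu.Lock()
	defer w.mu.Unlock()
	interval := outOfOrderFlushInterval
	if fifo {
		interval = fifoFlushInterval
	}
	if time.Since(w.lastWrite) >= interval {
		write()
		w.lastWrite = time.Now()
	}
}

func main() {
	w := &indexWriter{}
	for i := 0; i < 3; i++ {
		w.maybeWriteIndex(false, func() { fmt.Println("index written") })
	}
	// Only the first call writes; the next two are coalesced.
}
```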
Also fixed some unrelated code:
- The decision to install a snapshot was based on an incorrect
logical expression.
- In checkPending(), protect against the timer being nil, which
could happen when the consumer is stopped or leadership changes.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- A stream could become leader when it should not, causing
messages to be lost.
- A catchup could stall because the server sending data
could bail out of the runCatchup routine but still send
the EOF signal.
- A deadlock with the monitoring of Jsz.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This is a regression introduced in v2.7.3 (with PR #2848).
When the store has a list of subjects, finding the next message
to deliver would go through the subjects map and have to match
each entry to find out if it is a subset (when the filter had a
wildcard). In situations where there were lots of subjects (for
instance, 1 message per subject) but the consumer did not filter
on anything specific, this processing became slow.
We now check that if the stream has a single subject (even with a
wildcard) and the consumer filters on that exact subject, then
we can do a linear scan. We also do a linear scan if the number
of messages in the block is at most half the number of subjects
in the subjects map.
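The heuristic, sketched with assumed parameter names (not the server's):
```go
package main

import "fmt"

// useLinearScan decides whether scanning the block beats walking the
// subjects map.
func useLinearScan(streamSubjects []string, filter string, msgsInBlock, numSubjects int) bool {
	// Single stream subject (possibly a wildcard) matched exactly by the
	// consumer's filter: every message matches, so just scan.
	if len(streamSubjects) == 1 && streamSubjects[0] == filter {
		return true
	}
	// Few messages relative to tracked subjects: scanning them is cheaper
	// than iterating the subjects map with subset matching.
	return msgsInBlock <= numSubjects/2
}

func main() {
	fmt.Println(useLinearScan([]string{"kv.>"}, "kv.>", 1000, 1000)) // true
	fmt.Println(useLinearScan([]string{"a", "b"}, "a", 10, 100))     // true
	fmt.Println(useLinearScan([]string{"a", "b"}, "a", 100, 100))    // false
}
```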
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Also fixed a bug that could cause memory-based replicated consumers to no longer work after snapshots and server restarts.
The snapshot logic would allow non-state-changing updates to continuously grow the raft logs. We were also too conservative about when we snapshotted and why.
Also added the ability for FileStore.Compact() to reclaim space in the block file from the head of the last changed block.
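A sketch of the snapshot guard, with a hypothetical raft interface: comparing against the last installed snapshot keeps no-op updates from growing the log.
```go
package main

import (
	"bytes"
	"fmt"
)

// raftNode is a minimal stand-in for the raft node's snapshot API.
type raftNode interface {
	InstallSnapshot(snap []byte) error
}

type snapshotter struct {
	node raftNode
	last []byte
}

// maybeSnapshot installs a snapshot only when state actually changed.
func (s *snapshotter) maybeSnapshot(state []byte, force bool) error {
	if !force && bytes.Equal(state, s.last) {
		return nil // unchanged: skip, so the raft log can't grow unbounded
	}
	if err := s.node.InstallSnapshot(state); err != nil {
		return err
	}
	s.last = append([]byte(nil), state...)
	return nil
}

type fakeNode struct{}

func (fakeNode) InstallSnapshot([]byte) error { fmt.Println("snapshot installed"); return nil }

func main() {
	s := &snapshotter{node: fakeNode{}}
	s.maybeSnapshot([]byte("state-1"), false) // installs
	s.maybeSnapshot([]byte("state-1"), false) // skipped
}
```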
Signed-off-by: Derek Collison <derek@nats.io>
This may prevent memory copies when they are not necessary. Also fixed a bug
there that would check twice whether there was only 1 subject and that
subject did not match (say the configured subject is foo.* and the key is
foo.bar).
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Under contention on the head write block, the system could perform worse memory-wise than the plain Go runtime.
We also held some references to message subjects that bloated memory.
Signed-off-by: Derek Collison <derek@nats.io>
We had a report of a panic on server restart with 2.8.0-beta.1. The panic came from allocating a load block sized from the number of messages we thought the block had based on the index.
Before, SkipMsg would decrement the count, and when we added the record via writeMsgRecord we would add it back in. However, we released the lock in between, meaning other things could run.
If the count was decremented in between, say to 0 (we did protect against underflow there), then a remove and a subsequent writeIndexInfo would stamp the index and underflow.
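A sketch of the guard, with hypothetical fields: every adjustment clamps at zero instead of wrapping, so an interleaved remove between the SkipMsg decrement and the writeMsgRecord re-add cannot underflow the count.
```go
package main

import (
	"fmt"
	"sync"
)

type msgBlock struct {
	mu   sync.Mutex
	msgs uint64 // message count persisted via the index
}

// subMsgs decrements the count, clamping at zero instead of wrapping,
// which would otherwise stamp a huge count into the index.
func (mb *msgBlock) subMsgs(n uint64) {
	mb.mu.Lock()
	defer mb.mu.Unlock()
	if n > mb.msgs {
		mb.msgs = 0
	} else {
		mb.msgs -= n
	}
}

func main() {
	mb := &msgBlock{}
	mb.subMsgs(1) // would wrap to math.MaxUint64 without the clamp
	fmt.Println(mb.msgs)
}
```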
Signed-off-by: Derek Collison <derek@nats.io>
Previously we relied more heavily on Go's garbage collector, since when we loaded a block for an underlying stream we would pass references upward to avoid copies.
Now we always copy when passing back to the upper layers, which allows us to not only expire our cache blocks but also pool and reuse them.
The upper layers were also changed so that the pooling at that level can optionally interoperate with the storage layer.
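The copy-on-read change in miniature, with simplified types: callers get their own copy, so the cache buffer can be expired or returned to a pool without invalidating anything upstream.
```go
package main

import "fmt"

type cache struct {
	buf []byte // shared cache buffer, eligible for expiry and pooling
}

// loadMsg returns a copy of the record instead of a sub-slice of buf.
// Returning buf[off:off+ln] directly would pin the whole buffer and
// make it unsafe to reuse.
func (c *cache) loadMsg(off, ln int) []byte {
	out := make([]byte, ln)
	copy(out, c.buf[off:off+ln])
	return out
}

func main() {
	c := &cache{buf: []byte("hello world")}
	msg := c.loadMsg(0, 5)
	c.buf = nil // cache expired; msg remains valid
	fmt.Println(string(msg))
}
```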
Also fixed some test flappers and a bug where the de-dupe state might not be rebuilt correctly.
Signed-off-by: Derek Collison <derek@nats.io>
Also had to change all references from `path.` to `filepath.` when
dealing with files, so that it works properly on Windows.
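The reason for the change in one small example: path.Join always uses '/', while filepath.Join uses the OS separator, which matters for real files on Windows.
```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
)

func main() {
	// path is for slash-separated names (URLs, subjects), never files.
	fmt.Println(path.Join("store", "msgs", "1.blk")) // store/msgs/1.blk
	// filepath respects the OS separator: store\msgs\1.blk on Windows.
	fmt.Println(filepath.Join("store", "msgs", "1.blk"))
}
```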
Also fixed lots of tests to defer the shutdown of the server
after the removal of the storage, and fixed some config file
directories to use single quotes (`'`) to surround the file path,
again to work on Windows.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>