8406 Commits

Author SHA1 Message Date
Derek Collison
7db162d40f [FIXED] Configuration Reload: possible panic if done during Shutdown (#4666)
If a configuration reload is issued as the server is being shut down, we
could get 2 different panics. One due to JetStream if an account is
JetStream enabled, and one due to the send to a go channel that has been
closed.

```
panic: send on closed channel [recovered]
        panic: send on closed channel

goroutine 440 [running]:
testing.tRunner.func1.2({0x1038d58e0, 0x1039e1270})
        /usr/local/go/src/testing/testing.go:1545 +0x274
testing.tRunner.func1()
        /usr/local/go/src/testing/testing.go:1548 +0x448
panic({0x1038d58e0?, 0x1039e1270?})
        /usr/local/go/src/runtime/panic.go:920 +0x26c
github.com/nats-io/nats-server/v2/server.(*Server).reloadAuthorization(0xc00024fb00)
        /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1998 +0x788
github.com/nats-io/nats-server/v2/server.(*Server).applyOptions(0xc00024fb00, 0xc00021dc00, {0xc00038e4e0, 0x2, 0xc00021dc28?})
        /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1746 +0x2b8
github.com/nats-io/nats-server/v2/server.(*Server).reloadOptions(0xc000293500?, 0xc000118a80, 0xc000293500)
        /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1121 +0x178
github.com/nats-io/nats-server/v2/server.(*Server).ReloadOptions(0xc00024fb00, 0xc000293500)
        /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1060 +0x368
github.com/nats-io/nats-server/v2/server.(*Server).Reload(0xc00024fb00)
        /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/reload.go:995 +0x104
```

and

```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x0 pc=0x10077b224]

goroutine 8 [running]:
testing.tRunner.func1.2({0x101351640, 0x101b7d2a0})
	/usr/local/go/src/testing/testing.go:1545 +0x274
testing.tRunner.func1()
	/usr/local/go/src/testing/testing.go:1548 +0x448
panic({0x101351640?, 0x101b7d2a0?})
	/usr/local/go/src/runtime/panic.go:920 +0x26c
github.com/nats-io/nats-server/v2/server.(*Account).EnableJetStream(0xc00020fb80, 0xc000220240)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream.go:1045 +0xa4
github.com/nats-io/nats-server/v2/server.(*Server).configJetStream(0xc000226d80, 0xc00020fb80)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream.go:707 +0xdc
github.com/nats-io/nats-server/v2/server.(*Server).configAllJetStreamAccounts(0xc000226d80)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream.go:768 +0x2b0
github.com/nats-io/nats-server/v2/server.(*Server).enableJetStreamAccounts(0xc000226d80?)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream.go:637 +0x128
github.com/nats-io/nats-server/v2/server.(*Server).reloadAuthorization(0xc000226d80)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/reload.go:2039 +0x93c
github.com/nats-io/nats-server/v2/server.(*Server).applyOptions(0xc000226d80, 0xc000171c50, {0xc000074600, 0x2, 0xc000171c78?})
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1746 +0x2b8
github.com/nats-io/nats-server/v2/server.(*Server).reloadOptions(0xc000276000?, 0xc0000a6000, 0xc000276000)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1121 +0x178
github.com/nats-io/nats-server/v2/server.(*Server).ReloadOptions(0xc000226d80, 0xc000276000)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1060 +0x368
github.com/nats-io/nats-server/v2/server.(*Server).Reload(0xc000226d80)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/reload.go:995 +0x104
```
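
The first trace is the classic Go failure of sending on a channel that shutdown has already closed. As a minimal sketch of one common guard (not the actual change in this commit), the send and the close can share a lock and a shutdown flag. The `notifier` type and its methods are hypothetical names used only for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// notifier is a hypothetical stand-in for a component that owns a channel
// which is closed on shutdown. It is not a nats-server type.
type notifier struct {
	mu       sync.Mutex
	shutdown bool
	ch       chan string
}

func newNotifier() *notifier {
	return &notifier{ch: make(chan string, 1)}
}

// Send delivers msg unless shutdown has started. The flag check and the send
// happen under the same lock as the close, so they cannot race with it.
func (n *notifier) Send(msg string) bool {
	n.mu.Lock()
	defer n.mu.Unlock()
	if n.shutdown {
		return false
	}
	select {
	case n.ch <- msg:
		return true
	default:
		// Buffer full: drop rather than block while holding the lock.
		return false
	}
}

// Shutdown marks the notifier as stopped and closes the channel exactly once.
func (n *notifier) Shutdown() {
	n.mu.Lock()
	defer n.mu.Unlock()
	if !n.shutdown {
		n.shutdown = true
		close(n.ch)
	}
}

func main() {
	n := newNotifier()
	fmt.Println("before shutdown:", n.Send("reload"))
	n.Shutdown()
	fmt.Println("after shutdown:", n.Send("reload"))
}
```

A late reload then becomes a no-op instead of a panic. The second trace (nil pointer dereference) would need a similar "is the server shutting down?" check before touching JetStream state.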

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-10-16 15:32:40 -07:00
Ivan Kozlovic
49907c4537 [FIXED] Configuration Reload: possible panic if done during Shutdown
If a configuration reload is issued as the server is being shut down,
we could get 2 different panics. One due to JetStream if an account
is JetStream enabled, and one due to the send to a go channel that
has been closed.

```
panic: send on closed channel [recovered]
        panic: send on closed channel

goroutine 440 [running]:
testing.tRunner.func1.2({0x1038d58e0, 0x1039e1270})
        /usr/local/go/src/testing/testing.go:1545 +0x274
testing.tRunner.func1()
        /usr/local/go/src/testing/testing.go:1548 +0x448
panic({0x1038d58e0?, 0x1039e1270?})
        /usr/local/go/src/runtime/panic.go:920 +0x26c
github.com/nats-io/nats-server/v2/server.(*Server).reloadAuthorization(0xc00024fb00)
        /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1998 +0x788
github.com/nats-io/nats-server/v2/server.(*Server).applyOptions(0xc00024fb00, 0xc00021dc00, {0xc00038e4e0, 0x2, 0xc00021dc28?})
        /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1746 +0x2b8
github.com/nats-io/nats-server/v2/server.(*Server).reloadOptions(0xc000293500?, 0xc000118a80, 0xc000293500)
        /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1121 +0x178
github.com/nats-io/nats-server/v2/server.(*Server).ReloadOptions(0xc00024fb00, 0xc000293500)
        /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1060 +0x368
github.com/nats-io/nats-server/v2/server.(*Server).Reload(0xc00024fb00)
        /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/reload.go:995 +0x104
```

and

```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x0 pc=0x10077b224]

goroutine 8 [running]:
testing.tRunner.func1.2({0x101351640, 0x101b7d2a0})
	/usr/local/go/src/testing/testing.go:1545 +0x274
testing.tRunner.func1()
	/usr/local/go/src/testing/testing.go:1548 +0x448
panic({0x101351640?, 0x101b7d2a0?})
	/usr/local/go/src/runtime/panic.go:920 +0x26c
github.com/nats-io/nats-server/v2/server.(*Account).EnableJetStream(0xc00020fb80, 0xc000220240)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream.go:1045 +0xa4
github.com/nats-io/nats-server/v2/server.(*Server).configJetStream(0xc000226d80, 0xc00020fb80)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream.go:707 +0xdc
github.com/nats-io/nats-server/v2/server.(*Server).configAllJetStreamAccounts(0xc000226d80)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream.go:768 +0x2b0
github.com/nats-io/nats-server/v2/server.(*Server).enableJetStreamAccounts(0xc000226d80?)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream.go:637 +0x128
github.com/nats-io/nats-server/v2/server.(*Server).reloadAuthorization(0xc000226d80)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/reload.go:2039 +0x93c
github.com/nats-io/nats-server/v2/server.(*Server).applyOptions(0xc000226d80, 0xc000171c50, {0xc000074600, 0x2, 0xc000171c78?})
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1746 +0x2b8
github.com/nats-io/nats-server/v2/server.(*Server).reloadOptions(0xc000276000?, 0xc0000a6000, 0xc000276000)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1121 +0x178
github.com/nats-io/nats-server/v2/server.(*Server).ReloadOptions(0xc000226d80, 0xc000276000)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1060 +0x368
github.com/nats-io/nats-server/v2/server.(*Server).Reload(0xc000226d80)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/reload.go:995 +0x104
```

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-10-16 15:25:02 -06:00
Derek Collison
4df6c9aeb8 [ADDED] TLS: Handshake First for client connections (#4642)
A new option instructs the server to perform the TLS handshake first,
that is prior to sending the INFO protocol to the client.

Only clients that implement an equivalent option would be able to connect
if the server runs with this option enabled.

The configuration would look something like this:
```
...
tls {
    cert_file: ...
    key_file: ...

    handshake_first: true
}
```

The same option can be set to "auto" or a Go time duration to fall back
to the old behavior. This is intended for deployments where it is known
that not all clients have been upgraded to a client library providing
the TLS handshake first option.

After the delay has elapsed without receiving the TLS handshake from the
client, the server reverts to sending the INFO protocol so that older
clients can connect. Clients that do connect with the "TLS first" option
will be marked as such in the monitoring's Connz page/result. It will
allow the administrator to keep track of applications still needing to
upgrade.

The configuration would be similar to:
```
...
tls {
    cert_file: ...
    key_file: ...

    handshake_first: auto
}
```
With the above value, the fallback delay used by the server is 50ms.

The duration can be explicitly set, say 300 milliseconds:
```
...
tls {
    cert_file: ...
    key_file: ...

    handshake_first: "300ms"
}
```

It is understood that any configuration other than "true" will result in
the server sending the INFO protocol after the elapsed amount of time
without the client initiating the TLS handshake. Therefore, for
administrators that do not want any data transmitted in plain text, the
value must be set to "true" only. It will require applications to be
updated to a library that provides the option, which may or may not be
readily available.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-10-16 07:49:08 -07:00
Derek Collison
a797f0d794 Add fan-in/out benchmarks (#4660)
Benchmarks for NATS core fan-in and fan-out pattern workloads. 

Signed-off-by: Reuben Ninan <reuben@nats.io>
2023-10-14 10:00:31 -07:00
Derek Collison
aa21ef778d Report the raft group name in stream and consumer info (#4661)
Report the raft group name in stream and consumer info
    
Signed-off-by: R.I.Pienaar <rip@devco.net>
2023-10-14 09:56:44 -07:00
R.I.Pienaar
d61ecf8a89 Report the raft group name in stream and consumer info
Signed-off-by: R.I.Pienaar <rip@devco.net>
2023-10-14 12:28:36 +03:00
Reuben Ninan
524c1f544a Add fan-in/out benchmarks
Signed-off-by: reubenninan <reuben@nats.io>
2023-10-14 00:56:09 -04:00
Waldemar Quevedo
1528434431 Release v2.10.3 (#4658)
Signed-off-by: Your Name <wally@nats.io>
2023-10-12 14:30:25 -07:00
Waldemar Quevedo
996bf2bf1c Release v2.10.3
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-10-12 13:46:11 -07:00
Derek Collison
e2414e6a04 Bump to 2.10.3-RC.3
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-12 13:12:19 -07:00
Derek Collison
2a7d70c8cb [FIXED] Only mark fs as dirty vs full write on mb compaction. (#4657)
On streams that were constantly removing items, like KVs, this could
become over active when not needed. Simply mark the store as dirty for
next check.

Signed-off-by: Derek Collison <derek@nats.io>
2023-10-12 13:11:37 -07:00
Derek Collison
0a64f18060 Only mark fs as dirty vs full write on mb compaction.
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-12 12:59:19 -07:00
Derek Collison
ea70590aa2 Bump to 2.10.3-RC.2
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-12 12:35:54 -07:00
Derek Collison
444a47e97c [FIXED] Stream / KV lookups fail after decreasing history size. (#4656)
Fixed a bug that was not correctly selecting next first because it was
not properly skipping new dbit entries.
This could result in lookups failing, e.g. after a change in max msgs
per subject to a lower value.

Also fixed a bug that would not properly update our psim during compact
when throwing away the whole block and a subject had more than one
message.

Signed-off-by: Derek Collison <derek@nats.io>

Resolves: #4643
2023-10-12 12:30:06 -07:00
Derek Collison
b7b40b0a69 Fixed a bug that was not correctly selecting next first because it was not skipping dbit entries.
This could result in lookups failing, e.g. after a change in max msgs per subject to a lower value.

Also fixed a bug that would not properly update psim during compact when throwing away the whole block and a subject had more than one message.

Signed-off-by: Derek Collison <derek@nats.io>
2023-10-12 10:58:37 -07:00
Derek Collison
1e8f6bf1e1 Fix updating a non unique consumer on workqueue stream not returning an error (#4654)
This is a possible fix for #4653.

Changes made:
1. Added tests for creating and updating consumers on a work queue
stream with overlapping subjects.
2. Check for overlapping subjects before
[updating](a25af02c73/server/consumer.go (L770))
the consumer config.
3. Changed [`func (*stream).partitionUnique(partitions []string)
bool`](a25af02c73/server/stream.go (L5269))
to accept the consumer name being checked so we can skip it while
checking for overlapping subjects (Required for
[`FilterSubjects`](a25af02c73/server/consumer.go (L75))
updates), wasn't needed before because the checks were made on creation
only.
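
As a rough illustration of change 3 (a sketch under stated assumptions, not the code in this PR), the uniqueness check can skip the consumer being updated so that it does not conflict with its own existing filters. The `consumer` type, the free function `partitionUnique`, and `subjectsOverlap` are hypothetical; the sketch assumes every consumer has at least one filter subject and that `>` only appears as the last token:

```go
package main

import (
	"fmt"
	"strings"
)

// consumer is a hypothetical, simplified view of a consumer's config.
type consumer struct {
	name    string
	filters []string
}

// subjectsOverlap reports whether two NATS subjects (which may contain the
// "*" and ">" wildcards) can match at least one common literal subject.
func subjectsOverlap(a, b string) bool {
	ta, tb := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(ta) && i < len(tb); i++ {
		if ta[i] == ">" || tb[i] == ">" {
			return true
		}
		if ta[i] != "*" && tb[i] != "*" && ta[i] != tb[i] {
			return false
		}
	}
	return len(ta) == len(tb)
}

// partitionUnique reports whether filters overlap with no other consumer.
// The consumer called name is skipped, which is what makes updates work.
func partitionUnique(existing []consumer, name string, filters []string) bool {
	for _, c := range existing {
		if c.name == name {
			continue
		}
		for _, f := range filters {
			for _, cf := range c.filters {
				if subjectsOverlap(f, cf) {
					return false
				}
			}
		}
	}
	return true
}

func main() {
	existing := []consumer{
		{name: "A", filters: []string{"orders.*"}},
		{name: "B", filters: []string{"events.>"}},
	}
	fmt.Println(partitionUnique(existing, "A", []string{"orders.new"}))
	fmt.Println(partitionUnique(existing, "A", []string{"events.created"}))
}
```

Without the name check, updating consumer `A` to any subset of its own filters would be rejected as overlapping with itself.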

There's only 1 thing that I'm not sure about.

In the [current work queue stream conflict
checks](a25af02c73/server/consumer.go (L796)),
the consumer config `Direct` is being checked if `false`, should we also
make this check before the update?

Signed-off-by: Pierre Mdawar <pierre@mdawar.dev>
2023-10-12 07:27:27 -07:00
Neil Twigg
ea0843fe26 Update DEPENDENCIES.md
Signed-off-by: Neil Twigg <neil@nats.io>
2023-10-12 11:21:45 +01:00
Pierre Mdawar
c46d8093bc Fix updating a non unique consumer on workqueue stream not returning an error 2023-10-12 12:18:24 +03:00
Byron Ruth
a25af02c73 Bump Travis Go version to 1.21.3 and 1.20.10 (#4649)
Signed-off-by: Byron Ruth <byron@nats.io>
2023-10-11 13:46:17 -04:00
Derek Collison
38794e5af9 Bump to 2.10.3-RC.1
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-11 08:26:09 -07:00
Derek Collison
94545f3206 [FIXED] Compaction with compression and added out of band compaction (#4645)
This will also reclaim more space for streams with lots of interior
deletes.


Signed-off-by: Derek Collison <derek@nats.io>
2023-10-11 08:22:10 -07:00
Derek Collison
842d600e3f Grab blk fn while mb lock held
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-11 07:54:36 -07:00
Derek Collison
9a551186d8 Fixed a crash in MQTT outgoing PUBREL (#4646)
This really was a cut/paste/typo error, the `else` should not have been
there. Came up in my testing.

The effect was that when there was a pending `PUBREL` in JetStream, and
a matching client connects - we would sometimes attempt to deliver the
PUBREL immediately once connected. `cpending` was already initialized,
but the pubrel map was not (yet).
2023-10-10 19:09:43 -07:00
Lev Brouk
de1282c98d Fixed a crash in MQTT outgoing PUBREL
This really was a cut/paste/typo error.

The effect was that when there was a pending PUBREL in JetStream, we would sometimes attempt to deliver it immediately once the client connected, cpending was already initialized, but the pubrel map was not (yet).
2023-10-10 18:08:18 -07:00
Derek Collison
f4387ec74e Fix for compaction with compression and added an out of band compaction in syncBlocks to reclaim more space.
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-10 17:17:55 -07:00
Ivan Kozlovic
ce96de2ed5 [ADDED] TLS: Handshake First for client connections
A new option instructs the server to perform the TLS handshake first,
that is prior to sending the INFO protocol to the client.

Only clients that implement an equivalent option would be able to
connect if the server runs with this option enabled.

The configuration would look something like this:
```
...
tls {
    cert_file: ...
    key_file: ...

    handshake_first: true
}
```

The same option can be set to "auto" or a Go time duration to fall back
to the old behavior. This is intended for deployments where it is known
that not all clients have been upgraded to a client library providing
the TLS handshake first option.

After the delay has elapsed without receiving the TLS handshake from
the client, the server reverts to sending the INFO protocol so that
older clients can connect. Clients that do connect with the "TLS first"
option will be marked as such in the monitoring's Connz page/result.
It will allow the administrator to keep track of applications still
needing to upgrade.

The configuration would be similar to:
```
...
tls {
    cert_file: ...
    key_file: ...

    handshake_first: auto
}
```
With the above value, the fallback delay used by the server is 50ms.

The duration can be explicitly set, say 300 milliseconds:
```
...
tls {
    cert_file: ...
    key_file: ...

    handshake_first: "300ms"
}
```

It is understood that any configuration other than "true" will result
in the server sending the INFO protocol after the elapsed amount of
time without the client initiating the TLS handshake. Therefore, for
administrators that do not want any data transmitted in plain text,
the value must be set to "true" only. It will require applications
to be updated to a library that provides the option, which may or
may not be readily available.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-10-10 09:46:01 -06:00
Neil
6a5304cfac Add CONTRIBUTING.md, simplify PR template (#4619)
This simplifies the PR template, which is a bit cumbersome, and instead
replaces it with a simpler notice that includes a template sign-off and
a new `CONTRIBUTING.md` document.

Signed-off-by: Neil Twigg <neil@nats.io>
Co-authored-by: Byron Ruth <byron@nats.io>
2023-10-10 08:39:38 -04:00
Neil
8b39af0c5f Bump v2.10.3 (#4635) 2023-10-10 10:00:40 +01:00
Ivan Kozlovic
eadb19f539 Fixed code coverage GitHub Action (#4641)
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-10-09 15:21:24 -06:00
Ivan Kozlovic
0a4f2e642e Fixed code coverage GitHub Action
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-10-09 13:07:54 -06:00
Waldemar Quevedo
72430e7998 Rename MQTT test.yaml to MQTT_test.yaml (#4637) 2023-10-06 15:39:51 -07:00
Byron Ruth
4ab65b1871 Bump v2.10.3
Signed-off-by: Byron Ruth <byron@nats.io>
2023-10-06 16:39:45 -04:00
Byron Ruth
203c4b9c2d Release v2.10.2 (#4634) 2023-10-06 16:30:39 -04:00
Byron Ruth
f8c9d8e686 Release v2.10.2
Signed-off-by: Byron Ruth <byron@nats.io>
2023-10-06 15:23:06 -04:00
Byron Ruth
95dd8e7a71 Pin Go versions in Travis CI (#4633)
Signed-off-by: Byron Ruth <byron@nats.io>
2023-10-06 12:09:27 -07:00
Derek Collison
0c3609ed2a Bump to 2.10.2-RC.15
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-06 09:58:55 -07:00
Derek Collison
f29c7863e7 [FIXED] Setting initial min on dmap caused subtle bugs with dmap. (#4631)
Under heavy load with max msgs per subject of 1 the dmap, when
considered empty and resetting the initial min, could cause lookup
misses that would lead to excess messages in a stream and longer restore
issues.

Signed-off-by: Derek Collison <derek@nats.io>
2023-10-06 09:58:17 -07:00
Derek Collison
dd646f6b71 Set initial min on dmap caused subtle bugs with dmap. Some minor cleanup.
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-06 09:42:09 -07:00
Lev
beee6fc72a [FIXED] MQTT PUBREL header incompatibility (#4616)
https://hivemq.github.io/mqtt-cli/docs/test/ pointed out the
incompatibility.
2023-10-05 08:07:50 -07:00
Waldemar Quevedo
4e414f1f05 Skip processing consumer assignments after JS has shutdown (#4625)
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-10-04 13:17:22 -07:00
Neil
4c791d6288 Reduce allocations in WebSockets (#4623) 2023-10-04 20:07:21 +01:00
Neil Twigg
7124dc7bdc Revert changes to nbPoolPut, force compressor to forget byte buffer
Signed-off-by: Neil Twigg <neil@nats.io>
2023-10-04 17:41:36 +01:00
Neil Twigg
e20ca9043f Don't append empty slices in the unfragmented path
Signed-off-by: Neil Twigg <neil@nats.io>
2023-10-04 17:18:47 +01:00
Neil Twigg
6b65452bc7 Reduce allocations in WebSocket compression
Signed-off-by: Neil Twigg <neil@nats.io>
2023-10-04 12:36:32 +01:00
Derek Collison
dbe700d192 Bump to 2.10.0-RC.14
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-03 16:11:30 -07:00
Derek Collison
3f1afb4ca2 [IMPROVED] Bumped inflight updates to 16 and move one lock to rlock. (#4621)
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-03 16:10:59 -07:00
Derek Collison
21e272360d [IMPROVED] Memory growth on compressed websocket connections. (#4620)
Holding onto the compressor and not recycling the internal byte slice
could cause havoc with GC.

This needs to be improved but this at least should allow the GC to
cleanup more effectively.

Signed-off-by: Derek Collison <derek@nats.io>
2023-10-03 15:37:01 -07:00
Derek Collison
2d21bc7008 Fix datarace
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-03 15:35:20 -07:00
Derek Collison
1ccc6dbf30 Bumped inflight updates to 16 and move one lock to rlock.
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-03 15:01:34 -07:00
Derek Collison
2f1a384bcb Holding onto the compressor and not recycling the internal byte slice was causing havoc with GC.
This needs to be improved but this at least should allow the GC to cleanup more effectively.

Signed-off-by: Derek Collison <derek@nats.io>
2023-10-03 14:39:00 -07:00