Commit Graph

5598 Commits

Author SHA1 Message Date
Ivan Kozlovic
29ea280fe7 [FIXED] JetStream: send "bad request" response for malformed API requests
An example was a "consumer info" request with a consumer name that
had tokens, which is illegal. This resulted in the request being
dropped in apiDispatch() because there was no interest.
The server will now return a "bad request" error in such cases.
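A minimal Go sketch of the behavior described above: validate the name and always reply, rather than letting the request fall through with no interest. The function names, the JSON field names, and the exact set of rejected characters (`.`, `*`, `>`) are assumptions for illustration, not the server's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// apiError and apiResponse are illustrative shapes for a JSON error reply;
// the field names are assumptions, not the server's exact types.
type apiError struct {
	Code        int    `json:"code"`
	Description string `json:"description"`
}

type apiResponse struct {
	Error *apiError `json:"error,omitempty"`
}

// isValidConsumerName is a hypothetical check: a name containing a token
// separator or a wildcard cannot map onto a single subject token.
func isValidConsumerName(name string) bool {
	return name != "" && !strings.ContainsAny(name, ".*>")
}

// consumerInfoReply is a hypothetical handler that always answers the
// requester instead of letting a malformed request be dropped silently.
func consumerInfoReply(name string) string {
	resp := apiResponse{}
	if !isValidConsumerName(name) {
		resp.Error = &apiError{Code: 400, Description: "bad request"}
	}
	b, err := json.Marshal(resp)
	if err != nil {
		return `{"error":{"code":500,"description":"internal error"}}`
	}
	return string(b)
}

func main() {
	fmt.Println(consumerInfoReply("orders.new")) // malformed: contains a '.'
	fmt.Println(consumerInfoReply("ORDERS"))     // valid name, empty reply body here
}
```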

Resolves #2995

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-04 11:25:55 -06:00
Ivan Kozlovic
ee1341fa17 Merge pull request #2996 from nats-io/ws_mqtt_in_varz
[ADDED] Monitoring: MQTT and Websocket blocks in `/varz` endpoint
2022-04-04 10:53:19 -06:00
Ivan Kozlovic
14f54b8dd7 [ADDED] Monitoring: MQTT and Websocket blocks in /varz endpoint
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-04 10:11:55 -06:00
Ivan Kozlovic
18bdabff35 Merge pull request #2994 from nats-io/add_rate_limited_warnings
[CHANGED] Rate limit (some) similar warnings
2022-04-01 18:19:34 -06:00
Ivan Kozlovic
366d217f44 Some changes based on review
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-01 17:55:33 -06:00
Ivan Kozlovic
19783a9f11 [CHANGED] Rate limit similar warnings
Some warnings, especially when dealing with JS limits that were
printed on a per-message basis, are now limited to ~1 per second
if the content of the warning is already found in a map.

This also applies to "client" warnings, but the client portion of the
warning is not taken into account, which helps reduce logging of
similar content coming from different clients.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-01 15:24:03 -06:00
Matthias Hanel
a77f95faa8 error handling and info when moving a stream from a non-existent tier (#2992)
adds unit test to test this scenario
improves reporting of correct error
only show info for non-existent tiers where streams exist

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-01 14:21:35 -04:00
Matthias Hanel
33d9f189cc start using unit test TestJWTClusteredJetStreamTiers, size was off (#2988)
If both servers sent a remote update of their local use,
the limit was hit. But that limit was too small by 200.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-01 12:51:28 -04:00
Derek Collison
3c09064117 Merge pull request #2991 from nats-io/seal_restart
[FIXED] Sealed streams would not recover on server restart.
2022-04-01 06:47:24 -07:00
Derek Collison
7f78d3e618 Not allowing streams to be created meant we could not recover on server restart.
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-01 06:41:22 -07:00
Jaime Piña
32b17f7a7e Skip SystemLimitsPlacement if we can't get the desired leader (#2989) 2022-03-31 16:24:29 -07:00
Ivan Kozlovic
c917141df8 Merge pull request #2987 from nats-io/js_tweak_snapshot_defaults
Updated snapshot default chunk size and window size
2022-03-31 16:30:08 -06:00
Ivan Kozlovic
52200ab8bb Updated snapshot default chunk size and window size
Chunk size is down from 256KB to 128KB and window size from 32MB to 8MB.
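A small Go calculation of what the new defaults imply, assuming the window bounds the number of unacknowledged snapshot bytes in flight (an assumption; the constant names are also hypothetical). Under that reading, both the old and new settings are shown.

```go
package main

import "fmt"

const (
	// Values stated in the commit message; the names are hypothetical.
	defaultSnapshotChunkSize  = 128 * 1024      // was 256KB
	defaultSnapshotWindowSize = 8 * 1024 * 1024 // was 32MB
)

// chunksInFlight returns how many chunks fit in the window, assuming the
// window caps unacknowledged bytes.
func chunksInFlight(window, chunk int) int {
	return window / chunk
}

func main() {
	fmt.Println(chunksInFlight(defaultSnapshotWindowSize, defaultSnapshotChunkSize)) // 64
	fmt.Println(chunksInFlight(32*1024*1024, 256*1024))                              // 128 (old defaults)
}
```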

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-31 16:14:18 -06:00
Matthias Hanel
241bf5df0d Fixed wrong error check (#2986)
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-31 18:03:06 -04:00
Ivan Kozlovic
d4e2fde45c Bump version to beta.7
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-31 12:36:15 -06:00
Matthias Hanel
64feb142a9 In Merge validate nkey and subsequent saveIfNewer error on invalid jwt (#2985)
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-31 14:25:25 -04:00
Matthias Hanel
92f4dc986a added max_ack_pending setting to js account limits (#2982)
* added max_ack_pending setting to js account limits

Because of this addition, defaults now have to be set later (they
depend on these new limits now).

Also reorganized the code to more closely track how stream create looks.
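Why defaults must now be applied later can be sketched in Go: if the default were filled in before the account limit is known, a default larger than the limit would be rejected instead of capped. This is a hypothetical sketch; the type, the function, and the convention that 0 means "unset" are assumptions. The 1,000 default comes from the "lower default consumer's maximum ack pending" commit in this log.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultMaxAckPending reflects the 1,000 default described elsewhere in this log.
const defaultMaxAckPending = 1000

// consumerConfig is a hypothetical, trimmed-down consumer configuration.
type consumerConfig struct {
	MaxAckPending int // 0 means "not set" in this sketch
}

var errMaxAckPendingExceeded = errors.New("consumer max ack pending exceeds account limit")

// applyMaxAckPending validates and defaults MaxAckPending against an account
// limit (accountLimit <= 0 means "no limit" here). The default is applied
// only once the limit is known, so it can be capped rather than rejected.
func applyMaxAckPending(cfg *consumerConfig, accountLimit int) error {
	if cfg.MaxAckPending == 0 {
		cfg.MaxAckPending = defaultMaxAckPending
		if accountLimit > 0 && accountLimit < cfg.MaxAckPending {
			cfg.MaxAckPending = accountLimit
		}
		return nil
	}
	// A negative value is treated as "unlimited" in this sketch, which
	// cannot be allowed when the account sets a limit.
	if accountLimit > 0 && (cfg.MaxAckPending < 0 || cfg.MaxAckPending > accountLimit) {
		return errMaxAckPendingExceeded
	}
	return nil
}

func main() {
	cfg := consumerConfig{}
	if err := applyMaxAckPending(&cfg, 500); err == nil {
		fmt.Println(cfg.MaxAckPending) // 500
	}
	fmt.Println(applyMaxAckPending(&consumerConfig{MaxAckPending: 2000}, 500))
}
```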

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-31 14:17:16 -04:00
Ivan Kozlovic
ee98a81469 Merge pull request #2984 from nats-io/data_race_and_flappers
Fixed data race and some flappers
2022-03-31 11:11:46 -06:00
Ivan Kozlovic
34650e9dd5 Fixed data race and some flappers
Data race that has been seen:
```
Read at 0x00c00134bec0 by goroutine 159:
  github.com/nats-io/nats-server/v2/server.(*client).msgHeaderForRouteOrLeaf()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:2935 +0x254
  github.com/nats-io/nats-server/v2/server.(*client).processMsgResults()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4364 +0x2147
(...)
Previous write at 0x00c00134bec0 by goroutine 201:
  github.com/nats-io/nats-server/v2/server.(*Server).addRoute()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:1475 +0xdb4
  github.com/nats-io/nats-server/v2/server.(*client).processRouteInfo()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:641 +0x1704
```

Also fixed some flappers and removed use of `s.js.` since we have
already captured `js` in Jsz monitoring.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-31 10:05:34 -06:00
Derek Collison
96aa5eee7a Merge pull request #2983 from ripienaar/skipjz_nonjs
skips jsz on non js machines when leader only requested
2022-03-31 05:50:46 -07:00
R.I.Pienaar
4c4aa3e87f skips jsz on non js machines when leader only requested
This is a regression introduced in 055703f4fa
that leads to panics in management tooling

Signed-off-by: R.I.Pienaar <rip@devco.net>
2022-03-31 12:02:09 +02:00
Derek Collison
d634d237ca Bump version to 2.8.0-beta.6
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-30 20:26:49 -07:00
Derek Collison
2d7f941fea Merge pull request #2978 from nats-io/issue-2969
Fixes #2969, on reload stream import was not removed for js streams
2022-03-30 19:57:15 -07:00
Derek Collison
5182154cd2 We were not accounting for some newer internal clients (JETSTREAM, ACCOUNT, etc) when reloading authorization, etc.
We were also not copying over local state that has been added over the years to track different types of clients.
We also needed to make sure to reuse the account's internal client and the subscription id (acc.isid).

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-30 19:12:18 -07:00
Ivan Kozlovic
f207f90728 Merge pull request #2981 from nats-io/fix_2980
[FIXED] Monitoring: verify_and_map in tls{} config would break monitoring
2022-03-30 19:39:15 -06:00
Ivan Kozlovic
7bb7309f4c [FIXED] Monitoring: verify_and_map in tls{} config would break monitoring
This was introduced in v2.6.6. In order to solve a config reload
issue, we used tls.Config.GetConfigForClient which allowed the
TLS configuration to be "refreshed" with the latest. However, in
this case, the tls.Config.ClientAuth was not reset to tls.NoClientCert
which we need for monitoring port.

Resolves #2980

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-30 18:50:52 -06:00
Ivan Kozlovic
520aa322e4 Merge pull request #2977 from nats-io/js_pull_cons_req_cross_accounts_panic
[FIXED] Possible panic due to data races
2022-03-30 16:49:54 -06:00
Ivan Kozlovic
4ddbdbd74c Rewrite trackDownAccountAndInterest() to make it easier to read
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-30 16:41:22 -06:00
Matthias Hanel
3933c1f3d8 Fixes #2969, on reload stream import was not removed for js streams
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-30 18:12:57 -04:00
Ivan Kozlovic
c0ab2d4959 [FIXED] Possible panic due to data races
A panic was reported that looked like this:
```
fatal error: concurrent map read and map write
goroutine 200 [running]:
runtime.throw({0xa366ce, 0xe620e0})
	/home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/panic.go:1198 +0x71 fp=0xc00105f098 sp=0xc00105f068 pc=0x434ff1
runtime.mapaccess1_faststr(0x0, 0x0, {0xc0054b6f18, 0x11})
	/home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/map_faststr.go:21 +0x3a5 fp=0xc00105f100 sp=0xc00105f098 pc=0x412285
github.com/nats-io/nats-server/v2/server.(*consumer).processNextMsgReq(0xc000681000, 0xc00105f2a8, 0x4503e9, 0x11, {0x0, 0xc000246900}, {0xc0054b6f18, 0x11}, {0xc0002469c4, 0x90, ...})
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/consumer.go:2454 +0x8ce fp=0xc00105f250 sp=0xc00105f100 pc=0x77dc2e
github.com/nats-io/nats-server/v2/server.(*consumer).processNextMsgReq-fm(0x9c, 0x7f302e954fff, 0xc00105f2f8, {0xc000774280, 0x400}, {0xc0054b6f18, 0x40}, {0xc0002469c4, 0x90, 0x63c})
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/consumer.go:2380 +0x77 fp=0xc00105f2b8 sp=0xc00105f250 pc=x91e337
github.com/nats-io/nats-server/v2/server.(*client).deliverMsg(0xc0015f8000, 0xc003034f00, 0x41642f, {0xc000246969, 0x4b6166, 0x697}, {0xc0002469a9, 0x4b60be, 0x657}, {0xc0015f9480, ...}, ...)
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3180 +0xbb0 fp=0xc00105f530 sp=0xc00105f2b8 pc=0x764470
github.com/nats-io/nats-server/v2/server.(*client).processMsgResults(0xc0015f8000, 0x8cd7a5, 0xc0089fb440, {0xc0002469c4, 0x92, 0x63c}, {0x0, 0x0, 0x4}, {0xc000246969, ...}, ...)
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4163 +0x9af fp=0xc00105fa48 sp=0xc00105f530 pc=0x769e4f
github.com/nats-io/nats-server/v2/server.(*client).processInboundRoutedMsg(0xc0015f8000, {0xc0002469c4, 0xc0015f8220, 0x63c})
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:443 +0x159 fp=0xc00105fae8 sp=0xc00105fa48 pc=0x8ce299
github.com/nats-io/nats-server/v2/server.(*client).processInboundMsg(0xc0015f8000, {0xc0002469c4, 0x92, 0x79e})
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3493 +0x36 fp=0xc00105fb18 sp=0xc00105fae8 pc=0x765c76
github.com/nats-io/nats-server/v2/server.(*client).parse(0xc0015f8000, {0xc000246800, 0x800, 0xc087258a5d30c937})
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/parser.go:497 +0x246a fp=0xc00105fd98 sp=0xc00105fb18 pc=0x8a4f6a
github.com/nats-io/nats-server/v2/server.(*client).readLoop(0xc0015f8000, {0x0, 0x0, 0x0})
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:1227 +0xe1f fp=0xc00105ffb0 sp=0xc00105fd98 pc=0x75841f
github.com/nats-io/nats-server/v2/server.(*Server).createRoute.func1()
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:1372 +0x25 fp=0xc00105ffe0 sp=0xc00105ffb0 pc=0x8d46a5
runtime.goexit
```

Writing a test showed the data race:
```
==================
WARNING: DATA RACE
Read at 0x00c0008ea240 by goroutine 62:
  runtime.mapaccess1_faststr()
      /usr/local/go/src/runtime/map_faststr.go:12 +0x0
  github.com/nats-io/nats-server/v2/server.(*consumer).processNextMsgRequest()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/consumer.go:2567 +0xa64
(...)
Previous write at 0x00c0008ea240 by goroutine 15:
  runtime.mapdelete_faststr()
      /usr/local/go/src/runtime/map_faststr.go:300 +0x0
  github.com/nats-io/nats-server/v2/server.(*Account).checkForReverseEntry()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/accounts.go:1759 +0x61c
  github.com/nats-io/nats-server/v2/server.(*client).unsubscribe()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/client.go:2838 +0xa27
(...)
```

After fixing this data race, another showed up:
```
==================
WARNING: DATA RACE
Read at 0x00c000352200 by goroutine 99:
  github.com/nats-io/nats-server/v2/server.(*Account).checkForReverseEntry()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/accounts.go:1752 +0x4b3
  github.com/nats-io/nats-server/v2/server.(*client).unsubscribe()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/client.go:2838 +0xa27
(...)
Previous write at 0x00c000352200 by goroutine 92:
  runtime.slicecopy()
      /usr/local/go/src/runtime/slice.go:284 +0x0
  github.com/nats-io/nats-server/v2/server.(*Account).checkForReverseEntry()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/accounts.go:1737 +0x871
  github.com/nats-io/nats-server/v2/server.(*Account).removeRespServiceImport()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/accounts.go:1622 +0x24c
(...)
```

This PR addresses both.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-30 13:51:52 -06:00
Matthias Hanel
1445153130 Adding max stream bytes check (#2970)
* Adding max stream bytes check

Also start checking on stream update.
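A hypothetical Go sketch of a max stream bytes check that runs on both create and update, so an update cannot bypass a limit enforced at creation. The type, the error, and the choice to reject an unlimited `MaxBytes` when a limit is set are assumptions made for this illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// streamConfig is a hypothetical, trimmed-down stream configuration.
type streamConfig struct {
	Name     string
	MaxBytes int64 // <= 0 means "unlimited" in this sketch
}

var errStreamMaxBytesExceeded = errors.New("stream max bytes exceeds limit")

// checkStreamMaxBytes rejects a stream whose MaxBytes is unlimited or larger
// than the per-stream limit; limit <= 0 disables the check. In this sketch
// the same function is called from both the create and the update paths.
func checkStreamMaxBytes(cfg streamConfig, limit int64) error {
	if limit <= 0 {
		return nil
	}
	if cfg.MaxBytes <= 0 || cfg.MaxBytes > limit {
		return errStreamMaxBytesExceeded
	}
	return nil
}

func main() {
	limit := int64(1 << 20)
	created := streamConfig{Name: "ORDERS", MaxBytes: 512 << 10}
	fmt.Println(checkStreamMaxBytes(created, limit)) // <nil>
	updated := streamConfig{Name: "ORDERS", MaxBytes: 2 << 20}
	fmt.Println(checkStreamMaxBytes(updated, limit)) // rejected on update too
}
```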

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-30 15:50:28 -04:00
Derek Collison
083b5efb6c Bump to 2.8.0-beta.5
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-30 09:20:28 -07:00
Derek Collison
b47c946a85 Merge pull request #2975 from nats-io/oc-perf
[IMPROVED] Performance of OrderedConsumer with longer RTT.
2022-03-30 09:19:02 -07:00
Derek Collison
76eaa5ba8b Update catchup as well
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-30 08:58:41 -07:00
Ivan Kozlovic
f0abf4627f Merge pull request #2976 from samuel-form3/add-healthz-logs
[ADDED] Logs to healthcheck handler
2022-03-30 09:21:22 -06:00
Samuel Torres
9868bb71a7 Add logs to healthcheck handler
Kubernetes probes don't use or log the response body of health
endpoints. This means that if, for some reason, a NATS node running in
Kubernetes enters a Not Ready state, we have no way to know why other
than to access the cluster and call the /healthz endpoint manually to
see the error.

This change adds an error log so we can observe what is going wrong with
a nats node that is not ready.

Signed-off-by: Samuel Torres <samuel.torres@form3.tech>
2022-03-30 14:14:22 +01:00
Derek Collison
eb16c35016 OrderedConsumer was very conservative with slow start and small max outstanding bytes. This increases performance for longer RTTs.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-30 05:08:36 -07:00
Derek Collison
7faa8bf5b7 Bump to 2.8.0-beta.4
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-29 19:16:13 -07:00
Derek Collison
76b56b6b0e Fix for a flapping test
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-29 19:15:40 -07:00
Ivan Kozlovic
f388839c59 Merge pull request #2974 from nats-io/fix_races_and_some_tests
Fix races and some tests
2022-03-29 19:37:00 -06:00
Derek Collison
bfc1462fb3 Merge pull request #2973 from nats-io/issue-2936
[IMPROVED] Consumer snapshot logic in clustered mode and disk usage.
2022-03-29 18:29:31 -07:00
Ivan Kozlovic
e9b9c39853 Pin staticcheck to the previous release since we were getting:
```
go: downloading golang.org/x/mod v0.6.0-dev.0.20220106191415-9b9b3d81d5e3
../../../../pkg/mod/honnef.co/go/tools@v0.3.0/go/ir/builder.go:36:2: //go:build comment without // +build comment
The command "go install honnef.co/go/tools/cmd/staticcheck@latest" failed and exited with 1 during
```

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-29 19:11:00 -06:00
Derek Collison
607858f213 Improved consumer snapshot logic in clustered mode and disk usage.
Also fixed a bug that could cause memory based replicated consumers to no longer work after snapshots and server restarts.

The snapshot logic would allow non-state-changing updates to continuously grow the raft logs. We were also too conservative about when and why we snapshotted.
Also added the ability for FileStore.Compact() to reclaim space in the block file from the head of the last changed block.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-29 18:02:49 -07:00
Ivan Kozlovic
98c1f0ecb2 Fixed some data races and some flappers
Got a data race:
```
==================
WARNING: DATA RACE
Write at 0x00c001c736b0 by goroutine 605:
  runtime.mapassign_faststr()
      /home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/map_faststr.go:202 +0x0
  github.com/nats-io/nats-server/v2/server.(*Account).addServiceImport()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/accounts.go:1868 +0xb7b
  github.com/nats-io/nats-server/v2/server.(*Account).AddServiceImportWithClaim()
...
Previous read at 0x00c001c736b0 by goroutine 301:
  runtime.mapaccess2_faststr()
      /home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/map_faststr.go:107 +0x0
  github.com/nats-io/nats-server/v2/server.(*Server).registerSystemImports()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/events.go:1577 +0x284
  github.com/nats-io/nats-server/v2/server.(*Server).updateAccountClaimsWithRefresh()
...
```

Also removed a condition in gateway.go from the check of whether
a subject was a service reply, since it was causing a test to flap.

Finally, used AckSync() in a test (instead of m.Respond(nil)) to
prevent it from flapping.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-29 19:02:41 -06:00
Ivan Kozlovic
953dad4405 Merge pull request #2972 from nats-io/js_lower_default_max_ack_pending
[CHANGED] JetStream: lower default consumer's maximum ack pending
2022-03-29 16:54:52 -06:00
Ivan Kozlovic
e1c581334e [CHANGED] JetStream: lower default consumer's maximum ack pending
The default value is lowered from 20,000 to 1,000. This does not
seem to degrade performance, but may help with scalability.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-29 15:30:40 -06:00
Matthias Hanel
1aeaaf0ca3 Adding server limits (max ack pending/dedupe window) to js config (#2967)
* Adding server limits (max ack pending/dedupe window) to js config

Also shifting consumer config check to jsConsumerCreate as in clustered
mode this was enforced in the wrong place

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-29 13:19:36 -04:00
Matthias Hanel
9ee03aedd1 update message size was too short when nothing needed to be sent (#2968)
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-28 23:37:30 -04:00
Derek Collison
4a2dee125e Bump version to 2.8.0-beta.3
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-28 17:58:06 -07:00
Matthias Hanel
0c5f3688a7 [ADDED] Tiered limits and fix limit issues on updates (#2945)
* Adding tiered limits and fix limit issues on updates

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-28 20:47:54 -04:00