From 355dc40016261a11bf43a8b56e6a72e91b1a9723 Mon Sep 17 00:00:00 2001
From: Stephen Asbury
Date: Mon, 20 May 2019 12:43:12 -0700
Subject: [PATCH] Added whats new for 2.0

---
 SUMMARY.md                |  1 +
 whats_new/whats_new_20.md | 63 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+)
 create mode 100644 whats_new/whats_new_20.md

diff --git a/SUMMARY.md b/SUMMARY.md
index eddbe06..421072e 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -1,6 +1,7 @@
 # Summary
 * [Introduction](README.md)
+* [What's New in 2.0](whats_new/whats_new_20.md)
 ## NATS Server

diff --git a/whats_new/whats_new_20.md b/whats_new/whats_new_20.md
new file mode 100644
index 0000000..e7c8316
--- /dev/null
+++ b/whats_new/whats_new_20.md
@@ -0,0 +1,63 @@

# What's New in NATS 2.0

NATS 2.0 is the largest feature release since the original code base for the server was released. It includes features for security, multi-tenancy, larger networks, and secure sharing, allowing a new way of thinking about NATS as a globally shared utility and offering a new and innovative way to solve on-premise communications for modern systems design.

## Multi-Tenancy and Accounts

Ever since NATS was originally designed, it has been considered single tenant. Even when we added support for multiple users, the subject space was still shared, and care had to be taken when designing subject spaces so as not to interfere with other applications and services on the NATS system. In fact, designing the subject space, and most importantly its prefixes based on organization, business unit, or business function/service, was encouraged as an early exercise. However, even with the best intentions, large NATS systems became overly complex, and managing and understanding the subject space was challenging at scale. Even as we introduced authorization primitives to control which subjects users could publish or subscribe to, the challenge remained.

NATS 2.0 introduces accounts. Accounts can be viewed as containers for a subject space. Instead of a single global subject space bound to the server, which defined the data structures the server used to match publishers to subscribers, NATS 2.0 now recognizes multiple accounts, including a default global account, $G. These subject spaces are isolated, and by default no messages cross from one account to another. This allows multiple groups or organizations to use a simplified subject space that, in the past, could have overlapped with others in a larger installation. It also allows a single NATS infrastructure to be utilized by different users without the need to maintain many silos of NATS servers and clusters.

An account binds the subject space data structures, a sublist and its associated caches, to a list of users. A client application is not aware of this account binding; this is by design, to preserve existing clients and the simplicity of the NATS philosophy. When a client connects to a NATS server, the user is authenticated and validated, and the configuration tells the server which account to bind to that user. Messages then flow as normal, with published messages matched to subscribers.
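To make the isolation concrete, here is a minimal Go sketch. It assumes a NATS 2.0 server that has already been configured with two accounts, each containing one of the (hypothetical) users below; the client code itself never mentions accounts.

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	// Assumes a NATS 2.0 server configured with two accounts, each containing
	// one of these users. The user names and passwords are placeholders.
	acctA, err := nats.Connect("nats://localhost:4222", nats.UserInfo("user-a", "s3cr3t"))
	if err != nil {
		log.Fatal(err)
	}
	defer acctA.Close()

	acctB, err := nats.Connect("nats://localhost:4222", nats.UserInfo("user-b", "s3cr3t"))
	if err != nil {
		log.Fatal(err)
	}
	defer acctB.Close()

	// Both accounts can use the exact same subject without interfering.
	acctA.Subscribe("orders.new", func(m *nats.Msg) {
		fmt.Println("account A received:", string(m.Data))
	})
	acctB.Subscribe("orders.new", func(m *nats.Msg) {
		fmt.Println("account B received:", string(m.Data))
	})
	acctA.Flush()
	acctB.Flush()

	// Only the subscriber bound to account A sees this message.
	acctA.Publish("orders.new", []byte("order 22"))
	acctA.Flush()

	time.Sleep(250 * time.Millisecond)
}
```

Only the subscriber bound to the publishing user's account receives the message; the subscriber in the other account never sees it, even though both use the same subject.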
## Streams and Services - Account Sharing

With the introduction of secure multi-tenancy in NATS 2.0, the desire for accounts expanded beyond just isolation. The NATS core team felt accounts could also be used to simplify and ease the use of NATS for the many applications and services that want to communicate over certain subjects. Think of this as the microservices equivalent for NATS applications. NATS 2.0 defines two abstract concepts that form the basis of sharing messages across accounts: Streams and Services. Streams are published messages defined by a subject, which could include a wildcard. Services are listening endpoints bound to subscribers that usually receive requests and return responses.

Both Streams and Services can be exported by an account owner for subsequent import by other accounts. Two aspects of export and import make account sharing even more powerful. First, exports can be public, meaning anyone can import them, or they can require authorization: an account that wants to import the stream or service must have permission from the exporting account owner for messages to flow between accounts. Second, the account owner always controls their own subject space, so when importing, the owner can pick and choose where the import will show up in their own subject space. This is a very powerful concept that opens up tremendous opportunities.

For instance, NGS, a commercial offering from Synadia, uses the power of accounts and sharing to provide a service, imported by all new users of the system, that lets them query their current usage. As users connect and disconnect from the system, the usage servers in NGS gather the events from the system account and calculate real-time usage information for each account. There is also a usage demo account that can export a single service with a wildcard, ____. Anyone on the demo account can send a request to **ngs.usage.ACCOUNT_PUB_NKEY** with the body of the message being a duration, e.g. 1h, 2d, etc., and receive the usage for that time period for that account. Although this usage data is available in NGS, no one actually has access to the demo account, since that would mean anyone on the system could get the usage for any known account just by knowing the account's public key, which would not be desirable. The export on ____ requires authorization, so no one can just blindly import the usage data. When a new account registers on the NGS system, the system obtains a signed authorization token for the ability to import ONLY that account's usage, e.g. ****. Moreover, the new account owner is allowed to specify where the import is placed, and in this case we place it on ****. Thus every account can send a simple request to **** and receive its own, and only its own, account usage. All accounts look exactly the same. In this example we utilized three (3) accounts to essentially create secure domains, or DMZs, to provide the security and control we want to achieve.
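A rough sketch of the client side of that flow might look like the Go snippet below. The server URL, credentials file, and account key are assumptions for illustration; the only details taken from the description above are the subject shape, **ngs.usage.ACCOUNT_PUB_NKEY**, and a duration such as 1h as the request body.

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	// Hypothetical credentials for a user on the demo account; the URL is the
	// assumed NGS endpoint.
	nc, err := nats.Connect("connect.ngs.global", nats.UserCredentials("demo-user.creds"))
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// Placeholder for the public nkey of the account whose usage is requested.
	accountPublicKey := "ACCOUNT_PUB_NKEY"

	// Request usage for the last hour; the body is a duration such as 1h or 2d.
	msg, err := nc.Request("ngs.usage."+accountPublicKey, []byte("1h"), 2*time.Second)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("usage:", string(msg.Data))
}
```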
Under the covers, the NGS system does several things to pull this off. For streams, any imports that are present are checked when the importing account creates a subscription. If part of the subscription is covered by an import, the system creates a secure shadow subscription in the exporting account's subject space for the exported subject. This maps messages sent to the original export subject (or a subset scope if it contains wildcards) into the importing account. Services are a bit trickier: in essence, you are importing and mapping a subject that the importing account can publish to. In our example above with ****, this is exactly what happens. When a new account publishes a message to ****, the system maps that message into the exporting account's subject space. Doing only this would be insufficient, however, since most of the time the published message is a request, meaning it has a reply subject and expects a response. The system needs to ensure that the response can also cross the account boundary in a secure and meaningful way. NATS 2.0 goes further and anonymizes the reply subject such that the responder has no way of knowing the requestor (unless the body of the message gives a clue). NATS knows the requestor is guaranteed to be authorized, but there is no other way to identify or track the requestor. The system sets up ephemeral state to map the random, anonymous reply subject back into the importing account with the correct reply subject.

We believe that as time goes on, accounts, and more specifically account sharing of streams and services, will be the most influential feature we introduced in the NATS 2.0 release. It will change the way people architect systems on technologies like NATS. It will allow systems to scale while maintaining simplicity and security in a way that was not possible before.

## Security and Authentication

Prior to NATS 2.0, NATS allowed multi-user deployments and supported username and password authentication. As NATS evolved we added support for bcrypt-based passwords and encouraged NATS users to utilize the provided tool to avoid having any plaintext passwords in their configuration files. However, this still felt insufficient. For NATS 2.0 we wanted a system in which the server never has any access whatsoever to private keys or other secrets. The most common way to do this is with public key technologies, which make it possible to prove one's identity without ever sharing the private key of a public/private key pair. The team chose [Ed25519](https://ed25519.cr.yp.to/index.html) as the key algorithm for NATS 2.0. We may add others in the future, but our selection was deliberate in that Ed25519 can only be used to sign, and it has very few branches in its code, providing a higher level of resistance to side-channel attacks like Spectre and Meltdown. NATS uses an encoding of these keys that makes them easily distinguishable and selectable from a copy/paste perspective.

Upon client connect, the NATS 2.0 server passes a nonce field in the INFO protocol message. The client signs this nonce with its private key and returns the signature. Servers are configured with the authorized users' public keys only, and a public key is enough to validate that the client holds the corresponding private key. Combined with TLS, this provides a powerful way to secure and manage NATS clients and users without the server ever having access to a client's private key.
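Client libraries perform this nonce signing automatically (for example, via the `nats.Nkey` option in the Go client), but the handshake is easy to see with the [nkeys](https://github.com/nats-io/nkeys) library. The sketch below uses a made-up nonce: the client signs with its private key, and the "server" verifies using nothing but the configured public key.

```go
package main

import (
	"fmt"
	"log"

	"github.com/nats-io/nkeys"
)

func main() {
	// Client side: create (or load) an Ed25519 user key pair. Only the public
	// key, which starts with 'U', is ever shared with the server operator.
	user, err := nkeys.CreateUser()
	if err != nil {
		log.Fatal(err)
	}
	pub, _ := user.PublicKey()
	fmt.Println("user public key:", pub)

	// Server side: sends a random nonce in the INFO protocol message.
	nonce := []byte("tGSbZaKgcXiJzpB7") // illustrative nonce

	// Client side: signs the nonce with the private key and returns the signature.
	sig, err := user.Sign(nonce)
	if err != nil {
		log.Fatal(err)
	}

	// Server side: verifies the signature using only the configured public key.
	verifier, err := nkeys.FromPublicKey(pub)
	if err != nil {
		log.Fatal(err)
	}
	if err := verifier.Verify(nonce, sig); err != nil {
		log.Fatal("signature did not verify: ", err)
	}
	fmt.Println("nonce signature verified")
}
```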
In addition, NATS 2.0 allows the system to extract a user identity from an x.509 client certificate. For those who require client certificates and already have the management infrastructure for client-side certificates in place, you will no longer have to duplicate identity information to map to authorization primitives in NATS 2.0.

## Network Topologies (Rewrite for Routes and Introduction of Gateways)

With the introduction of accounts, the existing clustering protocol for NATS needed to be rewritten. The protocol had some inefficiencies we had wanted to correct for quite some time, and the need to introduce account scoping into the clustering protocol gave us an opportunity to correct some of these wrongs. Each message traveling between servers now has to designate which account it is bound to, that is, the account under which it was published. This allows remote servers to match the account and subsequently match subscribers from that account for any given message.

Two inefficiencies existed in the original protocol: the handling of duplicates in the interest graph, and message delivery across a route in the presence of both normal subscribers and queue-based subscribers. Subscriptions (the interest graph) were always sent across a route to the other side, even when the subjects were the same. The other side was smart enough not to send a message more than once for a given subject, but the interest graph traffic could get very chatty even under normal behavior. We would also send a message again between servers for every queue group that existed for that message. This had served us well to date, but the team wanted to update and improve it.

In NATS 2.0 we have greatly reduced the interest graph traffic for duplicate subjects, and there are more optimizations we can make without needing to change the protocol. We also send a message only once, regardless of whether the interest is represented by normal or queue-based subscribers. The way these two features work is fairly straightforward. For interest propagation, we keep a suppression map that uses reference counts to send interest for a given subject only when it goes from 0->1 and when it goes from 1->0. All other changes in the local cluster simply increment or decrement the count in the map. For sending a message only once in the face of any interest, the cluster protocol now requires an affirmative list of queue groups to which the receiving server can possibly deliver the message. This allows distributed queues to work correctly without requiring “directed” message sends between servers.
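The reference counting is simple enough to sketch in a few lines. The Go snippet below is purely illustrative (it is not the server's actual data structure); it only signals that an interest change should be propagated on the 0->1 and 1->0 transitions.

```go
package main

import "fmt"

// interestMap is an illustrative, reference-counted view of local interest.
// A subscription is only propagated to other servers when a subject's count
// goes from 0 to 1, and loss of interest only when it drops back to 0.
type interestMap struct {
	counts map[string]int
}

func newInterestMap() *interestMap {
	return &interestMap{counts: make(map[string]int)}
}

// subscribe returns true when interest for the subject must be propagated.
func (im *interestMap) subscribe(subject string) bool {
	im.counts[subject]++
	return im.counts[subject] == 1
}

// unsubscribe returns true when loss of interest must be propagated.
func (im *interestMap) unsubscribe(subject string) bool {
	if im.counts[subject] == 0 {
		return false
	}
	im.counts[subject]--
	if im.counts[subject] == 0 {
		delete(im.counts, subject)
		return true
	}
	return false
}

func main() {
	im := newInterestMap()
	fmt.Println(im.subscribe("orders.>"))   // true:  0 -> 1, tell the route
	fmt.Println(im.subscribe("orders.>"))   // false: duplicate, suppressed
	fmt.Println(im.unsubscribe("orders.>")) // false: one subscriber remains
	fmt.Println(im.unsubscribe("orders.>")) // true:  1 -> 0, tell the route
}
```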
NATS 2.0 also introduces a new connection type called gateways. Gateways allow multiple NATS clusters to be connected together into a supercluster. The team faced many challenges in the design of gateways. We wanted the abstraction to look fully connected, but we did not want an explosion of connections at large scale. We were also concerned with interest graph traffic that would need to propagate globally even when changes were actually local to a cluster.

Gateway connections are either inbound or outbound, but not both. Every server in an origin cluster has exactly one outbound connection over which it can send messages to the remote cluster. In addition, it may have zero or more inbound connections on which it sends interest and receives messages from the remote cluster. This forms a type of full mesh, but with far fewer connections per server, and of course it avoids any SPOF scenario where only one server is selected to perform communications between the clusters.

Outbound connections can be in one of two states: optimistic mode or full knowledge. With full knowledge, all interest graph changes need to be propagated globally for every 0->1 or 1->0 change, which can result in quite a few wasted control messages. The team therefore designed connections to start in optimistic send mode. In this mode, a message is sent across the gateway if we have no information at all about the message's subject. If the remote side truly does not have any interest, it can asynchronously send a message indicating so, either for the subject itself or for the whole account. Obviously, there are edge cases where this sends more messages (both data and control) than it saves, so a connection can change its state and move to full knowledge, meaning it wants to know all interest and will not send a message unless it knows that it should. We are continually tweaking this algorithm, but we are extremely excited about the power it has to create large superclusters. Synadia uses this technology in its NGS offering.

Gateways and superclusters support distributed queue subscriptions just as clusters do today. However, we have added selection criteria that favor queue subscribers in the cluster local to the publisher, and only cross a gateway connection if no local queue subscribers for a given group are available. We feel this is a powerful feature on top of the transparent horizontal scalability distributed queues already give us. Combined with the recently introduced drain capabilities, which allow queue subscribers to exit a group without dropping any messages, this is an extremely powerful way to design scalable service endpoints for modern systems. It also provides zero-configuration, highly available failover across multiple locations - entirely transparent to clients.

## System Account

With NATS 2.0, NATS servers can now send and receive messages themselves; prior to 2.0, they could only send and receive messages on behalf of connected clients. Servers can now generate events, such as client connect and disconnect events, and can receive messages and respond to requests. This is exciting for the team because it enables many things that previously would have required protocol changes. In addition, this internal messaging is bound to an account, just as all users are in NATS 2.0, so all of the security, sharing of streams and services, and mapping of subject spaces applies to the system account just as it does to any other user and account. Synadia's NGS system uses the system account to enforce limits globally and to generate events that allow us to track usage. It also allows push-based updates in our decentrally managed system, described below.

## Decentralized Management

As systems get larger, management through config files alone becomes challenging and brittle. With the introduction of accounts and superclusters, we want to encourage large, shared NATS clusters that can power large organizations, even entire companies, and possibly a global utility. Config file-based management would not suffice.

In NATS 2.0 the system can be configured to be completely decentrally managed. In addition to NKeys (Ed25519 keys), we introduce JWTs (signed only by NKeys) and a hierarchy built on a root of trust. The system understands operators, accounts, and users, each represented by a JWT. The operator JWT is self-signed and is the only thing required to be configured on each server. Account owners have JWTs signed by an operator, and they in turn generate user JWTs, which are self-contained in terms of identity and authorization.
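A minimal sketch of that trust chain, using the [nkeys](https://github.com/nats-io/nkeys) and [jwt](https://github.com/nats-io/jwt) libraries, is shown below. The names are purely illustrative, and the in-memory key handling is for demonstration only; real deployments persist the seeds and typically use tooling such as `nsc` to manage them.

```go
package main

import (
	"fmt"
	"log"

	"github.com/nats-io/jwt"
	"github.com/nats-io/nkeys"
)

func main() {
	// One Ed25519 key pair per level of the trust hierarchy.
	operatorKP, err := nkeys.CreateOperator()
	if err != nil {
		log.Fatal(err)
	}
	accountKP, _ := nkeys.CreateAccount()
	userKP, _ := nkeys.CreateUser()

	operatorPub, _ := operatorKP.PublicKey()
	accountPub, _ := accountKP.PublicKey()
	userPub, _ := userKP.PublicKey()

	// The operator JWT is self-signed; servers are configured with it.
	operatorJWT, err := jwt.NewOperatorClaims(operatorPub).Encode(operatorKP)
	if err != nil {
		log.Fatal(err)
	}

	// The account JWT is signed by the operator.
	accountClaims := jwt.NewAccountClaims(accountPub)
	accountClaims.Name = "ACME" // illustrative account name
	accountJWT, err := accountClaims.Encode(operatorKP)
	if err != nil {
		log.Fatal(err)
	}

	// The user JWT is signed by the account and carries the user's identity
	// and authorization with it.
	userClaims := jwt.NewUserClaims(userPub)
	userClaims.Name = "acme-service" // illustrative user name
	userJWT, err := userClaims.Encode(accountKP)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("operator JWT:", operatorJWT)
	fmt.Println("account JWT:", accountJWT)
	fmt.Println("user JWT:", userJWT)
}
```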
When a server receives a client connection request, the client presents its user JWT (signed by an account) and the nonce signed with its private key. The server validates the user JWT and checks the nonce signature; if both are valid, it uses a resolver to pull the account JWT based on the account public key found in the user JWT. It then validates the account JWT and finally checks that the account JWT was signed by a signing key owned by the operator with which the server was configured.

This provides a very scalable yet very secure mechanism that allows permissions, authentication, limits, and more to change rapidly across a globally shared NATS system.

## LeafNodes - The best of both worlds

We are also introducing leaf nodes. A leaf node is a NATS server, or a small NATS cluster, running in a designated mode that connects to another cluster or supercluster. It securely and transparently binds to an account on the remote cluster and bridges communications between local clients and those on the remote cluster. This allows existing clients and systems to integrate with NATS 2.0 systems, with their increased security, accounts, and sharing, with no changes to an existing NATS application. For instance, we could run a leaf node server that allows existing applications, with their own authorization, to communicate with new clients that are secured by accounts and NKeys, utilizing accounts for communication isolation and sharing as the interface between the systems. With NATS 2.0 and a local leaf node server (a normal NATS 2.0 server) you could import streams and services and have them accessed by current applications with their own authentication (or none, if physical security is in place). This means you can transparently use NGS, our global utility, and combine it with on-premise servers and clusters. You could securely and transparently use NGS to connect different regions, or access meaningful streams and services as they become available on NGS. Leaf nodes utilizing accounts effectively create a DMZ between a local or private NATS deployment and NGS or other external NATS clusters and superclusters.
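To illustrate the "no changes to the client" point, here is a minimal sketch. It assumes a local leaf node server running on localhost that has been configured, in its own server configuration, to connect upstream and bind to a remote account; the subject used is hypothetical.

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	// An existing application connects to the local leaf node exactly as it
	// always has - no account, JWT, or nkey configuration in the client.
	nc, err := nats.Connect("nats://localhost:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// The leaf node holds the credentials for the remote account and bridges
	// traffic, so this request can be answered by a service imported into
	// that account. The subject is a placeholder.
	msg, err := nc.Request("acme.inventory.lookup", []byte("sku-42"), 2*time.Second)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("response:", string(msg.Data))
}
```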