Cross-Region RPC

When Sentry runs as a multi-region deployment (like sentry.io), many scenarios and workflows require a region silo to access information that is stored in the Control Silo. Similarly, there are flows where Control Silo operations need to read or mutate state stored in regions.

When data needs to be read or mutated synchronously, we use Remote Procedure Calls (RPC).

Design Context

Before we get into specifics, it is useful to understand the requirements that the system was designed under. We needed the following attributes in our solution:

  • Consistent parameter and return types in local development, CI, and multi-region production.
  • Serialization and transport concerns hidden from most application logic, because the transport depends on how the application is deployed.
  • Strong support for Python typing.
  • Solid support for schema generation and validation tooling.

Components of the RPC framework

The cross-region RPC framework is composed of several RPC services and RPC models. Each RPC service provides a bundle of methods for one product domain. For example, organizations, users, and integrations are separate RPC services.

Each RPC service is composed of a service interface and a local implementation.

from abc import abstractmethod

from sentry.services.hybrid_cloud.organization.model import RpcOrganizationSummary
from sentry.services.hybrid_cloud.region import ByOrganizationSlug
from sentry.services.hybrid_cloud.rpc import RpcService, regional_rpc_method
from sentry.silo.base import SiloMode


class OrganizationService(RpcService):
    # Service name used in URLs and method indexes.
    key = "organization"
    # Silo mode in which the local implementation runs.
    local_mode = SiloMode.REGION

    @classmethod
    def get_local_implementation(cls) -> RpcService:
        # Imported lazily so the interface module does not depend on the implementation.
        from sentry.services.hybrid_cloud.organization.impl import (
            DatabaseBackedOrganizationService,
        )

        return DatabaseBackedOrganizationService()

    @regional_rpc_method(
        resolve=ByOrganizationSlug(),
        return_none_if_mapping_not_found=True,
    )
    @abstractmethod
    def get_org_by_slug(
        self,
        *,
        slug: str,
        user_id: int | None = None,
    ) -> RpcOrganizationSummary | None:
        ...

Service classes act as stubs that define the interface a service will have. In the above example, we see a few important pieces of the RPC machinery:

  • key defines the service name that is used in URLs and as a key when building method indexes.
  • local_mode defines the silo mode in which this service runs its ‘local implementation’. Services can have local implementations in either silo mode.
  • get_local_implementation is used by the RPC machinery to find the implementation service when the silo mode matches or is MONOLITH.

RPC methods like get_org_by_slug must be defined as abstractmethod and must have either rpc_method or regional_rpc_method applied. If a service has local_mode = SiloMode.REGION, its methods should use regional_rpc_method with a resolve ‘resolver’. There are several resolvers that accommodate a variety of method call signatures:

  • ByOrganizationSlug extracts the organization slug parameter (slug in the example above) and uses it to locate the organization’s region using sentry_organizationmapping.
  • ByOrganizationId extracts the organization_id parameter and uses it to locate the organization’s region using sentry_organizationmapping.
  • ByRegionName uses the region_name parameter to choose a destination region.
  • ByOrganizationIdAttribute(param) extracts organization_id from the param argument to resolve the region.
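
To make the resolver idea concrete, here is a minimal sketch of how a resolver might turn method arguments into a destination region. It is not Sentry's implementation: ORG_MAPPING stands in for the sentry_organizationmapping table, and BySlugSketch, ByIdSketch, RegionMappingNotFound and resolve_region are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for sentry_organizationmapping:
# organization slug -> (organization id, region name).
ORG_MAPPING = {
    "acme": (1, "us"),
    "globex": (2, "de"),
}


class RegionMappingNotFound(Exception):
    """Raised when no mapping row matches the method arguments."""


@dataclass(frozen=True)
class BySlugSketch:
    """Toy resolver that reads a slug keyword argument."""

    parameter_name: str = "slug"

    def resolve(self, arguments: dict) -> str:
        slug = arguments[self.parameter_name]
        if slug not in ORG_MAPPING:
            raise RegionMappingNotFound(slug)
        return ORG_MAPPING[slug][1]


@dataclass(frozen=True)
class ByIdSketch:
    """Toy resolver that reads an organization_id keyword argument."""

    parameter_name: str = "organization_id"

    def resolve(self, arguments: dict) -> str:
        org_id = arguments[self.parameter_name]
        for mapped_id, region in ORG_MAPPING.values():
            if mapped_id == org_id:
                return region
        raise RegionMappingNotFound(org_id)


def resolve_region(
    resolver, arguments: dict, return_none_if_mapping_not_found: bool = False
) -> Optional[str]:
    """Return the region name, or None when a missing mapping is tolerated.

    In this sketch, None tells the caller to skip the remote call and
    return None from the RPC method instead of raising.
    """
    try:
        return resolver.resolve(arguments)
    except RegionMappingNotFound:
        if return_none_if_mapping_not_found:
            return None
        raise
```

With this sketch, resolve_region(BySlugSketch(), {"slug": "acme"}) selects the "us" region, while an unknown slug either raises or yields None depending on the flag.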

The implementation for get_org_by_slug looks like:

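from sentry.models.organization import Organization, OrganizationStatus
from sentry.services.hybrid_cloud.organization.model import RpcOrganizationSummary
# Assumed import path for the serializer used below.
from sentry.services.hybrid_cloud.organization.serial import serialize_organization_summary
from sentry.services.hybrid_cloud.organization.service import OrganizationService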

class DatabaseBackedOrganizationService(OrganizationService):

    def get_org_by_slug(
        self,
        *,
        slug: str,
        user_id: int | None = None,
    ) -> RpcOrganizationSummary | None:
        query = Organization.objects.filter(slug=slug)
        if user_id is not None:
            query = query.filter(
                status=OrganizationStatus.ACTIVE,
                member_set__user_id=user_id,
            )
        try:
            return serialize_organization_summary(query.get())
        except Organization.DoesNotExist:
            return None

RPC method implementations are ordinary Python methods, but there are a few rules that should be followed:

  1. All parameters and return values must be either simple values or RpcModel instances. This ensures that requests and responses can be serialized.
  2. Return values should be scalar values or RpcModel instances. Avoid tuples, and use lists only for sequences of objects.
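
The following sketch illustrates why these rules matter once values cross a JSON boundary. It uses a standard-library dataclass as a stand-in for an RpcModel; OrgSummarySketch, encode_response and decode_response are hypothetical names, and the real framework relies on pydantic instead.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class OrgSummarySketch:
    """Stand-in for an RpcModel: only simple, JSON-friendly fields."""

    id: int
    slug: str
    name: str


def encode_response(value: Optional[OrgSummarySketch]) -> str:
    """Serialize a model (or None) into a JSON string."""
    return json.dumps(None if value is None else asdict(value))


def decode_response(payload: str) -> Optional[OrgSummarySketch]:
    """Rebuild the typed model on the calling side."""
    data = json.loads(payload)
    return None if data is None else OrgSummarySketch(**data)


# A model survives the round trip with its type and fields intact.
summary = OrgSummarySketch(id=1, slug="acme", name="Acme")
round_tripped = decode_response(encode_response(summary))

# A tuple does not: JSON has no tuple type, so it comes back as a list.
tuple_round_tripped = json.loads(json.dumps((1, "acme")))
```

Here round_tripped compares equal to summary, whereas tuple_round_tripped is a list, which is one reason a tuple return value would not keep the same type in every deployment mode.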

RPC Authentication

All cross-region RPC requests go to a single endpoint, /api/0/internal/rpc/:service/:method. The RPC endpoint uses HMAC-based signature comparison to validate authenticity. Before a request is made, an HMAC signature of the request body is created using a secret that is shared by region and control silo instances.

When an RPC message is received, the signature header is compared with a locally generated HMAC using the receiver’s secret. The request is only processed if the signatures match.
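
The signing and verification steps can be sketched with Python's standard hmac module. The secret value, the choice of SHA-256, the hex encoding and the shape of the request body are assumptions made for this example; the source does not specify them.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret; in a real deployment this comes from configuration.
SHARED_SECRET = b"example-shared-secret"


def sign_body(body: bytes, secret: bytes) -> str:
    """Create a hex-encoded HMAC signature for a request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_body(body: bytes, signature: str, secret: bytes) -> bool:
    """Recompute the signature locally and compare in constant time."""
    expected = sign_body(body, secret)
    return hmac.compare_digest(expected, signature)


# Sender side: serialize the arguments and sign the exact bytes that are sent.
body = json.dumps({"args": {"slug": "acme"}}).encode("utf-8")
signature = sign_body(body, SHARED_SECRET)

# Receiver side: accept only if the locally generated signature matches.
is_authentic = verify_body(body, signature, SHARED_SECRET)
```

Using hmac.compare_digest rather than == avoids leaking timing information about how many leading characters of the signature matched.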

RPC serialization and transport

Cross-region RPC uses Pydantic and OpenAPI to provide type-hinted parameter and response values, and to fulfill our requirements for schema generation and validation. How RPC methods behave varies based on how the application is configured.

In monolith mode

  • RPC methods are invoked synchronously on the implementation service.

In tests

  • During tests we emulate as much of the RPC stack as we can.
  • Request bodies and responses are serialized as JSON.
  • Requests are made with django.test.Client to the RPC endpoint.

In siloed mode

  • If a service is called from its local silo mode, the implementation is invoked directly; no serialization or network requests are made.
  • Otherwise, request bodies and responses are serialized as JSON.
  • Remote requests are made using the requests library and the region configuration.
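
The mode-dependent behavior described above can be sketched as a small dispatcher. Everything here is illustrative: Mode, LocalOrgServiceSketch, fake_transport and call_rpc are hypothetical names, the {"args": ...} and {"value": ...} envelopes are assumed, and fake_transport replaces the real HTTP request so the example stays self-contained.

```python
import json
from enum import Enum


class Mode(Enum):
    MONOLITH = "monolith"
    REGION = "region"
    CONTROL = "control"


class LocalOrgServiceSketch:
    """Stand-in for a local implementation service."""

    def get_org_by_slug(self, *, slug: str) -> dict:
        return {"slug": slug, "name": slug.title()}


# Records every simulated network call so the two paths can be told apart.
TRANSPORT_CALLS = []


def fake_transport(service_key: str, method_name: str, body: str) -> str:
    """Simulates the receiving silo: decode JSON, run the method, encode JSON."""
    TRANSPORT_CALLS.append((service_key, method_name))
    arguments = json.loads(body)["args"]
    result = getattr(LocalOrgServiceSketch(), method_name)(**arguments)
    return json.dumps({"value": result})


def call_rpc(current_mode: Mode, service_local_mode: Mode, method_name: str, **kwargs):
    """Invoke locally when possible, otherwise serialize and go remote."""
    if current_mode in (Mode.MONOLITH, service_local_mode):
        # Monolith or matching silo: a plain method call, no serialization.
        return getattr(LocalOrgServiceSketch(), method_name)(**kwargs)
    # Other silo: arguments and the response both cross a JSON boundary.
    body = json.dumps({"args": kwargs})
    response = fake_transport("organization", method_name, body)
    return json.loads(response)["value"]
```

Calling call_rpc(Mode.CONTROL, Mode.REGION, "get_org_by_slug", slug="acme") goes through the simulated transport, while the same call with Mode.REGION or Mode.MONOLITH as the current mode runs the method directly and returns the same value.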

Adding and changing RPC methods

RPC methods must maintain backwards compatibility between deploys. Because deploys are not atomic, and updating all regions takes time, making changes to RPC methods requires additional care.

Adding RPC methods

You cannot add an RPC method and call it in the same pull request or deploy. Doing so can result in callers being updated before the receiving side has been updated. Instead, do the following:

  1. Deploy the new RPC method to all regions.
  2. Deploy the callers of the new RPC method.

Adding parameters to RPC methods

You cannot add new required parameters to existing RPC methods. To add a new parameter to an RPC method, do the following:

  1. Deploy the new parameter with a default value.
  2. Update all callsites to provide a value for the parameter. Deploy those changes.
  3. Make the parameter required by removing the default value.
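
A plain-Python analogy shows why the default value has to ship first. ServiceBeforeChange and ServiceAfterStep1 model a receiver before and after step 1, and include_teams is a hypothetical parameter invented for this sketch; in the real framework the mismatch would surface in the RPC layer instead of as a direct TypeError.

```python
class ServiceBeforeChange:
    """Receiver that has not yet been deployed with the new parameter."""

    def get_org_by_slug(self, *, slug: str) -> dict:
        return {"slug": slug}


class ServiceAfterStep1:
    """Receiver after step 1: the new parameter exists and has a default."""

    def get_org_by_slug(self, *, slug: str, include_teams: bool = False) -> dict:
        result = {"slug": slug}
        if include_teams:
            result["teams"] = []
        return result


# Arguments sent by callers before and after step 2.
old_caller_args = {"slug": "acme"}
new_caller_args = {"slug": "acme", "include_teams": True}

# Old callers keep working against the updated receiver.
old_caller_result = ServiceAfterStep1().get_org_by_slug(**old_caller_args)

# New callers fail against a receiver that has not been updated yet.
try:
    ServiceBeforeChange().get_org_by_slug(**new_caller_args)
    new_caller_failed = False
except TypeError:
    new_caller_failed = True
```

Because regions are updated at different times, both combinations can occur during a rollout, which is why the receiver change is deployed before any caller starts passing the new argument.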

Removing an RPC method

  1. Remove all callers of the RPC method. Deploy those changes.
  2. Once there are no callers, the RPC method is safe to remove.

Removing parameters from an RPC method

  1. Make the parameter optional. Deploy that change.
  2. Remove all usage of the parameter. Deploy those changes.
  3. Remove the parameter.

Renaming RPC methods or parameters

Renaming methods and parameters requires adding a new method, shifting callers to the new method, and then removing the old method.
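
One possible way to keep both names available during the transition is to let the old method delegate to the new one. This is a sketch of the idea and not a documented Sentry pattern; OrganizationServiceSketch and get_organization_by_slug are hypothetical names.

```python
class OrganizationServiceSketch:
    """Illustrative service during a rename from get_org_by_slug."""

    def get_organization_by_slug(self, *, slug: str) -> dict:
        # New name: deployed first, before any caller uses it.
        return {"slug": slug}

    def get_org_by_slug(self, *, slug: str) -> dict:
        # Old name: kept until no callers remain, then removed.
        return self.get_organization_by_slug(slug=slug)


service = OrganizationServiceSketch()
old_result = service.get_org_by_slug(slug="acme")
new_result = service.get_organization_by_slug(slug="acme")
```

Both names return the same result while callers migrate, so each deploy in the sequence stays backwards compatible.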
