AWS High Availability Cluster

Posit Package Manager can be configured to run on AWS in a High Availability (HA) cluster configuration for a non-air-gapped environment. In this architecture, Posit Package Manager can handle a large number of users and offers the reliability of an HA deployment.

This configuration is suitable for teams of hundreds of data scientists who want or require high availability within their organization.

Most companies don’t need this configuration unless they have many concurrent package uploads and downloads or are required to run in HA for compliance reasons. For small teams that don’t require HA, a single-node instance of Package Manager is more suitable.

Architecture Overview

This implementation deploys Posit Package Manager in a High Availability configuration. It uses:

  • An AWS Application Load Balancer (ALB) for ingress.
  • Two EC2 instances running in HA.
  • An S3 bucket to act as Posit Package Manager’s object storage.
  • An RDS instance that includes a Postgres database for Posit Package Manager metadata.

Architecture Diagram

Diagram of Package Manager configuration running in HA on AWS

Sizing and Performance

Nodes

Posit Package Manager can be run across two EC2 instances in a clustered configuration. In our testing with c5.4xlarge instances, this configuration served 30 million package installs per month, or about 1 million package installs per day. It can also handle 100 Git builders concurrently building packages from Git repositories.

Note

Each Posit Package Manager user could be downloading dozens or hundreds of packages a day. There are also other usage patterns such as an admin uploading local packages or the server building packages for Git builders, but package installations give a good idea of what load and throughput this configuration can handle. This is the configuration that the Posit Public Package Manager service runs, so we don’t anticipate any individual customer needing to scale beyond this configuration.
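To put the tested figures in perspective, the sketch below converts the daily install volume into an average request rate and estimates what share of that volume a given team might generate. The function names are illustrative, and the team size and per-user install count are hypothetical inputs rather than measurements.

```python
# Back-of-the-envelope sizing; the tested figure comes from the text above.
SECONDS_PER_DAY = 24 * 60 * 60
TESTED_INSTALLS_PER_DAY = 1_000_000  # roughly 30 million per month


def average_installs_per_second(installs_per_day: float) -> float:
    """Average install rate implied by a daily total (ignores peak hours)."""
    return installs_per_day / SECONDS_PER_DAY


def share_of_tested_capacity(users: int, installs_per_user_per_day: float) -> float:
    """Fraction of the tested daily volume that a team would generate."""
    return (users * installs_per_user_per_day) / TESTED_INSTALLS_PER_DAY


if __name__ == "__main__":
    rate = average_installs_per_second(TESTED_INSTALLS_PER_DAY)
    print(f"{rate:.1f} installs/s on average")
    # Hypothetical team: 500 users, 200 installs each per day.
    share = share_of_tested_capacity(500, 200)
    print(f"{share:.0%} of the tested daily volume")
```

The tested volume works out to an average of about 11.6 installs per second, and the hypothetical 500-user team would account for roughly 10% of it. Real traffic is bursty, so treat the average as a lower bound on peak load.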

Database

This configuration uses RDS with Postgres on a db.t3.xlarge instance with 100 GB of storage. This is a very generous configuration. In our testing, the Postgres database handled more than 1,000,000 package installs per day with CPU utilization peaking at no more than 10-20%.

Storage

The S3 bucket is used to store data about packages and sources, as well as cached metadata to decrease response times for requests. An S3 bucket with default settings is sufficient for this Posit Package Manager configuration.

Configuration Details

EC2 Instances

We recommend creating an AMI containing all required software and configuration for easy deployment.

The EC2 instances in an HA configuration require the following configuration:

  • Matching versions of Posit Package Manager
  • Shared encryption keys for every node
  • Shared configuration file for every node
  • All the necessary versions of R and Python (if using Git building functionality)

The Package Manager Admin Guide offers an HA Checklist to follow when setting up this configuration.
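As a rough illustration of the consistency requirements above, the following sketch compares facts gathered from each node and reports anything that differs. It is not part of Package Manager: the `NodeState` type, its field names, and the helper functions are hypothetical, and how you collect the values from each instance (for example over SSH or from your AMI build) is left out.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class NodeState:
    """Facts collected from one node (hypothetical structure)."""
    name: str
    ppm_version: str
    config_sha256: str
    key_sha256: str


def sha256_of_file(path: str) -> str:
    """Hash a file so nodes can be compared without copying secrets around."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_mismatches(nodes: list[NodeState]) -> list[str]:
    """Return one message per attribute that is not identical on all nodes."""
    checks = (
        ("ppm_version", "Package Manager version"),
        ("config_sha256", "configuration file"),
        ("key_sha256", "encryption key"),
    )
    problems = []
    for field, label in checks:
        if len({getattr(node, field) for node in nodes}) > 1:
            detail = ", ".join(f"{n.name}={getattr(n, field)}" for n in nodes)
            problems.append(f"{label} differs across nodes: {detail}")
    return problems
```

An empty list from `find_mismatches` means the nodes agree on all three attributes. The check does not cover the installed R and Python versions, which would need a similar comparison if you use Git building.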

Networking

Posit Package Manager should be deployed inside a single private subnet with ingress using an Application Load Balancer. This should run within a single availability zone.

S3

An S3 bucket with default settings is sufficient for this Posit Package Manager configuration. S3 can also be used with KMS for client-side encryption.

RDS

The RDS instance should be configured with an empty Postgres database for the Posit Package Manager metadata.
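Each node then needs a connection string that points at this database. The sketch below assembles a standard `postgres://` URL and percent-encodes the credentials so that characters such as `@` or `/` in a password do not break parsing. The host, user, and database names are placeholders, and the exact setting that consumes the URL is described in the Package Manager Admin Guide rather than assumed here.

```python
from urllib.parse import quote


def postgres_url(user: str, password: str, host: str, database: str,
                 port: int = 5432) -> str:
    """Build a postgres:// URL with percent-encoded credentials."""
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


if __name__ == "__main__":
    # Placeholder values; substitute your own RDS endpoint and credentials.
    print(postgres_url("ppm", "p@ss/word", "example-rds-endpoint", "ppm"))
```

Because every node in the cluster shares one configuration file, the same URL applies to all of them.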

Resiliency and Availability

This configuration of Posit Package Manager has been deployed on the Posit Public Package Manager service. As a publicly available service, the architecture is tested by the R and Python communities that use it. Public Package Manager is used by many more users than any private Posit Package Manager instance. The current uptime for the Posit Public Package Manager service can be found on the status page.