Load Balanced
Enhanced Advanced
Posit Package Manager can be configured to run on AWS in a Load Balanced (LB) cluster configuration for a non-air-gapped environment. In this architecture, Posit Package Manager can handle a large number of users and gains the reliability that comes with running behind a load balancer.
This configuration is suitable for teams of hundreds of data scientists who want or require high availability (HA) within their organization.
Most companies don’t need to run in this configuration unless they have many concurrent package uploads/downloads or are required to have high availability for compliance reasons. For small teams that don’t require HA, the single server architecture of Package Manager is more suitable.
Architecture Overview
This Posit Package Manager implementation is deployed in a Load Balanced configuration and uses the following AWS components:
- Two Amazon Elastic Compute Cloud (EC2) instances running in HA.
- AWS Relational Database Service (RDS) for PostgreSQL, serving as the application database for Posit Package Manager.
- AWS Simple Storage Service (S3) for Posit Package Manager’s object storage.
- AWS Application Load Balancer (ALB) to route requests to the Posit Package Manager service.
Architecture Diagram
Nodes
We recommend running Posit Package Manager on two EC2 instances across different availability zones and private subnets. In our testing with m6i.2xlarge instances (8 vCPUs, 32 GiB memory), this configuration served 30 million package installs per month, or one million package installs per day. This configuration can also handle 100 Git builders concurrently building packages from Git repositories.
Note
Each Posit Package Manager user could be downloading dozens or hundreds of packages a day. There are also other usage patterns such as an admin uploading local packages or the server building packages for Git builders, but package installations give a good idea of what load and throughput this configuration can handle.
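As a rough sanity check on the figures above, the monthly install volume can be converted to average daily and per-second rates. The sketch below is back-of-the-envelope arithmetic only: the 30-day month and the even split across the two nodes are assumptions made here for illustration, and real traffic is bursty, so peak rates will be higher than these averages.

```python
# Back-of-the-envelope conversion of the tested install volume into average rates.
INSTALLS_PER_MONTH = 30_000_000  # figure quoted in the Nodes section
DAYS_PER_MONTH = 30              # assumption: a 30-day month
SECONDS_PER_DAY = 24 * 60 * 60
NODES = 2                        # two EC2 instances behind the load balancer


def average_rates(installs_per_month, days_per_month=DAYS_PER_MONTH, nodes=NODES):
    """Return average install rates, assuming traffic is spread evenly."""
    per_day = installs_per_month / days_per_month
    per_second = per_day / SECONDS_PER_DAY
    return {
        "per_day": per_day,
        "per_second": per_second,
        # Assumption: the load balancer splits requests evenly across nodes.
        "per_second_per_node": per_second / nodes,
    }


rates = average_rates(INSTALLS_PER_MONTH)
for name, value in rates.items():
    print(f"{name}: {value:,.2f}")
```

The averages work out to roughly a dozen installs per second across the cluster, which is a useful baseline when comparing against your own expected usage.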
The EC2 instances in a load balanced configuration require the following configuration:
- Matching versions of Posit Package Manager
- Shared encryption keys for every node
- Shared configuration file for every node
- All the necessary versions of R and Python (if using Git building functionality)
The Package Manager Admin Guide offers an HA Checklist to follow when setting up Package Manager behind a load balancer.
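The node requirements above amount to a consistency check: every node should report the same version, the same configuration file, and the same encryption key. A minimal sketch of that check follows. The `NodeInfo` type, its field names, and the sample values are all hypothetical and are not part of Posit Package Manager; how you collect the values from each node (for example, over SSH or from your provisioning tool) is left out.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeInfo:
    """Hypothetical summary of one node; not a Posit Package Manager type."""
    name: str
    version: str
    config_sha256: str
    key_sha256: str


def sha256_hex(data: bytes) -> str:
    """Hash file contents so configs and keys can be compared without exposing them."""
    return hashlib.sha256(data).hexdigest()


def find_mismatches(nodes):
    """Return the names of the fields whose values differ between nodes."""
    fields = ("version", "config_sha256", "key_sha256")
    return [f for f in fields if len({getattr(n, f) for n in nodes}) > 1]


# Illustrative values only: the version strings and file contents are made up.
config = sha256_hex(b"example config file contents")
key = sha256_hex(b"example encryption key contents")
nodes = [
    NodeInfo("node-a", "1.0.0", config, key),
    NodeInfo("node-b", "1.0.1", config, key),
]

mismatches = find_mismatches(nodes)
print(mismatches or "all nodes match")
```

Comparing hashes rather than raw file contents keeps the encryption key out of logs and reports while still detecting drift between nodes.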
Database
This configuration uses RDS with PostgreSQL on a db.t3.xlarge instance (4 vCPUs, 16 GiB memory) with 100 GiB of General Purpose SSD (gp3) storage, and Multi-AZ enabled with one standby.
Multi-AZ allows the RDS instance to run in an active/passive configuration across two availability zones, with automatic failover when the primary instance goes down.
The RDS instance should be configured with an empty Postgres database for the Posit Package Manager metadata.
This is a very generous configuration. In our testing, the Postgres database handled one million package installs per day without exceeding 10-20% CPU utilization.
Storage
The S3 bucket is used to store data about packages and sources, as well as cached metadata to decrease response times for requests. S3 can also be used with KMS for client-side encryption.
Networking
This configuration uses an Application Load Balancer for load balancing requests to the Posit Package Manager cluster.
The Application Load Balancer is deployed across 3 availability zones.
Configuration Details
This architecture requires no additional configuration beyond the initial setup outlined in the load balanced installation steps.
FAQ
See the Architecture Frequently Asked Questions page for the general FAQ.