RTN-044: USDF Butler Postgres Design

  • Dan Speck

Latest Revision: 2022-09-14

Note

This technote is a work-in-progress.

1 Abstract

Postgres database design for Butler in the USDF. Includes details on Postgres architecture in Kubernetes, authentication, backups, and monitoring/logging.

2 Architecture

Postgres is deployed on Kubernetes using the CloudNativePG (CNPG) Kuberneter opeator. CNPG is an open source project developed by Enterprise DB. The CNPG Operator includes built in capabilities to manage upgrades, high availability, replication, and backup. Commerical support is available for a fee.

A seperate development Kubernetes cluster and production Kubernetes cluster are deployed. This allows for testing both operator functionality and configuration prior to deploying in production. 2 replicas are deployed in active > passive. 16 CPUs and 64 GB is allocated. For scaling read replicas will be provisioned to handle the read load. Standby replicas are exposed through a seperate PgBouncer read only instance.

The following are outstanding questions.

  • Authentication and user management approach

  • A Butler registry will be distinct for each data release. Will there be a seperate cluster per release or database? Will joins be needed across data registries?

3 Deployment Repository

All butler database related manifests with current settings are kept at https://github.com/slaclab/rubin-usdf-butler-deploy/.

4 Version

Postgres version 14 is deployed. Previously Postgres 12 was deployed at NCSA and Postgres 13 at IDF. Postgres 14 has performance improvments and is the latest stable major version. There are no known limitations that would prevent Butler from running on Postgres 14.

5 Storage

The storage for the CNGP clusters is on Weka distributed NVMe storage using the Container Storage Interface (CSI) driver. 1,000 GB is provisioned for production. Volume expansion is supported to increase the size of the disks. Both CNGP and the Weka CSI Plugin support for volume expansion.

Total storage is forecast to be 100s of Terabytes per year and will require coordination with the USDF infrastructure team to expand the Weka storage.

6 Access Methods

All end user access to the database will be through Butler. Butler uses SQL Alchemy in driver mode and not ORM mode. No direct access through PSQL or other Postgres administrative tools is required by end users. Butler Postgres does not require external connectivity outside of USDF so no external IP address is needed on the database.

How will Cloud RSP users connect to butler?

Postgres administrators will be able to use PSQL and other tools to perform administrative functions. PSQL can be run through the S3DF Rubin Servers.

PgBouncer is a lightweight connection pooler for client connections. PgBouncer front ends connections for connection pooling and protecting access to the database. Session mode is used for the connection pool because temporary tables are used as part of queries. The default pool size was increased because the rubin account is shared by end users.

7 Authentication and Access Control

Security requirements and considerations for authentication and authorization are:

  • Expected initial user count is 200

  • Limit management overhead as there is not staff to reset passwords

  • Initial setup requires a postgres user and roles assigned regardless of whether the password is stored in LDAP or in Postgres

  • Track activity by user to determine who made changes. Difficult to track by IP Address because all connections will come from a different Kubernetes cluster and NATed.

Considerations for discussion - What would be damaging operations to Butler? Drops, deletes?, etc..? - Does everyone need access to write to personal schemas and Butler schemas? Or subset? - Client/Server Butler may be available at some point which removes requirements for individual user accounts for most users - Access model for PaNDA?

There are four types of access needed to Butler Postgres.

  1. Read only access - Read data through Butler

  2. Developer write access - Write data to Butler

  3. PaNDA Service account - Query butler from jobs and store results of job runs

  4. Administrative access - Create databases, tables, edit roles

For individual user accounts in Postgres a username needs to be created and role assigned.

  • Shared Username (rubin) with shared password
    • Pros
      • There is no support needed to reset passwords

      • Easy to deploy and build into image without end user intervention

      • Removes risk of an developer creating a production database with only their ownership and schema

    • Cons:
      • Not able to track who made changes

      • PGPooler default_pool_size considerations around connection limits. A single user could more easily exhaust available connections.

  • Individual username with shared password
    • Pros:
      • Able to track who made changes

      • No overhead of password resets

    • Cons:
      • Additional work to setup user. Script user setup?

      • Security implications of shared password

      • Could have users set to wrong user

  • Individual username with passwords stored in Vault
    • Pros:
      • Able to track who made changes

      • Users already using Vault for other access to secrets

      • Provisioning model in Vault to also create role

    • Cons:
      • Vault’s model assumes you are using temporary credentials. We can create long lived temporary credential, but not ideal. Best option would be to create own connector. Seems possible, but additional overhead to test and install on Vault cluster

      • Users would need obtain the temporary user-password from vault and modify their db-auth.yaml file

  • LDAP
    • Pros:
      • Works with test LDAP Server

    • Cons
      • Passwords set in clear text

      • SLAC LDAP server would need to be modified to support LDAP or new LDAP server deployed

  • LDAPS
    • Cons - Does not currently work. LDAPS and LDAP with Start TLS were tested for authentication. An unknown error was returned by Postgres. It also appears that PG Bouncer does not support LDAP based on an open issue in the PG Bouncer GitHub repository. - Passwords would need to updated every few months based on the password expiration policies and would have increases support tickets to help end users resolve

scram-sha-256 will be used for password encryption as is now is the default for Postgres 14. This encryption method was previously used by Butler in other environments.

8 Backups

CNPG has built in backups through Barman. Backups are integrated with the WAL logs for both incremental full backups. CNGP and Barman require an S3 or Google Cloud Storage interface to save backups. Full backups are configured to run nightly at midnight. Backups are saved to a Weka S3 interface. Please note that this is same storage location that the database is stored.

The long term backup requirements are to:

  • Store backup in physically outside of S3DF?

  • Backup every X amount of time?

9 Monitoring

CNPG has built in Prometheus support for the Pooler and the Database cluster. The S3DF Prometheus instance scrapes and stores metrics. Metrics are displayed in the S3DF Grafana at https://grafana.slac.stanford.edu/d/z7FCA4Nnk/cloud-native-postgresql. Metrics will need to be available for <update> days.

The requirements for monitoring are:

  • Per Cluster
    • Cluster uptime

    • CPU

    • Memory
      • Available

      • Working Memory

    • Storage
      • used, available overall

      • per database

    • Connections
      • Number of available connections

      • Connections per database

    • Replication and Backup
      • Replication Lag

      • WAL archive failures

      • Successful and Failed backups

    • Indexes
      • Most and least frequently scanned

    • Database activity
      • Rows inserted

      • Rows updated

      • Rows deleted

      • Dead Rows

    • Cache
      • Cache hit rate

  • Checks per database
    • Operations
      • Analyze

      • Vaccuum

      • Freeze

      • Bloat

    • Locks
      • Locks by Lock Mode

      • Deadlocks

10 Logging

CNPG logs to stdout and stderr. Logs are available via the kubectl logs command. Currently there is not a solution for long term retention of logging. The options are using Loki, Elasticsearch, or Gooogle Cloud Logging. Logs will be be available for <update days>

The requirements for logs are: - Store logs for X days? - Provide log access to administrators and developers?

See the reStructuredText Style Guide to learn how to create sections, links, images, tables, equations, and more.