Multitenancy Architecture

Date: 7 August 2024
Author: Sean Bailey

Decision Summary:

Title: Building back-end multitenancy
Description: Evaluate two options for separating each organization's data using a multitenant architecture

Background

For compliance and data security reasons, we need to separate the data for each customer into its own database. Since we are dealing with patient health data and other PII, data isolation is important to ensure HIPAA compliance and good data security practices.

Options Considered

Figure: Architecture diagram comparing the two options

Option 1: Front-end multitenancy

An authentication microservice logs users in and identifies their tenant. The tenant is then passed to the front end as a Next.js URL parameter (app.trially.ai/tenant-name/recruiting). Next.js routes requests to that tenant's own backend instance, which runs in its own container with its own database. Each tenant has its own backend.

Pros: simpler backend (each backend only knows about one database), better isolation
Cons: potentially higher GCP cost, harder update deployments (every tenant's backend must be updated), tenant creation will need to be automated, other services will need to handle multitenancy on their own
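A minimal sketch of the Option 1 routing step, assuming the tenant slug is the first path segment (as in app.trially.ai/tenant-name/recruiting) and that a hypothetical registry maps each slug to its dedicated backend. The registry contents and hostnames are illustrative placeholders, not real deployments.

```typescript
// Hypothetical registry of tenant slug -> dedicated backend base URL.
// Real deployments would load this from configuration, not a literal.
const TENANT_BACKENDS = new Map<string, string>([
  ["acme-health", "https://acme-health.backend.example.internal"],
  ["north-clinic", "https://north-clinic.backend.example.internal"],
]);

// Extract the tenant slug (first path segment) from a path such as
// "/tenant-name/recruiting"; returns null when the path has no segments.
function tenantFromPath(pathname: string): string | null {
  const segments = pathname.split("/").filter((s) => s.length > 0);
  return segments[0] ?? null;
}

// Resolve an API path against the backend that belongs to the tenant in
// the current page path. Unknown tenants are rejected rather than defaulted.
function backendUrlFor(pagePath: string, apiPath: string): URL {
  const tenant = tenantFromPath(pagePath);
  if (tenant === null) {
    throw new Error("Path has no tenant segment");
  }
  const base = TENANT_BACKENDS.get(tenant);
  if (base === undefined) {
    throw new Error(`Unknown tenant: ${tenant}`);
  }
  return new URL(apiPath, base);
}

// Example: a page under /acme-health/... only ever talks to acme-health's backend.
console.log(backendUrlFor("/acme-health/recruiting", "/api/candidates").toString());
// prints "https://acme-health.backend.example.internal/api/candidates"
```

Using a `Map` rather than a plain object avoids accidental matches on inherited keys such as "constructor", so a mistyped or malicious slug fails closed.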

Option 2: Back-end multitenancy

There is one front-end instance and one back-end instance. The backend uses the authenticated identity on each request to route data operations to that tenant's database. Each tenant has its own database, and all tenant databases live in one Postgres instance.

Pros: less expensive, less change to existing infrastructure, syncing data can happen through one backend
Cons: complexity of implementing multitenancy on the backend, less secure overall
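A minimal sketch of how Option 2 could map a tenant to its database on the shared Postgres instance. It assumes a hypothetical "tenant_<id>" naming convention and a caller-supplied `createPool` factory (for example, a thin wrapper around whichever Postgres driver the backend uses); neither is specified in this decision.

```typescript
// Connection settings for the single shared Postgres instance. This is our
// own type; most Postgres clients accept similarly named fields.
interface PostgresConfig {
  host: string;
  port: number;
  user: string;
  database: string;
}

type SharedConfig = Omit<PostgresConfig, "database">;

// Hypothetical naming convention: tenant "acme" lives in database "tenant_acme".
// Allowing only lowercase letters, digits and "_" (max 50 chars) keeps the
// derived name a valid unquoted Postgres identifier under the 63-byte limit.
const TENANT_ID_PATTERN = /^[a-z0-9_]{1,50}$/;

function databaseForTenant(tenantId: string): string {
  if (!TENANT_ID_PATTERN.test(tenantId)) {
    throw new Error(`Invalid tenant id: ${JSON.stringify(tenantId)}`);
  }
  return `tenant_${tenantId}`;
}

// Same host, port and role for every tenant; only the database differs.
function configForTenant(shared: SharedConfig, tenantId: string): PostgresConfig {
  return { ...shared, database: databaseForTenant(tenantId) };
}

// Lazily create one pool per tenant database and reuse it afterwards.
// `createPool` is supplied by the caller, e.g. a wrapper around a Postgres driver.
class TenantPoolCache<P> {
  private readonly pools = new Map<string, P>();
  private readonly shared: SharedConfig;
  private readonly createPool: (config: PostgresConfig) => P;

  constructor(shared: SharedConfig, createPool: (config: PostgresConfig) => P) {
    this.shared = shared;
    this.createPool = createPool;
  }

  get(tenantId: string): P {
    let pool = this.pools.get(tenantId);
    if (pool === undefined) {
      pool = this.createPool(configForTenant(this.shared, tenantId));
      this.pools.set(tenantId, pool);
    }
    return pool;
  }
}
```

Caching one pool per tenant keeps connection counts bounded as tenants grow, which is part of why this option is expected to scale vertically on one instance at first.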

Evaluation Criteria

Functionality

Functionality for the end user would be similar, or at least not noticeably different. Customers would be assured of data segregation and would be able to access their data quickly via the web app.

Cost

Compute cost through GCP will be the main cost variable. Front-end multitenancy would potentially be more expensive, as we would have to run N backend instances for N tenants. Ancillary costs for Doppler and other services that the backend depends on could also change.

Cost implications for back-end multitenancy should be minimal compared to our current compute spend.

Simplicity

Back-end multitenancy is simpler to deploy and likely offers a simpler developer experience. Front-end multitenancy provides stronger isolation and makes backend development far simpler, but it adds complexity to the front end and to our overall architecture.

Integration

Both solutions are compatible with our existing cloud and containerization setup, including Authress. Front-end multitenancy would require more infrastructure changes, but nothing that standard cloud providers can't handle.

Scalability

Both solutions are highly scalable: front-end multitenancy scales horizontally, while back-end multitenancy should initially scale vertically until load requires moving to a sharded or replicated database environment.

Support and Maintenance

Maintaining and orchestrating multiple backends is significantly more complex than a single backend.

Compliance

Both solutions are compatible with HIPAA compliance and other data security requirements. Potential on-prem hosting would require modifications to both solutions.

Decision

Chosen Option: Back-end multitenancy
Justification: It's less expensive and less complex to maintain, and it best matches our existing architecture while still meeting customer data segregation requirements.
Implementation Plan: Build multitenancy support into all backend API routes that involve data fetching, using Authress.
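A minimal sketch of how each data-fetching route could be forced to carry a tenant, assuming the access token has already been verified by Authress before the route runs. The `tenantId` claim name, the request/response shapes and `withTenant` are hypothetical placeholders; this does not use or reproduce any Authress SDK API.

```typescript
// Claims from an access token that has ALREADY been verified upstream
// (e.g. by our Authress token verification). The "tenantId" claim name
// is a hypothetical placeholder for however our Authress setup exposes it.
interface VerifiedClaims {
  sub: string;
  tenantId?: string;
}

interface ApiRequest {
  claims: VerifiedClaims | null; // null when no valid token was presented
  params: Record<string, string>;
}

interface ApiResponse {
  status: number;
  body: unknown;
}

type TenantResult =
  | { ok: true; tenantId: string }
  | { ok: false; status: 401 | 403; error: string };

// Decide the caller's tenant purely from verified claims, never from
// client-controlled input such as the URL, query string or body.
function resolveTenant(claims: VerifiedClaims | null): TenantResult {
  if (claims === null) {
    return { ok: false, status: 401, error: "unauthenticated" };
  }
  if (claims.tenantId === undefined || claims.tenantId === "") {
    return { ok: false, status: 403, error: "no tenant assigned" };
  }
  return { ok: true, tenantId: claims.tenantId };
}

type TenantHandler = (tenantId: string, req: ApiRequest) => Promise<ApiResponse>;

// Wrap each data-fetching route so it cannot run without a resolved tenant.
function withTenant(handler: TenantHandler): (req: ApiRequest) => Promise<ApiResponse> {
  return async (req) => {
    const result = resolveTenant(req.claims);
    if (!result.ok) {
      return { status: result.status, body: { error: result.error } };
    }
    return handler(result.tenantId, req);
  };
}
```

Inside a wrapped handler, `tenantId` would select that tenant's database connection. Wrapping routes this way makes a missing tenant an explicit 401/403 instead of a silent query against the wrong database.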