The Obvious Problem (And the Real One)

We were spending heavily on enterprise SSO. The solution seemed obvious: find a cheaper provider, migrate our configurations, pocket the savings.

But expensive SSO wasn’t our problem. It was a symptom.

Our real problem was architectural. We had built a monolith in which all tenants shared the same domain, and every tenant’s users logged in at the same URL. Multi-tenancy was a software concern, implemented by filtering on tenantId in application code, rather than an architectural one.

This created fundamental security vulnerabilities. Cross-tenant data leakage wasn’t just possible, it was easy. Any forgotten WHERE clause, any missed authorization check, any bug in any controller could expose Tenant A’s data to Tenant B’s users.

We were spending heavily on enterprise SSO because when your architecture makes security bugs easy to write, you compensate by fortifying the perimeter. But this was treating symptoms, not the disease.

Mapping the System

The monolith architecture looked like this:

[Figure: monolith architecture. All clients → shared-app.com → application code (tenantId filtering) → database]

Every security guarantee depended on developers remembering to filter by tenantId. We had hundreds of tables, thousands of queries, dozens of controllers. The attack surface was enormous. Each new feature increased the risk.

The expensive SSO was our hedge against this systemic risk. We were trying to solve an architecture problem with a procurement decision.

Component Thinking vs. Systems Thinking

Component-level thinking would have led us to:

  • Find a cheaper SSO provider (saves money)
  • Add more code review for tenantId filtering (reduces bugs)
  • Build automated testing for cross-tenant isolation (catches regressions)

Each of these is locally optimal, and each solves a piece of the problem. But together, they still leave us with an architecture in which security depends on humans not making mistakes.

Systems-level thinking led us to ask different questions:

  • Why are we spending so much on authentication? (Because our architecture creates risk)
  • Why does our architecture create risk? (Because tenant isolation is a software concern, not an architectural one)
  • How can we make isolation architectural? (Subdomains, isolated auth providers, ORM-level query filters, integrity checking)
  • If we redesign this system, what other problems disappear? (Cost, manual setup, operational complexity)

Defense in Depth: Four Layers of Isolation

We rebuilt the system with isolation at every layer:

Layer 1: Subdomain Isolation

We moved from a shared domain to per-tenant subdomains:

Tenant A → acme.platform.com
Tenant B → techco.platform.com

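Each tenant gets their own namespace, and browser security policies now work for us instead of against us. Each subdomain is a separate origin, so the same-origin policy, CORS, and localStorage keep tenants apart, and cookies set without a Domain attribute stay on a single tenant’s subdomain.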
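Subdomain isolation only helps if every request maps to exactly one tenant. Below is a minimal sketch of that lookup, assuming the root domain is platform.com as in the example above; TenantResolver and FromHost are hypothetical names, not part of any framework.

```csharp
#nullable enable
using System;

// Hypothetical helper: maps a request host such as "acme.platform.com"
// to the tenant subdomain "acme". The root domain is passed in by the caller.
public static class TenantResolver
{
    public static string? FromHost(string host, string rootDomain)
    {
        // Drop an optional port ("acme.platform.com:443"), a trailing dot,
        // and normalize case, since DNS names are case-insensitive.
        var name = host.Split(':')[0].Trim().TrimEnd('.').ToLowerInvariant();
        var suffix = "." + rootDomain.Trim().TrimEnd('.').ToLowerInvariant();

        // The leading dot in the suffix rejects look-alikes such as
        // "evilplatform.com".
        if (!name.EndsWith(suffix, StringComparison.Ordinal))
            return null;

        var label = name.Substring(0, name.Length - suffix.Length);

        // Accept exactly one label: the bare root domain and nested hosts
        // like "a.b.platform.com" are not tenant hosts.
        if (label.Length == 0 || label.IndexOf('.') >= 0)
            return null;

        return label;
    }
}
```

Returning null for unknown or nested hosts lets the caller reject the request outright instead of falling back to a default tenant.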

Layer 2: Authentication Provider Isolation

We auto-generate isolated SSO providers based on the tenant’s subdomain:

subdomain: "acme"
SAML Provider ID: "saml.prod.acme"
OIDC Provider ID: "oidc.prod.acme"
Authorized Domain: "acme.platform.com"

Tenant A’s SAML configuration cannot interfere with Tenant B’s OIDC setup. Authentication itself enforces tenant boundaries.
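Because the provider IDs follow a fixed pattern, generating them is a pure function of the subdomain. This is a sketch under stated assumptions: the saml.prod.acme naming comes from the example above, while TenantAuthConfig, TenantAuthConfigFactory, the platform.com default, and the DNS-label validation rule are illustrative additions.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative record of the per-tenant auth settings derived from a subdomain.
public sealed record TenantAuthConfig(
    string SamlProviderId, string OidcProviderId, string AuthorizedDomain);

public static class TenantAuthConfigFactory
{
    // Assumed DNS-label rule: 1-63 chars of [a-z0-9-], not starting or
    // ending with '-'. \z (not $) so a trailing newline is also rejected.
    private static readonly Regex Label =
        new Regex(@"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\z");

    public static TenantAuthConfig For(
        string subdomain, string environment = "prod", string rootDomain = "platform.com")
    {
        if (!Label.IsMatch(subdomain))
            throw new ArgumentException($"Invalid subdomain: '{subdomain}'", nameof(subdomain));

        // Plain string concatenation: every ID is derived, never hand-entered.
        return new TenantAuthConfig(
            SamlProviderId: $"saml.{environment}.{subdomain}",
            OidcProviderId: $"oidc.{environment}.{subdomain}",
            AuthorizedDomain: $"{subdomain}.{rootDomain}");
    }
}
```

Rejecting malformed input up front keeps the derived IDs unambiguous: a value such as "acme.evil" throws instead of producing a provider ID with an extra segment.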

Layer 3: Database Query Filters

We implemented global query filters at the ORM level that automatically restrict every query the ORM issues against a tenant-scoped table:

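// Applied to each entity type with a TenantId column, in OnModelCreating.
// CurrentTenantId must be a DbContext property, not a captured local:
// EF Core caches the model but re-reads context members on every query.
builder.HasQueryFilter(record =>
    record.TenantId == CurrentTenantId
);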

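Even if a developer forgets to filter by tenantId in a controller, the data-access layer enforces it. A query that omits the filter cannot leak another tenant’s rows. The only ways around it are deliberate, such as calling IgnoreQueryFilters() or bypassing the ORM entirely, and those are explicit, searchable opt-outs rather than easy-to-miss omissions.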
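The effect of a global filter can be shown without an ORM. This framework-free sketch is not EF Core’s mechanism, and ITenantScoped, Invoice, and TenantScopedSet<T> are hypothetical names. Callers can only reach rows through Query(), which applies the tenant predicate before any condition the caller adds.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical marker interface for rows that belong to a tenant.
public interface ITenantScoped
{
    string TenantId { get; }
}

public sealed record Invoice(string TenantId, int Number, decimal Amount) : ITenantScoped;

// Stand-in for an ORM global query filter: the tenant predicate is written
// once here, and every query starts from the already-filtered source.
public sealed class TenantScopedSet<T> where T : ITenantScoped
{
    private readonly IQueryable<T> _all;
    private readonly Func<string> _currentTenantId;

    public TenantScopedSet(IEnumerable<T> rows, Func<string> currentTenantId)
    {
        _all = rows.AsQueryable();
        _currentTenantId = currentTenantId;
    }

    // The current tenant is read each time Query() is called, so one
    // instance serves whichever tenant is active, much like a filter that
    // references a DbContext property.
    public IQueryable<T> Query()
    {
        var tenantId = _currentTenantId();
        return _all.Where(row => row.TenantId == tenantId);
    }
}
```

A query written with no tenant condition at all, such as invoices.Query().Where(i => i.Number == 2), still cannot return another tenant’s invoice. The wrapper is weaker than the ORM version, since code holding the raw collection can sidestep it; an ORM-level filter attaches the predicate to the entity type itself.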

Layer 4: Save Operation Integrity Checking

We added runtime validation that inspects every database save operation:

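// Requires: using Microsoft.EntityFrameworkCore; using System.Linq;
// ITenantScoped is an illustrative interface exposing TenantId.
public override int SaveChanges()
{
    var userTenantId = GetCurrentUserTenantId();
    // Only entries being written; Unchanged entries need no check.
    var changedRecords = ChangeTracker.Entries<ITenantScoped>()
        .Where(e => e.State is EntityState.Added
                            or EntityState.Modified
                            or EntityState.Deleted);

    foreach (var record in changedRecords)
    {
        if (record.Entity.TenantId != userTenantId)
            throw new InvalidOperationException(
                "Cross-tenant data contamination detected"
            );
    }
    return base.SaveChanges();
}
// SaveChangesAsync (and SaveChanges(bool)) need the same check,
// otherwise saves through those overloads skip it.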

If code tries to save data to the wrong tenant, whether through a bug, a misconfiguration, or an attack, the system throws an exception. Operations fail loudly instead of silently corrupting data.

One System, Multiple Emergent Benefits

By redesigning the architecture rather than swapping vendors, we solved an entire class of problems simultaneously:

  • Security: Cross-tenant data leakage went from “statistically inevitable” to “architecturally impossible.” A leak now requires defeating independent layers of defense, not just exploiting one missed WHERE clause.
  • Cost: Significant annual savings by replacing our enterprise SSO provider with an in-house solution. Our architecture itself provides the security guarantees we were paying premium prices for.
  • Operations: Per-tenant onboarding and administration essentially disappeared. New tenants automatically get SSO providers configured, authorized domains added, and data isolated. No tickets, no manual configuration, no human error. Work that used to require coordination across multiple teams now happens automatically.
  • Developer Velocity: Engineers building features don’t need to remember to filter by tenantId. The system enforces it automatically. Code reviews focus on business logic instead of security checklists.
  • Auditability: Instead of auditing thousands of queries across hundreds of files, we verify the query-filter and save-check code once and rely on it everywhere. Security becomes provable rather than probabilistic.
  • Reputation: When prospects ask about our security model, we can demonstrate architectural guarantees rather than relying on process and vigilance. Cross-tenant isolation isn’t a feature we promise; it’s impossible not to have.

Building Levers, Not Patches

The ORM query filter is roughly 30 lines of code, but it secures millions of queries across hundreds of tables. The subdomain-based provider ID generation is simple string concatenation, but it eliminates configuration errors and manual setup for every tenant we onboard.

Compare this to component-level approaches:

  • Still manually configuring SSO for each tenant
  • Still doing extensive code review to catch missing tenantId filters
  • Still running penetration tests to find security holes
  • Still treating cross-tenant leakage as a bug to fix rather than an impossibility by design

Patches scale linearly: each tenant needs configuration, each feature needs review, each bug needs fixing.

Levers have near-zero marginal cost: the 50th tenant gets the same automatic security as the 1st, and the 100th feature inherits the same query filters as the 10th.

What This Doesn’t Solve

This architecture isn’t a silver bullet:

  • Application-level authorization still requires careful design. We’ve solved cross-tenant data access, but not role-based permissions within a tenant.
  • Subdomain management adds operational complexity. DNS, SSL certificates, and CDN configuration all need to handle dynamic subdomains.
  • Testing complexity increases because you can’t easily test cross-tenant scenarios in automated tests (that’s the point, but it makes certain test cases harder).
  • Migration cost was significant. Moving from shared domain to per-tenant subdomains required coordinating with dozens of active tenants.
  • Operational complexity for the authentication layer increased. Building and maintaining an in-house SSO provider system requires ongoing engineering investment and operational expertise.

The Lesson: Treat Symptoms or Fix Systems

When we encountered expensive SSO, we had two paths:

Path 1 (Symptom): Swap vendors, negotiate pricing, optimize what we had.

Path 2 (System): Ask why it’s expensive, identify root causes, redesign the architecture.

Path 1 seemed easier upfront. But Path 2 took only a couple of months because the design was elegant and simple. When you solve the right problem, implementation becomes straightforward.

Path 1 would have left us with the same fundamental problems: manual configuration, security depending on human vigilance, operational complexity scaling with tenant count.

Path 2 gave us a system where good outcomes emerge from the architecture itself. Where adding tenants makes the system more valuable rather than more fragile. Where security is provable rather than hopeful.

Small teams move mountains by building systems that create leverage, not by working harder on components.

The forest, not the trees. The system, not the parts. That’s how you solve real problems.