Overview
A lot of internal software is built to be functional, not operable.
It gets the job done, but only if the people using it already know the unwritten rules. Edge cases get escalated back to engineering. Admin surfaces are technically present but hard to reason about. Reliability problems surface as vague user complaints, and the system slowly accumulates operator anxiety.
I’m interested in internal tools that do better than that.
This project focused on an internal ticketing platform for support operations that sat at the intersection of email infrastructure, workflow automation, admin tooling, and reliability engineering. On the surface, it looked like a support system: inboxes, tickets, replies, routing, attachments, dashboards. In practice, it was much closer to an operating system for support work.
The problem
Support workflows look simpler than they are.
From a distance, the problem sounds straightforward: receive emails, turn them into tickets, assign them, reply, track status, and close the loop.
In practice, the system had to hold together across several layers at once:
- inbound mailboxes
- ticket routing and assignment
- automation rules
- SLA and business-hour policies
- admin controls
- reply quality and safety
- attachment ingestion and storage
- operational observability
- day-to-day usability for non-engineering teams
That is where many internal tools start to break down. They may support the workflow, but they do not make it easy to operate safely or confidently.
Constraints
This was not just a UI problem.
The platform depended heavily on external email infrastructure and identity behavior. Mailbox configuration, replies, auto-replies, attachments, and threading all had to remain coherent across system boundaries.
A few constraints shaped the work:
- support workflows were mailbox-driven rather than self-contained
- admin users needed fine-grained control without unsafe complexity
- external API behavior introduced identity and consistency edge cases
- attachment retrieval had to survive failures across multiple layers
- non-engineering teams needed to operate the system without depending on tribal knowledge
The difficulty was not simply building features. It was making the system dependable enough to function as real operational infrastructure.
Approach
The goal was not just to build a ticketing interface.
It was to create a system that support and ops teams could actually run with confidence. That meant thinking across workflow, reliability, admin controls, recovery behavior, and operator usability rather than treating them as separate concerns.
At a product level, the system combined:
- inbox and outbox workflows for support teams
- group- and assignee-based routing
- automation rules for ticket creation and assignment
- mailbox-specific configuration
- global and per-mailbox auto-replies
- configurable ticket fields for status, priority, type, and disposition
- business hours and SLA policy controls
- proofread tooling for support replies
- ops dashboards for health and monitoring
- attachment preview, download, ingestion, and recovery flows
- an internal help and documentation experience for ops users
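Business-hour SLA policies are one of the places where the logic is easy to get subtly wrong: an SLA clock has to pause outside business hours and resume across weekends. As a rough sketch of the idea (the constants, names, and hours here are illustrative, not the platform's actual implementation), a due time can be computed by walking forward through business time until the SLA budget is spent:

```python
from datetime import datetime, timedelta

# Illustrative policy values; the real platform made these configurable per policy.
BUSINESS_START = 9            # 09:00
BUSINESS_END = 17             # 17:00
BUSINESS_DAYS = {0, 1, 2, 3, 4}  # Monday-Friday

def next_business_moment(ts: datetime) -> datetime:
    """Advance ts to the next instant that falls inside business hours."""
    while True:
        if ts.weekday() in BUSINESS_DAYS:
            if ts.hour < BUSINESS_START:
                return ts.replace(hour=BUSINESS_START, minute=0, second=0, microsecond=0)
            if ts.hour < BUSINESS_END:
                return ts
        # Outside business hours: skip to the start of the next day and re-check.
        ts = (ts + timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)

def sla_due(created: datetime, sla_minutes: int) -> datetime:
    """Spend the SLA budget only while business time is running."""
    ts = next_business_moment(created)
    remaining = timedelta(minutes=sla_minutes)
    while remaining > timedelta(0):
        day_end = ts.replace(hour=BUSINESS_END, minute=0, second=0, microsecond=0)
        available = day_end - ts
        if remaining <= available:
            return ts + remaining
        remaining -= available
        ts = next_business_moment(day_end)
    return ts
```

A ticket created late on a Friday with a two-hour SLA then correctly comes due on Monday morning rather than over the weekend.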
The work was not just about adding power. It was about reducing ambiguity and making the system easier to operate safely.
Execution
One of the most important engineering threads in the project was attachment ingestion.
At one point, some attachments worked and some did not. On the surface, the symptom was simple: preview failed. But the real issue sat across multiple boundaries.
A few things were going wrong at once:
- stored Microsoft Graph message IDs were sometimes stale
- attachment IDs were not always durable enough to trust later
- Graph lookups could fail depending on which form of ID was being used
- even successful fetches could still fail downstream because storage uploads were using the wrong MIME type
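The MIME issue in the last bullet shows how small these failures look in isolation: if the upload path defaults the content type instead of deriving it, a correctly fetched PDF still ends up stored wrong. A minimal sketch of deliberate content-type selection (a hypothetical helper, not the platform's actual storage code):

```python
import mimetypes
from typing import Optional

def storage_content_type(filename: str, reported_type: Optional[str] = None) -> str:
    """Pick the content type to store an attachment under.

    Prefer the type the source API reported; otherwise guess from the
    filename; only then fall back to a generic binary type. Unconditionally
    defaulting to application/octet-stream is exactly how correctly fetched
    PDFs end up stored incorrectly.
    """
    if reported_type and reported_type != "application/octet-stream":
        return reported_type
    guessed, _ = mimetypes.guess_type(filename)
    return guessed or reported_type or "application/octet-stream"
```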
The fix ended up being multi-layered:
- standardizing Graph ID handling
- re-resolving stale messages using more stable identifiers like internetMessageId
- refreshing attachment IDs from live Graph metadata
- self-healing message records when a newer valid ID was discovered
- backfilling failed historical attachment ingests
- fixing storage upload MIME handling so successfully fetched PDFs were actually stored correctly
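The re-resolution and self-healing steps can be sketched roughly like this. The helper assumes a thin wrapper over the Graph API; the lookup callables, record shape, and field names are illustrative stand-ins for calls like fetching a message by its Graph ID or filtering on internetMessageId:

```python
def resolve_message(record, get_message, find_by_internet_message_id):
    """Return live Graph metadata for a stored message record.

    1. Try the stored Graph message ID first.
    2. If it has gone stale, re-resolve via the durable internetMessageId.
    3. Heal the stored record so future lookups use the fresh ID.
    """
    msg = get_message(record["graph_id"])
    if msg is None:
        msg = find_by_internet_message_id(record["internet_message_id"])
        if msg is None:
            raise LookupError("message not found by either identifier")
        record["graph_id"] = msg["id"]  # self-heal the stored record
    # Attachment IDs are refreshed from live metadata rather than trusted from storage.
    record["attachment_ids"] = [a["id"] for a in msg.get("attachments", [])]
    return msg
```

The important property is that a stale ID is a recoverable state, not a dead end: one successful re-resolution repairs the record for every later lookup.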
Alongside the reliability work, the system was also made easier to operate:
- ticket fields became backend-controlled so rules and dropdowns stayed consistent
- auto-replies could be configured per mailbox rather than only globally
- proofread tooling was tuned to improve grammar and clarity without drifting into unsafe, meaning-changing rewrites
- contextual help links were added across admin surfaces
- an internal help experience was added with troubleshooting guidance, workflow explanations, and a printable checklist for ops users
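One common guardrail pattern for keeping proofread tooling safe, offered here as a sketch rather than the platform's exact approach, is to reject rewrites that drift too far from the original text:

```python
import difflib

def accept_rewrite(original: str, rewritten: str, min_similarity: float = 0.6) -> bool:
    """Guardrail for proofread suggestions.

    Accept a rewrite only if it stays close to the original text, so the
    tool fixes grammar and clarity without changing what the agent actually
    said. The threshold is a tunable policy choice, not a magic number.
    """
    ratio = difflib.SequenceMatcher(None, original, rewritten).ratio()
    return ratio >= min_similarity
```

A light grammar fix passes the check; a wholesale rewording of the reply does not, and gets surfaced for human review instead.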
None of that sounds dramatic in isolation. Together, it changed the character of the system.
Outcome
By the end, the platform felt less like a support dashboard and more like an operational system for support work.
It became more dependable across messy email-driven workflows, easier for admins to reason about, more recoverable when things broke, and more usable for non-engineering teams operating it day to day.
The most important shift was not just feature coverage. It was confidence. The system became easier to trust.
Key lessons
A few lessons stood out from the project.
Internal tools need the same seriousness as external products when they sit close to execution. In many cases, they need more.
Reliability work often hides behind mundane user complaints. A failed preview, a missing attachment, or an inconsistent reply flow may look small, but the underlying issue can cut across identity, external APIs, storage, and recovery logic.
And operator confidence matters. A system becomes much more valuable when people can understand it, trust it, and recover from problems without engineering rescuing them every time.
That is where internal tools start becoming real infrastructure.