Overview
A lot of internal software is built to be functional, not operable.
It gets the job done, but only if the people using it already know the unwritten rules. Edge cases get escalated back to engineering. Admin surfaces are technically present but hard to reason about. Reliability problems surface as vague user complaints, and the system slowly accumulates operator anxiety.
I’m interested in internal tools that do better than that.
This project focused on an internal ticketing platform for support operations that sat at the intersection of email infrastructure, workflow automation, admin tooling, and reliability engineering. On the surface, it looked like a support system: inboxes, tickets, replies, routing, attachments, dashboards. In practice, it was much closer to an operating system for support work.
The problem
Support workflows look simpler than they are.
From a distance, the problem sounds straightforward: receive emails, turn them into tickets, assign them, reply, track status, and close the loop.
In practice, the system had to hold together across several layers at once:
- inbound mailboxes
- ticket routing and assignment
- automation rules
- SLA and business-hour policies
- admin controls
- reply quality and safety
- attachment ingestion and storage
- operational observability
- day-to-day usability for non-engineering teams
That is where many internal tools start to break down. They may support the workflow, but they do not make it easy to operate safely or confidently.
Constraints
This was not just a UI problem.
The platform depended heavily on external email infrastructure and identity behavior. Mailbox configuration, replies, auto-replies, attachments, and threading all had to remain coherent across system boundaries.
A few constraints shaped the work:
- support workflows were mailbox-driven rather than self-contained
- admin users needed fine-grained control without unsafe complexity
- external API behavior introduced identity and consistency edge cases
- attachment retrieval had to survive failures across multiple layers
- non-engineering teams needed to operate the system without depending on tribal knowledge
The difficulty was not simply building features. It was making the system dependable enough to function as real operational infrastructure.
Approach
The goal was not just to build a ticketing interface.
It was to create a system that support and ops teams could actually run with confidence. That meant thinking across workflow, reliability, admin controls, recovery behavior, and operator usability rather than treating them as separate concerns.
At a product level, the system combined:
- inbox and outbox workflows for support teams
- group- and assignee-based routing
- automation rules for ticket creation and assignment
- mailbox-specific configuration
- global and per-mailbox auto-replies
- configurable ticket fields for status, priority, type, and disposition
- business hours and SLA policy controls
- proofread tooling for support replies
- ops dashboards for health and monitoring
- attachment preview, download, ingestion, and recovery flows
- an internal help and documentation experience for ops users
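Business-hour SLA policies are one of the places where the logic is easy to get subtly wrong: an SLA clock has to pause outside business hours and resume across weekends. As a rough sketch of the idea (the constants, names, and hours here are illustrative, not the platform's actual implementation), a due time can be computed by walking forward through business time until the SLA budget is spent:

```python
from datetime import datetime, timedelta

# Illustrative policy values; the real platform made these configurable per policy.
BUSINESS_START = 9            # 09:00
BUSINESS_END = 17             # 17:00
BUSINESS_DAYS = {0, 1, 2, 3, 4}  # Monday-Friday

def next_business_moment(ts: datetime) -> datetime:
    """Advance ts to the next instant that falls inside business hours."""
    while True:
        if ts.weekday() in BUSINESS_DAYS:
            if ts.hour < BUSINESS_START:
                return ts.replace(hour=BUSINESS_START, minute=0, second=0, microsecond=0)
            if ts.hour < BUSINESS_END:
                return ts
        # Outside business hours: skip to the start of the next day and re-check.
        ts = (ts + timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)

def sla_due(created: datetime, sla_minutes: int) -> datetime:
    """Spend the SLA budget only while business time is running."""
    ts = next_business_moment(created)
    remaining = timedelta(minutes=sla_minutes)
    while remaining > timedelta(0):
        day_end = ts.replace(hour=BUSINESS_END, minute=0, second=0, microsecond=0)
        available = day_end - ts
        if remaining <= available:
            return ts + remaining
        remaining -= available
        ts = next_business_moment(day_end)
    return ts
```

A ticket created late on a Friday with a two-hour SLA then correctly comes due on Monday morning rather than over the weekend.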
The work was not just about adding power. It was about reducing ambiguity and making the system easier to operate safely.
Execution
One of the most important engineering threads in the project was attachment ingestion.
At one point, some attachments worked and some did not. On the surface, the symptom was simple: preview failed. But the real issue sat across multiple boundaries.
A few things were going wrong at once:
- stored Microsoft Graph message IDs were sometimes stale
- attachment IDs were not always durable enough to trust later
- Graph lookups could fail depending on which form of ID was being used
- even successful fetches could still fail downstream because storage uploads were using the wrong MIME type
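The MIME issue in the last bullet shows how small these failures look in isolation: if the upload path defaults the content type instead of deriving it, a correctly fetched PDF still ends up stored wrong. A minimal sketch of deliberate content-type selection (a hypothetical helper, not the platform's actual storage code):

```python
import mimetypes
from typing import Optional

def storage_content_type(filename: str, reported_type: Optional[str] = None) -> str:
    """Pick the content type to store an attachment under.

    Prefer the type the source API reported; otherwise guess from the
    filename; only then fall back to a generic binary type. Unconditionally
    defaulting to application/octet-stream is exactly how correctly fetched
    PDFs end up stored incorrectly.
    """
    if reported_type and reported_type != "application/octet-stream":
        return reported_type
    guessed, _ = mimetypes.guess_type(filename)
    return guessed or reported_type or "application/octet-stream"
```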
The fix ended up being multi-layered:
- standardizing Graph ID handling
- re-resolving stale messages using more stable identifiers like internetMessageId
- refreshing attachment IDs from live Graph metadata
- self-healing message records when a newer valid ID was discovered
- backfilling failed historical attachment ingests
- fixing storage upload MIME handling so successfully fetched PDFs were actually stored correctly
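The re-resolution and self-healing steps can be sketched roughly like this. The helper assumes a thin wrapper over the Graph API; the lookup callables, record shape, and field names are illustrative stand-ins for calls like fetching a message by its Graph ID or filtering on internetMessageId:

```python
def resolve_message(record, get_message, find_by_internet_message_id):
    """Return live Graph metadata for a stored message record.

    1. Try the stored Graph message ID first.
    2. If it has gone stale, re-resolve via the durable internetMessageId.
    3. Heal the stored record so future lookups use the fresh ID.
    """
    msg = get_message(record["graph_id"])
    if msg is None:
        msg = find_by_internet_message_id(record["internet_message_id"])
        if msg is None:
            raise LookupError("message not found by either identifier")
        record["graph_id"] = msg["id"]  # self-heal the stored record
    # Attachment IDs are refreshed from live metadata rather than trusted from storage.
    record["attachment_ids"] = [a["id"] for a in msg.get("attachments", [])]
    return msg
```

The important property is that a stale ID is a recoverable state, not a dead end: one successful re-resolution repairs the record for every later lookup.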
Alongside the reliability work, the system was also made easier to operate:
- ticket fields became backend-controlled so rules and dropdowns stayed consistent
- auto-replies could be configured per mailbox rather than only globally
- proofread tooling was tuned to improve grammar and clarity without drifting into unsafe, meaning-changing rewrites
- contextual help links were added across admin surfaces
- an internal help experience was added with troubleshooting guidance, workflow explanations, and a printable checklist for ops users
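One common guardrail pattern for keeping proofread tooling safe, offered here as a sketch rather than the platform's exact approach, is to reject rewrites that drift too far from the original text:

```python
import difflib

def accept_rewrite(original: str, rewritten: str, min_similarity: float = 0.6) -> bool:
    """Guardrail for proofread suggestions.

    Accept a rewrite only if it stays close to the original text, so the
    tool fixes grammar and clarity without changing what the agent actually
    said. The threshold is a tunable policy choice, not a magic number.
    """
    ratio = difflib.SequenceMatcher(None, original, rewritten).ratio()
    return ratio >= min_similarity
```

A light grammar fix passes the check; a wholesale rewording of the reply does not, and gets surfaced for human review instead.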
None of that sounds dramatic in isolation. Together, it changed the character of the system.
Outcome
By the end, the platform felt less like a support dashboard and more like an operational system for support work.
It became more dependable across messy email-driven workflows, easier for admins to reason about, more recoverable when things broke, and more usable for non-engineering teams operating it day to day.
The most important shift was not just feature coverage. It was confidence. The system became easier to trust.
Key lessons
A few lessons stood out from the project.
Internal tools need the same seriousness as external products when they sit close to execution. In many cases, they need more.
Reliability work often hides behind mundane user complaints. A failed preview, a missing attachment, or an inconsistent reply flow may look small, but the underlying issue can cut across identity, external APIs, storage, and recovery logic.
And operator confidence matters. A system becomes much more valuable when people can understand it, trust it, and recover from problems without engineering rescuing them every time.
That is where internal tools start becoming real infrastructure.