Achieving 95% Accuracy in GenAI-based Classification

SMS Country, a global communications platform, automated the classification of millions of daily message routing decisions using a GenAI-based classification agent—achieving 95% accuracy with full explainability.

4 min read · November 28, 2024

Primary Impact

95.3%

Classification Accuracy

Up from 76% with the rule-based system, a relative improvement of about 25%

30+ hrs/week

Ops Team Time Saved

Time previously spent maintaining routing rules and resolving disputes, now redirected to strategic work

< 2 minutes

Dispute Resolution Time

AI-generated explanations resolve routing questions in under two minutes, compared with multi-day manual investigation

+3.2%

Delivery Rate Improvement

Better routing decisions improved overall message delivery rates across the network

The Challenge

“SMS Country processes over 50 million messages daily across 200+ countries, each requiring routing decisions that optimize for delivery rate, cost, and regulatory compliance by destination country. Manual rule-based routing systems, built over years, had grown to thousands of overlapping rules that were brittle to new operator relationships, inconsistent in edge cases, and impossible to explain to enterprise customers who questioned routing decisions. The operations team spent 30+ hours per week maintaining routing rules and resolving classification disputes.”

The Solution

Eficens built a GenAI-based routing classification agent that replaced the rule-based system with a model-driven approach. The classifier uses a fine-tuned LLM to analyze each message routing request, considering destination country, message type (transactional, promotional, OTP), sender profile, historical delivery data, and current operator status to produce a ranked routing recommendation with a plain-language explanation of the decision logic.
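To make the classifier's inputs and outputs concrete, here is a minimal Python sketch of the request and recommendation shapes described above. All class, field, and function names are hypothetical and not taken from the Eficens implementation, and `rank_routes` is a simple placeholder that sorts by delivery rate where the real system would call the fine-tuned LLM.

```python
from dataclasses import dataclass
from enum import Enum


class MessageType(Enum):
    TRANSACTIONAL = "transactional"
    PROMOTIONAL = "promotional"
    OTP = "otp"


@dataclass(frozen=True)
class RoutingRequest:
    destination_country: str        # country code (encoding is an assumption)
    message_type: MessageType
    sender_id: str                  # stand-in for the sender profile
    # Hypothetical: recent delivery rate per candidate route, 0.0 to 1.0
    route_delivery_rates: dict[str, float]
    # Hypothetical: routes whose operator is currently available
    available_routes: frozenset[str]


@dataclass(frozen=True)
class RouteRecommendation:
    route: str
    score: float
    explanation: str                # plain-language reason for the ranking


def rank_routes(request: RoutingRequest) -> list[RouteRecommendation]:
    """Placeholder scorer: rank available routes by recent delivery rate.

    The production system would replace this body with a model call.
    """
    recommendations = []
    for route, rate in request.route_delivery_rates.items():
        if route not in request.available_routes:
            continue  # skip routes whose operator is currently down
        explanation = (
            f"{route}: operator available; recent delivery rate to "
            f"{request.destination_country} is {rate:.1%}"
        )
        recommendations.append(RouteRecommendation(route, rate, explanation))
    # Highest score first, so index 0 is the top recommendation
    return sorted(recommendations, key=lambda r: r.score, reverse=True)
```

Returning the explanation alongside each ranked route, rather than generating it afterwards, keeps the reason tied to the exact decision it describes.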

Implementation

Training Data Preparation

The foundation of the classifier was a curated training dataset of 2.8 million historical routing decisions, each annotated by routing specialists with the correct routing decision and the reason for it. The most recent six months of data were held out of training and reserved for validation, and the remaining data was cleaned to remove periods of known system instability. Feature engineering extracted 47 input features from each routing request, including country-operator reliability scores computed from rolling 30-day delivery statistics.
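The time-based holdout and the rolling reliability feature can be sketched as follows. This is an illustration under stated assumptions rather than the project's actual pipeline: the holdout is taken to be the most recent six months, approximated as 183 days, the record layout is invented for the example, and the reliability score is assumed to be delivered messages divided by sent messages over the window.

```python
from datetime import date, timedelta


def split_holdout(records, as_of, holdout_days=183):
    """Split records by date: older rows train, the recent window validates.

    Each record is assumed to be a dict with a "date" key (hypothetical layout).
    """
    cutoff = as_of - timedelta(days=holdout_days)
    train = [r for r in records if r["date"] < cutoff]
    validation = [r for r in records if r["date"] >= cutoff]
    return train, validation


def rolling_reliability(daily_stats, as_of, window_days=30):
    """Delivery rate for one country-operator pair over a trailing window.

    daily_stats is a list of (day, sent, delivered) tuples. The window covers
    the window_days days ending on as_of, inclusive. Returns None when no
    messages were sent in the window.
    """
    start = as_of - timedelta(days=window_days)
    sent = 0
    delivered = 0
    for day, day_sent, day_delivered in daily_stats:
        if start < day <= as_of:
            sent += day_sent
            delivered += day_delivered
    return delivered / sent if sent else None
```

Splitting by date rather than at random keeps validation closer to production conditions, since the model is always scored on traffic newer than anything it was trained on.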

Model Development and Validation

The classifier was developed in three iterations. The first iteration used a standard fine-tuned classification model, achieving 88% accuracy on the validation set—a substantial improvement over the rule-based system's 76% but below the 95% target. Analysis of misclassifications revealed that the most common failure mode was insufficient reasoning about regulatory constraints for specific destination countries. The second iteration added a retrieval step that augmented each classification request with the latest regulatory guidance for the destination country, improving accuracy to 93%. The third iteration added an adversarial dataset of historically disputed routing decisions, fine-tuning the model on cases where the initial decision was incorrect, reaching 95.3% accuracy.
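The retrieval step added in the second iteration can be pictured with a small sketch. It assumes the guidance lives in a lookup keyed by destination country and is placed ahead of the request in the model prompt. The store, the prompt wording, and every name below are hypothetical, and the guidance strings are placeholders, not real regulatory text.

```python
NO_GUIDANCE = "No regulatory guidance on file for this destination."

# Hypothetical store; a real system would likely query a maintained,
# versioned source so the latest guidance is always retrieved.
GUIDANCE_STORE = {
    "XX": "PLACEHOLDER guidance for country XX",
    "YY": "PLACEHOLDER guidance for country YY",
}


def build_prompt(request: dict, guidance_store: dict[str, str]) -> str:
    """Augment a routing request with guidance for its destination country.

    request is assumed to carry "country", "message_type", and "sender" keys.
    """
    guidance = guidance_store.get(request["country"], NO_GUIDANCE)
    return (
        "Regulatory guidance:\n"
        f"{guidance}\n\n"
        "Routing request:\n"
        f"- destination country: {request['country']}\n"
        f"- message type: {request['message_type']}\n"
        f"- sender: {request['sender']}\n\n"
        "Recommend a ranked list of routes and explain the decision."
    )
```

Retrieving guidance at request time means a regulatory change can take effect by updating the store, without retraining the model.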

Production Deployment and Monitoring

The classifier was first deployed in shadow mode, running in parallel with the rule-based system for 30 days without affecting live routing decisions, which allowed direct comparison of the two systems' recommendations. The shadow evaluation confirmed the accuracy improvement and identified a small set of message categories where the classifier underperformed; these were excluded from the initial production rollout and addressed in the next training iteration. Full production deployment followed a phased rollout: 10% of traffic on day 1, 50% on day 7, and 100% on day 21.
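One way to implement a phased rollout like this is deterministic hash bucketing, sketched below. The schedule values come from the text. The hashing scheme, the choice of bucketing key, and the function names are assumptions, not a description of how SMS Country actually split its traffic.

```python
import hashlib

# (first day the stage applies, percent of traffic) as stated in the case study
ROLLOUT_SCHEDULE = [(1, 10), (7, 50), (21, 100)]


def rollout_percent(day: int) -> int:
    """Return the share of traffic the classifier handles on a rollout day."""
    percent = 0
    for start_day, stage_percent in ROLLOUT_SCHEDULE:
        if day >= start_day:
            percent = stage_percent
    return percent


def bucket(key: str) -> int:
    """Map a key (for example a sender ID) to a stable bucket from 0 to 99."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def use_classifier(key: str, day: int, excluded_category: bool = False) -> bool:
    """Decide whether a request is routed by the classifier or the old rules.

    excluded_category marks message categories held back from the initial
    rollout because the classifier underperformed on them in shadow mode.
    """
    if excluded_category:
        return False
    return bucket(key) < rollout_percent(day)
```

Because the bucket depends only on the key, traffic that moves to the classifier at 10% stays on it at 50% and 100%, which keeps before-and-after comparisons clean.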
