Hardware Sizing

Understand the hardware and infrastructure requirements for running ticket classification models at different scales.

Overview

Hardware requirements for ticket classification depend on several factors:

  • Model size and complexity
  • Number of tickets processed
  • Classification frequency
  • Response time requirements
  • Budget constraints

Quick Reference

| Scale      | Tickets/Day    | Min RAM | Min CPU  | GPU         | Model Type        |
|------------|----------------|---------|----------|-------------|-------------------|
| Small      | <1,000         | 512 MB  | 1 core   | No          | Simple ML         |
| Medium     | 1,000-10,000   | 2 GB    | 2 cores  | Optional    | BERT-based        |
| Large      | 10,000-100,000 | 8 GB    | 4 cores  | Recommended | BERT/Large        |
| Enterprise | >100,000       | 16+ GB  | 8+ cores | Required    | Custom/Fine-tuned |

Deployment Models

CPU-Only Deployment

Best for:

  • Small to medium ticket volumes (<10,000/day)
  • Budget-conscious deployments
  • Simpler models (distilled BERT, small transformers)

Recommended Specs:

Small Scale:
  CPU: 1-2 cores (2.0+ GHz)
  RAM: 512 MB - 2 GB
  Storage: 5 GB
  Network: Standard

Medium Scale:
  CPU: 2-4 cores (2.5+ GHz)
  RAM: 2-4 GB
  Storage: 10 GB
  Network: Standard

Expected Performance:

  • Classification time: 200-500ms per ticket
  • Throughput: 100-500 tickets/minute
  • Model loading time: 5-30 seconds
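
As a quick sanity check that a CPU-only box can keep up, compare your expected peak arrival rate against the throughput implied by the per-ticket latency. The sketch below is a back-of-envelope estimate only; the volume, peak factor, latency, and core count are assumptions to replace with your own measurements, and it assumes one worker per core plus the 2x headroom recommended under Best Practices.

# Back-of-envelope CPU sizing: does per-core throughput cover peak ticket load?
# All numbers are illustrative assumptions -- substitute your own measurements.

TICKETS_PER_DAY = 8_000   # expected daily volume
PEAK_FACTOR = 3           # peak traffic relative to the daily average
LATENCY_S = 0.35          # measured classification time per ticket (seconds)
CORES = 4                 # available cores, assuming one worker per core

avg_per_min = TICKETS_PER_DAY / (24 * 60)
peak_per_min = avg_per_min * PEAK_FACTOR
capacity_per_min = CORES * (60 / LATENCY_S)

print(f"Peak load: {peak_per_min:.0f} tickets/min")
print(f"Capacity:  {capacity_per_min:.0f} tickets/min")
print("OK" if capacity_per_min >= peak_per_min * 2 else "Consider more cores or a GPU")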

GPU-Accelerated Deployment

Best for:

  • Large ticket volumes (>10,000/day)
  • Real-time classification requirements
  • Large transformer models
  • Fine-tuning and retraining

Recommended Specs:

Medium-Large Scale:
  CPU: 4-8 cores
  RAM: 8-16 GB
  GPU: NVIDIA T4 or better (16 GB VRAM)
  Storage: 20 GB SSD
  Network: High bandwidth

Enterprise Scale:
  CPU: 8-16 cores
  RAM: 16-32 GB
  GPU: NVIDIA A10/A100 (24-80 GB VRAM)
  Storage: 50+ GB NVMe SSD
  Network: High bandwidth, low latency

Expected Performance:

  • Classification time: 10-50ms per ticket
  • Throughput: 1,000-10,000 tickets/minute
  • Model loading time: 2-10 seconds
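
Before committing to a GPU instance type, verify that the runtime actually sees the card and that its VRAM matches expectations. A minimal check, assuming a PyTorch-based inference stack:

import torch

# Confirm the GPU is visible and report its VRAM (assumes a PyTorch stack).
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No GPU detected -- inference will fall back to CPU")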

Model Size Impact

Small Models (50-150 MB)

Examples:

  • DistilBERT
  • MiniLM
  • TinyBERT

Requirements:

  • RAM: 512 MB - 1 GB
  • CPU: 1-2 cores sufficient
  • GPU: Not required

Use Cases:

  • Low-volume environments
  • Cost-sensitive deployments
  • Edge deployments
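
To confirm a distilled model fits your latency budget on the target hardware, time a single prediction before rolling it out. The sketch below assumes a Hugging Face transformers stack and a fine-tuned checkpoint named your-org/ticket-classifier-distilbert, which is a placeholder, not a published model.

import time
from transformers import pipeline

# Load a small distilled classifier (placeholder model name -- use your own checkpoint).
classifier = pipeline("text-classification", model="your-org/ticket-classifier-distilbert")

ticket = "My VPN connection drops every few minutes since the last update."
start = time.perf_counter()
result = classifier(ticket)
print(result, f"{(time.perf_counter() - start) * 1000:.0f} ms")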

Medium Models (300-500 MB)

Examples:

  • BERT-base
  • RoBERTa-base
  • Custom fine-tuned models

Requirements:

  • RAM: 2-4 GB
  • CPU: 2-4 cores recommended
  • GPU: Optional, improves performance 5-10x

Use Cases:

  • Most production deployments
  • Balanced accuracy/performance
  • Standard ticket volumes

Large Models (1-5 GB)

Examples:

  • BERT-large
  • RoBERTa-large
  • GPT-based models
  • Custom ensemble models

Requirements:

  • RAM: 8-16 GB
  • CPU: 4-8 cores minimum
  • GPU: Highly recommended (T4 or better)

Use Cases:

  • High-accuracy requirements
  • Complex classification tasks
  • Multi-label classification
  • High-volume processing
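
A useful rule of thumb when translating model size into RAM: float32 weights need roughly 4 bytes per parameter, plus working memory for activations and the tokenizer. The sketch below is a rough estimate of the model footprint alone, assuming float32 weights and a ~2x inference overhead factor; the deployment totals in the tables above are higher because they also cover the Python runtime, batching, and the rest of the service.

# Rough RAM estimate from parameter count (assumption: float32 weights, ~2x inference overhead).
def estimate_ram_gb(num_parameters: int, bytes_per_param: int = 4, overhead: float = 2.0) -> float:
    return num_parameters * bytes_per_param * overhead / 1024**3

for name, params in [("DistilBERT", 66e6), ("BERT-base", 110e6), ("BERT-large", 340e6)]:
    print(f"{name}: ~{estimate_ram_gb(int(params)):.1f} GB")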

Containerized Deployments

Docker Resource Limits

Configure appropriate resource limits:

# docker-compose.yml
services:
  ticket-classifier:
    image: openticketai/engine:latest
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G

Kubernetes Pod Sizing

# kubernetes-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: ticket-classifier
spec:
  containers:
    - name: classifier
      image: openticketai/engine:latest
      resources:
        requests:
          memory: '2Gi'
          cpu: '1000m'
        limits:
          memory: '4Gi'
          cpu: '2000m'

Resource Monitoring

Monitor these metrics:

  • CPU Usage: Should be <80% average
  • Memory Usage: Should have 20% headroom
  • Classification Latency: P95 latency under target
  • Queue Depth: Tickets waiting for classification
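
A lightweight way to watch the first two metrics from inside the container is to poll process stats periodically and warn when the thresholds above are crossed. A minimal sketch, assuming the psutil package is installed; in production you would export these values to your monitoring system instead of printing them.

import time
import psutil

# Poll CPU and memory every 30s and warn when the thresholds above are exceeded.
while True:
    cpu = psutil.cpu_percent(interval=1)   # % CPU over a 1-second sample
    mem = psutil.virtual_memory().percent  # % of RAM in use
    if cpu > 80:
        print(f"WARNING: CPU at {cpu:.0f}% (target <80%)")
    if mem > 80:
        print(f"WARNING: memory at {mem:.0f}% (<20% headroom left)")
    time.sleep(30)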

Scaling Strategies

Vertical Scaling

Increase resources on a single instance:

# Start
RAM: 2 GB, CPU: 2 cores

# Scale up
RAM: 4 GB, CPU: 4 cores

# Further scaling
RAM: 8 GB, CPU: 8 cores

Pros:

  • Simple to implement
  • No code changes required
  • Easy to manage

Cons:

  • Limited by hardware maximums
  • Single point of failure
  • Potentially expensive

Horizontal Scaling

Deploy multiple instances:

# Load balancer
├── Classifier Instance 1 (2 GB, 2 cores)
├── Classifier Instance 2 (2 GB, 2 cores)
└── Classifier Instance 3 (2 GB, 2 cores)

Pros:

  • Better reliability
  • Handles traffic spikes
  • More cost-effective at scale

Cons:

  • More complex setup
  • Requires load balancer
  • Shared state considerations

Auto-Scaling

Dynamic scaling based on load:

# Kubernetes HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ticket-classifier
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ticket-classifier
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Storage Requirements

Model Storage

  • Base models: 100 MB - 5 GB
  • Fine-tuned models: +100-500 MB
  • Cache: 1-5 GB
  • Logs: 100 MB - 1 GB/day

Disk Layout:
├── /models/ (10-20 GB, SSD)
├── /cache/ (5 GB, SSD)
├── /logs/ (rotating, 10 GB)
└── /data/ (variable, standard storage)
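
To confirm the model volume has enough headroom before pulling a new checkpoint, a quick free-space check can be scripted. The path below follows the layout above and is an assumption; adjust it to your actual mount point.

import shutil

# Check free space on the model volume before downloading a new checkpoint.
total, used, free = shutil.disk_usage("/models")
print(f"/models: {free / 1024**3:.1f} GB free of {total / 1024**3:.1f} GB")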

Network Requirements

Bandwidth

  • Model downloads: Initial 1-5 GB, then minimal
  • API traffic: 1-10 KB per ticket
  • Monitoring: 1-5 MB/hour

Latency

  • Internal: <10ms ideal
  • External APIs: <100ms acceptable
  • Model serving: <50ms target

Cost Optimization

Development Environment

Minimal cost setup for testing:

Cloud Instance:
  Type: t3.small (AWS) / e2-small (GCP)
  vCPU: 2
  RAM: 2 GB
  Cost: ~$15-20/month

Production Small Scale

Cost-effective production:

Cloud Instance:
  Type: t3.medium (AWS) / e2-medium (GCP)
  vCPU: 2
  RAM: 4 GB
  Cost: ~$30-40/month

Production Large Scale

High-performance production:

Cloud Instance:
  Type: c5.2xlarge (AWS) / c2-standard-8 (GCP)
  vCPU: 8
  RAM: 16 GB
  GPU: Optional T4
  Cost: ~$150-300/month (CPU) or ~$400-600/month (GPU)

Performance Testing

Benchmarking Your Setup

Test classification performance:

# Load test with 100 concurrent requests
ab -n 1000 -c 100 http://localhost:8080/classify

# Monitor during test
docker stats ticket-classifier

# Check response times
curl -w "@curl-format.txt" -o /dev/null -s http://localhost:8080/classify
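
If you prefer to benchmark from Python rather than ab, the sketch below fires concurrent requests and reports the percentile latencies used in the targets table. It assumes the requests package and that POST /classify accepts a JSON body with a text field; adjust the URL and payload to match your actual API.

import time
from concurrent.futures import ThreadPoolExecutor
import requests

URL = "http://localhost:8080/classify"  # endpoint from the ab example; payload shape is an assumption

def classify_once(_):
    start = time.perf_counter()
    requests.post(URL, json={"text": "Printer on floor 3 is offline"}, timeout=10)
    return (time.perf_counter() - start) * 1000  # latency in milliseconds

with ThreadPoolExecutor(max_workers=100) as pool:
    latencies = sorted(pool.map(classify_once, range(1000)))

p50, p95, p99 = (latencies[int(len(latencies) * q) - 1] for q in (0.50, 0.95, 0.99))
print(f"P50 {p50:.0f} ms | P95 {p95:.0f} ms | P99 {p99:.0f} ms")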

Performance Targets

| Metric       | Target   | Measurement          |
|--------------|----------|----------------------|
| Latency P50  | <200ms   | Median response time |
| Latency P95  | <500ms   | 95th percentile      |
| Latency P99  | <1000ms  | 99th percentile      |
| Throughput   | >100/min | Tickets classified   |
| CPU Usage    | <80%     | Average utilization  |
| Memory Usage | <80%     | Peak utilization     |

Troubleshooting

Out of Memory Errors

Symptoms:

MemoryError: Unable to allocate array
Container killed (OOMKilled)

Solutions:

  1. Increase memory allocation
  2. Use smaller model variant
  3. Reduce batch size
  4. Enable model quantization (see the sketch below)
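
Dynamic quantization converts the model's linear layers to int8 at load time, which typically cuts the memory footprint of a BERT-class model roughly in half at a small accuracy cost. A minimal sketch, assuming a PyTorch model loaded via transformers and a placeholder checkpoint name:

import torch
from transformers import AutoModelForSequenceClassification

# Quantize linear layers to int8 to shrink memory use (placeholder model name).
model = AutoModelForSequenceClassification.from_pretrained("your-org/ticket-classifier")
model = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)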

Slow Classification

Symptoms:

  • Latency >1 second per ticket
  • Growing processing queue

Solutions:

  1. Enable GPU acceleration
  2. Use model distillation
  3. Optimize batch processing (see the sketch after this list)
  4. Add more replicas
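
Batching lets the model score several tickets per forward pass, which usually raises throughput noticeably on both CPU and GPU. A sketch using the transformers pipeline batching support, again with a placeholder model name; the batch size is something to tune against your latency target.

from transformers import pipeline

# Classify tickets in batches instead of one at a time (placeholder model name).
classifier = pipeline("text-classification", model="your-org/ticket-classifier")
tickets = ["Password reset not working", "Invoice #4512 is missing", "Laptop fan is very loud"]
results = classifier(tickets, batch_size=16)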

High CPU Usage

Symptoms:

  • CPU constantly >90%
  • Throttled performance

Solutions:

  1. Add more CPU cores
  2. Optimize model inference
  3. Implement request queuing
  4. Scale horizontally

Best Practices

DO ✅

  • Start with CPU-only for testing
  • Monitor resource usage continuously
  • Set appropriate resource limits
  • Plan for 2x current load
  • Use caching where possible
  • Implement health checks

DON’T ❌

  • Under-provision memory (causes OOM)
  • Skip performance testing
  • Ignore monitoring metrics
  • Over-provision unnecessarily
  • Mix production and development workloads

Next Steps

After sizing your hardware:

  1. Deploy Infrastructure: Set up servers/containers
  2. Install Model: Download and configure classification model
  3. Performance Test: Validate against your requirements
  4. Monitor: Set up metrics and alerting