42,000 households in range
Dispatch triggers at FPS ≥ 56 · Current: 77 ✓ dispatched
PROCESS & LOGIC
OPERATIONAL WORKFLOW
The system moves issues through structured operational stages, ensuring clear ownership, escalation logic, and resolution tracking from detection to closure.
Detection / Issue Intake: issues detected proactively (unusual network behavior) or reported manually
Ticket Created
Initial Triage: determine severity, impact, and routing path
Diagnosis
Agent Review: attempt quick resolution for known issues
Specialist Review: investigate complex technical incidents
Decision & Routing
Escalation: assign to responsible roles
Supervisor Review: handle escalations and decision routing
Direct Field Dispatch: send a technician for physical network issues
On-site Investigation → Issue Found → Action/Fix (Agent, Specialist, or Technician)
Resolution: issue fixed, addressed, and validated
Validation → Ticket Closure
Feedback: capture resolution insights
Knowledge Base → System Learning: improve future detection and routing
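The workflow stages can be read as a state machine in which only certain stage-to-stage moves are legal. The sketch below encodes one possible reading of the diagram; the `Stage` names, the `TRANSITIONS` table, and the rule that a failed validation reopens triage are all assumptions for illustration, not the system's actual rules. Escalation is folded into `SUPERVISOR_REVIEW` for brevity.

```python
from enum import Enum, auto


class Stage(Enum):
    DETECTION = auto()
    TICKET_CREATED = auto()
    TRIAGE = auto()
    AGENT_REVIEW = auto()
    SPECIALIST_REVIEW = auto()
    SUPERVISOR_REVIEW = auto()
    FIELD_DISPATCH = auto()
    RESOLUTION = auto()
    VALIDATION = auto()
    CLOSED = auto()
    FEEDBACK = auto()


# Hypothetical transition table: one interpretation of the workflow diagram.
TRANSITIONS: dict[Stage, set[Stage]] = {
    Stage.DETECTION: {Stage.TICKET_CREATED},
    Stage.TICKET_CREATED: {Stage.TRIAGE},
    # Triage can route directly to any resolving role (non-linear routing).
    Stage.TRIAGE: {Stage.AGENT_REVIEW, Stage.SPECIALIST_REVIEW, Stage.FIELD_DISPATCH},
    Stage.AGENT_REVIEW: {Stage.RESOLUTION, Stage.SUPERVISOR_REVIEW},
    Stage.SPECIALIST_REVIEW: {Stage.RESOLUTION, Stage.SUPERVISOR_REVIEW},
    Stage.SUPERVISOR_REVIEW: {Stage.SPECIALIST_REVIEW, Stage.FIELD_DISPATCH},
    Stage.FIELD_DISPATCH: {Stage.RESOLUTION},
    Stage.RESOLUTION: {Stage.VALIDATION},
    # Assumption: a fix that fails validation sends the ticket back to triage.
    Stage.VALIDATION: {Stage.CLOSED, Stage.TRIAGE},
    Stage.CLOSED: {Stage.FEEDBACK},
    Stage.FEEDBACK: set(),
}


def is_valid_path(path: list[Stage]) -> bool:
    """Return True if every consecutive pair of stages is an allowed transition."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))
```

Keeping the transitions in a table rather than in branching code makes the ownership rules auditable: adding a new escalation path is a one-line data change.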
ROLE BASED INTERFACES
The system provides role-specific interfaces designed around operational responsibilities, ensuring clarity, efficiency, and accountability across the incident lifecycle.
Agent: Agent Dashboard, Ticket List, Ticket Detail, Escalation
Specialist: Diagnosis Tools, Investigation View, Root Cause Analysis, Resolution Actions
Supervisor: Operations Dashboard, Escalation Control, SLA Monitoring, Decision Routing
Technician: Field Assignment, Site Investigation, Repair Actions, Report Upload
Shared core: Ticket / Incident System, supporting Incident Resolution and System Learning
Agent
↓
Supervisor
↓
Specialist
↓
Technician
↓
Resolution
Flexible Routing
The system introduced non-linear routing, allowing issues to be resolved at different operational levels based on complexity.
Issue
↓
Agent | Specialist | Technician
↓
Resolution
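Flexible routing replaces the fixed Agent → Supervisor → Specialist → Technician chain with a direct choice of resolving role. A minimal sketch, assuming triage produces two hypothetical flags (`physical_fault`, `known_issue`) derived from the role descriptions in the workflow: agents handle known issues, specialists handle complex incidents, and technicians handle physical faults.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    AGENT = "agent"
    SPECIALIST = "specialist"
    TECHNICIAN = "technician"


@dataclass
class Issue:
    # Hypothetical triage flags; the real system's routing criteria are not specified.
    physical_fault: bool  # e.g. a fiber cut or failed hardware needing a site visit
    known_issue: bool     # matches an existing knowledge-base entry


def route(issue: Issue) -> Role:
    """Pick the resolving role directly instead of climbing a fixed escalation chain."""
    if issue.physical_fault:
        return Role.TECHNICIAN  # direct field dispatch, skipping intermediate tiers
    if issue.known_issue:
        return Role.AGENT       # quick resolution from the knowledge base
    return Role.SPECIALIST      # complex or novel technical incident
```

Checking for a physical fault first reflects the cost of delay: a fiber cut cannot be fixed remotely, so routing it through an agent only adds latency.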
TELEMETRY PROTOCOLS
SNMP (Simple Network Management Protocol)
Devices expose MIBs (Management Information Bases). The monitoring system polls at configurable intervals (default: 30 seconds for edge routers, 60 seconds for aggregation switches).
Key MIB objects:
ifInErrors / ifOutErrors: interface error counters
ifInDiscards / ifOutDiscards: packet discard counters (a rising discard rate is an early congestion signal)
sysUpTime: device uptime (a reset indicates a reboot and is flagged)
ifOperStatus: link operational state
cpmCPUTotal5sec: CPU utilization (high CPU causes downstream drops)
SNMP traps are ingested asynchronously alongside polled data: polling supports trend analysis, while traps support event detection.
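Because ifInErrors and similar objects are cumulative counters, trend analysis works on the difference between two polls, not the raw values. A minimal sketch, assuming 32-bit counters (Counter32) and a `Sample` record that is hypothetical: the modular subtraction absorbs one counter wrap, and a backwards jump in sysUpTime is treated as the reboot event described above, since the counters were reset and the interval's delta is meaningless.

```python
from dataclasses import dataclass
from typing import Optional

COUNTER32_MODULUS = 2**32  # Counter32 values wrap back to 0 after 2^32 - 1


@dataclass
class Sample:
    sys_uptime: int    # sysUpTime in TimeTicks (hundredths of a second)
    if_in_errors: int  # ifInErrors, a Counter32


def error_rate(prev: Sample, curr: Sample) -> Optional[float]:
    """Interface errors per second between two polls, or None if unusable."""
    if curr.sys_uptime < prev.sys_uptime:
        # Uptime went backwards: treat as a reboot (counters were reset).
        # Caveat: sysUpTime itself wraps after about 497 days, which this
        # check would also report as a reset.
        return None
    elapsed_s = (curr.sys_uptime - prev.sys_uptime) / 100
    if elapsed_s == 0:
        return None
    # Modular subtraction absorbs a single counter wrap between polls.
    delta = (curr.if_in_errors - prev.if_in_errors) % COUNTER32_MODULUS
    return delta / elapsed_s
```

With a 30-second poll on edge routers, a single wrap between polls is only plausible on very busy counters; high-speed interfaces usually expose 64-bit counters (for example ifHCInOctets) for that reason.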
Syslog — RFC 5424
Continuous real-time event stream from all network devices.
Format: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
Severity 0–4 (Emergency → Warning) ingested into correlation pipeline.
Severity 5–7 (Notice → Debug) stored for audit, not processed for alerting.
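The severity split depends on decoding the PRI field, which RFC 5424 defines as facility × 8 + severity. A minimal sketch of that decoding and the 0–4 / 5–7 split; the destination labels `"correlation"` and `"audit"` and the function names are hypothetical.

```python
import re

# RFC 5424: PRI is 1-3 digits in angle brackets; PRI = facility * 8 + severity.
PRI_PATTERN = re.compile(r"<(\d{1,3})>")
ALERTING_MAX_SEVERITY = 4  # 0 (Emergency) through 4 (Warning)


def parse_pri(message: str) -> tuple[int, int]:
    """Return (facility, severity) decoded from the leading <PRI> field."""
    match = PRI_PATTERN.match(message)  # .match anchors at the start of the string
    if match is None:
        raise ValueError("message does not start with a <PRI> field")
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23, severity 7
        raise ValueError(f"PRI {pri} out of range 0-191")
    return pri // 8, pri % 8


def route_syslog(message: str) -> str:
    """Hypothetical destinations: 'correlation' for severities 0-4, 'audit' for 5-7."""
    _, severity = parse_pri(message)
    return "correlation" if severity <= ALERTING_MAX_SEVERITY else "audit"
```

For example, `<34>` decodes to facility 4, severity 2 (Critical) and goes to correlation, while `<165>` decodes to facility 20, severity 5 (Notice) and is stored for audit only.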
NetFlow / IPFIX / sFlow
Flow-level traffic data enabling:
Per-link baseline traffic volume (sudden drops = upstream failure signal)
Traffic matrix analysis for routing change impact assessment
Top-talker identification during congestion events
sFlow is used on high-throughput optical segments: its packet sampling gives lower precision but also lower performance overhead.
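The "sudden drop" signal needs a baseline to compare against. A minimal sketch, assuming per-link volume arrives once per interval and using a rolling mean over recent intervals; the class name, the 12-interval window, and the 50% drop threshold are hypothetical defaults, not values from the system.

```python
from collections import deque
from statistics import mean


class TrafficBaseline:
    """Rolling per-link baseline that flags sudden volume drops."""

    def __init__(self, window: int = 12, drop_fraction: float = 0.5):
        self.history: deque[float] = deque(maxlen=window)
        self.drop_fraction = drop_fraction  # 0.5 = flag anything below half the baseline

    def observe(self, volume: float) -> bool:
        """Record one interval's volume; return True if it is a sudden drop."""
        is_drop = False
        # Only judge once the window is full, so early samples don't cause false alarms.
        if len(self.history) == self.history.maxlen:
            baseline = mean(self.history)
            is_drop = volume < baseline * (1 - self.drop_fraction)
        # Note: drops are folded into the baseline too, so a sustained outage
        # gradually lowers the baseline and stops being flagged.
        self.history.append(volume)
        return is_drop
```

A production detector would typically also account for time-of-day seasonality; a plain rolling mean will misread the normal overnight decline as a gradual drop.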
BGP (Border Gateway Protocol) Monitoring
Captured via OpenBMP or ExaBGP route collectors. BGP UPDATE messages feed the correlation engine.
Events monitored:
Route flapping: BGP keepalive timeout / interface instability
Route withdrawal: destination unreachable (user-visible disruption within 30–60 seconds)
AS path changes: upstream provider routing issue
Session resets: loss of peering session
BGP events carry the highest individual confidence weight (25 points) due to their direct correlation with user-visible failures.
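One way to read "confidence weight" is as an additive score over the distinct signal types that corroborate a candidate incident. In the sketch below only the BGP weight of 25 comes from the text; the other weights are hypothetical placeholders chosen solely to sit below BGP's, and the additive model itself is an assumption.

```python
# BGP = 25 is stated in the text; the remaining weights are hypothetical
# placeholders, chosen only so that BGP stays the highest individual weight.
SIGNAL_WEIGHTS: dict[str, int] = {
    "bgp": 25,
    "netflow": 20,
    "snmp": 15,
    "syslog": 10,
}


def confidence_score(signal_types: set[str]) -> int:
    """Sum the weight of each distinct signal type seen for one candidate incident.

    Using a set means ten syslog lines count once, so a noisy source cannot
    inflate confidence on its own; unknown signal types contribute 0.
    """
    return sum(SIGNAL_WEIGHTS.get(signal, 0) for signal in signal_types)
```

Under these placeholder weights, a BGP withdrawal corroborated by SNMP and syslog scores 25 + 15 + 10 = 50, while syslog alone scores 10.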
NORMALIZED EVENT SCHEMA
Research and Understanding
Method 01
Desk Research
Network Operations Centre (NOC) Design Standards
NOC environments are documented in operations literature as high-stress, cognitively demanding workplaces. Design considerations unique to NOC contexts include high ambient screen time (operators may monitor 8–16 displays simultaneously), time-pressured decision making, shift-based rotations (which introduce handover risk), and the critical importance of information hierarchy: displaying the most urgent information first, not the most information.
SLA Terminology and Regulatory Obligations
Signal Correlation Theory
Alert Fatigue Research
ITIL v4 Incident Management Framework
Method 02
ISP Operational Research
Network Event Classification
Fiber Optic-Specific Failure Modes
Traffic Pattern Baselines
Last-Mile vs. Backbone Failures
Method 03
NOC Workflow Analysis
The Handover Problem
Cognitive Load and Information Hierarchy
Escalation Decision Latency
Field Technician Information Needs
Method 04
Systems Mapping
The Signal Propagation Map
The Role Dependency Map
The Blast Radius Model
The Feedback Loop Audit
Method 05
Market Research
PagerDuty
Event Intelligence (AIOps Layer)
Urgency and Severity Scoring
On-Call Schedule Management
Post-Incident Reviews
Datadog
Metrics
Logs
Traces
Map-Based Infrastructure Visualisation
Anomaly Detection Widgets
Google Site Reliability Engineering (SRE)
The Four Golden Signals
Service Level Objectives (SLOs)
Error Budgets
Toil Reduction
Blameless Postmortems
Atlassian Incident Management
Incident Classification and Routing
SLA Tracking and Breach Prevention
Opsgenie's Alert Routing Logic
Statuspage for Stakeholder Communication
