KFON · OPERATIONAL INTELLIGENCE · SYSTEM SIMULATION
Adjust inputs. Watch all four roles react — simultaneously.
Each slider changes a real variable in the FPS formula. The score updates live, and all four operator views respond to the same incident — showing exactly what each role sees and when they're triggered to act.
District
Fault type
Signal strengths — raw inputs to FPS
How strong is each incoming signal? Weak signals may not reach thresholds even with high weights.
T · Temporal overlap (signals within time window): 72
G · Geographic proximity (spatial clustering): 95
I · Infrastructure overlap (shared path/node): 95
H · Historical similarity (matches past incidents): 72
FPS weight calibration
Σ 100%
Redistribute weight between dimensions. Doesn't need to sum to 100 — the system auto-normalises.
T · Temporal weight: 25%
G · Geographic weight: 30%
I · Infrastructure weight: 30%
H · Historical weight: 15%
Ernakulam · Fiber cut · 24 nodes
42,000 households in range
Dispatch triggers at FPS ≥ 56 · Current: 77 ✓ dispatched
FAILURE PROBABILITY SCORE
77 / 100 · CRITICAL
T 72 · G 86 · I 84 · H 53
Scale: 0 · 31 MONITOR · 56 WARNING · 76 CRITICAL · 100
25%×72 + 30%×86 + 30%×84 + 15%×53 = 77
0–30 · Info
31–55 · Monitor
56–75 · Warning
76+ · Critical
T1
Tier 1 Agent
INC-2850 entered alert queue · FPS 77 · CRITICAL
14,700 households at risk · Fiber cut pattern
Signals: T 72 · G 86 · I 84 · H 53
Escalation recommended immediately
SUP
Supervisor
Escalation received · CRITICAL · FPS 77
Blast radius: ~16,170 households · Ernakulam
Dispatch ERN-Team-2 · available · 12km from site
SLA risk: avg resolution ~6h at this FPS · limit 4h
LEAD
NOC Lead
Ernakulam — CRITICAL · FPS 77 · 16,170 HH affected
Blast radius: 24 nodes monitored · 4 infra segments at risk
SLA compliance: AT RISK — breach likely
FPS model: T25% G30% I30% H15% (current weights)
FLD
Field Tech
Dispatched to Ernakulam · Fiber cut
Site: ERN-C1 junction · Gate: K-4439
Materials: splice kit · OTDR · cable markers
Site note: J3 splice flagged as repeat failure point; check first
CRITICAL: all roles on alert. Field team dispatched. SLA at risk.
16,170 households potentially affected · avg historical resolution at this FPS: 6h 28min
KFON Operational Intelligence · Speculative Design · Aswanth Choyan
FPS = (T×25%) + (G×30%) + (I×30%) + (H×15%)
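
The formula above, together with the auto-normalisation of weights and the severity bands shown in the simulation, can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the function names and the rounding to the nearest integer are assumptions, while the weights, band boundaries, and dispatch threshold are taken from the simulation panel.

```python
# Severity bands from the simulation legend: (lower bound, label), highest first.
SEVERITY_BANDS = [(76, "CRITICAL"), (56, "WARNING"), (31, "MONITOR"), (0, "INFO")]
DISPATCH_THRESHOLD = 56  # "Dispatch triggers at FPS >= 56"


def compute_fps(signals: dict[str, float], weights: dict[str, float]) -> int:
    """Weighted average of the T/G/I/H signals.

    Weights are divided by their sum first, so they do not need to total 100.
    Rounding to the nearest integer is an assumption made for display.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    score = sum(signals[key] * weights[key] / total_weight for key in weights)
    return round(score)


def severity(fps: int) -> str:
    """Map a score to the first band whose lower bound it reaches."""
    for lower_bound, label in SEVERITY_BANDS:
        if fps >= lower_bound:
            return label
    return "INFO"


def should_dispatch(fps: int) -> bool:
    return fps >= DISPATCH_THRESHOLD


signals = {"T": 72, "G": 86, "I": 84, "H": 53}
weights = {"T": 25, "G": 30, "I": 30, "H": 15}
fps = compute_fps(signals, weights)
print(fps, severity(fps), should_dispatch(fps))
```

With the effective signal values shown in the panel (T 72, G 86, I 84, H 53), the weighted sum is 18 + 25.8 + 25.2 + 7.95 = 76.95, which rounds to the displayed score of 77. Doubling every weight gives the same score, because only the ratios between weights matter after normalisation.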

PROCESS & LOGIC

OPERATIONAL WORKFLOW

The system moves issues through structured operational stages, ensuring clear ownership, escalation logic, and resolution tracking from detection to closure.

Detection / Issue Intake

Issues detected proactively or reported manually

Ticket Created

Detect unusual network behavior

Initial Triage

Determine severity, impact, and routing path

Specialist Review

Investigate complex technical incidents

Escalation

Assign to responsible roles

Supervisor Review

Handle escalations and decision routing

Direct Field Dispatch

Send technician for physical network issues

On-site Investigation

Agent Review

Attempt quick resolution for known issues

Issue Found

Action/Fix

Agent

Specialist

Technician

Resolution

Issue addressed and validated

Validation

Ticket Closure

Feedback

Capture resolution insights

Knowledge Base

System Learning

Improve future detection and routing

Diagnosis

Decision & Routing

Issue Fixed


ROLE BASED INTERFACES

The system provides role-specific interfaces designed around operational responsibilities, ensuring clarity, efficiency, and accountability across the incident lifecycle.

Agent

Specialist

Supervisor

Technician

Incident Resolution

Ticket / Incident System

Agent Dashboard

Ticket List

Ticket Detail

Escalation

Diagnosis Tools

Investigation View

Root Cause Analysis

Resolution Actions

Field Assignment

Site Investigation

Repair Actions

Report Upload

System Learning

Operations Dashboard

Escalation Control

SLA Monitoring

Decision Routing

ROUTING LOGIC

Early support workflows relied on linear escalation paths, where issues moved sequentially across roles regardless of complexity. This created delays and unnecessary handoffs.

To address this, the system evolved to support flexible, role-based routing, allowing issues to be resolved at the appropriate operational level.

Linear Escalation

Issues followed a fixed escalation path, leading to delays and unnecessary dependencies.

Agent

Supervisor

Specialist

Technician

Resolution

Flexible Routing

The system introduced non-linear routing, allowing issues to be resolved at different operational levels based on complexity.

Issue

Agent | Specialist | Technician

Resolution
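
The difference between the two routing models can be illustrated with a short sketch. The two boolean flags and the `route` function are hypothetical, chosen to mirror the workflow descriptions earlier in this document (agents handle known issues, specialists handle complex technical incidents, technicians handle physical network issues); the real routing criteria are not specified here.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    AGENT = "Agent"
    SUPERVISOR = "Supervisor"
    SPECIALIST = "Specialist"
    TECHNICIAN = "Technician"


# Linear escalation: every issue walks the same fixed path.
LINEAR_PATH = [Role.AGENT, Role.SUPERVISOR, Role.SPECIALIST, Role.TECHNICIAN]


@dataclass
class Issue:
    is_physical: bool     # hypothetical flag: needs on-site work, e.g. a fiber cut
    is_known_issue: bool  # hypothetical flag: matches a known quick-fix pattern


def route(issue: Issue) -> Role:
    """Flexible routing: send the issue straight to the level that can resolve it."""
    if issue.is_physical:
        return Role.TECHNICIAN
    if issue.is_known_issue:
        return Role.AGENT
    return Role.SPECIALIST


fiber_cut = Issue(is_physical=True, is_known_issue=False)
print(route(fiber_cut).value)                 # direct field dispatch
print(LINEAR_PATH.index(Role.TECHNICIAN))     # handoffs under linear escalation
```

Under the linear path a physical fault reaches the technician only after three handoffs, whereas the flexible version dispatches it directly.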

TELEMETRY PROTOCOLS

SNMP (Simple Network Management Protocol)

Devices expose MIBs (Management Information Bases). The monitoring system polls at configurable intervals (default: 30 seconds for edge routers, 60 seconds for aggregation switches).

Key MIB objects:

ifInErrors / ifOutErrors: interface error counters

ifInDiscards / ifOutDiscards: packet discard rates (early congestion signal)

sysUpTime: device uptime (reset = reboot event = flag)

ifOperStatus: link operational state

cpmCPUTotal5sec: CPU utilization (high CPU causes downstream drops)

SNMP traps are ingested asynchronously alongside polled data: polling supports trend analysis, while traps support event detection.
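
As a rough illustration of how polled values become signals, the sketch below turns two successive counter readings into an error rate and flags a reboot from sysUpTime. It assumes the values have already been fetched by some SNMP client; the function names are hypothetical. The counter width (Counter32) and the sysUpTime unit (hundredths of a second) come from the standard MIB definitions.

```python
COUNTER32_MODULUS = 2**32  # ifInErrors / ifOutErrors are Counter32 objects


def counter_delta(previous: int, current: int) -> int:
    """Increase between two polls, tolerating a single Counter32 wrap."""
    return (current - previous) % COUNTER32_MODULUS


def errors_per_second(previous: int, current: int, interval_s: float) -> float:
    """Average error rate over one polling interval (e.g. 30 s for edge routers)."""
    return counter_delta(previous, current) / interval_s


def rebooted(previous_uptime_ticks: int, current_uptime_ticks: int) -> bool:
    """sysUpTime counts hundredths of a second; a decrease suggests a restart.

    Caveat: sysUpTime itself wraps after roughly 497 days, which this
    simple check would also report as a reboot.
    """
    return current_uptime_ticks < previous_uptime_ticks


print(errors_per_second(100, 400, 30))   # 300 errors over a 30 s poll
print(rebooted(500_000, 1_200))          # uptime went backwards
```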

Syslog — RFC 5424

Continuous real-time event stream from all network devices.

format: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG

Severity 0–4 (Emergency → Warning) ingested into correlation pipeline.

Severity 5–7 (Notice → Debug) stored for audit, not processed for alerting.
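
The severity split above can be sketched by decoding the PRI field, which RFC 5424 defines as facility × 8 + severity. The destination labels `"correlation"` and `"audit"` are placeholders for this example, not names taken from the system.

```python
import re

PRI_PATTERN = re.compile(r"^<(\d{1,3})>")
ALERTING_MAX_SEVERITY = 4  # severities 0-4 are alerting, 5-7 are audit-only


def parse_pri(message: str) -> tuple[int, int]:
    """Return (facility, severity) decoded from the leading <PRI> field."""
    match = PRI_PATTERN.match(message)
    if match is None:
        raise ValueError("message does not start with a <PRI> field")
    pri = int(match.group(1))
    if pri > 191:  # 23 * 8 + 7 is the largest valid PRI value
        raise ValueError(f"PRI out of range: {pri}")
    return divmod(pri, 8)


def destination(message: str) -> str:
    """Placeholder routing: correlation pipeline or audit storage."""
    _, severity = parse_pri(message)
    return "correlation" if severity <= ALERTING_MAX_SEVERITY else "audit"


critical = "<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - 'su root' failed"
notice = "<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 - started"
print(parse_pri(critical), destination(critical))
print(parse_pri(notice), destination(notice))
```

In the first message PRI 34 decodes to facility 4 and severity 2 (Critical), so it enters the correlation pipeline; PRI 165 decodes to facility 20 and severity 5 (Notice), so it is stored for audit only.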

NetFlow / IPFIX / sFlow

Flow-level traffic data enabling:

Per-link baseline traffic volume (sudden drops = upstream failure signal)

Traffic matrix analysis for routing change impact assessment

Top-talker identification during congestion events

sFlow is used on high-throughput optical segments: its packet sampling gives lower precision but also lower performance overhead.
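
One simple way to read "sudden drop against baseline" is to compare the current per-link volume with the mean of recent samples. The sketch below does only that; the 50% drop ratio and the function name are assumptions made for illustration, and a production baseline would normally account for time-of-day patterns.

```python
from statistics import mean


def is_sudden_drop(history_bps: list[float], current_bps: float,
                   drop_ratio: float = 0.5) -> bool:
    """True if current volume fell by more than drop_ratio below the baseline.

    history_bps: recent per-link traffic samples (bits per second).
    drop_ratio:  hypothetical threshold; 0.5 means "less than half of baseline".
    """
    if not history_bps:
        return False  # no baseline yet, so nothing to compare against
    baseline = mean(history_bps)
    return baseline > 0 and current_bps < baseline * (1 - drop_ratio)


recent = [900.0, 1000.0, 1100.0]
print(is_sudden_drop(recent, 300.0))  # well under half of the 1000 bps baseline
print(is_sudden_drop(recent, 800.0))  # within normal variation
```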

BGP (Border Gateway Protocol) Monitoring

Captured via OpenBMP or ExaBGP route collectors. BGP UPDATE messages feed the correlation engine.

Events monitored:

Route flapping: BGP keepalive timeout / interface instability

Route withdrawal: destination unreachable (user-visible disruption within 30–60 seconds)

AS path changes: upstream provider routing issue

Session resets: loss of peering session

BGP events carry the highest individual confidence weight (25 points) due to their direct correlation with user-visible failures.
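
Route flapping can be detected in several ways; one common approach is to count announce/withdraw changes per prefix inside a sliding time window. The sketch below shows that approach only as an illustration. The class name, the 60-second window, and the change threshold are assumptions, and it expects prefix events already extracted from the collector's BGP UPDATE feed.

```python
from collections import defaultdict, deque


class FlapDetector:
    """Flags a prefix whose route changes too often within a time window."""

    def __init__(self, window_s: float = 60.0, max_changes: int = 4):
        self.window_s = window_s        # hypothetical window length
        self.max_changes = max_changes  # hypothetical flap threshold
        self._changes: dict[str, deque] = defaultdict(deque)

    def observe(self, prefix: str, timestamp: float) -> bool:
        """Record one announce or withdraw; return True if the prefix is flapping."""
        events = self._changes[prefix]
        events.append(timestamp)
        # Drop changes that have fallen out of the sliding window.
        while events and timestamp - events[0] > self.window_s:
            events.popleft()
        return len(events) >= self.max_changes


detector = FlapDetector(window_s=60.0, max_changes=3)
for t in (0.0, 10.0, 20.0):
    flapping = detector.observe("192.0.2.0/24", t)
print(flapping)  # third change within 60 s
```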

NORMALIZED EVENT SCHEMA


Research and Understanding

Method 01

Desk Research


01

Network Operations Centre (NOC) Design Standards

NOC environments are documented in operations literature as high-stress, cognitively demanding workplaces. Design considerations unique to NOC contexts include high ambient screen time (operators may monitor 8–16 displays simultaneously), time-pressured decision making, shift-based rotations (which introduce handover risk), and the critical importance of information hierarchy: displaying the most urgent information first, not the most information first.

02

SLA Terminology and Regulatory Obligations

03

Signal Correlation Theory

04

Alert Fatigue Research

05

ITIL v4 Incident Management Framework

Method 02

ISP Operational Research


01

Network Event Classification

02

Fiber Optic-Specific Failure Modes

03

Traffic Pattern Baselines

04

Last-Mile vs. Backbone Failures

Method 03

NOC Workflow Analysis


01

The Handover Problem

02

Cognitive Load and Information Hierarchy

03

Escalation Decision Latency

04

Field Technician Information Needs

Method 04

Systems Mapping


01

The Signal Propagation Map

02

The Role Dependency Map

03

The Blast Radius Model

04

The Feedback Loop Audit

Method 05

Market Research

  1. PagerDuty


01

Event Intelligence (AIOps Layer)

02

Urgency and Severity Scoring

03

On-Call Schedule Management

04

Post-Incident Reviews

  2. Datadog


01

Metrics

02

Logs

03

Traces

04

Map-Based Infrastructure Visualisation

05

Anomaly Detection Widgets

  3. Google Site Reliability Engineering (SRE)

01

The Four Golden Signals

02

Service Level Objectives (SLOs)

03

Error Budgets

04

Toil Reduction

05

Blameless Postmortems

  4. Atlassian Incident Management

01

Incident Classification and Routing

02

SLA Tracking and Breach Prevention

03

Opsgenie's Alert Routing Logic

04

Statuspage for Stakeholder Communication