This past Sunday, PAXIO experienced an outage that impacted the majority of our Oakland & South Bay customers. Both dynamic (DHCP) and static IP internet customers were impacted; EPL circuits were not affected. We sincerely apologize for this inconvenience.
This outage was not due to scheduled or planned work. A router crashed into an inaccessible state, and a physical replacement part was required to restore service. All services should be restored at this time. Please continue to reach out to our support team if your service is still down.
At approximately 11:35am a crucial router went offline. Paxio engineers determined the equipment was unreachable via both our in-band and out-of-band management paths. A technician was dispatched onsite at 12:03pm to assess the outage.
The router was found in a kernel panic state, and a power cycle (“hard reboot”) was issued at 12:07pm. The router failed to come online due to a file system corruption error. Because file system consistency checks continued to fail, a replacement “route engine” (essentially the brains, or main central processing unit, for routing) was installed at 12:21pm.
The replacement route engine had a newer operating system build pre-installed, and this software was not fully compatible with our environment. Static customers had service restored for brief periods; however, the dynamic interfaces were not fully coming online. The physical interface that served these customers was only showing traffic in a single direction (that is, unidirectional traffic), even though physical link was reported on both sides of this link. Configurations were re-checked, and optics, patch cables, and line cards were swapped, but no change alleviated the one-way traffic problem. A capture of the running state and debugging information was archived for the router vendor to troubleshoot later.
Various older software revisions were re-installed to try to remedy the problem; at the same time, a support escalation ticket was opened with the hardware vendor. At 1:43pm, an older version of the software brought interfaces back up in a compatible state. All systems were online by 2:19pm.
The debugging information captured has allowed our network vendor to determine a possible cause of the incompatibility with the newer operating system. Paxio engineers will begin testing this in a lab environment this week.
PAXIO strives to have multiple levels of redundancy within our network. We are in the process of building out new portions of the network that will connect this customer base to multiple redundant routers in the future.