How to Design a Reliable Office Network


A basic network has two major components that provide service: an ISP connection (modem) and a router.  If either fails, the entire network goes down.  Components like these are called SPOFs (single points of failure).

My predecessor used two basic networks in his original configuration.  The intent was to be more reliable, but the effect was the opposite.  If one network failed, all users migrated to the other, which invariably crashed as a result.  Statistically, we had one system with 4 SPOFs.  This design should never be used.

*TWC modems contain integrated routers (not shown), so we will not calculate a separate probability for them.  Instead, each ISP is assumed to have a 50% rate of failure for the probability calculations, which limits the possible failure combinations to 16 instead of 64.


Out of 16 possible failure combinations, 15 result in complete loss of service:

1/16 Success Rate (old network)
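
The 1/16 figure can be checked by brute force. The sketch below is only an illustration: the component names are hypothetical labels for the two modems and two routers, and it assumes, as described above, that any single failure takes the whole system down.

```python
from itertools import product

# Hypothetical labels for the four components of the old design.
COMPONENTS = ("isp_a", "router_a", "isp_b", "router_b")

def old_network_up(isp_a, router_a, isp_b, router_b):
    # Assumption taken from the text: every component is a SPOF,
    # so service survives only when all four are up.
    return isp_a and router_a and isp_b and router_b

# Every up/down combination of the four components (2**4 = 16).
old_states = list(product([True, False], repeat=len(COMPONENTS)))
old_working = sum(old_network_up(*state) for state in old_states)

print(f"{old_working}/{len(old_states)} combinations keep service")
```

Only the single combination with everything up keeps service, which matches the count of 15 failing combinations.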

After many failures, a new design was needed.  Some research led me to Dell SonicWall.  In Dell’s HA (high availability) design, if one router fails, its “software license” and service are transferred to the duplicate hardware.  The duplicate hardware has a reduced price because of the shared license.

In addition, I removed one of Time Warner's modems (saving $400/mo).  Recently, Google Fiber was installed as well.  This allows us to use Time Warner as a failover in the event Fiber fails.  Google Fiber is a new service, and fiber breaks are notoriously hard to repair.  For this reason, having a failover ISP is good practice.


Using all of this, the new network has 0 SPOFs!  The statistical advantages are incredible!
Again, we will assume that each ISP and router has a 50% rate of failure (hugely exaggerated).
Service is now lost only when both ISPs are down or both routers are down.  Out of 16 possible failure combinations, the new network fails in only 7 of them!

9/16 Success Rate (new network)*
*Although Google uses a separate modem and router, probability was not calculated for these individual components, because TWC’s modem contains an integrated router as well.  If we were to calculate all of these separately, there would be 64 possible failure combinations.  To keep things simple, Google and TWC are given equal 50% failure rates as ISPs.
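
The 9/16 figure can be checked the same way. This is a minimal sketch, not a model of the actual hardware: the argument names are hypothetical, and failures are assumed to be independent. The two availability functions generalize both designs to any per-component failure rate p, and the 1% rate used at the end is an illustrative assumption, not a measured value.

```python
from itertools import product

def new_network_up(fiber, twc, router_1, router_2):
    # Service needs at least one working ISP and at least one working router.
    return (fiber or twc) and (router_1 or router_2)

# Every up/down combination of the four components (2**4 = 16).
new_states = list(product([True, False], repeat=4))
new_working = sum(new_network_up(*state) for state in new_states)
print(f"{new_working}/{len(new_states)} combinations keep service")

def old_availability(p):
    # Old design: all four components must be up (four SPOFs).
    return (1 - p) ** 4

def new_availability(p):
    # New design: fails only if both ISPs fail or both routers fail.
    return (1 - p ** 2) ** 2

# p = 0.5 is the exaggerated rate used in the text; p = 0.01 is an
# assumed rate chosen only to show the trend at lower failure rates.
for p in (0.5, 0.01):
    print(f"p={p}: old={old_availability(p):.4f}, new={new_availability(p):.4f}")
```

At the exaggerated 50% rate the formulas reproduce 1/16 and 9/16, and the gap in failure probability between the two designs grows as the per-component failure rate becomes more realistic.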

The Riverbeds of Oman:

All good design borrows concepts from nature.  Water always chooses the path of least resistance.  By allowing more than one path, our connectivity behaves the same way.
I hope you enjoyed this presentation.  The results took a long time to achieve.