Sunday, June 12, 2011

notes: OSPF Troubleshooting SPF Calculation and Route Flapping

This section explains the most common causes of route flapping and repeated SPF calculation in OSPF. Whenever the topology changes, OSPF reruns the SPF algorithm to recompute the shortest-path tree. Unstable links within the OSPF network can therefore cause constant SPF calculation.
This section discusses the problem of SPF running constantly in the network for the following reasons:
  • Interface flap within the network
    This is a common problem in OSPF. Whenever a link flaps in an area, OSPF runs SPF, so a network with unstable links can end up running SPF constantly. SPF itself is not the problem, because OSPF is just adjusting to the change in the database by recalculating. The real problem occurs when there are small routers in the network, where constant SPF runs can cause CPU spikes.
    If a link is flapping constantly, the number of SPF calculations in the area keeps increasing. An SPF counter that stays constant is not a problem, but a counter that keeps incrementing is an indication of a problem.
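To make the "constant versus incrementing" distinction concrete, here is a minimal Python sketch that compares the SPF counter across several captures of `show ip ospf`. It assumes the output contains a line of the form `SPF algorithm executed N times` per area; the sample captures, the function names, and the threshold are hypothetical and only for illustration.

```python
import re

# Assumed line format from `show ip ospf`; adjust the pattern for your IOS version.
SPF_LINE = re.compile(r"SPF algorithm executed (\d+) times")


def spf_runs(show_ip_ospf_output: str) -> int:
    """Sum the per-area SPF execution counters found in one capture."""
    return sum(int(n) for n in SPF_LINE.findall(show_ip_ospf_output))


def spf_is_climbing(samples: list[int], threshold: int = 1) -> bool:
    """True if every consecutive pair of samples grew by at least `threshold`."""
    return len(samples) >= 2 and all(
        later - earlier >= threshold
        for earlier, later in zip(samples, samples[1:])
    )


# Hypothetical captures taken at regular intervals.
first = "    Area BACKBONE(0)\n        SPF algorithm executed 112 times\n"
second = "    Area BACKBONE(0)\n        SPF algorithm executed 119 times\n"
third = "    Area BACKBONE(0)\n        SPF algorithm executed 127 times\n"

counts = [spf_runs(capture) for capture in (first, second, third)]
print(counts, spf_is_climbing(counts))
```

A counter that grows in every sampling interval is the pattern worth investigating; a single jump followed by a flat counter usually is not.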


    show ip ospf
    - the output shows how many times the SPF algorithm has executed
    debug ip ospf monitor
    - shows which particular LSA is flapping

    show log
    - check whether an interface on the router is flapping
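As a rough illustration of the `show log` check, the following Python sketch counts how often each interface went down. It assumes the log contains standard `%LINEPROTO-5-UPDOWN` messages; the timestamps and interface names in the sample are made up, and `count_flaps` is a hypothetical helper, not an IOS feature.

```python
import re
from collections import Counter

# Assumed syslog format for line-protocol state changes.
LINEPROTO = re.compile(
    r"%LINEPROTO-5-UPDOWN: Line protocol on Interface (\S+), "
    r"changed state to (up|down)"
)


def count_flaps(log_text: str) -> Counter:
    """Count transitions to 'down' per interface."""
    flaps = Counter()
    for iface, state in LINEPROTO.findall(log_text):
        if state == "down":
            flaps[iface] += 1
    return flaps


# Hypothetical excerpt from `show log`.
sample_log = """\
Jun 12 10:00:01: %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial0, changed state to down
Jun 12 10:00:09: %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial0, changed state to up
Jun 12 10:01:30: %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial0, changed state to down
Jun 12 10:01:41: %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial0, changed state to up
Jun 12 10:05:00: %LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0, changed state to up
"""

flaps = count_flaps(sample_log)
for iface, n in flaps.most_common():
    print(f"{iface}: went down {n} time(s)")
```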

    Solution:
    Actually two solutions exist in this case:

  • Fix the link flap.

  • Redefine the area boundaries.

    Sometimes, the first solution might not be manageable because the link is flapping as the result of some telco outage beyond your control. One way to fix this temporarily is to manually shut down that interface.

    The second solution requires some redesigning. If the link flap is happening too often, it might be possible to redefine the area, exclude this router from the area, and make it a member of a totally stubby area. Sometimes, this is also difficult to implement.

    In short, link flaps are a fact of life; if there are too many of them, the number of routers in the area should be decreased so that fewer routers are affected.
         
  • Neighbor flap within the network

    A neighbor flap also causes SPF to run. A neighbor flap can happen because of several reasons discussed already in this chapter. When a link goes down, the neighbor goes down as well.

    show ip ospf
    ospf log-adjacency-changes
      - tracks OSPF neighbor changes by sending a syslog message for each one
    show log
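Once adjacency changes are being logged, the messages can be tallied per neighbor. The sketch below assumes the usual `%OSPF-5-ADJCHG` syslog format; the neighbor IDs and interfaces in the sample are invented, and `full_losses` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed syslog format for OSPF adjacency changes.
ADJCHG = re.compile(
    r"%OSPF-5-ADJCHG: Process (\d+), Nbr (\S+) on (\S+) from (\w+) to (\w+),"
)


def full_losses(log_text: str) -> Counter:
    """Count how many times each (neighbor, interface) pair left the FULL state."""
    losses = Counter()
    for _process, nbr, iface, old_state, _new_state in ADJCHG.findall(log_text):
        if old_state == "FULL":
            losses[(nbr, iface)] += 1
    return losses


# Hypothetical excerpt from `show log`.
sample_log = """\
%OSPF-5-ADJCHG: Process 1, Nbr 10.0.0.2 on Serial0.1 from FULL to DOWN, Neighbor Down: Dead timer expired
%OSPF-5-ADJCHG: Process 1, Nbr 10.0.0.2 on Serial0.1 from LOADING to FULL, Loading Done
%OSPF-5-ADJCHG: Process 1, Nbr 10.0.0.2 on Serial0.1 from FULL to INIT, 1-Way
%OSPF-5-ADJCHG: Process 1, Nbr 10.0.0.3 on Serial0.2 from LOADING to FULL, Loading Done
"""

losses = full_losses(sample_log)
for (nbr, iface), n in losses.most_common():
    print(f"neighbor {nbr} on {iface} left FULL {n} time(s)")
```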

    Solution: This problem is common in Frame Relay hub-and-spoke environments. If there are too many neighbors on a Frame Relay interface, there is a high chance that their Hellos will start dropping. The solution in this case is to tune the broadcast queue so that it doesn't drop the OSPF Hello packets. The neighbor goes into INIT after FULL because it missed several consecutive Hellos, its dead timer expired, and it declared the router (R2 in the original example) dead. This can be confirmed from the show interface output, which shows that the serial interface broadcast queue is dropping many packets.

    Too many drops are occurring at the interface level. This is causing the route to flap. To correct this problem, you must tune the Frame Relay broadcast queue accordingly.
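To check the broadcast queue numerically, the sent and dropped counters can be pulled out of the `show interface` output. This is only a sketch: it assumes a line of the form `Broadcast queue 0/64, broadcasts sent/dropped 22932/0, ...`, the sample values are invented, and the drop ratio is an illustrative metric rather than anything IOS reports.

```python
import re
from typing import NamedTuple, Optional


class BroadcastQueue(NamedTuple):
    depth: int    # packets currently queued
    limit: int    # configured queue size
    sent: int     # broadcasts sent
    dropped: int  # broadcasts dropped

    @property
    def drop_ratio(self) -> float:
        """Fraction of broadcasts that were dropped (0.0 if none were seen)."""
        total = self.sent + self.dropped
        return self.dropped / total if total else 0.0


# Assumed line format from `show interface` on a Frame Relay serial interface.
BQ_LINE = re.compile(
    r"Broadcast queue (\d+)/(\d+), broadcasts sent/dropped (\d+)/(\d+)"
)


def parse_broadcast_queue(show_interface_output: str) -> Optional[BroadcastQueue]:
    """Return the broadcast queue counters, or None if the line is absent."""
    match = BQ_LINE.search(show_interface_output)
    return BroadcastQueue(*map(int, match.groups())) if match else None


# Hypothetical output line.
sample = "  Broadcast queue 64/64, broadcasts sent/dropped 1500/500, interface broadcasts 1200"
queue = parse_broadcast_queue(sample)
print(queue, f"drop ratio: {queue.drop_ratio:.0%}")
```

A dropped counter that keeps growing between captures is the sign that the queue needs tuning, which on IOS is typically done with the `frame-relay broadcast-queue` interface command.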

  • Duplicate router ID
    This is also a common problem in OSPF. When two routers have identical router IDs, confusion results in the OSPF topology database, and the route keeps getting added and deleted. The most common symptom of this problem is that the LS Age field always has a small value.

    When there is a duplicate router ID, SPF runs frequently, and the SPF counter keeps incrementing until the problem is fixed.

    show ip ospf
    debug ip ospf monitor
    show ip ospf database router x.x.x.x
    There are two instances of this output taken 15 seconds apart. The first output shows that the number of links in this router is one; the second output shows that the number of links on this router is three. This is a discrepancy because of a duplicate router ID. This means that there must be another router with the same router ID causing the number of links to change every 15 seconds. Also, the LS Age field is always less than 10 seconds.
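The comparison described above can be sketched in Python: take two captures of `show ip ospf database router x.x.x.x` and flag the case where the link count changes while the LS age stays very low. The parsing assumes the output contains `LS age:` and `Number of Links:` lines; the sample captures and the helper names are hypothetical, and the 10-second limit comes from the observation above.

```python
import re
from typing import NamedTuple


class RouterLsaSnapshot(NamedTuple):
    ls_age: int
    num_links: int


def parse_router_lsa(output: str) -> RouterLsaSnapshot:
    """Extract LS age and link count from one capture of the router LSA."""
    age = re.search(r"LS age:\s*(\d+)", output)
    links = re.search(r"Number of Links:\s*(\d+)", output)
    if not age or not links:
        raise ValueError("LS age or Number of Links not found in output")
    return RouterLsaSnapshot(int(age.group(1)), int(links.group(1)))


def looks_like_duplicate_rid(
    a: RouterLsaSnapshot, b: RouterLsaSnapshot, max_age: int = 10
) -> bool:
    """Heuristic: link count differs and the LSA is always freshly originated."""
    return a.num_links != b.num_links and a.ls_age < max_age and b.ls_age < max_age


# Hypothetical captures taken 15 seconds apart.
first = "  LS age: 9\n  Link State ID: 192.168.1.129\n   Number of Links: 1\n"
second = "  LS age: 7\n  Link State ID: 192.168.1.129\n   Number of Links: 3\n"

snap1, snap2 = parse_router_lsa(first), parse_router_lsa(second)
print(snap1, snap2, looks_like_duplicate_rid(snap1, snap2))
```

This is only a heuristic: a legitimately changing router can also show a different link count, so the low LS age in both captures is the more telling half of the check.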

    Solution:
    Ensure that every router ID is unique within the OSPF network.
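If router IDs are recorded somewhere (for example, collected from each router's configuration), uniqueness can be checked offline. The sketch below assumes a simple hostname-to-router-ID mapping; the hostnames, addresses, and the `duplicate_router_ids` helper are all hypothetical.

```python
from collections import defaultdict


def duplicate_router_ids(inventory: dict[str, str]) -> dict[str, list[str]]:
    """Return each router ID that is claimed by more than one hostname."""
    by_rid = defaultdict(list)
    for hostname, rid in inventory.items():
        by_rid[rid].append(hostname)
    return {rid: sorted(hosts) for rid, hosts in by_rid.items() if len(hosts) > 1}


# Hypothetical inventory of hostname -> OSPF router ID.
inventory = {
    "R1": "192.168.1.129",
    "R2": "192.168.1.130",
    "R3": "192.168.1.129",  # same router ID as R1
}

duplicates = duplicate_router_ids(inventory)
print(duplicates)
```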


