Monday, June 6, 2011

Notes: Troubleshooting OSPF Neighbor Relationships

I.  OSPF Neighbor relationship

1.  The OSPF neighbor list is empty.
  • OSPF is not enabled on the interface.
    show ip ospf interface
  • Layer 1/2 is down.
    show ip ospf interface
    show interface
    show ip ospf neighbor
     Possible reasons:
    • Unplugged cable
    • Loose cable
    • Bad cable
    • Bad transceiver
    • Bad port
    • Bad interface card
    • Layer 2 problem at telco in case of a WAN link
    • Missing clock statement in case of back-to-back serial connection
  • The interface is defined as passive under OSPF.
    show ip ospf interface
    no passive-interface
    show ip ospf neighbor

    The passive-interface command is entered intentionally so that the router does not take part in OSPF on that segment. Use it when you don't want to form any neighbor relationship on an interface but still want to advertise that interface's network.

    In OSPF, a passive interface means "do not send or receive OSPF Hellos on this interface." So, making an interface passive under OSPF with the intention of preventing the router from sending any routes on that interface but receiving all the routes is wrong.

  • An access list is blocking OSPF Hellos on both sides.
    OSPF sends its Hello on a multicast address of 224.0.0.5.  If only one side is blocking OSPF Hellos, the output of show ip ospf neighbor will indicate that the neighbor is stuck in the INIT state.
    using debug commands:
    access-list 101 permit ip x.x.x.0 0.0.0.255 host 224.0.0.5 
    debug ip packet 101 detail
    output:
    IP: s=131.108.1.2 (Ethernet0), d=224.0.0.5, len 68, access denied, proto=89
    Solution:  permit the OSPF multicast address 224.0.0.5 in the ACL.
    access-list 100 permit ip any host 224.0.0.5
    verification:
    show ip ospf neighbor
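
The debug line above can be triaged mechanically. A minimal Python sketch, assuming the line format matches the sample output in these notes (real IOS output may differ by release; the function names are made up for illustration):

```python
import re

# Pattern modeled on the sample debug lines in these notes; this is an
# assumption about the format, not a documented IOS guarantee.
DEBUG_LINE = re.compile(
    r"IP: s=(?P<src>\S+) \((?P<src_if>[^)]+)\), "
    r"d=(?P<dst>[\d.]+)(?: \((?P<dst_if>[^)]+)\))?, "
    r"len (?P<length>\d+), (?P<verdict>[^,]+), proto=\s*(?P<proto>\d+)"
)

OSPF_PROTO = 89                # IP protocol number used by OSPF
ALL_SPF_ROUTERS = "224.0.0.5"  # multicast address OSPF Hellos are sent to


def parse_debug_ip_packet(line):
    """Return a dict of fields from one debug line, or None if it does not match."""
    match = DEBUG_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["length"] = int(fields["length"])
    fields["proto"] = int(fields["proto"])
    return fields


def is_blocked_ospf_multicast(fields):
    """True when an OSPF packet to 224.0.0.5 was dropped by an access list."""
    return (
        fields["proto"] == OSPF_PROTO
        and fields["dst"] == ALL_SPF_ROUTERS
        and fields["verdict"] == "access denied"
    )


sample = "IP: s=131.108.1.2 (Ethernet0), d=224.0.0.5, len 68, access denied, proto=89"
fields = parse_debug_ip_packet(sample)
print(fields["src"], fields["src_if"], is_blocked_ospf_multicast(fields))
```

The same parser also reads the "encapsulation failed" form of the line, so one helper can cover both the ACL case and the missing broadcast keyword case.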
  • A subnet number/mask has been mismatched over a broadcast link.
    OSPF performs the subnet number and mask check on all media except point-to-point and virtual links.
    debug ip ospf adj

    Solution:  correct the mask on both sides of the link.
    Verification:
    show ip ospf neighbor
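
The subnet and mask check can be approximated offline before touching the routers. A rough sketch using Python's standard ipaddress module; the addresses are hypothetical and the comparison is a simplification of what OSPF actually verifies in the Hello:

```python
import ipaddress


def same_subnet(local_cidr, neighbor_cidr):
    """True when both interface addresses share one network number and mask."""
    local = ipaddress.ip_interface(local_cidr)
    neighbor = ipaddress.ip_interface(neighbor_cidr)
    return local.network == neighbor.network


# Hypothetical addresses on one Ethernet segment.
print(same_subnet("131.108.1.1/24", "131.108.1.2/24"))  # masks and subnets agree
print(same_subnet("131.108.1.1/24", "131.108.1.2/25"))  # mask mismatch
print(same_subnet("131.108.1.1/24", "131.108.2.2/24"))  # subnet number mismatch
```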
  • The Hello/dead interval has been mismatched. 
    OSPF neighbors exchange Hello packets periodically to form and maintain neighbor relationships. OSPF advertises the router's Hello and dead intervals in the Hello packets. These intervals must match with the neighbor's; otherwise, an adjacency will not form.
    debug ip ospf adj
    Solution:  ensure the OSPF timers are matched at both sides of the link.
    to change hello interval from its default value:
    ip ospf hello-interval #
    verification:
    show ip ospf neighbor
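
The timer comparison is simple enough to express directly. A small sketch with hypothetical timer values (the class and function names are illustrative, not from any library):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HelloTimers:
    hello: int  # seconds between Hellos
    dead: int   # seconds of silence before the neighbor is declared down


def timer_mismatches(local, neighbor):
    """List the timer fields that differ; an empty list means the timers agree."""
    problems = []
    if local.hello != neighbor.hello:
        problems.append(f"hello interval {local.hello} != {neighbor.hello}")
    if local.dead != neighbor.dead:
        problems.append(f"dead interval {local.dead} != {neighbor.dead}")
    return problems


r1 = HelloTimers(hello=10, dead=40)
r2 = HelloTimers(hello=15, dead=60)  # hypothetical values on the other router
print(timer_mismatches(r1, r2))
print(timer_mismatches(r1, r1))
```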

  • The authentication type (plain text versus MD5) has been mismatched.
    OSPF uses two types of authentication, plain-text (Type 1) and MD5 (Type 2). Type 0 is called null authentication. If the plain-text authentication type is enabled on one side, the other side must also have plain-text authentication. OSPF will not form an adjacency unless both sides agree on the same authentication type.
    debug ip ospf adj
    Solution:  ensure that authentication mode is the same at both sides.
    verification:
    show ip ospf neighbor

  • An authentication key has been mismatched.
    When authentication is enabled, the authentication key also must be configured on the interface. Authentication previously was supported on a per-area basis, but beginning with the specifications in RFC 2328, authentication is supported on a per-interface basis. This feature has been implemented in Cisco IOS Software Release 12.0.8 and later.
    If authentication is enabled on one side but not the other, OSPF complains about the mismatch in authentication type. Sometimes, the authentication key is configured correctly on both sides but debug ip ospf adj still complains about a mismatched authentication type. In this situation, authentication-key must be typed again because there is a chance that a space was added during the authentication key configuration by mistake. Because the space character is not visible in the configuration, this part is difficult to determine.
    Another possible thing that can go wrong is for one side, R1, to have a plain-text key configured and the other side, R2, to have an MD5 key configured, even though the authentication type is plain text. In this situation, the MD5 key is completely ignored by R2 because MD5 has not been enabled on the router. 
    debug ip ospf adj
    Solution:  make sure that both sides have the same kind of authentication key. If the problem still exists, retype the authentication key; there is a possibility of an added space character before or after the authentication key.
    verification:
    show ip ospf neighbor

  • An area ID has been mismatched.
    OSPF sends area information in the Hello packets. If both sides do not agree that they are members of a common area, no OSPF adjacency will be formed. The area information is a part of the OSPF protocol header.
    debug ip ospf adj
    Solution:  ensure that the same area is used in the network command on both sides of the link.
    verification:
    show ip ospf neighbor
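
When comparing configurations by eye, keep in mind that an area ID is a 32-bit value that can be written either as a plain number or in dotted-decimal form. A short sketch that normalizes both spellings before comparing (helper names are hypothetical):

```python
import ipaddress


def normalize_area_id(area):
    """Return the area ID in dotted-decimal form, whichever way it was written."""
    text = str(area).strip()
    if "." in text:
        return str(ipaddress.IPv4Address(text))
    return str(ipaddress.IPv4Address(int(text)))


def same_area(local_area, neighbor_area):
    """True when both spellings refer to the same 32-bit area ID."""
    return normalize_area_id(local_area) == normalize_area_id(neighbor_area)


print(normalize_area_id(1))       # 0.0.0.1
print(same_area("0", "0.0.0.0"))  # True
print(same_area("1", "0.0.0.0"))  # False
```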
  • Stub/transit/NSSA area options have been mismatched.
    When OSPF exchanges Hello packets with a neighbor, one of the things that it exchanges in the Hello packet is an optional capability represented by 8 bits. One of the option fields is for the E bit, which is the OSPF stub area flag. When the E bit is set to 0, the area with which the router is associated is a stub area, and no external LSAs are allowed in this area.
    If one side has the E bit set to 0 and the other side doesn't, OSPF adjacency is not formed. This is called an optional capability mismatch. One side says that it can allow external routes, and the other side says that it cannot allow external routes, so OSPF neighbor relationships are not formed.
    debug ip ospf adj
    Solution:  make sure that both sides agree on the same type of area
    verification:
    show ip ospf neighbor
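
The E-bit comparison comes down to one bit test on the Options byte. A minimal sketch, assuming the OSPFv2 layout in which the E bit has the value 0x02; only the E bit is modeled here, not the other option bits:

```python
E_BIT = 0x02  # external-routing capability bit in the OSPFv2 Options field


def allows_externals(options):
    """True when the E bit is set, i.e. the sender treats the area as non-stub."""
    return bool(options & E_BIT)


def stub_flag_matches(local_options, neighbor_options):
    """Both sides must agree on the E bit for the Hello to be accepted."""
    return allows_externals(local_options) == allows_externals(neighbor_options)


print(stub_flag_matches(0x02, 0x02))  # both sides in a normal area
print(stub_flag_matches(0x00, 0x02))  # stub configured on one side only
```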

  • An OSPF adjacency exists with secondary IP addressing.
    This is a very common problem in which a customer might have one Class C address on a LAN segment. When the customer runs out of address space, he gets another Class C address and assigns the new address as a secondary address under the same interface. Everything works fine until two routers must exchange OSPF Hellos/updates and one router's primary IP address is assigned as the secondary IP address on the other side.
    debug ip ospf adj
    Solution: create subinterfaces on R1. This is possible only if the interface that has the secondary address is Fast Ethernet or Gigabit Ethernet and it is connected through a Layer 2 switch. This can be achieved through Inter-Switch Link (ISL), in the case of a Cisco switch, or dot1Q encapsulation, in the case of a different vendor's switch. ISL or dot1Q encapsulation is used to route between two separate VLANs.
    verification:
    show ip ospf neighbor

  • An OSPF adjacency exists over an asynchronous interface.
    You must enable asynchronous default or dynamic routing when OSPF is enabled between two routers over an asynchronous interface. When async default routing is enabled, the router always sends routing packets over an asynchronous interface. In case of interactive asynchronous connections for which users have to type ppp to establish the PPP session, the async dynamic routing command can be used, but then users must type ppp /routing to enable routing over the asynchronous interface. Failing to do this causes OSPF not to form any adjacency over the asynchronous link.

    show interface
    Solution: use either async default routing or async dynamic routing to solve this problem.
    verification:
    show ip ospf neighbor

  • No network type or neighbor is defined over NBMA (Frame Relay, X.25, SMDS, and so on).
    This is a classic problem of NBMA networks. OSPF or any other routing protocol will not be capable of sending or receiving any Hello packet unless you configure a neighbor statement or change the network type to broadcast or point-to-multipoint. When the neighbor statement is configured, it triggers OSPF Hellos and neighbor relationships are formed.
    show ip ospf interface
    Solution: 
    1.  Configure the neighbor statement under router ospf.
    2.  Change the network type to either broadcast or point-to-multipoint. In this case, OSPF starts sending multicast Hellos across the link.

    ip ospf network broadcast
    ip ospf network point-to-multipoint

    verification:
    show ip ospf neighbor
    show ip ospf interface
  • The frame-relay map/dialer map statement is missing the broadcast keyword on both sides.
    OSPF uses multicast Hellos to form adjacencies. Other routing protocols (for example, RIP and EIGRP) also use broadcasts or multicasts to form neighbor relationships. In the case of Frame Relay or dialer interfaces, you must enable the broadcast keyword in frame-relay map or dialer map statements on both ends to propagate OSPF Hellos. These map statements are valid only if the interfaces are multipoint in nature. For example, by default, Frame Relay interfaces are multipoint. Also, the BRI interface is multipoint because it is capable of dialing more than one number.
    One thing to note here is that both sides should have this broadcast keyword missing from the frame-relay map or dialer-map configurations to produce this problem. If just one side is missing the broadcast keyword, the other side will see this router in INIT and the neighbors will never become adjacent.

    using debug commands:
    access-list 101 permit ip x.x.x.0 0.0.0.255 host 224.0.0.5 
    debug ip packet 101 detail 

    Output:
    IP: s=131.108.1.1 (local), d=224.0.0.5 (Serial0), len 68, encapsulation failed, proto= 89 
     
    Solution:
    The broadcast keyword must be enabled on both sides. If it is enabled on only one side, it will produce a stuck-in-INIT problem.
    frame-relay map ip 131.108.1.2 16 broadcast
    dialer map ip 131.108.1.2 broadcast name R2 76444
    verification:
    show ip ospf neighbor
    show ip ospf interface

2.  An OSPF neighbor is stuck in ATTEMPT.
  • OSPF Neighbor Stuck in ATTEMPT. Cause: Misconfigured neighbor statement
    show ip ospf neighbor
    Solution:  configure the proper neighbor statement with the proper IP address.
    verification:
    show ip ospf neighbor
  • OSPF Neighbor Stuck in ATTEMPT. Cause: Unicast connectivity is broken on NBMA
    OSPF sends unicast Hellos over NBMA interfaces if neighbor statements are manually configured. If the unicast connectivity is broken, OSPF will never form any adjacencies. OSPF tries to contact neighbors every Hello interval (that is, every 30 seconds by default over NBMA interfaces). If it does not receive any reply from the neighbor, it will show that the neighbor is stuck in ATTEMPT. Many possible reasons can exist for broken unicast connectivity. You should consider the following causes, assuming that Layer 2 is up:
    - A wrong DLCI or VPI/VCI mapping exists in a Frame Relay or ATM switch, respectively.
    - An access list is blocking the unicast.
    - NAT is translating the unicast.
    Perform a ping test.
    show ip ospf neighbor
    Solution:  broken unicast connectivity could be the result of many factors. If it's a wrong DLCI or VC mapping, check these mappings and correct them. If an access list is blocking the unicast connectivity, permit the necessary unicast IP addresses in the access list.
    verification:
    show ip ospf neighbor 


3.  An OSPF neighbor is stuck in INIT.
    When a router receives an OSPF Hello from a neighbor, it includes that neighbor's router ID in the Hello packets it sends. If it doesn't include the neighbor's router ID, the neighbor will be stuck in INIT.
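
The INIT rule can be written as a tiny decision function. This is a simplified sketch of the early neighbor states only (later states are ignored), with a made-up function name and example router IDs:

```python
def neighbor_state(local_router_id, hello_seen, neighbors_in_hello):
    """Classify a neighbor from the Hellos received from it (simplified).

    hello_seen: whether any Hello from the neighbor has arrived.
    neighbors_in_hello: router IDs listed in the neighbor's latest Hello.
    """
    if not hello_seen:
        return "DOWN"
    if local_router_id in neighbors_in_hello:
        return "2-WAY"  # the neighbor has heard us too
    return "INIT"       # we hear the neighbor, but it does not list us


# R1 hears R2, but R2's Hello carries an empty neighbor list.
print(neighbor_state("131.108.1.1", True, []))
# R2's Hello now lists R1, so the relationship is bidirectional.
print(neighbor_state("131.108.1.1", True, ["131.108.1.1"]))
```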

    The most common possible causes of this problem are as follows:
  • An access list on one side is blocking OSPF Hellos.
    Extended IP access list 101 
    permit ip 131.108.1.0 0.0.0.3 host 224.0.0.5
    debug ip packet 101 detail    
    Solution:
    Allow the source to send packets to the OSPF multicast address.
    access-list 100 permit ip 131.108.1.0 0.0.0.255 host 224.0.0.5
    Verification:
    show ip ospf neighbor
     
  • Multicast capabilities are broken on one side (6500 switch problem).
     This is a specific situation that is valid only in the case of a Catalyst 6500 switch with the multilayer switch feature card (MSFC). The problem is that one side is sending OSPF Hellos that the other side does not receive.
    This situation is produced when the command set protocolfilter enable is entered on the 6500 switch. By default, the protocol filter is disabled. Enabling this command begins altering the multicast frame to and from MSFC and port adapter within the FlexWan module of the 6500 switch.
    show ip ospf neighbor
    Solution:  
    CAT6k(enable) set protocolfilter disable
    Verification:
    show ip ospf neighbor

  • Authentication is enabled on only one side (virtual link example).
    When authentication is used, it must be enabled on both sides; otherwise, one side will show the neighbor stuck in the INIT state. The router that has authentication enabled will reject all the nonauthenticated packets, and the adjacency will show stuck in INIT. The other side will not detect any problem because authentication is turned off there, so it will simply ignore the authentication in a packet and treat it as a normal packet.
    debug ip ospf adj
    Solution:  ensure authentication is enabled under the OSPF process and on the interface, and that the keys are the same at both ends of the link.
    Verification:
    show ip ospf neighbor

  • The frame-relay map/dialer map statement on one side is missing the broadcast keyword.
    Extended IP access list 100 
    permit ip 131.108.1.0 0.0.0.3 host 224.0.0.5 
    debug ip packet 100 detail 
    - Look for "encapsulation failed" in the output.
    Solution: ensure the broadcast keyword is included in the frame-relay map command.
    Verification:
    show ip ospf neighbor

  • Hellos are getting lost on one side at Layer 2.
    This situation happens when there is a problem on the Layer 2 media; for example, the Frame Relay switch is blocking the multicast traffic for some reason. When R1 sends the Hello, R2 never receives it. Because R2 never saw Hellos from R1, the neighbor list of R2 will be empty. However, R1 sees the Hellos from R2, which does not list R1 as a valid neighbor; so, R1 declares this neighbor in the INIT state.
    Check on both sides of the link whether each router receives the other's Hello packets.
    Extended IP access list 100 
    permit ip 131.108.1.0 0.0.0.3 host 224.0.0.5
    debug ip packet 100 detail
          Solution: 
          Step 1. Change the network type on both sides to nonbroadcast. 
          Step 2. Configure the neighbor statement on one router.
                  
            This solution is a workaround for the Layer 2 problem, but it doesn't fix the original Layer 2 problem. By changing the network type to nonbroadcast, OSPF will send and receive Hellos as unicast instead of multicast. So, if any issues occur with multicast at Layer 2, changing the network type to nonbroadcast and configuring a neighbor statement causes OSPF to form neighbors on a medium whose multicast capabilities are broken.

4.  An OSPF neighbor is stuck in 2-WAY.
    It is normal on broadcast media to have neighbors in the 2-WAY state because not every router becomes adjacent on broadcast media. Every router enters the FULL state with the DR and the BDR. Consider an example with only two routers on Ethernet, both configured with priority 0. Priority 0 means that the router will not take part in the DR/BDR election process.
    If all the routers on an Ethernet segment are configured with priority 0, no router on the segment will be in FULL state with any other router. This creates problems. At least one router on the segment must have a priority that is not set to 0.
    show ip ospf neighbor
    Solution: remove the priority 0 command on one router so that it defaults to 1, or change the priority to a nonzero value.
    Verification:
    show ip ospf neighbor
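
Why an all-zero-priority segment never gets past 2-WAY can be seen from a stripped-down election. This sketch is an assumption-laden simplification: highest priority wins, ties go to the highest router ID, priority 0 means ineligible, and existing DR/BDR roles are ignored:

```python
import ipaddress


def elect_dr(routers):
    """Pick a DR from {router_id: priority}; return None if nobody is eligible."""
    eligible = {rid: prio for rid, prio in routers.items() if prio > 0}
    if not eligible:
        return None
    return max(
        eligible,
        key=lambda rid: (eligible[rid], int(ipaddress.IPv4Address(rid))),
    )


print(elect_dr({"131.108.1.1": 0, "131.108.1.2": 0}))  # None: nobody can be DR
print(elect_dr({"131.108.1.1": 1, "131.108.1.2": 0}))  # 131.108.1.1
```

Router IDs are compared numerically rather than as strings so that, for example, 10.0.0.1 ranks above 9.0.0.1.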

5.  An OSPF neighbor is stuck in EXSTART/EXCHANGE.
    This is an important state during the OSPF adjacency process. In this state, the routers elect a master and a slave and agree on the initial sequence number. The whole database also is exchanged during this state. If a neighbor is stuck in EXSTART/EXCHANGE for a long time, it is an indication of a problem.
  • Mismatched interface MTU
    OSPF sends the interface MTU in a database description packet. If there is an MTU mismatch, OSPF will not form an adjacency.
    debug ip ospf adj
    The output will show that there is a mismatch in MTU.
    show interface int-type#
    Solution:  make sure that the MTU is set to the same value on both sides.

    Another situation that could lead to an MTU mismatch is a router connected through FDDI to a switch with the route switch module (RSM) blade in it.
    This is a normal setup in a Catalyst switch environment. When a packet is received on a switch FDDI port, it goes across the switch backplane to the slot where the RSM is installed. The conversion/fragmentation from FDDI to Ethernet happens at the switch level.
    Solution:  
    ip ospf mtu-ignore
    Verification:
    show ip ospf neighbor

  • Duplicate router IDs on neighbors
    When OSPF sends a DBD packet to elect a master and a slave, the router with the highest router ID becomes the master. This happens in the EXSTART process. If there is any problem with election, the router will be stuck in the EXSTART/EXCHANGE state.
    show ip ospf neighbor
    debug ip ospf adj
    Solution:  review the router-id of all routers in the OSPF network.
    Verification:
    show ip ospf neighbor
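
The master/slave decision and the way a duplicate router ID breaks it can be sketched in a few lines. The function name and router IDs are illustrative only:

```python
import ipaddress


def elect_master(local_id, neighbor_id):
    """Return the router ID that becomes master, or None if the IDs collide."""
    local = int(ipaddress.IPv4Address(local_id))
    neighbor = int(ipaddress.IPv4Address(neighbor_id))
    if local == neighbor:
        return None  # duplicate router ID: neither side can win the election
    return local_id if local > neighbor else neighbor_id


print(elect_master("131.108.1.1", "131.108.1.2"))  # 131.108.1.2
print(elect_master("131.108.1.1", "131.108.1.1"))  # None
```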

  • Inability to ping across with more than certain MTU size
    When OSPF begins forming an adjacency with its neighbor, it goes through several states. In EXSTART state, OSPF determines which will be the master and which will be the slave. After the routers decided this, they start exchanging the LSA header in the form of DBD packets. If the database is huge, OSPF uses the interface MTU and tries to send as much data as possible up to the limit of the interface MTU. If there is a problem with Layer 2 accepting large packets that are within the interface MTU range, the OSPF adjacency will be stuck in the EXCHANGE state.
    show ip ospf neighbor
    debug ip ospf adj
    output:
    OSPF: Send DBD to 131.108.2.1 on Serial0 seq 0x793 opt 0x2 flag 0x3 len 1274
    OSPF: Retransmitting DBD to 131.108.2.1 on Serial0
    - A normal ping is successful, but a large ping (around 1200 bytes) fails.
    Solution:
    The problem is actually with Layer 2. R1 can ping R2 when using a 100-byte datagram, but the ping starts failing when the datagram size is greater than 1200 bytes.
    To solve this problem, fix the Layer 2 issue. One way to narrow this problem is to connect the two devices directly instead of going through switches and so forth, to see whether the problem is with the Layer 2 devices or with the router itself. If connecting routers back to back doesn't fix the problem, there is a possibility of bad hardware. Most times, it turns out to be a problem in the middle—for example, a LAN switch or a telco cloud.
    Depending upon the media, there are several recommendations:

    In the case of a LAN medium
    - Check the MTU size defined in the switch configuration for this medium.
    - Try using a different port.
    In the case of a WAN medium
    - If you are the WAN cloud provider, check at which hop it fails.
    - If you are getting a circuit from a telco, request that the WAN cloud in the middle be checked to see where it fails.
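
To pin down the size at which pings start failing, a binary search over datagram sizes saves time compared with trying sizes by hand. A sketch only: probe is a hypothetical caller-supplied function that sends one ping of the given size and reports success, and the search assumes that once a size fails, every larger size fails too:

```python
def largest_working_size(probe, low=100, high=1500):
    """Binary-search the largest datagram size for which probe(size) succeeds."""
    if not probe(low):
        return None  # even the small ping fails: not a size-related problem
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid       # mid works, so the limit is at least mid
        else:
            high = mid - 1  # mid fails, so the limit is below mid
    return low


# Stand-in for a real ping: pretend the path drops anything above 1200 bytes.
def fake_path(size):
    return size <= 1200


print(largest_working_size(fake_path))  # 1200
```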
  • Broken unicast connectivity because of the following:
    When OSPF routers begin exchanging database information with each other, they send unicast packets to each other in EXSTART/EXCHANGE state. This happens only if the network type is not point-to-point. On a point-to-point link, OSPF sends all packets as multicast. If unicast connectivity is broken, the OSPF neighbor remains in EXSTART state.
    Ping fails. Possible causes:
    - Wrong VC/DLCI mapping in Frame Relay/ATM switch
      Debug packets on both sides.
      Extended IP access list 100
      permit ip 131.108.1.0 0.0.0.255 131.108.1.0 0.0.0.255
      debug ip packet detail 100
      Output: if both sides are sending and there is still no response, check with your provider.
      Solution: contact the carrier to determine whether a wrong VC/DLCI mapping exists. There is a slight chance that the problem could be with the router itself and that it is dropping the packet. Any other problems will appear in the debug messages. Problems such as the wrong Frame Relay mapping within the router produce "encapsulation failure" messages in the debug output.
    - Access list blocking the unicast
      Extended IP access list 100
      permit ip 131.108.1.0 0.0.0.255 131.108.1.0 0.0.0.255
      debug ip packet detail 100
      Solution: ensure the ACL is not blocking unicast packets between the neighbors.
    - NAT translating the unicast
      This is another common problem that occurs when NAT is configured on the router. If NAT is misconfigured, it will start translating the unicast packets coming toward it, which will break the unicast connectivity.
      The main thing to watch for is the access list in NAT. If the access list is permitting everything, this problem will occur.
      topology :
                       10.0.0.0/8-R1---------------------------------R2----
                                       NAT e0    131.108.1.0/24            e0
    R1#  
    interface Ethernet 0  
    ip nat outside  
    !  
    ip nat inside source list 1 interface Serial0.2 overload  
    !  
    access-list 1 permit any
    Solution:  To solve this problem, change access list 1 and permit only those IP addresses that require translation.
    access-list 1 permit 10.0.0.0 0.255.255.255 
    verification:
    show ip ospf neighbor
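
The difference between the two access lists is easy to check with wildcard-mask arithmetic. A small sketch, treating "permit any" as 0.0.0.0 with wildcard 255.255.255.255; the tested addresses are examples and the function name is made up:

```python
import ipaddress


def acl_matches(address, base, wildcard):
    """Standard-ACL style match: bits set in the wildcard are 'don't care'."""
    addr = int(ipaddress.IPv4Address(address))
    base_int = int(ipaddress.IPv4Address(base))
    care_mask = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (addr & care_mask) == (base_int & care_mask)


# access-list 1 permit any
print(acl_matches("131.108.1.2", "0.0.0.0", "255.255.255.255"))  # True: link address is translated
# access-list 1 permit 10.0.0.0 0.255.255.255
print(acl_matches("131.108.1.2", "10.0.0.0", "0.255.255.255"))   # False: link address left alone
print(acl_matches("10.1.2.3", "10.0.0.0", "0.255.255.255"))      # True: inside host still translated
```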

  • Network type of point-to-point between PRI and BRI/dialer
    The network type on a PRI interface is point-to-point. This causes OSPF to send multicast packets even after the 2-WAY state. If only one BRI comes up as an OSPF neighbor, it will work fine. However, when multiple BRIs try to form an adjacency with the PRI, the PRI will complain because its network type is point-to-point. Because all OSPF packets are sent as multicast on a point-to-point link, the PRI receives DBD packets from multiple BRI neighbors, and this causes all the neighbors to get into the EXSTART/EXCHANGE state.

    Note: R1 is a hub using a single PRI interface to connect to R2 and R3.
                   R1 (pri)--------------(bri)R3
                   R1 (pri)--------------(bri)R2

    show ip ospf neighbor
    show ip ospf interface  ( to verify the network type)
    debug ip ospf adj
    Solution: change the network type on all interfaces to point-to-multipoint.
    Verification:
    show ip ospf neighbor

    Changing the network type to point-to-multipoint forces OSPF to send a unicast packet for DBDs instead of a multicast after 2-WAY state, so the packet destined for R3 never reaches R2.

6.  An OSPF neighbor is stuck in LOADING.
    This is a rare problem in OSPF neighbor relationships. When a neighbor is stuck in the LOADING state, the local router has sent a link-state request packet to the neighbor requesting an outdated or missing LSA and is waiting for an update from its neighbor. If a neighbor doesn't reply or a neighbor's reply never reaches the local router, the router will be stuck in the LOADING state.
  • Mismatched MTU
    This is a unique problem that happens when an MTU mismatch occurs. If the MTUs are not the same across the link, this problem occurs. Specifically, if a neighbor's MTU is greater than the local router's, the neighbor sends a large MTU packet as a link-state update. This packet never reaches the local router; as a result, the neighbor gets stuck in the LOADING state.
    show interface int-type#
    show version
    debug ip ospf adj
    Solution:
    In this particular case, R2 is running Cisco IOS Software Release 11.3.10T, which does not support MTU mismatch detection. R1 is running Cisco IOS Software Release 12.0.7T, which does support MTU mismatch detection. R1 detects MTU mismatches only when R2's MTU is higher than R1's; otherwise, it does not complain. In other words, MTU mismatch detection is valid only for a neighbor with an MTU higher than that of the local router.
    In this case, R2's MTU is 2048, so even though R1 is running Cisco IOS Software code with MTU mismatch detection, R1 cannot detect an MTU mismatch because R2's MTU is lower than R1's.
    When R2 sends the LS request packet for the new instance of the LSAs, R1 replies with an LSA that exceeds 2048, so R2 never gets that packet because it is too large. To fix this problem, make sure that the MTUs on both sides match. To change the MTU on an interface (in this case, R2's Serial 0 interface), enter the following interface-level command:
     R2
    interface serial 0  
    mtu 4470 
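
The one-sided detection in this case follows from the check a receiver applies to the MTU carried in a DBD packet: it objects only to a value larger than its own. A minimal sketch, assuming R1's MTU is 4470 (inferred from the fix, not stated in these notes) and using a made-up function name:

```python
def accepts_dbd(local_mtu, neighbor_mtu, mtu_ignore=False):
    """True when the receiver would accept a DBD advertising neighbor_mtu.

    Simplified: the packet is rejected only if the advertised MTU is larger
    than the local one, unless the check is disabled (ip ospf mtu-ignore).
    """
    if mtu_ignore:
        return True
    return neighbor_mtu <= local_mtu


# R1 (assumed 4470) receiving from R2 (2048): nothing to complain about.
print(accepts_dbd(local_mtu=4470, neighbor_mtu=2048))  # True
# The reverse direction would be caught, if R2's software performed the check.
print(accepts_dbd(local_mtu=2048, neighbor_mtu=4470))  # False
```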

  • Corrupted link-state request packet
    When a link-state request packet is corrupted, the neighbor discards the packet and the local router never receives the response from the neighbor. This causes the OSPF neighbor to be stuck in the LOADING state.
    Link-state request packets usually become corrupted for the following reasons:
    - A device between the neighbors, such as a switch, is corrupting the packet.
    - The sending router's packet is invalid. In this case, either the sending router's interface is bad or the error is caused by a software bug.
    - The receiving router is calculating the wrong checksum. In this case, either the receiving router's interface is bad or the error is caused by a software bug. This is the least likely cause of this error message.
    show log 
    %OSPF-4-ERRRCV: Received invalid packet: Bad Checksum from 131.108.1.1, Serial0 
    %OSPF-4-ERRRCV: Received invalid packet: Bad Checksum from 131.108.1.1, Serial0
    debug ip ospf adj

    Solution: replace the faulty hardware once it has been identified.
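
To see which neighbor and interface the checksum errors come from, the log can be tallied with a short script. A sketch that assumes the message format matches the two sample lines in these notes; the function name is hypothetical:

```python
import re
from collections import Counter

# Pattern modeled on the sample log lines in these notes.
BAD_CHECKSUM = re.compile(
    r"%OSPF-4-ERRRCV: Received invalid packet: Bad Checksum from "
    r"(?P<src>[\d.]+), (?P<interface>\S+)"
)


def count_bad_checksums(log_lines):
    """Count Bad Checksum messages per (source address, interface)."""
    counts = Counter()
    for line in log_lines:
        match = BAD_CHECKSUM.search(line)
        if match:
            counts[(match["src"], match["interface"])] += 1
    return counts


log = [
    "%OSPF-4-ERRRCV: Received invalid packet: Bad Checksum from 131.108.1.1, Serial0",
    "%OSPF-4-ERRRCV: Received invalid packet: Bad Checksum from 131.108.1.1, Serial0",
]
print(count_bad_checksums(log))
```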

