VIRTUALRACK for Network Engineers: notes: OSPF Troubleshooting CPUHOG Problems

When OSPF forms an adjacency, it floods its link-state update packets to its neighbors. Depending on the router's resources, this flooding can take a long time. When the flooding keeps the router's CPU busy for too long, CPUHOG messages appear in the log.

CPUHOG messages usually appear at two stages:

Neighbor formation process
LSA refresh process

This section discusses possible solutions for these two cases:

CPUHOG messages during adjacency formation
Cause: Router Is Not Running Packet-Pacing Code
When OSPF forms an adjacency, it floods all its link-state packets to its neighbor, and this flooding can consume a lot of CPU. Releases of Cisco IOS Software before 12.0T did not support packet pacing, so a router running them tries to send data as fast as it can over a link. If the link is slow or the router on the other side is slow to respond, LSAs are retransmitted, which eventually leads to CPUHOG messages. Packet pacing adds a pacing interval between the LS updates: instead of flooding everything at once, the router sends the packets with a gap of a few milliseconds between them.

show log
%SYS-3-CPUHOG: Task ran for 2424 msec (15/15), process = OSPF Router
%SYS-3-CPUHOG: Task ran for 2340 msec (10/9), process = OSPF Router
%SYS-3-CPUHOG: Task ran for 2264 msec (0/0), process = OSPF Router

Solution:

Packet pacing introduces a delay of 33 ms between packets and 66 ms between retransmissions. This pacing interval reduces the CPUHOG messages, and the adjacency is formed more quickly. This feature is on by default in Cisco IOS Software Release 12.0T and later.
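To get a feel for what the 33 ms and 66 ms gaps mean in practice, here is a back-of-the-envelope sketch in Python. It is not how IOS schedules packets: it assumes every packet leaves exactly one pacing interval after the previous one and ignores transmission and processing time. The function name and the packet counts are made up for illustration.

```python
FLOOD_PACING_MS = 33    # gap between LS update packets (value from the text)
RETRANS_PACING_MS = 66  # gap between retransmitted packets (value from the text)


def paced_send_time_ms(packets: int, gap_ms: int) -> int:
    """Time from the first packet to the last when packets are gap_ms apart."""
    if packets < 0:
        raise ValueError("packets must be non-negative")
    # N packets have N - 1 gaps between them; 0 or 1 packet needs no waiting.
    return max(packets - 1, 0) * gap_ms


if __name__ == "__main__":
    for count in (10, 100, 500):  # hypothetical packet counts
        flood = paced_send_time_ms(count, FLOOD_PACING_MS)
        retrans = paced_send_time_ms(count, RETRANS_PACING_MS)
        print(f"{count} packets: flood {flood} ms, retransmit {retrans} ms")
```

Under these assumptions, 100 update packets are spread over a little more than 3 seconds instead of being sent in a single burst, which gives a slow link or a slow neighbor time to keep up.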

CPUHOG messages during LSA refresh period
Cause: Router Is Not Running LSA Group-Pacing Code
This problem occurs when the router runs Cisco IOS Software earlier than Release 12.0. The LSA group pacing feature was introduced in Cisco IOS Software Release 12.0 to eliminate this CPU problem, which can occur every 30 minutes.
In earlier versions of Cisco IOS Software, all LSAs are refreshed every 30 minutes so that their ages stay synchronized. As a result, all LSAs are refreshed at the same time, and the resulting flood causes CPUHOG messages every 30 minutes. Imagine a situation in which a couple of thousand LSAs are refreshed at the same moment.

Solution:

LSA group pacing checks the LSAs at a periodic interval (every 4 minutes, by default) and refreshes only those LSAs that are past their refresh time. This is an efficient way of chopping one large flood into several smaller LSA floods. No extra configuration is required for this feature, but for a large number of LSAs (generally 10,000 or more), a small pacing interval (for example, 2 minutes) is recommended; for a few hundred LSAs, use a large interval, such as 20 minutes.
If 10,000 LSAs need to be refreshed, a small pacing interval means the router checks every 2 or 4 minutes to see which LSAs have reached the 30-minute refresh time. The advantage of checking this frequently is that fewer LSAs need to be refreshed at each check, so there is no huge storm of LSA updates. If the number of LSAs is small, it hardly matters whether the check runs every 2 minutes or every 20 minutes. That is why it is better to increase the timer in that case, so that the few LSAs can be refreshed together.
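The effect of the pacing interval on the size of each refresh burst can be sketched with a small Python model. This is only an illustration, not the IOS algorithm: it assumes the LSAs fall due at evenly spread times across the 30-minute refresh time, and that each pacing check refreshes every LSA that has fallen due since the previous check. The function name and the LSA counts are hypothetical.

```python
LS_REFRESH_TIME_S = 1800  # 30-minute refresh time mentioned in the text


def batch_sizes(num_lsas: int, pacing_interval_s: int) -> list[int]:
    """Count the LSAs refreshed at each pacing check during one refresh cycle.

    Assumption (illustration only): LSA i falls due at
    i * 1800 / num_lsas seconds, so due times are spread evenly.
    """
    if num_lsas < 0 or pacing_interval_s <= 0:
        raise ValueError("need num_lsas >= 0 and pacing_interval_s > 0")
    # Number of checks needed to cover the whole cycle (integer ceiling).
    checks = -(-LS_REFRESH_TIME_S // pacing_interval_s)
    sizes = [0] * checks
    for i in range(num_lsas):
        # Check k (0-based) runs at (k + 1) * pacing_interval_s seconds.
        # The LSA is handled by the first check at or after its due time:
        # k = ceil(due / interval) - 1, computed with integers only.
        k = -(-(i * LS_REFRESH_TIME_S) // (num_lsas * pacing_interval_s)) - 1
        sizes[max(k, 0)] += 1
    return sizes


if __name__ == "__main__":
    for interval in (120, 240, 1800):  # seconds
        sizes = batch_sizes(10_000, interval)
        print(f"interval {interval} s: {len(sizes)} checks, "
              f"largest batch {max(sizes)} LSAs")
```

With 10,000 evenly spread LSAs, this model gives batches of roughly 1,330 LSAs at the default 240 seconds and roughly 670 at 120 seconds, while an 1800-second interval puts all 10,000 into a single batch, which is the behavior group pacing is meant to avoid. Real LSA due times are not guaranteed to be evenly spread, so treat the numbers as rough guidance only.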

R1(config)#router ospf 1
R1(config-router)#timers lsa-group-pacing ?
<10-1800> Interval between group of LSA being refreshed or maxaged

Saturday, June 11, 2011
