Today at work we experienced a very serious network issue. Initially it seemed that the network had simply gone down: we couldn't get outside of our facility, either to the internet or to our other facilities via the MPLS connection. But nslookup (domain resolution via UDP) was working. We noticed that we could reach things on the network as long as they were in the same network (10.x.y.n/255.255.0.0 could connect to 10.x.z.n/255.255.0.0), so it seemed that the core switch was not able to route requests.

I opened up a simple network monitor and saw that traffic from two devices was flooding the network: broadcasts from 10.x.x.x port 138 -> 10.x.255.255 port 138 (NetBIOS datagram traffic) and multicasts from 10.x.x.y port 1985 -> 224.0.0.2 port 1985 (HSRP hello packets). The networking group found that the core switch's processor was running at 100%. So they identified which port on the core these packets were coming in on, then unplugged that cable.

From there we tracked it to the next closet, and then to the closet after that. Eventually it was narrowed down to a room with thin clients that were not turned on. In the process of trying to identify which hardware might be causing the problem, they found a small switch with an Ethernet cable plugged into it in an unusual manner.
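As a rough illustration of the "same network" observation above, here is a minimal Python sketch using the standard `ipaddress` module. The concrete addresses (10.1.2.10 and so on) and the helper name `same_network` are made up for the example, since the real octets are not given here; only the 255.255.0.0 mask and the 224.0.0.2 destination come from what we saw.

```python
import ipaddress


def same_network(host_a: str, host_b: str, netmask: str) -> bool:
    """Return True if both hosts fall in the same network under the given mask."""
    # strict=False lets us pass a host address and get back its containing network.
    net_a = ipaddress.ip_network(f"{host_a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{host_b}/{netmask}", strict=False)
    return net_a == net_b


MASK = "255.255.0.0"  # the mask from the post

# Hypothetical hosts: same first two octets, so no routing is needed.
local_pair = same_network("10.1.2.10", "10.1.3.20", MASK)

# Hypothetical hosts in different /16 networks: this traffic must be routed.
routed_pair = same_network("10.1.2.10", "10.2.3.20", MASK)

# The directed broadcast address for a hypothetical 10.1.0.0/16 network,
# matching the 10.x.255.255 pattern seen in the capture.
broadcast = ipaddress.ip_network("10.1.0.0/255.255.0.0").broadcast_address

# 224.0.0.2 is a multicast address rather than a broadcast address.
is_multicast = ipaddress.ip_address("224.0.0.2").is_multicast

print(local_pair, routed_pair, broadcast, is_multicast)
```

Hosts in the first pair can talk to each other without the core switch routing anything, which is consistent with local traffic still working while everything routed was failing.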

It turns out that a simple network switch (a small, non-configurable, Netgear-type switch) had a regular patch cable plugged into two of its ports. So one switch port was plugged into another port on the same switch with a regular Ethernet patch cable (not a crossover cable), creating a loop. And that seemingly minor thing brought our network to its knees for a short period of time.

Tags: Blogger Network Lessons Learned

Published: 2008-06-02