Wednesday, October 15, 2014
removing a BGP router
When you want to remove a device that is running IBGP, you have to watch where IBGP sessions can form.
I was removing a router that had IBGP configured on it to another router at a site that was connected to MPLS providers. The LAN interfaces were shut down.
What happened next was that, since the router loopbacks (the IBGP neighbors) were advertised into the providers, an IBGP session would form through the providers with a long AS path. Now you would think that BGP split horizon would prevent router A from learning about router B, BUT if you have a default route, traffic will flow to your data center, which knows about both loopbacks via the providers and will route traffic between them.
Then prefixes for the local subnets began to flow to the router that had no connection to the local networks.
Then that router advertised them to the provider so traffic started to flow to a router that could not forward any of that traffic.
Finally, router A would send its own loopback via IBGP, and that advertisement would point to a local route. Since router B was not connected to any of the LANs, once that IBGP advertisement of the loopback arrived the IBGP session would time out. Then the locally based loopback prefix was removed and the prefix advertised by the provider came back.
Thus the IBGP peer goes up and down in a 90-second cycle.
Net of this is that if you are removing a device from the network, the first thing to do is shut down dynamic routing to that box, add any static routes you need (say, to loopbacks) and a default route to the device being removed, then after a while shut down the interfaces.
Friday, August 22, 2014
Reducing the size of the internet routing table and static routes
Much ink has been spilled about the 512K route limit and the various network equipment that has trouble with it. You may consider either reducing the number of routes (say, not taking in the /24s once the ISP sends you a default route) and accepting that you will not load balance as well as you would like, or just taking the default route only.
The issue is you may have VPNs, NATs or other things with static routes that assume you have the internet routing table. If you have redistributed those static routes into the IGP, when you get rid of your eBGP internet routes you find yourself with a routing loop.
Net of this is that when you are making a change to your internet routing, audit your devices for static routes that are being redistributed and ensure that you do not create a routing loop when you reduce your internet routing table size.
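The audit can be partly automated. Below is a minimal Python sketch, assuming IOS-style configuration text saved from each device. The function name `audit_static_redistribution`, the list of IGP keywords and the sample config are hypothetical illustrations, not taken from any real device. It only lists the static routes and the IGP processes that redistribute them; deciding which ones depend on the full internet table is still a manual step.

```python
# Hypothetical helper: lists static routes and the IGP processes that
# redistribute static routes, from IOS-style configuration text.
IGP_PREFIXES = ("router ospf", "router eigrp", "router rip", "router isis")


def audit_static_redistribution(config_text):
    """Return (static_routes, igp_processes_redistributing_static)."""
    statics = []
    redistributing = []
    current_igp = None
    for raw in config_text.splitlines():
        line = raw.rstrip()
        stripped = line.strip()
        if stripped.startswith("ip route "):
            statics.append(stripped)
        if line.startswith("router "):
            # Only track IGP processes; BGP is deliberately ignored here.
            current_igp = stripped if stripped.startswith(IGP_PREFIXES) else None
        elif line and not line.startswith(" "):
            # Any other top-level line ends the current router section.
            current_igp = None
        elif current_igp and stripped.startswith("redistribute static"):
            redistributing.append(current_igp)
    return statics, redistributing


# Illustrative config only (documentation addresses and a private AS).
SAMPLE_CONFIG = """\
ip route 203.0.113.0 255.255.255.0 192.0.2.1
!
router ospf 1
 redistribute static subnets
 network 10.0.0.0 0.255.255.255 area 0
!
router bgp 64512
 redistribute static
"""

statics, processes = audit_static_redistribution(SAMPLE_CONFIG)
print("static routes:", statics)
print("IGP processes redistributing static:", processes)
```

Any device that shows up with both a static route and an IGP process redistributing static is a candidate for the loop described above and deserves a closer look before the eBGP routes are removed.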
anycast and bgp may not mix
Anycast is a common way of doing distributed services. The idea is to have duplicate IP addresses and thus duplicate routes, so that a user reaches the nearest service instance based on the routing table. When you are using BGP, you have to be aware of eBGP behavior. If data center A and data center B want to use a common anycast address, remember that only one of the edge routers will originate an eBGP route; the other data center will not. What happens is that data center A sends the eBGP route and data center B receives it. Data center B puts the eBGP route in its routing table, where it beats the IGP route on administrative distance (eBGP 20 versus OSPF 110). Since there is then no IGP route in the routing table, data center B will not inject its own eBGP route. A distance command can fix this: set the distance for any received eBGP anycast prefix to a value higher (less preferred) than that of OSPF, so the local IGP route stays in the routing table.
router bgp ####
 distance 120 0.0.0.0 255.255.255.255 <acl>
The point is that when you are using BGP you have to design around where the anycast addresses will be injected, and ensure that you have IGP prefixes where they need to be.
Wednesday, June 11, 2014
messed up default in a vrf
The problem was that servers in a VRF that used a stub area could not reach outside the VRF. The default routes were messed up: they were pointing not to the exit but to themselves. I knew this was an area 0 problem, as you get a default route when you have a connection to area 0.
The flip side is that when you are setting up a VRF, the box that you want to originate the default has to have one interface in area 0 to force the advertisement of a default route.
Looking in one of the switches, I see that there is an area 0 there. This is not correct; there is not supposed to be one.
Routing Process 100 with ID 10.202.255.7 VRF up
Area BACKBONE(0.0.0.0) (Inactive)
Area has existed for 1w0d
Interfaces in this area: 1 Active interfaces: 1
Passive interfaces: 1 Loopback interfaces: 0
No authentication available
SPF calculation has run 2 times
Last SPF ran for 0.000125s
Area ranges are
Number of LSAs: 90, checksum sum 0x2e88fb
Area (0.0.0.202)
Area has existed for 2y9w
Interfaces in this area: 22 Active interfaces: 21
Passive interfaces: 19 Loopback interfaces: 1
This area is a STUB area
Generates stub default route with cost 1
No authentication available
SPF calculation has run 690 times
Last SPF ran for 0.003378s
Area ranges are
Number of LSAs: 31, checksum sum 0xeb8bc
So now I need to find the bad interface. I just need to know which interface is in area 0 so I can fix it.
sho ip ospf int vrf up | inc Vlan|area
Vlan6626 is up, line protocol is up
IP address 10.10.10.2/23, Process ID 100 VRF backup, area 0.0.0.0
So fixing the OSPF area statement under the SVI (this was NX-OS) fixes the problem.
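When there are many SVIs, the same filter can be applied to saved output. This is a minimal Python sketch, assuming the two-line-per-interface format shown in the filtered output above. The function name `interfaces_in_area` is hypothetical, and the second interface in the sample (Vlan202) is made up for illustration.

```python
import re


def interfaces_in_area(show_output, area="0.0.0.0"):
    """Return the interfaces whose OSPF area matches `area`."""
    found = []
    current = None
    for raw in show_output.splitlines():
        line = raw.strip()
        # Interface header line, e.g. "Vlan6626 is up, line protocol is up"
        header = re.match(r"^(\S+) is .*line protocol is", line)
        if header:
            current = header.group(1)
            continue
        # Detail line ends with "area <id>"
        detail = re.search(r"\barea (\S+)$", line)
        if detail and current is not None and detail.group(1) == area:
            found.append(current)
    return found


# First pair is from the output above; the Vlan202 pair is invented.
SAMPLE_OUTPUT = """\
Vlan6626 is up, line protocol is up
IP address 10.10.10.2/23, Process ID 100 VRF backup, area 0.0.0.0
Vlan202 is up, line protocol is up
IP address 10.202.1.2/24, Process ID 100 VRF backup, area 0.0.0.202
"""

print(interfaces_in_area(SAMPLE_OUTPUT))
```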
So what, specifically, do network engineers need to know about applications?
I was at Network World's ONX (SDN) conference and I heard for the nth time that network engineers need to learn how to communicate with our application developer partners.
Learn WHAT never seems to be answered. So I came to the conclusion that I would need to figure this out myself.
Clearly we are going to have to speak some common language and develop a way to map the design of an application onto network characteristics we can provision (QoS, load balancing, security, traffic engineering, etc.).
I think the knowledge is there (and has been there for a while), so looking at the ancient texts may give me some answers.
My first stop is the classic 8 fallacies of distributed computing:
The network is reliable.
Latency is zero.
Bandwidth is infinite.
The network is secure.
Topology doesn't change.
There is one administrator.
Transport cost is zero.
The network is homogeneous.
See below for some links. We can now begin a discussion about whether the application developer made any of these assumptions, and whether any of them are issues in your network.
http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing
http://java.sys-con.com/node/38665
http://www.rgoarchitects.com/Files/fallacies.pdf
I also remembered that there are a number of different interaction models for applications:
RPC: a short message where the application is waiting for a response, so latency is important.
Message queuing: I send a message and check back later, so latency is not as important.
File transfer: a one-way transmission, where you care about available bandwidth and whether the application is written to consume that bandwidth.
Email?
Streaming
I am not sure if database calls are the same as RPC or different.
The point is that the interaction model between application components is key to what demands the application will place upon the network. So understanding the interaction models and mapping them to the sorts of network services that are required to make those applications work seems a good place to start.
I am not sure if I need to go into the types of SOA, SOAP, XML, etc. I am hoping that people smarter than me can begin to describe the right information we need to have as network engineers.
Friday, February 21, 2014
failing traceroutes with unix
Unix traceroute sends its probes to a set of UDP ports starting at 33434 (33434 to 33534); the reply can be an ICMP destination unreachable. There can be an issue where the trace fails on certain packets because the target system has some program listening on one of the UDP ports in question. You see this as the Nth trace packet always failing. Some systems let you override the UDP range.
The specific issue occurred when one packet failed on a traceroute to a system on the same subnet, while thousands of pings worked without a problem.
So use ICMP ping if you can. If UDP traceroutes fail at the last hop for no apparent reason, a program listening on one of the probe ports can be considered as the cause.
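To see which probe would land on a listening port, here is a minimal Python sketch. It assumes the first probe goes to the base port, every later probe increments the destination port by one, and there are three probes per hop. Implementations differ (some start one port higher, and the probe count is configurable), so treat the numbering as an assumption to verify against your own traceroute. The function names are hypothetical.

```python
BASE_PORT = 33434       # traditional starting port for UDP traceroute
PROBES_PER_HOP = 3      # assumed default number of probes per hop


def probe_port(hop, probe, base=BASE_PORT, per_hop=PROBES_PER_HOP):
    """Destination port for a probe.

    hop is the 1-based TTL, probe is the 0-based index within that hop.
    Assumes the port increments by one for every probe sent.
    """
    return base + (hop - 1) * per_hop + probe


def colliding_probes(last_hop, listening_ports,
                     base=BASE_PORT, per_hop=PROBES_PER_HOP):
    """Return (probe_number, port) pairs that hit a listening UDP port."""
    hits = []
    for probe in range(per_hop):
        port = probe_port(last_hop, probe, base, per_hop)
        if port in listening_ports:
            hits.append((probe + 1, port))
    return hits


# Target on the same subnet (hop 1) with a hypothetical program
# listening on UDP 33435: the second probe gets no unreachable reply.
print(colliding_probes(1, {33435}))
```

A listening port does not answer with an ICMP unreachable, so the probe that lands on it times out while the others succeed, which matches the symptom of one packet always failing.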
Monday, February 10, 2014
IBGP can cycle between established and available
If you have a configuration with 2 routers connected to 2 separate ISPs, and you run IBGP between the loopback interfaces but do not cover the loopback with an IGP network statement, BUT do put it in a BGP network statement, this bouncing can happen.
Router B does not have its loopback covered by OSPF, so router A learns the loopback via eBGP. An IBGP session comes up going through the providers. Since most IBGP sessions have next-hop-self, you get a recursive route situation. Adding router B's loopback into the IGP fixes this.
The first part of IBGP debugging is making sure that the peers match (if router A peers to router B's loopback, then router B needs update-source Loopback0). After that, you check that you have a route to the loopback address. You also have to check where that route came from (make sure it is not eBGP).
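The last check can be scripted against saved output. This is a minimal Python sketch, assuming IOS-style `show ip route <address>` output that contains a `Known via "<protocol>"` line. The function names and the two sample outputs are hypothetical illustrations.

```python
import re


def route_source(show_ip_route_output):
    """Return the protocol named in the 'Known via' line, or None."""
    match = re.search(r'Known via "([^"]+)"', show_ip_route_output)
    return match.group(1) if match else None


def peer_route_is_safe(show_ip_route_output):
    """True if the route to the peer loopback exists and is not from BGP."""
    source = route_source(show_ip_route_output)
    return source is not None and not source.startswith("bgp")


# Illustrative outputs only; addresses and AS number are made up.
VIA_OSPF = """\
Routing entry for 10.0.0.2/32
  Known via "ospf 1", distance 110, metric 2, type intra area
"""
VIA_BGP = """\
Routing entry for 10.0.0.2/32
  Known via "bgp 64512", distance 20, metric 0
"""

print(peer_route_is_safe(VIA_OSPF))
print(peer_route_is_safe(VIA_BGP))
```

Note that this sketch flags any BGP-learned route to the peer loopback, not only eBGP ones, which is a stricter test than the text requires.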