So we've got some disruptive work to do to the network. There are multiple steps, and failure is a possibility. We'll need to schedule a maintenance window for the activity, develop a plan for the work in advance, and have a way to undo all of the work in case something doesn't go according to plan. It's really important to have a Method of Procedure (MOP) laid out, for a couple of reasons. First, it gives us something to refer to when we are in the thick of the maintenance, and second and more importantly, it forces us to think through the steps in advance. You might be surprised how many details you think of while you're writing a MOP that didn't occur to you before. Many maintenance plans have been cancelled during MOP authoring because the writer realized they had forgotten some critical element when the plan was just an idea in their head.
The first thing to have is a mental picture of where we are right now, and where we want to get to. Then we draw it out. I've got some Visio diagrams to do that, but a big whiteboard would be just as good. Here's where we are now:
And here's where we want to get to:
There are two main chunks of work involved. The first is to get the UCS chassis up and running and reinstall our virtualized Aruba Masters and Controllers. That won't be disruptive to our current production WiFi because everything can run off the consumer Google WiFi router while we're doing the work, so we can wing that a little bit. I'll be covering that setup in a separate blog post (with videos).
The second part is to get rid of the Google WiFi and replace it with the Aruba WiFi system and the pfSense router. Some of that can be done in advance:
1) Test the WiFi network with an SSID different from the production one. The current production network on the Google WiFi mesh system is called seagull (and seagull-guest), and the test SSID on the Aruba is siegelgrouplabs. This is all tested and I'm authoring this post over the Aruba network now.
2) Set up the pfSense router with a different IP address in the same subnet that's under the Google WiFi router (192.168.86.0/24), and get the 5xGE LAG working. This is also done and I will cover the setup of the Aruba S2500 switch and pfSense router configuration in a separate article.
3) Set up the WAN port on the pfSense router to get an IP address via DHCP once it's plugged into the cable modem.
I haven't tested this yet, but I believe this is going to get it done. Nothing tricky is going on with IPv4 DHCP, as we just want to get an IP address and let it set the default gateway. Then NAT should do its thing and all should be good. On the DHCPv6 side, we want to request a delegation larger than a single /64, and as I pointed out in this previous post, we know that Comcast will (should?) delegate up to a /60, which leaves 4 bits for subnetting, or 16 /64 subnets. We also know that our prefix is going to change from what it is now (2601:281:8300:ae::/64), so IPv6 will go down for a little while during this maintenance window and we'll have to make some adjustments to our config once we know what addresses we get assigned.
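As a quick sanity check on that prefix arithmetic, here's a minimal sketch using Python's standard ipaddress module. The prefix 2001:db8:0:ae0::/60 is a placeholder from the documentation range, not something Comcast will actually hand out; we won't know the real delegation until the WAN port comes up.

```python
import ipaddress

# Placeholder delegation from the documentation range (2001:db8::/32).
# The real prefix is unknown until the pfSense WAN port is connected.
delegation = ipaddress.ip_network("2001:db8:0:ae0::/60")

# A /60 leaves 64 - 60 = 4 bits for subnetting, so 2**4 = 16 /64 networks.
subnets = list(delegation.subnets(new_prefix=64))

print(len(subnets))   # 16
print(subnets[0])     # 2001:db8:0:ae0::/64
print(subnets[-1])    # 2001:db8:0:aef::/64
```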
Next, let's check the firewall. I know that I'll need to set up some new rules once we have re-IP'd the internal network with IPv6, but I don't automatically want the whole network exposed. It would appear that Netgate (the maker of pfSense) has an implicit default deny rule in place for everything coming in on the WAN, so that's good, but I am going to add an explicit one anyway.
I've also gone ahead and made some new NAT port-map rules for DNS and HTTP/HTTPS to be translated to the internal IPv4 addresses of those servers, so as soon as the WAN interface is up, those should start working.
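Once the WAN interface is up, a quick way to confirm the port forwards would be a small TCP probe run from a host outside the LAN. This is only a sketch: the function names are my own, and it only covers TCP, so it would confirm HTTP/HTTPS and DNS-over-TCP but not ordinary UDP DNS queries.

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

def check_forwards(host: str, ports=(53, 80, 443)) -> dict:
    """Map each forwarded TCP port to whether it accepted a connection."""
    return {port: port_open(host, port) for port in ports}
```

Calling check_forwards() with whatever public IPv4 address the WAN interface receives should come back with True for all three ports if the NAT rules are doing their job.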
Okay, so I think everything that can be done in advance is done.
Once we move the cables around to match our end-state diagram and remove power from the Google router, we'll have to make the following changes:
1) Since our DHCP router will disappear, I will configure a static IP on the device that I'll be doing the maintenance work from. We'll use 192.168.86.70/24.
2) Change the IPv4 address on the pfSense LAN interface to 192.168.86.1.
3) Then enable dhcpd on the LAN interface so our clients get IPv4 addresses. We'll do that under Services -> DHCP Server -> LAN. I've pre-configured as much as I can, so I should just have to click the enable button.
4) Now we have our first major testing step. We'll go to the desktop computer, which has both a wired connection and a WiFi connection on siegelgrouplabs, and see if we are getting IP addresses assigned from the new DHCP server. If yes, then test the Internet. If everything is good, move on to the next step. If not, troubleshoot. If the troubleshooting fails and we run out of time, implement our backout plan (bottom of this post).
5) Now it's time to fire up the SSID 'seagull'. Go to our mobility master (vmm.siegelgrouplabs.net) and create a new WLAN like this:
6) Another test. In the controller, verify that the new clients are showing up on the new WLAN that we created and that they are getting IP addresses. If good, move on. If not, troubleshoot. By now, we should have a fully functioning IPv4 network and IPv6 isn't mandatory to get operational before we conclude the maintenance window, so we can do the remaining steps at our leisure. So let's move on and tackle IPv6.
7) This post over at pfSense says that we should configure the local LAN interface to "Track Interface" for IPv6, so we've made that change from the previous configuration. I'm going to see what happens when everything gets re-cabled, but I would prefer to hard-code the static address, so I may change it back.
We should be able to find out what we got from Comcast by looking at Status -> DHCPv6 leases. Verify that we got a /60 delegation. If we didn't, we might have to play with some of the advanced configuration options to make sure we're sending the right option code to Comcast to get the /60. If we did, choose the first /64 in the delegation and assign a new static address of xxxx:xxxx:xxxx:xxxx::1/64 to the LAN interface.
8) Then we want to configure the interface so that the clients on the network assign their own IPv6 addresses via SLAAC, but know that this device is the gateway on the network. From this informative post at pfSense, there are a lot of options for us to look at. We can run a full DHCPv6 server and assign addresses out of a range we create, or we can just advertise ourselves as a router and let SLAAC do its thing if we set "unmanaged." There's another option that looks interesting as well, called stateless DHCP, which will provide DNS and NTP information via DHCP but still allow clients to configure their own addresses with SLAAC. When the time comes, we'll configure this option and see how it works.
9) Once that's in place, we'll go to our servers (connecting over v4, obviously) and configure new static IPv6 addresses on them. We need to do that for the CentOS 8.1 KVM hypervisor (seagull), the VMware hypervisor on the UCS platform, and all of the virtual machines that reside on those hypervisors: dns (FreeBSD 12.1), lab1 (Ubuntu 16.04), vmm1, vmm2, vmc1, and vmc2 (ArubaOS).
10) Add IPv6 firewall rules to allow DNS and HTTP/HTTPS to the new IPv6 addresses we configured on the dns and lab1 servers.
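To make steps 7 and 9 a little more concrete, here's a rough Python sketch of how the static addressing could be derived once the delegation is known. The delegated prefix is again a documentation placeholder, and the host numbers (::53 for dns, ::80 for lab1) are purely hypothetical choices for illustration, not addresses I've settled on.

```python
import ipaddress

def lan_plan(delegation: str, hosts: dict) -> dict:
    """Pick the first /64 of a delegated prefix and lay out static addresses.

    hosts maps a server name to a (hypothetical) host number inside the /64.
    """
    prefix = ipaddress.ip_network(delegation)
    lan = next(prefix.subnets(new_prefix=64))  # first /64 in the delegation
    # The router's LAN interface takes ::1 in that /64.
    plan = {"lan-gateway": ipaddress.ip_interface(f"{lan[1]}/{lan.prefixlen}")}
    for name, host_id in hosts.items():
        plan[name] = ipaddress.ip_interface(f"{lan[host_id]}/{lan.prefixlen}")
    return plan

# Placeholder prefix and made-up host numbers, for illustration only.
plan = lan_plan("2001:db8:0:ae0::/60", {"dns": 0x53, "lab1": 0x80})
for name, iface in plan.items():
    print(f"{name:12} {iface}")
```

With the placeholder prefix this puts the gateway at 2001:db8:0:ae0::1/64, dns at 2001:db8:0:ae0::53/64, and lab1 at 2001:db8:0:ae0::80/64. Feeding in the real delegation on the night would give the addresses to type into each server and into the firewall rules.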
All done! Hopefully all was successful in our plan and we can do some minor clean-up like packing up the Google WiFi routers and changing our management device from a static address back to a DHCP assigned IP, but if not, we need a back-out plan:
1) Remove the 'seagull' WLAN from the controller (if applicable)
2) Remove the DHCPv4 server from the pfSense router LAN interface
3) Renumber the LAN interface of the pfSense router back to 192.168.86.42
4) Unplug the pfSense router WAN port from the cable modem
5) Plug in the Google WiFi router and connect it to the cable modem (the LAN interface should still be connected to the switch)
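The test checkpoints in steps 4 and 6 really boil down to a go/no-go decision, and it can help to write that logic down before the window opens so nobody argues about it at 2 AM. Here's a minimal sketch; the function names and the minutes_left parameter are my own invention, and the only real value in it is the 192.168.86.0/24 subnet from the plan.

```python
import ipaddress

# The LAN subnet the new DHCP server should be handing out addresses from.
LAN = ipaddress.ip_network("192.168.86.0/24")

def lease_ok(address: str) -> bool:
    """True if a client address falls inside the LAN subnet from the plan."""
    return ipaddress.ip_address(address) in LAN

def next_action(lease: bool, internet: bool, minutes_left: int) -> str:
    """Decide what to do at a checkpoint in the maintenance window."""
    if lease and internet:
        return "proceed"
    if minutes_left > 0:
        return "troubleshoot"
    return "back out"

print(next_action(lease_ok("192.168.86.123"), True, 45))   # proceed
print(next_action(lease_ok("169.254.10.20"), False, 20))   # troubleshoot
print(next_action(lease_ok("169.254.10.20"), False, 0))    # back out
```

A 169.254.x.x address on a client is the usual sign that it never heard from a DHCP server, which is why it fails the lease check here.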
And that's it! I'm looking forward to doing the work and reviewing the results with you once it's completed.