2024-08-24

HomeLab Mk.3 - Project Closeout

From a project-methodology standpoint, I'm missing some updates since the last post. That's because I was made redundant in the meantime, which gave me immediate funding but limited time to kick off, execute and deploy before securing new employment.

The whole project is now complete with a 4RU AMD Ryzen-based custom-built server running Debian GNU/Linux.

Some of the improvements that have been made so far are as follows (in no particular order):

  1. Employed cryptsetup on top of software RAID
  2. Purchased and installed the 4RU system into an 18RU rack
  3. Installed Cockpit for easier host/hypervisor management
  4. Migrated the VMs from the previous HomeLab hypervisor to the new one
  5. Built a functioning eve-ng instance as a VM using nested virtualisation for network modelling

One key compromise was that I decided to reduce costs on memory, so the hypervisor host is outfitted with 64GB instead of the maximum 192GB of RAM. This was due to the higher than expected motherboard cost, and my requirements are fairly low at this stage, so that sort of outlay isn't justified.

In addition to the above, I've also moved to a more secure, virtualised infrastructure by using OPNsense for the PROD, DEV, (NET)LAB and DMZ networks. It pretty much just stitches together and firewalls multiple isolated virtual networks inside libvirt, and peers with the multilayer switch over a P2P L3 interface carried on a dot1q trunk, advertising a summary route and accepting only a default route from upstream.

I think it's a fairly elegant design given my constraints and requirements but, more importantly, it is a much more manageable setup, which reduces some technical debt for me. There are now very few improvements left to make, even in the next iteration of the HomeLab, which will mostly be a hardware refresh - that and re-racking everything, since the rack's mounting rails need adjusting to accommodate the 4RU server's depth, which unfortunately couldn't be done in time.

While I would love to share the overall design itself, it unfortunately has far too much information that is now considered somewhat confidential, but those who I trust and those who know me are always welcome to take a read (preferably onscreen) as I'm not in a position to re-write it for public consumption.

Debugging Cisco Access Lists

I want to share something specific I learned outside the official Cisco curriculum.

Although I've done some traffic separation for untrusted devices, there are unfortunately still some that need to be on my internal network for now (Google-based devices like a Google TV-based TV and an old Google Home - Nest products don't interest me). I decided to do something about this to restrict vertical traffic and potential attacks from old, unsupported or not-so-trusted hosts.

While I could send all the L2 traffic to a virtual firewall or my internet firewall appliance, that is, in my opinion, a sub-optimal solution.

All I have to work with is an old unsupported Cisco IOS v15 (Classic) Multilayer switch.

I thought, this will be pretty easy. Just allow host services like DHCP/netboot, intra-VLAN traffic etc., block RFC1918 and allow everything else. Ez Pz. Except netboot to my netboot.xyz server didn't work and I couldn't work out why.

ip access-list extended RESTRICTED_ACCESS
 remark NETWORK_SERVICES
 permit udp any eq bootpc any eq bootps
 permit udp any any eq domain
 remark ALLOW_PING
 permit icmp any any echo
 permit icmp any any echo-reply
 remark ALLOW_PXE_SERVER
 permit udp any host 192.168.56.3 eq tftp
 permit tcp any host 192.168.56.3 eq www
 remark PERMIT_INTRA-VLAN
 permit ip 192.168.0.0 0.0.0.255 192.168.0.0 0.0.0.255 log
 remark DENY_RFC1918
 deny   ip any 10.0.0.0 0.255.255.255
 deny   ip any 172.16.0.0 0.15.255.255
 deny   ip any 192.168.0.0 0.0.255.255
 remark ALLOW_EVERYTHING_ELSE
 permit ip any any log

I needed some visibility on the ports and protocols like a firewall log... Cisco conditional debugging to the rescue!

The specific Cisco debug I used was `debug ip packet detail`

Unfortunately, the detail was overwhelming: it showed far too much information for any human to interpret and nearly brought down the switch, so I had to constrain the output with a debug condition, which is as follows:

`debug condition ip 192.168.0.4`

This produced the information I required and allowed me to pinpoint the missing port and protocol required!

21w3d: IP: s=192.168.0.1 (local), d=192.168.0.4 (Vlan666), len 56, sending

21w3d:     ICMP type=3, code=13
21w3d: IP: s=192.168.0.1 (local), d=192.168.0.4 (Vlan666), len 56, output feature
21w3d:     ICMP type=3, code=13, Check hwidb(88), rtype 1, forus FALSE, sendself FALSE, mtu 0, fwdchk FALSE
21w3d: IP: s=192.168.0.1 (local), d=192.168.0.4 (Vlan666), len 56, sending full packet
21w3d:     ICMP type=3, code=13pak 599DB6C consumed in input feature , packet consumed, Access List(31), rtype 0, forus FALSE, sendself FALSE, mtu 0, fwdchk FALSE
21w3d: IP: s=192.168.0.4 (Vlan666), d=192.168.56.3, len 32, access denied
21w3d:     UDP src=62557, dst=30002
21w3d: FIBipv4-packet-proc: route packet from Vlan666 src 192.168.0.4 dst 192.168.56.3
21w3d: FIBfwd-proc: packet routed by adj to Vlan56 192.168.56.3
21w3d: FIBipv4-packet-proc: packet routing succeeded
21w3d: IP: s=192.168.0.1 (local), d=192.168.0.4, len 56, local feature
21w3d:     ICMP type=3, code=13, CASA(4), rtype 0, forus FALSE, sendself FALSE, mtu 0, fwdchk FALSE
21w3d: IP: s=192.168.0.1 (local), d=192.168.0.4, len 56, local feature

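As you can see in the above output, UDP port 30002 was blocked. Nothing earlier in the ACL permitted it, so it fell through to the DENY_RFC1918 entry for 192.168.0.0/16 (the ACL ends in permit ip any any, so the implicit deny is never reached). Adding a permit for it before the DENY_RFC1918 entries resolved this for me. Happy days.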
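
Picking the denied flow out of the debug output by eye gets tedious quickly. The following is a minimal Python sketch, assuming the debug lines have been captured to text in the same format as the excerpt above; the function and variable names are my own and purely illustrative.

```python
import re

# Header line of a packet that the switch refused to forward.
DENIED = re.compile(
    r"IP: s=(?P<src>[\d.]+) \([^)]*\), d=(?P<dst>[\d.]+),.*access denied"
)
# Transport-layer detail line that follows the header line.
L4 = re.compile(r"(?P<proto>UDP|TCP) src=(?P<sport>\d+), dst=(?P<dport>\d+)")


def denied_flows(lines):
    """Return (proto, src, dst, dst_port) for every 'access denied' packet."""
    flows = []
    pending = None  # denied header still waiting for its detail line
    for line in lines:
        header = DENIED.search(line)
        if header:
            pending = header
            continue
        detail = L4.search(line)
        if pending and detail:
            flows.append(
                (detail["proto"], pending["src"], pending["dst"], int(detail["dport"]))
            )
        pending = None
    return flows


SAMPLE = """\
21w3d: IP: s=192.168.0.1 (local), d=192.168.0.4 (Vlan666), len 56, sending
21w3d:     ICMP type=3, code=13
21w3d: IP: s=192.168.0.4 (Vlan666), d=192.168.56.3, len 32, access denied
21w3d:     UDP src=62557, dst=30002
""".splitlines()

if __name__ == "__main__":
    for flow in denied_flows(SAMPLE):
        print(flow)
```

Only packets flagged "access denied" are reported, so the ICMP type 3 code 13 (administratively prohibited) replies the switch sends back are ignored.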

So here's the final ACL that worked a treat.

ip access-list extended RESTRICTED_ACCESS
 remark NETWORK_SERVICES
 permit udp any eq bootpc any eq bootps
 permit udp any any eq domain
 remark ALLOW_PING
 permit icmp any any echo
 permit icmp any any echo-reply
 remark ALLOW_PXE_SERVER
 permit udp any host 192.168.56.3 eq tftp
 permit udp any host 192.168.56.3 eq 30002
 permit tcp any host 192.168.56.3 eq www
 remark PERMIT_INTRA-VLAN
 permit ip 192.168.0.0 0.0.0.255 192.168.0.0 0.0.0.255 log
 remark DENY_RFC1918
 deny   ip any 10.0.0.0 0.255.255.255
 deny   ip any 172.16.0.0 0.15.255.255
 deny   ip any 192.168.0.0 0.0.255.255
 remark ALLOW_EVERYTHING_ELSE
 permit ip any any log
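
To reason about entry ordering without touching the switch, the ACL can be modelled offline. This is a rough Python sketch of first-match evaluation, not a faithful IOS emulation: ICMP types and the log keyword are ignored, wildcard masks are rewritten as prefixes, and the Rule and first_match names are made up for illustration. In this model, the netboot packet from the debug output lands on the 192.168.0.0/16 deny entry in the original ACL and on the new port 30002 permit in the final one.

```python
import ipaddress
from typing import NamedTuple, Optional

ANY = "0.0.0.0/0"


class Rule(NamedTuple):
    action: str                  # "permit" or "deny"
    proto: str                   # "ip" matches every protocol
    src: str = ANY
    dst: str = ANY
    sport: Optional[int] = None  # None means any port
    dport: Optional[int] = None


def first_match(acl, proto, src, dst, sport=None, dport=None):
    """Walk the ACL top to bottom and return the first matching rule."""
    for rule in acl:
        if rule.proto not in ("ip", proto):
            continue
        if ipaddress.ip_address(src) not in ipaddress.ip_network(rule.src):
            continue
        if ipaddress.ip_address(dst) not in ipaddress.ip_network(rule.dst):
            continue
        if rule.sport is not None and rule.sport != sport:
            continue
        if rule.dport is not None and rule.dport != dport:
            continue
        return rule
    return Rule("deny", "ip")  # implicit deny at the end of every ACL


PXE = "192.168.56.3/32"
ORIGINAL = [
    Rule("permit", "udp", sport=68, dport=67),   # bootpc -> bootps
    Rule("permit", "udp", dport=53),             # domain
    Rule("permit", "icmp"),                      # simplification: no ICMP types
    Rule("permit", "udp", dst=PXE, dport=69),    # tftp
    Rule("permit", "tcp", dst=PXE, dport=80),    # www
    Rule("permit", "ip", src="192.168.0.0/24", dst="192.168.0.0/24"),
    Rule("deny", "ip", dst="10.0.0.0/8"),
    Rule("deny", "ip", dst="172.16.0.0/12"),
    Rule("deny", "ip", dst="192.168.0.0/16"),
    Rule("permit", "ip"),
]
# Final ACL: the extra permit goes in right after the tftp entry.
FIXED = ORIGINAL[:4] + [Rule("permit", "udp", dst=PXE, dport=30002)] + ORIGINAL[4:]

PACKET = dict(proto="udp", src="192.168.0.4", dst="192.168.56.3",
              sport=62557, dport=30002)

if __name__ == "__main__":
    print("original:", first_match(ORIGINAL, **PACKET))
    print("fixed:   ", first_match(FIXED, **PACKET))
```

The same model makes it easy to try other flows (for example DNS to an internet resolver) before applying a variant of the ACL to another interface or SVI.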

Yes, I know I can (and probably will) tighten it some more and make DNS more specific (or remove it entirely to enforce Quad9 DNS and prevent poisoning), but I wanted an ACL that is as simple as possible so I can easily model it and apply it to other interfaces and SVIs. I might add that this is being done and, so far, it is working well.

2023-11-03

Git

I'm not sure why git is called 'the stupid content tracker' (according to the man page that is), but I've discovered that - despite many tutorials overcomplicating the setup by adding the creation of a git user account and SSH key-based authentication - it is stupidly trivial to set up a remote repository.


By stupid I mean that git does not store or reference your files on the remote in the way you would expect, or in the way you are used to working with them in a locally checked-out repository or IDE.

This method of file storage caught me off guard, but I eventually managed to get the initial commit added to the remote.

I also learned that git appears to work locally, meaning you can clone on the same system that's hosting the repository using directory paths without a transport protocol!

I'm now armed with information on how private git repo hosting works, which is especially useful for interim SCM or when private hosting is required for whatever reason.
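
For reference, the whole workflow can be sketched end to end. The following Python script is a minimal illustration rather than my exact procedure: it assumes git is on the PATH, and the repository names (project.git, work, clone) and the commit identity are placeholders. It creates a bare repository to act as the remote, pushes an initial commit to it and clones it back using nothing but directory paths.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run a git command and raise if it fails."""
    subprocess.run(["git", *args], cwd=cwd, check=True, capture_output=True)


def demo_local_remote(base: Path) -> Path:
    remote = base / "project.git"   # bare repository acting as the remote
    work = base / "work"            # ordinary working repository
    clone = base / "clone"          # fresh clone taken from the remote

    git("init", "--bare", str(remote), cwd=base)
    git("init", str(work), cwd=base)
    (work / "README").write_text("hello\n")
    git("add", "README", cwd=work)
    # Placeholder identity so the commit works on an unconfigured system.
    git("-c", "user.name=Example", "-c", "user.email=example@example.invalid",
        "-c", "commit.gpgsign=false", "commit", "-m", "Initial commit", cwd=work)
    git("remote", "add", "origin", str(remote), cwd=work)
    git("push", "origin", "HEAD", cwd=work)
    # A plain directory path is enough; no transport protocol is involved.
    git("clone", str(remote), str(clone), cwd=base)
    return clone


if __name__ == "__main__":
    if shutil.which("git"):
        with tempfile.TemporaryDirectory() as tmp:
            cloned = demo_local_remote(Path(tmp))
            print((cloned / "README").read_text(), end="")
            # The bare remote holds objects and refs, not a checked-out README.
            print((Path(tmp) / "project.git" / "README").exists())
```

Pointing the remote at user@host:/path/to/project.git instead of a local directory is all it takes to use the same bare repository over SSH.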

2023-10-29

Libvirt virtio Networking

Delving deeper into libvirt has me trying to find ways to improve the previous build through lab testing.

The latest testing is virtio networking with an isolated network in order to mitigate libvirt not being able to snapshot guests unless the volumes they use are all qcow2.

With this limitation in mind, I employed NFS to a common datastore for guests that require access to it. However, the path taken in the current configuration is suboptimal, as it goes via the host's management interface.

The virtio model provides much better throughput while at the same time allowing guests to communicate with the host, but not outside the host.

In my testing with a virtio model I was able to achieve over 10Gbps with no tuning whatsoever, as follows:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  16.3 GBytes  14.0 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  16.3 GBytes  14.0 Gbits/sec                  receiver

The current, suboptimal path is not only limited by the hardware NIC/switch, but we can also observe quite a lot of retries (the Retr column), which indicates TCP retransmits are likely occurring and introducing latency with NFS.

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec  315             sender
[  5]   0.00-10.00  sec  1.09 GBytes   939 Mbits/sec                  receiver

I now have yet another defined improvement concept ready for implementation on the new server build.
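
To put the two results side by side, here is a small Python sketch that pulls the throughput and retransmit count out of the sender summary lines. It assumes the output format shown above (which looks like iperf3's); the function name is illustrative only.

```python
import re

# Bitrate, its unit prefix and the Retr column of a sender summary line.
SUMMARY = re.compile(r"([\d.]+)\s+([GM])bits/sec\s+(\d+)\s+sender")


def sender_summary(line):
    """Return (throughput in Gbit/s, retransmits) from a sender summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError("not a sender summary line")
    value, unit, retr = match.groups()
    gbps = float(value) if unit == "G" else float(value) / 1000
    return gbps, int(retr)


VIRTIO = "[  5]   0.00-10.00  sec  16.3 GBytes  14.0 Gbits/sec    0             sender"
PHYSICAL = "[  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec  315             sender"

if __name__ == "__main__":
    v_gbps, v_retr = sender_summary(VIRTIO)
    p_gbps, p_retr = sender_summary(PHYSICAL)
    print(f"virtio:   {v_gbps:.2f} Gbit/s, {v_retr} retransmits")
    print(f"physical: {p_gbps:.2f} Gbit/s, {p_retr} retransmits")
    print(f"speed-up: {v_gbps / p_gbps:.1f}x")
```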

2023-10-26

Libvirt pool storage management

I was really looking forward to improving on my previous homelab by building a new server and defining succinct, well thought out pools that leverage and manage LVM, mounts etc. in order to abstract away some of the sysadmin tasks.


In my limited testing, I've found that libvirt storage management is flexible yet limited. I could potentially have done away with the complexities of mdadm, manually defining a PV and/or VG and LVs, formatting, creating mount points and then adding the mounted filesystem(s) to libvirt (or letting libvirt mount them for me). However, since I'm using crypto to mitigate potential data breaches during hard drive disposal, I can't leverage the RAID functionality within LVM itself: I require simplified encryption with a single key on a single volume or, in my case, an md array.

If I didn't require crypto, I may have been able to skip the manual mdadm RAID configuration and carved out nicer storage management, however this is unfortunately not the case.

It seems as though you can't easily carve up an LV as if it were a PV from libvirt's perspective when defining a pool (that is, without the headaches that come with partitioning LVs or overcomplicating the solution with pools defined from libvirt volumes). Libvirt pools also seem flat in nature, and I can't figure out how to define a volume under a directory without defining separate pools (such as dir-based ones) to overcome this.

So for now my solution is to handle most of the storage manually with one single mount point based on a single md and crypto device along with a single LVM PV, VG and LV with dir-based pools defined to manage volumes.
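
The layering is easier to see written out bottom-up. The Python sketch below only builds and prints the commands as a dry run; it executes nothing. Every device, volume, mount and pool name in it is hypothetical, and RAID 5 and ext4 are assumptions made for the example rather than a record of the final build.

```python
def storage_plan(disks, md="/dev/md0", crypt="crypt0", vg="vg0", lv="lv0",
                 mount="/srv/libvirt", pool="guests"):
    """Return the shell commands for the stack, bottom layer first (dry run)."""
    mapper = f"/dev/mapper/{crypt}"
    volume = f"/dev/{vg}/{lv}"
    return [
        # 1. Software RAID across the raw disks.
        f"mdadm --create {md} --level=5 --raid-devices={len(disks)} {' '.join(disks)}",
        # 2. One LUKS container, one key, on top of the md array.
        f"cryptsetup luksFormat {md}",
        f"cryptsetup open {md} {crypt}",
        # 3. A single PV, VG and LV inside the opened container.
        f"pvcreate {mapper}",
        f"vgcreate {vg} {mapper}",
        f"lvcreate -l 100%FREE -n {lv} {vg}",
        # 4. One filesystem and one mount point.
        f"mkfs.ext4 {volume}",
        f"mount {volume} {mount}",
        # 5. A dir-based libvirt pool on the mounted filesystem.
        f"virsh pool-define-as {pool} dir --target {mount}/{pool}",
    ]


if __name__ == "__main__":
    for command in storage_plan(["/dev/sdX", "/dev/sdY", "/dev/sdZ"]):
        print(command)
```

Further dir-based pools would repeat only the last step with a different name and target directory.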

It doesn't seem ideal nor efficient, but right now I need a solution to move the project forward to completion.

I will further test and refine (and possibly even automate) the solution on the new hypervisor host at some point. Who knows, there may be better tools or newly discovered ways of doing this in the future.

The next step in the overall solution is to test a virtiofs shared folder and/or a virtio high-speed (10Gbps) isolated SAN solution.

2023-09-28

Regular Expressions - Examples and Use Cases

Background

This post should serve as a repository of selected use-case regular expressions, sorted by utility/name. It is predominantly centered around Linux and user-space utilities (with a certain number of Cisco IOS-based examples under their own heading and subheadings). It will hopefully be continually updated, as I intend to keep adding to it as I build more regular expression use cases.

MDADM

The following was useful for gathering mdadm information when I had an issue with a missing block device in a RAID array (which turned out to be SATA cables that were accidentally swapped when performing maintenance/cleaning, causing unexpected device renaming which ultimately bumped a device off the array - sdb in my case). The examples here use simple patterns to show the Linux block devices in an array and to look for log entries.

user@host:~$ sudo mdadm --detail /dev/md0 | egrep '\/dev\/sd?'
       3       8       64        0      active sync   /dev/sde
       1       8       32        1      active sync   /dev/sdc
       4       8       48        2      active sync   /dev/sdd

user@host:~$ cat /etc/mdadm/mdadm.conf | egrep '\/dev\/sd?'
DEVICE /dev/sdb /dev/sdc /dev/sdd /dev/sde
user@host:~$
user@host:~$ sudo dmesg | grep md0
[    2.701684] md/raid:md0: device sdc operational as raid disk 1
[    2.701686] md/raid:md0: device sdd operational as raid disk 2
[    2.701687] md/raid:md0: device sde operational as raid disk 0
[    2.702549] md/raid:md0: raid level 5 active with 3 out of 3 devices, algorithm 2
[    2.702574] md0: detected capacity change from 0 to 8001304920064
user@host:~$ 
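
One caveat with the pattern used above: in a regular expression, ? makes the preceding character optional rather than acting as a single-character wildcard as it does in a shell glob, so sd? matches /dev/s followed by anything. It happened to work here because nothing else in the output started with /dev/s. A short Python sketch of the difference, using made-up sample lines:

```python
import re

LOOSE = re.compile(r"/dev/sd?")       # the 'd' is optional: matches /dev/s...
STRICT = re.compile(r"/dev/sd[a-z]")  # requires sd plus a drive letter

LINES = [
    "DEVICE /dev/sdb /dev/sdc /dev/sdd /dev/sde",
    "ARRAY /dev/md0",
    "/dev/sr0",
    "/dev/shm",
]

if __name__ == "__main__":
    print("loose: ", [line for line in LINES if LOOSE.search(line)])
    print("strict:", [line for line in LINES if STRICT.search(line)])
```

The loose pattern also picks up the sr0 and shm lines, while the strict one keeps only the line that names sd devices.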

HDPARM

For similar reasons to the MDADM section, I initially suspected that a disk was faulty and wanted to extract the serial number of each disk for warranty lookup. This is how I achieved that outcome (sans actual serial numbers).

user@host:~$ sudo hdparm -I /dev/sd? | egrep '(\/dev\/sd?|Serial\ Number)'
/dev/sda:
        Serial Number:      *** REDACTED ***
/dev/sdb:
        Serial Number:      *** REDACTED ***
/dev/sdc:
        Serial Number:      *** REDACTED ***
/dev/sdd:
        Serial Number:      *** REDACTED ***
/dev/sde:
        Serial Number:      *** REDACTED ***
user@host:~$

SCREEN

So, sometimes a screen is killed or exited (often accidentally). Rather than opening up the local user screenrc file, looking for the screen entry/command and then executing the screen command manually to restore it, I simply execute it directly with grep and bash command substitution. Here are a couple of examples:

$(grep virsh ~/.screenrc)
$(grep /var/log/messages ~/.screenrc)
$(grep virt_snapshot ~/.screenrc)

LVM

At some point, we might need to review LVM volumes to see where we can scale and resize etc. The following allowed me to quickly see everything at a glance in order to formulate a plan for resizing.

user@host:~$ sudo lvdisplay | egrep "LV (Name|Size)"

[sudo] password for user:
  LV Name                video
  LV Size                <4.02 TiB
  LV Name                audio
  LV Size                750.00 GiB
  LV Name                hdimg
  LV Size                <2.51 TiB
  LV Name                swap
  LV Size                16.00 GiB
  LV Name                var-tmp
  LV Size                8.00 GiB
user@host:~$

Cisco IOS

A collection of various Cisco IOS commands and the very limited IOS regular expression engine on an IOS device (or IOS-XE's IOSD).

show version

Show a consolidated view of uptime, firmware and software version & reason for reload (minus all the Cisco copyright and releng information):

SWITCH#show ver | incl Cisco IOS Software|(ROM|BOOTLDR)|uptime|System (returned|restarted|image)
Cisco IOS Software, C3750 Software (C3750-IPSERVICESK9-M), Version 15.0(2)SE11, RELEASE SOFTWARE (fc3)
ROM: Bootstrap program is C3750 boot loader
BOOTLDR: C3750 Boot Loader (C3750-HBOOT-M) Version 12.2(44)SE5, RELEASE SOFTWARE (fc1)
SWITCH uptime is 1 week, 3 days, 22 hours, 29 minutes
System returned to ROM by power-on
System restarted at 12:28:16 WST Sun Sep 17 2023
System image file is "flash:/c3750-ipservicesk9-mz.150-2.SE11.bin"
SWITCH#

show etherchannel

Show portchannel member state times - This is particularly useful in correlating events for possible cause without having to rely on syslog:

SWITCH#show etherchannel 1 detail | incl ^(Port: |Age of the port)
Port: Gi1/0/15
Age of the port in the current state: 10d:22h:41m:32s
Port: Gi1/0/16
Age of the port in the current state: 10d:22h:41m:31s
Port: Gi1/0/17
Age of the port in the current state: 10d:22h:41m:30s
Port: Gi1/0/18
Age of the port in the current state: 10d:22h:41m:30s
SWITCH#
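
Once the filtered output has been captured, the ages can be compared programmatically to see whether the members changed state together. This is a small Python sketch that assumes the two-line Port/Age format shown above; the function name is illustrative.

```python
import re
from datetime import timedelta

AGE = re.compile(r"(\d+)d:(\d+)h:(\d+)m:(\d+)s")


def port_ages(output):
    """Map each port name to the time spent in its current state."""
    ages = {}
    port = None
    for line in output.splitlines():
        if line.startswith("Port: "):
            port = line.split(": ", 1)[1].strip()
        elif line.startswith("Age of the port") and port:
            match = AGE.search(line)
            if match:
                days, hours, minutes, seconds = map(int, match.groups())
                ages[port] = timedelta(days=days, hours=hours,
                                       minutes=minutes, seconds=seconds)
    return ages


SAMPLE = """\
Port: Gi1/0/15
Age of the port in the current state: 10d:22h:41m:32s
Port: Gi1/0/16
Age of the port in the current state: 10d:22h:41m:31s
Port: Gi1/0/17
Age of the port in the current state: 10d:22h:41m:30s
Port: Gi1/0/18
Age of the port in the current state: 10d:22h:41m:30s
"""

if __name__ == "__main__":
    ages = port_ages(SAMPLE)
    spread = max(ages.values()) - min(ages.values())
    print(f"{len(ages)} members changed state within {spread.total_seconds():.0f}s")
```

A spread of a few seconds suggests a single event (such as the power-on reload) rather than individual link flaps.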

2023-09-21

Cisco IOS IPv6 observations

I wanted to delve deeper into some of the intricacies of IPv6, specifically Neighbour Discovery and directly attached static routes, as well as OSPFv3 using the legacy configuration. I recently discovered two odd Cisco behaviours with the following topics. They are possibly related to virtual lab devices, as I have not tested on real equipment.

  1. IPv6 Directly Attached Static Routes
  2. OSPF IPv6

IPv6 Directly Attached Static Route

This doesn't seem to work as described (at least not in a lab). Only a fully-specified or a next-hop static route works. This could be due to either:
  • No MAC address to IPv6 neighbor binding - since IPv6 doesn't use the concept of ARP like IPv4 does, it instead relies on Neighbor discovery, which doesn't seem to work - more testing/research is required.
  • Limitation with the way Layer 2 forwarding is handled in an Emulated/Virtual environment.

OSPF IPv6

According to an old Cisco Press article I dug up[1], the protocol appears to "leverage" the OSPFv3 engine. However, it can be configured/started using the legacy IPv6 OSPF configuration, similar to IPv4, as per the following:

ipv6 ospf process-id

Now, if there's an existing legacy OSPF IPv4 configuration using the same process-id, it appears to silently fail when entering the configuration (except perhaps if you enable debugging). No neighbours will establish at all, despite documentation claiming that it migrates it to the OSPFv3 configuration (it most likely does this internally though, as I observed that the configuration stays pretty much as you entered it in both the running and start-up configuration).

The lesson I learned here is to identify whether multiple OSPF address families share the same process in legacy configuration mode and either:

  1. Update your configuration so that one of the "conflicting" address families uses a unique process-id, or
  2. Migrate the conflicting processes/address families to the new OSPFv3 configuration, consolidating the address families under the one process.

Further to the above, when removing an OSPF process with no ipv6 ospf process-id, any interface-specific IPv6 process/area configuration is also removed without warning.
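
Spotting a shared process-id before it bites can be automated against a saved configuration. The following is a minimal Python sketch which assumes the legacy global commands appear in the config as router ospf <id> and ipv6 router ospf <id>; the function name and sample config are illustrative only.

```python
import re

V4_PROCESS = re.compile(r"^router ospf (\d+)")
V6_PROCESS = re.compile(r"^ipv6 router ospf (\d+)")


def shared_process_ids(config):
    """Return process-ids used by both legacy IPv4 and IPv6 OSPF."""
    v4_ids, v6_ids = set(), set()
    for line in config.splitlines():
        v4 = V4_PROCESS.match(line)
        v6 = V6_PROCESS.match(line)
        if v4:
            v4_ids.add(int(v4.group(1)))
        elif v6:
            v6_ids.add(int(v6.group(1)))
    return sorted(v4_ids & v6_ids)


SAMPLE_CONFIG = """\
router ospf 1
 network 10.0.0.0 0.0.0.255 area 0
!
ipv6 router ospf 1
!
ipv6 router ospf 20
"""

if __name__ == "__main__":
    print("shared process-ids:", shared_process_ids(SAMPLE_CONFIG))
```

Any id it reports is a candidate for renumbering or for migration to the consolidated OSPFv3 configuration.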
