Configuring a Proxmox VE 2.x cluster running over an OpenVPN intranet Part 1
[Note: we have been running this setup for about three months now, and it has easily exceeded our expectations. The ease of instantiating a temporary throwaway virgin Linux install for testing in software development alone easily beats our old clunky VirtualBox setup - and upgrading server components (as they all live in their own containers now) is vastly easier than before. We have exceeded 99.99% uptime for the first time, despite our mail "server" getting confused for a bit - we can now reboot it separately from the web service. The only negative is the lack of complete IPv6 support - it works for HTTP but nothing else until the Proxmox kernel gains IPv6 NAT support.]
Here at ned Productions Ltd we've been looking at how to virtualise our existing server config in order to decrease our maintenance overheads. Currently, we manually configure each server on an ad hoc basis using whatever the latest Ubuntu server LTS is. We run three public servers, with the two DNS name servers on two geographically disparate VPSs and currently a single fully dedicated server in Amsterdam. Internally, we have three separate servers each doing various things. We use rsync to both keep backups of the public servers and to transport public instances of server applications onto one of the internal servers for the purposes of testing and upgrades.
Now this is pretty secure, but it's also quite manual. Our ADSL connection is fairly lousy, especially outbound, so uploads take forever even with rsync, even after making duplicates on the public server to try to reduce the data transferred. The backup is also run manually - we can't afford the local disc space to make it automatic, as right now there's no deduplicated storage here.
As manual and tedious as it is, it hasn't been enough of a pain for us to do much about it in past years. However, we're finding ourselves rolling out more and more microserver applications, each of which currently requires a VPS. That's okay, VPSs are cheap, but having to reconfigure each one again and again is beginning to grate. Also, backup must be configured for each manually. Internally, backup consists of me copying everything to an external USB hard drive. Given that this isn't the 1990s, and we do have technologies like ZFS, all that could be automated, and thanks to deduplication many more backups could be kept securely.
Anyway, we had a play with Ubuntu Cloud Infrastructure in 12.04 LTS Alpha in order to make an Amazon EC2 compatible cloud infrastructure. All I can say is wow - it can't even install on a single machine without KVM virtualisation. That just sucks. What I want is to start off on a single server with the ability to expand from there as needed. In fairness, EC2 and UCI aren't about this - they're about dynamic expansion, i.e. virtualisation solutions which self-expand. I want static expansion, where I expand the thing manually without it causing me too much pain, because I can just fire more "servers" onto the one single server (or two). And for static, not dynamic, cloud solutions I haven't found anything better than Proxmox VE, which is a pretty web GUI around Linux OpenVZ and KVM virtualisation but - and here's the key - with clustering support, so you can manage multiple Proxmox servers from a single GUI, migrate VMs seamlessly between nodes and/or have VMs keep in sync with one another across nodes.
Proxmox VE 2.x clustering support
There's plenty more at http://pve.proxmox.com/wiki/Proxmox_VE_2.0_Cluster but in essence, Proxmox 2.x clusters are a pool of Proxmox installs which use IP multicast to communicate with one another, so you can have VM images auto-sync between nodes and you can have HA (high availability) switch to a backup node should a primary node stop responding. Now that's real interesting, because to add another Proxmox node you simply fire one up inside the same IP subnet and voila, more computing power and more backup nodes!
However, it raises an interesting problem. How do you get a local subnet cluster running on commodity servers, each with their own IP address, perhaps even living in different geographical locations? Most commodity server providers don't provide BGP access, so configuring multicast is essentially impossible. Similarly, how do we here at ned Productions Ltd have the local in-house server which sits behind a slow, NATed IPv4 only ADSL connection running within the same cluster as our servers out on the public internet? That way we can have public VM images auto-replicate themselves to local backups, and if we want to test a new feature or debug something or test a software upgrade we can simply delta duplicate a local copy of a public backup (this copies no data, so it's real fast) into a test VM and experiment. No more waiting hours for rsync! You just get on with it, and when you're happy you sync with local which in turn will replicate to public. No more waiting around for manual commands!
So yes, virtualising our server setup should pay huge dividends in improved productivity by getting rid of much of our manual admin work. I figure that our solution will be of great interest to others, never mind it being useful for ourselves to document how we achieved this, so here's how you use OpenVPN to form a cross-site local subnet.
- Part One
- 1. Prepare Proxmox VE 2.x
- 2. Configure the OpenVPN bridge on the public server
- 3. Move the Proxmox on the OpenVPN server over to running exclusively on the VPN private subnet
- 4. Adding additional Proxmox nodes
- Part Two
- Part Three
- 7. Setting up self-replicating node storage using DRBD
- 8. Configuring backing storage for the replicated data
- 9. Setting up DRBD
- 10. Adding the DRBD replicated storage to LVM so Proxmox can use it
- 11. Making the replicated storage available for OpenVZ as well
- 12. Accessing replicated content from one of the peer nodes
- 13. Encrypted off-site backup
1. Prepare Proxmox VE 2.x
The default virgin Proxmox install has a few annoying things which need fixing up. Before anything else, upgrade to latest:
aptitude update && aptitude full-upgrade
One might want to fiddle with /etc/hostname, /etc/hosts and /etc/resolv.conf and/or IPv6 config and/or timezone via 'dpkg-reconfigure tzdata' at this stage, then reboot.
Next, if you're running on SSDs, you may wish to convert the ext3 filesystems to ext4 so you can mount using the discard option to enable TRIM.
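If you do, something along these lines should work - a sketch only, assuming the stock Proxmox LVM layout where /dev/pve/data is mounted at /var/lib/vz; the root filesystem needs the same treatment from a rescue environment as it can't be unmounted live:

umount /var/lib/vz
tune2fs -O extents,uninit_bg,dir_index /dev/pve/data
e2fsck -fpD /dev/pve/data   # a full fsck is mandatory after changing filesystem features
nano /etc/fstab
  /dev/pve/data /var/lib/vz ext4 defaults,discard 0 2
mount /var/lib/vz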
Next one ought to add a non-root login account, upload your SSH public key to that login and configure sshd to refuse non-key logins. Annoyingly, sudo is missing on Proxmox, but it's easy to fix:
aptitude install sudo
adduser ned
adduser ned sudo
su ned
sudo -s
exit
cd
mkdir .ssh
nano .ssh/authorized_keys
  PASTE
chmod -R og-rwx .ssh
exit
Test that you can definitely log in via SSH key. Make sure you can sudo into root as we're about to disable root network login! Then change PasswordAuthentication in /etc/ssh/sshd_config to no and restart sshd, and make sure you can no longer log in as root from the network.
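In other words (the init script name being as on Debian squeeze):

sudo nano /etc/ssh/sshd_config
  PasswordAuthentication no
sudo /etc/init.d/ssh restart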
Next problem: due to a bug in Debian squeeze, Proxmox 2.x currently borks IPv6 configuration over a routed bridge, so if you want IPv6 connectivity you're going to need to force it by editing /etc/network/interfaces like this:
auto vmbr0
iface vmbr0 inet static
        address  xxx.xxx.xxx.xxx
        netmask  255.255.255.0
        network  xxx.xxx.xxx.0
        broadcast  xxx.xxx.xxx.255
        gateway  xxx.xxx.xxx.254
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0
        # Something is borked with IPv6 bridging, so manually ...
        post-up ip -6 ro add 2001:xxxx:x:xxff::/64 dev vmbr0
        post-up ip -6 ro add default via 2001:xxxx:x:xxff:ff:ff:ff:ff dev vmbr0
        pre-down ip -6 ro del default via 2001:xxxx:x:xxff:ff:ff:ff:ff dev vmbr0
        pre-down ip -6 ro del 2001:xxxx:x:xxff::/64 dev vmbr0

iface vmbr0 inet6 static
        address  2001:xxxx:x:xxxx::1
        netmask  64
        gateway  2001:xxxx:x:xxff:ff:ff:ff:ff
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0
What's supposed to happen is that the gateway in the inet6 static section configures a default IPv6 route via the gateway specified (which with our hoster is always your IPv6 address with the last byte of the /64 prefix set to ff, e.g. 1111:2222:3333:4444:5555:6666:7777:8888 would go to 1111:2222:3333:44ff:ff:ff:ff:ff), but it's ignored, so we simply use a post-up and pre-down to do it manually.
Last problem: Proxmox is rather lax about exposing services to the public gaze:
root@europe3:~# netstat -lp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 localhost.localdomai:85 *:*                     LISTEN      1501/pvedaemon work
tcp        0      0 localhost.locald:domain *:*                     LISTEN      1238/named
tcp        0      0 *:ssh                   *:*                     LISTEN      3262/sshd
tcp        0      0 localhost.localdom:smtp *:*                     LISTEN      1349/master
tcp        0      0 localhost.localdoma:953 *:*                     LISTEN      1238/named
tcp        0      0 *:52422                 *:*                     LISTEN      973/rpc.statd
tcp        0      0 *:sunrpc                *:*                     LISTEN      961/portmap
tcp6       0      0 [::]:www                [::]:*                  LISTEN      1652/apache2
tcp6       0      0 ip6-localhost:domain    [::]:*                  LISTEN      1238/named
tcp6       0      0 [::]:ssh                [::]:*                  LISTEN      3262/sshd
tcp6       0      0 ip6-localhost:953       [::]:*                  LISTEN      1238/named
tcp6       0      0 [::]:https              [::]:*                  LISTEN      1652/apache2
tcp6       0      0 [::]:8006               [::]:*                  LISTEN      1652/apache2
udp        0      0 *:725                   *:*                                 973/rpc.statd
udp        0      0 *:sunrpc                *:*                                 961/portmap
udp        0      0 europe3.nedproducti:ntp *:*                                 1554/ntpd
udp        0      0 localhost.localdoma:ntp *:*                                 1554/ntpd
udp        0      0 *:ntp                   *:*                                 1554/ntpd
udp        0      0 *:33169                 *:*                                 973/rpc.statd
udp        0      0 localhost.locald:domain *:*                                 1238/named
udp6       0      0 fe80::143d:2ff:fea3:ntp [::]:*                              1554/ntpd
udp6       0      0 fe80::143d:2ff:fea3:ntp [::]:*                              1554/ntpd
udp6       0      0 fe80::21c:c0ff:fe3a:ntp [::]:*                              1554/ntpd
udp6       0      0 fe80::21c:c0ff:fe3a:ntp [::]:*                              1554/ntpd
udp6       0      0 ip6-localhost:ntp       [::]:*                              1554/ntpd
udp6       0      0 fe80::1%19620736:ntp    [::]:*                              1554/ntpd
udp6       0      0 [::]:ntp                [::]:*                              1554/ntpd
udp6       0      0 ip6-localhost:domain    [::]:*                              1238/named
As you can see, rpc.statd, portmap and ntpd are all on the catchall network interface. This isn't ideal, and I haven't forgotten that time when rpc.statd had a remote unprivileged root escalation bug some years ago, so we need to get them pinned to localhost and away from the public gaze instead.
nano /etc/default/ntp
  NTPD_OPTS='-g --interface=127.0.0.1'
/etc/init.d/ntp restart

nano /etc/default/portmap
  OPTIONS="-i 127.0.0.1"
/etc/init.d/portmap restart

nano /etc/default/nfs-common
  STATDOPTS="--name 127.0.0.1"
/etc/init.d/nfs-common restart
Note: Binding ntpd to localhost appears to prevent it from contacting any time servers. You may wish to leave it alone and rely on the firewall (below) to prevent access to it.
After that, it looks like this:
root@europe3:~# netstat -lp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 localhost.localdomai:85 *:*                     LISTEN      1501/pvedaemon work
tcp        0      0 localhost.locald:domain *:*                     LISTEN      1238/named
tcp        0      0 *:ssh                   *:*                     LISTEN      3262/sshd
tcp        0      0 localhost.localdom:smtp *:*                     LISTEN      1349/master
tcp        0      0 localhost.localdoma:953 *:*                     LISTEN      1238/named
tcp        0      0 *:59206                 *:*                     LISTEN      3607/rpc.statd
tcp        0      0 localhost.locald:sunrpc *:*                     LISTEN      3576/portmap
tcp6       0      0 [::]:www                [::]:*                  LISTEN      1652/apache2
tcp6       0      0 ip6-localhost:domain    [::]:*                  LISTEN      1238/named
tcp6       0      0 [::]:ssh                [::]:*                  LISTEN      3262/sshd
tcp6       0      0 ip6-localhost:953       [::]:*                  LISTEN      1238/named
tcp6       0      0 [::]:https              [::]:*                  LISTEN      1652/apache2
tcp6       0      0 [::]:8006               [::]:*                  LISTEN      1652/apache2
udp        0      0 localhost.locald:sunrpc *:*                                 3576/portmap
udp        0      0 localhost.localdoma:ntp *:*                                 3554/ntpd
udp        0      0 *:ntp                   *:*                                 3554/ntpd
udp        0      0 *:39330                 *:*                                 3607/rpc.statd
udp        0      0 *:815                   *:*                                 3607/rpc.statd
udp        0      0 localhost.locald:domain *:*                                 1238/named
udp6       0      0 ip6-localhost:ntp       [::]:*                              3554/ntpd
udp6       0      0 [::]:ntp                [::]:*                              3554/ntpd
udp6       0      0 ip6-localhost:domain    [::]:*                              1238/named
Yes, rpc.statd is known to ignore instructions not to bind to all interfaces. And yes, ntpd is better than before in that it no longer binds to each individual interface, but it is still listening on a UDP wildcard. So sadly we have no choice: we're going to have to install a firewall. I like ufw from Ubuntu as it's lightweight and easily configurable. Due to its success it's actually now part of Debian, but only in releases too new for Proxmox 2.0, so install from source:
wget https://launchpad.net/ufw/0.30/0.30.1/+download/ufw-0.30.1.tar.gz
tar zxf ufw-0.30.1.tar.gz
cd ufw-0.30.1
sudo aptitude install build-essential
sudo python setup.py install
sudo nano /etc/default/ufw
  IPV6=yes
sudo cp doc/initscript.example /etc/init.d/ufw
sudo chmod +x /etc/init.d/ufw
sudo update-rc.d ufw defaults
sudo ufw allow to any port ssh
sudo ufw enable

root@europe3:/home/ned/ufw-0.30.1# ufw status
Status: active

To                         Action      From
--                         ------      ----
22                         ALLOW       Anywhere
22                         ALLOW       Anywhere (v6)
As you can see, traffic from anywhere to port 22 is allowed for both IPv4 and IPv6. Fire up a duplicate SSH session to make sure you can still log in. If not, use 'ufw disable' and figure out the problem. ufw nicely stays disabled until you tell it otherwise, even across reboots. Also, make sure you CAN'T access the web GUI - after all, only port 22 is allowed.
Next obvious step: while you're configuring the server, keep everyone else out:
sudo ufw allow from <your ip addr> to any

root@europe3:/home/ned/ufw-0.30.1# ufw status
Status: active

To                         Action      From
--                         ------      ----
22                         ALLOW       Anywhere
Anywhere                   ALLOW       xxx.xxx.xxx.xxx
22                         ALLOW       Anywhere (v6)
That solves people attacking local services and your web GUI login once and for all :) It's a shame the default out of the box Proxmox 2.0 config isn't a little more security aware actually. However, thanks to our OpenVPN setup coming, we can change the web GUI to only run inside the VPN on a VPN only IP. This will permanently keep the web GUI - and indeed anything else sensitive - away from public view. And, more importantly, it lets us use port 80 on the public IP for web serving etc without the Proxmox GUI getting in the way!
2. Configure the OpenVPN bridge on the public server
You're going to have to choose one public server to act as the main OpenVPN server for the purposes of other things connecting to it. You can easily configure failover backups with OpenVPN, though, so if the primary goes down the backup will kick in and all the clients will automatically switch over. That's outside the scope of this guide, however.
I'm going to assume from now on that your OpenVPN server will also be a Proxmox node. It doesn't have to be this way - indeed, it's preferable security wise if it isn't this way. But servers cost money, so I'll assume it to be true. As you're running Proxmox, the rest of this guide will also assume a Debian environment.
So first things first: you're going to need an extra dummy network device on which to run your VPN, and therefore your private cluster subnet, as we want the cluster subnet completely disengaged from anything going near a public network. The reason we manually configure a separate dummy networking device is that later we're going to bind the Proxmox-only services, like the web GUI, exclusively to that interface, so we need it around and working at boot rather than waiting for OpenVPN to instantiate it via dev tap.
echo "options dummy numdummies=2" > /etc/modprobe.d/local.conf
Proxmox comes with a single dummy network device that is already used by Proxmox, so we need another one. The above fixes this (after a reboot). You can check dummy1 exists via ls /proc/sys/net/ipv4/conf/.
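You should see something like this (hypothetical output - the exact interface list will vary with your hardware):

root@europe3:~# ls /proc/sys/net/ipv4/conf/
all  default  dummy0  dummy1  eth0  lo  vmbr0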
Once dummy1 definitely exists (if it doesn't, the edit below will stop your server booting, which likely means a technician intervention), edit /etc/network/interfaces as follows:
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# for Routing
auto vmbr1
iface vmbr1 inet manual
        post-up /etc/pve/kvm-networking.sh
        bridge_ports dummy0
        bridge_stp off
        bridge_fd 0

# vmbr0: Bridging. Make sure to use only MAC addresses that were assigned to you.
auto vmbr0
iface vmbr0 inet static
        address  xxx.xxx.xxx.xxx
        netmask  255.255.255.0
        network  xxx.xxx.xxx.0
        broadcast  xxx.xxx.xxx.255
        gateway  xxx.xxx.xxx.254
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0
        # Something is borked with IPv6 bridging, so manually ...
        post-up ip -6 ro add 2001:xxxx:x:xxff::/64 dev vmbr0
        post-up ip -6 ro add default via 2001:xxxx:x:xxff:ff:ff:ff:ff dev vmbr0
        pre-down ip -6 ro del default via 2001:xxxx:x:xxff:ff:ff:ff:ff dev vmbr0
        pre-down ip -6 ro del 2001:xxxx:x:xxff::/64 dev vmbr0

iface vmbr0 inet6 static
        address  2001:xxxx:x:xxxx::1
        netmask  64
        gateway  2001:xxxx:x:xxff:ff:ff:ff:ff
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0

# for OpenVPN
auto openvpnbr0
iface openvpnbr0 inet static
        address  10.xxx.xxx.1
        netmask  255.255.255.0
        network  10.xxx.xxx.0
        broadcast  10.xxx.xxx.255
        bridge_ports dummy1
        bridge_stp off
        bridge_fd 0
        post-up route add -net 224.0.0.0 netmask 240.0.0.0 dev openvpnbr0
Obviously the iface vmbr0 section will be whatever Proxmox configured it to be, so leave that alone. The new bit is at the bottom, starting with # for OpenVPN. Here you must replace 10.xxx.xxx.1 with something unique, e.g. 10.123.231.1. Whatever you choose, make it consistent throughout iface openvpnbr0. Uniqueness matters in order to prevent clashes with any other 10.x.x.x networks that any machine which might access the VPN could ever encounter, e.g. at offices, at home or even in free wifi cafes. When done, do a /etc/init.d/networking restart, and if you're still connected and there are no fatal syntax errors, reboot just to make sure everything is hunky dory. If it isn't, your server obviously won't come up and it'll be manual intervention time :(
Note: Giving the bridge the name 'openvpnbr0' means Proxmox won't show it as an option for networking in the web UI. Personally, as I'll be running OpenVZ containers only, I wanted this. If you're running KVM then you'll want to name it something like 'vmbr99' so Proxmox does show it as an option for KVM networking. If you do this, make sure your OpenVPN cannot possibly leak packets from your KVM instances to your public NIC - most server hosters auto-disconnect any server trying to send packets with unapproved MAC addresses for security reasons. Thanks to Jennifer for suggesting this in the comments below.
It's worth doing an ifconfig and pinging the unique 10.xxx.xxx.1 you chose before moving on. The important thing is that it exists, it works and it can be bound to.
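For example:

ifconfig openvpnbr0          # the bridge should be up with your chosen address
ping -c 3 10.xxx.xxx.1       # should answer immediately as it's local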
Next step is to install openvpn:
sudo aptitude install openvpn
sudo mkdir /etc/openvpn/easy-rsa/
sudo cp -R /usr/share/doc/openvpn/examples/easy-rsa/2.0/* /etc/openvpn/easy-rsa/
cd /etc/openvpn/easy-rsa/
sudo nano vars
  export KEY_COUNTRY="IE"
  export KEY_PROVINCE="Cork"
  export KEY_CITY="Cork"
  export KEY_ORG="ned Productions Limited"
  export KEY_EMAIL="me@nospam"
Obviously choose your own values here for the certs. Now generate the VPN certs:
source ./vars
./clean-all
./build-dh
./pkitool --initca
./pkitool --server server
cd keys
openvpn --genkey --secret ta.key
sudo cp server.crt server.key ca.crt dh1024.pem ta.key ../../
cd ../..
nano server.conf
Now you enter an OpenVPN server config somewhat like this:
mode server
tls-server
local xxx.xxx.xxx.xxx
port 1194
proto tcp
script-security 2
dev tap0
up "/etc/openvpn/up.sh"
down "/etc/openvpn/down.sh"
persist-key
persist-tun
duplicate-cn
client-to-client
ca ca.crt
cert server.crt
key server.key  # This file should be kept secret
dh dh1024.pem
tls-auth ta.key 0 # This file is secret
ifconfig-pool-persist ipp.txt
server-bridge 10.78.65.1 255.255.255.0 10.78.65.100 10.78.65.110
max-clients 10
user nobody
group nogroup
keepalive 10 120
status openvpn-status.log
verb 3
#log openvpn-log.log
Note the presence of duplicate-cn and client-to-client - this allows one set of client certs to be used by multiple clients (for this to work, TCP mode must be used, not UDP) and furthermore lets those clients fully see one another. The config also relies on two helper scripts, up.sh and down.sh:
#!/bin/sh
# /etc/openvpn/up.sh
/sbin/ifconfig openvpnbr0 promisc
/sbin/ifconfig tap0 up promisc
/usr/sbin/brctl addif openvpnbr0 tap0
#!/bin/sh
# /etc/openvpn/down.sh
/usr/sbin/brctl delif openvpnbr0 tap0
/sbin/ifconfig tap0 down -promisc
/sbin/ifconfig openvpnbr0 -promisc
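Both scripts need to be executable, or OpenVPN will fail when it tries to run them:

chmod +x /etc/openvpn/up.sh /etc/openvpn/down.sh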
A quick /etc/init.d/openvpn restart and it should be running. Now for the client config:
cd easy-rsa
./pkitool client
cd keys
tar jcf openvpnkeys.tar.bz2 ca.crt client.crt client.key ta.key
mv openvpnkeys.tar.bz2 ~
You'll need to SCP this tar file down, extract the client keys into your OpenVPN config directory and probably rename them to something obvious. Once done, here's an .ovpn/.conf file for your client OpenVPN:
client
dev tap
proto tcp
# The hostname/IP and port of the server.
remote <your server ip> 1194
resolv-retry infinite
nobind
persist-key
persist-tun
ca ovh-ca.crt
cert ovh-client.crt
key ovh-client.key
tls-auth ovh-ta.key 1
#comp-lzo
verb 3
Once that's in place, fire up the local OpenVPN. Because you permitted anything from your IP to connect to anything on the server, this should succeed and you should be allocated the IP 10.xxx.xxx.100.
Check the bridge is working by pinging 10.xxx.xxx.1 from the client. Also check that the bridge works in the other direction by pinging 10.xxx.xxx.100 from the server. Both should ping successfully, and with about the same latency, for obvious reasons.
Next step: try connecting in from TWO clients. Make sure you can ping 10.xxx.xxx.1 from both clients, and 10.xxx.xxx.100 and 10.xxx.xxx.101 from the server. Now here comes the acid test: try pinging 10.xxx.xxx.101 from 10.xxx.xxx.100 and vice versa. Both ought to work a treat.
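For example, from the first client:

ping -c 3 10.xxx.xxx.1     # client to server across the bridge
ping -c 3 10.xxx.xxx.101   # client to client, which is what client-to-client enables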
Finally, open the VPN port to the world:
ufw allow from any to any port openvpn
This allows anything access to your VPN. Obviously you can customise this if you're exceptionally paranoid.
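For instance, to lock the VPN port down to the fixed IPs you know your other nodes will connect from (hypothetical addresses shown):

ufw delete allow from any to any port openvpn
ufw allow from xxx.xxx.xxx.xxx to any port 1194 proto tcp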
Congratulations! Your OpenVPN based private subnet is configured and working!
3. Move the Proxmox on the OpenVPN server over to running exclusively on the VPN private subnet
After the complexity before, this part is relatively straightforward. One can mostly guess how to move Proxmox over to running on the private VPN subnet:
nano /etc/hosts
  10.xxx.xxx.1 <your hostname>.vpn.local <your hostname>

nano /etc/default/ntp
  NTPD_OPTS='-g --interface=127.0.0.1 --interface=10.xxx.xxx.1'
/etc/init.d/ntp restart

nano /etc/default/portmap
  OPTIONS="-i 127.0.0.1 -i 10.xxx.xxx.1"
/etc/init.d/portmap restart

nano /etc/default/nfs-common
  STATDOPTS="--name 127.0.0.1 --name 10.xxx.xxx.1"
/etc/init.d/nfs-common restart

nano /etc/apache2/ports.conf
  Listen *:80  => Listen 10.xxx.xxx.1:80
  Listen *:443 => Listen 10.xxx.xxx.1:443

nano /etc/apache2/sites-available/pve.conf
  Listen *:8006 => Listen 10.xxx.xxx.1:8006
If, when restarting nfs-common, you find that statd won't bind, use netstat -lp to make sure portmap has actually bound to both localhost AND 10.xxx.xxx.1. Believe it or not, even in this modern day many versions of portmap are incapable of binding to more than one interface, so it binds either to one or to all, but nothing in between. If you're afflicted (as I am), return portmap to binding to everything.
Now obviously I'm being a bit aggressive here and moving the whole of Apache over to the private VPN. If you wanted Apache to also serve public pages, you wouldn't want to stop it binding to all interfaces. Instead you'd twiddle the <VirtualHost *:80> et al. in sites-available/pve-redirect.conf and sites-available/pve.conf to match only 10.xxx.xxx.1 (and don't forget to fix the NameVirtualHost directive if you do!) - see the sketch after this paragraph. Personally speaking, I never use Apache except for legacy apps as nginx is much faster, uses a lot less memory and, being vastly simpler, is I suspect probably more secure. However, each to their own, and I do agree Apache is very easy to configure compared to the alternatives!
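For instance, the twiddle in pve-redirect.conf amounts to something like this - a sketch only, not the full stock file, with a hypothetical HTTP-to-HTTPS redirect shown as the body:

NameVirtualHost 10.xxx.xxx.1:80
<VirtualHost 10.xxx.xxx.1:80>
    RewriteEngine on
    RewriteRule ^/(.*)$ https://%{HTTP_HOST}:8006/$1 [L,R]
</VirtualHost>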
So, after all that, you should be looking at something like this:
root@europe3:~# netstat -lp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 europe3.vpn.local:www   *:*                     LISTEN      2991/apache2
tcp        0      0 localhost.localdomai:85 *:*                     LISTEN      1629/pvedaemon work
tcp        0      0 localhost.locald:domain *:*                     LISTEN      1352/named
tcp        0      0 *:ssh                   *:*                     LISTEN      1469/sshd
tcp        0      0 *:59416                 *:*                     LISTEN      4643/rpc.statd
tcp        0      0 localhost.localdom:smtp *:*                     LISTEN      1517/master
tcp        0      0 localhost.localdoma:953 *:*                     LISTEN      1352/named
tcp        0      0 europe3.vpn.local:https *:*                     LISTEN      2991/apache2
tcp        0      0 europe3.vpn.local:8006  *:*                     LISTEN      2991/apache2
tcp        0      0 europe3.nedprod:openvpn *:*                     LISTEN      1410/openvpn
tcp        0      0 *:sunrpc                *:*                     LISTEN      4721/portmap
tcp6       0      0 ip6-localhost:domain    [::]:*                  LISTEN      1352/named
tcp6       0      0 [::]:ssh                [::]:*                  LISTEN      1469/sshd
tcp6       0      0 ip6-localhost:953       [::]:*                  LISTEN      1352/named
udp        0      0 *:1003                  *:*                                 4643/rpc.statd
udp        0      0 *:sunrpc                *:*                                 4721/portmap
udp        0      0 europe3.vpn.local:ntp   *:*                                 4568/ntpd
udp        0      0 localhost.localdoma:ntp *:*                                 4568/ntpd
udp        0      0 *:ntp                   *:*                                 4568/ntpd
udp        0      0 *:48047                 *:*                                 4643/rpc.statd
udp        0      0 localhost.locald:domain *:*                                 1352/named
udp6       0      0 ip6-localhost:ntp       [::]:*                              4568/ntpd
udp6       0      0 [::]:ntp                [::]:*                              4568/ntpd
udp6       0      0 ip6-localhost:domain    [::]:*                              1352/named
Test it's working by trying to log into the Proxmox web GUI. If you use your public IP it won't work, but if you use 10.xxx.xxx.1 with the VPN active it also won't work. Why? Because ufw is still blocking traffic within the VPN subnet, so allow it:
ufw allow from 10.xxx.xxx.0/24 to 10.xxx.xxx.0/24
After that logging into 10.xxx.xxx.1 should work fine.
4. Adding additional Proxmox nodes
Adding further nodes is not dissimilar to configuring the primary OpenVPN master server. You configure an openvpnbr0 device via dummy1 just as before, except you choose a different static IP, e.g. 10.xxx.xxx.10. And when it comes to configuring openvpn, you use a client config like this:
client
script-security 2
dev tap0
up "/etc/openvpn/up.sh"
down "/etc/openvpn/down.sh"
proto tcp
# The hostname/IP and port of the server.
remote xxx.xxx.xxx.xxx 1194
resolv-retry infinite
nobind
persist-key
persist-tun
ca ovh-ca.crt
cert ovh-client.crt
key ovh-client.key
tls-auth ovh-ta.key 1
#comp-lzo
verb 3
In other words, one adds the bridging scripts. Make sure it's working by pinging 10.xxx.xxx.10 from the VPN server.
Next step is to make Proxmox run the cluster from 10.xxx.xxx.10 instead of the default IP for the machine. How to do this isn't well documented, but it's very easy. Open /etc/hosts and have something like the following:
127.0.0.1       localhost.localdomain localhost
10.xxx.xxx.10   milla.vpn.local milla pvelocalhost
192.168.2.2     milla.nedland milla
What you're doing is changing the node's default IP to 10.xxx.xxx.10 and setting pvelocalhost as an alias for it. You can reboot now.
Next is to make sure that IP multicast is working. This is what the 'post-up route add -net 224.0.0.0 netmask 240.0.0.0 dev openvpnbr0' entries in /etc/network/interfaces were for (224.0.0.0/4 being the IPv4 multicast range). You can verify they're configured by simply running 'route'. The firewall will block multicast though - this is most easily fixed by disabling the firewall on the openvpnbr0 bridge:
nano /etc/ufw/before.rules
  -A ufw-before-input -i openvpnbr0 -j ACCEPT
  -A ufw-before-forward -i openvpnbr0 -j ACCEPT
ufw disable && ufw enable
You probably ought to make sure multicast is actually working, and there is a handy test tool which you can install via 'aptitude install ssmping'. On one node run 'ssmpingd', and on the other run 'asmping 232.43.211.234 ip_for_NODE_A_here' (232.43.211.234 being the multicast group conventionally used with asmping). You need to see something similar to this on the second node:
root@milla:/home/ned# asmping 232.43.211.234 10.xxx.xxx.1
asmping joined (S,G) = (*,232.43.211.234)
pinging 10.xxx.xxx.1 from 10.xxx.xxx.10
  unicast from 10.xxx.xxx.1, seq=1 dist=0 time=226.947 ms
multicast from 10.xxx.xxx.1, seq=1 dist=0 time=341.924 ms
  unicast from 10.xxx.xxx.1, seq=2 dist=0 time=76.332 ms
multicast from 10.xxx.xxx.1, seq=2 dist=0 time=146.323 ms
  unicast from 10.xxx.xxx.1, seq=3 dist=0 time=77.307 ms
multicast from 10.xxx.xxx.1, seq=3 dist=0 time=149.288 ms
  unicast from 10.xxx.xxx.1, seq=4 dist=0 time=77.378 ms
What is key here is that there are lines containing 'multicast'. If you're getting 'unicast' only then something is broken.
Next is to configure the cluster. Proxmox 2.0 doesn't have master nodes, so any node will do, though you should choose the one in the most secure location (e.g. your home network) AND the one from which you want config to replicate. Do 'pvecm create YOUR-CLUSTER-NAME' followed by 'pvecm status' to make sure it works. You'll need to temporarily enable passworded root login so that nodes can be added, so add the following to the top of /etc/ssh/sshd_config and restart sshd afterwards:
PasswordAuthentication yes
PermitRootLogin yes
Ok, should be ready to go now. On the new node do 'pvecm add <first node's ip>'. It will ask you for the first node's root password, so enter that. After a period of synchronisation, voila, you have yourself a cluster! Don't forget to disable passworded SSH logins afterwards - you may wish to leave root login turned on, as migrating VMs requires root login; however, shortly we'll be setting up auto-replicating storage which removes the need for that.
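For example, on the new node (assuming the first node's VPN IP):

pvecm add 10.xxx.xxx.1
pvecm status
pvecm nodes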
If it times out, as it may do over high-latency connections such as mine, try rebooting. You may then find that apache refuses to start because /etc/pve/local/pve-ssl.pem is missing. You can safely regenerate these manually using openssl - they're just standard Apache SSL certs.
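A sketch of the regeneration - just a bog-standard self-signed certificate, so adjust the subject and lifetime to taste:

openssl req -new -x509 -nodes -days 3650 \
        -out /etc/pve/local/pve-ssl.pem \
        -keyout /etc/pve/local/pve-ssl.key
service apache2 restart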
ASIDE: If you mess up your Proxmox config, there is no obvious or documented way of resetting the cluster config back to default in 2.0 like there was in 1.x. Here's how you do it:
service cman stop
service pve-cluster stop
cp -R /etc/pve/openvz /root
cp -R /etc/pve/qemu-server /root
rm /etc/cluster/cluster.conf
rm /var/lib/pve-cluster/*
service pve-cluster start
service cman start   # This should do nothing quietly
cp /root/openvz/* /etc/pve/openvz
cp /root/qemu-server/* /etc/pve/qemu-server
Note how we copy the container and VM configs out to /root first - they get deleted by the reset - and copying them back afterwards usually works a treat. Thanks to Peter in the comments below for suggesting this!
Oh, and if you want to hand tune /etc/pve/cluster.conf once you have a running cluster but you only have one node, you need to issue a 'pvecm e 1' to disable read-only on /etc/pve :). The read-only gets turned on in an orphaned cluster node for safety.