Claud Computing

June 3, 2020 by admin, No Comments

[HOW-TO] Share an OpenStack image between tenants

Today, I had to test an upgrade for software that comes in the form of an image. I wanted to do it in a test tenant, but the image was in the “prod” tenant. Of course, one could simply upload the image to another tenant, but there’s a better way – image sharing between tenants.

The concept is nothing new, except that OpenStack has a couple of quirks. You don’t so much share the image with another tenant as add the tenant to the image as a member.

So without further ado, let’s see how easy it is to accomplish this.

0) Set some environment variables to make our lives easier. “image_ID” is the image we want to share, “project_name” is the project (or tenant) we want to share the image with and “tenant_ID” is the ID of that same target tenant.

export image_ID=1234-567-89012-34567-890123
export project_name=dev_tenant
export tenant_ID=a1b2c3d4e4

1) As mentioned before, we need to add the project as a member of the image:

openstack image add project $image_ID $project_name

2) Sharing is not enough. We also need to accept the share from the target tenant:

OS_TENANT_ID=$tenant_ID glance member-update $image_ID $tenant_ID accepted

3) Last but not least, confirm the image was indeed shared (accepted):

OS_TENANT_ID=$tenant_ID openstack image show $image_ID

So, how about un-sharing? Easy as pie, it’s as simple as removing the project from the image:

openstack image remove project $image_ID $project_name
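
To see where a share currently stands, the member list of the image can be queried. The helper below is a minimal sketch, assuming that “openstack image member list” reports “Member ID” and “Status” columns (as recent python-openstackclient releases do); the function name image_member_status is made up for this example.

```shell
# Print the membership status (pending, accepted or rejected) that a
# project has for an image. Assumes credentials of the image owner
# (or an admin) are already loaded in the environment.
image_member_status() {
  image_id="$1"
  member_id="$2"   # the project ID, not the project name
  openstack image member list "$image_id" -f value -c "Member ID" -c Status |
    awk -v m="$member_id" '$1 == m { print $2 }'
}
```

Called as “image_member_status $image_ID $tenant_ID”, it should print “accepted” after step 2 and nothing at all once the project has been removed from the image.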

May 13, 2020 by admin, No Comments

[Quick fix] OpenStack Compute (nova) instance could not be found

Sometimes in the life of a cloud admin, we run across issues that seem quite daunting at first, but come out as trivial in the end. This was exactly the case when I had to migrate a few instances some while back.

Nova (openstack compute) reported all the instances in ERROR state. On a closer look with:

nova list --all-tenants --host $hypervisor

or with the openstack client:

openstack server list --all-projects --host $hypervisor

they were indeed down.

The logs showed a Python traceback ending with:

InstanceNotFound: Instance instance-00000084 could not be found.

The instance files were present on disk, the permissions looked fine, the directory structure was fine, SELinux looked good and qemu reported a healthy disk image when querying it; it was as if nova was in the twilight zone.

Turns out the fix was trivial. By simply hard rebooting the instance or setting the state to active and then normally rebooting it, nova would magically find the instance on disk and happily power it on. Below, the 2 options:

nova reboot --hard $instance_id

or:

nova reset-state --active $instance_id
nova reboot $instance_id

Such a simple fix for what first appeared to be such a daunting task.
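
When a whole hypervisor is affected, the same fix can be looped over every instance in ERROR state. This is only a sketch built on the openstack client commands used above; the function name hard_reboot_errored is hypothetical, and it is worth reviewing the list of instances before rebooting anything.

```shell
# Hard reboot every instance that is in ERROR state on one hypervisor.
hard_reboot_errored() {
  hypervisor="$1"
  openstack server list --all-projects --host "$hypervisor" --status ERROR -f value -c ID |
    while read -r instance_id; do
      echo "hard rebooting $instance_id"
      # stdin is redirected so the client cannot swallow the rest of the ID list
      openstack server reboot --hard "$instance_id" < /dev/null
    done
}
```

Usage: “hard_reboot_errored $hypervisor”.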

January 24, 2020 by admin, No Comments

How to deploy a k8s cluster on VMware ESX

k8s, vmware, ansible, metalLB, prometheus, grafana, alertmanager, metrics-server, harbor, clair, notary

If you’re short on resources in your home lab and simply can’t deploy an OpenStack private cloud to play around with, this tutorial will walk you through setting up a highly-available k8s cluster on VMware ESX vms. The principles used to deploy them very much resemble those used to deploy OpenStack instances so you’ll be able to quickly and effectively walk through the very first step in this tutorial.

The k8s cluster we’ll be deploying will have 3x master and 3x worker nodes and will use metalLB for bare-metal load balancing, prometheus, grafana and alertmanager for monitoring, metrics and reporting and harbor with clair and notary for image repository, signing and scanning.

We will be:

checking out and customising ansible-deploy-vmware-vm
creating vms from templates
setting up:
- the k8s cluster via kubespray
- metalLB
- prometheus, grafana and alertmanager
- metrics-server
- harbor
- notary and clair

What we won’t be doing is:

setting up VMware ESX (see here)
creating a VMware template (see here)

INFO: the VMware templates were made after an Ubuntu 18.04 vm.

STEP 1 — deploy multiple VMware virtual machines from a template

Deploy the vms via Ansible. First and foremost, check out ansible-deploy-vmware-vm and cd to it.

git clone https://github.com/cloudmaniac/ansible-deploy-vmware-vm.git
cd ansible-deploy-vmware-vm

Second, we need to tell ansible how to connect to our VMware ESX cluster. Edit or create the answerfile.yml and fill in the self-explanatory blanks:

# Infrastructure
# - Defines the vCenter / vSphere environment
deploy_vsphere_host: '<vsphere_ip>'
deploy_vsphere_user: '<username>'
deploy_vsphere_password: '<password>'
deploy_vsphere_datacenter: '<datacenter>'
deploy_vsphere_folder: ''
esxi_hostname: '<esx_hostname>'

# Guest
# - Describes virtual machine common options
guest_network: '<network_name>'
guest_netmask: '<netmask>'
guest_gateway: '<gw>'
guest_dns_server: '<dns>'
guest_domain_name: '<domain_name>'
guest_id: '<guestID>'
guest_memory: '<RAM>'
guest_vcpu: '<CPU_cores>'
guest_template: '<template_name>'

Define the vms we want to deploy. Sample vms-to-deploy:

[prod-k8s-master]
prod-k8s-master01 deploy_vsphere_datastore='ESX2' guest_custom_ip='<ip>' guest_notes='Master #1'
prod-k8s-master02 deploy_vsphere_datastore='ESX2' guest_custom_ip='<ip>' guest_notes='Master #2'
prod-k8s-master03 deploy_vsphere_datastore='ESX2' guest_custom_ip='<ip>' guest_notes='Master #3'

[prod-k8s-workers]
prod-k8s-worker01 deploy_vsphere_datastore='ESX2' guest_custom_ip='<ip>' guest_notes='Worker #01'
prod-k8s-worker02 deploy_vsphere_datastore='ESX2' guest_custom_ip='<ip>' guest_notes='Worker #02'
prod-k8s-worker03 deploy_vsphere_datastore='ESX2' guest_custom_ip='<ip>' guest_notes='Worker #03'

[prod-k8s-harbor]
prod-k8s-harbor01 deploy_vsphere_datastore='ESX2' guest_custom_ip='<ip>' guest_notes='Harbor #01'

Define the playbook. Sample deploy-k8s-vms-prod.yml:

---
- hosts: all
  gather_facts: false
  vars_files:
    - answerfile.yml
  roles:
     - deploy-vsphere-template

Let’s deploy the vms:

ansible-playbook -vv -i vms-to-deploy deploy-k8s-vms-prod.yml

STEP 2

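Deploy the k8s cluster on top of the created vms with kubespray. Check it out and install its requirements:

# check out kubespray (the samples below follow its early-2020 inventory layout)
git clone https://github.com/kubernetes-sigs/kubespray.git
cd kubespray
sudo pip install -r requirements.txt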

# copy the sample inventory
cp -rfp inventory/sample inventory/mycluster

# declare all the k8s cluster IPs (masters and workers only)
declare -a IPS=(x.x.x.x y.y.y.y z.z.z.z)
# see below samples before running this cmd
CONFIG_FILE=inventory/mycluster/hosts-prod.yml python3 contrib/inventory_builder/inventory.py ${IPS[@]}

sample hosts-prod.yml:

all:
  hosts:
    prod-k8s-master01:
      ansible_host: <ip>
      ip: <ip>
      access_ip: <ip>
    prod-k8s-master02:
      ansible_host: <ip>
      ip: <ip>
      access_ip: <ip>
    prod-k8s-master03:
      ansible_host: <ip>
      ip: <ip>
      access_ip: <ip>
    prod-k8s-worker01:
      ansible_host: <ip>
      ip: <ip>
      access_ip: <ip>
    prod-k8s-worker02:
      ansible_host: <ip>
      ip: <ip>
      access_ip: <ip>
    prod-k8s-worker03:
      ansible_host: <ip>
      ip: <ip>
      access_ip: <ip>
  children:
    kube-master:
      hosts:
        prod-k8s-master01:
        prod-k8s-master02:
        prod-k8s-master03:
    kube-node:
      hosts:
        prod-k8s-worker01:
        prod-k8s-worker02:
        prod-k8s-worker03:
    etcd:
      hosts:
        prod-k8s-master01:
        prod-k8s-master02:
        prod-k8s-master03:
    k8s-cluster:
      children:
        kube-master:
        kube-node:
    calico-rr:
      hosts: {}

sample inventory.ini (make sure the inventory file contains the vms’ proper names, i.e. the ones defined under “ansible-deploy-vmware-vm/vms-to-deploy”):

# ## Configure 'ip' variable to bind kubernetes services on a
# ## different ip than the default iface
# ## We should set etcd_member_name for etcd cluster. The node that is not a etcd member do not need to set the value, or can set the empty string value.
[all]
# node1 ansible_host=95.54.0.12  # ip=10.3.0.1 etcd_member_name=etcd1
# node2 ansible_host=95.54.0.13  # ip=10.3.0.2 etcd_member_name=etcd2
# node3 ansible_host=95.54.0.14  # ip=10.3.0.3 etcd_member_name=etcd3
# node4 ansible_host=95.54.0.15  # ip=10.3.0.4 etcd_member_name=etcd4
# node5 ansible_host=95.54.0.16  # ip=10.3.0.5 etcd_member_name=etcd5
# node6 ansible_host=95.54.0.17  # ip=10.3.0.6 etcd_member_name=etcd6

# ## configure a bastion host if your nodes are not directly reachable
# bastion ansible_host=x.x.x.x ansible_user=some_user

[kube-master]
# node1
# node2

[etcd]
# node1
# node2
# node3

[kube-node]
# node2
# node3
# node4
# node5
# node6

[k8s-cluster:children]
kube-master
kube-node

Run the playbook to create the cluster:

ansible-playbook -vvv -i inventory/mycluster/hosts-prod.yml --become --become-user=root cluster.yml
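
Once the playbook finishes, a quick way to confirm the result is to count the nodes that report Ready. The helper below is a sketch, assuming kubectl is already configured on the machine you run it from (for example one of the masters); the function name ready_node_count is made up for this example.

```shell
# Count the nodes whose STATUS column is exactly "Ready".
# Cordoned nodes ("Ready,SchedulingDisabled") are deliberately not counted.
ready_node_count() {
  kubectl get nodes --no-headers | awk '$2 == "Ready" { n++ } END { print n + 0 }'
}
```

For the cluster described here, “ready_node_count” should eventually print 6 (3x master and 3x worker).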

STEP 3

MetalLB is a bare-metal load balancer for k8s that makes your current network extend into your k8s cluster.

Connect to one of the master nodes over ssh and set up your environment:

source <(kubectl completion bash) # or zsh

Deploy metalLB by performing the following on one of the master nodes:

kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.8.3/manifests/metallb.yaml

MetalLB remains idle until configured. As such, we need to define an IP range the k8s cluster can use and that is outside of any DHCP pool.

cat <<EOF | kubectl apply -f - 
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - <ip_range_start>-<ip_range_end>
EOF

Check the status of the pods with:

kubectl -n metallb-system get po
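
To check that MetalLB really hands out addresses from the configured range, a throwaway LoadBalancer service is enough. This is a minimal sketch; the name lb-test, the nginx image and the 5 second wait are arbitrary choices for the example, not part of the MetalLB setup.

```shell
# Create a throwaway nginx deployment, expose it through a LoadBalancer
# service and print the external IP MetalLB assigned (empty if none yet).
metallb_smoke_test() {
  kubectl create deployment lb-test --image=nginx
  kubectl expose deployment lb-test --port=80 --type=LoadBalancer
  sleep 5   # give MetalLB a moment to assign an address
  kubectl get svc lb-test -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}
# Clean up afterwards with:
#   kubectl delete svc,deployment lb-test
```

The last line printed by “metallb_smoke_test” should be an address from the range defined in the ConfigMap.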

STEP 4

Install prometheus, grafana and alertmanager:

git clone https://github.com/coreos/kube-prometheus.git
cd kube-prometheus
kubectl create -f manifests/setup
until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
kubectl create -f manifests/

# To teardown the stack:
#kubectl delete -f manifests/

In order to access the dashboards via the LoadBalancer IPs, we need to change a few service types from “ClusterIP” to “LoadBalancer”:

kubectl -n monitoring edit svc prometheus-k8s
kubectl -n monitoring edit svc grafana
kubectl -n monitoring edit svc alertmanager-main

# Get the external IP for the edited services
kubectl -n monitoring get svc

In order to access the web interface of these services, use the following ports:

– grafana -> 3000 (default usr/pass is admin/admin)
– prometheus -> 9090
– alertmanager -> 9093
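
If you would rather not go through an editor three times, the service type can also be changed with “kubectl patch”. A minimal sketch, assuming the service names from the kube-prometheus manifests used above; the function name expose_monitoring_services is made up for this example.

```shell
# Switch the three monitoring services to type LoadBalancer
# without opening an editor.
expose_monitoring_services() {
  for svc in prometheus-k8s grafana alertmanager-main; do
    kubectl -n monitoring patch svc "$svc" -p '{"spec": {"type": "LoadBalancer"}}'
  done
}
```

Run “expose_monitoring_services” and then “kubectl -n monitoring get svc” to read the external IPs.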

STEP 5

Installing the metrics-server or how to get “kubectl top nodes” and “kubectl top pods” to work.

git clone https://github.com/kubernetes-incubator/metrics-server.git
cd metrics-server
kubectl create -f deploy/1.8+/

kubectl top nodes
kubectl top pods --all-namespaces

STEP 6

Installing Harbor with clair and notary support.

ssh to your harbor vm and install docker and docker-compose:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

sudo apt-get update && sudo apt-get install -y docker-ce
sudo curl -L "https://github.com/docker/compose/releases/download/1.25.3/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

# allow users to use docker without administrator privileges
sudo usermod -aG docker $USER

Log out and back in, then check docker:

docker info

Time to generate SSL certificates:

# generate a certificate authority
openssl req -newkey rsa:4096 -nodes -sha256 -keyout ca.key -x509 -days 3650 -out ca.crt

# generate a certificate signing request
openssl req -newkey rsa:4096 -nodes -sha256 -keyout prod-k8s-harbor01.lab.local.key -out prod-k8s-harbor01.lab.local.csr

# create a configuration file for the Subject Alternative Name
echo "subjectAltName = IP:<ip>" > extfile.cnf

# generate the certificate
openssl x509 -req -days 3650 -in prod-k8s-harbor01.lab.local.csr -CA ca.crt -CAkey ca.key -CAcreateserial -extfile extfile.cnf -out prod-k8s-harbor01.lab.local.crt

# copy the server certificate and its key to /etc/ssl/certs (the CA key stays out of it)
sudo cp prod-k8s-harbor01.lab.local.crt prod-k8s-harbor01.lab.local.key /etc/ssl/certs
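
The openssl commands above prompt for the certificate subject. For a scripted run, the same flow can be sketched non-interactively with “-subj” and checked with “openssl verify”. Everything specific in this example is an assumption made for illustration: the subject names, the 192.0.2.10 address, the file names, and the shorter 2048-bit keys and 30-day validity that keep the test quick.

```shell
# Throwaway working directory so nothing real gets overwritten.
workdir=$(mktemp -d)
cd "$workdir"
harbor_ip=192.0.2.10   # hypothetical address, replace with your Harbor IP

# certificate authority (self-signed)
openssl req -newkey rsa:2048 -nodes -sha256 -keyout ca.key -x509 -days 30 \
  -subj "/CN=lab-ca.example" -out ca.crt

# key and certificate signing request for the Harbor host
openssl req -newkey rsa:2048 -nodes -sha256 -keyout harbor.key \
  -subj "/CN=harbor.example" -out harbor.csr

# Subject Alternative Name, then sign the request with the CA
printf 'subjectAltName = IP:%s\n' "$harbor_ip" > extfile.cnf
openssl x509 -req -days 30 -in harbor.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -extfile extfile.cnf -out harbor.crt

# the certificate should chain to the CA and carry the IP as a SAN
verify_result=$(openssl verify -CAfile ca.crt harbor.crt)
san=$(openssl x509 -in harbor.crt -noout -text | grep -A1 'Subject Alternative Name')
echo "$verify_result"
echo "$san"
```

The same two checks work on the real files: point “openssl verify” at ca.crt and prod-k8s-harbor01.lab.local.crt before handing the certificate to Harbor.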

Download and install Harbor:

wget https://github.com/goharbor/harbor/releases/download/v1.9.4/harbor-online-installer-v1.9.4.tgz
tar xvzf harbor-online-installer-v1.9.4.tgz
cd harbor

Edit harbor.yml and change a few options:

hostname: <harbor_ip>

# http related config
http:
  port: 80
https:
  port: 443
  certificate: /etc/ssl/certs/prod-k8s-harbor01.lab.local.crt
  private_key: /etc/ssl/certs/prod-k8s-harbor01.lab.local.key
harbor_admin_password: <your_admin_pass>

Finally, install Harbor, enabling clair and notary:

sudo ./install.sh --with-notary --with-clair

Configure the docker daemon on each of your worker nodes to trust the Harbor CA, and create a docker-registry secret that k8s can use to pull images:

export harbor_ip=<harbor_local_ip>
# every node that will pull images from Harbor
declare -a IPS=(<k8s_worker1> <k8s_worker2> <k8s_worker3>)
for i in ${IPS[@]}; do
  scp ../ca.crt $i:
  ssh $i "sudo mkdir -p /etc/docker/certs.d/$harbor_ip && \
    sudo mv ca.crt /etc/docker/certs.d/$harbor_ip/ && \
    sudo systemctl restart docker"
done

kubectl create secret docker-registry harbor \
--docker-server=https://<harbor_ip> \
--docker-username=admin \
--docker-email=admin@claud-computing.net \
--docker-password='<your_harbor_admin_password>'

To deploy images to Harbor, we need to pull them, tag them and push them. From the harbor machine (or any other that has the certificates in place as above), perform the following:

docker pull gcr.io/kuar-demo/kuard-amd64:1
docker tag gcr.io/kuar-demo/kuard-amd64:1 <harbor_ip>/private/kuard:v1
docker login <harbor_ip>
docker push <harbor_ip>/private/kuard:v1

Deploy the kuard app on k8s. ssh to a node where you have access to the cluster and create kuard-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kuard-deployment
  labels:
    app: kuard
spec:
  replicas: 1
  selector:
    matchLabels:
      run: kuard
  template:
    metadata:
      labels:
        run: kuard
    spec:
      containers:
      - name: kuard
        image: <harbor_ip>/private/kuard:v1
      imagePullSecrets:
      - name: harbor

Apply it.

kubectl create -f kuard-deployment.yaml
kubectl get po

STEP 7

Signing docker images with Notary. Install it first:

sudo wget https://github.com/theupdateframework/notary/releases/download/v0.6.1/notary-Linux-amd64 -O /usr/local/bin/notary
sudo chmod +x /usr/local/bin/notary

If needed, copy the “ca.crt” to the client machine you’re working from.

Check if you can connect to the harbor server:

openssl s_client -connect <harbor_ip>:443 -CAfile /etc/docker/certs.d/<harbor_ip>/ca.crt -no_ssl2

You should get something like this:

CONNECTED(00000005)
depth=1 C = EU, ST = Example, L = Example, O = Example, OU = Example, CN = ca.example.local, emailAddress = root@ca.example.local
verify return:1
depth=0 C = EU, ST = Example, O = Example, CN = notary-server.example.local
verify return:1
---
Certificate chain
 0 s:/C=EU/ST=Example/O=Example/CN=notary-server.example.local
   i:/C=EU/ST=Example/L=Example/O=Example/OU=Example/CN=ca.example.local/emailAddress=root@ca.example.local

Now pull an image from docker hub and tag it but don’t push it just yet.

docker pull nginx:latest
docker tag nginx:latest <harbor_ip>/private/nginx:latest

Let’s enable the Docker Content Trust and then push the image. Please note that when first pushing a signed image, you will be asked to create a password.

export DOCKER_CONTENT_TRUST_SERVER=https://<harbor_ip>:4443 DOCKER_CONTENT_TRUST=1
docker login <harbor_ip>
docker push <harbor_ip>/private/nginx:latest
unset DOCKER_CONTENT_TRUST_SERVER DOCKER_CONTENT_TRUST

Ok, so now the Docker image is pushed to our registry server and signed by the Notary server. We can verify this with:

notary --tlscacert ca.crt -s https://<harbor_ip>:4443 -d ~/.docker/trust list <harbor_ip>/private/nginx

Test that you can pull from docker hub and harbor:

docker pull nginx
docker pull <harbor_ip>/private/nginx:latest

For using a completely private repository, leave the “DOCKER_CONTENT_TRUST_SERVER” and the “DOCKER_CONTENT_TRUST” environment variables set, on all the k8s cluster machines.

STEP 8

Clair is an open source project for the static analysis of vulnerabilities in application containers. To enable it go to the Harbor web-interface and click on “vulnerability” -> “edit” and select the scan frequency. Save and also click “scan now” if it’s your first time doing this.

May 10, 2019 by admin, No Comments

How to pass the CKA and CKAD exam

Last week, I passed the Certified Kubernetes Administrator (CKA) and Certified Kubernetes Application Developer (CKAD) exams. I had very little knowledge of and experience with Kubernetes, but after spending only 3 weeks preparing, I managed to pass both.

Below, I will detail the process I took in going from zero to hero.

1) Take an online course
It goes without saying that if you have close to zero knowledge on the subject at hand, some guidance to kick things off doesn’t hurt. I can wholeheartedly recommend the LinuxAcademy CKA, CKAD and Kubernetes the Hard Way courses. I found them to be very well structured and well worth the money. Take your time and make sure you know the material well. Redo the hands-on labs until you can recite them by heart. The $49.99 you pay per month is more than enough to help you pass both exams.

2) Take free practice exams
Once you’re done with the courses and know them like the back of your hand, step outside of your comfort zone and challenge yourself further. I’ve managed to find a few other practice exams that helped me a lot.

There’s a short one from the Kubernetes App Developer (CKAD) course from matthewpalmer.net that has a challenge which I found to be quite interesting.

Another free resource I found was the last section of the Kubernetes certification course over at kodekloud.com. It’s a medium-sized exercise that will walk you through many different aspects of k8s. I really enjoyed this one so I can highly recommend it.

3) Do free exercises
Now we get to the sauce. I found https://github.com/dgkanatsios/CKAD-exercises to be an amazing resource in checking your overall knowledge. They are very good exercises to go through, whether you are taking the CKA or the CKAD exam.
Additionally, https://github.com/wiewioras/k8s-labs is one amazing piece of exercise that I recommend to everyone.

4) Prep notes and tips
See https://github.com/twajr/ckad-prep-notes. Make sure to check the kubectl commands from the “core concepts”. They will save you a lot of time when creating resources and time is of the essence. With 24 questions and 3 hours for the CKA, that gives you 7.5 minutes per question. The CKAD is slightly more demanding with 19 questions and 2 hours, giving you an average of 6.3 minutes per question. Both exams are designed in such a way that you don’t have the luxury of spending time checking the docs; you really need to know what you’re doing.
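
The time budget per question is simple arithmetic and easy to redo if the exam format changes. A small sketch (the function name minutes_per_question is made up for this example):

```shell
# Average minutes available per question: hours * 60 / questions,
# rounded to one decimal place.
minutes_per_question() {
  awk -v q="$1" -v h="$2" 'BEGIN { printf "%.1f\n", h * 60 / q }'
}

minutes_per_question 24 3   # CKA:  prints 7.5
minutes_per_question 19 2   # CKAD: prints 6.3
```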

In summary
– take a course
– do as many practice exams as you can
– do as many exercises as you can
– be VERY fast in creating resources
– know as much as possible about k8s

And finally, here are my certs:

[image gallery: CKA and CKAD certificates]

Last words:
Kubernetes is a new and cool technology. My take is that it will become the new cloud platform. It is without a shadow of a doubt, the most exciting technology I have seen in the last 10 years. I believe it will become a standard.

May 9, 2019 by admin, 1 Comment

[OpenBSD] Installing OpenBSD on top of software raid

In this tutorial, we will demonstrate how easy it is to install OpenBSD on a software RAID 1. The same principles can be applied to create RAID 0, RAID 5 or other levels.

We have to drop to the shell since the installer does not have this functionality built in. Select “s” at the installer.

Welcome to the OpenBSD/amd64 6.5 installation program.
(I)nstall, (U)pgrade or (S)hell? s

Check that we actually have more than 1x hdd:

# sysctl hw.disknames
hw.disknames=wd0:7922594e8158ee03,wd1:49129150e28daf19,cd0:,rd0:7c8ac10ea613493f

By default, the installer only creates the device nodes for the first hdd. Run the following to manually create them for the second hdd:

# cd /dev/
# sh MAKEDEV wd1

Now initialize the MBR on both disks:

# fdisk -iy wd0
Writing MBR at offset 0.
# fdisk -iy wd1
Writing MBR at offset 0.

Create an “a” partition of type “RAID” on the first disk, then dump the label and restore it to the second disk. Make sure to enter “raid” for the “FS type”. Run the following:

# disklabel -E wd0
Label editor (enter '?' for help at any prompt)
> a a
offset: [64] 
size: [20964761] 
FS type: [4.2BSD] raid
> q
Write new label?: [y] 
# disklabel wd0 > label.dump
# disklabel -R wd1 label.dump

Create the softraid(4) volume using the two “a” partitions. This will create a new drive called “sd0”. Run the following:

# bioctl -c 1 -l wd0a,wd1a softraid0 
sd0 at scsibus1 targ 1 lun 0: SCSI2 0/direct fixed
sd0: 10236MB, 512 bytes/sector, 20964233 sectors
softraid0: SR RAID 1 volume attached as sd0

where “-c” represents the RAID level and “-l” the comma-separated list of chunks that will be used to create the RAID. These values can be adjusted to create other types of RAID arrays.
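
As an illustration of adjusting those values, the helper below only prints the bioctl command for a given RAID level and list of disks, so it can be reviewed before being run. It is a sketch: the function name softraid_cmd and the third disk wd2 are hypothetical, and it assumes every disk already has an “a” partition of type RAID. Check softraid(4) for which RAID levels your platform can boot from before using anything other than RAID 1 for the root disk.

```shell
# Build the bioctl command line for a softraid volume from a RAID level
# and a list of disks, using the "a" partition of each disk as a chunk.
softraid_cmd() {
  level="$1"; shift
  chunks=""
  for disk in "$@"; do
    chunks="${chunks:+$chunks,}${disk}a"
  done
  echo "bioctl -c $level -l $chunks softraid0"
}

softraid_cmd 5 wd0 wd1 wd2   # prints: bioctl -c 5 -l wd0a,wd1a,wd2a softraid0
```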

After this is done, exit the shell and proceed with the normal installation (just remember to select the new RAID drive “sd0” as the root disk).

# ^D
erase ^?, werase ^W, kill ^U, intr ^C, status ^T

Welcome to the OpenBSD/amd64 6.5 installation program.
(I)nstall, (U)pgrade or (S)hell? i

[...]

Available disks are: wd0 wd1 sd0.
Which one is the root disk? (or 'done') [wd0] sd0

And that’s it! Enjoy your software RAID OpenBSD system!

January 4, 2019 by admin, No Comments

Making WordPress work behind a reverse proxy

My current web infrastructure includes an LB that does SSL termination via pound and several web servers behind it. Unfortunately WordPress gets confused when dealing with this scenario and will serve, among others, images and CSS over HTTP, and text over HTTPS.

To get around this, all we have to do is edit wp-config.php and add the following code:

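if ( ! empty( $_SERVER['HTTP_X_FORWARDED_HOST'] ) ||
     ! empty( $_SERVER['HTTP_X_FORWARDED_FOR'] ) ) {

    // only override the host when the proxy actually sent one
    if ( ! empty( $_SERVER['HTTP_X_FORWARDED_HOST'] ) ) {
        $_SERVER['HTTP_HOST'] = $_SERVER['HTTP_X_FORWARDED_HOST'];
    }
    // the LB terminates SSL, so tell WordPress the request came in over HTTPS
    $_SERVER['HTTPS'] = 'on';
}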

Make sure to restart your web server!

December 12, 2018 by admin, No Comments

[Ansible] How to json_query a key that contains a dot

Today I found myself needing to extract the value of a json key that contained a dot (‘.’). As usual, json_query to the rescue, except that it didn’t work, returning an empty value. With a little bit of digging and trial and error testing, the solution presented itself. The key is to enclose the json key in quotes and then escape them.

Here is an example json file:

{
	"menu": {
		"id": "file",
		"value": "File",
		"popup": {
			"dot.key": [{
					"value": "New",
					"onclick": "CreateNewDoc()"
				},
				{
					"value": "Open",
					"onclick": "OpenDoc()"
				},
				{
					"value": "Close",
					"onclick": "CloseDoc()"
				}
			]
		}
	}
}

And an example ansible playbook:

- name: json manipulation 
  gather_facts: no
  hosts: localhost
  vars:
    json_var: "{{ lookup('file', 'file.json') | from_json }}"
  tasks:
  - name: get "menu.item" from json
    set_fact:
      json_out: "{{ json_var | json_query('menu.popup.\"dot.key\"[].value') }}"

We can clearly see the result:

PLAY [json manipulation] ******************************************************************************************************************************************************************************************
META: ran handlers

TASK [get "menu.item" from json] **********************************************************************************************************************************************************************************
task path: /tmp/1/1.yaml:7
ok: [localhost] => {
    "ansible_facts": {
        "json_out": [
            "New", 
            "Open", 
            "Close"
        ]
    }, 
    "changed": false
}
META: ran handlers
META: ran handlers

PLAY RECAP ********************************************************************************************************************************************************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=0

PS: In this case “dot.key” holds a list of 3 items. We obtain all 3 values in our result because ‘\”dot.key\”[]’ selects the whole list (notice the empty square brackets). Putting a number inside the brackets selects a single item by position instead. If, for instance, we only needed the value of the 2nd item, we would use ‘\”dot.key\”[1]’, “0” being the first item.

October 26, 2018 by admin, 34 Comments

ESX 6.5 and 6.7 on HPE G6/G7 server PSOD fix

Background

HP’s done it again. They’ve managed to break their custom ESX ISO on the G6 and G7 servers. I suspect it’s the same for G8.

If like me you found yourself perplexed by the PSOD (pink screen of death) after upgrading from ESX 6.0 to 6.5, keep on reading for the fix (skip to the bottom for the download link to an already fixed ISO).

The issue

It seems the “hpe-smx-provider” driver version 6.5.0 from the ESX 6.5 ISO is causing the PSOD.

The fix and the drawbacks

(Simply) replace the driver with its older counterpart (version 6.0.0) and reinstall (see below for the procedure or check the “download link” section at the very end for an already fixed ISO). Unfortunately, if you decide to create a custom ISO with the previous version of “hpe-smx-provider”, you will no longer be able to upgrade to any future ESX version and will need to do a full installation every time (thank you HP!). That being said, here is the procedure to customize the ISO:

Customize your own HPE ESX ISO

Install PowerCLI from here. At the time of this writing, the latest version was 6.5.0R1.
Download both the ESX 6.5 and 6.0 offline bundles and save them to a convenient place (ex: C:\HP) — download from this link. To make it easier, rename them to something like HPE_ESX6.5.zip and HPE_ESX6.0.zip.

Open up PowerCLI and do the following:

# Add the 6.5 bundle

cd C:\HP
Add-EsxSoftwareDepot -DepotUrl HPE_ESX6.5.zip

# check that the profile was loaded and the vendor is HPE

Get-EsxImageProfile

# clone the profile (feel free to specify whatever you like for the “vendor” when prompted)

New-EsxImageProfile -CloneProfile HPE-ESXi-6.5.0* -Name "HPE-ESX-6.5-CUST"

# double check the profile was indeed cloned

Get-EsxImageProfile

# Remove the “hpe-smx-provider” driver from the clone

Remove-EsxSoftwarePackage HPE-ESX-6.5-CUST hpe-smx-provider

# Add the 6.0 bundle

Add-EsxSoftwareDepot -DepotUrl HPE_ESX6.0.zip

# check the profile

Get-EsxImageProfile

# check driver versions from both bundles

Get-EsxSoftwarePackage | findstr smx

# Add hpe-smx-provider version 6.0.0 to the custom profile

Add-EsxSoftwarePackage -ImageProfile HPE-ESX-6.5-CUST -SoftwarePackage "hpe-smx-provider 600.03.11.00.9-2768847"

# Export the profile to an ISO

Export-EsxImageProfile -ImageProfile HPE-ESX-6.5-CUST -ExportToIso -filepath "HPE-ESX-6.5-CUST.iso"

The results

Download links

The following images have hpe-smx-provider version 600.03.11.00.9-2768847. Nothing else was changed. The 6.5.0d (build 5310538) image has been running on an HP ProLiant DL380 G7 since March 2018 and was also tested on a DL380 G6 for a couple of months. For convenience, I’ve also built a 6.7.0 ISO and, at the request of a reader, a 6.7.0 Update1 ISO from April 2019:

VMware-ESXi-6.5.0-5310538-HPE-650.10.1.5.20-Oct2017_CUSTOM.iso
VMware-ESXi-6.7.0-9484548-HPE-Gen9plus-670.10.3.5.6-Sep2018_CUSTOM.iso
VMware-ESXi-6.7.0-Update1-11675023-HPE-Gen9plus-670.U1.10.4.0.19-Apr2019_CUSTOM.iso

Enjoy!