SCVMM: When the deployment of a new VM template suddenly fails

2 Mar

Recently I ran into a very strange behavior when deploying a VM template with Server 2016 through VMM 2012 R2. First of all, to enable full support for Windows Server 2016-based VMs in VMM 2012 R2 you need at least Update Rollup 11 Hotfix 1 installed. But even after installing the latest UR (UR12 in my case), the deployment of a Server 2016 VM failed.

The Issue:
Every time a new VM is deployed from a Server 2016 VM template, the process fails at the specialize phase of Sysprep. However, all other existing templates with Server 2012 were working as expected.

Because the domain join also happens in this phase, I decided to give it another try with a VM template which has no domain join configured. And tada, the VM was deployed successfully.

The root cause:
With this finding, my assumption was that when the VM template is configured for domain join, VMM adds something to the unattend.xml which Server 2016 does not like that much. So I inspected the unattend.xml file of a failed deployment and found the following section, which looked a little bit strange:

Somehow the Domain of the domain join account was missing.
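For reference, the relevant part of such an unattend.xml is the Microsoft-Windows-UnattendedJoin section. The following is just a reconstruction with placeholder values (not the original file), to show where the empty element sat:

```xml
<component name="Microsoft-Windows-UnattendedJoin" processorArchitecture="amd64"
           publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS">
  <Identification>
    <Credentials>
      <Domain></Domain>  <!-- the domain of the join account was empty -->
      <Username>vm domain join</Username>
      <Password>********</Password>
    </Credentials>
    <JoinDomain>corp.example.com</JoinDomain>
  </Identification>
</component>
```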

The Solution:
So I checked the VMM Run As account which was specified as the domain join credential in the VM template. And sure enough, there was no domain information here either: the username was set without the domain part.

After changing the username to “domain\vm domain join” the deployment went through smoothly, as it should. Inspecting the unattend.xml file showed that the domain is now also correctly filled in.
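If you prefer to check and fix this with the VMM PowerShell cmdlets instead of the console, a sketch could look like this (the account name is an example, not necessarily what you have in your environment):

```powershell
# Find the Run As account that the VM template uses for the domain join
$runAs = Get-SCRunAsAccount -Name "vm domain join"
$runAs.UserName   # should read DOMAIN\username, not just the bare username

# Re-enter the credential, this time with the domain part included
$cred = Get-Credential -Message "Enter the account as DOMAIN\username"
Set-SCRunAsAccount -RunAsAccount $runAs -Credential $cred
```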

Conclusion:
When the deployment of a new VM template in VMM suddenly fails at the domain join step, double-check the Run As account and make sure the domain name is included in the username field.
In my case it was a template with Server 2016, but chances are good that the same could also happen with new VM templates for another guest OS.

Beware of SCVMM UR5 if you have Hyper-V Clusters with Storage Spaces

23 Mar

The latest Update Rollup (UR5) for SCVMM 2012 R2 seems to have some issues with Live Migration in environments where the Hyper-V hosts have Cluster Shared Volumes (CSV) stored directly on clustered Storage Spaces. So basically this is the case if you are using one of the following configurations for your Hyper-V clusters:

  • Two Hyper-V hosts directly connected to a SAS JBOD.
  • Cluster-in-a-Box systems like those from DataON.

Note: The issue described here does not occur if the VMs are stored on a Scale-Out File Server or on a traditional Fibre Channel SAN.

The Error:
Anyway, the issue I noticed in UR5 with one of the configurations listed above is that when you try to live migrate a VM in SCVMM, the Live Migration fails with the following error:
[Screenshot: Live Migration fails with error 20404]

This only happens in SCVMM. Live Migration through Failover Cluster Manager still works fine.
After some troubleshooting I also found that the Live Migration sometimes works from e.g. Host 1 to Host 2 but not the other way around. This brought me to the conclusion that it must have something to do with which host is the owner node of the CSV volume, and some further tests confirmed my assumption. If you change the owner node of a CSV which is stored on clustered Storage Spaces, the whole disk is moved from one node to the other, and this seems to confuse SCVMM.
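Flipping the owner node to reproduce this is easy with the FailoverClusters PowerShell module; the CSV and node names below are just examples:

```powershell
# Show which node currently owns each CSV
Get-ClusterSharedVolume | Select-Object Name, OwnerNode

# Move a CSV to the other node; with clustered Storage Spaces the whole disk moves
Move-ClusterSharedVolume -Name "Cluster Disk 1" -Node "Host2"
```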
As a result, SCVMM inserts an invalid value (00000000-0000-0000-0000-000000000000) for “HostDiskID” for one of the hosts in the “tbl_ADHC_HostDisk” table in the SCVMM DB.
[Screenshot: tbl_ADHC_HostDisk table with the zeroed HostDiskID]

During the Live Migration pre-checks, SCVMM runs a query to find a disk with the ID “00000000-0000-0000-0000-000000000000” in the DB, which obviously does not exist. So the pre-checks and the Live Migration job fail immediately with error 20404.
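If you want to check your own VMM database for this, a read-only query like the following should show the zeroed entry (a diagnostic sketch only; the instance name is an assumption and VirtualManagerDB is just the default VMM database name, so adjust both to your environment):

```powershell
# Look for the invalid HostDiskID in the VMM database (read-only check)
Invoke-Sqlcmd -ServerInstance "VMM-SQL01" -Database "VirtualManagerDB" -Query @"
SELECT * FROM dbo.tbl_ADHC_HostDisk
WHERE HostDiskID = '00000000-0000-0000-0000-000000000000'
"@
```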

Solution / Conclusion:

Update April 29, 2015: UR6 for SCVMM 2012 R2 is now released and includes a fix for this issue. Yeah! 🙂

After opening a Microsoft support case and posting the issue on connect.microsoft.com, I got confirmation from Microsoft that this behavior is a bug in UR5. It will probably be fixed in UR6, which is expected in April.
So my advice for everyone who is using one of the above configurations together with SCVMM is to stay on UR4 and wait for UR6. If it’s not already too late. 😉


Open high ports (over 49151) on a Windows Server Gateway

19 Jan

In a cloud infrastructure with System Center Virtual Machine Manager (SCVMM), Hyper-V and Azure Pack, the Windows Server Gateway can provide tenants with the possibility to connect their virtual networks, provided by Hyper-V Network Virtualization (HNV, NVGRE), to the Internet via NAT. The tenant then also has the possibility to open and forward inbound ports to his VMs. For example, he can open port 80 to run a web server which is publicly reachable over the Internet.
Basically this works very well. But lately I had a situation where I had to forward TCP port 60000. So I went to the Azure Pack Tenant Portal and tried to add a new Network Rule like I had done several times before:
[Screenshot: adding a Network Rule in the Tenant Portal]

But then it happened. The operation failed with a strange error:
[Screenshot: error message in the Tenant Portal]

Then I had a look in SCVMM and found this not very instructive error:
[Screenshot: SCVMM job error details]

So I dug in a little deeper and discovered on the gateway VM that SCVMM adds the external address for the tenants with the port range 1-49151. That explains why you cannot forward ports above 49151 on a multitenant Windows Server NAT gateway:
[Screenshot: external address definition with port range 1-49151]

SCVMM probably defines this port range for the external address because, per RFC 6335, all ports above 49151 are destined for dynamic or private ports. In Windows, this range is also used as the dynamic client port range for outgoing connections (RPC communication and so forth).
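You can verify the dynamic client port range that Windows actually uses, for example with PowerShell:

```powershell
# Show the dynamic client port range used for outgoing connections
Get-NetTCPSetting | Select-Object SettingName, DynamicPortRangeStartPort, DynamicPortRangeNumberOfPorts
```

By default this range starts at 49152, which lines up with the upper end SCVMM sets for the external address.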

Bonus, Possible solutions

Option 1, manual intervention with PowerShell:
The RRAS role in Windows, which is also used for the multitenant NAT gateway, has no restriction that would hinder you from defining an external address with the whole port range from 1-65535 with PowerShell. In fact, when you set an external address with PowerShell, the default values for PortStart and PortEnd are 1024 and 65535.
This means you can remove the external address set by SCVMM and add the address again with PowerShell with the whole port range. This can be achieved by the following steps (see the sketch after the list):

  • Get all external IP addresses with Get-NetNatExternalAddress and note the values of the parameters “NatName” and “IPAddress” from the definition which you want to change.
  • Remove the existing external address definition.
  • Add a new external address with the same values for NatName and IPAddress but with the new port range.
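A sketch of those three steps in PowerShell; replace NatName and the IP address with the values you noted in the first step:

```powershell
# 1. List the external address definitions and note NatName and IPAddress
Get-NetNatExternalAddress

# 2. Remove the existing external address definition of that NAT instance
Get-NetNatExternalAddress -NatName "<NatName>" | Remove-NetNatExternalAddress

# 3. Re-add the same address with the whole port range
Add-NetNatExternalAddress -NatName "<NatName>" -IPAddress "203.0.113.10" -PortStart 1 -PortEnd 65535
```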

Afterwards you can head back to the tenant portal, and now you can add a Network Rule with a port greater than 49151 without any problem. 🙂

Option 2, Registry setting for SCVMM:
After some further research I found that SCVMM has an undocumented registry setting where you can specify the end port for the external address definitions on the NAT gateway. With the following registry value in place, SCVMM automatically configures the whole port range (1 to 65535) when a tenant activates the NAT gateway for his VNet in the Tenant Portal.

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft System Center Virtual Machine Manager Server\Settings\NATPortRangeEnd, REG_DWORD, Value: 65535
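If you prefer PowerShell for creating the value, a one-liner like this on the VMM management server should do it (a sketch; the same disclaimer as below applies):

```powershell
# Create the undocumented NATPortRangeEnd value (DWORD = 65535)
New-ItemProperty -Path "HKLM:\SOFTWARE\Microsoft\Microsoft System Center Virtual Machine Manager Server\Settings" -Name "NATPortRangeEnd" -PropertyType DWord -Value 65535
```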

Disclaimer: Use these settings at your own risk! These are NOT official solutions from Microsoft, and changing these settings will probably lead to an unsupported situation! So be careful! 😉