Get insights about the performance of your Windows systems with Grafana

12 May

Ever dreamed of mission-control-style dashboards that give you quick insight into the performance of your Windows systems? 😊

If so, you will probably like a view like this:

So here is how you get such a dashboard for your system in 6 simple steps in under an hour:

Install a VM with Ubuntu Linux 16.04.2 LTS

Even though it is Linux, no rocket science is needed here 😊. Just download the ISO image from the Ubuntu website, attach it to your VM and boot from it. You are then asked some simple questions about time zone, keyboard and partition settings. For most of them you can accept the defaults or simply choose your preferred language and so on. Quite easy.

Set time zone to UTC

Log in to your Ubuntu system and change the time zone to UTC. As InfluxDB (the backend) uses UTC internally, it is a good idea to set the system time zone to UTC as well.
To do so, run the time zone configuration tool (on Ubuntu this is typically sudo dpkg-reconfigure tzdata). Then choose “None of the above” > “UTC”.

Install InfluxDB

InfluxDB is the backend of the solution, where all data is stored. It is a database engine built from the ground up to store metric data and to perform real-time analytics.
To install InfluxDB run the following commands on the Linux VM:

Install Grafana

Grafana is the frontend which generates your nice-looking dashboards from the data stored in InfluxDB. To install Grafana run the following commands on the Linux VM:

Install Telegraf on your Windows system

Now we are ready to collect data from our systems with Telegraf, a small agent which can collect data from many different sources. One of these sources is Windows Perfmon counters, which we will use here.

1. Download the Windows version of the Telegraf agent
https://dl.influxdata.com/telegraf/releases/telegraf-1.2.1_windows_amd64.zip
2. Copy the content of the zip file to C:\Program Files\telegraf on your systems
3. Replace the telegraf.conf with this one. -> telegraf.conf
This way all performance counters needed for the example dashboard in the last step get collected.
4. Also in the telegraf.conf, update the urls parameter so that it points to the IP address of your Linux VM


5. Install Telegraf as a service and start it
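
Step 5 can be done from an elevated PowerShell session. The following is a minimal sketch, not necessarily the exact commands used for the original setup. It assumes that telegraf.exe and the edited telegraf.conf are located in C:\Program Files\telegraf (step 2) and that the service reads its configuration from that default location.

```powershell
# Assumption: telegraf.exe and the edited telegraf.conf are in this folder (step 2).
$telegrafDir = 'C:\Program Files\telegraf'

# Step 4 reminder: in telegraf.conf the InfluxDB output should point to the Linux VM, e.g.
#   [[outputs.influxdb]]
#     urls = ["http://<your linux VM IP>:8086"]

# Step 5: register Telegraf as a Windows service (requires an elevated session).
& "$telegrafDir\telegraf.exe" --service install

# Start the service and check that it is running.
Start-Service -Name telegraf
Get-Service -Name telegraf
```

If the service does not start, running telegraf.exe interactively in a console usually shows configuration errors directly.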

Create Dashboards and have fun! 🙂

The last step is to create your nice dashboards in the Grafana web UI. A good starting point is the “Telegraf & Influx Windows Host Overview” dashboard, which can be imported directly from the grafana.net repository.

Log in to the Grafana web UI -> http://<your linux VM IP>:3000 (Username: admin, Password: admin)

First, Grafana needs to know its data source. Click on the Grafana logo in the top left corner and select “Data Source” in the menu. Then click on “+ Add data source”.

Define a name for the data source (e.g. InfluxDB-telegraf) and choose “InfluxDB” as Type.
The URL is http://localhost:8086, as we have installed InfluxDB locally. “Proxy” as the access type is correct.
The Telegraf agent automatically creates the database “telegraf”, so enter “telegraf” as the database name. As user you can enter anything. InfluxDB does not need any credentials by default, but the Grafana interface wants you to enter something (otherwise you cannot save the data source).
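
If you prefer scripting over clicking, the same data source can presumably also be created through the Grafana HTTP API. The following is only a sketch under several assumptions: the default admin/admin login is still active, the variable $grafanaHost is a placeholder for the IP of your Linux VM, and your Grafana version accepts these fields on /api/datasources.

```powershell
# Placeholder: IP address or name of the Linux VM running Grafana.
$grafanaHost = '<your linux VM IP>'

# Basic authentication header for the default admin account (assumption: password unchanged).
$auth    = [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes('admin:admin'))
$headers = @{ Authorization = "Basic $auth" }

# Same values as entered in the web UI above.
$body = @{
    name     = 'InfluxDB-telegraf'
    type     = 'influxdb'
    url      = 'http://localhost:8086'
    access   = 'proxy'
    database = 'telegraf'
} | ConvertTo-Json

# Create the data source.
Invoke-RestMethod -Method Post -Uri "http://${grafanaHost}:3000/api/datasources" `
    -Headers $headers -ContentType 'application/json' -Body $body
```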

Now go ahead and import your first dashboard. Select Dashboard > Import in the menu.

Enter “1902” and click on “Load”.

Change the name if you like, select the data source created in the step above (InfluxDB-telegraf) and then click on Import.

And tada! 🙂

Further steps

Now the Telegraf / InfluxDB setup is collecting performance data from your Windows machines. With Grafana the collected data can be visualized in a meaningful way, so that determining the health of your systems becomes easy.

To further customize the data and visualization to your specific needs you can:

Be aware of DSC pull server compatibility issues with WMF 5.0 and 5.1

20 Feb

Apparently, there are some incompatibilities when WMF 5.0 computers want to communicate with a DSC pull server running on WMF 5.1, or vice versa. This is especially the case when the “client” node and the pull server are not running the same OS version, for example when you have a DSC pull server running on Server 2012 R2 (with WMF 5.0) and some DSC nodes running on Server 2016 (which has WMF 5.1 built in).

So far I have experienced two issues:

  1. A DSC pull client running on WMF 5.1 cannot send status reports when the DSC pull server is still running on WMF 5.0. This is because WMF 5.1 has introduced the new “AdditionalData” parameter in the status report. I have also reported this bug on GitHub: https://github.com/PowerShell/PowerShell/issues/2921
  2. A DSC pull client running on WMF 5.0 cannot communicate at all with a DSC pull server running on WMF 5.1.
     

Solution / Workaround for issue 1:
As WMF 5.1 RTM is now available (again), the simplest solution is to upgrade the server and/or client to WMF 5.1. However, when you upgrade the DSC pull server you must create a new EDB file and re-register all clients. Otherwise the issue persists, because the “AdditionalData” field is still missing in the database.

Solution / Workaround for issue 2:
The root cause of this issue can be found in the release notes of WMF 5.1:
“Previously, the DSC pull client only supported SSL3.0 and TLS1.0 over HTTPS connections. When forced to use more secure protocols, the pull client would stop functioning. In WMF 5.1, the DSC pull client no longer supports SSL 3.0 and adds support for the more secure TLS 1.1 and TLS 1.2 protocols.”

So, starting with WMF 5.1, the DSC pull server no longer supports TLS 1.0, whereas a DSC pull client running on WMF 5.0 still uses TLS 1.0 and can therefore no longer connect to the DSC pull server.

The solution, without deploying WMF 5.1 to all pull clients, is to alter the behavior of the DSC pull server so that it accepts TLS 1.0 connections again. This can be done by changing the following registry key on the DSC pull server:

Change Value from 0x0 to 0x1 and reboot the DSC pull server.
Afterward DSC pull clients running on WMF 5.0 can connect again to the DSC pull server.

How to enable CredSSP for PowerShell Remoting through GPO

19 Oct

In a domain environment CredSSP can easily be enabled through a GPO. To do so there are three GPO settings to configure:

  1. Computer Configuration > Administrative Templates > Windows Components > Windows Remote Management (WinRM) > WinRM Client > Allow CredSSP Authentication (Enable)
  2. Computer Configuration > Administrative Templates > Windows Components > Windows Remote Management (WinRM) > WinRM Service > Allow CredSSP Authentication (Enable)
  3. Computer Configuration > Administrative Templates > System > Credential Delegation > Allow delegation of fresh credentials (add wsman/*<.FQDN of your domain>)
  4. If your environment contains computers in another, untrusted AD domain to which you want to connect using explicit credentials and CredSSP, you also have to enable the following GPO setting:
    Computer Configuration > Administrative Templates > System > Credential Delegation > Allow delegation of fresh credentials with NTLM-only server authentication (add wsman/*<.FQDN of your other domain>)

Now you are ready to use CredSSP within your PowerShell remote sessions.
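
A quick way to try it out could look like the following sketch. The computer name server01.contoso.com is purely hypothetical. Use the FQDN of a server in your domain, so that it matches the wsman/* entry from the GPO.

```powershell
# Hypothetical target; replace with the FQDN of a server covered by the GPO settings.
$server = 'server01.contoso.com'

# Show the current CredSSP configuration of the local machine.
Get-WSManCredSSP

# CredSSP requires explicit credentials.
$cred = Get-Credential

# Run a command remotely using CredSSP authentication.
Invoke-Command -ComputerName $server -Authentication Credssp -Credential $cred -ScriptBlock {
    whoami
}
```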

And a final word of warning! 😉
When you are using CredSSP, your credentials are transferred to the remote system and your account is then a potential target for a pass-the-hash attack. In other words, an attacker can steal your credentials. So only use CredSSP with your PowerShell remote sessions if you really need it!

PowerShell DSC resource to enable/disable Microsoft Update

16 Jun

Ever get tired of manually setting the check box for Microsoft Update in the screen above on a bunch of servers (e.g. in a new test lab setup)? Then this one is for you.

I recently wrote, mostly as an exercise, a PowerShell DSC module with a resource to enable (or disable) Microsoft Update.

I then published the module on GitHub as another exercise. 😉
So if you are interested you can get the module from here:

https://github.com/J0F3/cMicrosoftUpdate

After you get the module, enabling the Microsoft Update settings will look like this:

Happy DSCing! 🙂

The connection between Hyper-V Network Virtualization (NVGRE) and MTU Size (and Linux)

26 Apr

In a network with Hyper-V Network Virtualization (using NVGRE encapsulation) the MTU (Maximum Transmission Unit) size is 42 bytes smaller than in a traditional Ethernet network (where it is 1500 bytes). The reason for this is the NVGRE encapsulation, which needs the 42 bytes to store its additional GRE header in the packet. So the maximum MTU size with Hyper-V Network Virtualization is 1458 bytes.

The problem with Linux VMs:
For VMs running Windows Server 2008 or newer this should not be a problem, because Hyper-V has a mechanism which automatically lowers the MTU size for the NIC of the VM if needed. (Documented on the TechNet Wiki.)
But with VMs running Linux you could run into a problem, because the automatic MTU size reduction does not seem to work correctly with Linux VMs:
https://support.microsoft.com/en-us/kb/3021753/
The effect is that the MTU size in the Linux VMs stays at 1500, and therefore you can experience some very weird connection issues.

The Solution:
So there are two options to resolve this issue:

  • Set the MTU size for the virtual NICs of all Linux VMs manually to 1458 bytes
  • Enable Jumbo Frames on the physical NICs of the Hyper-V hosts. Then there is no need to lower the MTU size in the VMs.
  • (Wait for a kernel update for your Linux distribution which includes the fix from KB3021753)
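
For the second option, Jumbo Frames can usually be configured with PowerShell on the Hyper-V host. This is a sketch only: the adapter name NIC1 is a placeholder, and the value 9014 is an assumption, because the valid values depend on the NIC driver.

```powershell
# Placeholder: name of the physical adapter on the Hyper-V host.
$nic = 'NIC1'

# Show the current jumbo packet setting and the values the driver accepts.
Get-NetAdapterAdvancedProperty -Name $nic -RegistryKeyword '*JumboPacket' |
    Format-List DisplayName, RegistryValue, ValidRegistryValues

# Enable Jumbo Frames (assumption: the driver accepts 9014; pick a value from the list above).
Set-NetAdapterAdvancedProperty -Name $nic -RegistryKeyword '*JumboPacket' -RegistryValue 9014
```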

Beware of SCVMM UR5 if you have Hyper-V Clusters with Storage Spaces

23 Mar

The latest Update Rollup (UR5) for SCVMM 2012 R2 seems to have some issues with Live Migration in environments where the Hyper-V hosts have Cluster Shared Volumes (CSV) stored directly on clustered Storage Spaces. Basically this is the case if you are using one of the following configurations for your Hyper-V clusters:

  • Two Hyper-V hosts directly connected to a SAS JBOD.
  • Cluster in a Box systems like those from DataON.

Note: The issue described here does not occur if the VMs are stored on a Scale-Out File Server or on a traditional Fibre Channel SAN.

The Error:
The issue I have noticed in UR5 with one of the configurations listed above is that when you try to live migrate a VM in SCVMM, the Live Migration fails with error 20404.

This only happens in SCVMM. Live Migration through Failover Cluster Manager still works fine.
After some troubleshooting I also found that Live Migration sometimes works from, e.g., Host 1 to Host 2 but not the other way round. This led me to the conclusion that it must have something to do with which host is the owner node of the CSV. Further tests confirmed this assumption: if you change the owner node of a CSV that is stored on clustered Storage Spaces, the whole disk is moved from one node to the other, and this seems to confuse SCVMM.
As a result, SCVMM inserts an invalid value (00000000-0000-0000-0000-000000000000) for “HostDiskID”, for one of the hosts, in the “tbl_ADHC_HostDisk” table in the SCVMM DB.

During the Live Migration pre-checks, SCVMM runs a query to find a disk with the ID “00000000-0000-0000-0000-000000000000” in the DB, which obviously does not exist. So the pre-checks and the Live Migration job fail immediately with error 20404.

Solution / Conclusion:

Update April 29, 2015: UR6 for SCVMM 2012 R2 is now released and includes a fix for this issue. Yeah! 🙂

After opening a Microsoft Support case and posting the issue on connect.microsoft.com, I got confirmation from Microsoft that this behavior is a bug in UR5. It will probably be fixed in UR6, which is expected for April.
So my advice for everyone who is using one of the above configurations together with SCVMM is to stay on UR4 and wait for UR6. If it’s not already too late. 😉

 

How to change the RDP certificate on a RD Session Host

30 Jan

In a Windows Server 2012 R2 RD Deployment you install a certificate for the RD Connection Broker, RD Web Access and RD Gateway in the Deployment Properties using Server Manager. But this does not change the certificate on the Session Hosts in the RD Deployment, and you will still get certificate warnings when connecting to the Session Hosts.

To change the certificate on the Session Hosts manually do the following:

  1. Install the Certificate and the Private Key in the computer certificate store.
  2. Set the thumbprint of the installed certificate with PowerShell and WMI:
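
Step 2 could look like the following sketch, run in an elevated PowerShell session on the Session Host. The thumbprint is a placeholder. The WMI class and property used here are the ones described in the article linked below.

```powershell
# List the certificates in the computer store to find the thumbprint (step 1).
Get-ChildItem -Path Cert:\LocalMachine\My | Select-Object Subject, Thumbprint

# Placeholder: thumbprint of the certificate installed in step 1.
$thumbprint = '<thumbprint of your certificate>'

# Get the RDP listener settings via WMI.
$tsSetting = Get-WmiObject -Class Win32_TSGeneralSetting `
    -Namespace 'root\cimv2\TerminalServices' -Filter "TerminalName='RDP-Tcp'"

# Assign the certificate to the RDP listener.
Set-WmiInstance -Path $tsSetting.__PATH -Arguments @{ SSLCertificateSHA1Hash = $thumbprint }
```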

     

The process is also documented in detail at: http://blogs.technet.com/b/askperf/archive/2014/05/28/listener-certificate-configurations-in-windows-server-2012-2012-r2.aspx

Open high ports (over 49151) on a Windows Server Gateway

19 Jan

In a cloud infrastructure with System Center Virtual Machine Manager (SCVMM), Hyper-V and Azure Pack, the Windows Server Gateway can give tenants the possibility to connect their virtual network, provided by Hyper-V Network Virtualization (HNV, NVGRE), to the Internet via NAT. The tenant can then also open and forward inbound ports to his VMs. For example, he can open port 80 to run a web server which is publicly reachable over the Internet.
Basically this works very well. But lately I had a situation where I had to forward TCP port 60000. So I went to the Azure Pack Tenant Portal and tried to add a new Network Rule, as I had done several times before.

But then it happened: the operation failed with a strange error.

Then I had a look in SCVMM and found only an equally uninformative error.

So I dug a little deeper and discovered on the gateway VM that SCVMM adds the external address for the tenants with the port range 1-49151. That explains why you cannot forward ports above 49151 on a multitenant Windows Server NAT Gateway.

SCVMM probably defines this port range for the external address because all ports above 49151 are, per RFC 6335, intended for dynamic or private ports. In Windows this range is also used as the dynamic client port range for outgoing connections (RPC communication and so forth).

Bonus, Possible solutions

Option 1, manual intervention with PowerShell:
The RRAS role in Windows, which is also used for the multitenant NAT Gateway, has no restriction which would prevent you from defining an external address with the whole port range from 1-65535 with PowerShell. In fact, when you set an external address with PowerShell, the default values for PortStart and PortEnd are 1024 and 65535.
This means you can remove the external address set by SCVMM and add the address again with PowerShell with the whole port range. This can be achieved with the following steps:

  • Get all external IP addresses with Get-NetNatExternalAddress and note the values of the parameters “NatName” and “IPAddress” from the definition which you want to change.
  • Remove the existing external address definition.
  • Add a new external address with the same values for NatName and IPAddress but with the new port range.
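
The three steps above can be sketched as follows on the gateway VM. The values for $natName and $ipAddress are placeholders for the values noted in the first step. This is an illustration of the approach, not the exact commands from the original screenshots.

```powershell
# Placeholders: use the NatName and IPAddress values noted in the first step.
$natName   = '<NatName>'
$ipAddress = '<IPAddress>'

# List the current external address definitions.
Get-NetNatExternalAddress

# Remove the definition created by SCVMM (port range 1-49151).
Get-NetNatExternalAddress -NatName $natName |
    Where-Object { $_.IPAddress -eq $ipAddress } |
    Remove-NetNatExternalAddress

# Add the address again with the whole port range.
Add-NetNatExternalAddress -NatName $natName -IPAddress $ipAddress -PortStart 1 -PortEnd 65535
```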

Afterward you can head to the tenant portal again and add a Network Rule with a port greater than 49151 without any problem. 🙂

Option 2, Registry setting for SCVMM:
After some further research I found that SCVMM has an undocumented registry setting where you can specify the end port for the external address definition on the NAT Gateway. After you create the following registry value, SCVMM automatically configures the whole port range (1 to 65535) when a tenant activates the NAT Gateway for his VNet in the Tenant Portal.

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft System Center Virtual Machine Manager Server\Settings\NATPortRangeEnd, REG_DWORD, Value: 65535
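
With PowerShell the value could be created as in the following sketch. It assumes that it runs in an elevated session on the SCVMM management server and that the Settings key already exists there.

```powershell
# Registry path of the SCVMM server settings (from the value described above).
$path = 'HKLM:\SOFTWARE\Microsoft\Microsoft System Center Virtual Machine Manager Server\Settings'

# Create (or overwrite) the undocumented NATPortRangeEnd value as REG_DWORD.
New-ItemProperty -Path $path -Name 'NATPortRangeEnd' -PropertyType DWord -Value 65535 -Force

# Verify the result.
Get-ItemProperty -Path $path -Name 'NATPortRangeEnd'
```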

Disclaimer: Use these settings at your own risk! These are NOT official solutions from Microsoft, and changing these settings will probably lead to an unsupported situation! So be careful! 😉