I work as a VMware TAM (Technical Account Manager), and one of my customers recently had a significant incident in which clients (vSphere admins) were not able to connect to vCenter Server. It worked neither from the old C# Client nor from the new Web Client. Interestingly, some admins were sometimes able to connect and stay connected, while others were not able to connect at all.
The error message was very general, saying:
Call "ServiceInstance.RetrieveContent" for object "ServiceInstance" on Server "vc01.example.com" failed.
The C# Client returned a further explanation:
The server 'vc01.example.com' could not interpret the client's request. (The remote server returned an error: (503) Server Unavailable.)
See the error messages in the screenshot below.
(Screenshot: C# Client error messages)
As you can see, both error messages are very general, and further holistic troubleshooting was necessary. After multiple theories, one of the customer's vSphere/Windows administrators analyzed the Windows OS with the Windows perfmon tool and realized that during the incident there were more than 1400 open threads with client connections to vCenter Server. This led to the hypothesis that we had reached the maximum number of client sessions vCenter can accept.
A hypothesis is always very important, but even more important is the proof that the hypothesis is valid and that it really is the root cause of the particular issue.
Unfortunately, the maximum number of total client sessions to vCenter Server is not documented. The only numbers documented in "Configuration Maximums - vSphere 5.5" are:
Concurrent vSphere Client connections to vCenter Server = 100
Concurrent vSphere Web Clients connections to vCenter Server = 180
However, my customer uses automation extensively, so PowerCLI can hold additional connections. The only way to know the maximum is to test it.
My customer is still on vCenter 5.5, but I prepared and executed the test in my home lab, where I have vCenter 6.0 U2. I prepared a PowerCLI script that creates 2000 new client sessions and keeps them open. The purpose of the script is to find the maximum number of established sessions vCenter can accept and to see what the error message is when the maximum is reached.
The PowerCLI script is available on GitHub here:
https://github.com/davidpasek/powercli-scripts/blob/master/vcenter-sessions.ps1
It is based on the excellent blog post and scripts "List and Disconnect vCenter Sessions" prepared by Alan Renouf.
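The probing idea itself (keep opening sessions, never close them, and stop at the first refusal) can be illustrated with a minimal Python sketch. This is not the PowerCLI script linked above. `FakeVCenter`, `SessionLimitReached`, and `probe_session_limit` are hypothetical names, the server is an in-memory stand-in, and the cap of 50 is an arbitrary illustrative value rather than a real vCenter limit.

```python
class SessionLimitReached(Exception):
    """Raised by the stand-in server when it refuses a new session."""


class FakeVCenter:
    """Hypothetical in-memory stand-in for a server with a hard session cap."""

    def __init__(self, max_sessions):
        self.max_sessions = max_sessions
        self.sessions = []  # (session_id, user) tuples that are still open

    def login(self, user):
        if len(self.sessions) >= self.max_sessions:
            # Assumption: the refusal is modelled on the 503 text quoted in this post.
            raise SessionLimitReached("503 Server Unavailable")
        session_id = len(self.sessions) + 1
        self.sessions.append((session_id, user))
        return session_id


def probe_session_limit(server, attempts, user="probe-user"):
    """Open up to `attempts` sessions without closing any of them.

    Returns (number_established, error_message), where error_message is
    None if the server never refused a login.
    """
    established = 0
    for _ in range(attempts):
        try:
            server.login(user)
        except SessionLimitReached as exc:
            return established, str(exc)
        established += 1
    return established, None


# Example run against the stand-in (cap of 50 chosen only for illustration).
count, error = probe_session_limit(FakeVCenter(max_sessions=50), attempts=200)
print(count, error)
```

Against a real vCenter, the login call would be replaced by whatever client library is in use, and the number of attempts has to be higher than the suspected limit, which is why the test described here targets 2000 sessions.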
I ran the PowerCLI script in my lab and waited for it to fail in order to find the maximum. You can see the expected failure in the screenshot below.
(Screenshot: Expected connection failure to find what the maximum is)
And the result is:
vCenter Server 6.0 U2 accepts a maximum of 1995 established client sessions.
When this maximum is exceeded, you are no longer able to connect to vCenter Server, and you will see the error messages mentioned at the beginning of this article.
Business impact and visibility
It is worth mentioning that this technical issue was observed during a Disaster Recovery fail-over test and that it silently disappeared after all services were failed back. That is why this incident had very high internal business visibility, and the issue was escalated to top IT management, which required a very quick Root Cause Analysis and proper problem management.
That's just further proof of how critical vCenter and the vSphere platform are in modern IT environments.
It seems that my customer is using an automation script which establishes connections to vCenter Server but, because of circumstances that occur only when services are running on the disaster recovery backup site, does not disconnect its sessions. As a result, the vCenter Server maximum is exceeded and it does not accept any new connections. In such a situation, the vSphere platform is unmanageable.
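How a script leaks sessions on a failure path, and how a guaranteed logout avoids it, can be shown with a small Python sketch. The customer's actual script is not known to me in this detail, so everything here is an assumption: `FakeSessionServer`, `managed_session`, `leaky_job`, and `safe_job` are hypothetical names, and the server only counts open sessions.

```python
from contextlib import contextmanager


class FakeSessionServer:
    """Hypothetical stand-in that only tracks which sessions are open."""

    def __init__(self):
        self.open_sessions = set()
        self._next_id = 0

    def login(self):
        self._next_id += 1
        self.open_sessions.add(self._next_id)
        return self._next_id

    def logout(self, session_id):
        self.open_sessions.discard(session_id)


@contextmanager
def managed_session(server):
    """Log in, hand the session to the caller, and always log out."""
    session_id = server.login()
    try:
        yield session_id
    finally:
        # Runs on success and on error, so the session is never left behind.
        server.logout(session_id)


def leaky_job(server, fail):
    """Logs out only on the happy path; an error leaves the session open."""
    session_id = server.login()
    if fail:
        raise RuntimeError("task failed before logout")
    server.logout(session_id)


def safe_job(server, fail):
    """Same work, but the session is released even when the task fails."""
    with managed_session(server):
        if fail:
            raise RuntimeError("task failed, session still released")
```

If `leaky_job` fails repeatedly, one more session stays open after every run, which is the kind of slow accumulation that could eventually hit the session maximum. `safe_job` leaves no session behind in the same conditions.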
This is good to know, especially in the age of automation, where a single badly written automation script can break vSphere manageability.
As a VMware TAM, I can communicate and justify my customer's product feature requests internally within the VMware organization. That's another benefit of the VMware TAM Program.
So here is a publicly written vCenter Product Feature Request, which I will open with our Product Management.
Feature Request: The maximum number of supported client sessions should be documented in the "vSphere Configuration Maximums" document. When the maximum is exceeded, vCenter Server should accept at least one more connection for the vSphere Administrator (for example administrator@vsphere.local), to be used as a last resort, or a back door if you wish. When another vSphere Admin connection arrives, this special "back door" connection should be terminated and re-established for that most recent connection, so that manageability is preserved in such a situation.
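To make the requested behavior concrete, here is a minimal Python sketch of such an admission policy. It describes the proposal only, not how vCenter works today. `SessionTable` and `SessionLimitExceeded` are hypothetical names, and the sketch assumes a single reserved slot that the newest administrator login takes over.

```python
class SessionLimitExceeded(Exception):
    """Raised when a non-admin login arrives and the table is full."""


class SessionTable:
    """Hypothetical session table with one reserved last-resort admin slot."""

    def __init__(self, max_sessions, admin_user="administrator@vsphere.local"):
        self.max_sessions = max_sessions
        self.admin_user = admin_user
        self.regular = []     # ids of ordinary sessions, capped at max_sessions
        self.reserved = None  # id of the current holder of the reserved slot
        self._next_id = 0

    def _new_id(self):
        self._next_id += 1
        return self._next_id

    def login(self, user):
        # Normal case: there is still room in the regular pool.
        if len(self.regular) < self.max_sessions:
            session_id = self._new_id()
            self.regular.append(session_id)
            return session_id
        # Pool is full: only the administrator may use the reserved slot.
        if user == self.admin_user:
            # The newest admin login replaces any previous holder, so a stale
            # back-door session cannot lock the administrator out.
            self.reserved = self._new_id()
            return self.reserved
        raise SessionLimitExceeded("session limit reached")
```

With a full pool, ordinary logins are refused while `administrator@vsphere.local` still gets in, and a second administrator login simply takes over the reserved slot.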