As a reminder, here is an overview of vCloud Director load balancing.
vCNS load balancer :
Netscaler load balancer :
It’s been a long time since my last post.
I’m back with a new issue in vCloud Director for Service Providers (8.10, but I think it applies to both earlier and later versions).
Our platform consists of vCenter 6.0U2 (external PSC, tiny model), NSX 6.2.5 and vCloud Director SP 8.10.1.
A few days ago I faced a weird issue while trying to import a vApp template into my organization catalog :
“[xxxxx] Folder xxxx does not exist in our inventory, but vCenter Server claims that it does.”
Meanwhile, I was able to upload media files in the same catalog.
I observed the same results after running the same tests on all our catalogs and organizations.
How was the problem solved ?
I first thought it was due to a vCloud inventory issue. I forced a synchronization with vCenter without any improvement. I even cleared the INV tables in the vCloud database. Same result, still unable to upload vApp or OVF in the catalogs.
VMware support pointed out the issue without even requiring the support bundles.
This was actually a vCenter issue and not a vCloud one. Our vCenter was suffering from low memory.
vCenter did what vCloud asked but took too long to report back so that the cell could be updated. This is the explanation given by VMware. A simple reboot should solve the issue.
To confirm the RAM problem, I simply ran the command “free -m” on our vCenter appliance. The output showed that the swap partition was almost entirely used (more than 20 GB). I do not mention the RAM on purpose because it almost always sits at around 8 GB used.
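To make that check repeatable, here is a minimal sketch that turns the “free -m” output into a swap usage percentage. The function name swap_used_percent is my own invention, and it assumes the usual layout of the Swap line (total in the second column, used in the third).

```shell
# Hypothetical helper: reads `free -m` output on stdin and prints the
# percentage of swap currently in use (integer, rounded down).
swap_used_percent() {
    awk '/^Swap:/ {
        if ($2 == 0) { print 0; exit }      # no swap configured
        printf "%d\n", ($3 * 100) / $2      # used / total
    }'
}

# Usage on the appliance (not executed here):
#   free -m | swap_used_percent
```

A value that stays close to 100 while the RAM figure looks stable is the pattern described above.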
In this case, such heavy swapping very likely means that vCenter has memory leaks…
A simple reboot would probably have freed the memory and flushed the swap partition. However, I decided to add some more RAM and move the VM to the small model, because our platform also backs vRA and a lot of other components that interact with vCenter.
After the reboot my upload issue was solved !
Good to know !
Note that as of VCSA 6, it is no longer necessary to manually adjust the RAM dedicated to the JVM. The JVM memory is adjusted dynamically.
The famous William Lam (virtuallyGhetto blog) talks about that in this post.
Disk sizing upgrade
Moreover, there is no need to manually resize the file systems. A script checks the disks and volumes and resizes them automatically when the appliance boots. If the resizing occurs while the appliance is already running, a simple command line does the job. W. Lam explains that too here.
If you want to get more information about the VCSA partitioning you can check this KB.
Here, a reminder for the different VCSA sizing models.
I could also have named this post “Why you should never purge the content of the staging folder of your cells” 🙂
As I was troubleshooting a problem for a customer, I faced an annoying issue :
I was suddenly unable to download specific vApps and was always receiving the following error from vCloud “Invalid response from server”.
Very interesting and crystal clear message, isn’t it ?!
My best option to figure out what was going on was to browse vCloud logs.
vCloud-container-debug.log file gave precious information that helped me to understand my problem.
Look at this :
|Resource file: descriptor.ovf(2b448314-daf6-46dc-b7f1-84bb205f35c6). Download failed. Unable to locate resource file | requestId=da57dd71-aded-47e4-8d9d-93a64a8cab95,request=GET https://CellIPAddress/transfer/2a3ee224-60a8-4f64-b99b-84afade8f3e9/descriptor.ovf|
The OVF descriptor of the vApp I wanted to download could not be found. vCloud was unable to know what to transfer to its client.
|What is an OVF descriptor ?
In a nutshell the OVF descriptor is an XML file that contains all the necessary information about an OVF package (also used with OVA), its content (the VMs that make up the vApp) and how to download it. You can find more details about the OVF format here.
I can guess the question forming in your head : why is vCloud unable to find the files since we are simply trying to download an existing vApp template ? vCloud should already know the different templates it stores in its catalogs ! You’re right but …
Getting more and more confused, I started thinking about the last events and I remembered that some days before, I had manually purged the staging folder !
All the files were quite old (more than 2 weeks) and were supposed to be automatically removed. I thought – and I was wrong – that there was an issue with the cell.
What a big mistake !! Actually by manually deleting the content of the staging folder (/opt/vmware/vcloud-director/data/transfer) I accidentally broke the link between vCloud and its vApp download session.
At this point, I realized that I was missing something in my understanding of the download process and I decided to delve into this particular topic.
This post presents what I knew and what I learnt, and also how the VMware support team helped me solve the problem. It is organized as follows :
When one downloads a vApp template, two main steps take place :
1 – After clicking on “Download…” vCloud enables the vApp template for download.
Actually, enabling a vApp for download is not only changing a property from “False” to “True”, it is also copying the vApp content to the staging folder. That’s why, if you pay attention to the operation, you may find that it takes a very long time (depending on the size of the vApp).
2 – Once the enablement action is completed, the download from the client starts.
If you look at the picture below you will see that :
1. The vApp is being enabled for download in vCloud Director
2. At the same time vCenter is exporting the OVF template (the target folder is the staging folder !).
3. Nothing is happening in the browser transfer window. This is totally normal. The transfer will start only once the OVF export has completed.
But behind this, several things are done : checks, DB updates etc…
Let me give you some details about what really runs in background when a cell manages a download request.
I made the diagram below according to my understanding of the process. So feel free to tell me if something’s wrong.
Depending on when you cancel a download – during or after the enablement of the vApp – different results are to be expected.
I drew another diagram to present them.
|*About the transfer session timeout value :
This value defines the period during which an interrupted transfer session can be resumed (without re-enabling the vApp).
Once the limit is reached, the data is deleted from the staging folder.
One can consider this value as the link between vCloud and a vApp download session (the famous link I shouldn’t have broken).
You can find this value in the system settings of vCloud Director :
Under normal conditions an automatic cleanup should have been done in the DB, but in my case VMware support pointed out a time sync issue between the cells and the DB server. The cells were running 3 minutes behind the DB server, so the “CleanTransferSession” triggers could never be met.
VMware and I decided to clean the DB first and only afterwards, on my side, to solve the time sync* issue.
|To see the planned “CleanTransferSession” triggers, run the query :
select * from QRTZ_TRIGGERS where TRIGGER_NAME='GLOBAL_com.vmware.vcloud.transfer.server_cleanTransferSessionsTrigger'
To read the time value, use an epoch converter website such as https://www.epochconverter.com/
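If you prefer to stay on the cell rather than paste values into a website, the conversion can be sketched in the shell. This assumes that the Quartz time columns (NEXT_FIRE_TIME for instance) hold milliseconds since the Unix epoch and that GNU date is available; epoch_ms_to_utc is a name I made up.

```shell
# Hypothetical helper: converts a millisecond epoch value, as assumed to be
# stored in the QRTZ_TRIGGERS time columns, into a readable UTC timestamp.
# Relies on GNU date (the -d "@seconds" form).
epoch_ms_to_utc() {
    local ms="$1"
    date -u -d "@$((ms / 1000))" '+%Y-%m-%d %H:%M:%S UTC'
}

# Example: epoch_ms_to_utc 1500000000000   prints 2017-07-14 02:40:00 UTC
```

Comparing the converted value with the current time on the DB server and on the cells is a quick way to spot a clock offset like the one described above.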
This manual DB cleanup relies on clearing records related to the download tasks plus – I think it is optional – the usual queries to clear QUARTZ and INV tables.
* This post will not show how to solve the time sync issue (ntpdate service misconfiguration here) but you can have a look at time keeping KBs :
For Linux, timekeeping best practices are available here.
For Windows, the same there.
Clearing vApp download tasks
Of course, at this step, back up the DB before proceeding with any record deletion !
Also notice that the procedure below is not supported by VMware and must not be followed without their support !
You first have to identify the records to delete. To do so, launch the query
select * from transfer_session
You will get something like this :
Then, for each transfer session (vApp download task), you must delete the relevant records.
Run
select * from dbo.resource_file where spool_dir = '/opt/vmware/vcloud-director/data/transfer/2a3ee224-60a8-4f64-b99b-84afade8f3e9'
to confirm the content of a specific vApp, and
delete from transfer_session where transfer_session_id = 0x2A3EE22460A84F64B99B84AFADE8F3E9
to remove its transfer session.
delete from dbo.resource_file and delete from transfer_session (without a where clause) can be run too of course, but must be used with caution.
You will have understood that dbo.resource_file.spool_dir = transfer_session.base_dir
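Going from a transfer_session_id to its folder on disk by hand is error-prone, so here is a small sketch of the conversion. It relies only on the correspondence visible in the example above (the 32 hex digits are the folder UUID without hyphens); the function name and the TRANSFER_DIR variable are mine.

```shell
# Staging folder as used in this post.
TRANSFER_DIR="/opt/vmware/vcloud-director/data/transfer"

# Hypothetical helper: turns a transfer_session_id written as 0x + 32 hex
# digits into the path of the matching folder in the staging directory.
session_id_to_spool_dir() {
    local hex
    # Drop the 0x prefix and lowercase the hex digits.
    hex=$(printf '%s' "${1#0x}" | tr 'A-F' 'a-f')
    # Re-insert the hyphens of the 8-4-4-4-12 UUID layout.
    hex=$(printf '%s' "$hex" | sed -E 's/^(.{8})(.{4})(.{4})(.{4})(.{12})$/\1-\2-\3-\4-\5/')
    printf '%s/%s\n' "$TRANSFER_DIR" "$hex"
}

# Example: session_id_to_spool_dir 0x2A3EE22460A84F64B99B84AFADE8F3E9
```

The printed path is the value to look for in dbo.resource_file.spool_dir.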
Once all transfer sessions have been cleared, we can reset QUARTZ and INV tables.
|As info or reminder, QUARTZ tables store information about vCloud processes and tasks and INV tables store data about vCenter inventory.
Among several reasons, we may have to clear them when the status of vCloud objects is out of sync with their real status in vSphere.
Clearing QUARTZ and INV tables
Before going further, we have to stop the cell properly. I deal with stopping vCloud in this post, under the “Implement vCloud certificates” section.
|DELETE FROM QRTZ_SCHEDULER_STATE;
DELETE FROM QRTZ_FIRED_TRIGGERS;
DELETE FROM QRTZ_PAUSED_TRIGGER_GRPS;
DELETE FROM QRTZ_CALENDARS;
DELETE FROM QRTZ_TRIGGER_LISTENERS;
DELETE FROM QRTZ_BLOB_TRIGGERS;
DELETE FROM QRTZ_CRON_TRIGGERS;
DELETE FROM QRTZ_SIMPLE_TRIGGERS;
DELETE FROM QRTZ_TRIGGERS;
DELETE FROM QRTZ_JOB_LISTENERS;
DELETE FROM QRTZ_JOB_DETAILS;
|DELETE FROM compute_resource_inv;
DELETE FROM custom_field_manager_inv;
DELETE FROM cluster_compute_resource_inv;
DELETE FROM datacenter_inv;
DELETE FROM datacenter_network_inv;
DELETE FROM datastore_inv;
DELETE FROM datastore_profile_inv;
DELETE FROM dv_portgroup_inv;
DELETE FROM dv_switch_inv;
DELETE FROM folder_inv;
DELETE FROM managed_server_inv;
DELETE FROM managed_server_datastore_inv;
DELETE FROM managed_server_network_inv;
DELETE FROM network_inv;
DELETE FROM resource_pool_inv;
DELETE FROM storage_pod_inv;
DELETE FROM storage_profile_inv;
DELETE FROM task_inv;
DELETE FROM vm_inv;
DELETE FROM property_map;
What we can keep in mind :
1 – Make sure that every component of your environment is time synced.
2 – Do not delete any file in the staging folder if the session transfer time-out value has not been reached.
3 – In case of remaining folder, double check the transfer session time out value and the transfer_session table of the vCloud database before deleting anything in the staging folder.
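As a first, read-only pass for point 3, the folders that look older than the timeout can be listed before anything is checked in the database. This is only a sketch: list_stale_transfer_dirs is a hypothetical name, the timeout has to be converted to minutes by hand, and a folder's modification time is just a rough indicator, so the transfer_session table remains the reference.

```shell
# Hypothetical helper: lists (never deletes) the folders directly under the
# staging directory whose modification time is older than N minutes.
list_stale_transfer_dirs() {
    local dir="$1" minutes="$2"
    find "$dir" -mindepth 1 -maxdepth 1 -type d -mmin "+$minutes" | sort
}

# Usage on a cell, with a timeout assumed to be 60 minutes (not executed here):
#   list_stale_transfer_dirs /opt/vmware/vcloud-director/data/transfer 60
```

Each folder it prints should then be compared with the base_dir values of the transfer_session table.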
At the time of writing, I’m observing something weird : for some cancelled downloads, and although there is no transfer session record left, transfer session folders still exist in the staging folder. For each of them, a partial vmdk file (20 MB) is visible and an error is logged in the vCloud logs. I’ve opened a case with VMware for this. Meanwhile, I am going to check the vCenter logs too.
I’ll let you know asap.
I have not tracked it yet as I did for a vApp, but I assume that the mechanisms I described here are the same for the download of any other catalog item. I will update this post later on.
I will try to make another post for the upload process soon.
Fresh update regarding the files remaining after a download cancellation : VMware support confirmed a bug and will try to fix it in version 9 of vCloud, expected by July (2017).
I assume at this step that you already know the vCloud component architecture.
This post describes the steps to follow to replace the vCloud Director SSL certificates and configure the load balancer (Netscaler in this post).
We’ll see how to :
CREATE THE PRIVATE KEY
Open an SSH session as the root user on the first cell and go to /opt/vmware/vcloud-director/jre/bin.
Then, run the following two commands to generate the http and consoleproxy private keys respectively. Enter a name for a new keystore so you can roll back in case of any failure during the certificate replacement process.
|./keytool -keystore keystoreName.ks -storetype JCEKS -storepass password -genkey -keyalg RSA -keysize 2048 -alias http
./keytool -keystore keystoreName.ks -storetype JCEKS -storepass password -genkey -keyalg RSA -keysize 2048 -alias consoleproxy
|Please notice that the aliases cannot be customized and must be http for the web portal service and consoleproxy for the VMRC console proxy service.|
You will be prompted and will have to provide information to build the DN of the certificate.
At the question “What is your first and last name?“, type the FQDN of either the cell or the load balancer virtual server.
For example vcloudlb.domain.com for the http service and vcloudcplb.domain.com for the consoleproxy service.
It is now time to create the certificate signing request for each service. As the vCloud services will be load balanced, the certificates must be issued for the public addresses of vCloud Director.
Still from the same SSH session and the same folder, enter the following commands to start the CSR creation wizard :
|./keytool -keystore keystoreName.ks -storetype JCEKS -storepass password -certreq -alias http -file http.csr|
|./keytool -keystore keystoreName.ks -storetype JCEKS -storepass password -certreq -alias consoleproxy -file consoleproxy.csr|
|Both http and consoleproxy certificates must share the same keystore ! Use a different output file for each CSR so that the second one does not overwrite the first.|
Once both web and console proxy services CSR have been done, we have to transmit them to the team in charge of issuing the certificates.
The certificates have been delivered. Before importing them in vCloud, we should check that they are correct.
Ensure that in the Subject property, the Common Name (CN) value is the public FQDN of the service. For example vcloudlb.domain.com for the http service.
Once checked, they must be imported in the following order :
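The CN check can also be done from the command line. Here is a minimal sketch that pulls the CN out of the subject line printed by openssl; subject_cn is a name of my own, and the openssl call in the comment assumes a PEM-encoded file (add -inform DER for a binary .cer).

```shell
# Hypothetical helper: reads the output of `openssl x509 -noout -subject`
# on stdin and prints the CN value. Handles both the "CN = value" and the
# older "/CN=value" layouts.
subject_cn() {
    sed -n 's/.*CN *= *\([^,/]*\).*/\1/p'
}

# Usage (not executed here):
#   openssl x509 -in httpCertificate.cer -noout -subject | subject_cn
```

The printed value should be exactly the public FQDN, for instance vcloudlb.domain.com for the http certificate.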
Root => Intermediate (if any) => http and consoleproxy (for the last two, the order doesn’t matter)
|./keytool -storetype JCEKS -storepass password -keystore keystoreName.ks -import -alias root -file RootCertificate.cer
./keytool -storetype JCEKS -storepass password -keystore keystoreName.ks -import -alias intermediate -file IntermediateCertificate.cer
./keytool -storetype JCEKS -storepass password -keystore keystoreName.ks -import -alias http -file httpCertificate.cer
./keytool -storetype JCEKS -storepass password -keystore keystoreName.ks -import -alias consoleproxy -file consoleproxyCertificate.cer
We can now list the content of the keystore and verify the registered certificates
|./keytool -storetype JCEKS -storepass password -keystore keystoreName.ks -list|
Last step : reconfigure the vCloud application to take the new certificates into account.
Stop the application
If they have been properly issued, we have to gracefully stop the vcloud services on the first cell. To do so, go to folder /opt/vmware/vcloud-director/bin and run the following commands :
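The commands themselves are missing from this copy of the post. As a hedged reminder, the sketch below prints the quiesce and shutdown calls of the cell management tool as I know them from the VMware documentation; check them against the documentation of your version before running anything. The function name and its argument (a vCloud system administrator account) are assumptions of mine, and the function only prints the commands.

```shell
# Hypothetical helper: prints (does not run) the commands for a graceful
# stop of a cell, to be executed from /opt/vmware/vcloud-director/bin.
# $1 is a vCloud system administrator account (assumption).
stop_cell_commands() {
    local admin="$1"
    # Quiesce first so that running tasks can finish, then shut the cell down.
    printf '%s\n' \
        "./cell-management-tool -u $admin cell --quiesce true" \
        "./cell-management-tool -u $admin cell --shutdown"
}

# Example: stop_cell_commands administrator
```

Each command prompts for the password of the account given with -u.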
Register the new certificates in the application
Launch the configuration tool : /opt/vmware/vcloud-director/bin/configure*
Give the path of the new keystore and the relevant password
Once the database has been updated, you are prompted to start the service : accept.
Copy the new keystore to every other cell and repeat the steps “Stop the application” and “Register the new certificates in the application”.
|!! For security reasons, it is recommended to move the keystore file to a secure place (off the cells).
Moving the keystore doesn’t affect the cell behaviour. During a “configure” operation, a binary file is created for each certificate (saved as certificates and proxycertificates files) in folder /opt/vmware/vcloud-director/etc.
|* For some unknown reason, sometimes the path for the keystore is not prompted. When this occurs, just :
Edit the file /opt/vmware/vcloud-director/etc/global.properties and comment the lines related to the existing keystore path and password :
user.keystore.path = xxxxx
user.keystore.password = xxxxx
Relaunch the configure tool.
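Commenting out the two lines can be scripted. The sketch below keeps a .bak copy of the file and prefixes both properties with #; comment_keystore_lines is a hypothetical name and the property names are the ones shown above.

```shell
# Hypothetical helper: comments out the user.keystore.path and
# user.keystore.password lines of a properties file, keeping a .bak copy.
comment_keystore_lines() {
    local file="$1"
    sed -i.bak -E 's/^(user\.keystore\.(path|password)[[:space:]]*=)/#\1/' "$file"
}

# Usage on a cell (not executed here):
#   comment_keystore_lines /opt/vmware/vcloud-director/etc/global.properties
```

Lines that are already commented, and every other property, are left untouched.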
In order to implement load balancing of vCloud Director on Netscaler we have to :
– Export the private key of each certificate.
– Get a copy of the root and if any, the intermediate certificate.
From an SSH session (as the root user) on one cell, go to the folder /opt/vmware/vcloud-director/jre/bin
Run the command
|./keytool -v -importkeystore -srcstoretype JCEKS -srckeystore /App/install/backup/NewSignedCertificates.ks -srcalias http -destkeystore /App/install/NewSignedCertificates.p12 -deststoretype PKCS12|
This command will export the certificate of the web portal service and its private key
You will have to provide a password for the new keystore (the P12 one that will be generated) and the password of the existing keystore (the JCEKS one).
As the monitoring of the console proxy is done on the TCP socket, we do not have to export the consoleproxy private key and certificate.
We can now verify the content of the p12 keystore with the command
We can see that the private key is available !
Next step :
Import the certificate and its private key in the Netscaler keystore.
From your Netscaler interface (default credentials are nsroot/nsroot, at least until version 188.8.131.52, the one I used for this post)
First check that the CA Root certificate is installed in Netscaler.
=> Go to Traffic Management >> SSL >> Certificates >> CA Certificates
Then, add your server certificate
=> Go to Traffic Management >> SSL >> Certificates >> Server Certificates
=> Click on Install button
Type an explicit name for the certificate
Locate your certificate and provide the password of your keystore.
You can now install the certificate
Your certificate is now available in Netscaler and can be added to a virtual server.
Next Step :
Bind the new certificate to the vCloud HTTPS* virtual server.
*You might be confused because the alias of the certificate is “http” but just keep in mind that HTTP requests are redirected to the HTTPS port of the cell.
Once the vCloud http certificate has been installed in Netscaler, we have to link it to the vCloud HTTPS virtual server.
A procedure to implement vCloud load balancing in Netscaler is available here.
=> Go to Traffic Management >> Load Balancing >> Virtual Servers
Select your vCloud Virtual Server
Look at the state : Red because the virtual server does not have any server certificate
In the Certificate section, click on No at the line “No Server Certificate” so you can add the new certificate.
Then select the appropriate certificate and click on the Select button (no need to click on install as we’ve already installed it just before)
Finally, bind it.
Notice at this step that if you have not already bound the CA certificate you will have to add it too. (just click on No at the No CA Certificate)
Once the bind has been done, we can validate the virtual server modification by clicking on “Done” button at the bottom of the page.
You can see that your virtual server now displays a green status
Let’s now try to connect to vCloud using the virtual server FQDN
If you get a security alert, just ensure that the Root CA certificate is installed on your computer and that you typed the right URL (it must match the subject (or common name) of the certificate)
Below an example of a typical configuration. Make sure to respect the format of the information you provide.