vCloud Director: vApp download issue

I could also have named this post “Why you should never purge the content of the staging folder of your cells” 🙂

While troubleshooting a problem for a customer, I ran into an annoying issue:
I was suddenly unable to download specific vApps, and vCloud always returned the following error: “Invalid response from server”.


Very informative and crystal-clear message, isn’t it?!
My best option to figure out what was going on was to browse the vCloud logs.
The vcloud-container-debug.log file gave me valuable information that helped me understand my problem.

Look at this:

Resource file: descriptor.ovf(2b448314-daf6-46dc-b7f1-84bb205f35c6). Download failed. Unable to locate resource file | requestId=da57dd71-aded-47e4-8d9d-93a64a8cab95,request=GET https://CellIPAddress/transfer/2a3ee224-60a8-4f64-b99b-84afade8f3e9/descriptor.ovf

Translation:
The OVF descriptor of the vApp I wanted to download could not be found, so vCloud did not know what to transfer to its client.

What is an OVF descriptor?
In a nutshell, the OVF descriptor is an XML file that contains all the necessary information about an OVF package (the same format is also used inside OVA files): its content (the VMs that make up the vApp) and the files to download. You can find more details about the OVF format here.
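To make the role of the descriptor more concrete, here is a minimal Python sketch that reads a trimmed descriptor with the standard library and lists the VMs and files it references. The sample content (VM IDs, file names, sizes) is entirely hypothetical. A real descriptor exported by vCenter has many more sections. Without the descriptor, as in the log above, there is nothing to parse, so the client has no list of files to fetch.

```python
import xml.etree.ElementTree as ET

# OVF 1.x envelope namespace (DMTF DSP0243).
OVF_NS = "http://schemas.dmtf.org/ovf/envelope/1"
NS = {"ovf": OVF_NS}

# Hypothetical, heavily trimmed descriptor: a real one also has
# DiskSection, NetworkSection, hardware sections, etc.
SAMPLE_OVF = f"""<Envelope xmlns="{OVF_NS}" xmlns:ovf="{OVF_NS}">
  <References>
    <File ovf:id="file1" ovf:href="vm1-disk1.vmdk" ovf:size="1048576"/>
    <File ovf:id="file2" ovf:href="vm2-disk1.vmdk" ovf:size="2097152"/>
  </References>
  <VirtualSystemCollection ovf:id="MyVApp">
    <VirtualSystem ovf:id="vm1"/>
    <VirtualSystem ovf:id="vm2"/>
  </VirtualSystemCollection>
</Envelope>"""


def ovf_attr(name: str) -> str:
    """Fully qualified name of an ovf:-prefixed attribute."""
    return f"{{{OVF_NS}}}{name}"


def referenced_files(descriptor_xml: str) -> list[tuple[str, int]]:
    """Return (href, size in bytes) for each file listed under <References>."""
    root = ET.fromstring(descriptor_xml)
    return [
        (f.get(ovf_attr("href")), int(f.get(ovf_attr("size"), "0")))
        for f in root.findall("ovf:References/ovf:File", NS)
    ]


def virtual_systems(descriptor_xml: str) -> list[str]:
    """Return the ovf:id of every VM (VirtualSystem) in the package."""
    root = ET.fromstring(descriptor_xml)
    return [vs.get(ovf_attr("id")) for vs in root.iter(f"{{{OVF_NS}}}VirtualSystem")]


print(virtual_systems(SAMPLE_OVF))  # ['vm1', 'vm2']
print(referenced_files(SAMPLE_OVF))  # [('vm1-disk1.vmdk', 1048576), ('vm2-disk1.vmdk', 2097152)]
```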

Explanation:
I can guess the question rising in your head: why is vCloud unable to find the files, since we are simply trying to download an existing vApp template? vCloud should already know the templates stored in its catalogs! You’re right, but…

Getting more and more confused, I started thinking about recent events, and I remembered that a few days before, I had manually purged the staging folder!
All the files were quite old (more than 2 weeks) and were supposed to be removed automatically. I thought, wrongly, that there was an issue with the cell.
What a big mistake!! By manually deleting the content of the staging folder (/opt/vmware/vcloud-director/data/transfer), I accidentally broke the link between vCloud and its vApp download sessions.

At that point, I realized I was missing something in my understanding of the download process, and I decided to delve into this topic.

This post covers what I knew, what I learnt, and how the VMware support team helped me solve the problem. It is organized as follows:

Overview of a vApp download
vApp download process
Cancelling a download
Problem resolution
Conclusion

Overview of a vApp download

When you download a vApp template, two main steps take place:

1 – After you click “Download…”, vCloud enables the vApp template for download.
Enabling a vApp for download does not just change a property from “False” to “True”. It also copies the vApp content to the staging folder. That’s why, if you watch the operation, you may notice that it can take a long time (depending on the size of the vApp).

2 – Once the enablement is complete, the client download starts.

If you look at the picture below, you will see that:

1. The vApp is being enabled for download in vCloud Director

2. At the same time, vCenter is exporting the OVF template (the target folder is the staging folder!).

3. Nothing is happening in the browser transfer window. This is perfectly normal: the transfer only starts once the OVF export has completed.


But behind this, several other things happen: checks, DB updates, etc.

Behind the scenes

Let me give you some details about what really runs in the background when a cell handles a download request.

I drew the diagram below based on my understanding of the process, so feel free to tell me if something is wrong.

Cancelling a download

Depending on when you cancel a download (during or after the enablement of the vApp), different results are expected.

I drew another diagram to present them.

*About the transfer session timeout value:
This value defines the period during which an interrupted transfer session can be resumed (without re-enabling the vApp).
Once the limit is reached, the data is deleted from the staging folder.
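The timeout rule can be sketched in a few lines of Python. This is only an illustration under two loudly stated assumptions: the window is counted from the last activity on the interrupted session, and the timeout is expressed in minutes. Neither is confirmed here, so check both against the system settings of your vCloud version. The function names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumptions (not confirmed by vCloud documentation here): the window starts
# at the last activity on the session, and the timeout is given in minutes.


def session_deadline(last_activity: datetime, timeout_minutes: int) -> datetime:
    """Moment after which the staged data becomes eligible for cleanup."""
    return last_activity + timedelta(minutes=timeout_minutes)


def is_resumable(last_activity: datetime, timeout_minutes: int, now: datetime) -> bool:
    """True while an interrupted session can still be resumed without re-enabling."""
    return now < session_deadline(last_activity, timeout_minutes)


interrupted_at = datetime(2017, 6, 2, 12, 0)
print(session_deadline(interrupted_at, 60))  # 2017-06-02 13:00:00
print(is_resumable(interrupted_at, 60, datetime(2017, 6, 2, 12, 30)))  # True
print(is_resumable(interrupted_at, 60, datetime(2017, 6, 2, 13, 5)))  # False
```

Note that `now` comes from whichever clock runs the check. If that clock is out of sync with the others, the answer shifts by the same amount.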

One can consider this value as the link between vCloud and a vApp download session (the famous link I shouldn’t have broken).

You can find this value in the system settings of vCloud Director:


Problem resolution

Under normal conditions, the DB should have been cleaned up automatically, but in my case VMware support pointed out a time sync issue between the cells and the DB server. The cells were running 3 minutes behind the DB server, so the “CleanTransferSession” triggers could never be met.
VMware and I decided to clean the DB first and only afterwards, on my side, solve the time sync* issue.

To see the scheduled “CleanTransferSession” triggers, run the following query:
select * from QRTZ_TRIGGERS where TRIGGER_NAME = 'GLOBAL_com.vmware.vcloud.transfer.server_cleanTransferSessionsTrigger'

To read the time values, which are stored as epoch timestamps, use an epoch converter website such as https://www.epochconverter.com/
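If you prefer not to paste values into a website, the conversion is easy to script. This is a minimal sketch. It assumes the standard Quartz JDBC schema, where columns such as QRTZ_TRIGGERS.NEXT_FIRE_TIME and PREV_FIRE_TIME hold milliseconds since the Unix epoch. Verify this against the rows you actually get. The clock_skew helper is a hypothetical illustration of comparing the cell and DB server clocks, as in the 3-minute offset described above.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def quartz_millis_to_utc(millis: int) -> datetime:
    """Convert a Quartz epoch-milliseconds value (e.g. NEXT_FIRE_TIME) to UTC.

    timedelta arithmetic keeps the conversion exact (no float rounding).
    """
    return EPOCH + timedelta(milliseconds=millis)


def clock_skew(cell_now: datetime, db_now: datetime) -> timedelta:
    """How far the cell clock lags behind the DB server clock (negative if ahead)."""
    return db_now - cell_now


print(quartz_millis_to_utc(1496361600000))  # 2017-06-02 00:00:00+00:00
db_clock = datetime(2017, 6, 2, 12, 3, tzinfo=timezone.utc)
cell_clock = datetime(2017, 6, 2, 12, 0, tzinfo=timezone.utc)
print(clock_skew(cell_clock, db_clock))  # 0:03:00
```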

This manual DB cleanup consists of clearing the records related to the download tasks, plus (optionally, I think) the usual queries to clear the QUARTZ and INV tables.

* This post does not cover how to solve the time sync issue (an ntpdate service misconfiguration in my case), but you can have a look at the timekeeping KBs:
For Linux, timekeeping best practices are available here.
For Windows, the same is available there.

Clearing vApp download tasks

Of course, at this step, back up the DB before proceeding with any record deletion!

Also note that the procedure below is not supported by VMware and must not be followed without their assistance!

You first have to identify the records to delete. To do so, run the following query:

select * from transfer_session

You will get something like this:

Then, for each transfer session (vApp download task), you must delete the relevant records.

Example:
To confirm the content of a specific vApp:
select * from dbo.resource_file where spool_dir = '/opt/vmware/vcloud-director/data/transfer/2a3ee224-60a8-4f64-b99b-84afade8f3e9'
To delete the matching transfer session:
delete from transfer_session where transfer_session_id = 0x2A3EE22460A84F64B99B84AFADE8F3E9
delete from dbo.resource_file and delete from transfer_session (without a WHERE clause) can of course be run too, but they remove every record in those tables and must be used with caution.

As you will have understood, dbo.resource_file.spool_dir = transfer_session.base_dir.
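In the example above, the staging folder name (2a3ee224-60a8-4f64-b99b-84afade8f3e9) and the transfer_session_id (0x2A3EE22460A84F64B99B84AFADE8F3E9) are the same GUID written in two notations. This Python sketch converts between them and prints the statements to review for one session. It assumes this mapping holds for every session, which I infer from this single example. It also assumes resource_file rows should be removed before their session row. The output is meant to be read and checked, not executed blindly. Function names are hypothetical.

```python
import uuid

STAGING_DIR = "/opt/vmware/vcloud-director/data/transfer"


def session_id_from_spool_dir(spool_dir: str) -> str:
    """'.../transfer/<guid>' -> '0x<GUID HEX>' as used in the example query."""
    folder = spool_dir.rstrip("/").rsplit("/", 1)[-1]
    return "0x" + uuid.UUID(folder).hex.upper()


def spool_dir_from_session_id(session_id: str) -> str:
    """Inverse mapping: '0x<GUID HEX>' -> staging folder path."""
    raw = session_id[2:] if session_id.lower().startswith("0x") else session_id
    return f"{STAGING_DIR}/{uuid.UUID(raw)}"


def cleanup_statements(spool_dir: str) -> list[str]:
    """Statements to review (not run blindly) for one transfer session.

    Assumption: resource_file rows are removed before their session row.
    """
    quoted = spool_dir.replace("'", "''")  # standard SQL quote escaping
    sid = session_id_from_spool_dir(spool_dir)
    return [
        f"select * from dbo.resource_file where spool_dir = '{quoted}'",
        f"delete from dbo.resource_file where spool_dir = '{quoted}'",
        f"delete from transfer_session where transfer_session_id = {sid}",
    ]


for stmt in cleanup_statements(STAGING_DIR + "/2a3ee224-60a8-4f64-b99b-84afade8f3e9"):
    print(stmt)
```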

Once all transfer sessions have been cleared, we can reset the QUARTZ and INV tables.

As a reminder, the QUARTZ tables store information about vCloud processes and tasks, and the INV tables store data about the vCenter inventory.
Among other reasons, we may have to clear them when the status of vCloud objects is out of sync with their actual status in vSphere.


Clearing QUARTZ and INV tables

Before going any further, we have to stop the cell(s) accordingly. I cover stopping vCloud in this post, under the “Implement vCloud certificates” section.

QUARTZ tables

DELETE FROM QRTZ_SCHEDULER_STATE;
DELETE FROM QRTZ_FIRED_TRIGGERS;
DELETE FROM QRTZ_PAUSED_TRIGGER_GRPS;
DELETE FROM QRTZ_CALENDARS;
DELETE FROM QRTZ_TRIGGER_LISTENERS;
DELETE FROM QRTZ_BLOB_TRIGGERS;
DELETE FROM QRTZ_CRON_TRIGGERS;
DELETE FROM QRTZ_SIMPLE_TRIGGERS;
DELETE FROM QRTZ_TRIGGERS;
DELETE FROM QRTZ_JOB_LISTENERS;
DELETE FROM QRTZ_JOB_DETAILS;

INV tables

DELETE FROM compute_resource_inv;
DELETE FROM custom_field_manager_inv;

DELETE FROM cluster_compute_resource_inv;
DELETE FROM datacenter_inv;
DELETE FROM datacenter_network_inv;
DELETE FROM datastore_inv;
DELETE FROM datastore_profile_inv;
DELETE FROM dv_portgroup_inv;
DELETE FROM dv_switch_inv;
DELETE FROM folder_inv;
DELETE FROM managed_server_inv;
DELETE FROM managed_server_datastore_inv;
DELETE FROM managed_server_network_inv;
DELETE FROM network_inv;
DELETE FROM resource_pool_inv;
DELETE FROM storage_pod_inv;
DELETE FROM storage_profile_inv;
DELETE FROM task_inv;
DELETE FROM vm_inv;
DELETE FROM property_map;

Conclusion

What to keep in mind:

1 – Make sure that every component of your environment is time-synced.
2 – Do not delete any file in the staging folder if the transfer session timeout has not been reached.
3 – If folders remain, double-check the transfer session timeout value and the transfer_session table of the vCloud database before deleting anything in the staging folder.
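Point 3 can be turned into a small pre-check. This is a minimal Python sketch under stated assumptions. The folder list and modification times are gathered separately, for example from a directory listing of the staging folder. The active IDs come from select * from transfer_session in the 0x… notation shown earlier. The timeout is in minutes. A folder's modification time stands in for its last activity. StagingFolder and safe_to_delete are hypothetical names. The result is a candidate list for manual review, not an automatic purge.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StagingFolder:
    name: str  # folder name under the staging directory (expected to be a GUID)
    last_modified: datetime  # assumption: used as a proxy for last session activity


def safe_to_delete(folders: list[StagingFolder], active_session_ids: list[str],
                   timeout_minutes: int, now: datetime) -> list[str]:
    """Folders with no matching transfer_session row AND older than the timeout."""
    # Normalize '0x2A3E...' database IDs to lowercase 32-char hex strings.
    active = {s.lower().removeprefix("0x") for s in active_session_ids}
    limit = timedelta(minutes=timeout_minutes)
    candidates = []
    for folder in folders:
        try:
            key = uuid.UUID(folder.name).hex
        except ValueError:
            continue  # not a GUID-named folder: leave it alone
        if key in active:
            continue  # still referenced by the database
        if now - folder.last_modified < limit:
            continue  # still inside the transfer session timeout window
        candidates.append(folder.name)
    return candidates
```

The `now` passed in should come from a time-synced clock, for the reasons given in point 1.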

As I write this post, I’m observing something weird: for some cancelled downloads, transfer session folders still exist in the staging folder even though there is no longer a transfer session record. Each of them contains a partial VMDK file (20 MB), and an error is logged in the vCloud logs. I’ve opened a case with VMware about this. Meanwhile, I’m going to check the vCenter logs too.

I’ll let you know ASAP.

PS:
I have not traced it yet as I did for a vApp, but I assume the mechanisms described here are the same for other catalog item downloads. I will post an update on this later.

I will try to write another post about the upload process soon.

Fresh update regarding the files remaining after a download cancellation: VMware support confirmed a bug and will try to fix it in vCloud version 9, expected by July 2017.

June 2nd, 2017 by