How are cloud storage companies handling the “first backup” problem: the multiple terabytes or petabytes that need to be migrated to the cloud initially?
The incremental part of the process is a no-brainer.
One solution that we use at Bycast is to deploy two or more edge servers with attached storage at the customer's premises, and allow them to perform the bulk ingest over the REST API (or via CIFS/NFS). When the ingest is complete or nearly complete, the on-disk object storage repository can be physically shipped and integrated into the "cloud", with subsequent transactions performed over the network.
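As a rough illustration of that ingest step (this is a generic sketch, not Bycast's actual API; the `upload` callable stands in for whatever REST PUT the edge server exposes), the key idea is to record a checksum manifest while the data is being ingested, so there is something to audit against later:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large objects never need to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def bulk_ingest(source_dir: Path, upload) -> dict:
    """Ingest every file under source_dir via the supplied upload callable
    (hypothetically, an HTTP PUT against the edge server's REST API) and
    return a manifest of object key -> checksum for the later audit step."""
    manifest = {}
    for path in sorted(source_dir.rglob("*")):
        if path.is_file():
            key = path.relative_to(source_dir).as_posix()
            upload(key, path)  # placeholder transport; swap in the real API call
            manifest[key] = sha256_of(path)
    return manifest
```

The manifest is the piece that makes the physical-shipping handoff safe: it travels separately from the disks, so the receiving side can verify what arrived.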
Another approach, taken by EMC's Mozy backup service, is what they call "data seeding": you purchase a 2 TB USB drive from EMC, copy your data onto it, then ship it back to EMC to get the process going. I couldn't find any current reference to this capability on their web site, so it may no longer be a supported feature.
With such hybrid models, you need the software intelligence to ensure that the data is always accessible via the cloud API, always protected from loss, corruption, and unauthorized disclosure, and fully audited, so you know that every object ingested locally actually made it into the cloud.
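That audit step can be as simple as diffing the checksum manifest built at ingest time against the cloud's object listing. A minimal sketch, assuming both sides can be flattened to key -> checksum mappings:

```python
def audit(local_manifest: dict, cloud_listing: dict) -> dict:
    """Compare the manifest recorded during local ingest against the cloud's
    object listing (both mapping object key -> checksum). Anything missing or
    mismatched must be re-sent over the network before the migration is
    declared complete."""
    missing = [k for k in local_manifest if k not in cloud_listing]
    corrupt = [k for k in local_manifest
               if k in cloud_listing and cloud_listing[k] != local_manifest[k]]
    return {"missing": missing, "corrupt": corrupt,
            "ok": not missing and not corrupt}
```

The point is that the shipped disks are never trusted blindly: the migration isn't done until the audit report comes back clean.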
Despite the associated complexity, this is a very powerful approach, as one should never underestimate the bandwidth of a 747 full of disks or tapes.
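The back-of-the-envelope arithmetic bears this out. The figures below are illustrative assumptions (three days door to door for a shipped petabyte, a fully utilized 1 Gbps link), not measurements:

```python
def effective_bps(bytes_total: float, seconds: float) -> float:
    """Effective throughput of moving bytes_total in a given wall-clock time."""
    return bytes_total / seconds

PB = 10 ** 15        # one petabyte, in bytes
DAY = 24 * 3600      # seconds per day

# Assumption: shipping 1 PB of disks takes about 3 days door to door.
shipped = effective_bps(1 * PB, 3 * DAY)

# A dedicated 1 Gbps link, fully utilized, in bytes per second.
gigabit = 1e9 / 8

# How long the same petabyte would take over the wire, in days.
days_over_wire = (1 * PB / gigabit) / DAY
```

Under these assumptions the courier works out to roughly 30 times the throughput of the gigabit link, and the over-the-wire transfer takes around three months, which is exactly why the seeding approaches above exist.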