It wouldn’t be accurate to say “you can’t”, but Microsoft doesn’t make it easy.
Whether you’re moving mailboxes or PST data to Office 365, your imports are throttled; that is, Microsoft imposes a limit on how fast you can move information into their data centers. The exact speed of your import will vary according to a number of factors: the protocol (IMAP4, MAPI, or EWS) you’re using, the migration tool you’re using and how many concurrent threads it can spin up, how busy the target data center is, and the mix of item sizes in the mailboxes or PSTs you’re importing.
The problem with this throttling is that it’s largely opaque. Although Microsoft publishes “observed data,” my own observations have shown that migration throughput can vary widely based on these factors and a bunch of others besides, possibly including the phase of the moon and whether you have recently said anything disparaging about Microsoft anywhere on the Internet.
Recently I had a customer who wanted to migrate 30TB of PST data to Exchange Online Personal Archives. While this might sound ridiculous, it makes perfect sense given that Office 365 E4 plans include an unlimited-size Personal Archive for each mailbox. That’s a hard deal to beat… if you can figure out how to get the data in. At one point, in a fit of frustration, we asked Microsoft whether we could just send them a bunch of disk drives containing the PSTs. “Of course not,” they said (with “silly boy” being the unspoken coda to that phrase). But it turns out that Azure now offers bulk data import from disks you ship to them: the Windows Azure Import/Export Service is now in preview. With any luck, we’ll see a similar service from Office 365 in the not-too-distant future. And when it happens, remember, Andy Tanenbaum had the idea first.
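To put the 30TB figure in perspective, here is a rough back-of-the-envelope sketch of how long an upload takes at a given sustained rate. The rates in the loop are purely hypothetical placeholders, not Microsoft’s published or observed numbers, and `transfer_days` is just a helper name I made up for illustration; plug in whatever throughput you actually measure.

```python
def transfer_days(total_tb: float, gb_per_hour: float) -> float:
    """Days needed to move total_tb terabytes at a sustained gb_per_hour.

    Assumes 1 TB = 1024 GB and a constant rate with no pauses or retries,
    which is optimistic for any throttled migration.
    """
    if gb_per_hour <= 0:
        raise ValueError("gb_per_hour must be positive")
    total_gb = total_tb * 1024
    hours = total_gb / gb_per_hour
    return hours / 24


if __name__ == "__main__":
    # Hypothetical sustained rates in GB/hour, for illustration only.
    for rate in (1, 5, 20):
        days = transfer_days(30, rate)
        print(f"30 TB at {rate} GB/hour: about {days:.0f} days")
```

Even at the most generous of these made-up rates, the job runs for a couple of months, which is why a box of disks in the mail starts to look attractive.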
3 responses to “The future of importing large quantities of Exchange data to Office 365?”
This is becoming a real issue at every client moving to Office 365. The performance observed moving mail data is awful. If you call Microsoft, they will always tell you that the issue resides in the client network. It is very frustrating when you know they are wrong or lying to you. I have heard of a customer doing the same kind of migration you are working on, moving archive data to Office 365, and they were so disappointed in the time it took to migrate that they backed out and cancelled the entire project. Microsoft called to ask what happened with this customer, and we could only throw our hands up and say it was Microsoft’s fault.
I have seen data transfer improve at one client by throwing a lot more memory at the hybrid server. Once the memory was increased to 24GB, we saw a much higher transfer rate. But that was only at one client; throwing more hardware at the hybrid server at other locations has not helped.
Personally, I believe that Microsoft is still migrating data from Wave 14 to Wave 15 servers and we are dealing with that contention. They will not admit to it, but I think that may be the case. The frustrating part is that one of their primary cloud principles is that you cannot audit or monitor Office 365 to see where the bottleneck lies. So we get to keep banging our heads against the wall until they get this straightened out.
“Your mileage may vary” seems to be the party line from the Office 365 team. It’s fair to say that there are a lot of factors that affect throughput, so I can see that making a hardware change might help at site A but not at site B, where hardware capacity isn’t the bottleneck. I’d like to see better estimating and progress reporting tools from Microsoft, so that I have some way to tell customers roughly how long *Microsoft* thinks it will take to ingest a given volume of data, especially once it’s started. This is going to be a hot topic of discussion at the MVP summit and at MEC, I suspect.
There are a lot of variables that affect the transfer rate, but you can make sure everything is configured right and it is still slow as heck. It is just painful how slow it can be. The recommendation is also to add more CAS servers to increase performance, but again, this doesn’t always help. Better estimating and reporting tools, as you mentioned, would help a lot with setting expectations instead of waiting and wondering.