Migration headaches

Today marked the day we cut over the largest volume we have, the Fac/Staff shared volume. That's 1.9TB of 'disorganized file data' (a.k.a. a bog-standard file server) to migrate. This is the last of the major volumes to move, and that was intentional: by now we have the process down. Unfortunately, a wrench was thrown into the works. But before I get to the wrench, here's how we migrated this puppy from NetWare.
  1. At M-18 days, we performed an initial sync of the data via robocopy.
  2. At M-17 days, when the first sync completed (it took about 29 hours), we performed a delta sync.
  3. At M-16 days, we performed another delta sync, 24 hours after the previous one, so we could get a feel for how long a daily 'copy the changed files' job would take.
  4. At M-16 days, we created a daily copy job (robocopy source dest /mir /r:1 /xo /log:e:\somewhere).
  5. At M-14 days, we performed the rights migration and opened the new share to everyone with sufficient rights to change permissions on the volume, telling these people to fix any broken rights on the Microsoft share.
  6. At M-12 days, after feedback from the techs, we released guidance on how to reorganize directories to work better with Microsoft permissions.
  7. From M-12 to M-1 days, technicians reorganized data and re-permissioned it as needed, with our assistance.
  8. At M-12 hours, we ran a delta sync.
  9. Migration: change the login scripts and kick off the terminal delta sync to pick up the net change.
  10. M+2 hours: 8am arrives, the script is done, and so are we. Yay! Start working problems as they are reported.
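
For anyone wanting to script the daily copy job from step 4, here is a minimal Python sketch that wraps the same robocopy command so it can be run from Windows Task Scheduler. It assumes Python 3 on the Windows machine doing the copying; the function names and any paths you pass in are hypothetical, and only the /mir, /r:1, /xo and /log: flags come from the command above. It relies on robocopy's exit-code convention: values below 8 mean success (possibly with files copied or extras purged), while 8 or above mean something failed to copy.

```python
from __future__ import annotations

import subprocess


def build_robocopy_args(source: str, dest: str, log_path: str) -> list[str]:
    """Build the argument list for the daily mirror job from step 4."""
    return [
        "robocopy",
        source,
        dest,
        "/mir",              # mirror the tree, purging destination files gone from the source
        "/r:1",              # retry a failed file only once
        "/xo",               # skip source files older than the destination copy
        f"/log:{log_path}",  # write the log we trawl through afterwards
    ]


def run_daily_sync(source: str, dest: str, log_path: str) -> bool:
    """Run the mirror job once; return True if robocopy reported no failures.

    robocopy exit codes 0-7 are success variants; 8 and above mean
    at least one file or directory could not be copied.
    """
    result = subprocess.run(build_robocopy_args(source, dest, log_path))
    return result.returncode < 8
```

A call such as run_daily_sync(r"\\nwserver\facstaff", r"\\winserver\facstaff", r"e:\logs\daily.log") (hypothetical server names) returns False whenever robocopy reports a failure, which is the cue to go read the log.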
The problem occurred between steps 8 and 9. One department decided that migration night was the perfect time to reorganize over 150GB of data; they would have struggled to find a worse time for it. As a result, the terminal delta sync in step 9 is going to take far, far longer than the two hours budgeted.
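
A back-of-the-envelope estimate shows why. The initial sync moved 1.9TB in about 29 hours, roughly 65GB per hour. robocopy has no notion of a moved file: a file relocated during a reorganization shows up as a new file to copy at its new path plus an extra file to purge at its old one, so the whole 150GB has to be copied again. The Python sketch below assumes throughput matches the initial sync and takes 1TB as 1000GB; the function name is hypothetical.

```python
def estimate_hours(changed_gb: float,
                   baseline_gb: float = 1900.0,
                   baseline_hours: float = 29.0) -> float:
    """Estimate copy time by scaling the initial full sync.

    The defaults are the initial sync: 1.9TB (taken as 1900GB) in about
    29 hours. Assumes throughput stays the same and that every
    reorganized byte has to be copied again.
    """
    gb_per_hour = baseline_gb / baseline_hours  # roughly 65.5GB per hour
    return changed_gb / gb_per_hour


print(f"{estimate_hours(150):.2f} hours")  # prints "2.29 hours"
```

So the reorganized data alone eats the entire two-hour window before the rest of the night's changes, the purges, or any retries are counted.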

The problem here is that when people start logging in at 8am, not all of their data is there. Some people worked right up until the M-12 hour mark reorganizing data and were surprised to find it wasn't on the new system yet. Their directories come alphabetically after the department that moved 150GB of data last night, so they hadn't been synced yet. So they're seeing and working with old files while the new ones copy in.
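
The ordering effect can be modeled in a few lines. This Python sketch assumes, as observed above, that the sync walks top-level directories one at a time in alphabetical order; the department names, their sizes, and the function name are all hypothetical, and the 65GB per hour rate is roughly the initial sync's 1.9TB in 29 hours.

```python
from __future__ import annotations

from itertools import accumulate


def arrival_hours(changed_gb_by_dir: dict[str, float],
                  gb_per_hour: float) -> dict[str, float]:
    """Hours after the sync starts at which each top-level directory is done.

    Assumes the sync walks top-level directories one at a time in
    alphabetical order, so each directory waits for every one before it.
    """
    names = sorted(changed_gb_by_dir)
    durations = [changed_gb_by_dir[name] / gb_per_hour for name in names]
    return dict(zip(names, accumulate(durations)))


# Hypothetical departments: "Chemistry" plays the one that moved 150GB.
for dept, hours in arrival_hours({"Biology": 2, "Chemistry": 150, "History": 1}, 65).items():
    print(f"{dept}: {hours:.2f} h")
```

History has only 1GB of changes, yet it lands about 2.35 hours in, because it sits behind Chemistry's 150GB.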

My worry is PST and MDB files, which tend to be open all day. The copy script can't replace these files while they're open, so their owners will in effect suffer data loss because of this department. There is not much we can do about that. We can trawl through the log file for the files that failed to copy because they were locked and hand-copy them once the locks are cleared, but that overwrites the stale copies people have been working in all morning, so they'll lose whatever they committed to those files in the meantime. So these files? There WILL be data loss, guaranteed.
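
Trawling the log for locked files is easy to script. The Python sketch below assumes robocopy's usual error-line layout (timestamp, then ERROR, the Windows error code, its hex form, the action and the path), so check the pattern against a real log before trusting it. Windows error 32 is the sharing violation raised when another process has the file open; the function names are hypothetical.

```python
import re

# Assumed layout of a robocopy error line (timestamp omitted here):
#   ... ERROR 32 (0x00000020) Copying File \\server\volume\dept\mail.pst
ERROR_LINE = re.compile(r"ERROR (\d+) \(0x[0-9A-Fa-f]+\) (.+?) ((?:[A-Za-z]:|\\\\).*)$")

SHARING_VIOLATION = 32  # Windows: file is in use by another process


def failed_paths(log_lines, error_code):
    """Unique paths logged with the given Windows error code, sorted.

    A file that fails and is retried appears more than once in the log,
    so the paths are collected in a set.
    """
    paths = set()
    for line in log_lines:
        match = ERROR_LINE.search(line.strip())
        if match and int(match.group(1)) == error_code:
            paths.add(match.group(3))
    return sorted(paths)


def locked_files(log_lines):
    """Files (typically open PST/MDB files) skipped because they were locked."""
    return failed_paths(log_lines, SHARING_VIOLATION)
```

Passing an open log file (any iterable of lines) to locked_files gives the hand-copy list, and failed_paths(lines, 5) does the same for 'access denied' errors.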

The other problem we ran into is that one department set up their rights to lock us godlike admins out of certain directories, something you can do on Microsoft filesystems since there is no equivalent to Novell's "Supervisor" trustee right. We didn't notice this until step 9, when the log files filled up with 'access denied' errors, each triggering a 30-second retry wait that further delayed the terminal sync script. Obviously, those files will not get synced.
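
What those errors cost is simple arithmetic. The step 4 command uses /r:1, and robocopy's default wait between retries is 30 seconds (/w:30), so each item that fails with 'access denied' holds the sync up for half a minute before robocopy moves on. A minimal Python sketch; the function name and the failure count in the example are hypothetical.

```python
def retry_delay_seconds(failures: int, retries: int = 1, wait_seconds: int = 30) -> int:
    """Time robocopy spends waiting on items that fail on every attempt.

    Defaults match /r:1 with robocopy's default /w:30: one 30-second
    wait per failing file or directory before it is given up on.
    """
    return failures * retries * wait_seconds


# Hypothetical: 120 locked-out items in the terminal sync.
print(retry_delay_seconds(120) / 3600, "hours")  # prints "1.0 hours"
```

The wait is set by robocopy's /w: option, so a shorter value speeds up runs where failures are permanent, as access-denied errors are.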

I hate hate hate it when this kind of thing happens.

4 Comments

It sure sounds like you must have been busy :-)

I just have one question: why did you not isolate the (old) file server (in a VLAN or with firewall rules, for instance) so that your clients could not interfere with the migration?

Hello

I'm looking for the scripts you used in this migration. Can you point me in the right direction?

Thank you