Know your I/O: Putting it together, Exchange upgrade


The time has come to upgrade the email system to Exchange 2010. What's more, this time you have to build in email archiving. Rumors of an 'unlimited email quota' have already leaked to the userbase, who have started sending you and your team candy and salty snacks to urge you to get it done faster. Clearly, the existing mail-quota regime is not loved. The email team has already figured out what they want for software and server count; they just want your opinion on how to manage the storage. Meetings are held, email is sent around. A picture emerges.

  • Current email data is 400GB, with a fairly consistent but shallow growth curve.

  • Number of users is 4500, which makes it around 91MB per user.

  • The 'free' webmails all have quotas measurable in gigs.

  • Long-standing Helpdesk policy for handling email-retentive users is to convince them that PST files do the same thing.

  • This is email. Any outage is noticed.

  • Depending on the point in the business cycle, internet-sourced mail volume comes to between 2 and 6 GB a day. Internally generated mail volume is unknown.

  • The Email Archiving product is integrated into Exchange. Mail older than a system-defined threshold is migrated to the Archive system and a pointer is left in the mailbox. Any deleted mail is migrated to Archive immediately.

  • Bitter experience has shown that recovering whole mail servers can take multiple days when it has to be done from tape backup. Hours is the new target.

  • New Helpdesk policy will be to convince people that PST files aren't trustworthy and to keep all their email they want to save inside their backed-up mailboxes.
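The per-user figure in the list above can be checked with a few lines of Python. This is only a sketch: the 400GB and 4500-user numbers come from the list, binary units (1GB = 1024MB) are assumed because that is what reproduces the 91MB figure, and the target mailbox sizes in the loop are hypothetical values chosen to show what 'quotas measurable in gigs' would mean at this user count.

```python
# Figures taken from the requirements list above.
CURRENT_DATA_GB = 400
USER_COUNT = 4500
MB_PER_GB = 1024  # assumption: binary units, which reproduces the ~91MB figure


def average_mailbox_mb(total_gb: float, users: int) -> float:
    """Average mailbox size in MB for a given store size and user count."""
    return total_gb * MB_PER_GB / users


def store_size_tb(users: int, mailbox_gb: float) -> float:
    """Total store size in TB if every user reached the given mailbox size."""
    return users * mailbox_gb / 1024


print(f"Average mailbox today: {average_mailbox_mb(CURRENT_DATA_GB, USER_COUNT):.0f} MB")

# Hypothetical webmail-like mailbox sizes, for comparison only.
for mailbox_gb in (1, 2, 5):
    print(f"{mailbox_gb} GB per user -> {store_size_tb(USER_COUNT, mailbox_gb):.1f} TB total")
```

Even the smallest of those hypothetical sizes is roughly ten times the current store, which is why the growth estimate later in the analysis matters so much.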

And now for the analysis.

  • Read/Write Percentage: I/O monitoring on the existing Exchange 2007 system suggests a 70/30 read/write ratio. The log files are 100% writes, of course. On the Archiving side, the ratio is predicted to be 20/80, as very little old email is actually accessed.

  • Average and Peak I/O: Peak I/O happens during the full database backups, and dwarfs Average I/O by a factor of 10. On the archiving side, the factor is even higher.

  • I/O Access Type: Highly random, and typically pretty small individual accesses. Queuing volumes are very highly transactional. Log files are constantly updated. On the Archiving side, significantly random but mostly writes.

  • Latency Sensitivity: Significant. Outlook Cached-Mode shields users from quite a lot, but they do notice when it takes longer than 15 seconds to send an email to someone across the cube-wall. As Exchange is DB backed, slowdowns in the transaction logs slow down the entire system, so those are very highly latency sensitive. On the Archiving side, reads via Outlook need to be fast and are proxied through the Exchange system itself.

  • Storage Failure Handling: The latency sensitivity of the transaction logs suggests that the system has low tolerance for failure-induced slowdowns. The mailbox databases themselves have a higher tolerance, but not overly so. On the Archiving side, as the amount of read access to that system is predicted to be much smaller than on the 'online' email system, tolerance for slowdowns is higher.

  • Size and Growth: PRIMED FOR EXPLOSIVE GROWTH. Email growth has been repressed through draconian email quotas, which are now being removed. Users are used to GB-sized mailboxes on their private email systems. Some mailbox DB space will be liberated when the new Archiving system comes online and removes the 6+ month old email. Plan on no email being deleted for 6 months. Take 180 days at 3GB/day for internet-sourced email plus, call it, 6GB/day for internal-sourced email, and you have 9GB/day x 180 = 1,620GB, or about 1.6TB, for just your online mailbox databases, and constantly growing as the average email size increases. At the same 9GB/day, the Archive system would grow about 3.2TB a year for the first year.
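The Size and Growth arithmetic can be laid out as a short Python sketch. The 3GB/day and 6GB/day rates and the 180-day window are the figures from the analysis; binary units are an assumption, and the low/high scenarios assume internal mail stays at twice the internet-sourced volume, which is only an extrapolation of the 'call it 6GB/day' guess and not something the monitoring data supports.

```python
GB_PER_TB = 1024  # assumption: binary units


def retained_tb(internet_gb_day: float, internal_gb_day: float, days: int) -> float:
    """Space consumed if nothing is deleted for the given number of days."""
    return (internet_gb_day + internal_gb_day) * days / GB_PER_TB


# Figures from the analysis: 3GB/day internet, 6GB/day internal, 180 days online.
online_tb = retained_tb(3, 6, 180)
# A full year at the same daily rate lands in the Archive.
archive_first_year_tb = retained_tb(3, 6, 365)

print(f"Online mailbox databases: {online_tb:.1f} TB")
print(f"Archive growth, first year: {archive_first_year_tb:.1f} TB")

# Sensitivity check across the observed 2-6GB/day internet range.
# Assumption: internal volume is twice the internet-sourced volume.
for internet in (2, 6):
    print(f"{internet} GB/day internet -> {retained_tb(internet, 2 * internet, 180):.1f} TB online")
```

The spread between the low and high scenarios is a reminder that the unknown internal mail volume dominates the estimate, so the sizing should leave room to grow.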

The three main attributes driving the storage system are: size of the entire system, latency tolerance, and disaster-recovery engineering. The average I/O and I/O access types of the online system strongly suggest 15K SAS drives. On the Archive side, the transaction logs should be on 15K SAS, but the data volumes could survive on 7.2K SAS.

The existing storage infrastructure includes several SAN-based storage arrays. The existing email system has been on the fastest one (FC-based, not SAS) and never suffered a fault. Analyzing the usage of the existing FC array shows plenty of head-room in controller CPU and disk queue lengths. 2TB of space will be needed on this system if it will house the online Exchange mailbox databases. RAID5 is sufficient, and rebuilds have not affected I/O performance on this system so far.
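One way to sanity-check the 'RAID5 is sufficient' call is the common write-penalty rule of thumb, sketched below in Python. The penalty values are general storage-planning conventions rather than anything measured on these arrays, and the front-end IOPS figure is a hypothetical placeholder since the article gives no absolute I/O numbers; only the 70/30 and 20/80 read/write ratios come from the analysis.

```python
# Rule-of-thumb back-end I/Os generated per front-end write.
# General planning conventions, not measurements from these arrays.
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}


def backend_iops(frontend_iops: float, read_percent: int, raid_level: str) -> float:
    """Estimate disk-level IOPS for a front-end load and read percentage."""
    reads = frontend_iops * read_percent / 100
    writes = frontend_iops * (100 - read_percent) / 100
    return reads + writes * WRITE_PENALTY[raid_level]


ASSUMED_FRONTEND_IOPS = 1000  # hypothetical placeholder, not from the article

# Online system: 70/30 read/write ratio from the analysis.
online = backend_iops(ASSUMED_FRONTEND_IOPS, 70, "RAID5")
# Archive system: predicted 20/80 read/write ratio.
archive = backend_iops(ASSUMED_FRONTEND_IOPS, 20, "RAID5")

print(f"Online on RAID5:  {online:.0f} back-end IOPS")
print(f"Archive on RAID5: {archive:.0f} back-end IOPS")
```

Under this rule of thumb the write-heavy Archive load costs far more disk I/O per front-end request than the online load, which is worth keeping in mind when looking at arrays whose disk queues are already showing saturation.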

Another array containing a mix of 7.2K SATA and smaller 7.2K SAS drives also exists. The reliability of the SAS drives meets the reliability demands of the application, and that's what the Email Admins want to use. However, they'll need 6TB of it to start with and the ability to add more as mail grows ever larger. Analysis shows that existing controller CPU demands are minimal, but disk queue lengths are showing signs of periodic saturation.

Exchange has some disaster-recovery mechanisms built into it, which the Email Admins opt to use instead of array-based mirroring. This will require mirroring the online database at a remote site. This remote site has a single storage array populated with 7.2K SATA drives that is already showing signs of regular saturation, and its performance tanks when doing a rebuild.

The existing Fibre Channel drive-based storage array has enough room to handle the new online mail system. The SAS/SATA one will require the purchase of new 7.2K SAS disks to dedicate to the Archive system. The controller on this second system has enough horsepower to drive the added disks, and the new disks should not run into I/O contention with the already busy ones. The DR site will require the purchase of a brand new disk array, with 7.2K SAS disks being the most probable choice.

The Archive systems will have their Transaction Log volumes on the FC SAN, and their data volumes on the SAS SAN. The Online system will have both transaction and data volumes on the FC SAN. The DR system will use periodic log-shipping, and keep both volume types on the local SAS disks.
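The final placement can be summarized as a small lookup table, sketched here in Python. The tier labels and the key names are made up for illustration; the mapping itself just restates the paragraph above.

```python
# Volume placement as described above. Keys are (system, volume type);
# the tier names are illustrative labels, not real array names.
LAYOUT = {
    ("online", "transaction_logs"): "FC SAN",
    ("online", "data"): "FC SAN",
    ("archive", "transaction_logs"): "FC SAN",
    ("archive", "data"): "SAS SAN",
    ("dr", "transaction_logs"): "DR-site SAS array",
    ("dr", "data"): "DR-site SAS array",
}


def volumes_on(tier: str) -> list:
    """List the (system, volume type) pairs placed on a given storage tier."""
    return sorted(key for key, placed_on in LAYOUT.items() if placed_on == tier)


for tier in ("FC SAN", "SAS SAN", "DR-site SAS array"):
    print(tier, "->", volumes_on(tier))
```

Laid out this way, it is easy to see that every latency-sensitive production transaction log lands on the fastest array, while the bulk Archive data sits on the cheaper, expandable one.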



2 Comments

You should check the Exchange 2010 SP1 information just released:
http://msexchangeteam.com/archive/2010/04/07/454533.aspx

There are some very nice changes to archiving that you may be interested in.