October 2006 Archives

Spam numbers

| 3 Comments
The following came out in the "Academic Technology News" yesterday:
To put the new spam filtering in perspective, consider the following: WWU errs on the side of caution to ensure that we do not filter any legitimate email; even with this 'cautionary' configuration more than 80% of all inbound email to campus is filtered out of our email system as known spam, compared to around 65% with our previous solution. In terms of numbers, that means that the staggering number of 1.3 million spam emails are filtered from your incoming mail each day.

Number of emails received: 1.6 million
Number of messages filtered: 1.3 million
Number of messages delivered: 0.3 million
There you have it. 1.6 million messages a day! Our Exchange system has around 6000 email accounts.


Tags:

Shaking the cane at those darned kids

Looks like IE7 has the same sort algorithm for the URL-bar drop-down that Firefox does. I've kvetched about this before, and it is one of the main reasons I use SeaMonkey for my primary browser. If Opera converts to an age-based sort versus the most-recently-used sort that's been standard for over a decade, I may have to grit my teeth, shake my cane, and get used to it.

*sigh*

Changes to the Novell Rewards program

I just got a mail from the company that manages CoolSolutions. They're changing how they handle the points you earn for submitting articles, tips, and tools. Quote:
Beginning December 31, 2006, points will expire after one year of account inactivity. Point balances of members who have had no activity within the last calendar year (2006) will expire on Dec. 31, 2006.

Points will not expire as long as you participate in one of the following activities at least once a year:

- Earn points by participating in any of the opportunities listed at Novell Rewards
- Redeem points by requesting an award at http://www.novell.com/company/rewards

If you have any questions, please feel free to e-mail us at rewards@webwiseone.com
So. Looks like they're trying to clean up a bit. People who haven't submitted anything recently but are high contributors will get edged out. There are a few in the top 20. Me? My last submission was an AppNote in August, so I'm in good shape.

One more AppNote, and I can get that 80GB iPod. Mmmm. Or maybe those Bose QuietComfort 3 headphones.

Tags: ,

More brainshare possibilities

The openSUSE roadmap says that openSUSE will release in early December. That's enough time for SLES 10.2 to release on or around BrainShare.

So. Two big topics:
  • Zen: The Next Generation
  • SLES 10.2 or possibly, SLES 10 sp1?
Tags: ,
Anandtech ran an article recently about enterprise storage. In it they go over SATA vs. SCSI vs. SAS. Most of it I already knew, but towards the back was a kernel of information that I hadn't caught before.

We know that generally speaking SATA drives can't quite keep up to the same kind of workloads that SCSI can. Differences in the manufacturing process, quality control, and the like. I don't fully understand it, which irks me, but there it is. One of those areas is something called 'nonrecoverable read error' rate.

Take a look at this Seagate drive. It's almost the last thing on the spec page. The Nonrecoverable Read Error rate is 1 bit in 10^14 bits, or 1 bad read in 12.5TB. Mainline SCSI and FC drives are rated better, at 1 bit in 10^15 or 10^16. For the SATA drive that means, on average, every 12.5TB of data transferred includes one unreadable bit.

We don't see this as a problem in most enterprise situations because they all run in some form of redundant array setup. RAID5 drivers, usually in the RAID controller, see the bad bit and go to the parity data to fill in the real value. RAID1 drivers go to the mirror. No biggie. The problem comes with RAID5 rebuilds, when the entire array is read in order to regenerate the data for the replaced drive. If you have 14 500GB drives in your RAID5 array, that means during a rebuild you transfer around 7TB of data. If a bad bit shows up during the rebuild process (7TB out of 12.5TB, so roughly a 56% chance by a simple linear estimate), game over. That's a from-tape rebuild.
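To put a number on that rebuild risk, here is a rough sketch of the arithmetic. It assumes unreadable bits show up independently at the spec-sheet rate (my assumption, not something the drive vendor promises), and the function name is made up for illustration. The 7TB/12.5TB ratio is really the expected number of bad bits; under the independence assumption the chance of hitting at least one comes out somewhat lower.

```python
import math

def rebuild_read_error_odds(drives, drive_tb, bits_per_error=1e14):
    """Return (expected_errors, probability_of_at_least_one) for a rebuild
    that reads `drives` drives of `drive_tb` decimal terabytes each.

    Assumes independent bit errors at the quoted rate (an assumption).
    """
    bits_read = drives * drive_tb * 1e12 * 8
    expected = bits_read / bits_per_error
    # Poisson approximation: P(at least one error) = 1 - e^(-expected)
    probability = 1.0 - math.exp(-expected)
    return expected, probability

# The 14 x 500GB example from the text, at the SATA rate of 1 in 10^14.
expected, probability = rebuild_read_error_odds(drives=14, drive_tb=0.5)
print(f"expected unreadable bits: {expected:.2f}")
print(f"chance of at least one:   {probability:.0%}")
```

Passing `bits_per_error=1e15` models the SCSI/FC rating, which drops the risk by roughly a factor of ten for the same array.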

This is why systems such as RAID6 are showing up. That's a double parity system, so rebuilding one bad disk does not risk the whole array if a nonrecoverable read error occurs. You lose two disks to parity, but you can still have a 30-disk array without much risk.

One more reason why SATA isn't quite ready for realtime data applications. Nearline, yes, but not realtime. This'll play hob with our ideas for our BCC cluster.

Tags:

MSA, mirroring, and the NetWare cluster

Mirroring WUF is probably the easiest thing we'll do, once we get the fibre interconnect between the local SAN and the BCC SAN. Setting up the software RAID devices in NetWare is a fairly simple thing to do, and will immediately be integrated into the cluster. It was a method like this that I used to migrate the SOFTWARE volume from a direct-attach on FACSRV2 to be on the SAN.

That said, there are some design considerations to take into account. SAN best-practices documents at both HP and Novell (and Microsoft) say it is better to create many LUNs than it is to use a few big LUNs. This is the practice we use in the Exchange cluster. The reasoning behind this is to allow the operating system to queue IO operations across many LUNs rather than stack them all up behind a few LUNs, which has the ultimate effect of making IO more efficient on the SAN device. I propose that we follow this practice when partitioning out the MSA.

The LUNs we create on the MSA for use in WUF will have a 64K stripe-size. This is the stripe that best supports file-server loads. For comparison, the stripe-size in the EVA is an unmodifiable 128K.

MSA guidelines strongly recommend against RAID5 arrays larger than 14 drives, which limits us to Drive Arrays of 6.5TB or smaller. Each drive-array we create loses us a drive for parity. Also, I'd like to designate one drive per shelf to be a hot-spare. This leaves us with 22 drives for use as storage.

Right this moment WUF has just over 6TB allocated to it which is almost to the max for a single drive array.

Tags: ,

MSA, Mirroring, and NetWare

The performance testing is largely done.

In short, Mirroring performance matches or is within a few percentage points of EVA performance. Mirror performance follows the slowest device, which in all tests is the EVA. Since the EVA is the benchmark by which we compare production performance, this tells me that we can safely expect to mirror at least some of the EVA data on the MSA.

MSA performance exceeds EVA performance significantly. This is NOT true for the same tests on a Windows server running locally. I can't theorize why this might be, but the data show it quite well.

Unfortunately, I lack the resources to do a TRUE concurrency test. I can't tell you how MSA vs EVA performs when 50 workstations are pounding random IO. From what I've seen, EVA should turn in better numbers in that case due to technological differences. On the other hand, EVA should have turned in faster numbers than it did in this single-streamer test.

Tags: ,

NetWare and Hyperthreading

| 1 Comment
It has long been the consensus view in the Novell support forums that Hyperthreading does nothing for NetWare. In fact, it can hurt, so turn it off when you can. I agree.

During this MSA testing I have a chance to see what it can do for me. The test server is brand new with dual processors. As expected, four present themselves when booted, two of which are the logical processors HT gives you. So I ran some IOZONE tests at various combinations of active processors to see what it does for me with a software-mirrored volume.
  • With all four processors live I get a throughput of 39MB/s, and CPU load around 7%
  • With one real processor and one logical processor I get a throughput of 32MB/s, and CPU load around 25%
  • With one lone processor I get a throughput of 38MB/s and a 45% CPU load...
  • With two real processors I get a throughput of 40MB/s and a CPU load around 7%
The reason HT is mostly meaningless for NetWare is that for most places NetWare is used it doesn't help. For certain tasks like GroupWise reindexing, HT could help, but HT won't do a thing to help I/O. The above chart shows that. While it is possible that HT doesn't do any harm, it is quite possible that it can.

The best performance was had when there were two real CPUs in the system. Two CPUs with HT on gave us somewhat slower performance, but not the 66% shown by the link above. That could also be due to timing. The test with one real and one logical processor showed noticeably slower results, and is a better example of how HT can hurt.

The interesting thing to me is CPU loading. With a single real CPU loading was around 45%. Add a second real CPU and loading dropped more than half to 7%. Clearly, the I/O stack on NetWare 6.5 Sp5 is multi-threaded to a large degree. Interrupts can spread between the two CPUs, and that alone could account for most of the performance improvements.

As for the greater MSA test, the data are in, I just need to spend time crunching it. At first glance I see two trends:
  • Mirrored-Write performance follows the curve of the slowest device in the mirror
  • Mirrored-Read performance follows the curve of the fastest device in the mirror
Tags: ,

The power of Rainbow Tables

| 1 Comment
The concept of Rainbow Tables isn't new, but implementations of it are relatively new. Part of that is because computing power has increased enough that they're now feasible. Plus, with distributed computing skills increasing in the marketplace, throwing a hundred CPUs' worth of spare cycles at the problem is quite doable. You can even purchase them.

Since L0phtcrack has been discontinued, the Open Source community has come up with several replacements. One of which is Ophcrack, which even ships a LiveCD that in their words:
Just put it into the CD-ROM drive, reboot and it will try to find a Windows partition, extract its SAM and start auditing the passwords.
Not useful for remote network intrusions, but perhaps useful in a home-setting for recovering lost passwords. Yet another reason for physical security with your servers.

Passwords in the easy character sets, the ones that show up on the keyboard, are readily available. For reference:

0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ (including the space character)

That's upper-case, lower-case, numbers, and shift-symbols: the four most common character sets. I know of projects which are working on rainbow tables for the full ASCII table (remember [alt]+205 and the like: ═). Those are going to be for-purchase or sponsored sites, as those tables will be well over 1TB in size. It won't be long.

Of course, if you happen to possess a full-ASCII rainbow table, the biggest computational task to crack a password will be parsing through the 1.7TB (or whatever) of rainbow table for the correct hash. That will take a few minutes, but certainly much less time than hammering away at it one hash at a time with LCP or John the Ripper.

Another thing to keep in mind is that LanMan hashes are, still, the easiest to crack. NTLM hashes are also crackable, but require more CPU horsepower so the available tables for NTLM hashes are still fairly simple. I haven't seen any for NTLMv2. Another in a bucket-full of reasons for setting up your Windows domains and machines to only use NTLMv2.

Also of note, Rainbow Tables are available for MD5 as well as LanMan. One of the password crypt options on Linux is MD5, though that is typically salted, which renders purchased Rainbow Tables useless. That is not so true with things like embedded devices that use unsalted MD5, or other software-based authentication systems such as commercial FTP servers.

Salting your hashes means that purchased Rainbow Tables are useless. On the other hand, if given enough time a hacker who obtained access to the password hashes will be able to solve all possible hashes on a server. Regardless of whether or not you changed your passwords after the intrusion. This sort of attack is different than the kind of /etc/passwd stealing we've had going on for a couple decades now, in that once the table is computed there ARE no safe passwords until you change the salt. Assuming ready access to the password file, of course.
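A toy illustration of why a salt defeats a purchased table, and why a table built for one known salt still works. This is only a sketch: a plain dictionary stands in for a real rainbow table (which uses hash chains to trade time for space), salt-plus-password concatenation stands in for the real crypt MD5 scheme, and the salt value and candidate list are made up.

```python
import hashlib

def md5_hex(password, salt=""):
    """Hex MD5 of salt + password. Plain concatenation is a simplification;
    the real MD5 crypt scheme is more involved."""
    return hashlib.md5((salt + password).encode("utf-8")).hexdigest()

# Stand-in for a purchased table: precomputed unsalted hashes.
candidates = ["password", "letmein", "FuRgl1uv2"]
table = {md5_hex(word): word for word in candidates}

stolen_unsalted = md5_hex("letmein")
stolen_salted = md5_hex("letmein", salt="x9Q2")  # hypothetical salt

print(table.get(stolen_unsalted))  # hit: the precomputed table covers it
print(table.get(stolen_salted))    # miss: the table knows nothing of this salt

# With the salt in hand, the attacker rebuilds the table for that salt only.
salted_table = {md5_hex(word, salt="x9Q2"): word for word in candidates}
print(salted_table.get(stolen_salted))
```

The rebuilt table is the expensive part, which is why it only pays off against a specific target whose salt stays put.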

Happily, the sort of hacker looking to perform this sort of attack is rare. Most of the hackers out there are looking to set up botnets and warez sites, and don't have the patience for this kind of attack. They're looking for lots of low hanging fruit, rather than one specific target.

Rainbow Tables are a powerful tool, especially against Windows networks that still maintain LanMan hashes. LanMan has been deprecated for quite some time, but backwards compatibility forces continued use. Older versions of Samba don't speak NTLM, which forces Windows machines to speak (and therefore store hashes for) LanMan. With Rainbow Tables in the equation and a Windows system that maintains LanMan hashes, passwords 14 characters and shorter (I was wrong, it's 7+7 not 8+8) are effectively the same as clear-text.
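The 7+7 split is what makes LanMan so cheap to attack: the password is upper-cased and each 7-character half is hashed on its own, so a table only ever has to cover 7 characters. A back-of-envelope sketch of the search-space sizes follows; the 95-character printable ASCII set is my choice for the comparison, and the names are illustrative.

```python
PRINTABLE = 95               # printable ASCII characters, including space
LM_CHARSET = PRINTABLE - 26  # LanMan upper-cases, so lower-case letters drop out

def keyspace(charset_size, max_length):
    """Number of passwords of length 1..max_length over the charset."""
    return sum(charset_size ** n for n in range(1, max_length + 1))

lm_half = keyspace(LM_CHARSET, 7)   # one 7-character half
lm_total_work = 2 * lm_half         # both halves, attacked independently
full_14 = keyspace(PRINTABLE, 14)   # if all 14 characters had to be guessed together

print(f"one LanMan half:       {lm_half:.2e}")
print(f"both halves:           {lm_total_work:.2e}")
print(f"14 characters at once: {full_14:.2e}")
```

The gap between the last two lines is the whole story: splitting the password turns an exponent of 14 into two exponents of 7.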
One of the challenges of coming up with a reasonable password complexity policy is taking into account the relative strengths and weaknesses of the operating environments those passwords will be used in. Different operating systems have different strengths and weaknesses when it comes to password strength. Different environments have different threat exposures.

The two biggest things to worry about for brute-force password problems are random guessing, and hash-grab-and-crack. I'm ignoring theft or social engineering for the moment, as plain old password complexity doesn't do a lot to address those issues. Random guessing is the reason intruder lockout was created. Hash-grab-and-crack is what pwdump1/2/3/4 was created to do, with offline processing.

Password guessing will work on any system, given sufficient time. Not all systems even permit grabbing the password hashes, like NDS passwords, where others are rather well known (/etc/shadow). Grabbing the password hashes is preferred, since it permits offline guessing of passwords that won't trip any intruder-lockout alarms.

As for OS-specific password issues, we have three different systems here at WWU. Our main student shell server is running Solaris8, so passwords longer than 8 characters are meaningless; only the first 8 characters count. Our eDirectory tree is running Universal passwords, so passwords of any length are usable. Our Windows environment is not restricted to NTLM2 which means we have NTLM password hashes stored; and in this era of RainbowTables any password shorter than 16 characters (of ANY character, regardless of char-set) is laughably easy to crack if you have the hash.

This leads us to strange cases. This password:
1ßÜb$R=0
Is very, very secure in Solaris, but laughably easy in Windows. And this password:
0123456789abcefBubba2pAantz
Is a very good Windows password, but laughably easy on Solaris.

So, what are we to do? That's a good question. Solaris passwords prefer complexity over length, and Windows passwords prefer length over complexity. This would imply that the optimal password policy is one that mandates long (longer than 16 characters) complex (the usual rules) passwords. Solaris will only take the first 8 characters, so the complexity requirement needs to be beefy enough that the first 8 characters are cryptographically strong.
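A small sketch of how the two example passwords look to each system. It takes the 8-character Solaris limit and the 16-character Windows threshold straight from this post as parameters; the function and key names are made up, and counting characters rather than bytes is a simplification.

```python
def check(password, solaris_limit=8, windows_min_length=16):
    """Show what each system effectively gets from a password.

    Both limits are the figures quoted in the post, passed as parameters.
    """
    effective = password[:solaris_limit]
    return {
        "solaris_sees": effective,
        "ignored_by_solaris": len(password) - len(effective),
        "long_enough_for_windows": len(password) >= windows_min_length,
    }

short = "1\u00df\u00dcb$R=0"            # the 8-character example, written with escapes
long_ = "0123456789abcefBubba2pAantz"   # the long example

for pw in (short, long_):
    result = check(pw)
    print(len(pw), ascii(result["solaris_sees"]),
          result["ignored_by_solaris"], result["long_enough_for_windows"])
```

For the long password, everything Solaris checks is the run of digits at the front, which is why the complexity has to live in the first 8 characters.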

One of the first things a hacker does once they gain SYSTEM access on a Windows box is dump the SAM list on that server. I've seen this done every time I've had to investigate a hacked server. When the machine being hacked is a domained machine, the threat to the domain as a whole increases. So far I haven't seen a hacker successfully dump the entire AD domain. On the other hand, one memorable case saw the SAM dump at 12:06am, and at 12:17am a text-file containing the cleartext passwords was dumped, including the local-administrator account (a password 10 characters long, three character sets, no dictionary words, in other words a good Solaris password). Clearly a Rainbow Table had been used to crack it that fast. This was almost two years ago.

One problem with long, complex passwords that are complex enough in the first 8 characters is the human memory. 8-10 characters is about as long as anyone can remember a gibberish password like "{BJ]+5Bf", and it'll take that person a while to learn it. Going the irregular-case and number-substitution route can add complexity, but cryptographically speaking not a lot. Password crackers like John the Ripper contain algorithms to replace "a" with "4" and "A", to make sure your super secret password "P45$w0r|)" is cracked within 1 minute. Yet something like "FuRgl1uv2" works out, as it contains bad spelling.

Never underestimate the cryptographic potential of bad spelling. Especially creative bad spelling.

We still haven't solved this one. We're working on upgrading Solaris to a version that'll take longer passwords, and the resultant migration that'll be required. We know where we need to go, but getting the culture at WWU shifted so that such requirements won't end up with a user revolt and passwords on post-its is the problem. Two-factor is not viable for a number of reasons (cost, and web-access being the two top ones). Mandatory password rotation is something we only do in the 'high security' zone (banner), not something we do for our regular ole systems. It's a bad habit we're trying to break, but institutional inertia is in the way and that takes time to overcome.

If Microsoft decided to salt their NTLM hashes, and therefore render Rainbow Tables mostly useless, we wouldn't be in this mess. They've seen the light (NTLM2, and whatever Vista-server will bring out), but that won't help all the legacy settings out there. NTLM is already legacy, yet we have to keep it around for a number of reasons, right up there being Samba doesn't speak NTLM2.

Who knows, it may end up that what solves this for us is getting Solaris to take long passwords, rather than educating all of our users on what a good password looks like.

Tags: ,

Slashdot, close to home

| 1 Comment
Ask Slashdot: Web Censorship on the University Campus?

Summary: A private university in Texas is censoring their internet feed. Students and staff get the willies. The CIO says this is a common practice and to get used to it.

Um, not around here. Heck, until very very recently we didn't even have a box with the name 'firewall' in the title between our network and the greater internet. We used router rules for that, good ole stateless packet blocking. Very complex router rules, but router rules nonetheless. Our staff and faculty are very sensitive to issues of freedom of speech, and have spoken out loudly about it over the past decade.

At OldJob we were just getting some form of web filtering in place when I left. It hadn't quite hit prime time but was really close. OldJob was also much more corporate than a public University. All it takes in a corporate environment for the censorship boom to fall is for multiple people to get caught with smut on their work machines (or a single person with anything that could be mistaken for kiddie porn). On the other hand, our Legal department had legitimate need to be able to go everywhere, so our filter needed to accommodate that.

My history may be bad, but I've caught rumors that we operated an optional web-cache proxy for a while. That also caught grief for spying, since the squid access_log can be used to trace web usage. Yes, we're touchy.

ResTek is another issue. A while back we segregated our Campus and ResTek networks. The ResTek network is paid by student housing fees, IIRC, and not general fund, and is managed completely separately from the Campus network. Because it is the network that is in the dorms, it is used for everything from p2p to gaming to web surfing to Skype to podcasting. SOME form of control is absolutely required over there, which is why they have a packet-shaper and we don't. I don't know if they outright block any traffic, but I doubt it. I do expect that they slow it down a lot rather than prevent it outright. The ResTek network is much more ISP-like than the Campus network, and we have a lot of ISP-like qualities as it is.

So no, we're not going to be filtering our internet feed any time soon.

MSA Performance update

An update to the MSA performance testing.
  1. RAID stripe performance (standard IOZONE, and a 32GB file IOZONE)
    1. 64K both Raid0 and Raid5
    2. Default stripes: 16K Raid5, and 128K Raid0
    3. Versus EVA performance
  2. Software mirror performance (software Raid1)
    1. Windows/NetWare: MSA/EVA
    2. Windows: MSA/MSA
    3. ?? Windows: EVA/EVA
  3. Concurrency performance
    1. Multiple high-rate streams to the same Disk Array (different logical drives)
    2. Multiple high-rate streams to different Disk Arrays
    3. Random I/O & Sequential I/O performance interaction on the same array
Testing EVA performance versus MSA performance was a bit of a trick. The EVA is in production, where the MSA is 100% devoted to this test. Hardly apples to apples. I also learned that the stripe size on the EVA is 128KB.

One thing became very, very clear when testing the default stripe sizes. A 16KB stripe size on a RAID5 array on the MSA gives faster read performance, but much worse Write performance. Enough worse, that I'm curious why it's a default. We'll be going with a 64K stripe for our production use, as that's a good compromise between read/write performance.

The Windows part of the mirror/unmirror test is completed. Write performance tracks MSA performance, in that the curve has the same shape. This makes sense, because software mirroring needs to have both writes commit before it'll move on to the next operation. This by necessity forces write performance to follow the slowest performing storage device. All in all, Write performance trailed MSA performance, which in turn trailed EVA performance for the large file test.

Read performance is where the real performance gains were to be had. This also makes sense because software Raid1 generally has each storage device alternating serving blocks. On reflection this could play a bit of hob with in-MSA or in-EVA predictive reads, but testing that is difficult. Performance matched EVA performance for files under 8GB in size, and still exceeded MSA performance for the 32GB file.

I'm running the NetWare test right now. Because this has to run over the network, I can't compare these results to the Windows test. But I can at least get a feeling for whether or not NetWare's software mirror provides similar performance characteristics. Considering how slow this test is running (Gig Ether isn't having as much of an impact as I thought it would), it'll be next week before I'll have more data.

Because of the delays I'm seeing, I've had to strike a few tests from the testing schedule. This needs to be in production during Winter Break, so we need time to set up pre-production systems and start building the environment.

Tags: ,

NSS read-ahead

One of the tuning items that has come up as I've been doing all of this benchmarking is NSS Read Ahead. This can be configured by two command-line parameters:

nss /AllocAheadBlks=[vol]:[count]
nss /ReadAheadBlks=[vol]:[count]

AllocAhead allocates blocks on writes, where ReadAhead is just that, blocks read ahead of the read. Both behaviors are to make access to the base I/O subsystem more efficient and to improve Read performance.

As of NetWare 6.5, the default ReadAhead is 2 blocks (8KB), and the default AllocAhead is 15 blocks (60KB).

So what are the recommended settings for these? The manual has this to say:
The most efficient value for block count depends on your hardware. In general, we recommend a block count of 8 to 16 blocks for large data reads; 2 blocks for CDs, 8 blocks for DVDs, and 2 blocks for ZLSS.
ZLSS is, I believe, a standard volume.

This raises the question of what the real optimal setting is, based on what you can find out about your storage systems. I don't know, but I do have some suggestive ideas. If I have time, I'll see what I can do about testing it.

The gold standard is having very good data on how I/O is performed on your volume. For a volume consisting of mostly databases, such as Access files, the read-ahead should be set to a value close to the average record-read size. For a plain ole home-directory volume file size is probably the better determiner of 'best'.

Running some stats on the STU1 volume, I've found the following:

RAID Stripe size: 128KB
NSS Block size: 4KB
Median Size: 8192KB
Average Size: 293KB
File-count Median Size: 16KB
  • 50% of the files on STU1 are 16K or smaller
  • 50% of the files on STU1 are responsible for 0.55% of the total space used on STU1
  • 90% of the files on STU1 are 256K or smaller, which represents 7.9% of the total space used on STU1
  • 10% of the files on STU1 are responsible for over 90% of the data on STU1
Based on this, a ReadAhead value of "4" is probably in order. This represents a file size of 16K, which 50% of the files on the volume exceed. A ReadAhead value of 32 (128K) would match the RAID stripe size and would very likely enhance, possibly greatly, reads of those files that exceed 128K in length.

The GIS volume is another story.

Median Size: 200MB
Average Size: 11.5MB
File-count Median Size: 8KB
  • Total files on the volume is vastly smaller than on STU1
  • 43% of the data on the GIS volume are in files larger than 256MB
  • The largest file-type is TIF, which is an uncompressed graphics format that is read as a whole, not as sub-records
  • Files under 64MB in size represent 93% of the files, but only 7.7% of the data. Compare that with 99.97% and 89.3% respectively on STU1
In this case a much higher ReadAhead setting is called for. The Novell guidance of "16" makes sense in this case, since that is 1/2 the stripe size and most of the reads on the volume are probably going to take advantage of this activity.
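The block-count arithmetic behind those suggestions, as a quick sketch. It uses the 4KB NSS block size and 128KB stripe size from the STU1 stats, and assumes the GIS volume sits on the same stripe size; the function name is made up for illustration.

```python
NSS_BLOCK_KB = 4   # NSS block size from the STU1 stats
STRIPE_KB = 128    # RAID stripe size from the STU1 stats

def read_ahead_kb(blocks, block_kb=NSS_BLOCK_KB):
    """Size in KB of a ReadAhead setting given as a block count."""
    return blocks * block_kb

# Default (2), the STU1 suggestion (4), Novell's upper guidance (16),
# and the stripe-matching value (32).
for blocks in (2, 4, 16, 32):
    kb = read_ahead_kb(blocks)
    print(f"ReadAhead {blocks:>2} blocks = {kb:>3}KB = {kb / STRIPE_KB:.2f} of a stripe")
```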

Tags: ,

News and things

From the support forums:
There will certainly be a SP6, it's in beta now. Normal schedule would
see it out in February, but of course that may vary depending on test
results and other factors.
So yes, boys and girls, there will be a SP6. And it'll come out a bit over a year after SP5 came out. This service-pack will be a doozy. I know this because of all of the post-SP5 patches I had to put in when I rolled SP5 out earlier.

As for the rights thing I posted about earlier, I do take a nightly trustee backup using TRUSTEE.NLM. This handy thing dumps out a CSV-formatted list of trustees and IRFs on the file systems. I've been thinking about ways to leverage that to get the lists people may request. What might work better would be something that interfaces with the Virtual File System stuff out there on _admin. The trick there will be dereferencing group names, but that could be done with simple LDAP.

I'm just hoping I don't have to build it, but I have all the bits I need to construct something should the case arise.

I also noticed yesterday that Novell has updated their documentation on VFS for NetWare. Ever wonder what all that stuff in that odd _ADMIN volume was for? This will tell you what it is for, and how to interface with it.

Tags: ,

"Plain talk" rights

| 3 Comments
We're beginning to get a few more requests from manager-types along the lines of, 'who has access to my stuff,' and, 'how are rights set up in my shared areas?'

It's pretty easy to give them a list of which groups have rights to what directories. What isn't so easy is explaining how trustees work, inheritance, and how rights-filters (which we use in a few key areas) affect flow. Plus, NDS rights factor into this significantly and those aren't presented on the file-system.

I'm pretty sure that we'll get a request from a high level manager to develop a system that will allow managers to see who has rights to their areas. Not just groups, but a full de-referenced list of the users who have rights to a specific directory and what those rights are. We'd also have to provide a second column next to each user showing what groups they got their rights from and where said right was assigned.

In other words, a big ole mess.

There is a reason we've tried to keep the managing of rights to be as much of a black-box as possible. I fully understand how they work. But explaining that to managers would require, IIRC, Day 2 of the Certified NetWare Administrator class.

What makes it even more fun is that we use IIS for some of our web-development, and we use rights there. Rights flow on Windows is different than on NetWare. Explaining THAT will take even more happy-fun meeting time.

Tags: ,

More MSA performance

Since my OES benchmark went so well, I've been asked to do a series on the MSA we just received for our BCC cluster. Long time readers will remember that the BCC cluster will be done with free or cheap software, not Novell BCC. Unfortunately, the same goes for the hardware. So I get to find out if the MSA will really live up to our performance expectations.

The testing series I've worked out is this:
  1. RAID stripe performance (standard IOZONE, and a 32GB file IOZONE)
    1. 64K both Raid0 and Raid5
    2. Default stripes: 16K Raid5, and 128K Raid0
    3. Versus EVA performance
  2. Software mirror performance (software Raid1)
    1. Windows/NetWare: MSA/EVA
    2. Windows: MSA/MSA
    3. ?? Windows: EVA/EVA
  3. Concurrency performance
    1. Multiple high-rate streams to the same Disk Array (different logical drives)
    2. Multiple high-rate streams to different Disk Arrays
    3. Random I/O & Sequential I/O performance interaction on the same array
The dark green ones are the steps I've completed so far. I'm in the process of restriping for the 16K/128K stripes, which will probably take the rest of the day to complete. I may be able to start off the testing series before I go home tonight. If so, it'll probably get done sometime Sunday evening.

One thing the testing has already shown is that, for Raid5 performance, a quiescent MSA out-performs the in-production EVA. Since there is no way to do tests against the EVA without competing at the disk level for I/O supporting production, I can't get a true apples to apples comparison. By the numbers, EVA should outperform MSA. It's just that classes have started, so the EVA is currently supporting the 6 node NetWare cluster and the two node 8,000 mailbox Exchange cluster, where the MSA is doing nothing but being subjected to benchmarking loads.

The other thing that is very apparent in the tests is the prevalence of caching. Both the host server and the MSA have caching. The host server is more file-based caching, and the MSA (512MB) is block-level caching. This has a very big impact on performance numbers for files under 512MB. This is why the 32GB file test is very important to us, since that test blows past ALL caching and yields the 'worst case' performance numbers for MSA.

Tags: ,

Performance of the MSA

I'm doing another performance series on the MSA we'll be putting into Bond Hall. This will be our BCC SAN as well as the home for the 'backup to disk' storage.

One of the tests I ran was to do a full IOZONE series on a 32GB file. This is to better get a feel for how such large files perform on the MSA, since I suspect that any backup-to-disk system will be generating files that large. But I got some s-t-r-a-n-g-e numbers. It turns out that the random-write test is much faster than the random-read test. Weird.

Record size (KB)      2048    4096    8192
RandomR throughput   10881   17115   23311
RandomW throughput   64491   63949   62681

So, um. Yeah. And you want to know the scary part? This holds true for both a Raid0 and Raid5 array. Both have a 64K stripe size, which is not default. Raid5's default stripe is 16K, and Raid0 is 128K. I'll test the default stripes next to see if they affect the results any. But this is STILL weird.

Perhaps writes are cached and reordered, and reads just come off of disk? Hard to say. But read speed does improve as the record size increases. The 16MB record size turns in a read speed of only .5 that of the write speed. Yet the read performance at 64K is .12 that of write. Ouch! I'm running the same test on the EVA to see if there is a difference, but I don't know what the EVA stripe-size is.
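For the three record sizes that made it into the table above, the read-to-write ratio can be worked out directly. A quick sketch using only those posted numbers; the 64K and 16MB figures quoted in the text come from rows that aren't shown here, so they aren't reproduced.

```python
# Throughput figures copied from the RandomR / RandomW table.
record_kb = [2048, 4096, 8192]
random_read = [10881, 17115, 23311]
random_write = [64491, 63949, 62681]

# Read throughput as a fraction of write throughput at each record size.
ratios = [r / w for r, w in zip(random_read, random_write)]

for size, ratio in zip(record_kb, ratios):
    print(f"{size:>5}KB records: read is {ratio:.2f} of write")
```

The ratio climbs steadily with record size, which fits the trend between the .12 and .5 endpoints mentioned above.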


Tags: ,