Recently in virtualization Category

A network problem

| 1 Comment | No TrackBacks
I have a server attempting to talk SMTP to our internal smart-host. But it seems our hardware load-balancer is getting in the way. When sniffing the switch-port the server is on, the  conversation goes like this:

Server -> Mailer [SYN]
Mailer -> Server [SYN, ACK]
Server -> Mailer [Ack]
Mailer -> Server [RST, ACK]
[3 seconds pass]
Mailer -> Server [SYN, ACK]
Server -> Mailer [RST]
[6 seconds pass]
Mailer -> Server [SYN, ACK]
Server -> Mailer [RST]

What's going on here?

Well, the first three packets are the classic TCP 3-step handshake. The Mailer then issues a Acknowledge-Reset packet, which shuts down the conversation. Then things get weird. Three seconds pass, and the mailer retransmits the second packet. The Server, having shut down the TCP conversation normally like it was told to in the 4th packet, just issues a RESET packet telling the sender there is no connection to ACK and to stop trying. This repeats 6 seconds later.

So how did the Mailer forget it had torn down the TCP connection? That is the mystery. I haven't had a chance to get a sniffer on the Mailer side of things yet, so I'm not certain what it's seeing. It could be the load-balancer is throwing a fit, and the follow-on packets at 3 and 6 seconds are from the Mailer server itself somehow.

Strange things.

Desktop virtualization

| No Comments | No TrackBacks
Virtualizing the desktop is something of a rage lately. Last year when we were still wondering how the Stimulus Fairy would bless us, we worked up a few proposals to do just that. Specifically, what would it take to convert all of our labs to a VM-based environment?

The executive summary of our findings: It costs about the same amount of money as the normal regular PC and imaging cycle, but saves some labor compared to the existing environment.

Verdict: No cost savings, so not worth it. Labor savings not sufficient to commit.

Every dollar we saved in hardware in the labs was spent in the VM environment. Replacing $900 PCs with $400 thin clients (not their real prices) looks cheap, but when you're spending $500/seat on ESX licensing/Storage/Servers, it isn't actually cheaper. The price realities may have changed from 12 months ago, but the simple fact remains that the stimulus fairy bequeathed her bounty upon the salary budget to prevent layoffs rather than spending on swank new IT infrastructure.

The labor savings came in the form of a unified hardware environment minimizing the number of 'images' needing to be worked up. This minimized the amount of time spent changing all the images in order to install a new version of SPSS for instance. Or, in our case, integrating the needed changes to cut over from Novell printing to Microsoft printing.

This is fairly standard for us. WWU finds it far easier to commit people resources to a project than financial ones. I've joked in the past that $5 in salary is equivalent to $1 cash outlay when doing cost comparisons. Our time management practices generally don't allow hour by hour level accounting for changed business practices.

Lemonade

| No Comments | No TrackBacks
Two days ago (but it seems longer) the drive that holds my VM images started vomiting bad sectors. Even more unfortunately, one of the bad sectors took out the MFT clusters on my main Win XP management VM. So far that's the only data-loss, but it's a doozy. I said unto my manager, "Help, for I have no VM drive any more, and am woe." Meanwhile I evacuated what data I could. Being what passes for a Storage Administrator around here, finding the space was dead easy.

Yesterday bossman gave me a 500GB Western Digital drive and I got to work restoring service. This drive has Native Command Queueing, unlike the now-dead 320GB drive. I didn't expect that to make much of a difference, but it has. My Vista VMs (undamaged) run noticibly faster now. "iostat -x" shows await times markedly lower than they were before when running multiple VMs.

NCQ isn't the kind of feature that generally speeds up desktop performance, but in this case it does. Perhaps lots of VM's are a 'server' type load afterall.

Fabric merges

| No Comments | No TrackBacks
When doing a fabric merge with Brocade gear, when they say that the Zone configuration needs to be exactly the same on both switches, they mean that. The merge process does no parsing, it just compares the zone config. If the metaphorical diff returns anything it doesn't merge. So if one zone has a swapped order of two nodes but is otherwise identical, it'll not merge.

Yes, this is very conservative. And I'm glad for it, since failure here would have brought down our ESX cluster and that's a very wince-worthy collection of highly visible services. But it took a lot of hacking to get the config on the switch I'm trying to merge into the fabric to be exactly right.

SANS Virtualization

| No Comments | No TrackBacks
Mr. Tom Liston of ISC Diary fame is at the SANS Virtualization Summit right now. He has been tweeting it. I wish I was there, but there is zero chance of me convincing my boss to send me. Even if it was a year in which out of state travel was allowed.

Mostly just interesting quotes so far, but there have been a few interesting ones.

"When your server is a file, network access equals physical access" - Michael Berman, Catbird

From earlier: "You can tell how entrenched virtualization has become when the VM admin has become the popular IT scapegoat" - Gene Kim

On VMsprawl: "The 'deploy all you want, we'll right click and make more' mentality." Herb Goodfellow, Guident.

I expect to see more as the week progresses.

I have VMWare Workstation installed on my workstation. It is very handy. We have an ESX cluster so I could theoretically export VMs I work up on my machine directly to the ESX hosts. I haven't done that yet, but it is possible.

Unfortuantely, I've run into several performance problems since I installed it. The base system as it started, right after I switched from Xen:
  • OpenSUSE 10.2, 64-bit
  • The VM partition is running XFS
  • Intel dual core E6700 processor
  • 4GB RAM
  • 320GB SATA drives
The system as it exists now:
  • OpenSUSE 11.0 64-bit
  • The VM Partition is running XFS, with these fstab settings: logbufs=8,noatime,nodiratime,nobarrier
  • The XFS partition was reformatted with lazy-count=1, and log version 2
  • Intel quad core Q6700
  • 6GB RAM (would have been 8, but I somehow managed to break one of my RAM sockets)
  • 320GB SATA drives, with the 'deadline' scheduler set for the disk with the VM partition on it.
It still doesn't perform that great. I've done enough system monitoring to know that I'm being I/O bound. I hear ext4 is supposed to be better at this than XFS, so I just might go there when openSUSE 11.2 drops. One of these would go a long way to helping fix this problem, but I don't think I'll be able to get the funding to do it.

High availability

| 1 Comment | No TrackBacks
64-bit OES provides some options to highly available file serving. Now that we've split the non-file services out of the main 6-node cluster, all that cluster is doing is NCP and some trivial other things. What kinds of things could we do with this should we get a pile of money to do whatever we want?

Disclaimer: Due to the budget crisis, it is very possible we will not be able to replace the cluster nodes when they turn 5 years old. It may be easier to justify eating the greatly increased support expenses. Won't know until we try and replace them. This is a pure fantasy exercise as a result.

The stats of the 6-node cluster are impressive:
  • 12 P4 cores, with an average of 3GHz per core (36GHz).
  • A total of 24GB of RAM
  • About 7TB of active data
The interesting thing is that you can get a similar server these days:
  • HP ProLiant DL580 (4 CPU sockets)
  • 4x Quad Core Xeon E7330 Processors (2.40GHz per core, 38.4GHz total)
  • 24 GB of RAM
  • The usual trimmings
  • Total cost: No more than $16,547 for us
With OES2 running in 64-bit mode, this monolithic server could handle what six 32-bit nodes are handling right now. The above is just a server that matches the stats of the existing cluster. If I were to really replace the 6 node cluster with a single device I would make a few changes to the above. Such as moving to 32GB of RAM at minimum, and using a 2-socket server instead of a 4-socket server; 8 cores should be plenty for a pure file-server this big.

A single server does have a few things to recommend it. By doing away with the virtual servers, all of the NCP volumes would be hosted on the same server. Right now each virtual-server/volume pair causes a new connection to each cluster node. Right now if I fail all the volumes to the same cluster node, that cluster node will legitimately have on the order of 15,000 concurrent connections. If I were to move all the volumes to a single server itself, the concurrent connection count would drop to only ~2500.

Doing that would also make one of the chief annoyances of the Vista Client for Novell much less annoying. Due to name cache expiration, if you don't look at Windows Explorer or that file dialog in the Vista client once every 10 minutes, it'll take a freaking-long time to open that window when you do. This is because the Vista client has to enumerate/resolve the addresses of each mapped drive. Because of our cluster, each user gets no less than 6 drive mappings to 6 different virtual servers. Since it takes Vista 30-60 seconds per NCP mapping to figure out the address (it has to try Windows resolution methods before going to Novell resolution methods, and unlike WinXP there is no way to reverse that order), this means a 3-5 minute pause before Windows Explorer opens.

By putting all of our volumes on the same server, it'd only pause 30-60 seconds. Still not great, but far better.

However, putting everything on a single server is not what you call "highly available". OES2 is a lot more stable now, but it still isn't to the legendary stability of NetWare 3. Heck, NetWare 6.5 isn't at that legendary stability either. Rebooting for patches takes everything down for minutes at a time. Not viable.

With a server this beefy it is quite doable to do a cluster-in-a-box by way of Xen. Lay a base of SLES10-Sp2 on it, run the Xen kernel, and create four VMs for NCS cluster nodes. Give each 64-bit VM 7.75GB of RAM for file-caching, and bam! Cluster-in-a-box, and highly available.

However, this is a pure fantasy solution, so chances are real good that if we had the money we would use VMWare ESX instead XEN for the VM. The advantage to that is that we don't have to keep the VM/Host kernel versions in lock-step, which reduces downtime. There would be some performance degradation, and clock skew would be a problem, but at least uptime would be good; no need to perform a CLUSTER DOWN when updating kernels.

Best case, we'd have two physical boxes so we can patch the VM host without having to take every VM down.

But I still find it quite interesting that I could theoretically buy a single server with the same horsepower as the six servers driving our cluster right now.
A long while back I mentioned I had a perl script that we use to track certain disk space details on my NetWare and Windows servers. That goes into a database, and it can make for some pretty charts. A short while back I got asked if I could do something like that for the ESX datacenter volumes.

A lot of googling later I found how to turn on the SNMP daemon for an ESX host, and a script or two to publish the data I need by SNMP. It took some doing, but it ended up pretty easy to do. One new perl script, the right config for snmpd on the ESX host, setting the ESX host's security policy to permit SNMP traffic, and pointing my gathering script at the host.

The perl script that gathers the local information is very basic:
#!/usr/bin/perl -w

use strict;
my $partition = ".";
my $partmaps = ".";
my $vmfsvolume = "\Q/vmfs/volumes/$ARGV[0]\Q";
my $vmfsfriendly = $ARGV[1];
my $capRaw = 0;
my $capBlock = 0;
my $blocksize = 0;
my $freeRaw = 0;
my $freeBlock = 0;
my $freespace= "";
my $totalspace= "";
open("Y", "/usr/sbin/vmkfstools -P $vmfsvolume|");
while () {
if (/Capacity ([0-9]*).*\(([0-9]*).* ([0-9]*)\), ([0-9]*).*\(([0-9]*).*a
vail/) {
$capRaw = $1;
$capBlock = $2;
$blocksize = $3;
$freeRaw = $4;
$freeBlock = $5;
$freespace = $freeBlock;
$totalspace = $capBlock;
$blocksize = $blocksize/1024;
#print ("1 = $1\n2 = $2\n3 = $3\n4 = $4\n5 = $5\n");
print ("$vmfsfriendly\n$totalspace\n$freespace\n$blocksize\n");
}
}


Then append the /etc/snmp/snmp.conf file with the following lines (in my case):

exec .1.3.6.1.4.1.6876.99999.2.0 vmfsspace /root/bin/vmfsspace.specific 48cb2cbc
-61468d50-ed1f-001cc447a19d Disk1

exec .1.3.6.1.4.1.6876.99999.2.1 vmfsspace /root/bin/vmfsspace.specific 48cb2cbc
-7aa208e8-be6b-001cc447a19d Disk2


The first parameter after exec is the OID to publish. The script returns an array of values, one element per line, that are assigned to .0, .1, .2 and on up. I'm publishing the details I'm interested in, which may be different than yours. That's the 'print' line in the script.

The script itself lives in /root/bin/ since I didn't know where better to put it. It has to have execute rights for Other, though.

The big unique-ID looking number is just that, a UUID. It is the UUID assigned to the VMFS volume. The VMFS volumes are multi-mounted between each ESX host in that particular cluster, so you don't have to worry about chasing the node that has it mounted. You can find the number you want by logging in to the ESX host on the SSH console, and doing a long directory on the /vmfs/volumes folder. The friendly name of your VMFS volume is symlinked to the UUID. The UUID is what goes in to the snmp.conf file.

The last parameter ("Disk1" and "Disk2" above) is the friendly name of the volume to publish over SNMP. As you can see, I'm very creative.

These values are queried by my script and dropped into the database. Since the ESX datacenter volumes only get space consumed when we provision a new VM or take a snapshot, the graph is pretty chunky rather than curvy like the graph I linked to earlier. If VMware ever changes how the vmfstools command returns data, this script will break. But until then, it should serve me well.

That darned budget

| 1 Comment | No TrackBacks
This is where I whine about not having enough money.

It has been a common complaint amongst my co-workers that WWU wants enterprise level service for a SOHO budget. Especially for the Win/Novell environments. Our Solaris stuff is tied in closely to our ERP product, SCT Banner, and that gets big budget every 5 years to replace. We really need the same kind of thing for the Win/Novell side of the house, such as this disk-array replacement project we're doing right now.

The new EVAs are being paid for by Student Tech Fee, and not out of a general budget request. This is not how these devices should be funded, since the scope of this array is much wider than just student-related features. Unfortunately, STF is the only way we could get them funded, and we desperately need the new arrays. Without the new arrays, student service would be significantly impacted over the next fiscal year.

The problem is that the EVA3000 contains between 40-45% directly student-related storage. The other 55-60% is Fac/Staff storage. And yet, the EVA3000 was paid for by STF funds in 2003. Huh.

The summer of 2007 saw a Banner Upgrade Project, when the servers that support SCT Banner were upgraded. This was a quarter million dollar project and it happens every 5 years. They also got a disk-array upgrade to a pair of StorageTek (SUN, remember) arrays, DR replicated between our building and the DR site in Bond Hall. I believe they're using Solaris-level replication rather than Array-level replication.

The disk-array upgrade we're doing now got through the President's office just before the boom went down on big expensive purchases. It languished in the Purchasing department due to summer-vacation related under-staffing. I hate to think how late it would have gone had it been subjected to the added paperwork we now have to go through for any purchase over $1000. Under no circumstances could we have done it before Fall quarter. Which would have been bad, since we were too short to deal with the expected growth of storage for Fall quarter.

Now that we're going deep into the land of VMWare ESX, centralized storage-arrays are line of business. Without the STF funded arrays, we'd be stuck with "Departmental" and "Entry-level" arrays such as the much maligned MSA1500, or building our own iSCSI SAN from component parts (a DL385, with 2x 4-channel SmartArray controller cards, 8x MSA70 drive enclosures, running NetWare or Linux as an iSCSI target, with bonded GigE ports for throughput). Which would blow chunks. As it is, we're still stuck using SATA drives for certain 'online' uses, such as a pair of volumes on our NetWare cluster that are low usage but big consumers of space. Such systems are not designed for the workloads we'd have to subject them to, and are very poor performers when doing things like LUN expansions.

The EVA is exactly what we need to do what we're already doing for high-availability computing, yet is always treated as an exceptional budget request when it comes time to do anything big with it. Since these things are hella expensive, the budgetary powers-that-be balk at approving them and like to defer them for a year or two. We asked for a replacement EVA in time for last year's academic year, but the general-budget request got denied. For this year we went, IIRC, both with general-fund and STF proposals. The general fund got denied, but STF approved it. This needs to change.

By October, every person between and Governor Gregoir will be new. My boss is retiring in October. My grandboss was replaced last year, my great grand boss also has been replaced in the last year, and the University President stepped down on September 1st. Perhaps the new people will have a broader perspective on things and might permit the budget priorities to be realigned to the point that our disk-arrays are classified as the critical line-of-business investments they are.

Woot!

| 1 Comment | No TrackBacks
The EVAs are scheduled to deliver today! This means that we are very probably going to be taking almost every IT system we have down starting late Friday 9/5 and going until we're done. We have a meeting in a few minutes to talk strategy.

There was some fear that the gear wouldn't get here in time for the 9/5 window. The 9/12 window has one of the key, key people needed to handle the migration in Las Vegas for VMWorld, and he won't be back until 9/21 which also screws with the 9/19 window. The 9/19 window is our last choice, since that weekend is move-in weekend and the outage will be vastly more noticeable with students around. Being able to make the 9/5 window is great! We need these so badly that if we didn't get the gear in time, we'd have probably done it 9/12 even without said key player.

The one hitch is if HP can't do 9/5-6 for some reason. Fret. Fret.

Other Blogs

My Other Stuff

About this Archive

This page is an archive of entries from June 2010 listed from newest to oldest.

May 2010 is the previous archive.

July 2010 is the next archive.

Find recent content on the main index or look in the archives to find all content.