June 2010 Archives

It has been cold

In most of the rest of the US, June is actually a summer month, but not here in the Pacific Northwest. For us, summer typically starts on July 12th, give or take a day. I stretch it out a bit by visiting the warmer parts of the country over the 4th of July weekend. But this June has been unusually gloomy and chilly. Take a look at the monthly temp chart from the Seattle airport. We're usually 1-5 degrees (F) colder than Seattle, depending on a variety of things, but the trend is still the same.

KSEA June temperature record
The green band represents the normal high and low. As you can see, this time of year our highs should be in the 70's, but instead they've hung in the 60's. We had a nice patch late last week, but overall the month has been markedly colder than normal. You can see where we set a record low high-temp back on the 19th.

Even during normal years we only have three months with an expected high above 70 (very roughly, June 15 through September 15). What this means is that we're actually a pretty good candidate for ambient-air datacenter cooling. Those kinds of systems didn't really exist in any meaningful way back when this building was built, but if we were to build this building again something like that would be considered.

Universities in general have an environmentalist bent to them, and WWU is not immune. We have the Huxley College of the Environment, one of the first such programs in existence. The last few buildings we've built on campus have been LEED certified to various degrees. With that kind of track record, an ambient-air system for a new large data-center is something of a gimme.

Heck, I would not be surprised if a Capital Request gets put in sometime in the 5-10 year range to try and convert our current system to at least partial ambient cooling. We're running up against a power and cooling wall right now. Virtualization has helped with that quite a bit, but our UPS has been running in the 70-85% range for several years now. We're going to have to address that at some point. Since that'll also require shutting the room down for a while (eeeeek!), we may as well redo the cooling while they've got the downtime.

We'll see if that actually happens.

My thoughts on this quote:

Theoretical risks and real risks are generally the same thing when you're talking about IT security.
In large part, this is correct. Especially when getting audited. We have regular audits here, both internal and external. We have servers that handle credit-card data, so we have to deal with PCI compliance as well. So yeah, we know about this. We're also familiar with the debate.

To get our PCI stuff certified we have to have security scans performed against our credit-card processing servers. To do this, we grant a specified IP address full and unrestricted access to a list of internal IPs. The third party then scans those hosts from wherever they are, and sends us the report full of red Xes.
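
For illustration only, this is roughly what that third-party sweep boils down to on their end: an unauthenticated, full-port scan from the whitelisted address against every host on the list. A minimal sketch in Python wrapping nmap; the host-list file name and its contents are made up.

import subprocess

# Hypothetical list of the in-scope credit-card servers, one IP per line.
with open("cardhosts.txt") as f:
    targets = [line.strip() for line in f if line.strip()]

for host in targets:
    # -Pn: skip host discovery (the firewall exception already lets them in)
    # -p-: sweep all 65535 TCP ports
    subprocess.run(["nmap", "-Pn", "-p-", host], check=False)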

The internal debate goes like this. I'm not naming names for obvious reasons. I like my job.

Tech: Why do we have to let them in to scan? That's, like, completely bypassing the security provided by our firewall. Both firewalls. It's not like a regular hacker has that kind of access. These servers cannot normally be reached from the internet at all! They should be scanning THAT!

Manager: Because that's what the PCI standard says they have to do.

Tech: It makes no sense!

The reason is that they're testing how vulnerable we are if our other servers get hacked and an attacker gains enhanced access to that subnet. That's also very unlikely in our case (see also: two firewalls), but the fact remains that it still has to be checked. Because we've never been attacked that way (that we know of), that kind of attack is seen as theoretical rather than real.

All it takes is one attacker, or a group of attackers, to REALLY WANT SOMETHING for theoretical attacks to become real. The concerted attacker, as opposed to the casual attacker, is the one that'll employ novel methods of getting what they want. Door-rattlers hunting phat pipes for their warez repos will take any fat pipe they can find, and the resources they expend per target are pretty small. Someone looking to break in for a specific reason is targeting us specifically, and the resources they'll expend to get in are a LOT higher.

It is the concerted attacker that'll spend the time to worm their way from internet-facing systems, to intranet-facing systems, to get to secure-net-facing systems. It is this kind of attacker that'll do targeted phishing against the users most likely to have inner-firewall access of some kind and then attempt to create VPN sessions with those credentials to do scanning from a far more advantageous network position. It is the concerted attacker that'll do targeted DNS hijacks in order to get better information. These are not the kinds of things that Joe Warezer or Ben BotHerder are going to bother with.

It is also true that the concerted attacker can be vastly more damaging than their younger cousin who is just looking to leech resources or reputation. So yeah, it's a very low likelihood of running into that kind of threat, but the risks of not doing something about it are pretty high. That's what makes the theoretical real.

Using multiple web-browsers is kind of a power-user thing. Most people just stick with one, and only vary if they need to access a certain site that is doggedly IE-only, such as Outlook Web Access, and have to leave Firefox to use it. Since 51% of you, my intrepid readers, are Firefox users, some of you know well what I'm talking about.

A long time ago in Mozilla's past, it was possible to run multiple Mozilla instances out of different profiles. This was handy if you desired process separation for browsing activity, or, ahem, didn't want to pollute your work profile with certain, ahm, sites. For that we now have Porn Mode, er, Privacy Mode. Mozilla removed multiple concurrent profile support with Firefox, IIRC.

However, having multiple browsers is still useful. The reasons I do it:
  • I can stay logged in to multiple GMail accounts this way.
  • I can log in to Google in one browser, and do all of my other searching, browsing, whatnot in another browser and not have all of those searches directly associated with that one Google ID.
  • Sites that are browser specific (OWA is a major one).
  • The Opera email client is really very good.
  • I can have a different plugin-setup, which may help diagnose problems with sites.
  • Browsers on different operating systems behave differently. This can be useful.
The main problem getting in the way of sticking with a single browser is Firefox's insistence that only one profile can be active at any given time. Thus, the need for more than one. I've been using SeaMonkey, the Mozilla descendant and also a Gecko-based browser, for a lot of this.

I don't leave the browser I use for most general-purpose surfing logged in to Google, Facebook, Twitter, or anything else like that. It minimizes what those social networks can track of my browsing habits, especially with Facebook Like buttons and Twitter badges appearing everywhere these days. Ad networks grab this stuff too, but at least I don't have a login with them that explicitly links me to my browsing habits; for them it's only implicit. If WWU ever goes GoogleApps for whatever reason, this will be doubly useful.

The downside is that I have to have enough RAM to support two browsers. I'm lucky enough that I do. Useful, though!

We have this sneaky building

Bond Hall on campus is sneaky. It was built in the 1960's. A certain thing wasn't pointed out to me until recently. And now I share it with you.
BondHallBig.jpg

Those of you in a certain age cohort may be looking at that and wondering why it seems so familiar. Have a closer look.

BondHallSmall.jpg

Yep, it's the windows. Some of you may have figured this out already (there was a bit of a fad for this in the 1960's and early 70's). It was achingly familiar to me, but I couldn't figure it out. I'm not quite old enough. It's all in the aspect ratio.

The clues are in the bandwidth

It looks like quite a few people caught the Paraguay/Italy game over lunch today. From our bandwidth chart.


You can see when the game started around 11:30 our time, half-time at about 12:15, and the game finishing at about 1:20pm. The only reason this is visible is that last week was finals week and the dorms are now empty. Otherwise it would have been lost in the noise. I expect the chart to be bigger on Friday, when England plays.

So nice of FIFA to schedule games during lunch.

ServerFault, but more of that

It is no secret that I'm active over on ServerFault. It looks like the creators of it and StackOverflow are soliciting ideas for future sites in a rather community-centered way. Float a proposal, see who salutes. Kind of similar to openSUSE's FATE system.

Area51: Home of the unknown and potential

The software isn't OSS, but the idea is nifty. Speaking as an active user of ServerFault, a dedicated site for WebDev is very much something that'd get a lot of traffic. It's one of the proposals currently up for debate.

That said, there are a whooooole bunch of other proposals out there, some not even technology related. There is one for Gardening and Farming Organically, for instance.

Unlike OpenFATE, once enough people have said "I'd like that", they start doing the early stages of defining what the community would look like. They need enough users to commit to using it (for values of 'commit' that include 'if it comes, I'll try and answer questions for a few days'). And ultimately, a beta stage to see if it works at all.

An interesting approach.

Reduced packaging in IT

| 5 Comments
I've talked about this before, and I'm sure I'll do it again. We do need to reduce some of the excessive packaging on the things we get. I can completely understand the need to swaddle a $57,000 storage controller in enough packaging to survive a 3 meter drop. What I don't understand is shipping the 24 hard drives that go with that storage controller in individual boxes. It wouldn't take much engineering to come up with a 6-pack foam holder for hard-drives. It would seriously reduce bulk, which makes it easier and cheaper to ship, and there is less material used in the whole process. But I guess that extra SKU is too much effort.

Today I turned this:
HP-BoxesA.jpg

Into this:

HP-BoxesB.jpg

The big box at the top of the stack contained 24 individual hard-drive boxes. Each box had:
  • 1 hard-drive.
  • 1 anti-static bag requiring a knife to open.
  • 2 foam end-pieces to hold the drive in place in the box.
  • 1 piece of paper of some kind, white.
  • 1 cardboard box, requiring a knife to open.
When I was done slotting all of those in, I had a large pile of cardboard boxes, a big jumble of green foam bits, a slippery pile of anti-static bags, and a neat pile of paper. The paper and cardboard can easily be recycled. The anti-static bags and foam bits... not so much. Although, the foam bits were marked type 4 plastic (LDPE), which means they were possibly made from recyclable materials, right?

Right?

I'd still like to use less of it.

TCP problems

| 3 Comments
My testing for a cheap NAS solution has progressed to the option that costs the most money, Windows 2008 running KernSafe's iStorage. As it happens, it works really well when the iSCSI initiator is Windows but Linux clients don't really want to talk to it. Windows: 30-50 MB/s. Linux: 3-5 MB/s. Biiiig difference there.

Looking at packets, I'm noticing a pattern on the wire similar to one I'd seen before, back when I was troubleshooting exactly why NetWare backups to DataProtector were horrible. TCP windowing seems to be fundamentally broken between Server 2008 and NetWare, which leads to really bad throughput, which in turn is very bad for half-TB backups. The receiving server felt the need to ACK after every two packets, which really slowed things down. And that's exactly what the Linux clients are doing for iSCSI to Server 2008.
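
To put a number on that pattern, here's a rough sketch that counts pure ACKs coming back from the target versus data segments going to it. Python with scapy is my assumption here, as are the capture file name and the target address.

from scapy.all import rdpcap, IP, TCP

TARGET = "192.168.1.50"               # hypothetical address of the Server 2008 target
packets = rdpcap("slow-iscsi.pcap")   # hypothetical capture of the slow session

data_segments = 0   # segments carrying payload toward the target
pure_acks = 0       # empty ACKs coming back from the target

for pkt in packets:
    if IP not in pkt or TCP not in pkt:
        continue
    payload_len = len(pkt[TCP].payload)
    if pkt[IP].dst == TARGET and payload_len > 0:
        data_segments += 1
    elif pkt[IP].src == TARGET and payload_len == 0 and pkt[TCP].flags & 0x10:
        pure_acks += 1

print("data segments:", data_segments, "pure ACKs:", pure_acks)
if data_segments:
    # Plain delayed ACK works out to roughly 0.5 here; the interesting part is
    # how long the sender sits idle waiting for them, which this crude count
    # doesn't show.
    print("ACKs per data segment: %.2f" % (pure_acks / data_segments))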

It has to be something affecting basic TCP behavior but not the more complex protocols. Using smbclient to upload a 4GB DVD ISO runs at 50 MB/s, yet iSCSI throughput on the same client is a piddly 3-5 MB/s. I'm sure some kind of tuning on either side might be able to jar things loose; heaven knows Linux 2.6.31 is a heck of a lot more current on TCP settings than NetWare 6.5 SP8 is. I just haven't found it yet.
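
As a starting point for that tuning, these are the Linux-side knobs I'd check first. A minimal sketch in Python that just dumps the usual suspects from /proc/sys and changes nothing:

# The standard TCP tunables on a 2.6 kernel: window scaling, SACK,
# timestamps, and the receive/send buffer autotuning limits.
KNOBS = [
    "net/ipv4/tcp_window_scaling",
    "net/ipv4/tcp_sack",
    "net/ipv4/tcp_timestamps",
    "net/ipv4/tcp_rmem",
    "net/ipv4/tcp_wmem",
]

for knob in KNOBS:
    try:
        with open("/proc/sys/" + knob) as f:
            print(knob, "=", f.read().strip())
    except OSError:
        print(knob, "is not present on this kernel")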

Conversely, Server 2008 talking to Linux over iSCSI works at pretty much line speed. I'm testing this for completeness's sake. We need something that can serve up to 30TB via both iSCSI and SMB. My findings aren't fully complete yet, but in general:
  • OpenFiler: GREAT iSCSI host, completely blows for SMB in our environment.
  • OpenSolaris: Great iSCSI host, just can't convince the kernel-mode CIFS to join our domain. Also, worst-of-breed random I/O performance.
  • OpenFiler + Windows: OpenFiler for iSCSI, Windows (mounting an iSCSI share) for SMB. Should work GREAT. Current best bet for the future.
  • OpenSolaris + Windows: As previous option, but I/O problems make it less attractive.
  • Windows + KernSafe: GREAT SMB performance, solid iSCSI for Windows hosts. Linux hosts will take lots of tuning (perhaps; it could also turn out to be intractable).

Privacy, lack thereof.

| 1 Comment
This past weekend I got into a pretty long discussion about privacy; governmental, corporate, and criminal tracking of everything you do (Big/Little/Silent Brother); and related topics. It was a good debate. One of the participants was an actual lawyer versed in these issues who works for a library-related non-profit. How cool is that? Working as I do for a liberal institution of higher ed, we do value our individuality and the right to express same.

Big Brother. We know this one; Orwell told us all about it in his book 1984. Governmental tracking of people for their own safety.

Little Brother. A more recent development: private-sector tracking of people for reasons relating to profit. Your browsing habits are being tracked by the ad agencies. That kind of thing.

Silent Brother. A term I came up with, but it's obvious enough I wouldn't be surprised to learn someone else came up with it too. Criminal tracking of everything you do for reasons of illicit profit. Russian crime gangs specializing in identity theft.

Now that that's out of the way, some nitty-gritty. Under the fold.

Aging solid-state drives

There is a misconception about solid-state drives that's rather pernicious. Some people have grabbed onto the paranoia surrounding the SSDs of several years ago and have hung onto that as gospel truth. What am I referring to? The fundamental truth that our current flash-drive technology has an upper limit on the number of writes per memory cell, coupled with a lack of faith in ingenuity.

Once Upon a Time, it was commonly bandied about that SSD memory cells only had 100,000 writes in them before going bad. Cross that with past painful experience with HD bad sectors and you scared off a whole generation of storage administrators. These scars seem to linger.

The main problem cited is/was hot-spotting on the drive itself. Certain blocks get written to a LOT (the journal, the freespace bitmap, certain critical inodes, etc) and once those wear out, well... you have a brick.

This perception has some basis in truth, most especially in the el-cheapo SSD drives of several years ago, but not any more. The enterprise class solid-state drive has not had this problem for a very long time. The exact technical details have been covered quite a lot in the media and Anandtech has had several good articles on it.

Part of the problem here is the misconception that a storage block as seen by the operating system corresponds to a single block on the storage device itself. This hasn't been the case since the 1980's, when SCSI drives introduced sector reallocation as a way to handle bad sectors. Back in the day, and heck, right now for rotational media, each hard drive keeps a stash of spare sectors that act as substitutes for actual bad sectors. When this happens, SMART monitoring will throw pre-fail alarms, but the data is still intact. What's more, the operating system doesn't necessarily know which sectors got reallocated. What looks like a contiguous run of blocks in the file allocation table may actually have a sector physically far from the rest, which can impact performance of that file.
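
You can watch that bookkeeping from the OS side. A minimal sketch, assuming smartmontools is installed and /dev/sda is the disk in question, that pulls the reallocated-sector counters out of the SMART attribute table:

import subprocess

# Ask the drive for its SMART attribute table.
out = subprocess.run(
    ["smartctl", "-A", "/dev/sda"],
    capture_output=True, text=True, check=False,
).stdout

for line in out.splitlines():
    # The raw value in the last column is how many sectors the drive has
    # quietly remapped into its spare pool without telling the filesystem.
    if "Reallocated_Sector_Ct" in line or "Reallocated_Event_Count" in line:
        print(line)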

Solid-state disks take this to another level. SSD vendors know darned well that flash ages, so they allocate a much larger chunk of storage to this exact reallocation scheme. Enterprise SSDs have a larger percentage of this reserve space than consumer-grade SSDs. As blocks wear out, they're substituted in real time from the reserve pool, and since solid-state drives don't incur an I/O latency penalty when accessing non-contiguous blocks, you'll never know the difference.

The other thing SSDs do is something called wear-leveling. The exact methods vary by manufacturer, but they all do it. The chipset on the drive itself makes sure that no cell gets pounded with writes more than the others. For instance, when handling an 'overwrite' operation it'll write the data to a fresh block and mark the old block as free. The physical block corresponding to a logical block can change on a daily basis thanks to this. Blocks that get written to constantly, that darned journal again, will be constantly on the move.
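
To make the remapping concrete, here's a toy model of the idea in Python, not any vendor's actual algorithm: logical block numbers stay put as far as the OS is concerned, but every overwrite lands on the least-worn free physical block and the old one goes back into the free pool.

import heapq

class ToyWearLeveler:
    def __init__(self, physical_blocks):
        # Free pool kept as a min-heap ordered by write/erase count.
        self.free = [(0, pb) for pb in range(physical_blocks)]
        heapq.heapify(self.free)
        self.mapping = {}   # logical block -> (wear count, physical block)

    def write(self, logical_block):
        wear, phys = heapq.heappop(self.free)   # least-worn free block wins
        old = self.mapping.get(logical_block)
        if old is not None:
            heapq.heappush(self.free, old)      # retire the old copy back to the pool
        self.mapping[logical_block] = (wear + 1, phys)
        return phys

ssd = ToyWearLeveler(physical_blocks=8)
# Hammer one "journal" block: the logical address never changes, but the
# physical block underneath it keeps moving, so no single cell eats all the writes.
print([ssd.write(0) for _ in range(16)])

Run it and the same logical block walks across every physical block before any of them sees a second write.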

The really high-end SSDs have a super-capacitor and onboard cache built into them. The chipset keeps the high-write blocks in that cache to further reduce write wear, and the super-cap is there so that on a sudden power loss the drive can commit the cached blocks to flash. When you're paying over $2K for 512GB of space, this is the kind of thing you're buying.

All of these techniques combine to ensure your shiny new SSD will NOT wear itself out after only 100K writes. Depending on your workload, these drives can happily last three years or more. Obviously, if the workload is 100% writes they won't last as long, but you generally don't want SSD for 100% write loads anyway; you use SSDs for the blazing fast reads.
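
To put rough numbers on "three years or more", here is the back-of-envelope version. Every figure below is an assumption picked for illustration, not a spec for any real drive, but it shows why a per-cell write limit stops being scary once wear-leveling spreads the load across the whole device.

capacity_gb = 200            # usable flash the wear-leveler can spread across
cycles_per_cell = 100_000    # the old scary per-cell figure
host_writes_gb_day = 500     # a very write-heavy workload
write_amplification = 2      # firmware overhead; real values vary widely

# Total data the flash can absorb before cells run out of cycles, divided by
# what the workload actually pushes at it each day.
write_budget_gb = capacity_gb * cycles_per_cell
days = write_budget_gb / (host_writes_gb_day * write_amplification)

print("write budget: %d GB" % write_budget_gb)
print("estimated lifetime: %.0f days (~%.0f years)" % (days, days / 365))

Even knocking an order of magnitude off those assumptions still leaves the math comfortably in the multi-year range.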

For modern SSD drives:
  • You do NOT need special SSD-aware filesystems. Generally, these are only for stupid SSD drives like a RAID array of MicroSD cards.
  • For most common workloads you do NOT need to worry about write-minimization.
  • They can handle millions to tens of millions of write operations per logical block (yes, that will consume multiple physical blocks over the drive's lifetime, but that's how this works).
It's time to move on.

OpenSolaris

| 6 Comments
I've been checking out OpenSolaris for a NAS possibility, and it's pretty nifty. A different dialect than I'm used to, but still nifty.

Unfortunately, it seems to have a nasty problem in file I/O. Here are some metrics (40GB file, with 32K and 64K record-sizes).

OpenFiler                                 random  random
              KB  reclen   write    read    read   write
        41943040      32  296238  118598   15682   62388
        41943040      64  297141  118861   23731   86620

OpenSolaris                               random  random
              KB  reclen   write    read    read   write
        41943040      32  259170 1179515    8458    7461
        41943040      64  244747 1133916   13894   13001

Identical hardware, different operating system. I've figured out that the stellar read performance is due to the ZFS 'recordsize' being 128K. When I drop it down to 4K, similar to the block size of XFS in OpenFiler, the read performance is very similar. What I don't get is what's causing the large difference in random I/O. Random-write is exceedingly bad. With the recordsize dropped to 4K to match XFS, the random-read gets even worse; I haven't stuck with it long enough to see what it does to random-write.
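
To make "exceedingly bad" concrete, here is the random-write column converted into friendlier units, assuming the figures above are KB/s as iozone normally reports:

reclen_kb = {"32K": 32, "64K": 64}
random_write_kb_s = {
    "OpenFiler 32K":   62388,
    "OpenFiler 64K":   86620,
    "OpenSolaris 32K":  7461,
    "OpenSolaris 64K": 13001,
}

for label, kb_s in random_write_kb_s.items():
    rec = reclen_kb[label.split()[-1]]
    print("%-16s %6.1f MB/s  %5.0f ops/s" % (label, kb_s / 1024.0, kb_s / rec))

Call it roughly 1,950 random writes per second on OpenFiler versus about 230 on OpenSolaris at the 32K record size.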

Poking into iostat shows that both OpenFiler and OpenSolaris are striping I/O across the four logical disks available to them. I know the storage side is able to pump the I/O, as witnessed by the random-write speed on OpenFiler. The chosen file size is larger than local RAM, so local caching effects are minimized.

As I mentioned back in the know-your-IO article series, random-read is the best analog of the I/O pattern your backup process follows when backing up large, disorganized piles of files. Cache and pre-fetch will help with this to some extent, but the above numbers give a fair idea of the lower bound on speed. OpenSolaris is w-a-y too slow. At least the way I've got it configured, which is largely out-of-the-box.

Unfortunately, I don't know if this bottleneck is a driver issue (HP's fault) or an OS issue. I don't know enough of the internals of ZFS to hazard a guess.

Furlough bill

WWU has delivered its guidance to the Office of Financial Management. You can read it here (pdf). In short, we're looking at a choice between one furlough day and leaving tenure-track faculty positions open for the next 18 months. WWU management likes the second option, since furloughs are problematic due to the different ways they'd have to be handled for the different classes of employees.

Nice to know. I really didn't want to muck with 11 such days.

Proxy ARP and netmasks

Proxy ARP is enabled on our routers. I'm 100% certain this has saved the bacon of many of the technicians here on campus, since our address space is a Class B range (140.160.0.0/16), and Windows knows this, so it applies a default mask of 255.255.0.0 when setting up a static IP address. Without Proxy ARP, a tech who doesn't fix this will soon find that talking to anything else on campus doesn't work, but talking to, say, sysadmin1138.net works just fine. With Proxy ARP, it all works and the tech is never the wiser.

We just had this crop up on a server, only with a twist.

It turns out that the F5 BigIP will also issue Proxy ARP replies for Virtual Servers configured on it. This means that for some addresses on some subnets, we actually have two network devices issuing Proxy ARP replies. This, as you can well imagine, is sub-optimal. How it works is this, from a Layer 2 point of view...

Mailer: Who has 140.160.243.16? Tell Mailer.
BigIP: 140.160.243.16 is BigIP
Mailer to BigIP: TCP/25 to 140.160.243.16 [SYN]
Cisco: 140.160.243.16 is Cisco
BigIP to Mailer: TCP/43124 to Mailer [SYN/ACK]
Mailer to Cisco: TCP/25 to 140.160.243.16 [ACK]
Cisco to Mailer: [Reset]

What you're seeing is an ARP update in the middle of the TCP three-way handshake. The Mailer server dutifully updates its ARP table entry for 140.160.243.16, which sends the rest of the conversation down a different network path than the BigIP expects, so a TCP Reset gets issued.

What was throwing us on this one was that the connection would reset, but subsequent attempts would work just fine. This is because we were still within the ARP timeout value when the second attempt was made, and things just worked, at least for a little while.

Setting the network mask correctly forces the Mailer to realize that 140.160.243.16 is NOT a local address, so the traffic goes through the gateway the way it should, and everything works.
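
The whole thing reduces to one subnet-membership check. A minimal sketch with Python's ipaddress module; the Mailer's own address here is made up, but the masks are the ones in question:

import ipaddress

target = ipaddress.ip_address("140.160.243.16")
mailer_ip = "140.160.12.5"   # hypothetical Mailer address on another campus subnet

for mask in ("255.255.0.0", "255.255.255.0"):
    mailer_net = ipaddress.ip_network(mailer_ip + "/" + mask, strict=False)
    if target in mailer_net:
        print(mask, "-> Mailer thinks the target is on its own wire, so it ARPs for it")
    else:
        print(mask, "-> Mailer hands the packet to its default gateway")

With the sloppy /16 the Mailer ARPs and gets two answers; with the right mask the packet goes straight to the gateway and the race never happens.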