Recently in backup Category

Natural inconvenience

| 3 Comments
Because really, 'natural disaster' is not what we're dealing with.

Yes, I felt the earthquake. It was me and the one other ex-west-coastie in the office who realized what that was first. It was quite unmistakable in this unreinforced brick-building. The ground just picked us up and dropped us a few times, the kind of motion a building like this just wasn't designed to handle. Some existing damage to the building may have gotten worse. There was a lot of brick dust thanks to the wood rafters dancing around. That's it.

At home we had two casualties. A drinking glass shook itself off a shelf and took a diver, breaking. And a mason-jar leaned against a cabinet door, so when I opened it last night it fell at me, requiring me to parry it into a metal popcorn bowl (loud!) and causing a large bruise on my left middle finger.

And we have a hurricane heading our way.

I'm not worried about the hurricane. On its current track, if we DO get a dead-center hit the eye will have tracked across 300-400 miles of land and will be significantly weakened ("tropical depression"). Far more likely is a glancing blow where it spins off the coast. Irene is a big storm so we would just get some of the western rain-bands. From what I hear, this'll be a lot like the big Winter rain-storms the PacNW gets, so nothing I haven't lived through before.

Data Protector deduplication

| 1 Comment
When last I looked at HP Data Protector deduplication, I was not impressed. The client-side requirements were a resource hungry joke, and were seriously compromised by Microsoft failover clusters.

I found a use-case for it last week. We have a server in a remote office across a slow WAN link. Backing that up has always been a trial, but I had the right resources to try and use dedupe and at least get the data off-site.

Sometime between then and now (v6.11) HP changed things.

The 'enhincrdb' directory I railed against is empty. Having just finished a full and incremental backup I see that the amount of data transferred over the wire is exactly the same for full and incremental backups, but the amount of data stashed away is markedly different. Apparently the processing to figure out what needs to be in the backup has been moved from the client to the backup server, which make this useless over slow links.

It means that enhanced incremental backups will take just as long as the fulls do, and we don't have time for that on our larger servers. We're still going to stick with an old fashioned Full/Incremental rotation.

It's an improvement in that it doesn't require such horrible client-side resources. However, this implementation still has enough quirks that I just do not see the use-case.

Shopping for a datacenter

| 1 Comment
A nice check-list for things you want to have in your new datacenter was posted today on ServerFault. Some good things in there, and all in one list!

However, there is one thing that is not quite right.

"It should have Halon fire suppression, not sprinklers."
Actually, it should have both FM200 (or equivalent) AND sprinklers, at least according to most fire-codes.

Sprinklers in the datacenter? Yep. One of the last things I did at my old job (11/2003 to be exact) was help move into a newly built datacenter, and I had quite a shock when I saw sprinkler heads over the server racks. I asked my boss about that, she had been neck deep in the building of it, and she replied that yes, it was indeed fire-code and that was new since the last time she helped build a datacenter in the 80's.

Building and fire codes are nigh impossible to get at in their entirety online, it seems they're put behind pay-walls so I can't link directly to them. Considering which governmental agency was going to be putting machines in that room, I consider that fire-code ruling to be highly trustworthy. FM200 protects all that very expensive gear in the datacenter. The sprinklers protect the room and building.

Those sprinklers should be dry-pipe, pre-action sprinklers. No sense tempting fate more than required.

Tape is dead, Long live Disk!

| No Comments
Except if you're using HP Data Protector.

Much as I'd like to jump on the backup-to-disk de-dup bandwagon, I can't. Can't afford it. It all comes down to the cost-per-GB of storage in the backup system.

With tape, Data Protector licenses on the following items:
  • Per tape-drive over 2
  • Per tape library with a capacity between 50 and 250 slots
  • Per tape library that exceeds 250 slots
  • Per media pool with more than some-big-number of tapes
With disk, DP licenses on the following items:
  • Per TB in the backup-to-disk system
Obviously, the Disk side is much easier to license. In our environment we had something like 500 SDLT320 tapes, and our library had 6 drives and 45 slots. We only had to license the 4 extra tape drives.

Then our library started crapping out, and we outgrew it anyway. Prime time to figure out what the future holds for our backup environment. TO DISK!

HOLY CRAP that's expensive.

HP licenses their B2D space by the Terabyte. After you do the math it comes down to about $5/GB. Without using a de-duplication technology, you can easily make 10 copies or more of every bit of data subject to backup. Which means that for every 1 GB of data in the primary storage, 10GB of data is in the B2D system, and that'll set us back a whopping $50/GB. So... about the de-duplication system...

Too bad it doesn't work for non-file data, and kinda sorta explicitly doesn't work for clustered systems. Since 70% or so of our backup data is sourced from clustered file-servers or is non-file data (Exchange, SQL backups), this means the gains from HP's de-dup technology are pretty minor. Looks like we're stuck doing standard backups at $50/GB (or more).

So, about that 'dead' tape technology! We've already shelled out for the tape-drive licenses so that's a sunk cost. The library we want doesn't have enough slots to force us to get that license. All that's left is the media costs. Math math math, and the amortized cost of the entire library and media set comes to about $0.25/GB. Niiice. Factor in the magnification factor, and each 1 GB of backup will cost $2.50/GB, a far, far cry from $50/GB.

We still have SOME backup to disk space. This is needed since these LTO4 drives are HUNGRY critters, and the only way to feed them fast enough to prevent shoe-shining is to back everything up to disk, and then copy the jobs to tape directly from disk. So long as we have a week's worth of free-space, we're good. This is a sunk cost too, happily.

So. To-disk backups may be the greatest thing since the invention of the tape-changing robot, but our software isn't letting us take advantage of it. 

New backup hardware

| No Comments
Friday represented the first production use of our new LTO4-based tape library. This replaced the old SDLT320 based Scalar 100 we've had for entirely too long. The simple fact that all of the media and drives are BRAND NEW should make our completion rate go very close to 100%. This excites me.

Friday we did a backup of our main file-serving cluster and the Blackboard content volume in a single job that streamed to a single tape drive.

Total data backed up: 6.41TB
Total time: 1475 minutes
Speed: 4669 MB/Min, or 77 MB/s

Still not flank speed for LTO4 (that's closer to 120 MB/s) but still markedly faster than the SDLT stuff we had been doing. The similar backup on the Scalar 100 took around 36 hours (2160 minutes) instead of the 24ish hours this one took, and it used 4 tape drives to do it.

Ahhhh, modern technology, how I've desired you.

*pets it*

Now to resist taking a fire ax to the old library. We have to surplus it through official channels, and they won't take it if it has been "obviously defaced". Ah well.

Spent money

| 2 Comments
It has been a week plus since we spent a lot of money and the question has been raised, what are we doing with that storage? Exactly?

It isn't fancy storage. In fact, it is the cheapest performant storage we could budget for. It's not SATA, but it is 7.2K RPM SAS. And there are 35TB of it. It's a server, with direct attached storage. Not a dedicated storage unit. Not fibre channel. An off the shelf server, a high quality RAID card, a bunch of storage shelves, and a pair of network ports.

A final decision hasn't been made yet for how we're presenting this storage to consumers, but iSCSI of some kind is the 90% likely choice. Whether that's Linux (a.k.a. the free option) or something else (a.k.a. the pay option) remains to be seen. The whole point of this storage is to be cheap per GB.

We're also adding a pair of Fibre Channel Drive enclosures to our EVA4400 to provide true high-speed low-latency service at a much reduced cost versus the EVA6100. Yes FC drives are EOL in a very short while, but the EVA4400 doesn't support SAS (yet). This is where our ESX cluster is likely to expand into when the time comes, that kind of stuff.

And a new tape library. It's an HP StorageWorks MSL4048. It is LTO4, fibre-attached, and has lots of slots. The native capacity of this guy is 37.5TB which is a whole lot larger than the 7.9TB of our current SDLT320 unit. It only has two drives for now which will limit flexibility somewhat, but it is upgradeable to four drives later when the money tree starts producing again. If we really need to, we can stack another MSL4048 on top of it for even more storage. Because it is drive-limited, we kind of have to stage all backups to disk and then copy from disk to tape; we won't be doing any backups directly to tape.

When it'll get here is anyone's guess. Purchasing is currently digging out of an avalanche of last-minute orders just like ours, so they're w-a-y backed up down there.

Spending money

| 2 Comments
Today we spent more money in one day than I've ever seen done here. Why? Well substantiated rumor had it that the Governor had a spending freeze directive on her desk. Unlike last year's freeze, this one would be the sort passed down during the 2001-02 recession; nothing gets spent without OFM approval. Veterans of that era noted that such approval took a really long time, and only sometimes came. Office scuttle-butt was mum on whether or not consumable purchases like backup tapes would be covered.

We cut purchase orders today and rushed them through Purchasing. A Purchasing who was immensely snowed under, as can be well expected. I think final signatures get signed tomorrow.

What are we getting? Three big things:
  1. A new LTO4 tape library. I try not to gush lovingly at the thought, but keep in mind I've been dealing with SDLT320 and old tapes. I'm trying not to let all that space go to my head. 2 drives, 40-50 slots, fibre attached. Made of love. No gushing, no gushing...
  2. Fast, cheap storage. Our EVA6100 is just too expensive to keep feeding. So we're getting 8TB of 15K fast storage. We needs it, precious.
  3. Really cheap storage. Since the storage area networking options all came in above our stated price-point, we're ending up with direct-attached. Depending on how we slice it, between 30-35TB of it. Probably software ISCSI and all the faults inherent in the setup. We still need to dicker over software.
But... that's all we're getting for the next 15 months at least. Now when vendors cold call me I can say quite truthfully, "No money, talk to me in July 2011."

The last thing we have is an email archiving system. We already know what we want, but we're waiting on determination of whether or not we can spend that already ear-marked money.

Unfortunately, I'll be finding out a week from Monday. I'll be out of the office all next week. Bad timing for it, but can't be avoided.

Budget plans

| No Comments
Washington State has a $2.6 Billion deficit for this year. In fact, the finance people point out that if something isn't done the WA treasury will run dry some time in September and we'll have to rely on short-term loans. As this is not good, the Legislature is attempting to come up with some way to fill the hole.

As far as WWU is concerned, we know we'll be passed some kind of cut. We don't know the size, nor do we know what other strings may be attached to the money we do get. So we're planning for various sizes of cuts.

One thing that is definitely getting bandied about is the idea of 'sweeping' unused funds at end-of-year in order to reduce the deficits. As anyone who has ever worked in a department subject to a budget knows, the idea of having your money taken away from you for being good with your money runs counter to every bureaucratic instinct. I have yet to meet the IT department that considers themselves fully funded. My old job did that; our Fiscal year ended 12/15, which meant that we bought a lot of stuff in October and November with the funds we'd otherwise have to give back (a.k.a. "Christmas in October"). Since WWU's fiscal year starts 7/1, this means that April and May will become 'use it or lose it' time.

Sweeping funds is a great way to reduce fiscal efficiency.

In the end, what this means is that the money tree is actually producing at the moment. We have a couple of crying needs that may actually get addressed this year. It's enough to completely fix our backup environment, OR do some other things. We still have to dicker over what exactly we'll fix. The backup environment needs to be made better at least somewhat, that much I know. We have a raft of servers that fall off of cheap maintenance in May (i.e. they turn 5). We have a need for storage that costs under $5/GB but is still fast enough for 'online' storage (i.e. not SATA). As always, the needs are many, and the resources few.

At least we HAVE resources at the moment. It's a bad sign when you have to commiserate with your end-users over not being able to do cool stuff, or tell researchers they can't do that particular research since we have no where to store their data. Baaaaaad. We haven't quite gotten there yet, but we can see it from where we are.

The costs of backup upgrades

| 1 Comment
Our tape library is showing its years, and it's time to start moving the mountain required to get it replaced with something. So this afternoon I spent some quality time with google, a spread-sheet, and some oldish quotes from HP. The question I was trying to answer is what's the optimal mix of backup to tape and backup to disk using HP Data Protector. The results were astounding.

Data Protector licenses backup-to-disk capacity by the amount of space consumed in the B2D directories. You have 15TB parked in your backup-to-disk archives, you pay for 15TB of space.

Data Protector has a few licenses for tape libraries. They have costs for each tape drive over 2, another license for libraries with between 61-250 slots, and another license for unlimited slots. There is no license for fibre-attached libraries like BackupExec and others do.

Data Protector does not license per backed up host, which is theoretically a cost savings.

When all is said and done, DP costs about $1.50 per GB in your backup to disk directories. In our case the price is a bit different since we've sunk some of those costs already, but they're pretty close to a buck fiddy per GB for Data Protector licensing alone. I haven't even gotten to physical storage costs yet, this is just licensing.

Going with an HP tape library (easy for me to spec, which is why I put it into the estimates), we can get an LTO4-based tape library that should meet our storage growth needs for the next 5 years. After adding in the needed DP licenses, the total cost per GB (uncompressed, mind) is on the order of $0.10 per GB. Holy buckets!

Calming down some, taking our current backup volume and apportioning the price of largest tape library I estimated over that backup volume and the price rises to $1.01/GB. Which means that as we grow our storage, the price-per-GB drops as less of the infrastructure is being apportioned to each GB. That's a rather shocking difference in price.

Clearly, HP really really wants you to use their de-duplication features for backup-to-disk. Unfortunately for HP, their de-duplication technology has some serious deficiencies when presented with our environment so we can't use it for our largest backup targets.

But to answer the question I started out with, what kind of mix should we have, the answer is pretty clear. As little backup-to-disk space as we can get away with. The stuff has some real benefits, as it allows us to stage backups to disk and then copy to tape during the day. But for long term storage, tape is by far the more cost-effective storage medium. By far.

Bad tapes

| 4 Comments
It seems that HP Data Protector and BackupExec 10 have different opinions on what constitutes a bad tape. BackupExec seems to survive them better. This means that as we cycle old media into the new Data Protector environment we're getting the occasional bad tape. We've been averaging 3 bad tapes per 40 tape rotation.

While that not may sound like a lot, it really is. Our very large backups are extremely vulnerable to bad tapes, since all it takes is one bad tape to kill an entire backup session. When you're doing a backup of 1.3TB of data, you don't want those backups to fail.

Take that 1.3TB backup. We're backing up to SDL320 media, so we're averaging somewhere between 180GB and 220GB a tape depending on what kinds of files are being backed up. So that's 7-8 tapes for this one backup. How likely is it that this 7 to 8 tape backup will include at least one of the 3 bad tapes?

When the first tape is picked the chance is 3 in 40 (7.5%).
When the second tape is picked, assuming the first tape was good, the chance is 3 in 39 (7.69%).
When the third tape is picked, presuming the first two were good, the chance is 3 in 38 (7.89%).
When the 7th tape is picked, presuming the first six were good, the chance has increased to 3 in 34 (8.82%)

8.82% doesn't sound like much. However, the probability is cumulative. The true probability can be computed:

(3/40)+(3/39)+(3/38)+(3/37)+(3/36)+(3/35)+(3/34) = 0.56923444 or 56.92%

So with 3 bad tapes in a given 40 tape set, the chance of this one 7 tape backup having at least one of them in the tape set is over 50%. For an 8 tape backup the probability increases to 66.01%.

The true probability is a different number, since these backups are taken concurrent with other backups. So when the 7th tape gets picked, the number of available tapes is much less than 34, and the number of bad tapes still waiting to be found may not be 3. Also, these backups are mutliplexed so the true tape set may be as high as 9 tapes for this backups if that one backup target is slow in sending data to the backup server.

So the true probability is not 56.92%, it changes on a week to week basis. However, 56.92% (or 66%) is a good baseline. Some weeks it'll be a lot more. Others, such as weeks where the bad tapes are found by other processes and the target server is streaming fast, less.

We have a couple more weeks until we've cycled through all of our short-retention media. At that point our error rate should drop a lot. Until then, it's like dodging artillery shells.

Other Blogs

My Other Stuff

Monthly Archives