Recently in NSS Category

An older problem

| 1 Comment
I deal with some large file-systems. Because of what we do, we get shipped archives with a lot of data in them. Hundreds of gigs sometimes. These are data provided by clients for processing, which we then do. Processing sometimes doubles, or even triples or more, the file-count in these filesystems depending on what our clients want done with their data.

One 10GB Outlook archive file can contain a huge number of emails. If a client desires these to be turned into .TIFF files for legal processes, that one 10GB .pst file can turn into hundreds of thousands of files, if not millions.

I've had cause to change some permissions at the top of some of these very large filesystems. By large, I mean larger than the big FacShare volume at WWU in terms of file-counts. As this is on a Windows NTFS volume, it has to walk the entire file-system to update permissions changes at the top.

This isn't the exact problem I'm fixing, but it's much like in some companies where granting permissions to specific users is done instead of to groups, and then that one user goes elsewhere and suddenly all the rights are broken and it takes a day and half to get the rights update processed (and heaven help you if it stops half-way for some reason).

Big file-systems take a long time to update rights inheritance. This has been a fact of life on Windows since the NT days. Nothing new here.

But... it doesn't have to be this way. I explain under the cut.

Migrating off of NetWare

It has been around a year since we did the heavy lifting of migrating off of NetWare and retiring our eDirectory tree. By this point last year we had our procedures in place, we just needed to pull the trigger and start moving data around. I was asked to provide some hints about it, but the mail bounced with a 550-mailbox-not-found error *ahem*.

Because it's such a narrowly focused topic, and the WWU people who read me lived through it and therefore already know this stuff, I'm putting the meat of the post under the fold.

You're welcome.

High availability

| 1 Comment
64-bit OES provides some options to highly available file serving. Now that we've split the non-file services out of the main 6-node cluster, all that cluster is doing is NCP and some trivial other things. What kinds of things could we do with this should we get a pile of money to do whatever we want?

Disclaimer: Due to the budget crisis, it is very possible we will not be able to replace the cluster nodes when they turn 5 years old. It may be easier to justify eating the greatly increased support expenses. Won't know until we try and replace them. This is a pure fantasy exercise as a result.

The stats of the 6-node cluster are impressive:
  • 12 P4 cores, with an average of 3GHz per core (36GHz).
  • A total of 24GB of RAM
  • About 7TB of active data
The interesting thing is that you can get a similar server these days:
  • HP ProLiant DL580 (4 CPU sockets)
  • 4x Quad Core Xeon E7330 Processors (2.40GHz per core, 38.4GHz total)
  • 24 GB of RAM
  • The usual trimmings
  • Total cost: No more than $16,547 for us
With OES2 running in 64-bit mode, this monolithic server could handle what six 32-bit nodes are handling right now. The above is just a server that matches the stats of the existing cluster. If I were to really replace the 6 node cluster with a single device I would make a few changes to the above. Such as moving to 32GB of RAM at minimum, and using a 2-socket server instead of a 4-socket server; 8 cores should be plenty for a pure file-server this big.

A single server does have a few things to recommend it. By doing away with the virtual servers, all of the NCP volumes would be hosted on the same server. Right now each virtual-server/volume pair causes a new connection to each cluster node. Right now if I fail all the volumes to the same cluster node, that cluster node will legitimately have on the order of 15,000 concurrent connections. If I were to move all the volumes to a single server itself, the concurrent connection count would drop to only ~2500.

Doing that would also make one of the chief annoyances of the Vista Client for Novell much less annoying. Due to name cache expiration, if you don't look at Windows Explorer or that file dialog in the Vista client once every 10 minutes, it'll take a freaking-long time to open that window when you do. This is because the Vista client has to enumerate/resolve the addresses of each mapped drive. Because of our cluster, each user gets no less than 6 drive mappings to 6 different virtual servers. Since it takes Vista 30-60 seconds per NCP mapping to figure out the address (it has to try Windows resolution methods before going to Novell resolution methods, and unlike WinXP there is no way to reverse that order), this means a 3-5 minute pause before Windows Explorer opens.

By putting all of our volumes on the same server, it'd only pause 30-60 seconds. Still not great, but far better.

However, putting everything on a single server is not what you call "highly available". OES2 is a lot more stable now, but it still isn't to the legendary stability of NetWare 3. Heck, NetWare 6.5 isn't at that legendary stability either. Rebooting for patches takes everything down for minutes at a time. Not viable.

With a server this beefy it is quite doable to do a cluster-in-a-box by way of Xen. Lay a base of SLES10-Sp2 on it, run the Xen kernel, and create four VMs for NCS cluster nodes. Give each 64-bit VM 7.75GB of RAM for file-caching, and bam! Cluster-in-a-box, and highly available.

However, this is a pure fantasy solution, so chances are real good that if we had the money we would use VMWare ESX instead XEN for the VM. The advantage to that is that we don't have to keep the VM/Host kernel versions in lock-step, which reduces downtime. There would be some performance degradation, and clock skew would be a problem, but at least uptime would be good; no need to perform a CLUSTER DOWN when updating kernels.

Best case, we'd have two physical boxes so we can patch the VM host without having to take every VM down.

But I still find it quite interesting that I could theoretically buy a single server with the same horsepower as the six servers driving our cluster right now.

OES2 SP1 ships!

According to this documentation, the storing of NSS/NetWare metadata in xattrs is turned off by default. You turn it on for OES2 servers through the "nss /ListXattrNWMetadata" command. This allows linux level utilities (i.e. cp, tar) to be able to access and copy the NSS metadata. This also allows backup software that isn't SMS enabled for OES2 to be able to backup the NSS information.

This is handy, as HP DataProtector doesn't support NSS backup on Linux. I need to remember this.

A good article on trustees

| 1 Comment
Over on the Novell Cool Solutions site, Marcel Cox just posted an article about how Trustees are handled on the Novell Filesystems (TFS and NFS). If you wanted to know the fundamentals of how ACLs are done on NSS volumes and how it relates to eDirectory, this is a good start.

NetWare and Xen

Here is something I didn't really know about in virtualized NetWare:

Guidelines for using NSS in a virtual environment

Towards the bottom of this document, you get this:

Configuring Write Barrier Behavior for NetWare in a Guest Environment

Write barriers are needed for controlling I/O behavior when writing to SATA and ATA/IDE devices and disk images via the Xen I/O drivers from a guest NetWare server. This is not an issue when NetWare is handling the I/O directly on a physical server.

The XenBlk Barriers parameter for the SET command controls the behavior of XenBlk Disk I/O when NetWare is running in a virtual environment. The setting appears in the Disk category when you issue the SET command in the NetWare server console.

Valid settings for the XenBlk Barriers parameter are integer values from 0 (turn off write barriers) to 255, with a default value of 16. A non-zero value specifies the depth of the driver queue, and also controls how often a write barrier is inserted into the I/O stream. A value of 0 turns off XenBlk Barriers.

A value of 0 (no barriers) is the best setting to use when the virtual disks assigned to the guest server’s virtual machine are based on physical SCSI, Fibre Channel, or iSCSI disks (or partitions on those physical disk types) on the host server. In this configuration, disk I/O is handled so that data is not exposed to corruption in the event of power failure or host crash, so the XenBlk Barriers are not needed. If the write barriers are set to zero, disk I/O performance is noticeably improved.

Other disk types such as SATA and ATA/IDE can leave disk I/O exposed to corruption in the event of power failure or a host crash, and should use a non-zero setting for the XenBlk Barriers parameter. Non-zero settings should also be used for XenBlk Barriers when writing to Xen LVM-backed disk images and Xen file-backed disk images, regardless of the physical disk type used to store the disk images.

Nice stuff there! The "xenblk barriers" can also have an impact on the performance of your virtualized NetWare server. If your I/O stream runs the server out of cache, performance can really suffer if barriers are non-zero. If it fits in cache, the server can reorder the I/O stream to the disks to the point that you don't notice the performance hit.

So, keep in mind where your disk files are! If you're using one huge XFS partition and hosting all the disks for your VM-NW systems on that, then you'll need barriers. If you're presenting a SAN LUN directly to the VM, then you'll need to "SET XENBLK BARRIERS = 0", as they're set to 16 by default. This'll give you better performance.
I posted last week about DataProtector and its Enhanced Incremental Backup. Remember that "enhincrdb" directory I spoke of? Take a look at this:

File sizes in the enhincr directory

See? This is an in-progress count of one of these directories. 1.1 million files, 152MB of space consumed. That comes to an average file-size of 133 bytes. This is significantly under the 4kb block-size for this particular NTFS volume. On another server with a longer serving enhincrdb hive, the average file-size is 831 bytes. So it probably increases as the server gets older.

On the up side, these millions of weensy files won't actually consume more space for quite some time as they expand into the blocks the files are already assigned to. This means that fragmentation on this volume isn't going to be a problem for a while.

On the down side, it's going to park (in this case) 152MB of data on 4.56GB of disk space. It'll get better over time, but in the next 12 months or so it's still going to be horrendous.

This tells me two things:
  • When deciding where to host the enhincrdb hive on a Windows server, format that particular volume with a 1k block size.
  • If HP supported NetWare as an Enhanced Incremental Backup client, the 4kb block size of NSS would cause this hive to grow beyond all reasonable proportions.
Some file-systems have real problems dealing with huge numbers of files in a single directory. Ext3 is one of these, which is why the b-tree hashed indexes were introduced. Reiser does better in this case out of the box. NSS is pretty good about this, as all GroupWise installs before GW became available for non-NetWare platforms created this situation by the sheer design of GW. Unlike NSS, ext3 and reiser have the ability of being formatted with different block-sizes, which makes creating a formatted file-system to host the enhincrdb data easier to correctly engineer.

Since it is highly likely that I'll be using DataProtector for OES2 systems, this is something I need to keep in mind.

DataProtecter 6 has a problem

We're moving our BackupExec environment to HP DataProtector. Don't ask why, it made sense at the time.

Once of the niiiice things about DP is what's called, "Enhanced Incremental Backup". This is a de-duplication strategy, that only backs up files that have changed, and only stores the changed blocks. From these incremental backups you can construct synthetic full backups, which are just pointer databases to the blocks for that specified point-in-time. In theory, you only need to do one full backup, keep that backup forever, do enhanced incrementals, then periodically construct synthetic full backups.

We've been using it for our BlackBoard content store. That's around... 250GB of file store. Rather than keep 5 full 275GB backup files for the duration of the backup rotation, I keep 2 and construct synthetic fulls for the other 3. In theory I could just go with 1, but I'm paranoid :). This greatly reduces the amount of disk-space the backups consume.

Unfortunately, there is a problem with how DP does this. The problem rests on the client side of it. In the "$InstallDir$\OmniBack\enhincrdb" directory it constructs a file hive. An extensive file hive. In this hive it keeps track of file state data for all the files backed up on that server. This hive is constructed as follows:
  • The first level is the mount point. Example: enhincrdb\F\
  • The 2nd level are directories named 00-FF which contain the file state data itself
On our BlackBoard content store, it had 2.7 million files in that hive, and consumed around 10.5GB of space. We noticed this behavior when C: ran out of space. Until this happened, we've never had a problem installing backup agents to C: before. Nor did we find any warnings in the documentation that this directory could get so big.

The last real full backup I took of the content store backed up just under 1.7 million objects (objects = directory entries in NetWare, or inodes in unix-land). Yet the enhincrdb hive had 2.7 million objects. Why the difference? I'm not sure, but I suspect it was keeping state data for 1 million objects that no longer were present in the backup. I have trouble believing that we managed to churn over 60% of the objects in the store in the time I have backups, so I further suspect that it isn't cleaning out state data from files that no longer have a presence in the backup system.

DataProtector doesn't support Enhanced Incrementals for NetWare servers, only Windows and possibly Linux. Due to how this is designed, were it to support NetWare it would create absolutely massive directory structures on my SYS: volumes. The FACSHARE volume has about 1.3TB of data in it, in about 3.3 million directory entries. The average FacStaff User volume (we have 3) has about 1.3 million, and the average Student User volume has about 2.4 million. Due to how our data works, our Student user volumes have a high churn rate due to students coming and going. If FACSHARE were to share a cluster node with one Student user volume and one FacStaff user volume, they have a combined directory-entry count of 7.0 million directory entries. This would generate, at first, a \enhincrdb directory with 7.0 million files. Given our regular churn rate, within a year it could easily be over 9.0 million.

When you move a volume to another cluster node, it will create a hive for that volume in the \enhincrdb directory tree. We're seeing this on the BlackBoard Content cluster. So given some volumes moving around, and it is quite conceivable that each cluster node will have each cluster volume represented in its own \enhincrdb directory. Which will mean over 15 million directory-entries parked there on each SYS volume, steadily increasing as time goes on taking who knows how much space.

And as anyone who has EVER had to do a consistency check of a volume that size knows (be it vrepair, chkdsk, fsck,or nss /poolrebuild), it takes a whopper of a long time when you get a lot of objects on a file-system. The old Traditional File System on NetWare could only support 16 million directory entries, and DP would push me right up to that limit. Thank heavens NSS can support w-a-y more then that. You better hope that the file-system that the \enhincrdb hive is on never has any problems.

But, Enhanced Incrementals only apply to Windows so I don't have to worry about that. However.... if they really do support Linux (and I think they do), then when I migrate the cluster to OES2 next year this could become a very real problem for me.

DataProtector's "Enhanced Incremental Backup" feature is not designed for the size of file-store we deal with. For backing up the C: drive of application servers or the inetpub directory of IIS servers, it would be just fine. But for file-servers? Good gravy, no! Unfortunately, those are the servers in most need of de-dup technology.

BrainShare Thursday

Not a good day. My first course, "Advanced BASH," could more accurately be described as, "BASH scripting tips & tricks". I then proceeded to skip the other three sessions I had signed up for.
  • Novell Open Enterprise Server 2 Interoperability with Windows and AD. All about Domain Services for Windows and Samba. Neither of which we'll ever use. No idea why I wanted to be in this session.
  • Rapid Deployment of ZENworks Configuration Management. Other people around here have suggested that if we haven't moved yet, wait until at least SP3 before moving. If then. So, demotivated. Plus I was rather tired.
  • Configuring Samba on OES2. CIFS will do what we need, I don't need Samba. Don't need this one. Skipped.
DL236: Advanced BASH Course
BASH tips and tricks. I got a lot out of it, but the developers around me were quietly derisive.

ZEN Overview and Features
Not so much with the futures, but it did explain Novell's overall ZEN strategy. It isn't a coincidence that most of Novell's recent purchases have been for ZEN products.

TUT303: OES2 Clusters, from beginning to extremes
This was great. They had a full demo rig, and they showed quite a bit in it. Including using Novell Cluster Services to migrate Xen VM's around. They STRONGLY recommended using AutoYast to set up your cluster nodes to ensure they are simply identical except for the bits you explicitly want different (hostname, IP). And also something else I've heard before, you want one LUN for each NSS Pool. Really. Plus, the presenters were rather funny. A nice cap for the day.

And tonight, Meet the Experts!