Tuesday, July 21, 2009

Digesting Novell financials

It's a perennial question: "why would anyone use Novell any more?" It typically comes from people who only know Novell as "That NetWare company," or perhaps, "the company that we replaced with Exchange." These are the same people who are convinced Novell is a dying company that just doesn't know it yet.

Yeah, well. Wrong. Novell managed to turn the corner and wean themselves off of the NetWare cash-cow. Take the last quarterly statement, which you can read in full glory here. I'm going to excerpt some bits, but it'll get long. First off, their description of their market segments. I'll try to include relevant products where I know them.

We are organized into four business unit segments, which are Open Platform Solutions, Identity and Security Management, Systems and Resource Management, and Workgroup. Below is a brief update on the revenue results for the second quarter and first six months of fiscal 2009 for each of our business unit segments:



Within our Open Platform Solutions business unit segment, Linux and open source products remain an important growth business. We are using our Open Platform Solutions business segment as a platform for acquiring new customers to which we can sell our other complementary cross-platform identity and management products and services. Revenue from our Linux Platform Products category within our Open Platform Solutions business unit segment increased 25% in the second quarter of fiscal 2009 compared to the prior year period. This product revenue increase was partially offset by lower services revenue of 11%, such that total revenue from our Open Platform Solutions business unit segment increased 18% in the second quarter of fiscal 2009 compared to the prior year period.

Revenue from our Linux Platform Products category within our Open Platform Solutions business unit segment increased 24% in the first six months of fiscal 2009 compared to the prior year period. This product revenue increase was partially offset by lower services revenue of 17%, such that total revenue from our Open Platform Solutions business unit segment increased 15% in the first six months of fiscal 2009 compared to the prior year period.

[sysadmin1138: Products include: SLES/SLED]



Our Identity and Security Management business unit segment offers products that we believe deliver a complete, integrated solution in the areas of security, compliance, and governance issues. Within this segment, revenue from our Identity, Access and Compliance Management products increased 2% in the second quarter of fiscal 2009 compared to the prior year period. In addition, services revenue was lower by 45%, such that total revenue from our Identity and Security Management business unit segment decreased 16% in the second quarter of fiscal 2009 compared to the prior year period.

Revenue from our Identity, Access and Compliance Management products decreased 3% in the first six months of fiscal 2009 compared to the prior year period. In addition, services revenue was lower by 40%, such that total revenue from our Identity and Security Management business unit segment decreased 18% in the first six months of fiscal 2009 compared to the prior year period.

[sysadmin1138: Products include: IDM, Sentinel, ZenNAC, ZenEndPointSecurity]



Our Systems and Resource Management business unit segment strategy is to provide a complete “desktop to data center” offering, with virtualization for both Linux and mixed-source environments. Systems and Resource Management product revenue decreased 2% in the second quarter of fiscal 2009 compared to the prior year period. In addition, services revenue was lower by 10%, such that total revenue from our Systems and Resource Management business unit segment decreased 3% in the second quarter of fiscal 2009 compared to the prior year period. In the second quarter of fiscal 2009, total business unit segment revenue was higher by 8%, compared to the prior year period, as a result of our acquisitions of Managed Object Solutions, Inc. (“Managed Objects”) which we acquired on November 13, 2008 and PlateSpin Ltd. (“PlateSpin”) which we acquired on March 26, 2008.

Systems and Resource Management product revenue increased 3% in the first six months of fiscal 2009 compared to the prior year period. The total product revenue increase was partially offset by lower services revenue of 14% in the first six months of fiscal 2009 compared to the prior year period. Total revenue from our Systems and Resource Management business unit segment increased 1% in the first six months of fiscal 2009 compared to the prior year period. In the first six months of fiscal 2009 total business unit segment revenue was higher by 12% compared to the prior year period as a result of our Managed Objects and PlateSpin acquisitions.

[sysadmin1138: Products include: The rest of the ZEN suite, PlateSpin]



Our Workgroup business unit segment is an important source of cash flow and provides us with the potential opportunity to sell additional products and services. Our revenue from Workgroup products decreased 14% in the second quarter of fiscal 2009 compared to the prior year period. In addition, services revenue was lower by 39%, such that total revenue from our Workgroup business unit segment decreased 17% in the second quarter of fiscal 2009 compared to the prior year period.

Our revenue from Workgroup products decreased 12% in the first six months of fiscal 2009 compared to the prior year period. In addition, services revenue was lower by 39%, such that total revenue from our Workgroup business unit segment decreased 15% in the first six months of fiscal 2009 compared to the prior year period.

[sysadmin1138: Products include: Open Enterprise Server, GroupWise, Novell Teaming+Conferencing]

The reduction in 'services' revenue is, I believe, a reflection of companies' decreased willingness to pay Novell for consulting services. Novell has also changed how it advertises its consulting services, which seems to have had an impact as well. That's the economy for you. The raw numbers:


Three months ended April 30, 2009 (in thousands of dollars)

                                           Net        Gross      Operating
Segment                                revenue       profit  income (loss)
Open Platform Solutions                 44,112       34,756         21,451
Identity and Security Management        38,846       27,559         18,306
Systems and Resource Management         45,354       37,522         26,562
Workgroup                               87,283       73,882         65,137
Common unallocated operating costs           -      (3,406)      (113,832)
Total per statements of operations     215,595      170,313         17,624

Three months ended April 30, 2008 (in thousands of dollars)

                                           Net        Gross      Operating
Segment                                revenue       profit  income (loss)
Open Platform Solutions                 37,516       26,702         12,191
Identity and Security Management        46,299       24,226         12,920
Systems and Resource Management         46,769       39,356         30,503
Workgroup                              105,082       87,101         77,849
Common unallocated operating costs           -      (2,186)      (131,796)
Total per statements of operations     235,666      175,199          1,667

Six months ended April 30, 2009 (in thousands of dollars)

                                           Net        Gross      Operating
Segment                                revenue       profit  income (loss)
Open Platform Solutions                 85,574       68,525         40,921
Identity and Security Management        76,832       52,951         35,362
Systems and Resource Management         90,757       74,789         52,490
Workgroup                              177,303      149,093        131,435
Common unallocated operating costs           -      (7,071)      (228,940)
Total per statements of operations     430,466      338,287         31,268

Six months ended April 30, 2008 (in thousands of dollars)

                                           Net        Gross      Operating
Segment                                revenue       profit  income (loss)
Open Platform Solutions                 74,315       52,491         24,059
Identity and Security Management        93,329       52,081         29,316
Systems and Resource Management         90,108       74,847         58,176
Workgroup                              208,840      173,440        155,655
Common unallocated operating costs           -      (4,675)      (257,058)
Total per statements of operations     466,592      348,184         10,148
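The percentages quoted in the filing's narrative can be cross-checked against the six-month table. This is a quick Python sketch of that check; the dictionary layout and function names are my own, and the figures are copied from the table (in thousands of dollars).

```python
# Six months ended April 30, in thousands of dollars, copied from the table above.
# Each segment maps to ((2009 revenue, 2009 operating income),
#                       (2008 revenue, 2008 operating income)).
SIX_MONTHS = {
    "Open Platform Solutions": ((85_574, 40_921), (74_315, 24_059)),
    "Identity and Security Management": ((76_832, 35_362), (93_329, 29_316)),
    "Systems and Resource Management": ((90_757, 52_490), (90_108, 58_176)),
    "Workgroup": ((177_303, 131_435), (208_840, 155_655)),
}


def pct_change(new: int, old: int) -> float:
    """Year-over-year change, in percent."""
    return 100.0 * (new - old) / old


def summarize(data: dict) -> dict:
    """Return {segment: (revenue change %, operating income change %)}."""
    return {
        name: (pct_change(rev09, rev08), pct_change(op09, op08))
        for name, ((rev09, op09), (rev08, op08)) in data.items()
    }


if __name__ == "__main__":
    # Segment revenue should add up to the reported 2009 total of 430,466.
    print("2009 segment revenue total:", sum(r for (r, _), _ in SIX_MONTHS.values()))
    for name, (rev, op) in summarize(SIX_MONTHS).items():
        print(f"{name:34}  revenue {rev:+6.1f}%   operating income {op:+6.1f}%")
```

The revenue changes come out to roughly +15% for Open Platform Solutions, -18% for Identity and Security Management, +1% for Systems and Resource Management and -15% for Workgroup, which matches the rounded figures in the filing. Open Platform Solutions operating income is up about 70% over the same period.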

So, yes. Novell is making money, even in this economy. Not lots, but at least they're in the black. Their biggest growth area is Linux, which is making up for deficits in other areas of the company, especially the sinking 'Workgroup' segment. Once upon a time, "Workgroup" constituted over 90% of Novell revenue.
Revenue from our Workgroup segment decreased in the first six months of fiscal 2009 compared to the prior year period primarily from lower combined OES and NetWare-related revenue of $13.7 million, lower services revenue of $10.5 million and lower Collaboration product revenue of $6.3 million. Invoicing for the combined OES and NetWare-related products decreased 25% in the first six months of fiscal 2009 compared to the prior year period. Product invoicing for the Workgroup segment decreased 21% in the first six months of fiscal 2009 compared to the prior year period.
Which is to say, companies dropping OES/NetWare constituted the large majority of the losses in the Workgroup segment. Yet that loss was almost wholly made up by gains in other areas. So yes, Novell has turned the corner.

Another thing to note in the section about Linux:
The invoicing decrease in the first six months of 2009 reflects the results of the first quarter of fiscal 2009 when we did not sign any large deals, many of which have historically been fulfilled by SUSE Linux Enterprise Server (“SLES”) certificates delivered through Microsoft.
Which is pretty clear evidence that Microsoft is driving a lot of Novell's Operating System sales these days. That's quite a reversal, and a sign that Microsoft is officially more comfortable with this Linux thing.



Tuesday, February 17, 2009

tsatest and incrementals

Today I learned how to tell TSATEST to do an incremental backup. I also learned that the /path parameter requires the DOS namespace name. Example:

tsatest /V=SHARE: /path=FACILI~1 /U=.username.for.backup /c=2

That'll do an incremental (files with the Archive bit set) backup of that specific directory, on that specific volume.
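Since /path wants the DOS-namespace name, here's a rough Python sketch that approximates the usual 8.3 short-name rule and assembles the same command line shown above. The truncation rule (first six characters plus ~1) is the common DOS/Windows convention and is an assumption here; the name actually stored on the volume can end in ~2, ~3 and so on when several long names collide, so confirm the real short name before relying on it. "Facilities" is a hypothetical long name; the post doesn't say what FACILI~1 stands for.

```python
import re

# Characters kept in a DOS 8.3 name (a simplified, assumed set).
_INVALID_83 = re.compile(r"[^A-Z0-9_\-!#$%&'()@^{}~]")


def approx_short_name(long_name: str, n: int = 1) -> str:
    """Approximate the DOS 8.3 name that a long directory name maps to.

    Upper-cases the name and drops characters not valid in 8.3 names. If the
    result doesn't already fit 8.3, keeps the first six base characters plus
    ~n (and up to three extension characters).
    """
    upper = long_name.upper()
    base, dot, ext = upper.rpartition(".")
    if not dot:  # no extension at all
        base, ext = upper, ""
    base, ext = _INVALID_83.sub("", base), _INVALID_83.sub("", ext)
    rebuilt = base + ("." + ext if ext else "")
    if rebuilt == upper and len(base) <= 8 and len(ext) <= 3:
        return upper  # already a valid 8.3 name
    return base[:6] + f"~{n}" + ("." + ext[:3] if ext else "")


def tsatest_incremental(volume: str, long_dir: str, user: str) -> str:
    """Build the tsatest command from the post; /c=2 requests an incremental."""
    return f"tsatest /V={volume} /path={approx_short_name(long_dir)} /U={user} /c=2"


# "Facilities" is a hypothetical long name for FACILI~1.
print(tsatest_incremental("SHARE:", "Facilities", ".username.for.backup"))
# prints: tsatest /V=SHARE: /path=FACILI~1 /U=.username.for.backup /c=2
```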



Wednesday, February 11, 2009

High availability

64-bit OES provides some options for highly available file serving. Now that we've split the non-file services out of the main 6-node cluster, all that cluster is doing is NCP and a few other trivial things. What could we do with it if we got a pile of money to do whatever we wanted?

Disclaimer: Due to the budget crisis, it is very possible we will not be able to replace the cluster nodes when they turn 5 years old. It may be easier to justify eating the greatly increased support expenses. Won't know until we try and replace them. This is a pure fantasy exercise as a result.

The stats of the 6-node cluster are impressive:
  • 12 P4 cores, with an average of 3GHz per core (36GHz).
  • A total of 24GB of RAM
  • About 7TB of active data
The interesting thing is that you can get a similar server these days:
  • HP ProLiant DL580 (4 CPU sockets)
  • 4x Quad Core Xeon E7330 Processors (2.40GHz per core, 38.4GHz total)
  • 24 GB of RAM
  • The usual trimmings
  • Total cost: No more than $16,547 for us
With OES2 running in 64-bit mode, this monolithic server could handle what six 32-bit nodes are handling right now. The above is just a server that matches the stats of the existing cluster. If I were really to replace the 6-node cluster with a single device, I would make a few changes, such as moving to 32GB of RAM at minimum and using a 2-socket server instead of a 4-socket one; 8 cores should be plenty for a pure file-server this big.

A single server does have a few things to recommend it. By doing away with the virtual servers, all of the NCP volumes would be hosted on the same server. Currently, each virtual-server/volume pair requires its own connection from each user, so if I fail all the volumes to the same cluster node, that node will legitimately have on the order of 15,000 concurrent connections. If all the volumes lived on a single real server instead, the concurrent connection count would drop to only ~2500.
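The connection arithmetic works out if you assume every user maps a volume on each of the six virtual servers and each distinct server name costs that user one NCP connection: 2,500 users times 6 servers is 15,000 connections on whichever node holds them all, versus 2,500 on one real server. A minimal Python model under those assumptions (volume and server names are hypothetical):

```python
def ncp_connections(user_mappings, server_for_volume):
    """Count one connection per distinct (user, server) pair.

    user_mappings: one list of mapped volumes per user.
    server_for_volume: which server name (virtual or physical) hosts each volume.
    """
    return sum(
        len({server_for_volume[vol] for vol in volumes})
        for volumes in user_mappings
    )


VOLUMES = [f"VOL{i}" for i in range(1, 7)]        # hypothetical volume names
users = [VOLUMES] * 2500                          # every user maps all six

clustered = {vol: f"VS-{vol}" for vol in VOLUMES}  # one virtual server per volume
single = {vol: "FS1" for vol in VOLUMES}           # everything on one box

print(ncp_connections(users, clustered))  # 15000
print(ncp_connections(users, single))     # 2500
```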

Doing that would also make one of the chief annoyances of the Vista Client for Novell much less annoying. Due to name cache expiration, if you don't look at Windows Explorer or that file dialog in the Vista client once every 10 minutes, it'll take a freaking-long time to open that window when you do. This is because the Vista client has to enumerate/resolve the addresses of each mapped drive. Because of our cluster, each user gets at least 6 drive mappings to 6 different virtual servers. Since it takes Vista 30-60 seconds per NCP mapping to figure out the address (it has to try Windows resolution methods before going to Novell resolution methods, and unlike WinXP there is no way to reverse that order), this means a 3-6 minute pause before Windows Explorer opens.

By putting all of our volumes on the same server, it'd only pause 30-60 seconds. Still not great, but far better.
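A back-of-the-envelope Python sketch of that delay, assuming the 30-60 second resolution cost applies once per distinct server name and the lookups happen one after another (an assumption; the post doesn't say exactly how the client orders them):

```python
def explorer_pause(distinct_servers, per_lookup_s=(30, 60)):
    """Return (best, worst) seconds, assuming lookups happen serially."""
    lo, hi = per_lookup_s
    return distinct_servers * lo, distinct_servers * hi


for label, servers in (("six virtual servers", 6), ("one server", 1)):
    best, worst = explorer_pause(servers)
    print(f"{label}: {best / 60:.1f} to {worst / 60:.1f} minutes")
# six virtual servers: 3.0 to 6.0 minutes
# one server: 0.5 to 1.0 minutes
```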

However, putting everything on a single server is not what you'd call "highly available". OES2 is a lot more stable now, but it still isn't up to the legendary stability of NetWare 3. Heck, NetWare 6.5 isn't up to that legendary stability either. Rebooting for patches takes everything down for minutes at a time. Not viable.

With a server this beefy it is quite doable to do a cluster-in-a-box by way of Xen. Lay a base of SLES10-Sp2 on it, run the Xen kernel, and create four VMs for NCS cluster nodes. Give each 64-bit VM 7.75GB of RAM for file-caching, and bam! Cluster-in-a-box, and highly available.
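Assuming the 32GB build of that server, four 7.75GB nodes take 31GB and leave about 1GB for the Xen host (dom0) itself; whether 1GB is really enough for dom0 under load is left open here. A trivial Python check of that memory budget (the function name is my own):

```python
def cluster_in_a_box(host_ram_gb, vm_count, vm_ram_gb):
    """Return (RAM given to the VMs, RAM left for the Xen host), in GB."""
    allocated = vm_count * vm_ram_gb
    if allocated > host_ram_gb:
        raise ValueError(f"{allocated} GB of VMs won't fit in {host_ram_gb} GB")
    return allocated, host_ram_gb - allocated


print(cluster_in_a_box(32, 4, 7.75))  # (31.0, 1.0)
```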

However, this is a pure fantasy solution, so chances are real good that if we had the money we would use VMware ESX instead of Xen for the VMs. The advantage there is that we wouldn't have to keep the VM and host kernel versions in lock-step, which reduces downtime. There would be some performance degradation, and clock skew would be a problem, but at least uptime would be good; no need to perform a CLUSTER DOWN when updating kernels.

Best case, we'd have two physical boxes so we can patch the VM host without having to take every VM down.

But I still find it quite interesting that I could theoretically buy a single server with the same horsepower as the six servers driving our cluster right now.



Tuesday, December 09, 2008

The price of storage

I've had cause to do the math lately, which I'll spare you :). But as of the best numbers I have, the cost of 1GB of space on the EVA6100 is about $16.22. Probably more, since this 6100 was created out of the carcass of an EVA3000, and I don't know what percentage of parts from the old 3000 are still in the 6100 and thus can't apportion the costs right.

For the EVA4400, which we have filled with FATA drives, the cost is $3.03.

Suddenly, the case for Dynamic Storage Technology (formerly known as Shadow Volumes) in OES can be made in economic terms. Yowza.

The above numbers do not include backup rotation costs. Those costs can vary from $3/GB to $15/GB depending on what you're doing with the data in the backup rotation.

Why is the cost of the EVA6100 so much greater than the EVA4400?
  1. The EVA6100 uses 10K RPM 300GB FibreChannel disks, whereas the EVA4400 uses 7.2K RPM 1TB (or is it 450GB?) FATA drives. The cost-per-gig on FC is vastly higher than it is on fibre-ATA.
  2. Most of the drives in the EVA6100 were purchased back when 300GB FC drives cost over $2000 each.
  3. The EVA6100 controller and cabinets just plain cost more than the EVA4400, because it can expand farther.
To put it into a bit of perspective, let's take the example of a 1TB volume of "unorganized file data", the seemingly official term for "file-server". If you place that 1TB of data on the EVA6100, that data consumes $16,609.28 worth of storage. So what if 70% of that data hasn't been modified in a year (not unreasonable), and is put on the EVA4400 instead? Then you'd have 307GB on the 6100 and 717GB on the 4400, and your storage cost drops to $7,152.05. That's real money.
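Here's that calculation as a small Python sketch, using the per-GB figures above and treating 1TB as 1,024GB. The function name and the "cold fraction" parameter are my own framing, and like the figures above it leaves out backup rotation costs. Decimal keeps the dollar amounts exact.

```python
from decimal import Decimal

# Per-GB costs from the post (backup rotation costs not included).
COST_PER_GB = {"EVA6100": Decimal("16.22"), "EVA4400": Decimal("3.03")}


def tiered_cost(total_gb, cold_fraction, fast="EVA6100", slow="EVA4400"):
    """Split total_gb between two arrays and return (fast GB, slow GB, cost).

    cold_fraction (a Decimal) is the share of data that hasn't been modified
    in a year and can live on the cheaper array; it is rounded to whole GB.
    """
    slow_gb = int((total_gb * cold_fraction).to_integral_value())
    fast_gb = total_gb - slow_gb
    cost = fast_gb * COST_PER_GB[fast] + slow_gb * COST_PER_GB[slow]
    return fast_gb, slow_gb, cost


print(tiered_cost(1024, Decimal("0")))    # (1024, 0, Decimal('16609.28'))
print(tiered_cost(1024, Decimal("0.7")))  # (307, 717, Decimal('7152.05'))
```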



Wednesday, December 03, 2008

Infiltrating the market

Over on The Open Road there is a very interesting blog post. It talks about how Microsoft and Red Hat approach the market, and touches on Novell.
Microsoft offers a full ecosystem of software to would-be buyers, but its greatest success may actually result from its strategy to present customers with an "and" decision when initially purchasing Microsoft technology, rather than a difficult "or" decision.
And I really see this. The argument has been made internally that what you get from a Microsoft Enterprise CAL is worlds above what we can get from a Novell academic seat license, which feeds into cost-effectiveness discussions (and not favorable ones). It is soooooo easy to go all Microsoft, whereas a pure Linux solution requires a lot of stitching and translation-glue.

The article goes on to point out that Red Hat's targeting people looking to do forklift upgrades from Unix to Linux. And then points out that Microsoft wins more of that traffic than Red Hat does, by a good margin. Largely because the Microsoft family of products is very complete.

As it points out, Novell figured this out a few years ago when they launched their collaboration with Microsoft. The fruits of that arrived today with OES2 SP1: the Novell CIFS stack and Domain Services for Windows. The latter allows OES2 to do something you can't do with Samba (yet): pretend to be a full-up AD Domain Controller.

And yeah, Novell's current marketing slogan is, "Making IT work as One," which is a clear embracing of the "and" concept described. If they could make DSfW work on plain SLES, it may make it an even more attractive product for people.



OES2 SP1 ships!

Full announcement.

It's out!



Tuesday, December 02, 2008

The NetWare 7 that never was

My last post generated some comments lamenting where NetWare has gone. I hear ya.

I have friends and have spoken with people at BrainShare who were closer to things than I was regarding how the next version of NetWare evolved. And to be truthful, it sounded a lot like how Microsoft moved from XP to Vista. If you'll recall, "the version of Windows after XP," was something of a moving target for many years. I recall media reports of Microsoft scrapping the whole project and starting afresh at least once.

My very first BrainShare was 2001, and that was the release party for NetWare 6. It was in 2003 that Novell bought Ximian and SuSE, so it's fairly clear when Novell decided to bet the house on this whole Linux thing. Yet at BS01 there was talk about NW7, or about whether there would be an NW6.1 at all. The rumors I remember from back then had NW7 being a progression towards a more application-friendly environment. I also remember hearing the L word around once or twice.

What we actually got was NetWare 6.5, which solidified NetWare 6 and made the core services better and more mature. What it wasn't was any more application friendly than NetWare 6 was (or even NetWare 5.1 for that matter). NetWare 6.5 released in August of 2003, the same month as the Ximian purchase. This is what tells me that Novell had decided on a path for NetWare 7, and it was green, not red. Open Enterprise Server arrived in 2005, which gives OES a solid year and a half dev-time between when SuSE was bought and when we started seeing public betas of OES. The NetWare version of OES was NetWare 6.5 SP3.

What happened to NetWare 7? It got lost on the roadmap. When NW6 came out, Novell probably had 6.5 on the roadmap as the next rev, with NW7 next down. The rumors we were hearing were very provisional, as the spot on the map held by NW7 was at least 3 years away. Sometime between BrainShare 2001 and when Novell started buying its way into the Linux world, NW7 was dropped and the decision was made to port to a completely different kernel. That decision was probably made in the summer of 2003, as NetWare 6.5 development was entering final beta and developer resources for the next full rev needed to be allocated.

Which brings us to today. OES2 SP1 is going to drop any day now, probably in time for Novell's quarterly earnings report. SP1 finally brings the Linux-kernel 'NetWare Services' to feature parity with the NetWare kernel. There are still a few things missing, like an eDirectory-integrated SLP server, but now all the major points are covered. If you count it up, this has taken Novell a bit over 5 years to get to this point.

In my opinion, that's about right for an organization the size of Novell. Porting the proprietary NetWare services over to a completely new kernel requires a LOT of developer attention, and Novell is a lot smaller than Microsoft. Also of note, it took Microsoft 5 years to give us Vista after XP, including the presumed nuke-and-rewrite they did. Novell got a boost in that they had already ported eDirectory to Linux, so that helped out the NCP side. But that didn't help the NSS folks, who had to figure out a way to do a NetWare-style rich-metadata file-system on a kernel and driver model that expects POSIX-spartan file-systems. The problems Novell had with this were amply displayed in the performance problems reported with OES1-FCS. Samba doesn't scale to the same levels as CIFS-on-NetWare did, so Novell had to create their own CIFS stack from scratch. The AFP stack on Linux is a joke, and the resurgence of Apple since 2003 meant they had to do something about that as well, which they did by writing a proprietary AFP stack. All of this represents nuke-and-rebuild-from-spec, which takes time.

Novell probably should have started the migration in 2000 instead of 2003. They already knew that Exchange 5.5 upgrades were driving a LOT of customers into Active Directory, which was triggering migrations away from NetWare. But, there are business concerns here. Novell managed to survive the fall of NetWare by diversifying their product portfolio enough that GroupWise, Zen, and Identity Management could support the company. It took until this year to return to the black, but they did it. Had they shot the NetWare cash cow two years earlier, it is entirely possible that Novell couldn't have survived the lean years.



Monday, November 17, 2008

Signs and portents

Last Thursday I was over on download.novell.com looking for an eDirectory patch. I was staging up a new NetWare box and needed to see what the latest edir levels were. I knew 8.8.3 came out in August, and we're not there yet, so I needed 8.8.2 ftf2. However, I noticed that one of the searchable versions was 8.8.4. There was nothing in the list, but it was an option. It's not there right now, but it was then.

Thus emboldened I checked around a few more places. NetWare 6.5 SP8 was in the list, and still is right now. As is Open Enterprise Server 2 SP1. Both have the public betas posted, though.

But 8.8.4 was there. I saw it. Must have been a test or something. All this tells me that OES2 SP1 (a.k.a. NW65SP8) is just around the corner. Since we were told back at BrainShare that Sp1 would be in the Q4 time-frame, it's about due.



Monday, November 10, 2008

NetStorage, WebDav, and Vista

I figured out how to get it working! You need KB907306. This updates the Web Folders in Vista to support how Novell does WebDav through NetStorage.

In our case you'll also need to add the CA that serves the SSL certificate that's on top of NetStorage (a.k.a. MyFiles). But, it works.



Monday, October 20, 2008

Dorm printing

On my post about finally running Vista, patrickbuller asked:
So you have printers that students in the dorms can print to? Wow. Do you audit all those and charge the numbers of pages against the student?
The answer to that is that we make big use of AND Technology's PCounter product. When paired with their PrintStations, it makes a very nice way to put a lid on unrestricted 'free' printing in the dorms. The PrintStations also make sure that only jobs people want to pick up get printed, which saves a serious amount of paper.

PCounter is core to our student printing. We'll only move our NDPS/iPrint infrastructure over to OES2-linux when Pcounter is supported on that platform, not before. We'll keep a 2 node NetWare cluster around just for printing if we have to. Since accounting support is one of the features that's supposed to be in OES2-SP1, it is my hope that PCounter will support OES2-Linux within a year after SP1's release. But I haven't heard any specifics.



Wednesday, October 15, 2008

OES2 SP1 (public beta) has been posted

The public beta of OES2 SP1 has been posted.

I believe the NDA has lifted, but I'm not 100% on that. Will check. But, some of the new stuff in SP1:
  • An AFP stack that doesn't suck. Or more specifically, an AFP stack that scales beyond 100 users and is eDirectory integrated.
  • A new CIFS stack written by Novell, so it can scale well past the Samba limit.
  • A migration toolkit in one UI, rather than a cluster of scripts.
  • A new version of iFolder
  • eDirectory-integrated DNS/DHCP. But no eDir-integrated SLP yet, open-source politics you know.
  • IIRC a beta of eDir 8.8 SP4.
  • The ability to put iPrint-for-Linux on NSS volumes (handy for Clustering).
  • And lots more I can't remember off the top of my head.
Go forth, and have fun. There is a beta-feedback box on the beta page I linked to above in case you find a bug and want to tell Novell about it.

One thing I think it is safe to say, is that even though it says "Beta4" on it, it's really a release-candidate. Only major bugs are getting quashed right now. UI freeze was a month or more ago, and strange, annoying behaviors may get "fixed in doc" rather than getting true fixes which will have to wait for SP2. Still report them anyway, since it'll go on the list to fix in the next SP.



Tuesday, October 07, 2008

Erm, about the budget

From an email sent to all points from the U President this afternoon:

In the OFM spreadsheets received today, we were stunned to find that targets had been set for higher education. Western, today, is now expected, from the sorts of measures outlined in the August 4 memorandum, to "save" $1,827,000 in the current fiscal year. (This major reduction applies across all budgets, including instructional budgets.)

Add that to the earlier number, and our total budget reduction is NOT the $176,000 representing 1% of non-instructional budgets. It is $2,003,682.

Aaaaaand...

Further, we have been advised to expect these reductions to be permanent; that is, to also be a part of our 2009-11 budget.

Pardon me whilst I mutter things.

This means that it is nearly certain that we will NOT be getting any new hardware for the Novell cluster next summer. We'll have to do it on hardware we already own right now. This means I won't be able to partake of that lovely 64-bit goodness. Drat drat drat.

We're already under-funded for where we need to be, this won't help. Even with the storage arrays we just bought, in terms of total disk-space we've managed to fully commit all of it. There is no excess capacity. What's more, there is no easy way to ADD new capacity since any significant amounts will require purchasing new storage shelves.

In the intermediate term, this means that WWU will now descend into bureaucratic charge-back warfare. As service-providing departments like ours try to find ways to finance the needed growth, we'll start being hard-ass about charging for exceptional services. And they'll do it to us too. So if the College of Arts and Sciences comes to us and asks us for space to host 2TB of, say, NASA data, we'll have to bill them for it. And that cost will be a 'total cost' which will by necessity include the backup costs. In return, if we need 16 ethernet jacks added to the AC datacenter, Telecom may start billing us.

And I get a new boss Thursday. Happily, since there is overlap between outgoing and incoming they've been briefing a lot. This is to prepare the new guy for the challenges he'll face in his first few weeks flying solo. There may even be the odd phone-call for advice, we'll see.

Gonna get real interesting around here.



Wednesday, September 24, 2008

Fickle fortune

I lost a RAID card in one of my Beta servers. Crap. These beasties are all old beasties since that's the only hardware that could be released for the beta. And with crap servers, comes a crap failure rate. This is the second RAID card I've lost, and I've lost one hard-drive too. It isn't common to lose more RAID cards than hard-drives. Arrg.

This puts a kink into things. This was going to be an edirectory host, so I could host my replicas on one set of servers and abuse the crap out of the non-replica application servers. I may have to dual host. Icky icky.



Wednesday, September 10, 2008

That darned budget

This is where I whine about not having enough money.

It has been a common complaint amongst my co-workers that WWU wants enterprise level service for a SOHO budget. Especially for the Win/Novell environments. Our Solaris stuff is tied in closely to our ERP product, SCT Banner, and that gets big budget every 5 years to replace. We really need the same kind of thing for the Win/Novell side of the house, such as this disk-array replacement project we're doing right now.

The new EVAs are being paid for by Student Tech Fee, and not out of a general budget request. This is not how these devices should be funded, since the scope of this array is much wider than just student-related features. Unfortunately, STF is the only way we could get them funded, and we desperately need the new arrays. Without the new arrays, student service would be significantly impacted over the next fiscal year.

The problem is that the EVA3000 contains between 40-45% directly student-related storage. The other 55-60% is Fac/Staff storage. And yet, the EVA3000 was paid for by STF funds in 2003. Huh.

The summer of 2007 saw a Banner Upgrade Project, when the servers that support SCT Banner were upgraded. This was a quarter million dollar project and it happens every 5 years. They also got a disk-array upgrade to a pair of StorageTek (SUN, remember) arrays, DR replicated between our building and the DR site in Bond Hall. I believe they're using Solaris-level replication rather than Array-level replication.

The disk-array upgrade we're doing now got through the President's office just before the boom went down on big expensive purchases. It languished in the Purchasing department due to summer-vacation related under-staffing. I hate to think how late it would have gone had it been subjected to the added paperwork we now have to go through for any purchase over $1000. Under no circumstances could we have done it before Fall quarter. Which would have been bad, since we were too short to deal with the expected growth of storage for Fall quarter.

Now that we're going deep into the land of VMWare ESX, centralized storage-arrays are line of business. Without the STF funded arrays, we'd be stuck with "Departmental" and "Entry-level" arrays such as the much maligned MSA1500, or building our own iSCSI SAN from component parts (a DL385, with 2x 4-channel SmartArray controller cards, 8x MSA70 drive enclosures, running NetWare or Linux as an iSCSI target, with bonded GigE ports for throughput). Which would blow chunks. As it is, we're still stuck using SATA drives for certain 'online' uses, such as a pair of volumes on our NetWare cluster that are low usage but big consumers of space. Such systems are not designed for the workloads we'd have to subject them to, and are very poor performers when doing things like LUN expansions.

The EVA is exactly what we need to do what we're already doing for high-availability computing, yet is always treated as an exceptional budget request when it comes time to do anything big with it. Since these things are hella expensive, the budgetary powers-that-be balk at approving them and like to defer them for a year or two. We asked for a replacement EVA in time for last year's academic year, but the general-budget request got denied. For this year we went, IIRC, both with general-fund and STF proposals. The general fund got denied, but STF approved it. This needs to change.

By October, every person between me and Governor Gregoire will be new. My boss is retiring in October, my grandboss was replaced last year, my great-grandboss has also been replaced within the last year, and the University President stepped down on September 1st. Perhaps the new people will have a broader perspective and will allow budget priorities to be realigned to the point that our disk-arrays are classified as the critical line-of-business investments they are.



Thursday, August 14, 2008

Virtualization and Fileservers

Some workloads fit well within VM of any kind, and others are very tricky. Fileservers are one workload that is not a good candidate for VM. In some cases they qualify as highly transactional. In others, the memory required to do fileserving well makes them very expensive to virtualize. When you can fit 40 web-servers on a VM host but only 4 fileservers, the calculus is obvious.

This is on my mind since we're running into memory problems on our NetWare cluster. We've just plain outgrown the 32-bit memory space for file-cache. NetWare can use memory above the 4GB line (it does have PAE support), but memory access above the line is markedly slower than below it. Last I heard, the conventional wisdom is that 12GB is about the point where the extra memory starts earning you performance gains again. Eek!

So, I'm looking forward to 64-bit memory spaces and OES2. 6GB should do us for a few years. That said, 6GB of actually-used RAM in a virtual-host means that I could fit... two of them on a VM server with 16GB of RAM.

16GB of RAM in, say, an ESX cluster node is enough to host 10 other servers, especially with memory deduplication. My hypothetical 6GB file-servers, however, would spend 5.5GB of that RAM on file-cache that is unique to each server, so memory de-dup would gain very little.

In the end, how well a fileserver fits in a VM environment depends on how large a 'working set' your users have. If the working set is large enough, virtualization will gain you little. However, I realize fileserving on the scale we do it is somewhat rare, so for departmental fileservers VM can be a good-sized win. As always, know your environment.
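The memory arithmetic above can be sketched as a quick bash helper. It uses a deliberately crude, hypothetical model (not how ESX actually accounts for memory): the de-duplicable part of each guest's RAM is stored once per host, and everything else, such as file-cache, is unique per guest. The 1536MB web-server size and the 512MB shared figures are illustrative assumptions, not measurements.

```shell
#!/usr/bin/env bash
# Crude VM-density estimate. Hypothetical model, not VMware's algorithm:
#   - SHARED_MB of each guest's RAM de-duplicates to a single copy per host
#   - the rest (GUEST_MB - SHARED_MB, e.g. file-cache) is unique per guest
#   - RESERVED_MB is held back for the hypervisor itself (default 0)
# usage: guests_per_host HOST_MB GUEST_MB SHARED_MB [RESERVED_MB]
guests_per_host() {
    local host_mb=$1 guest_mb=$2 shared_mb=$3 reserved_mb=${4:-0}
    if (( shared_mb < 0 || shared_mb >= guest_mb )); then
        echo "SHARED_MB must be >= 0 and smaller than GUEST_MB" >&2
        return 1
    fi
    local unique_mb=$(( guest_mb - shared_mb ))
    local usable_mb=$(( host_mb - reserved_mb - shared_mb ))
    if (( usable_mb < unique_mb )); then
        echo 0                          # not even one guest fits
        return 0
    fi
    echo $(( usable_mb / unique_mb ))   # integer division rounds down
}

# A 6GB fileserver with ~0.5GB shareable: de-dup barely helps.
guests_per_host 16384 6144 512    # -> 2
guests_per_host 16384 6144 0      # -> 2
# A hypothetical 1.5GB web server with 0.5GB shareable: de-dup helps a lot.
guests_per_host 16384 1536 0      # -> 10
guests_per_host 16384 1536 512    # -> 15
```

The interesting part is the gap between the pairs: under these assumptions de-dup turns 10 web servers into 15, but does nothing for the fileserver, whose RAM is almost all unique file-cache.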

In light of the budgetary woes we'll be having, I don't know what we'll do. Last I heard, the State is projected to have a $2.7 billion deficit for the 2009-2011 budget cycle (the fiscal year starts July 1). So it may very well be that the only way I'll get access to 64-bit memory spaces is in an ESX context. That may mean a 6-node cluster on 3 physical hosts, and that's assuming I can get new hardware at all. If it gets bad enough, I'll have to limp along until 2010 and play partitioning games to load-balance my data across all 6 nodes. By 2011 all of our older hardware falls off cheap maintenance and we'll have to replace it, so worst case, that's when I can do my migration to 64-bit. Arg.



Monday, August 11, 2008

Novell Client for Vista, the ecosystem

I just reported a bug in the beta that surprised me. I can't go into details, but it strikes me as the kind of bug that should have been reported shortly after the client was released. Perhaps the client was so buggy overall that this one got lost in the forest, but still. The Vista client has been out for some time now.

Having said the following rant several times over the past few days, I figure it's time to post it ;).

The problem we're running into is that the users of the Vista Client are a small, small subset of the overall users of the Novell Client, who are by now a minority of the overall users of Novell NCP file-servers. Novell spent years hyping 'clientless' approaches to file-serving through the CIFS stack on NetWare, and a lot of places bought into that. Because of this, NCP-client Vista users make up a rather small percentage of the overall Novell file-server market.

And small means not a lot of testing by people-who-are-not-us, and seemingly obvious bugs showing up in the beta SP1 builds. I don't have any Vista workstations, so I've done exactly zero testing of the Vista Client; this particular bug was reported and troubleshot by someone who is not me (I just filed it). Even though this beta includes builds of the Vista client, I'm not testing them. All things considered, I probably should.

Since we're wedded hard to the Novell Client, it's probably time for us to start devoting resources to the ecosystem in order to keep it alive.



Friday, July 25, 2008

Handling eDirectory core-files on linux

If you've been getting core files generated by ndsd on your Linux servers, and want to call Novell Support about it, there are a few things you can do to maximize what Novell will get out of the files themselves. You may not get much, but these will help the people with the debug symbols figure out what's going on.

Packaging the Core


First and foremost, the tool for packaging core files for delivery to Novell is already on your system. TID3078409 describes how to use 'novell-getcore.sh'. It is included in 8.7.3.x installations as well as 8.8.x installations.

Running it looks like this:
edirsrv1:~ # novell-getcore -b /var/opt/novell/eDirectory/data/dib/core.31448 /opt/novell/eDirectory/sbin/ndsd
Novell GetCore Utility 1.1.34 [Linux]
Copyright (C) 2007 Novell, Inc. All rights reserved.


[*] User specified binary that generated core: /opt/novell/eDirectory/sbin/ndsd
[*] Processing '/var/opt/novell/eDirectory/data/dib/core.31448' with GDB...
[*] PreProcessing GDB output...
[*] Parsing GDB output...
[*] Core file /var/opt/novell/eDirectory/data/dib/core.31448 is a valid Linux core
[*] Core generated by: /opt/novell/eDirectory/sbin/ndsd
[*] Obtaining names of shared libraries listed in core...
[*] Counting number of shared libraries listed in core...
[*] Total number of shared libraries listed in core: 72
[*] Corefile bundle: core_20080725_092227_linux_ndsd_edirsrv1
[*] Generating GDBINIT commands to open core remotely...
[*] Generating ./opencore.sh...
[*] Gathering package info...
[*] Creating core_20080725_092227_linux_ndsd_edirsrv1.tar...
[*] GZipping ./core_20080725_092227_linux_ndsd_edirsrv1.tar...
[*] Done. Corefile bundle is ./core_20080725_092227_linux_ndsd_edirsrv1.tar.gz


Once you have the packaged core, you can upload it to ftp.novell.com/incoming as part of your service-request.
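If ndsd has dropped several cores, a small loop saves some typing. This is a sketch that assumes the cores are named core.* in the DIB directory and that the paths match the transcript above; it reuses only the `-b CORE BINARY` form shown there. Since novell-getcore writes its bundle to the current directory (note the ./core_...tar.gz in the output), run it from somewhere with free space. DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Bundle every ndsd core file in a DIB directory with novell-getcore,
# using the same "-b CORE BINARY" invocation shown in the transcript.
# The defaults are the paths from that transcript; adjust for your install.
# Set DRY_RUN=1 to print the commands instead of running them.
# usage: bundle_ndsd_cores [DIB_DIR] [NDSD_BINARY]
bundle_ndsd_cores() {
    local dib_dir=${1:-/var/opt/novell/eDirectory/data/dib}
    local ndsd_bin=${2:-/opt/novell/eDirectory/sbin/ndsd}
    local core found=0
    for core in "$dib_dir"/core.*; do
        [ -f "$core" ] || continue      # glob matched nothing
        found=1
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "novell-getcore -b $core $ndsd_bin"
        else
            novell-getcore -b "$core" "$ndsd_bin" || return 1
        fi
    done
    [ "$found" = 1 ] || { echo "no core.* files in $dib_dir" >&2; return 1; }
}

# DRY_RUN=1 bundle_ndsd_cores    # preview, using the default paths
```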

Including More Data


If you're lucky enough to be able to make the core file drop on demand, or it just plain happens often enough that repetition isn't a problem, there is one more thing you can do to get better data into the core you ship to Novell. TID3113982 describes a setting you can add to the ndsd launch script (/etc/init.d/ndsd) that includes more data. The TID describes what is being done pretty well. In essence, you're switching glibc to a special, slower malloc implementation that checks for simple heap errors, so failures come with better information than the normal allocator gives. You don't want to run with this set for very long, especially in busy environments, as it impacts performance. But if you have a repeatable core, the information it can provide is better than a 'naked' core. Setting MALLOC_CHECK_=2 is my recommendation.

Be sure to unset this once you're done troubleshooting. As I said, it can impact performance of your eDirectory server.
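For reference, MALLOC_CHECK_ is a glibc environment variable; with a value of 2, glibc aborts the process as soon as its heap checks trip, so the core lands closer to the offending write. Here is a minimal sketch of switching it on and off. It assumes that exporting the variable right after the first (#!) line of /etc/init.d/ndsd is early enough to reach ndsd; that placement is an assumption, so check TID3113982 for where Novell actually wants it. It needs GNU sed, and it keeps a backup copy of the script so you can diff your change.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for toggling glibc's malloc checking in an init
# script such as /etc/init.d/ndsd. Requires GNU sed (-i and one-line "1a").
# ASSUMPTION: exporting the variable just after line 1 (the #! line) is
# early enough; see TID3113982 for Novell's recommended placement.

malloc_check_on() {
    local script=$1
    # Already enabled? Don't add a second line.
    grep -q '^export MALLOC_CHECK_=' "$script" && return 0
    cp -p "$script" "$script.pre-malloc-check"    # backup for diffing
    sed -i '1a export MALLOC_CHECK_=2' "$script"
}

malloc_check_off() {
    local script=$1
    # Remove the line again; it costs performance on a busy server.
    sed -i '/^export MALLOC_CHECK_=/d' "$script"
}

# malloc_check_on /etc/init.d/ndsd && /etc/init.d/ndsd restart
# ...reproduce the core, then:
# malloc_check_off /etc/init.d/ndsd && /etc/init.d/ndsd restart
```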



Wednesday, July 16, 2008

Patching SLES

Last night I attempted to patch one of our OES2 servers. This particular server is an elderly beast, a P3 1GHz machine. So I wasn't expecting anything like fastness out of it. Especially with rug.

But still, it was painful!
normandy: ~#: rug lu
Waking up ZMD...
[8 minutes later]
[list of one update, libzypp]
normandy: ~#: rug update
Resolving Dependencies....
[8 minutes later]
Install this update? (y/N)
y
[12 minutes later]
Restarting ZMD...
[8 minutes later]
normandy: ~#: rug lu
[list of updates. No need to wait 8 minutes this time.]
normandy: ~#: rug update
Resolving Dependencies...
[8 minutes later]
Dependency resolution failed for bind-util and bind-libs. libdns-whatzihoozit required by bind-util is provided by bind-libs. Please fix you hoser.
[insert swearing here]
normandy: ~#: rug in bind-util bind-libs
Resolving Dependencies....
[8 minutes later]
Install these updates? (y/N)
y
[12 minutes later]
normandy: ~#: exit

As this had taken far longer than even I was expecting, I stopped. I'll finish up tonight. As this is an OES2 server, this means SLES10-SP1. I can attest that SLES10-SP2 on identical hardware is MUCH faster. I can't wait until OES2-SP1 comes out and this dinosaur can get faster patching.



Monday, June 30, 2008

Novell Client for Linux, packaged for OpenSUSE

It has been mentioned in many places, and I've done some of the mentioning, that since openSUSE is the foundation for SLED, it makes sense for Novell to distribute an NCL for openSUSE. It turns out they're working on just that. And here is the Novell beta page. I'm soooo going to try this out, since I'm running openSUSE 10.3 on my work desktop and won't be moving to openSUSE 11 until I can run the client on it (oh, wait, I can).

It should also be mentioned that Ubuntu is a very frequently requested target for another NCL, but I have reason to believe that'll never happen. First of all, any Novell Client involves closed-source, third-party licensed code, which made it hard to port to Linux in the first place (a relic of being based on code from the days when open source was just an ethical standpoint rather than a tangible market force). Second, Novell has proven to be rather light on developer resources in certain areas, and its Linux integration with non-SUSE distros is very minimal.



Tuesday, June 24, 2008

Backing up NSS, note for the future

According to this documentation, storing NSS/NetWare metadata in xattrs is turned off by default. You turn it on for OES2 servers with the "nss /ListXattrNWMetadata" command. This allows Linux-level utilities (e.g. cp, tar) to access and copy the NSS metadata. It also allows backup software that isn't SMS-enabled for OES2 to back up the NSS information.

This is handy, as HP DataProtector doesn't support NSS backup on Linux. I need to remember this.



Thursday, May 29, 2008

OES2 and SLES10-SP2

Per Novell:

Updating OES2

OES2 systems should NOT be updated to SLES10 SP2 at this time!
Very true. And most especially true if you're running virtualized NetWare! The paravirtualization components in NW65SP7 are designed around the version of Xen that's in SLES10-SP1, and SP2 contains a much newer version of Xen (trying to play catch-up to VMWare means a fast dev cycle, after all). So, expect problems if you do it.

Also, the OES2 install does contain some kernel packages, such as those relating to NSS.

OES2 systems need to wait until either Novell gives the all-clear for SP2 deployments on OES2-fcs, or OES2-SP1 ships. OES2-SP1 is built around SLES10-SP2.



Wednesday, May 21, 2008

SLES10 SP2 shipped

According to Novell, SLES10 SP2 has shipped.

This means that the ongoing OES2 SP1 beta I'm a part of will be done on released code for the SLES side of it. So any bugs we find there may end up as patches on the SP2 channel.

One nice thing in the new code?

"rug refresh --clean"

This will do what I posted about a few days ago. It'll nuke the zmd database and rebuild it fresh! Niiiice! Unfortunately, a truly better version of rug won't come until "Code 11".



Wednesday, May 14, 2008

NetWare and Xen

Here is something I didn't really know about in virtualized NetWare:

Guidelines for using NSS in a virtual environment

Towards the bottom of this document, you get this:

Configuring Write Barrier Behavior for NetWare in a Guest Environment

Write barriers are needed for controlling I/O behavior when writing to SATA and ATA/IDE devices and disk images via the Xen I/O drivers from a guest NetWare server. This is not an issue when NetWare is handling the I/O directly on a physical server.

The XenBlk Barriers parameter for the SET command controls the behavior of XenBlk Disk I/O when NetWare is running in a virtual environment. The setting appears in the Disk category when you issue the SET command in the NetWare server console.

Valid settings for the XenBlk Barriers parameter are integer values from 0 (turn off write barriers) to 255, with a default value of 16. A non-zero value specifies the depth of the driver queue, and also controls how often a write barrier is inserted into the I/O stream. A value of 0 turns off XenBlk Barriers.

A value of 0 (no barriers) is the best setting to use when the virtual disks assigned to the guest server’s virtual machine are based on physical SCSI, Fibre Channel, or iSCSI disks (or partitions on those physical disk types) on the host server. In this configuration, disk I/O is handled so that data is not exposed to corruption in the event of power failure or host crash, so the XenBlk Barriers are not needed. If the write barriers are set to zero, disk I/O performance is noticeably improved.

Other disk types such as SATA and ATA/IDE can leave disk I/O exposed to corruption in the event of power failure or a host crash, and should use a non-zero setting for the XenBlk Barriers parameter. Non-zero settings should also be used for XenBlk Barriers when writing to Xen LVM-backed disk images and Xen file-backed disk images, regardless of the physical disk type used to store the disk images.

Nice stuff there! The "xenblk barriers" can also have an impact on the performance of your virtualized NetWare server. If your I/O stream runs the server out of cache, performance can really suffer if barriers are non-zero. If it fits in cache, the server can reorder the I/O stream to the disks to the point that you don't notice the performance hit.

So, keep in mind where your disk files are! If you're using one huge XFS partition and hosting all the disks for your VM-NW systems on that, then you'll need barriers. If you're presenting a SAN LUN directly to the VM, then you'll want to "SET XENBLK BARRIERS = 0", as they're set to 16 by default. This'll give you better performance.
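The decision rules from Novell's guidelines quoted above fit in a small lookup. This is a sketch: the backing-type names are hypothetical labels for illustration, and 16 is used simply because it is the documented default non-zero value. It only prints the command to type at the NetWare guest's console; it doesn't set anything itself.

```shell
#!/usr/bin/env bash
# Map where a NetWare guest's virtual disk lives to a XenBlk Barriers value,
# following the guidelines quoted above:
#   0  -> physical SCSI, Fibre Channel or iSCSI disks (or partitions on them)
#   16 -> SATA or ATA/IDE disks, and any LVM-backed or file-backed image,
#         regardless of the physical disk underneath (16 is the default)
# The BACKING names are hypothetical labels for this sketch.
xenblk_barriers_for() {
    case $1 in
        scsi|fc|iscsi)                     echo 0 ;;
        sata|ata|ide|lvm-image|file-image) echo 16 ;;
        *) echo "unknown backing type: $1" >&2; return 1 ;;
    esac
}

# Print the NetWare console command for a given backing type.
xenblk_set_command() {
    local n
    n=$(xenblk_barriers_for "$1") || return 1
    echo "SET XENBLK BARRIERS = $n"
}

# xenblk_set_command fc          # -> SET XENBLK BARRIERS = 0
# xenblk_set_command file-image  # -> SET XENBLK BARRIERS = 16
```

The "one huge XFS partition" case maps to file-image, so barriers stay on even if that XFS volume itself sits on a SAN LUN.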



Thursday, April 17, 2008

And a gripe

2.5 hours is too freakin' long for "rug lu" to tell me which patches need to be applied to this particular OES2 server. This needs fixing. I hope it's fixed in SLES10 SP2.



Tuesday, April 01, 2008

Slow blogging

I found out at BrainShare that WWU has been accepted as a Novell Authorized Beta site for OES2 SP1. And that's what I've been doing for the better part of the past week. Due to the NDA required, I can't talk about it. So, not much bloggable stuff to bring forward.

We requested entry into the program in part because of what I learned at BrainShare 2007: Novell doesn't test at our scale of users. Therefore, it is in our best interest to make sure that organizations like us are in the beta. We have the hardware to make a go of it right now (all those new ESX boxes are liberating some still-useful 3-5 year old servers), and I have the time. Unfortunately, the only 64-bit testing we'll be doing will be in VMWare, so the newest of the new code will have to be really tested by other people.

That's why I've been quiet.



Thursday, March 20, 2008

BrainShare Thursday

Not a good day. My first course, "Advanced BASH," could more accurately be described as, "BASH scripting tips & tricks". I then proceeded to skip the other three sessions I had signed up for.
  • Novell Open Enterprise Server 2 Interoperability with Windows and AD. All about Domain Services for Windows and Samba. Neither of which we'll ever use. No idea why I wanted to be in this session.
  • Rapid Deployment of ZENworks Configuration Management. Other people around here have suggested that if we haven't moved yet, wait until at least SP3 before moving. If then. So, demotivated. Plus I was rather tired.
  • Configuring Samba on OES2. CIFS will do what we need, I don't need Samba. Don't need this one. Skipped.
DL236: Advanced BASH Course
BASH tips and tricks. I got a lot out of it, but the developers around me were quietly derisive.

ZEN Overview and Features
Not so much with the futures, but it did explain Novell's overall ZEN strategy. It isn't a coincidence that most of Novell's recent purchases have been for ZEN products.

TUT303: OES2 Clusters, from beginning to extremes
This was great. They had a full demo rig, and they showed quite a bit on it, including using Novell Cluster Services to migrate Xen VMs around. They STRONGLY recommended using AutoYaST to set up your cluster nodes to ensure they are identical except for the bits you explicitly want different (hostname, IP). They also repeated something I've heard before: you want one LUN for each NSS pool. Really. Plus, the presenters were rather funny. A nice cap for the day.

And tonight, Meet the Experts!



BrainShare Wednesday

The Wednesday keynote was, indeed, a bunch of demos. It was also mostly pointless as far as the technology I'm concerned with. Lots of GroupWise (don't care), lots and lots of PlateSpin (can't afford it), lots of Zen (not the bits I'd use).

That said, the new GroupWise WebAccess is gorgeous. I wish Exchange's non-ActiveX pages looked that good.

TUT175: RBAC: Avoiding the horror, getting past the hype
Mostly about IDM as it turned out. Only minimally interesting from an abstract viewpoint about roles in general.

TUT 277: Advanced eDirectory Configuration, new features, and tuning for performance
I learned a few things I didn't know, such as the fact that each object has an "AncestorList" attribute listing its parent objects. This apparently greatly speeds up searching. SP3, coming out this Summer, will have faster LDAP binds for a couple of reasons. Right now Novell is recommending 2 million objects as a reasonable maximum size for a partition, for performance reasons.

And also they reiterated something I've heard before...
You know how back in the NetWare 4 days, we said to design your tree by geography at the first level, and then get to departments? Um, sorry about that. It was great back then, but for LDAP or IDM it really, really slows things down.
Yep. I took my first class for my CNA when 'Green River' was just coming out, or was just out. So I remember that.

TUT221: iPrint on Linux, what Novell Support wants you to know
A nice session from a mainline support guy about the ways people get iPrint on Linux wrong. We're not going there until pcounter can run on Linux, so this is still somewhat abstract. But, nice to know.
  • Some iPrint jobs render differently than direct-print jobs because of how Windows is designed. Direct-print jobs render with the 'local print provider', and iPrint jobs render with the 'network print provider'. This is a Microsoft thing, not an iPrint thing. You can duplicate it by setting up a Microsoft IPP printer (assuming you're not mandating SSL like we are) and printing to the same printer with the same driver.
  • The Manager on Linux doesn't use a Broker, it uses a 'driver store'.
  • The Manager on NetWare doesn't always bind to the same broker. I didn't know that.
  • It is recommended to have only one Broker, or one driver store per tree.
  • Novell recommends using DNS rather than IP for your printer-agents, check your manager load scripts.
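To follow that last tip, a quick scan for hard-coded IPv4 addresses in a copy of your load scripts can help. This is a hypothetical helper: it only pattern-matches dotted quads in whatever text file you point it at, knows nothing about iPrint's own syntax, and will also flag addresses that legitimately belong there, so treat hits as things to look at rather than errors.

```shell
#!/usr/bin/env bash
# List lines in a text file that contain something shaped like an IPv4
# address, prefixed with line numbers. Exit status follows grep: 0 if any
# were found, 1 if none. Purely a text scan; octet values aren't validated.
# usage: find_ipv4_literals FILE
find_ipv4_literals() {
    grep -nE '(^|[^0-9.])([0-9]{1,3}\.){3}[0-9]{1,3}([^0-9.]|$)' "$1"
}
```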



Tuesday, March 18, 2008

BrainShare Tuesday

Today started off with a bit of panic, as I hadn't set my alarm. Me being a west-coaster, 7:20 (when I woke up) is an entirely reasonable time to get up as far as my body is concerned. Only, I needed to get dressed and breakfasted before my first session at 8:30. Aie! I had to eat quick, but I got there. Didn't get a chance to check work email, though.

ATT326: Advanced Linux Troubleshooting
An ATT, therefore hard to summarize. But I learned about a few new commands I didn't know about before. Like strace. And vimdiff.

TUT130: Challenges in Storage I/O in Virtualization
Another nice one, but an emergency at work (printing down in a dorm, during finals week) distracted me heavily during the first half of it. Which resulted in the following note in my notes:
NPIV looks really nifty. Look into it.
NPIV being how you can use fibre-channel zoning to zone off VMs rather than HBAs. Highly useful. I also learned about a neat new thing called Virtual Fabrics. Virtual Fabrics work kind of like VLANs for fabrics: you can segregate your fabrics into fabrics that share hardware but nothing else. Handy if, say, your Solaris admins don't want you mucking about with their zoning, while saving money through consolidated hardware.

TUT216: OES2 SP1 Architectural Overview
There is a LOT of new stuff in SP1.
  • It will include eDir 8.8.4 (8.8.3 will ship this summer sometime)
  • NCP and eDir will be fully 64-bit
  • OES2 SP1 will be based on SLES SP2, which will be releasing about the same time
  • AFP Support
    • AFP 3.1
    • Uses Diffie-Hellman 1 for password exchange, meaning the 8-character password problem is solved.
    • Fully SMP-safe
    • Has cross-protocol locking with NCP. CIFS doesn't have cross-protocol locking yet, but IIRC, Samba does
    • Does not need LUM enabled users
  • CIFS Support
    • NTLMv1, but v2 is a possibility if enough people ask, so file those enhancement requests!!
    • CIFS is separate from Samba, therefore can not be used in conjunction with Domain Services for Windows
    • As with AFP, fully SMP safe
  • EDir 8.8.4
    • LDAP auditing enhanced
    • "newer auth protocols", but they didn't say what.
I should also mention that they're still deploying Novell Integrated Samba, which is what you'll have to use to get Domain Services For Windows. Samba still doesn't scale as far as I'd like ('only' 700-800 concurrent users), so that may be an issue for higher ed types who want high concurrency CIFS and also DSFW on the same box.

TUT211: Enhanced Protocol Support in OES2 SP1
This is the session where they went into detail about the AFP and CIFS support. They said that netatalk, the existing AFP stack on Linux, gets really slow once you go over 20 concurrent users. Whoa! I can soooo understand why Novell felt the need to make a new one.
  • The 8 character password limit has been fixed! They now support DH1 for passing passwords.
  • The 'afptcp' daemon can use one password protocol at a time, so you can only use DH1, or one of the other three I can't remember.
  • Support for OSX 10.1 and 10.2 is scanty, and 10.5 is limited but users may not notice anyway.
  • Passwords will be case sensitive.
  • Kerberos will be in a future release
  • Performance is faster than NetWare, partly due to the ability to multi-thread
  • Can register services by way of SLP
  • Only supports NSS for the time being, the other Linux file-systems will be a future feature.
  • Can support 500 concurrent users, and 1000+ in the future. This fits our current AFP loads.
  • We can configure more about how it works than we could on NetWare, such as how many worker threads to spawn.
  • Has meaningful debug logs!
  • Has a new command, 'afpstat' that works like 'netstat' for giving a snapshot of afp connections.
And then some CIFS stuff. We can't use it for political reasons so I didn't pay attention. Sorry.

Tonight was the night formerly known as 'Sponsor Night,' which has a new name now that not everyone who gets a booth is a 'sponsor'. Some are sponsors, some are exhibitors. I can't keep track. Anyway, tonight was their party: "World of Novellcraft!" An homage to video gaming.

Lots of Wii, lots of Rock Band, some Halo, lots of women dressed in Renaissance Festival gear getting their pictures taken by the 90%+ male audience. I've blogged before about my ambivalence about Sponsor Night. I lasted until about 7, when I came back to the hotel.

Tomorrow I have an actual LUNCH BREAK in my schedule! Ooo! And Soul Asylum... no, Soul Coughing... no, Collective Soul plays the concert! I've been listening to two of their CDs for the past two months, so I think I may even know a few songs by now.



Monday, March 17, 2008

Today at Brainshare

Monday. Opening day. I had trouble getting to sleep last night due to a poor choice of bed-time reading (don't read action, don't read action, don't read action). And had to get up at 6am body time in order to get breakfast before the morning keynote. There be zombies.

Breakfast was uninspired. As per usual, the hashbrowns had cooled to a gelid mass before I found everything and got a seat.

The Monday keynotes are always the CxO talks about strategy and where we're going. Today's mess of press releases from Novell gives a good idea of what the talks were about. Hovsepian was first, of course, and was actually funny. He gave some interesting tidbits of knowledge:
  • Novell's group of partners is growing, adding a couple hundred new ones since last year. This shows the Novell 'ecosystem' is strong.
  • 8700 new customers last year
  • Novell press mentions are now only 5% negative.
Jeff Jaffe came on to give the big wow-wow speech about Novell's "Fossa" project, which I'm too lazy to link to right now. The big concern is agility. He also identified several "megatrends" in the industry:
  • High Capacity Computing
  • Policy Engines
  • Orchestration
  • Convergence
  • Mobility
I'm not sure what 'Convergence' is, but the others I can take a stab at. Note the lack of 'virtualization' in this list. That's soooo 2007. The big problem is now managing the virtualization, thus Orchestration. And Policy Engines.

Another thing he mentioned several times in association with Fossa and agility is mergers and acquisitions. This is not something we Higher Ed types ever have to deal with, but it is an area in .COM land that requires a certain amount of IT agility to accommodate successfully. The repetition suggests that this strategy is aimed squarely at for-profit industry.

Also, SAP has apparently selected SLES as their primary platform for the SMB products.

Pat Hume from SAP also spoke. But as we're on Banner, and it'll take a sub-megaton nuclear strike to get us off of it, I didn't pay attention and used the time to send some emails.

Oh, and Honeywell? They're here because they have hardware that works with IDM. That way the same ID you use for your desktop login can be tied to the RFID card in your pocket that gets you into the datacenter. Spiffy.

ATT375 Advanced Tips & Tricks for Troubleshooting eDir 8.8
A nice session. Hard to summarize. That said, they needed more time, as the laptops running VMWare weren't fast enough for us to get through many of the exercises. They also showed us some nifty iMonitor tricks, and where the high-yield shoot-your-foot-off weapons are kept.

BUS202 Migrating a NetWare Cluster to OES2
Not a good session. The presenter had a short slide deck, and didn't really present anything new to me other than areas where other people have made major mistakes. And to PLAN on having one of the Linux migrations go all lost-data on you. He recommended SAN snapshots. It soon digressed into "Migrating a NetWare Cluster to Linux HA", which is a different session altogether. So I left.

TUT215 Integrating Macintosh with Novell
A very good session. The CIO of Novell Canada was presenting it, and he is a skilled speaker. Apparently Novell has written a new AFP stack from scratch for OES2 SP1, since netatalk is comparatively dog slow. And, it seems, the AFP stack is currently outperforming the NCP stack on OES2 SP1. Whoa! Also, the Banzai GroupWise client for Mac is apparently gorgeous. He also spent quite a long time (18 minutes) on the Kanaka client from Condrey Consulting. The guy who wrote that client was in the back of the room and answered some questions.



Monday, February 25, 2008

First OES2

This weekend I upgraded the one replica server running OES1-Linux to OES2-Linux. It already was at eDir 8.8.2 so the only real changes were to the base OS. It went rather well. The upgrade documentation provided by Novell was just fine. Really, a simple upgrade.

It being done on a Pentium III 1.2GHz machine meant it took a while. But very little in the way of complication. The one hitch was that it changed the certificate the NLDAP server loads to the default, which I didn't catch until a certain service we wrote failed. But that was a very easy fix.



Thursday, February 14, 2008

OES2-SP1 soon to be in closed beta

Novell just announced that OES2 SP1 is going into closed beta.

"What is in this release of Open Enterprise Server

Novell Open Enterprise Server 2 Support Pack 1 refreshes the SUSE Linux Enterprise Server 10 distribution with SLES10 SP2, fixes defects found since the release of OES2 and also adds in the following functionality:

  • Novell engineered CIFS and AFP protocols
  • New version of iFolder (3.7)
  • Updated iPrint with an accounting API
  • 64-bit version of eDirectory
  • Enhanced migration tools and migration GUI
  • Improved performance of the XEN hypervisor
  • Domain Services for Windows
  • NetWare 6.5 Support Pack 8

Note that although Domain Services for Windows is part of OES2 SP1, a separate beta program will be run in order to collate DSfW feedback."

Novell engineered CIFS? I soooo want to know what that is. Is it a completely new CIFS stack, or is it Samba with Novell extensions whacked on? I want to know! The other important bit of information:

The beta test program is currently scheduled to begin mid March and run through October.
Which means there won't be product for my 2008 upgrade window. Fie. Well, at least we'll have ample time to prototype and test for the 2009 upgrade window.

Update 9/2008: Novell has posted on their beta site that a public beta is 'coming soon'.

Update 10/2008: The public beta for OES2 SP1 has been posted.



Friday, January 25, 2008

A needed patch.

Novell has released a patch for the "ConsoleOne sorting problem."

The sorting problem happens when you have eDir 8.8 installed. Suddenly C1 starts sorting things by creation date rather than as you've ever seen it before. This is... confusing. ConsoleOne 1.3h helped some of it for us, but not all. And now, we have a patch!

Let ConsoleOne Sort Correctly!



Wednesday, January 02, 2008

Where NetWare Fits

NetWare 6.5 still holds top honors in one server niche. Even though it is a 32-bit operating system. That niche is the "large file-server" segment. I define "large" as, "lots of data, way-lots of concurrent users". Yeah, that's highly scientific. But "way-lots" means "over 1000 concurrent" to my thinking.

We regularly run between 1200-6000 concurrent connections on our cluster nodes. This is a density that just doesn't happen all that often in the market. If you have 6000 users close enough together to all talk to the same file-server at LAN speeds using a protocol designed for file-serving (such as NCP, SMB/CIFS, or AFP), you're a big organization. 6000 is a large corporate campus, a large governmental entity of some kind, or a larger .EDU like us. Nationally, the number of 'large' file-servers like that is peanuts compared to the number of 'workgroup' (i.e. under 300 concurrent users) servers out there.

It is therefore no surprise to me that Novell is not devoting a lot of engineering to supporting the top end of this market. While it may pay well, there just isn't enough revenue coming from these customers to try and handle the hardest-to-test use-case: very high concurrency. I find it disappointing because I AM one of those customers (a larger .EDU), but I understand the business drivers supporting the decision.

For the moment, NetWare 6.5 (32bit) is the top dog performance-wise for our environment. That isn't going to stay true for much longer. It would not surprise me to find out that a Windows Enterprise Server (x86_64) with 16GB of RAM can outperform a NetWare 6.5 (32bit) server with 4GB of RAM, simply due to the added room for file-cache. What I don't know is how CPU-bound file-serving I/O is on a Windows Enterprise Server; that's the one area that could keep NetWare 6.5 (32bit) on top. I already know that OES2-Linux outperforms NetWare for NCP traffic, so long as you stay within CPU bounds.

For high-concurrency applications, as far as I know NetWare still wins.



Wednesday, December 19, 2007

eDir 8.8 is in

And as far as upgrades go, it was pretty much a non-event.

Whenever you do upgrades like this you always wonder if those balls you're juggling are tennis-balls or grenades. It took about a half hour per server and didn't have any significant hitches. The one problem that did surface is that the OES1-linux server's LDAP server had its certificate change from the one it was using to SSL CertificateDNS. This was not good, as that certificate doesn't have the subject-name we need and this caused some S/LDAP binds to fail due to SSL validation problems. That was an easy fix. The LDAP servers on the NetWare boxes didn't change.
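The fix here was getting the right certificate back on the LDAP server, but it's easy to check ahead of time whether a given certificate actually names the host your clients bind to. Below is a minimal Python sketch, assuming the certificate has already been fetched as the dict that `ssl.SSLSocket.getpeercert()` returns; the hostnames are hypothetical. In a real client, `ssl.create_default_context()` with `check_hostname` enabled does this check during the TLS handshake, so this only illustrates the logic.

```python
# Sketch only: works on a certificate dict shaped like the one returned by
# ssl.SSLSocket.getpeercert(). Hostnames below are hypothetical.

def dns_names(cert):
    """Names the certificate claims: SAN DNS entries, else the subject CN."""
    sans = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    if sans:
        return sans
    return [value
            for rdn in cert.get("subject", ())
            for key, value in rdn
            if key == "commonName"]


def name_matches(pattern, hostname):
    """Case-insensitive match; '*' may only stand in for the left-most label."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern.startswith("*."):
        first, _, rest = hostname.partition(".")
        return bool(first) and rest == pattern[2:]
    return pattern == hostname


def cert_covers(cert, hostname):
    """True if any name in the certificate matches the hostname."""
    return any(name_matches(name, hostname) for name in dns_names(cert))


if __name__ == "__main__":
    ldap_cert = {"subject": ((("commonName", "ldap.example.edu"),),)}
    server_cert = {"subject": ((("commonName", "nw1.example.edu"),),)}
    print(cert_covers(ldap_cert, "ldap.example.edu"))    # True
    print(cert_covers(server_cert, "ldap.example.edu"))  # False
```

A certificate issued to the server's own DNS name fails this test for clients that bind to a service alias, which is exactly the kind of subject-name mismatch that breaks secure binds.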

This was a tennis-ball upgrade. So far.

We haven't turned on case-sensitive LDAP binds yet, but soon. Soon.

One unexpected side-effect of getting all three eDir servers upgraded to 8.8 like this is that the Change Cache is now cleaned of those permanent residents we've had for ages. Woo!



Monday, December 17, 2007

Not dead.

Wow, last post was the 30th? Jeez. I was on vacation all last week, which accounts for some of it. And it's looking like I'll be out sick for at least a pair of days with a crud I got while wandering about. Not sharing that with work, nosir.

On my list of things to do during the winter inter-session is to get eDir 8.8 deployed in the production tree. I just need to have ALL the servers in the tree (all, not just replica holders due to backlink updates) up and talking when I do the first one, and that could take some scheduling. This is the first step to OES2, which will be deployed on the eDir servers first.

As soon as I get some new hardware, since they're getting old.



Friday, November 30, 2007

OES2 SP1 timing

Novell just posted the third draft of their OES2 Best Practices guide. Which you can locate here. In that guide is this text:
Domain Services for Windows, which is scheduled to ship with OES 2 SP1 (currently scheduled for late 2008), will also offer some clear advantages.
"Late 2008" means they WILL NOT have SP1 out by August of 2008. This means that the upgrade of our 6 node cluster to OES will have to wait until 2009. Grrarrr!

Another 21 months of a 32-bit operating system on the single biggest storage consumer on campus. We'll have at least one hardware refresh before then for some of the nodes, and... boy I hope they have NetWare drivers for that. The very limited testing I did with NetWare-in-Xen was not encouraging from a performance stand-point. If it looks like I'll have to deploy that way for the next servers we get in the cluster, I'll have to do more real testing to characterize the performance hit (if any). The idea of a 64-bit memory space for file-caching makes me drool. Not getting it for 21 months is painful.

That said, if Novell releases the eDirectory enabled AFP server for OES2-Linux outside of the service-pack I could still make the 2008 window. That's our only dependency for SP1.

Update (09/08/08): Looks like 'late October' is the date for SP1's release. Should be in public beta before then.

Update (12/03/08): It's out!



Wednesday, November 28, 2007

I/O starvation on NetWare, HP update

Last week I talked about a problem we're having with the HP MSA1500cs and our NetWare cluster. The problem is still there, of course. I've opened cases with both HP and Novell to handle this one: HP because I really think that such command latencies are a defect, and Novell since they're having starvation issues with clusters.

This morning I got a voice-mail from HP, an update for our case. Greatly summarized:
The MSA team has determined that your device is working perfectly, and can find no defects. They've referred the case to the NetWare software team.
Or...
Working as designed. Fix your software. Talk to Novell.
Which I'm doing. Now to see if I can light a fire on the back-channels, or if we've just made HP admit that these sorts of command latencies are part of the design and need to be engineered around in software. Highly frustrating.

Especially since I don't think I've made back-line on the Novell case yet. They're involved, but I haven't been referred to a new support engineer yet.



Wednesday, November 21, 2007

I/O starvation on NetWare

The MSA1500cs we've had for a while has shown a bad habit. It is visible when you connect a serial cable to the management port on the MSA1000 controller and do a "show perf" after starting performance tracking. The line in question is "Avg Command Latency:", which is a measure of how long it takes to execute an I/O operation. Under normal circumstances this metric stays between 5-30ms. When things go bad, I've seen it as high as 270ms.
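If you capture the serial-console output, the latency check is easy to automate. A minimal sketch, assuming the metric prints as a single line like `Avg Command Latency: 270ms` (the exact formatting is an assumption; only the label comes from the controller output described above), with 30ms treated as the normal ceiling:

```python
import re
from typing import Optional

# Assumed line format; only the "Avg Command Latency:" label is taken from
# the controller's "show perf" output.
LATENCY_RE = re.compile(r"Avg Command Latency:\s*([\d.]+)\s*ms", re.IGNORECASE)
NORMAL_MAX_MS = 30.0  # top of the 5-30ms range seen under normal load


def avg_command_latency(perf_output: str) -> Optional[float]:
    """Return the first Avg Command Latency value in ms, or None if absent."""
    match = LATENCY_RE.search(perf_output)
    return float(match.group(1)) if match else None


def classify(latency_ms: float) -> str:
    """Label a latency reading against the normal range."""
    return "normal" if latency_ms <= NORMAL_MAX_MS else "degraded"


if __name__ == "__main__":
    captured = "Avg Command Latency: 270ms\n"  # hypothetical capture
    latency = avg_command_latency(captured)
    print(latency, classify(latency))  # 270.0 degraded
```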

This is a problem for our cluster nodes. Our cluster nodes can see LUNs on both the MSA1500cs and the EVA3000. The EVA is where the cluster has been housed since it started, and the MSA has taken on two low-I/O volumes to make space on the EVA.

IF the MSA is in the high Avg Command Latency state, and
IF a cluster node is doing a large Write to the MSA (such as a DVD ISO image, or B2D operation),
THEN "Concurrent Disk Requests" in Monitor go north of 1000

This is a dangerous state. If this particular cluster node is housing some higher-traffic volumes, such as FacShare:, the laggy I/O is competing with regular (fast) I/O to the EVA. If this sort of mostly-Read I/O is concurrent with the above heavy Write situation, it can cause the cluster node to not write to the Cluster Partition on time and trigger a poison-pill from the Split Brain Detector. In short, the storage heart-beat to the EVA (where the Cluster Partition lives) gets starved out in the face of all the writes to the laggy MSA.

Users definitely noticed when the cluster node was in such a heavy usage state. Writes and Reads took a loooong time on the LUNs hosted on the fast EVA. Our help-desk recorded several "unable to map drive" calls when the nodes were in that state, simply because a drive-mapping involves I/O and the server was too busy to do it in the scant seconds it normally does.

This is sub-optimal. This also doesn't seem to happen on Windows, but I'm not sure of that.

This is something a very new feature in the Linux kernel could help with: the concept of 'priority I/O' in the storage stack. I/O with a high priority, such as cluster heart-beats, gets serviced faster than I/O of regular priority. That could prevent SBD abends. Unfortunately, as the NetWare kernel is no longer under development and is just in maintenance, this is not likely to be ported to NetWare.
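To make the starvation argument concrete, here is a toy single-queue model in Python. It assumes requests are serviced one at a time at a fixed cost each, which real disk arrays don't do, and the two priority classes are made up; it only shows why letting a heart-beat jump the queue keeps it from waiting behind a pile of slow writes. It is not how the Linux I/O schedulers actually implement priorities.

```python
import heapq
from itertools import count

HEARTBEAT, NORMAL = 0, 1  # hypothetical classes; lower value is served first


def service_order(requests, use_priority):
    """Return request names in the order a single queue would serve them.

    requests: list of (name, priority) tuples in arrival order.
    With use_priority=False the queue is plain FIFO.
    """
    if not use_priority:
        return [name for name, _ in requests]
    heap = []
    arrival = count()  # tie-breaker keeps FIFO order within one priority
    for name, prio in requests:
        heapq.heappush(heap, (prio, next(arrival), name))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


def wait_before(order, target, per_request_ms):
    """Wait time for target if each request ahead of it costs per_request_ms."""
    return order.index(target) * per_request_ms


if __name__ == "__main__":
    queue = [(f"write{i}", NORMAL) for i in range(5)] + [("sbd-heartbeat", HEARTBEAT)]
    for use_priority in (False, True):
        order = service_order(queue, use_priority)
        print(use_priority, wait_before(order, "sbd-heartbeat", 270))
```

With five 270ms writes queued ahead of it, the FIFO heart-beat waits 1350ms while the prioritized one waits none; with the 1000+ concurrent requests described above, the FIFO case only gets worse.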

I/O starvation. This shouldn't happen, but neither should 270ms I/O command latencies.



Monday, October 15, 2007

Peer-to-peer sharing

One feature that has shown up in some applications and widgets lately has gained some traction internally. That is the concept of peer to peer sharing of disk space without going through all the pain of getting things approved and formally set up. The general idea is this one.

I want to share U:\SharedStuff\ApacheGroup\ to five other users. U: is my home directory, which is actually map-rooted so I don't see the top level directory. So I go to a web page and tell it I want to share this directory, to these people, for this long. Go.

It struck me that this sort of thing can be engineered with NetWare and OES. The key components are eDirectory, NSS, and NetStorage.

The web server takes the request and translates $Path into a real path by referencing the HomeDirectory attribute of the user who requested the share. Then, using LDAP it creates two objects:

A Group Object
  • Created and named dynamically
  • [AuxClass] Attribute with user-defined name
  • [AuxClass] Attribute with the creator
  • [AuxClass] Attribute with the expiry date
  • Since this is eDirectory, group memberships apply immediately rather than taking a logout/login cycle to refresh the access token like in MS networks.
A Storage Location Object
  • Created & named dynamically
  • Associated to the created group
  • Assigned to the specified users
  • This allows the share to show up in NetStorage
The web server sends a request to a file daemon that handles the actual trustee assignment.
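Here's a minimal sketch of the web server's side of that, in Python: translate the U:-relative path using the requester's HomeDirectory, then emit LDIF for the two objects. Everything schema-related is an assumption: the container, the auxiliary class and attribute names, and the storage-location object class are hypothetical placeholders (only `groupOfNames` and `member` are standard LDAP), and the trustee assignment is left to the file daemon as described.

```python
import uuid
from datetime import date

# Hypothetical names throughout: the container, the auxiliary class and its
# attributes, and the storage-location object class are placeholders, not
# real eDirectory or NetStorage schema.
SHARE_CONTAINER = "ou=shares,o=example"


def resolve_share_path(home_directory, mapped_path, drive="U:"):
    """Turn a path on the map-rooted home drive into a real server path."""
    prefix = drive.upper() + "\\"
    if not mapped_path.upper().startswith(prefix):
        raise ValueError(f"{mapped_path!r} is not on {drive}")
    relative = mapped_path[len(prefix):]
    return home_directory.rstrip("\\") + "\\" + relative


def share_ldif(creator_dn, share_name, member_dns, expires, real_path, share_id=None):
    """Build LDIF for the dynamic group plus its storage-location object."""
    share_id = share_id or uuid.uuid4().hex[:8]
    group_dn = f"cn=share-{share_id},{SHARE_CONTAINER}"
    group = [
        f"dn: {group_dn}",
        "changetype: add",
        "objectClass: groupOfNames",
        "objectClass: exampleShareAux",  # hypothetical auxiliary class
        f"exampleShareName: {share_name}",
        f"exampleShareCreator: {creator_dn}",
        f"exampleShareExpires: {expires:%Y%m%d}000000Z",
        *[f"member: {dn}" for dn in member_dns],
    ]
    location = [
        f"dn: cn=loc-{share_id},{SHARE_CONTAINER}",
        "changetype: add",
        "objectClass: exampleStorageLocation",  # hypothetical class
        f"exampleLocationPath: {real_path}",
        f"exampleLocationGroup: {group_dn}",
    ]
    return "\n".join(group) + "\n\n" + "\n".join(location) + "\n"


if __name__ == "__main__":
    path = resolve_share_path("VOL1:\\users\\jdoe", "U:\\SharedStuff\\ApacheGroup")
    print(share_ldif("cn=jdoe,ou=users,o=example", "ApacheGroup",
                     ["cn=asmith,ou=users,o=example"], date(2008, 1, 31), path))
```

The output could be fed to any LDIF import tool; generating LDIF rather than binding directly just keeps the sketch free of a particular LDAP client library.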

There is a small constellation of maintenance tasks that also need to be created, such as a janitor process to deal with expirations, a helpdesk view to track who has what shares, a historic view to see what shares got deleted recently that suddenly need to be back RIGHT NOW, something to interface this with whatever disk or directory quota systems are in use.
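The janitor is the easiest of those to sketch. A minimal Python version, assuming each share's group object carries an expiry date (the `Share` record and the `grace_days` knob are hypothetical); the caller would then delete the expired objects over LDAP and log them for the historic view:

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class Share:
    """Hypothetical record built from a share's group object."""
    group_dn: str
    creator_dn: str
    expires: date


def sweep(shares: List[Share], today: date, grace_days: int = 0) -> Tuple[List[Share], List[Share]]:
    """Split shares into (keep, expire).

    A share expires once today is more than grace_days past its expiry date,
    so a share expiring today survives until the next day's run.
    """
    keep, expire = [], []
    for share in shares:
        days_overdue = (today - share.expires).days
        (expire if days_overdue > grace_days else keep).append(share)
    return keep, expire


if __name__ == "__main__":
    shares = [
        Share("cn=share-a,ou=shares,o=example", "cn=jdoe,ou=users,o=example", date(2007, 10, 1)),
        Share("cn=share-b,ou=shares,o=example", "cn=jdoe,ou=users,o=example", date(2007, 10, 31)),
    ]
    keep, expire = sweep(shares, today=date(2007, 10, 15))
    for share in expire:
        print("expire:", share.group_dn)  # would delete via LDAP and log to history
```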

The use of NetStorage allows WebDAV to be used as an access method, which allows the shares to be seen. The really brave may be able to leverage DFS to create actual directory structures reflecting the shares in the actual directories so drive mappings can be used; unfortunately I have no idea if a DFS database that large is a good idea.

Users would love this. No need to go through management to get a directory set up on the shared space. You just set up and go. Great for ad hoc groups, or small private gatherings.

Unfortunately, this sort of share model is one that a lot of sys-admins are familiar with. If you've ever had a chance to examine the network of a small business with under 15 users, all of whom call themselves 'not that good with computers', you know what I'm talking about. This model of sharing is the one that Windows for Workgroups was designed for, and is still the default mode for plain old WinXP. Excessive use of peer to peer sharing like that can lead to one unholy mess, especially if a key person leaves (or in the case of the Windows example, one hard drive crashes hard).

If left unchecked, you can get whole business processes designed with the assumption that [username] will never retire. That already happens to an alarming extent, but this would make the dependency more invisible to those of us charged with making it all work again when it breaks. You can have shared spaces that are business critical to the company living 100% inside a user's self-managed space, and vulnerable to deletion on termination of that employee.

This is all part of the balance we as system administrators have to keep between end user functionality, and data protection. Desktop techs fight a constant battle to get users to save data on the server where it is backed up, and Novell puts out things like iFolder to help that whole thing become more invisible. We created shared directories to draw a big line between 'my stuff' and 'us stuff'.

That said, data-access habits are changing all the time. My own boss prefers to email a 150KB Excel spreadsheet to all of us, even though all of us have ready access to a shared directory set up just for that. SharePoint integrates with Office to make the web-server look like a file-server. We still have to adapt with the times.

User-directed sharing is something I can see as highly desirable among the student population and faculty as well. Among staff, I'm less sure it's a good idea outside of the 'trivial' personal use we're allowed.



Wednesday, September 26, 2007

OES2 release date

Just got out of the WebCast they had. First, the important stuff:

OES2 will be released on October 5th.
OES2-SP1 is targeted for mid-April, 2008.
AFP integration will be in SP1.

I sooooooooo hope they don't push SP1 past July. If that happens, my main migration of our cluster will have to be pushed to 2009. Ick. We're already running out of effective file-cache in 32-bit memory space. I need 64-bit to really give good performance. Hope hope hope.

A few other minor points:
  • Around the release of SP1, Prosoft and Condrey Consulting (Kanaka) will release an NCP client for Mac.
  • The clearing of throats next to a mic is a sign of someone who doesn't do a lot of work in front of mics.
  • OES2 is fully 64-bit optimized (on Linux)
  • They claim EVEN BETTER NSS performance on OES2. I hope to try that out, soon as I can figure out how to get SLES10/OES2-beta5 to talk to my SAN luns. It hates me.



Tuesday, September 25, 2007

OES2 Web-chat tomorrow

This isn't exactly widely spread, but here it is:

Open Enterprise Server 2 Live Webcast

Tomorrow, September 26th at 11AM PDT.

They'll be talking about all the spiffy stuff that's in OES2, and some new info about code releases. I think this is the 'event' they mentioned a while back.



Tuesday, September 18, 2007

OES2: clustering

I made a cluster inside Xen! Two NetWare VMs inside a Xen container. I had to use a SAN LUN as the shared device since I couldn't make it work with just a single file. Not sure what's up with that. But it's a cluster, and the volume moves between the two just fine.
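The post doesn't say how the LUN was presented to both VMs. One way to do it with xm-managed Xen is a domU config along the lines of the sketch below (xm config files are evaluated as Python, so it is just assignments). Every name, path, and device label here is a placeholder. The relevant bit is the `w!` mode on the shared device: by default xm refuses to attach a block device read-write to more than one running domain, and `w!` overrides that check, which a shared cluster disk needs.

```python
# xm-style domU config sketch for one NetWare cluster node.
# All names and paths are placeholders, not the setup described above.
name = "nw-node1"
memory = 1024  # MB

disk = [
    # node-private boot/SYS: image, ordinary exclusive read-write
    "file:/var/lib/xen/images/nw-node1/sys.img,hda,w",
    # the shared SAN LUN; 'w!' permits read-write attachment to more than
    # one domain at a time, which both cluster nodes need
    "phy:/dev/disk/by-id/scsi-EXAMPLE-SHARED-LUN,hdb,w!",
]
```

The second node's config would list the same `phy:` device with `w!`; keeping the two nodes from corrupting the shared volume is then the cluster software's job, not Xen's.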

Another thing about speeds, now that I have some data to play with. I copied a bunch of user directory data over to the shared LUN. It's a piddly 10GB LUN so it filled quick. That's OK, it should give me some ideas of transfer times. Doing a TSATEST backup from one cluster-node to the other (i.e. inside the Xen bridge) gave me speeds on the order of 1000MB/Min. Doing a TSATEST backup from a server in our production tree to the cluster node (i.e. over the LAN) gave me speeds of about 350MB/Min. Not so good.

For comparison, doing a TSATEST backup from the same host, only drawing data from one of the USER volumes on the EVA (highly fragmented, but much faster, storage), gives a rate of 550 MB/Min.
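Those rates are easier to compare per second, and as time to move a full 10GB LUN. A quick Python calculation, assuming decimal units (1 GB = 1000 MB) to match how the rates are quoted:

```python
def mb_per_min_to_mb_per_s(rate_mb_min: float) -> float:
    """Convert a TSATEST-style MB/min rate to MB/s."""
    return rate_mb_min / 60.0


def minutes_to_move(size_gb: float, rate_mb_min: float) -> float:
    """Minutes to move size_gb at the given rate (decimal GB assumed)."""
    return size_gb * 1000.0 / rate_mb_min


RATES_MB_MIN = {
    "node-to-node inside the Xen bridge": 1000,
    "production server to cluster node over the LAN": 350,
    "production host reading an EVA USER volume": 550,
}

for label, rate in RATES_MB_MIN.items():
    print(f"{label}: {mb_per_min_to_mb_per_s(rate):.1f} MB/s, "
          f"{minutes_to_move(10, rate):.0f} min for 10GB")
```

Roughly 16.7, 5.8, and 9.2 MB/s respectively, so the over-the-LAN path takes close to three times as long as the in-bridge copy for the same data.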

I also discovered the VAST DIFFERENCES between our production eDirectory tree, which has been in existence since 1995 if the creation timestamp on the tree object is to be believed, and the brand new eDir 8.8 tree the OES2 cluster is living in. We have a heckova lot more attributes and classes in the prod tree than in this new one. Whoa. It made for some interesting challenges when importing users into it.



OES2-beta progress

As mentioned before, I have the OES2 beta. Right now I have two NetWare servers parked in Xen VMs on SLES10SP1. This is how it is supposed to work!

I haven't gotten very far in my testing, but a few things are showing. I managed to do a TSATEST-based throughput run of a backup of SYS. That's about a gig of data. Throughput for just one stream to one of the servers was around 500 MB/min, which is passable and within the realm of real performance for slower hardware. The downside is that the CPU reported by "xm top" was around 45%, where the CPU reported in MONITOR was closer to 25%. That's way higher than I expected, but could be related to all the disk I/O ops. This I/O was to a file in the file-system, not a physical device like a LUN on the SAN (that comes later).

Now I'm trying to get Novell Cluster Services installed. I want to get a weensy 2-node cluster set up to prove that it can be done. I suspect it can, but actually seeing it will be very nice.



Thursday, September 13, 2007

OES2: virtualization

I have the beta up and running. I have a pair of OES2-NW servers running in Xen on SLES10SP1. And it loads just spiffy. Haven't done any performance testing on it, kind of hard to really interpret results at this point anyway.

What I HAVE been spending time on is seeing if it is possible to get a cluster set up. Clusters, of course, rely on shared storage. And if it works the way I need it to work, I need multiple Xen machines talking to the same LUNs. It may be doable, but I'm having a hard time figuring it out. The documentation on Xen isn't what you'd call complete. Novell has some in the SLES10SP1 documentation, but the stuff in the OES2 documentation is... decidedly overview-oriented. This is the most annoying thing, as I can't just put my nose to a manual and find it.

So, looking for a Xen manual. It has to be around somewhere. Google-fu failed me today.



Monday, September 10, 2007

OES2 public beta is out

Jason Williams said so.

This looks to be Beta5. They released both the Linux and NetWare parts of it. The NW65SP7 overlay iso is 1.1GB in size. I sooooooooooooooooooooo gotta get DVD drives into my servers.

Rumor has it release is now mid-October. So who knows what's going on with the 'launch' on the 26th.



Friday, September 07, 2007

The mystery of the OES2 release date

Various sources have pointed at evidence that Novell will be launching OES2 on the 26th. As has been pointed out, "Launch" and "Release" are different things. And yet, at the same time, rumor has it to "watch for events this Monday".

I don't know what to make of that.

It COULD be that the open beta will be out Monday. I have doubts about that, as that leaves very little time for reports to come back from the field for incorporation into OES2-release, presumably on or about the 26th.

It COULD be that it'll be released Monday, and the major PR push for launch will be two and a half weeks later. I have my doubts about that, since Novell would be scooped by the likes of me as we put the new product to the test, but it could happen.

It COULD be that Monday is a red herring and Novell will announce a ship date on the 26th, and the opening of the beta. I put more stock into this possibility. The likes of me will swoop up the beta code, run it through its paces and send feedback about what we manage to break, for a presumed ship of OES in November or so.

Or it could be none of these. I guess we'll find out Monday or something.



Friday, August 31, 2007

Here's an interesting thing

Novell is putting together a Best Practices guide for migrating to OES2 from NetWare. Obviously this is OES2-Linux, as there is not much that needs migrating when going from OES-NW to OES2-NW. They're soliciting community input for the guides, and will be offering Cool Solutions reward points for contributions.

This is interesting. I know that the Novell Support Forum Sysops tend to build up their own micro guides based on problems people report in the forums, and this is a way to better formalize that. Some of the sysops have taken to using the Cool Solutions Wiki as a place to park boiler-plate answers and forward questioners to those pages. This is an interesting concept.

More interesting as OES2 isn't out yet, even in an open-beta form. Where are we going to get our experience from, eh? This implies that shortly we'll have at least an open beta to try out. I hope so.

I can't contribute much to this document because my main migration is contingent on AFP being eDir integrated, and they've said that'll not happen until probably SP1. If I do anything it'll be the eDir servers, and those are relatively easy migrations. DFS is the only sticking point for that.



Wednesday, August 29, 2007

Dynamic Storage Technology, more data

Two days ago Novell posted an AppNote on Dynamic Storage Technology, formerly known as 'shadow volumes'.

Setting up Dynamic Storage Technology with Open Enterprise Server 2

One thing I noticed right at the top of the article is a little blurb that reads:
This article was written for Novell Open Enterprise Server 2. Sign up here to be notified when the Novell Open Enterprise Server 2 open beta becomes available.
Which tells me that the public beta is probably pretty near, and that OES2 release will probably not be "end of Q3" like Jason Williams indicated a while back. I could be wrong, of course. As soon as I get the public beta code there is some serious testing I need to do.

Anyway, back to the article. This is a click-by-click guide for setting up DST. This includes screenshots, which are of the new iManager 2.7. Unsurprisingly, Novell re-themed the iManager interface. There is a gotcha on step 17, where you have to edit a local config file on the OES server to get it going, that would probably trip up most people trying to set up DST by going solely on looking at the UI.

This is a very good article describing it all. I recommend it!



Friday, July 27, 2007

Novell news

Two Cool Blogs posts in the past few days have held some nice tidbits.

Jason Williams says that the Novell Client for Vista is due out mid-August, so long as a key defect registered with Microsoft gets fixed.

Jaimon Jose says that eDir 8.8 SP2 is also due out real soon. SP2 apparently involves some serious performance enhancements.

Both of these are technologies associated with the elusive OES2. We need the Client for Vista as soon as they can get it to us, so I'm not surprised they're considering releasing that independently of OES2. SP2 for eDir 8.8 is one thing I figure will be included in OES2 by default. As that's an independent product as well, having it release independently is nice. This means that two technologies that could be blockers for OES2 are finally being kicked into the real world.

In news unrelated to WWU at all, Bonsai, the next GroupWise version, seems to be getting closer to deployment. They're nearing 'code complete' and will soon start the Authorized Beta phase.



Wednesday, July 18, 2007

The OES2 push, what it means to me.

With the release of OES2 pushed to Christmas, or possibly BrainShare 2008, I'm in a hard spot. The magnitude of this migration means that I have one period a year I can pull that off, and that is the last week in August and the first two weeks of September. If I don't have code in that period, I can't migrate. Period.

As I learned at BrainShare this year, the Apple Filing Protocol stack on OES2-Linux is not eDirectory integrated. This is a project stopper for us, so we need that to be in place before we migrate. They quoted us, "Possibly SP1 timeframe, definitely not first-customer-ship, but don't hold us to it." They learned of the AFP problem at BrainShare and said they'd get right on it to try and get that in. That told me that summer 2008 would be the earliest I could expect to have the eDir integrated AFP stack.

Since I don't think Novell is planning on pushing OES2 ship to summer 2008, I suspect the AFP stack will be in with SP1. I consider it likely that OES2 SP1 will ship about the same time as SLE10 SP2. Which means I have real strong doubts that I'll be doing an OES2-Linux migration during next year's intersession. So we'll probably end up staying on NetWare for file-serving at least until 2009. In 2009 those NetWare servers may very well be in either an ESX or Xen virtual container, but it'll still be the 32-bit NetWare code doing the serving. That said, the web and print services (MyFiles, MyWeb, iprint) may move earlier, as they do not have the same AFP dependency.

Our storage needs on the WUF cluster are already pushing the boundaries of the 32-bit memory space. I'd be a lot happier if I could throw another 2 gigs of RAM at the file-servers in order to keep their cache-levels at a good spot. Can't do that on 32-bit NetWare, at least not while expecting improved performance. In 2009 we'll be managing anywhere from 12 to 18 terabytes of data on WUF, with a good chunk of it active. That is a situation that screams for 64-bit limits to memory space in order to provide zippy performance.
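Some back-of-the-envelope arithmetic on why the cache can't keep up. This Python snippet treats 4 GiB (the entire 32-bit address space, which is more than NetWare can actually hand to cache) as an upper bound, and uses 16 GiB as an arbitrary example for a 64-bit box. It compares against the whole 12-18 TB, whereas what really matters is the active portion of that data.

```python
GIB = 2**30
TB = 10**12               # decimal terabytes, the way disk capacity is quoted
ADDRESS_SPACE_32 = 2**32  # bytes reachable with 32-bit addresses


def cache_fraction(cache_bytes, data_tb):
    """Upper bound on the share of the data set a cache of this size can hold."""
    return cache_bytes / (data_tb * TB)


if __name__ == "__main__":
    print(ADDRESS_SPACE_32 // GIB)  # 4
    for data_tb in (12, 18):
        for cache_gib in (4, 16):
            frac = cache_fraction(cache_gib * GIB, data_tb)
            print(f"{data_tb} TB data, {cache_gib} GiB cache: {frac:.3%}")
```

Even the best case is a few hundredths of a percent of the total, so the game is entirely about whether the hot subset fits, and a 64-bit address space raises that ceiling by whatever RAM you can afford.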

Thus, I am worried. Please, Novell. Ship at Christmas. It'll make my schedules look a LOT less grim.



Monday, July 09, 2007

More fun OES2 tricks

I had an idea while I was googling around a bit ago. This may not work the way I expect as I'm not 100% on the technologies involved. But it sounds feasible.

Let's say you want to create a cluster mirror of a 2-node cluster for disaster recovery purposes. This will need at least four servers to set up. You have shared storage for both cluster pairs. So far so good.

Create the four servers as OES2-Linux servers. Set up the shared storage as needed so everything can see what it should in each site. Use DRBD to create new block-devices that'll be mirrored between the cluster pairs. Then set up NetWare-in-VM on each server, using the DRBD block-devices as the Cluster disk devices. You could even put SYS: on the DRBD block-devices if you want a true cluster-clone. That way, when disk I/O happens on the clustered resources, it gets replicated asynchronously to the DR site. Unlike software RAID1, the I/O is considered committed when it hits local storage; SW RAID1 only considers writes committed when all mirrored LUNs report the commit.
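The commit-semantics difference is the crux of the idea. Here is a simplified Python model of when the writer gets its acknowledgement, assuming DRBD is configured for asynchronous replication (its protocol A; DRBD's protocol C is synchronous and behaves like the RAID1 case here). The latencies are made up.

```python
from typing import Sequence


def ack_latency_ms(local_ms: float, remote_ms: Sequence[float], mode: str) -> float:
    """Time until the writer is told its write is committed (simplified)."""
    if mode == "async":   # DRBD-style asynchronous replication
        return local_ms   # remote copies catch up in the background
    if mode == "sync":    # software RAID1 across LUNs
        return max([local_ms, *remote_ms])
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    local, dr_site = 5.0, [40.0]  # hypothetical per-write latencies in ms
    print(ack_latency_ms(local, dr_site, "async"))  # 5.0
    print(ack_latency_ms(local, dr_site, "sync"))   # 40.0
```

The flip side of the fast acknowledgement is that writes still in flight when the primary site dies never make it to the DR copy.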

Then, if the primary site ever dies, you can bring up an exact replica of the primary cluster, only on the secondary cluster pair. Key details like how to get the same network in both locations I leave as an exercise for the Cisco engineers. But still, an interesting idea.



Friday, July 06, 2007

Getting creative with Blackboard

I had me an idea yesterday. One of those ideas that I'm not sure is a good one, but wow does it make a certain kind of sense.

We, like all too many schools, run Blackboard as the groupware product supporting our classrooms. There is an open-source product out there that also can do this, but we're not running it. That's not what this post is about.

First a wee bit of architecture. Roughly speaking, Blackboard is separated into three bits. The web server, the content server, and the database. The web-server is the classic Application Server that is what students and teachers interface with. The web server then talks with both the content server and database server. The content server is the ultimate home of all things like passed in homework. The database server glues this all together.

Due to policies, we have to keep courses in Blackboard for a certain number of quarters just in case a student challenges a grade. They may not be available to everyone, but those courses are still in the system. And so is all of the homework and assorted files associated with that class. Because of this, it is not unusual for us to have 2 years (6-7 quarters) of classes living on the content server, of which all but one quarter is essentially dead storage.

One of the problems we've had is that when it comes time to actually delete a course, it doesn't always clean up the Content associated with that course. Quite annoying.

This is a case where Dynamic Storage Technology would be great. Right now our Blackboard Content servers are a pair of Windows servers in a Windows Cluster. It struck me yesterday that this function could be fulfilled by a pair of OES2 servers in a Novell Clustering Services setup (or Heartbeat, but I don't know how to set THAT up), using Samba and DST to manage the storage. That way stuff that has been accessed in the past, oh, 3 months would be on the fast EVA storage, and stuff older than 3 months would be exiled to the slow MSA storage. As the file-serving is done by way of web-servers rather than direct access, the performance hit from using Samba won't be noticeable, since the concurrency is well below the limit where that becomes a problem. Additionally, since all the files are owned by the same user, I could use a non-NSS filesystem for even faster performance.
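DST does this with its own policy settings, which I haven't tried; the Python below is only a sketch of the "touched in the last 90 days stays fast" rule, handy for estimating how much of the content store would move. It assumes access times are actually being recorded (they won't be on a filesystem mounted with `noatime`), and the mount point is hypothetical.

```python
import os
import time
from typing import List, Optional

NINETY_DAYS_S = 90 * 24 * 3600  # "oh, 3 months"


def pick_tier(last_access: float, now: float, cutoff_s: float = NINETY_DAYS_S) -> str:
    """'fast' (EVA) if touched within the cutoff, otherwise 'slow' (MSA)."""
    return "fast" if now - last_access <= cutoff_s else "slow"


def files_to_exile(root: str, now: Optional[float] = None) -> List[str]:
    """Walk a content tree and list files a last-access policy would demote."""
    now = time.time() if now is None else now
    demote = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if pick_tier(os.stat(path).st_atime, now) == "slow":
                demote.append(path)
    return demote


if __name__ == "__main__":
    for path in files_to_exile("/srv/bbcontent"):  # hypothetical mount point
        print(path)
```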

Hmmmm......

The problem here is that OES2 isn't out yet. Such a fantastical idea may be doable in the 2008 intersession window, but we may have other upgrades to handle there. But still, it IS an interesting idea.



Dynamic Storage Technology

Novell Connection Magazine has an article up right now that describes DST, formerly known as Shadow Volumes. I've talked about them before, both last year around this time (6/15/07, and 6/26/07) and back at BrainShare (TUT205). So, I've been following this.

As said previously, this'll not work for NetWare, just OES-Linux. From what I understand you can host migration volumes on NetWare, but the server presenting the unified view of the storage has to be OES-linux.

Anyway, on with the article.



Tuesday, July 03, 2007

OES2: pushed several months

A new post up on Cool Blogs shows where OES2 is sitting:

http://www.novell.com/coolblogs/?p=921

To quote from one of the comments by the author:
There will be a public beta. It might take couple of months more for a public beta.
This blows my schedule. From the sounds of it, they're looking at a Christmas or possibly BrainShare 2008 release. We'll have to put NetWare inside ESX server instead of a Xen paravirtualization. Due to this delay, and the presumed SP1 schedule, chances are now much worse for Novell to make the summer intersession 2008 migration window.

Crap.



Thursday, June 28, 2007

Novell Client for Vista, in public beta

Announced in Cool Blogs.

On the Beta Page.

Downloads.

Documentation.

Still no word on when OES2 is coming out. This is somewhat disheartening, as I had heard at BrainShare that the OES2 release would be simultaneous with the Novell Client for Vista release. At this point, it is looking like an August release for OES2, which soooo blows my schedule.



Monday, June 18, 2007

New Novell releases

Looks like Novell pushed several products out the door late Friday:
No OES2. No Client for Vista. But SP1 gets me closer to where I need to be.

NCL 2.0 is interesting since the current version is v1.2. The full rev of the version suggests that they made marked improvements to it. I have noticed that they offer both 32bit and 64 bit versions of the client, which I don't think 1.2 had.



Wednesday, June 13, 2007

Still waiting

Any day now OES2 will come out.

Any day now.

Any day now I'll get a paravirtualizable NetWare and will be able to run it through its paces.

Any day now I'll get to try and figure out how Xen virtualization of NetWare interacts with an HP MSA1500cs.

But not today.



Monday, April 09, 2007

OES2, not until 2008

The revelation about AFP in OES2 (how did I miss that?) is the last nail. OES2 will not be rolled out to the WUF cluster until August/September 2008 at the earliest. We'll be staying on NetWare until then. We have a couple of Mac labs and at least one class track that depends on AFP support. CIFS is not an option for many reasons.

So we will be waiting until Novell catches up. In the meantime our 'utility' servers could possibly move, but there aren't many of them: the other two NDS servers, and the server that ATUS hosts their Ghost images on. We're already running OES on one of the NDS servers. The other two are the SLPDAs for our environment, and also house the DFS databases.



Friday, April 06, 2007

OES2 and AFP

If you're an institution of education like us, chances are real good you have PowerBooks and other Mac hardware desiring access to your NetWare/OES servers. It turns out I missed something while at BrainShare. OES2-Linux does NOT have an eDir integrated AFP stack like NetWare does. Whoa.

Details here: http://www.novell.com/coolblogs/?p=836

That's Jason Williams posting, and he is the Project Manager to OES. I spoke with him for a while during Meet the Experts regarding the concurrency concerns we have with OES in general. He has been on Novell Open Audio several times, so I know his voice. He was run downright ragged during BrainShare, which is very not surprising due to his level of oversight of a major product.

He's asking for people who need AFP to talk to them about it. The details of what he's looking for are in the posting I linked above. I've sent in my own impressions, and I've forwarded it to internal people who are Very Concerned about how Macs interact with our NetWare servers.



Monday, April 02, 2007

Concurrency, again

I performed another test on Friday for concurrency. I had 9 workstations performing an iozone throughput test. Each machine ran 20 threads, each processing against a 15MB file, for a total working set size of 2.7GB, which fits into the server's RAM. The results from the workstations were pretty consistent. The workstations had all of 384MB of RAM in them, and the number of IOZone threads running caused significant page-faulting to occur, which has the side effect of minimizing client-side caching. The workstations were connected to the core by way of 100Mb Ethernet, so maximum theoretical speeds are 12.5MB/s.

Some typical results, in KB/s:

Initial write     11058.47
Rewrite           11457.83
Read               5896.23
Re-read            5844.52
Reverse Read       6395.33
Stride read        5988.33
Random read        6761.84
Mixed workload     8713.86
Random write       7279.35

Write performance is consistently better than read performance. On the tests that benefit greatly from caching, reverse read and stride read, performance was quite acceptable. All nine machines wrote at near flank speed for 100Mb Ethernet, which means the 1Gb link the server was plugged into was doing quite a bit of work during the Initial write stage.
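Putting numbers on "near flank speed": a back-of-the-envelope Python sketch. It assumes iozone's KB is 1024 bytes, treats 100Mb and 1Gb Ethernet as raw line rates (12.5 and 125 MB/s, ignoring protocol overhead), and uses the Initial write figure above as a stand-in for all nine workstations.

```python
# Rough arithmetic for the concurrency test: working set and link utilization.
# Assumptions: iozone KB = 1024 bytes; link speeds are raw line rates with no
# protocol overhead; every workstation wrote at the "Initial write" rate.

WORKSTATIONS = 9
THREADS_PER_WORKSTATION = 20
FILE_MB = 15


def working_set_mb(workstations, threads, file_mb):
    """Total data touched by all threads on all workstations, in MB."""
    return workstations * threads * file_mb


def line_rate_mb_per_s(megabits_per_s):
    """Convert a link speed in megabits/s to (decimal) megabytes/s."""
    return megabits_per_s / 8


def kb_per_s_to_mb_per_s(kb_per_s):
    """Convert iozone KB/s (1024-byte KB) to decimal MB/s."""
    return kb_per_s * 1024 / 1_000_000


initial_write_kb = 11058.47
per_client = kb_per_s_to_mb_per_s(initial_write_kb)
aggregate = per_client * WORKSTATIONS

print(f"working set: {working_set_mb(WORKSTATIONS, THREADS_PER_WORKSTATION, FILE_MB)} MB")
print(f"per client: {per_client:.1f} of {line_rate_mb_per_s(100):.1f} MB/s")
print(f"aggregate: {aggregate:.1f} of {line_rate_mb_per_s(1000):.1f} MB/s "
      f"({aggregate / line_rate_mb_per_s(1000):.0%} of the server's 1Gb link)")
```

Under those assumptions the nine clients together push roughly 100MB/s, a bit over 80% of the server's gigabit link.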

What is perhaps most encouraging is that CPU load on the server itself stayed below saturation. Having spoken with some of the engineers who write this stuff, I'm not surprised. They've spent a lot of effort making sure that incoming requests can be fulfilled from cache rather than going to disk, and going to disk is more expensive in Linux than in NetWare for architectural reasons. Had the working set been 4GB or larger, I strongly suspect CPU load would have been significantly higher. Unfortunately, now that school is back in session I can't 'borrow' that lab, since the tests consume 100% of the resources on the workstations. Students would notice that.

The next step for me is to figure out how large the 'working set' of open files on FacShare is. If it's much bigger than, say, 3.2GB, we're going to need new hardware to make OES work for us. This won't be easy. Most of the open-file data is Outlook archives (.PST files) for Facilities Management. PST files are low-performance critters, so I don't care if they're slow. I do care about things like Access databases, though, so pinning down what my 'active set' actually is will take some work.
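One way to start on that, sketched in Python: feed in a list of open files and their sizes (collecting that list is left out), skip the .PST files that are allowed to be slow, and compare what's left against the roughly 3.2GB the current server can use. The function names and the extension list are hypothetical.

```python
from pathlib import PurePath

# Hypothetical: extensions whose access speed we don't care about.
SLOW_OK_EXTENSIONS = {".pst"}

# Approximate RAM usable for cache on the current 32-bit server.
CACHE_BUDGET_BYTES = int(3.2 * 1024**3)


def active_set_bytes(open_files):
    """Sum the sizes of open files, skipping extensions that may be slow.

    open_files: iterable of (path, size_in_bytes) tuples.
    """
    total = 0
    for path, size in open_files:
        if PurePath(path).suffix.lower() in SLOW_OK_EXTENSIONS:
            continue
        total += size
    return total


def fits_in_cache(open_files, budget=CACHE_BUDGET_BYTES):
    """True if the performance-relevant open files fit in the cache budget."""
    return active_set_bytes(open_files) <= budget
```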

Long story short: With OES2 and 64 bit hardware, I bet I could actually use a machine with 18GB of RAM!



Thursday, March 29, 2007

Why cache is good

One of my post-BrainShare tasks is to re-benchmark OES performance. I did a benchmark series back in September and the results weren't terribly encouraging. I learned at BrainShare that a mid-December NCPSERV patch fixed a lot of performance issues, and that I should rerun my tests. Okay, I can do that.

One test I did underlines the need to tune your cache correctly. Using the same iozone tool I've used in the past, I ran the throughput test with multiple threads. Three tests:

20 threads, each processing its own 100MB file (2GB working set)
40 threads, each processing its own 100MB file (4GB working set)
20 threads, each processing its own 200MB file (4GB working set)

The server I'm playing with is the same one I used in September. It is running OES SP2, patched as of a few days ago, with 4GB of RAM and two 2.8GHz P4 CPUs. The data volume is on the EVA 3000 in a RAID0 partition, since I'm testing OES throughput, not the parity performance of my array. Due to memory reserved for PCI devices, effective memory is 3.2GB. Anyway, the very good table:
Test (KB/s)       20x100MB (2GB)   40x100MB (4GB)   20x200MB (4GB)
Initial write       12727.29193      12282.03964      12348.50116
Rewrite             11469.85657      10892.61572      11036.0224
Read                17299.73822      11653.8652       12590.91534
Re-read             15487.54584      13218.80331      11825.04736
Reverse Read        17340.01892       2226.158993      1603.999649
Stride read         16405.58679       1200.556759      1507.770897
Random read         17039.8241        1671.739376      1749.024651
Mixed workload      10984.80847       6207.907829      6852.934509
Random write         7289.342926      6792.321884      6894.767334
The 2GB dataset fit inside of memory. You can see the performance boost that provides on each of the Read tests. It is especially significant on the tests designed to bust read-ahead optimization such as Reverse Read, Stride Read, and Random Read. The Mixed Workload test showed it as well.

One thing that has me scratching my head is why Stride Read is so horrible with the 4GB data-sets. By my measure about 2.8GB of RAM should be available for caching, so most of the dataset should fit into cache and therefore turn in the fast numbers. Clearly, something else is happening.
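To put numbers on that puzzle, here is a small Python sketch comparing how much of each working set could sit in cache with how far the read-ahead-busting tests fell off. It assumes the ~2.8GB usable-cache estimate above and 1GB = 1024MB, and it uses the 40x100MB column as the 4GB case.

```python
# Quantify the 4GB puzzle: cacheable share of the working set versus the
# measured drop in the read-ahead-busting tests.
# Assumptions: ~2.8GB usable cache (estimate from the text), 1GB = 1024MB.

CACHE_MB = 2.8 * 1024

# KB/s from the table: (20x100MB, 2GB set) versus (40x100MB, 4GB set).
RESULTS = {
    "Reverse Read": (17340.01892, 2226.158993),
    "Stride read": (16405.58679, 1200.556759),
    "Random read": (17039.8241, 1671.739376),
}


def cached_fraction(working_set_mb, cache_mb=CACHE_MB):
    """Upper bound on the share of the working set that can be cache-resident."""
    return min(1.0, cache_mb / working_set_mb)


def slowdown(fast_kb, slow_kb):
    """How many times slower the larger working set ran."""
    return fast_kb / slow_kb


for size_mb in (2048, 4096):
    print(f"{size_mb}MB set: up to {cached_fraction(size_mb):.0%} cacheable")
for test, (fast, slow) in RESULTS.items():
    print(f"{test}: {slowdown(fast, slow):.1f}x slower with 4GB")
```

With up to 70% of a 4GB set cacheable, these tests still ran roughly 8 to 14 times slower. The sketch quantifies the gap; it doesn't explain it.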

Anyway, that is why you want a high cache-hit percentage on your NSS cache. It's also why 64-bit will help if your users work on very large sets of data, and we're getting to the size where it will. It will help even though OES NCP doesn't scale quite as far as we'd like it to. That's the overall question I'm trying to answer here.



Tuesday, March 27, 2007

Just a handy reminder

Novell has slightly changed the patch process for brand-new OES SP2 installs. See TID 3045794.

rug act [activation code] [email address]   # activate your patches
rug sub oes                                 # subscribe to the OES channel
rug ref                                     # refresh the OES channel
rug pin patch-11371                         # install the rug patch, so this doesn't take an age
rcrcd restart                               # restart the Red Carpet daemon, so the patch takes
rug ref                                     # refresh the OES channel
rug pl                                      # make sure you see patches
rug pin --entire-channel oes                # install all the patches in the OES channel

Now you know.



Thursday, March 22, 2007

TUT 202: NetWare cluster migrations to Linux clusters

There is a book on this: "Novell Cluster Services for NetWare and Linux". This session was about OES, not OES2. Again, my notes:
  • On linux, cluster nodes are added through YaST
  • 120 bytes of meta-data per file on NSS
  • iPrint volumes could go on non-NSS volumes
  • ext3 directories are indexed on OES2 but not on OES1, which is a problem for larger directories.
  • Novell Server Consolidation and Migration Tool can migrate NetWare to Linux
  • While running in mixed mode, you cannot extend or create NSS pools. Reboots all around to make this take.
  • In mixed mode, trustee modifications do NOT transfer to the other OS. Migrate your NetWare volumes to OES-Linux, and leave them there!
    • In OES-Linux, trustees are kept in a file, not in the file-system.
  • In mixed mode, cluster load/unload scripts are kept in /etc/opt/novell/ncs/
    • When out of mixed mode, scripts are promoted into edir
  • Cluster licenses are not checked in OES-linux, but still 'count' come audit time. So have them.
  • The 'cluster convert' command ends mixed-mode operation
  • Clustering inside VMWare ESX server: only 2-node Microsoft clusters are supported. All others are not.



OES 210: OES, architectural overview

This sounds basic, but it builds on IO102. Again, my notes:
  • Probable beta in the next few weeks
  • OES2 will not install on SLES10, only on SLES10 sp1
    • This was done for Product Certification reasons, as was the fact that OES is an 'add on' to SLES
  • Most of OES2 is still 32-bit code. Parts with kernel interaction will be 64-bit.
  • Shipping on DVD media, though the OES add-on will be CD.
  • It will use Novell Customer Center for updates
  • http://www.novell.com/products/openenterpriseserver/partners for AV and Backup partners
  • CASA is a new authentication package that stores credentials. It also exists on the client.
  • NLDAP has been ported to openLDAP, in that the openLDAP community has accepted the patches submitted by Novell.
  • The kernel in OES2 will be 2.6.16
  • SMS allows backing up of Xen VMs
  • eDir 8.8 comes with OES2, no word on eDir 8.7
  • pureFTP is edir integrated
  • iManager 2.7 comes with JRE1.5
    • iManager WILL be ported to NetWare, which means OES2-NetWare will also come with JRE1.5
  • Samba new 'passdb' option = NDS_ldapsam
    • Allows use of Universal passwords as a Samba password. Nifty.
  • Tomcat 5 now, separate OES instance from the default SLES10 instance.
  • New migration framework, script based from the looks of it.
  • LAS, light auditing framework, new audit API
    • NSS is instrumented to use it.



Tuesday, March 20, 2007

TUT211: NetWare virtualization

  • Xen 3.0.4+ is the codebase. They wanted 3.0.5, but XenSource didn't get the release out in time for that.
  • Server.EXE contains the bulk of the paravirtualization code.
  • New loader, XNLOADER.SYS replaces NWLOADER.SYS, if used in Xen.
  • New console driver. The old method, writing directly to video memory, won't work in a paravirtualized VM.
  • New PSM: XENMP.PSM. Permits SMP in Xen.
  • So far, no "P2V" equivalent application, though they promise something by OES2-ship.
  • Improved VM management screens.



TUT205: Dynamic Storage Technology

I've gone over this at some length in the past, but as with the previous sessions, here are my notes.

  • New fstype = shadowfs, provides a Linux-level view of a shadow file system. By default, Linux doesn't see the unified view. Useful for some backup apps, or things like web servers.
  • File systems participating in DST need to be mounted on the same OES server. The shadow could be NFS-mounted, and might possibly be NCP-mounted in the future. Not yet.
  • Migration policy can be set by user.
  • Migrations are batched, not done on-demand.
  • Can be used to silently migrate a volume to new hardware
    • Set new volume as Primary, and old as Shadow
    • As users hit data, it gets migrated to Primary from Shadow during nightly migrations.
    • Over time, most of a file-system can be migrated this way.
  • Directory quotas do NOT replicate to the shadow. The shadow quota may be different from the Primary quota, and directory quotas are NOT shadow-aware. This is because directory quotas are a function of the file system, and DST is a function of NCPserv and the client.
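The silent-migration trick above can be modeled as a toy simulation. This is not how DST is implemented; it just assumes, as the notes describe, that files live on the Shadow volume until a user touches them, and that touched files move to the Primary volume in a nightly batch rather than on demand. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ShadowPair:
    """Toy model of a DST primary/shadow pair (not Novell's implementation)."""
    primary: set = field(default_factory=set)
    shadow: set = field(default_factory=set)
    touched_today: set = field(default_factory=set)

    def access(self, name):
        """Users see one merged view; remember shadow hits for tonight's batch."""
        if name in self.shadow:
            self.touched_today.add(name)
        return name in self.primary or name in self.shadow

    def nightly_migration(self):
        """Batch-move files touched since the last run from shadow to primary."""
        moved = self.touched_today & self.shadow
        self.shadow -= moved
        self.primary |= moved
        self.touched_today.clear()
        return moved
```

Run it for enough simulated days and the active files end up on Primary while the untouched ones stay behind on Shadow, which is the "over time, most of a file system migrates" effect.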



TUT212: Novell Storage Services

I'm what you'd call good at NSS. But NSS on OES2 is another critter. This session took us through the updates. From my notes:

  • Three times the NSS source tree has been accidentally deleted by developers. It has been restored from Salvage each time. Go Salvage.
  • When mounting NSS on OES-Linux, mount it with the long namespace. Saves time. I did not catch the fstab option to make this work, though.
  • You can create NSS pools that are not NetWare compatible
  • NSS & LUM
    • NSS user IDs are 128-bit, Unix UIDs are 32-bit. LUM handles the translations.
    • Users need to be LUM-enabled for this to work
    • NCP-Serv can fake it for non-LUM users, but it is slower access.
      • OES1: rights and owners are all set through POSIX
      • OES2: rights and owners are set through extended attributes
    • If Samba, then LUM is required.
      • Trustees are enforced, GUID is ignored.
  • Beasts = inodes!
  • /proc/slabinfo -> lsa_inode_cache = number of inodes/files in cache
  • On NetWare, memory over the 4GB line is treated as a RAMdisk for files over 128K in size.
  • 32-bit vs 64-bit linux & NSS
    • 32-bit linux: 1GB max kernel memory, makes for tricky caching
    • 64-bit linux: All memory can be kernel memory
  • NSS patch in mid-December allowed meta-data caching in user-memory, greatly speeding up meta-data reads on 32-bit systems with large numbers of files.
  • nss /HighMemoryCacheType= [private|linux|none]
    • Sets the use of User memory in 32-bit OES
    • None = Use the same algorithm as OES-FCS, which is to try and cache everything in Kernel-mode memory. Only option on 64-bit linux since it doesn't have to use USER memory at all.
    • linux = integrate caching into the regular linux caches. This can be a problem on dual use file-server/app-server system, as memory hungry applications can cause the file-system cache to purge completely.
    • private = set up a separate user-mode cache in memory outside of the linux cache. Best for dedicated file-servers.
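To watch that inode count, you can pull the lsa_inode_cache line out of /proc/slabinfo. Here is a minimal Python sketch. It assumes the usual slabinfo layout, where the first number after the cache name is the active object count. The lsa_inode_cache entry only exists with NSS loaded, and reading /proc/slabinfo usually requires root.

```python
def slab_active_objects(slabinfo_text, cache_name="lsa_inode_cache"):
    """Return the active_objs count for cache_name from /proc/slabinfo text.

    Returns None if the cache isn't listed (for example, NSS not loaded).
    """
    for line in slabinfo_text.splitlines():
        # Skip the "slabinfo - version" banner and the "# name ..." header.
        if line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        if fields and fields[0] == cache_name:
            return int(fields[1])  # assumed <active_objs> column
    return None
```

On a server you would pass it the contents of /proc/slabinfo, read as root.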



What's new in OES2

A good many things are new in OES2. The high-points:
  • 64-bit support (woo!)
  • iFolder 3.6
  • Dynamic Storage Technology (f.k.a. Shadow Volumes)
  • eDir integrated DHCP/DNS & FTP
  • Major Samba improvements
  • DFS support, including linking to sub-directories
    • Make a link to, for example, DATA3:/shared/, rather than making a new volume just for "shared"
  • NetWare in a VM, with improved VM management
  • Xen 3.0.4+ support
    • They wanted 3.0.5, but XenSource didn't make the cut-off date. So OES2 will have 3.0.4 heavily patched.
Also...
  • Service packs for OES will be synchronized with SLES
  • OES is going to be an add-on product on top of SLES: choose 'add on product' during install and use the OES CDs.
  • The 'Volume Location Database' for DFS is clusterable now
  • iManager 2.7 now has support for managing file-system trustees
  • OES3 will only have support for NetWare inside of a VM. This is a move that was pushed by the hardware vendors, NOT Novell. The hardware vendors have notified Novell that they'll be discontinuing driver support for NetWare after OES2.
The new Novell Client will be released around the same time as OES2. This will be 4.91 SP4:
  • It has 802.1 support
  • New client for SLED10
  • No DLU for Vista; that will come from Zen



Monday, March 19, 2007

TUT212: Novell Storage Services

Not a new topic, but it contained the updates to NSS that'll be there in OES2.

By far the biggest thing is a 64-bit version of OES. Big big big. How big? Very big.

Remember those benchmarks I ran? The ones that compare the ability of OES to keep up with NetWare? And how I learned that on OES NCP operations are CPU bound w-a-y more than on NetWare? That may be going away on 64-bit platforms.

You see, 64-bit linux allows the Kernel to have all addressable memory as kernel memory. 32-bit linux was limited to the bottom 1GB of RAM. If NSS is allowed to store all of its cache in kernel memory, it'll behave exactly like 32-bit NetWare has done since NSS was introduced with NetWare 5.0. I have very high hopes that 64-bit OES will solve the performance problems I've had with OES.



Monday keynote

Ron Hovsepian is a better speaker than Messman was, and was much better at hiding the fact that he was using a teleprompter. All in all the session wasn't terribly informative, but then the Monday session is generally aimed at press releases rather than gee-whiz. That comes Friday.

That said, there was some good stuff in this session:
  • OES2 public beta will be 'soon'. It will not be released at BrainShare
  • AD / eDir federation will be in OES2
  • SLES10 SP1 is out
  • A new certification: Novell Certified Engineer (NCE), a migration of the old Certified Novell Engineer (CNE) to the new Linux regime. (I have to look in to that)
  • Virtualization managers are coming soon. Possibly in Zen for Linux 7.2, releasing "after Q2".
  • NetWare 6.5 SP7 will be part of OES2
Oh, and there was a Microsoft guy on stage. Whoa. I'll post that picture later.

Update: The picture.



Tuesday, March 13, 2007

Novell open audio: Dynamic Storage Technology

The last NOA before BrainShare: Dynamic Storage Technology

This is the official name for what has been known as Shadow Volumes. I've spoken about them many times in the past. I first heard about Shadow Volumes (now Dynamic Storage Technology) at BrainShare last year, but I didn't blog about it. Since then, there have been a few more posts.

June 15, 2006
June 26, 2006
September 13, 2006
November 30, 2006

Yeah, this is exciting stuff. The podcast had more details, here are my notes:

Jason Williams -- Product Manager for OES
  • OES2 will include Dynamic Storage Technology
  • "We reckon about 80% of that stuff [on very large NCP volumes] could be turned to stale. It's stuff that hasn't been touched in maybe 30 days or more"
  • "We have one customer out there with maybe 450 plus terabytes of data, and that's just the unstructured stuff. It doesn't even account for their databases."
  • Redirection to the shadow volume is done similar to DFS, with a pointer the client understands and then follows.
    • This avoids the migrate/demigrate problem for traditional HSM
  • This is linux-only. Not NetWare.
  • Works for NCP-clients right now, trying to get Samba working... not done yet.
  • Can set policies for what to migrate: ModifyDate, AccessDate, FileType, etc.
  • Managed through NRM
  • Can do stacked policies, a global policy, and policies for specific volumes
  • Applies to not just NSS, but to ext3, reiser, xfs, and such.
  • Requires an exclusive lock on a file before it can be migrated.
  • This is a service on top of a file-system, not a feature of a file-system.
  • Monday morning keynote demo! Right there!
  • There will be a table in the Technology Lab
For more details, listen to the pod-cast.



Thursday, November 30, 2006

OES2

Novell Open Audio had a podcast last week about Open Enterprise Server 2 (it's official, that's the name). It was quite long, and full of nice information. Probably the best bit was about Shadow Volumes, which I've mentioned before. That just keeps getting better and better! I highly recommend listening to the pod-cast.

I've known for a while that it allows policy based migration of data to older/slower/cheaper media. Unlike traditional HSM technologies, Shadow Volumes are based on the last-modified date rather than the last-accessed date. Also, policies can include file types as well, so you can migrate your large multi-media files to media that handles long contiguous reads better. Or just migrate files larger than 50MB to that faster media. Whatever.
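Policies like these boil down to predicates over file metadata. Here is a hypothetical Python sketch of two of the rules mentioned: stale by last-modified date, and big by size. The 30-day default echoes the podcast quote, and the 50MB threshold comes from the paragraph above. None of this is DST's actual policy syntax or engine.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # seconds


@dataclass
class FileInfo:
    path: str
    size: int     # bytes
    mtime: float  # last-modified time, seconds since the epoch


def stale_files(files, now, max_age_days=30, extensions=None):
    """Paths not modified within max_age_days, optionally limited to some types."""
    cutoff = now - max_age_days * DAY
    wanted = tuple(e.lower() for e in extensions) if extensions else None
    return [f.path for f in files
            if f.mtime < cutoff
            and (wanted is None or f.path.lower().endswith(wanted))]


def large_files(files, min_bytes=50 * 1024**2):
    """Paths of files bigger than min_bytes (the 50MB example above)."""
    return [f.path for f in files if f.size > min_bytes]
```

Keying on mtime rather than access time is the difference from traditional HSM described above: reading a file doesn't reset its age.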

One scenario mentioned in the pod-cast was migrating extremely large amounts of data. One Novell cluster mentioned in the pod-cast had 420TB of data in it. Ooo! Migrating THAT to a new SAN would take weeks. Here's how it works:
  1. Set up the new server
  2. Configure the volume on the new server to use the old SAN as the migrate (i.e. slow) media
  3. Do the server migration itself
  4. As users modify data on the old SAN, it gets tagged for migration to the new SAN. In a week/month/whatever most of the active data is on the new SAN, and the older SAN gets less and less data.
  5. When the time comes to decommission the old SAN (assuming that's what you want to do), the total data migration is a lot easier.
Freaking cool.

Unfortunately, this is an OES2-Linux product only. It can use NetWare volumes as migration targets, but NetWare won't do the policy based decisions. Darn.

Also mentioned is that OES2 will include SP7 for NetWare 6.5, which will introduce Xen paravirtualization to NetWare. IMHO, this is spiffy if your hardware vendor has stopped significant NetWare support (*cough*dell*cough*) and you still need to use it. For us, we'll probably stick with 'bare metal' installs for the time being, at least until we get proof that a Xen-virtualized NetWare instance on a 64-bit server runs faster than the same NetWare running bare-metal on a 64-bit server (in 32-bit emulation mode).

It also sounds like they've spent serious time making NSS and NCP faster. This is very much needed, as I showed earlier. As file I/O is much more CPU-bound on Linux than on NetWare, any improvements they can make will be appreciated.

Also, they hope (but are not promising) to give out a public beta of OES2 to all BrainShare attendees. I predict another round of benchmarking come early April.

OES2 is currently slated for release in the late-May/early-June timeframe. This is nifty, as that's the start of summer for us. Though, we're not migrating right away unless we are blown away by the differences.



Friday, November 03, 2006

An interesting request

The other day I got asked a question that I hadn't considered before.

"Can you set up rights so that a certain group of student workers can only access this directory when they're at work, and not anywhere else?"

Clearly they don't trust the students all that much, but the question is still interesting. How to do that? We've spent a lot of effort promoting 'mobility' in terms of getting at your files from anywhere. Novell NetStorage is designed for exactly this. Yet how do you restrict access to a specific directory to:

(userMemberOf specificgroup.groups.wwu) AND (workstationMemberOf othergroup.wsgroups.wwu)
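Written as code, the rule is just an AND of two group-membership tests. This is a hypothetical Python sketch: the Session type and the function are made up for illustration, and nothing in NetWare trustee assignments evaluates a rule like this, which is exactly the problem. It does show why NetStorage and SFTP sessions would fail the test: the connecting address is the server rather than a departmental workstation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical view of a login: who the user is and where they sit."""
    user_groups: frozenset         # groups the user is a member of
    workstation_groups: frozenset  # groups the connecting workstation is in


def may_access(session,
               user_group="specificgroup.groups.wwu",
               ws_group="othergroup.wsgroups.wwu"):
    """Grant access only to group members sitting at an approved workstation."""
    return (user_group in session.user_groups
            and ws_group in session.workstation_groups)
```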

In NetStorage perhaps the easiest way is to make sure the drive that directory is in is not contained in any login-scripts the user has. That means it won't show up in NetStorage. On the other hand, if they come in on SFTP the files are still there for the taking. The problem with this is that the volume in question is in their login script already.

Another way to handle that is to create a second set of accounts for the students. These accounts would be workstation-login-restricted to just the workstations the department designates. Because of this, they won't be able to use those logins for NetStorage or SFTP, as the servers (what shows up in the 'Network Address' field when you log in via NetStorage or SFTP) aren't in the approved list. The problem with this is that we have a strong 'one account' policy; even super-users like myself don't have a second low-priv account for routine use.

The crux of it is that this is the first time I've been asked to build location-awareness into an ACL. I wonder how other companies are handling this?



Wednesday, October 11, 2006

NSS read-ahead

One of the tuning items that has come up as I've been doing all of this benchmarking is NSS Read Ahead. This can be configured by two command-line parameters:

nss /AllocAheadBlks=[vol]:[count]
nss /ReadAheadBlks=[vol]:[count]

AllocAhead allocates blocks ahead on writes, while ReadAhead is just that: blocks read ahead of the current read. Both behaviors make access to the underlying I/O subsystem more efficient and improve read performance.

As of NetWare 6.5, the default ReadAhead is 2 blocks (8KB), and the default AllocAhead is 15 blocks (60KB).
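Since these parameters take block counts rather than sizes, here is a tiny Python helper for the conversion, assuming the 4KB NSS block size that the defaults above imply. The command string follows the syntax shown earlier; the volume name in the example is just an example.

```python
NSS_BLOCK_KB = 4  # NSS block size (2 blocks = 8KB, 15 blocks = 60KB)


def blocks_to_kb(blocks, block_kb=NSS_BLOCK_KB):
    """Size covered by a block count, in KB."""
    return blocks * block_kb


def blocks_for_kb(kb, block_kb=NSS_BLOCK_KB):
    """Smallest block count covering kb kilobytes (ceiling division)."""
    return -(-kb // block_kb)


def readahead_command(volume, blocks):
    """Build the console command in the nss /ReadAheadBlks=[vol]:[count] form."""
    return f"nss /ReadAheadBlks={volume}:{blocks}"
```

For example, matching a 128KB RAID stripe works out to blocks_for_kb(128), or 32 blocks, and readahead_command("STU1", 32) gives "nss /ReadAheadBlks=STU1:32".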

So what are the recommended settings for these? The manual has this to say:
The most efficient value for block count depends on your hardware. In general, we recommend a block count of 8 to 16 blocks for large data reads; 2 blocks for CDs, 8 blocks for DVDs, and 2 blocks for ZLSS.
ZLSS is, I believe, a standard volume.

That raises the question of what the real optimal setting is, based on what you can find out about your storage systems. I don't know, but I do have some suggestive ideas. If I have time, I'll see what I can do about testing them.

The gold standard is having very good data on how I/O is performed on your volume. For a volume consisting of mostly databases, such as Access files, the read-ahead should be set to a value close to the average record-read size. For a plain ole home-directory volume file size is probably the better determiner of 'best'.

Running some stats on the STU1 volume, I've found the following:

RAID Stripe size: 128KB
NSS Block size: 4KB
Median Size (by data volume): 8192KB
Average Size: 293KB
File-count Median Size: 16KB
  • 50% of the files on STU1 are 16K or smaller
  • 50% of the files on STU1 are responsible for 0.55% of the total space used on STU1
  • 90% of the files on STU1 are 256K or smaller, which represents 7.9% of the total space used on STU1
  • 10% of the files on STU1 are responsible for over 90% of the data on STU1
Based on this, a ReadAhead value of "4" is probably in order. This represents a file size of 16K, which 50% of the files on the volume exceed. A ReadAhead value of 32 (128K) would match the RAID stripe size and would very likely enhance, possibly greatly, reads of those files that exceed 128K in length.
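Figures like these can be computed from a plain list of file sizes. Here is a Python sketch of how. I'm assuming 'Median Size' means the data-weighted median (half of all bytes live in files at least that big), which is what the numbers suggest. The tool that gathers the sizes is left out.

```python
from statistics import median


def size_stats(sizes):
    """Summary stats for a non-empty list of file sizes in bytes."""
    sizes = sorted(sizes)
    total = sum(sizes)
    # Data-weighted median: smallest size at which files of that size or
    # smaller hold at least half of all bytes.
    running = 0
    data_median = sizes[-1]
    for s in sizes:
        running += s
        if running * 2 >= total:
            data_median = s
            break
    return {
        "count_median": median(sizes),
        "data_median": data_median,
        "average": total / len(sizes),
    }


def share_at_or_below(sizes, limit):
    """(fraction of files, fraction of bytes) held by files <= limit bytes."""
    small = [s for s in sizes if s <= limit]
    return len(small) / len(sizes), sum(small) / sum(sizes)
```

A large gap between the count median and the data-weighted median is the "lots of small files, most of the bytes in a few big ones" shape both volumes show.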

The GIS volume is another story.

Median Size (by data volume): 200MB
Average Size: 11.5MB
File-count Median Size: 8KB
  • Total files on the volume is vastly smaller than on STU1
  • 43% of the data on the GIS volume are in files larger than 256MB
  • The largest file-type is TIF, which is an uncompressed graphics format that is read as a whole, not as sub-records
  • Files under 64MB in size represent 93% of the files, but only 7.7% of the data. Compare that with 99.97% and 89.3% respectively on STU1
In this case a much higher ReadAhead setting is called for. Novell's guidance of "16" makes sense here, since that is half the stripe size and most of the reads on the volume will probably take advantage of it.



Tuesday, September 26, 2006

OES2 release pushed beyond BrainShare

To quote:
Please note that Open Enterprise Server services currently run on SUSE Linux Enterprise Server 9. New purchases of Open Enterprise Server will not include SUSE Linux Enterprise Server 10 until it officially becomes part of Open Enterprise Server in the next release, scheduled for mid-2007.
Hmm. This tells me that what we'd be seeing at BrainShare '07 will be beta builds of OES2. March is not 'mid-2007'.

This further raises the question of what the Big Thing will be at BS-07. Last year it was SUSE 10. All. Over. The. Place. OES2 will be big for me, but I'm not convinced that Novell will give the next OES the same push it did for SLES 10. I'm a bit irked that they seem to be minimizing the file and print serving that made the company, but that's just business; file servers don't turn a profit anymore. On the other hand, I may be wrong.

The flag-ship products are SLES, GroupWise, Zen, and Identity Manager. IM is a big consulting driver, and still a hot technology, so that'll still get a big focus. Zen7 SP1 is recently enough out the door that SP2 or even a version 8 is probably not going to happen by BrainShare time. GroupWise 7 has been out a while now, but I haven't heard any mumblings about a v8 for that product.

On the other hand, openSUSE 10.2 is in Alpha right now. According to the roadmap, 10.2 will release December 7th. What this means for SLES is unclear to me, but it could mean that beta builds of SLES 10.2 may be available at BrainShare. You can find a list of changes from 10.1 to 10.2 (for openSUSE, this isn't for SLES) here. The changes aren't terribly significant, just improvements to the X Window environment (both GNOME and KDE) and related applications.

So no, I can't yet tell what the Big All Consuming Message will be. Eh. Time will tell.



Wednesday, September 20, 2006

Results: wild speculation

The question on my mind is why NCP serving on OES-Linux is so much more resource-intensive than on OES-NetWare. The answers are not immediately clear, and I lack the developer tools to find out. So I'm left with wild speculation, which I'll indulge in.

I strongly suspect a contributing factor is where the code executes. In NetWare everything runs in Ring 0 (kernel-land) unless exiled to a Protected Memory Space, where it executes in Ring 3 (user-land). My CNE classes said that stuff running in a protected memory space typically runs 3-5% slower than in the OS memory space on NetWare. On Linux, at least on 32-bit 2.6 kernels, memory accessible from Ring 0 is limited to the first 1GB of RAM, and most processes are supposed to run in Ring 3. This is the architecture that lets things like "kill -9 [pid]" work on Linux, where the equivalent would abend a NetWare server.

There was a very handy slide at BrainShare 2006 that showed the differences in the NCP/NSS architecture in NetWare and Linux. The session was IO104: File System Roadmap by Richard Jones. Because you can purchase your very own BrainShare DVD, I'm going to assume that any NDAs on this information have lapsed. You'll want to open these links in different tabs, I'll be referring to the contents of them.

IO104 Slide 40: Linux and NetWare Architectures

The NetWare architecture is very familiar. I've been looking at that chart for years. The thing to note is that the NSS and NCP bits are right next to each other in kernel-land, so they run well together with little interference.

IO104 Slide 41: NSS on Linux in OES

This is how NSS and NCP are crammed into Linux. The 'up call' box is how communication between kernel-land and user-land are performed. Every piece of I/O that comes in on an NSS volume over any file protocol, NCP, Samba, NFS, or AFP, has to pass the user/kernel interface. If you look at slide 40 you can see that this is true for all file-systems on Linux.

The side information on slide 41 hints at a major problem when OES-Linux first shipped. At that time the file-cache was being kept in kernel-land like it is in NetWare. This gave some screaming numbers. Unfortunately Linux is limited to 1GB of RAM in kernel-land, and that has to be shared with everything in kernel-land. So it screamed... so long as you had very small file systems. Ahem. SP1 changed that so NSS could use Linux's native caching mechanism. It dropped the speed a bit, but it could again handle large file-systems.

Since every I/O request on a file-system has to pass the computing equivalent of the blood/brain barrier, this introduces certain lags. The true impact of this is unknown to me, as my linux-fu is too weak to know where to stick the probes to get an idea as to where all that CPU is going. Watching the split of load types I clearly saw that the CPU spent very little time in IOWAIT, and split roughly evenly between USER and SYSTEM. The NCP server was doing something, but NSS (all that SYSTEM time) clearly was quite busy as well. Due to how file-servers are handled on Linux if I had run this against Samba the busy process would have been SMBD, since CPU for file-system work is 'charged' against the calling process.
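For anyone wanting to reproduce that USER/SYSTEM/IOWAIT split without a graphing tool, here is a minimal Python sketch. It diffs two snapshots of the aggregate cpu line in /proc/stat, with the field order documented in the proc(5) man page. Older kernels report fewer fields, and the parser just uses whatever is present. It shows where CPU time went, not which code spent it.

```python
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq"]


def cpu_times(stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of jiffies."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate cpu line found")


def cpu_split(before, after):
    """Percentage of elapsed CPU time per category between two snapshots."""
    a, b = cpu_times(before), cpu_times(after)
    delta = {k: b[k] - a[k] for k in a}
    total = sum(delta.values())
    return {k: 100.0 * v / total for k, v in delta.items()} if total else {}
```

In practice you would read /proc/stat, wait a few seconds under load, read it again, and pass both snapshots to cpu_split.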

Then there is the possibility of just not having fully optimized code. I've heard that NSS as a linux file system runs 'only' 12% slower than reiser (when called locally on the Linux server, and not over a file-serving protocol), which says that NSS is pretty butch as it is. Scale is the key question, though.

The same File System Futures presentation had a few slides about where NSS is likely to go in future revisions of OES and SLES, where 'future' is likely the version after the one coming out Real Soon Now, and it looks quite promising. The block diagram for how the NetWare services shim into Linux is much cleaner. The plan, as of March, was to shim in a 'NetWare Modular Features' layer between the file systems and the Virtual File System (VFS) layer. The advantage of this would be, at a minimum, NetWare-style trustees on reiser, JFS, UFS, etc.

Once the next version of OES ships I'll see if I can get the hardware to re-run the dir-create and file-create tests. Even doing a single workstation should tell me what improvements, if any, were put into OES when it comes to scalability.



Thursday, September 14, 2006

Backups for OES

One of the things that has prevented us from seriously considering a move to OES-Linux has been the backup problem. Apparently there has been some movement on that issue. At BrainShare this year SyncSort was quite prominent in pointing out that they had full support for backing up NSS volumes on Linux.

Today over at Cool Blogs, Richard Jones posted about the progress of this technology in the industry. The short version is that Novell implemented SMS on Linux, and vendors that already had a solid Linux client had to completely rewrite it. That would explain why it has taken almost two years for the big storage players to come out with a supported product. Novell has taken steps to support the really big storage players in UnixLand (IBM, et al.) in their clients, using extended attributes (xattrs).

Turns out that xattr thing was slipped into a patch on the 11th of August. I wonder if that's the same package that had shadow volumes included?



Monday, August 28, 2006

Keeping up

Take a look at Hera's loading:



The break in the chart at the beginning of 'week 34' is the point where I took Hera down to reformat it. The big spikes afterwards are all the Things I had to do to it last week.

The other thing to keep in mind is that the line before the break is formatted differently than after the break. Before, the green line was the point-in-time utilization of CPU0, and the blue line was the point-in-time utilization of CPU1. After the break, the green line is the 1-minute averaged load, and the blue line the 5-minute averaged load. Not exactly apples to apples. But generally speaking, if you add the blue and green lines together before the break you'll get an equivalent 'after the break' line.

The bit at the beginning of the week where the blue line falls to zero and the green line gets really small is the time after I removed the replicas from Hera and before I turned it off. It was still getting non-trivial LDAP traffic at the time, but it was forwarding off to the other two eDir servers instead of serving results itself. Interesting.

I've already noticed that using iManager on that server will spike CPU quite noticeably. When I leave things be, load is about where I'd expect. Regular processing appears to put about the same total load on the server as before the reformat, possibly a bit less. What will be interesting is what the chart looks like once school starts up again.



Thursday, August 24, 2006

Plodding on

Things are more stable this morning, but we did have some issues. First, and most worrying, two of the replicas on Hera were not in a good state. One, happily a small one, never left the "new" state. The other just plain wasn't synching completely.

First, the replica that never left 'new'. I haven't seen that one before, so it took a LOT of digging until I found the fix for it. All the dstracing showed that attempts to sync that particular replica were throwing a -673 error (FFFFFD5F, replica not on). What ultimately fixed it was doing a "network address repair" on the other two main eDir servers. That seemed to kick clear whatever blockage had built up.
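The -673 and the FFFFFD5F in that trace are the same number: eDirectory error codes are negative 32-bit integers, and some tools print their two's-complement hex form instead. A small converter (a convenience sketch, not part of any Novell tool) makes it easier to match trace output against the error-code documentation:

```python
def edir_error_to_hex(code: int) -> str:
    """Render a negative eDirectory error code as its 32-bit hex form."""
    return format(code & 0xFFFFFFFF, "08X")

def hex_to_edir_error(text: str) -> int:
    """Convert a 32-bit hex error (as seen in a trace) back to the signed code."""
    value = int(text, 16)
    # Values with the top bit set are negative in two's complement.
    return value - (1 << 32) if value & 0x80000000 else value

print(edir_error_to_hex(-673))        # FFFFFD5F
print(hex_to_edir_error("FFFFFD5F"))  # -673
```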

The second one was easier. I just removed the replica from Hera while I worked the other problem. I put it back when the other replica was working fine. In the process I noticed that some of the servers in that replica (but not in the ring) were showing 'unlocatable' errors in the network address rebuild process. Not critical. But once the replica was back on, it showed no signs of going the way it did at first.

As a side effect, I also identified a handful of servers that weren't correctly advertising their presence in SLP. In every case the SLP discovery options were set to 2, or DHCP-only. In that state it'll ignore the slp.cfg completely. Changing it to 4 suddenly caused these servers to find the DAs and report their services, and thus permitted their network addresses to be repaired.
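The discovery setting behaves like a bitmask, which is why 2 and 4 act so differently. The sketch below decodes it. Only the DHCP (2) and static slp.cfg (4) bits are confirmed by what happened here; the meanings given for 1 and 8 are assumptions to check against the NetWare SET parameter documentation before relying on them.

```python
# Bit meanings for the SLP DA discovery options value. Bits 2 and 4 match the
# behavior described above; the names for bits 1 and 8 are assumptions.
DISCOVERY_BITS = {
    1: "multicast DA advertisements (assumed)",
    2: "DHCP",
    4: "static slp.cfg file",
    8: "scopes required (assumed)",
}

def decode_discovery_options(value: int) -> list[str]:
    """List the discovery methods enabled in the bitmask, lowest bit first."""
    return [name for bit, name in sorted(DISCOVERY_BITS.items()) if value & bit]

def reads_slp_cfg(value: int) -> bool:
    """True if the static slp.cfg bit is set; otherwise the file is ignored."""
    return bool(value & 4)

for setting in (2, 4):
    print(setting, decode_discovery_options(setting), reads_slp_cfg(setting))
```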

SLP on this server in general is a bit confusing. I'm not sure what services an OES-Linux server is supposed to advertise, so I'm not sure if SLP is completely healthy.

I also managed to get LUM set up right. With that in place, My Fellow Admins can log in to the server without me having to create accounts! I'm so proud.

In terms of server health things are in very good shape. CPU usage is still a bit worrying, but now that I have a day's worth of data to look at it appears to be about the same as it was before. On the other hand, the "outstanding requests" count in the iMonitor agent health-check consistently shows lower numbers. Like 3-5 instead of the 7-9 it was before. Peanuts, but progress.

And this morning we heard that the first parts of the router replacement have started. A couple of buildings were moved to the new cloud around 6am today. No screaming so far.



Wednesday, August 23, 2006

More thumping

Today's tasks were to get our monitoring of that server set up correctly, and to fix niggling things. The monitoring was actually pretty fast. I had done my homework on that one, and getting the new configs in was pretty trouble free.

One of the bigger niggling things was LDAP. This is more of a left-over of the meltdown we had this May. The servers that drive the single-signon for most of campus objected to the certificate that Hera was presenting. That got routed around, but it got a couple of us into the thick of things. Getting nldap on Linux to present a new certificate isn't quite as simple as it is on NetWare. It isn't easy either place, but it took more... whacking to get the change to truly take on Linux.

Also, all of the replicas have been put back onto Hera. And weirdly, our total CPU usage is higher than it was before the change. WHA? The inverse of that was the goal of the whole change in the first place. We'll see how things go once we get some normal usage under our belts. All the poking I'm doing on the server is taking cycles, and importing whole replicas is CPU intensive as it is.

Happily, I created an 'installation server' local to that machine so I don't have to feed CDs if I need to install something or other. I haven't decided if I'll make this network-accessible or not, as we have exactly zero further OES-Linux servers in the pipeline. But still. It'll save time.



State of the migration

I ran into a few hitches yesterday, that I hinted at. The first thing I ran across is that I don't understand how OpenSSL and NovellPKI work together. I got asked during the install to create a Certificate Authority. I got side-tracked in the mind-set of, 'there is only one CA per tree, and this isn't it', and didn't create one. This got me later when it didn't export some key SSL file and apache2 wasn't able to load.

So I removed edir from the server and tried the install again.

Where I came upon my second problem. Specifically, ndsconfig does not remove eDirectory nearly as well as NWCONFIG does on NetWare. There were objects scattered hither and thither that prevented a successful reinstall of edir on the same server name. Objects like LDAP Server objects, and SAS objects. To get edir reinstalled successfully I had to manually delete all the extra objects.

This is a problem I ran into during testing, I just forgot I ran into it before I headed over to the other data center. Oops.

The third problem was the post-SP updates. There have been a LOT of patches since SP2 was released in January: 1.7GB worth. Good thing I work for an educational institution with fat pipes and was performing the update during an intersession when traffic is very light. Aye. THAT didn't give me any grief at all, happily.

Since SP2 came out, it looks like we've been averaging something like 2.3 patches per day inclusive of weekends. That's a lot. That's more than Microsoft in the bad old days before they came up with the Patch Tuesday concept. So once again I blow the dust off of procedures I used back then:
  1. Identify the patch.
  2. Assess if we have the package that is being patched.
  3. Determine if the behavior addressed by this patch is one we'll ever run in to.
  4. Based on 3, decide if this is a Patch Now, or Patch Normally patch.
Happily for me, normal users will never ever have file system access on this server so something like 85% of the security patches fix things that I'll only worry about once the server has already been broken into. Therefore, most of the patches coming down the pike can wait for normal patch management days.
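The four-step triage above can be sketched as a small decision function. The `Patch` fields are hypothetical, and the rule that a security fix needing local access can wait is a simplification of step 3, assuming (as on this server) that normal users never get file-system access. It is an illustration, not a real patch tool.

```python
from dataclasses import dataclass

@dataclass
class Patch:
    # Hypothetical description of a vendor patch.
    name: str
    package: str
    security: bool
    needs_local_access: bool  # exploiting the flaw requires a local login or file access

def triage(patch: Patch, installed: set[str], users_have_local_access: bool) -> str:
    """Steps 2-4: is the package here, can the behavior be reached, how urgent is it?"""
    if patch.package not in installed:
        return "Not applicable"
    # A flaw that needs local access can't be reached if normal users never get it.
    reachable = users_have_local_access or not patch.needs_local_access
    if patch.security and reachable:
        return "Patch Now"
    return "Patch Normally"

installed = {"openssl", "apache2"}
for p in (
    Patch("samba fix", "samba", security=True, needs_local_access=False),
    Patch("openssl local fix", "openssl", security=True, needs_local_access=True),
    Patch("apache2 remote fix", "apache2", security=True, needs_local_access=False),
):
    print(p.name, "->", triage(p, installed, users_have_local_access=False))
```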

The other thing I forgot was the cardinal rule of doing ANYTHING with Linux:
Thou shalt have internet access and a browser. Yea, verily, yea.
I didn't. I assumed, wrongly, that the Windows servers next to my patient would be usable. Those four are running headless. Oops. Ah well.



Thursday, June 15, 2006

HSM on NetWare

Richard Jones over at CoolBlogs recently posted a piece about Hierarchical Storage Management and the OES product line. NetWare has had HSM support since the 1980s, so there is nothing new there. But Linux is another story, and that's still in the pipeline. The article is a good read.

Go read it.



Tuesday, March 21, 2006

Novell client for Vista

One of the things that I found out today was Novell's plans for a client for Vista. This was one of the prime questions I was sent to BrainShare to answer. And the answer is...
Novell will release a preview client for Vista 60-90 days after Vista releases. There will also be a Vista64 client. But there will never be an XP-64 client due to XP-64 missing key bits of the network stack.
So there you have it. As for how long it'll take until there is a release-quality client, that remains to be seen. But it took Novell quite a while to get an XP-compatible client. Here is hoping that it doesn't take that long with Vista. But rigorous testing by all of us and reporting defects back to Novell will help that process along.

[update 2/5/2007: the 'technology preview' client is out. See here]

[update 6/28/2007: the Public Beta client is out. See the beta page]

[update 7/27/2007: Jason Williams says that it should be out in mid-August]

[update 8/20/2007: The 1.0 client is out. Get it here.]



Thursday, January 26, 2006

Benchmark results summary

These eight articles were written as part of a benchmark I ran. The goal was to check out two separate variables: NetWare vs Linux, and NCP vs CIFS. The server hardware used in this test was identical for both platforms.
Server Hardware:
HP ProLiant BL20, G3
2x 3.2GHz CPU
2GB RAM
2x 72GB U320 HD, RAID1
Hyperthreading off
100Mb Ethernet port

Client Info:
3.00GHz CPU
1GB RAM
Novell Client 4.91.0.20050216
100Mb Ethernet port, different subnet from server
WinXP,SP2 fully patched

Switched ethernet between Server and Client

NetWare Config
NetWare 6.5 SP4a (a.k.a. OES-NW SP1)
No post-SP4a patches
No changed NSS settings
No Proliant RomPaq applied (i.e. Novell supplied drivers, not HP-supplied)
10GB NSS volume
Purge-Immediate flagged in test directory

OES-Linux Config
OES-Linux SP1
Novell Samba
No post-SP1 patches (risky, I know, but the best apples-to-apples option, since SP2 was already on the Red Carpet servers)
10GB NSS Volume
Purge-Immediate flagged in test directory
The performance tests were performed with IOzone over the network. As you would expect, certain tests were constrained by network performance, but the data was rich enough to draw conclusions across all file sizes.
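The file sizes (64K to 512M) and record sizes (4K to 16M) that come up throughout this series form a power-of-two grid. The sketch below just enumerates that grid. The IOzone options actually used aren't recorded here, but note that IOzone's automatic mode (`-a`) normally skips small record sizes on very large files unless `-z` is added, which may be why some cells have no data.

```python
KB = 1024
MB = 1024 * KB

def powers_of_two(low: int, high: int) -> list[int]:
    """All sizes from low to high inclusive, doubling each step."""
    sizes = []
    size = low
    while size <= high:
        sizes.append(size)
        size *= 2
    return sizes

FILE_SIZES = powers_of_two(64 * KB, 512 * MB)
RECORD_SIZES = powers_of_two(4 * KB, 16 * MB)

# Every (file size, record size) cell where one record fits inside the file.
cells = [(f, r) for f in FILE_SIZES for r in RECORD_SIZES if r <= f]

print(len(FILE_SIZES), len(RECORD_SIZES), len(cells))
```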

These tests were done such that only my I/O was being handled by the servers. I don't have the resources to check out how the two platforms and protocols handle high levels of contention. That'll have to be handled by people other than me.

Part 1: Caching
Part 2: CIFS
Part 3: NCP
Part 4: Comparing Cache, NCP-on-Linux vs CIFS-on-NetWare
Part 5: Comparing Uncached, NCP-on-NetWare vs CIFS-on-Linux
Part 6: Conclusions so far
Part 7: Uncached NCP
Part 8: NCP vs CIFS on Linux

The Bottom Line
NCP-on-Linux is the best bet. This is a surprising result, but it goes to show that Novell has done a good job of porting NCP to the Linux platform. I did not expect to find NetWare coming in second to Linux at file-serving over Novell's own 20-year-old file-serving protocol. The improvement for running NCP clients against a Linux server was not jaw-dropping, only single-digit percentages, but the fact that it is better at all says something right there.

And as a bonus, the data I drew it all from!



Benchmark results part 8: NCP vs CIFS on Linux

Summary

Since I now have data runs for both protocols that do not include client-side caching, this comparison should be a lot easier. So far we have learned that NCP overall is better than CIFS for the kinds of file access our users do most. I expect this to show here as well. Earlier tests showed that NCP-on-Linux (cached) is better than CIFS-on-NetWare (cached), and NCP-on-Linux (uncached) is better than NCP-on-NetWare (uncached). Since I've already shown that NCP-on-NetWare is better than CIFS-on-Linux, and NCP-on-Linux is better than NCP-on-NetWare, it is a foregone conclusion that CIFS-on-Linux will be worse than NCP-on-Linux.

But by how much? Same OS back end for the two, so let's go see!

Write Tests

The Write test turned in an overall performance increase of 17% for NCP versus CIFS. Like the previous NCP vs CIFS comparisons, the differences in performance are very visible on the record-size scale. The 4K record size shows a performance increase of 97%, 8K 95%, 16K 59%, 32K 44%, and 64K 13%. Above 64K, CIFS starts performing better, and each larger record size gets a little worse for NCP, bottoming out at 16M with a performance hit of 13%. The file sizes show a similar but flatter curve, with the crossover between NCP and CIFS occurring between the 16M and 32M file sizes. The 64K files perform 75% faster, and the 512M files perform 6% slower.
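Each percentage in this series is a relative throughput: NCP's result divided by CIFS's for the same file-size/record-size cell, minus one. A small sketch, assuming both runs report throughput in the same units, shows that calculation and one way to locate where NCP's lead turns into a deficit, using the record-size figures quoted above:

```python
from typing import Optional

def percent_gain(ncp: float, cifs: float) -> float:
    """Percent by which NCP throughput beats CIFS; negative means CIFS was faster."""
    return (ncp / cifs - 1.0) * 100.0

def crossover(gains: list[tuple[str, float]]) -> Optional[tuple[str, str]]:
    """First adjacent pair of sizes where NCP's gain goes from >= 0 to < 0."""
    for (size_a, gain_a), (size_b, gain_b) in zip(gains, gains[1:]):
        if gain_a >= 0 > gain_b:
            return size_a, size_b
    return None

# Record-size gains quoted for the Write test (intermediate sizes omitted).
write_gains = [("4K", 97.0), ("8K", 95.0), ("16K", 59.0),
               ("32K", 44.0), ("64K", 13.0), ("16M", -13.0)]
print(crossover(write_gains))
```

Because the intermediate record sizes are left out here, the result only brackets the crossover; with the full row of data it narrows to the boundary just above 64K described in the text.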
NCP vs CIFS on Linux, Writer test
The ski-jump look of the graph shows it all right there. As with the previous NCP vs CIFS comparisons, file size doesn't have a LOT to do with performance, but it does have an impact. The slope of the 4K line shows that the larger file-sizes probably wouldn't be able to match NCP's performance for the smaller files.

The Re-Writer test showed an overall improvement of NCP over CIFS by 16%, a bit lower than the Writer performance. This was also reflected in the record-size and file-size performances. The movement isn't great, but it does suggest that CIFS contains slightly better metadata-handling than NCP.

The Random Write test showed an overall improvement of NCP over CIFS by 4%. The reason for the poorer showing is that NCP's small record-size performance, which did so well in the Writer and Re-Writer tests, isn't nearly as good on this test. The same ski-jump is visible in the graph, but not with the same slope.

The Record Rewrite test showed an overall improvement of NCP over CIFS by 6%. Like the Random Write test, NCP wasn't able to show the stellar performance at the smaller record-sizes that it showed on the Write test. The inflection point is between the 64K and 128K record-sizes.

Read Tests

The Reader test turned in an average performance boost of 6% for NCP. Like the earlier test comparing CIFS-on-Linux to NCP-on-NetWare, there isn't a strong correlation between record size and performance.
NCP vs CIFS on Linux, Reader test
The performance was almost entirely better than CIFS, but in many cases only by a few percentage points.

The Re-Reader test performed much the same as the Reader test, and posted a performance increase of only 5%. Like the re-writer test, this is probably due to better meta-data handling in CIFS than with NCP. The data looks much like the Reader chart in shape and form.

The Random Read test posted a performance boost of 4%. NCP performed a bit better (up to 9%) at the smaller record sizes, but overall performance was generally just a few points above the break-even line.

The Backward Read test turned in a performance boost of 5%. As in most of the NCP vs CIFS tests, NCP performed better at smaller record sizes. As with the Random Read test, NCP was better than CIFS by only a few points on most of the chart.

Conclusions

While CIFS has NCP beat on writes to large files, NCP has CIFS beat on reads. This matches earlier results. In fact, NCP-on-Linux improves on NCP-on-NetWare enough that the large-file reads are now above the 1.00 line (parity with CIFS). Novell has done a good job getting NCP ported to Linux.

Which protocol to use depends on what you are going to use the server for. For general office file-server usage, NCP is by far the better protocol. For GIS, large databases, or other large media files, CIFS is probably the better choice. In our case, though, NCP's access patterns fit our usage patterns better.

Summary



Benchmark results part 7: Uncached NCP, Linux vs NetWare

Summary

The run is complete, and I now have a true apples-to-apples comparison of NCP performance. The result is a rather surprising one! In every single test, NCP-on-Linux out-performed NCP-on-NetWare. The lowest margin was 1%, and the highest margin was 9%, so the advantage isn't stellar. On the other hand, NCP started life on NetWare so you would expect it to do better on that platform.

T'ain't so.

One small trend did show up in the test data. Tests that involved a write component showed a slight, 1-3%, increase over the NetWare data. Tests that involved a read component showed a somewhat larger gain, 6-8%. The reasons for this are unclear, but the pattern is very consistent.

Write Tests

The Writer test showed the best performance gain in the range of file-sizes 2M and under, and record-sizes 32K and under. The average improvement in this range was a rather respectable 5%, which is much higher than the overall average for the test of 1%. Performance seems to be affected more by file-size than by record-size, as the range of improvement over record-size was smaller than the range of improvement over the file-sizes. There is a hint in the data that 4K record-sizes for files larger than 16M are much better handled on NetWare, but that data was not gathered.

The Re-Write test showed similar patterns to the Write test, but slightly faster. As the description of the test says, a re-write doesn't affect meta-data to the same degree that a new file would. As with the Writer test, the best performance gain was in the range of file-sizes 2M and under, and record-sizes 32K and under. In that range the improvement was also 5%. Overall, the test showed a 2% improvement for running NCP on Linux. An interesting outlier in the data is the file-size of 8M, which turned in the worst result of the test at a -3%.

The Random Write test showed a 2% improvement in performance over NCP-on-NetWare. The best consistent performance was at the 64K and 256K file-sizes, each with a performance increase of 11%.

The Record Rewrite test showed the best performance of the write tests, at 3%. Every single record size tested showed at least a 0.5% improvement over NetWare. The best record-size was 4K, with a performance boost of 10%, and the worst was 16M, with performance just a hair over parity with NetWare. On the file-size front the results were very scattershot, with the best performance (21%) being turned in at the 128K file-size, and the worst (-5%) at the 4M file-size. The 'sweet spot' identified in the Writer test had an average improvement of 12%.

There were some trends over all of the writer tests as well. In every case, file-sizes of 16M and larger turned in a positive performance difference when run against NCP-on-Linux. The sweet-spot, file size of 2M or smaller and record size of 32K and smaller, turned in performance markedly better than the overall performance for that test.
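The 'sweet spot' numbers in these write tests are averages over part of the grid (file sizes of 2M or smaller and record sizes of 32K or smaller) compared against the average over every cell. A sketch of that filtering, with invented gains rather than the benchmark data:

```python
from statistics import mean

KB = 1024
MB = 1024 * KB

def average_gain(gains, max_file=None, max_record=None):
    """Mean percent gain, optionally limited to cells at or under the given sizes."""
    selected = [
        gain for (file_size, record_size), gain in gains.items()
        if (max_file is None or file_size <= max_file)
        and (max_record is None or record_size <= max_record)
    ]
    return mean(selected)

# Invented percent gains keyed by (file size, record size) in bytes.
gains = {
    (64 * KB, 4 * KB): 6.0,
    (2 * MB, 32 * KB): 4.0,
    (16 * MB, 64 * KB): -1.0,
    (512 * MB, 16 * MB): -3.0,
}
overall = average_gain(gains)
sweet_spot = average_gain(gains, max_file=2 * MB, max_record=32 * KB)
print("overall:", overall, "sweet spot:", sweet_spot)
```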

Read Tests

The Reader test turned in an overall performance gain of 6% over NetWare. The tendency of the Writer test to show a decrease in performance at the 4K record size doesn't show up here. In fact, the number two and number three highest performance gain values on the chart were in the 4K record size column at the 2M (+38%) and 16M (+31%) file sizes. The 2M file-size showed the highest variability in performance as it had both the highest and lowest performance values on the chart. The 2M file-size with a 256K record size showed a -41% performance hit, and the 2M file-size with a 512K record size showed a +52% performance gain. The overall average for that file-size was 6%.

The Re-Reader test turned in a performance gain of 8%, which is presumably due to server-side caching of data being faster on Linux than on NetWare. There were two far outliers in the data which turned in performances 100% or better than the NetWare data. Looking at the raw data, these two results were due to NCP-on-NetWare turning in really bad numbers for 512K file-size and 8K record size, and 1M file-size and 64K record-size. Other than these two, the data is pretty even. As with the Reader test, the 4K record-size turned in very good numbers, especially at larger file-sizes.

The Random Read test turned in a performance gain of 6% over NCP-on-NetWare. This was a hair faster than the initial Reader test, which shows that server-side caching still has a role to play. The range of values on this test was narrower than that reported by the Re-Reader test. There were no real 'hot spots' on the chart. The 4K record-size continued to show the largest variability.

The Backward Read test turned in the best value of the lot with a performance increase of 8% over NCP-on-NetWare. This test also had a far outlier at the 512K file-size/64K record-size level, where the NCP-on-NetWare test turned in an abysmal number. That value was excluded from the averaging; otherwise the performance increase for the test would have been 9% and change. This test also showed a very strong value for the 4K record size, with an average performance increase of 21%. Another interesting result on this test is that the sweet spot identified in the Writer tests shows up on this one, with an average performance increase of 14%.
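Leaving out one abysmal NCP-on-NetWare cell moved the Backward Read average by about a point. A sketch of averaging with flagged cells left out; the values are invented and exaggerated so the effect is obvious, whereas in a full grid a single cell moves the mean far less:

```python
from statistics import mean

def average_excluding(gains: dict[tuple[str, str], float],
                      excluded: set[tuple[str, str]]) -> float:
    """Mean percent gain over all cells except those flagged as bad measurements."""
    kept = [gain for cell, gain in gains.items() if cell not in excluded]
    return mean(kept)

# Invented gains keyed by (file size, record size); one cell is wildly off
# because the comparison run produced a terrible number there.
gains = {
    ("512K", "64K"): 150.0,
    ("1M", "64K"): 8.0,
    ("2M", "4K"): 10.0,
    ("4M", "8K"): 6.0,
}
with_outlier = average_excluding(gains, set())
without_outlier = average_excluding(gains, {("512K", "64K")})
print(with_outlier, without_outlier)
```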

Unlike the Writer tests, the Reader tests didn't have any trouble at the 4K record-sizes on larger files. Overall performance was better than NetWare by a noticeable margin. There were a few exceptions, but generally speaking the results were consistent.

Conclusions

It is clear from the data that Novell has somehow managed to make NCP-on-Linux better than it was on NetWare. NetWare's historic claim as the end-all-be-all of File Servers may finally be coming to an end. Now to compare NCP-on-Linux (uncached) vs CIFS-on-Linux (uncached).

Part 8: NCP vs CIFS on Linux



Wednesday, January 25, 2006

Benchmark results part 6: Conclusions so far

Summary

The analysis is done, and now it is time to make some decisions about what works best for us. As I've stated before, the majority of file access to the NetWare cluster is to smaller files, and by definition to smaller file ranges. A lot of data on there is in larger files, but the count of those files is pretty small. On the User and Shared volumes, at least 50% of files are 64K or smaller, the smallest file size in these tests.
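The "at least 50% of files are 64K or smaller" figure is easy to re-check against a volume. A minimal sketch, assuming the volume is reachable as an ordinary directory tree (which mount point to point it at is up to you):

```python
import os
import tempfile

SMALL_FILE_LIMIT = 64 * 1024  # 64K, the smallest file size in the benchmark grid

def small_file_fraction(root: str, limit: int = SMALL_FILE_LIMIT) -> float:
    """Fraction of files under root whose size is at or below limit (0.0 if none)."""
    total = small = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # vanished or unreadable; leave it out of both counts
            total += 1
            if size <= limit:
                small += 1
    return small / total if total else 0.0

# Demo on a throwaway directory holding one 10-byte file and one 100K file.
with tempfile.TemporaryDirectory() as demo_dir:
    for name, size in (("small.txt", 10), ("large.bin", 100 * 1024)):
        with open(os.path.join(demo_dir, name), "wb") as f:
            f.write(b"x" * size)
    demo_fraction = small_file_fraction(demo_dir)
print(demo_fraction)  # 0.5
```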

I analyzed two big groups, NCP vs CIFS/SMB, and cached vs uncached. The cached/uncached split was a surprise caused by the local client settings, and it does taint the data. I hope to do another run with NCP-on-Linux in an uncached mode in order to better compare it against NCP-on-NetWare, which seemed to run in an uncached state.

The NCP vs CIFS benchmarks were pretty clear. NCP is engineered to be better at handling files and access patterns in the range our users are most likely to use. This is unsurprising considering that Novell designed NCP to be a file-serving protocol from the ground up, while CIFS/SMB was designed with more general purposes in mind. As such, for big files or large sub-ranges CIFS is the better protocol. In both the cached and uncached comparisons NCP came out the winner.

When it comes to caching mechanisms, NCP worked best for our environment, with one big exception: the Re-Reader test. Microsoft's client cached that test, so performance in that case was vastly better than the uncached NCP performance.

In the end what have I learned? The fact that the Novell Client performed local caching for the NCP-on-Linux test blew my testing objectives out of the water. In order to make any real tests I need to be able to test NCP-on-Linux in an uncached state, and I'm working on that. According to the tests, NCP-on-Linux is the best combination of protocol and caching.

Look for Part 7, where I compare NCP-on-Linux (uncached) against NCP-on-NetWare, and CIFS-on-Linux.

Part 7: Uncached NCP



Benchmark results part 5: Comparing Uncached, NCP-on-NetWare vs CIFS-on-Linux

Summary

In this section I'm going to compare the two access methods that didn't have any local caching, NCP-on-NetWare and CIFS-on-Linux. The differences between the two shouldn't be as large as they were for the cached methods, simply because network speed is a major limiting factor.
Write: This test measures the performance of writing a new file. When a new file is written not only does the data need to be stored but also the overhead information for keeping track of where the data is located on the storage media. This overhead is called the “metadata”. It consists of the directory information, the space allocation and any other data associated with a file that is not part of the data contained in the file. It is normal for the initial write performance to be lower than the performance of rewriting a file due to this overhead information.
The graph for this test shows a strong correlation between record size and performance. Clearly NCP-on-NetWare is much better at handling small sub-ranges of files than CIFS-on-Linux. Once the sub-range gets to a certain size between 128K and 512K (depending on file size), CIFS-on-Linux provides better performance. For most types of file access our users do, NCP-on-NetWare would provide the better performance.
Re-write: This test measures the performance of writing a file that already exists. When a file is written that already exists the work required is less as the metadata already exists. It is normal for the rewrite performance to be higher than the performance of writing a new file.
As this graph also shows, there is a strong correlation to record-size in performance. The point where CIFS provides better performance comes a bit earlier, but the general trend remains.
Read: This test measures the performance of reading an existing file.
This graph doesn't show as strong a correlation to record size. The performance boost that NCP-on-NetWare provides isn't nearly as strong as it was with the previous two write tests. It seems to do best on files of 64K and at smaller record sizes.
Re-Read: This test measures the performance of reading a file that was recently read. It is normal for the performance to be higher as the operating system generally maintains a cache of the data for files that were recently read. This cache can be used to satisfy reads and improves the performance.
This graph looks a lot like the "read" graph. As above, the performance boost isn't terribly great. File-Size/Record-Size combinations that give a performance difference in excess of 10% are rare.
Random Read: This test measures the performance of reading a file with accesses being made to random locations within the file. The performance of a system under this type of activity can be impacted by several factors such as: Size of operating system's cache, number of disks, seek latencies, and others.
This graph continues the trend of the previous 'read' graphs in that it isn't quite as impressive. Record sizes of 128K and smaller yield small gains, and above that line CIFS-on-Linux is the better bet. With a few visible exceptions, most performance is also within 10%.
Random Write: This test measures the performance of writing a file with accesses being made to random locations within the file. Again the performance of a system under this type of activity can be impacted by several factors such as: Size of operating system's cache, number of disks, seek latencies, and others.
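Random reads are harder on both ends because the next offset can't be predicted, so read-ahead largely stops paying off. As a simplified model (each record visited exactly once in shuffled order; IOzone's own offset generator may differ), the access order compares to a sequential pass like this:

```python
import random

def sequential_offsets(file_size: int, record_size: int) -> list[int]:
    """Record start offsets for a front-to-back pass over the file."""
    return list(range(0, file_size, record_size))

def random_offsets(file_size: int, record_size: int, seed: int = 0) -> list[int]:
    """The same records, each visited once, in a shuffled order."""
    offsets = sequential_offsets(file_size, record_size)
    random.Random(seed).shuffle(offsets)  # seeded so a run can be repeated
    return offsets

# A 64K file read in 4K records: 16 reads either way, only the order differs.
print(sequential_offsets(64 * 1024, 4 * 1024)[:4])
print(random_offsets(64 * 1024, 4 * 1024, seed=1)[:4])
```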
This graph shows very similar trends with the previous Write graph. As with that graph, the break between NCP-on-NetWare being faster and CIFS-on-Linux being faster is when the record-size gets in the 128K-512K range. In terms of raw numbers, the Random Write is slower than the Write test, but this is to be expected.
Backwards Read: This test measures the performance of reading a file backwards. This may seem like a strange way to read a file but in fact there are applications that do this. MSC Nastran is an example of an application that reads its files backwards. With MSC Nastran, these files are very large (Gbytes to Tbytes in size). Although many operating systems have special features that enable them to read a file forward more rapidly, there are very few operating systems that detect and enhance the performance of reading a file backwards.
This graph looks like the previous 'read' graphs.
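The Backward Read test walks the file from the last record to the first. A minimal sketch of that access pattern using ordinary seeks (an illustration of the pattern, not IOzone's own code):

```python
import os
import tempfile

def read_backwards(path: str, record_size: int):
    """Yield a file's records from the last one to the first."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Start at the last record boundary; the final record may be short.
        offset = ((size - 1) // record_size) * record_size if size else -1
        while offset >= 0:
            f.seek(offset)
            yield f.read(record_size)
            offset -= record_size

# Demo: a 10-byte file read in 4-byte records, last record first.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
demo = list(read_backwards(tmp.name, 4))
os.remove(tmp.name)
print(demo)  # [b'89', b'4567', b'0123']
```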
Record Rewrite: This test measures the performance of writing and re-writing a particular spot within a file. This hot spot can have very interesting behaviors. If the size of the spot is small enough to fit in the CPU data cache then the performance is very high. If the size of the spot is bigger than the CPU data cache but still fits in the TLB then one gets a different level of performance. If the size of the spot is larger than the CPU data cache and larger than the TLB but still fits in the operating system cache then one gets another level of performance, and if the size of the spot is bigger than the operating system cache then one gets yet another level of performance.
This graph looks nearly identical to the 'random write' test before.

While the results aren't as dramatic as they were for the cached methods, they are at least consistent. NCP-on-NetWare provides consistent and real performance improvements over a hardware-identical CIFS-on-Linux (Samba) configuration. Writing performance was much better in the file and record sizes we generally see on our NetWare servers. Large file sizes and record sizes were better handled by CIFS-on-Linux, but such access is a minority on our network. If we had a lot of video-editing types around, I'd be singing a different tune.

Part 6: Conclusions so far



Tuesday, January 24, 2006

Benchmark results part 4: Comparing Cache, NCP-on-Linux vs CIFS-on-NetWare

Summary

In this section I'm comparing the two cached methods, NCP-on-Linux, and CIFS-on-NetWare. I'll do the uncached ones in the next section.

The comparison here is not as much apples-to-apples as I'd like. Microsoft caching, and Novell's caching use different mechanisms, and we're also going over different protocols and platforms as well. Because of this, the trends aren't nearly as clear cut as they were in the previous sections where we compared the differences between platforms.
Write: This test measures the performance of writing a new file. When a new file is written not only does the data need to be stored but also the overhead information for keeping track of where the data is located on the storage media. This overhead is called the 'metadata'. It consists of the directory information, the space allocation and any other data associated with a file that is not part of the data contained in the file. It is normal for the initial write performance to be lower than the performance of rewriting a file due to this overhead information.
For this test, NCP-on-Linux outperforms CIFS-on-NetWare in the areas of most interest. As with a few tests so far, the 'sweet spot' seems to be with a file-size under 32MB and a record size under 512KB. NCP-on-Linux particularly out-performs CIFS-on-NetWare in the small file ranges. Improvements of 200-400% are pretty common within the sweet-spot range, with a few combinations (such as 512KB file, 64KB record size) going as high as 1300%.
Re-write: This test measures the performance of writing a file that already exists. When a file is written that already exists the work required is less as the metadata already exists. It is normal for the rewrite performance to be higher than the performance of writing a new file.
For this test, CIFS-on-NetWare outperforms NCP-on-Linux. However, the magnitude isn't nearly on the scale of the Write test. Record size again has something to do with the performance. The two methods reach near parity at a record size of about 1MB. For files over 32MB, though, CIFS-on-NetWare provides a consistent 5-10% performance increase over NCP-on-Linux across the board.
Read: This test measures the performance of reading an existing file.
For this test there is no clear winner. NCP-on-Linux generally outperforms CIFS-on-NetWare when the record-size is filesize, or filesize/2. It also has small increases, 5-10%, for 16KB record-sizes and files around 8MB. Generally speaking, though, CIFS-on-NetWare outperforms NCP-on-Linux by an average of 7% across the board.
Re-Read: This test measures the performance of reading a file that was recently read. It is normal for the performance to be higher as the operating system generally maintains a cache of the data for files that were recently read. This cache can be used to satisfy reads and improves the performance.
This is very clear-cut. CIFS-on-NetWare blows the pants off of NCP-on-Linux for this test. The average performance increase for everything up to the 512MB file is about 9000%. Why is this? Because NCP-on-Linux does NOT cache this particular test, and CIFS-on-NetWare does. This is a design choice from Novell, presumably.
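A gap of roughly 9000% only makes sense if the second read never leaves the workstation. The toy model below (not how either vendor's client actually works) counts network round-trips for a Reader pass followed by a Re-Reader pass, with and without a client-side cache:

```python
class CachingClient:
    """Toy client that keeps what it has read, so a re-read never hits the server."""

    def __init__(self, server: dict[str, bytes]):
        self.server = server            # stand-in for files on the remote server
        self.cache: dict[str, bytes] = {}
        self.server_reads = 0           # requests that actually crossed the network

    def read(self, name: str) -> bytes:
        if name not in self.cache:
            self.server_reads += 1
            self.cache[name] = self.server[name]
        return self.cache[name]


class UncachedClient(CachingClient):
    """Same interface, but every read goes back to the server."""

    def read(self, name: str) -> bytes:
        self.server_reads += 1
        return self.server[name]


server = {"test.dat": b"x" * 4096}
cached, uncached = CachingClient(server), UncachedClient(server)
for client in (cached, uncached):
    client.read("test.dat")  # Reader pass
    client.read("test.dat")  # Re-Reader pass
print(cached.server_reads, uncached.server_reads)  # 1 2
```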
Random Read: This test measures the performance of reading a file with accesses being made to random locations within the file. The performance of a system under this type of activity can be impacted by several factors such as: Size of operating system's cache, number of disks, seek latencies, and others.

For this test NCP-on-Linux is the winner, especially for small record sizes or small files. For 4K records the performance increase is 33%, and for files 512K and under the increase averages about 10% over CIFS-on-NetWare. Overall, performance is better by 10-15%.
Random Write: This test measures the performance of writing a file with accesses being made to random locations within the file. Again the performance of a system under this type of activity can be impacted by several factors such as: Size of operating system's cache, number of disks, seek latencies, and others.
NCP-on-Linux is the winner in the ranges important to me. CIFS-on-NetWare has better performance for large files at large record-sizes. NCP-on-Linux is clearly better with record sizes of 16K and under: 72% better at a 4K record size, 55% better at 8K, and 37% better at 16K. The advantage shrinks to 11% at 32K.
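To make the random access patterns concrete, here is a small Python sketch of what the Random Read and Random Write tests do: pick record-aligned offsets at random and read or overwrite one record at each. It is not IOZONE's implementation; the record alignment, the fixed seed and the 0xAA fill pattern are assumptions made for the illustration, and a Python loop will not reproduce IOZONE's numbers.

```python
import io
import os
import random
from typing import BinaryIO, List


def random_record_offsets(file_size: int, record_size: int, count: int,
                          seed: int = 0) -> List[int]:
    """Pick `count` record-aligned byte offsets uniformly at random."""
    if record_size <= 0 or file_size < record_size:
        raise ValueError("the file must hold at least one whole record")
    rng = random.Random(seed)  # seeded so repeated runs touch the same spots
    n_records = file_size // record_size
    return [rng.randrange(n_records) * record_size for _ in range(count)]


def random_read(f: BinaryIO, record_size: int, count: int, seed: int = 0) -> int:
    """Read `count` records at random offsets; return the bytes read."""
    file_size = f.seek(0, os.SEEK_END)  # seek returns the new absolute position
    total = 0
    for offset in random_record_offsets(file_size, record_size, count, seed):
        f.seek(offset)
        total += len(f.read(record_size))
    return total


def random_write(f: BinaryIO, record_size: int, count: int, seed: int = 0) -> None:
    """Overwrite `count` records at random offsets with a fixed byte pattern."""
    file_size = f.seek(0, os.SEEK_END)
    record = b"\xaa" * record_size  # hypothetical fill pattern
    for offset in random_record_offsets(file_size, record_size, count, seed):
        f.seek(offset)
        f.write(record)
    f.flush()


if __name__ == "__main__":
    buf = io.BytesIO(bytes(8 * 4096))  # an in-memory "file" of eight 4KB records
    random_write(buf, 4096, count=3)
    print(random_read(buf, 4096, count=5))  # 20480: five full 4KB records
```

With small records, each access touches only a little data, so the per-request cost (a network round trip for every seek on an uncached share) tends to dominate, which is one plausible reason the small record sizes show the biggest differences.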
Backwards Read: This test measures the performance of reading a file backwards. This may seem like a strange way to read a file but in fact there are applications that do this. MSC Nastran is an example of an application that reads its files backwards. With MSC Nastran, these files are very large (Gbytes to Tbytes in size). Although many operating systems have special features that enable them to read a file forward more rapidly, there are very few operating systems that detect and enhance the performance of reading a file backwards.
This is another test where NCP-on-Linux beats out CIFS-on-NetWare. The margin is not great, but it is consistent. As with the previous test, the best performance is with a 4KB record size. You have to get to the 16MB record-size to find a category in which CIFS-on-NetWare outperforms NCP-on-Linux, and even there the difference is 3%. The overall performance increase of NCP-on-Linux is a shade under 9%.
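The backward access pattern is easy to picture as code. This Python sketch (a hypothetical helper, not IOZONE's code) walks a file from the end toward the start one record at a time. That is the pattern a forward read-ahead cache has trouble predicting.

```python
import io
import os
from typing import BinaryIO, Iterator


def read_backwards(f: BinaryIO, record_size: int) -> Iterator[bytes]:
    """Yield records starting at the end of the file and moving toward offset 0.

    Only the last record yielded (the one at the start of the file) may be
    shorter than record_size.
    """
    if record_size <= 0:
        raise ValueError("record_size must be positive")
    offset = f.seek(0, os.SEEK_END)  # seek returns the new absolute position
    while offset > 0:
        start = max(0, offset - record_size)
        f.seek(start)
        yield f.read(offset - start)
        offset = start


if __name__ == "__main__":
    records = list(read_backwards(io.BytesIO(bytes(range(10))), 4))
    print(records)  # bytes 6-9 first, then 2-5, then 0-1
```

Every record requires a seek to an earlier offset, so a cache that only prefetches forward gains nothing here, which fits the author's use of this test as a view of base performance.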
Record Rewrite: This test measures the performance of writing and re-writing a particular spot within a file. This hot spot can have very interesting behaviors. If the size of the spot is small enough to fit in the CPU data cache then the performance is very high. If the size of the spot is bigger than the CPU data cache but still fits in the TLB then one gets a different level of performance. If the size of the spot is larger than the CPU data cache and larger than the TLB but still fits in the operating system cache then one gets another level of performance, and if the size of the spot is bigger than the operating system cache then one gets yet another level of performance.
This test showed a mixed result. For file-sizes 8MB and under, NCP-on-Linux has a clear lead across all record-sizes. Results get a lot spottier above that line. Performance is near parity when record-sizes are 512KB and larger. CIFS-on-NetWare does best at record-sizes of 512KB and larger, and also at file-sizes of 32MB and up. For the most common file-access patterns, NCP-on-Linux would provide the best performance.
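The four levels in the IOZONE Record Rewrite description are checked in order, smallest first. A minimal sketch of that reasoning in Python follows. The CPU cache, TLB and OS cache sizes in the usage block are made-up illustrative values, not measurements from the test machines, and TLB reach is approximated as entries times page size.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries: int, page_size: int = 4 * KIB) -> int:
    """Bytes of memory the TLB can map at once: entries times page size."""
    return entries * page_size


def hot_spot_tier(spot_bytes: int, cpu_cache_bytes: int,
                  tlb_reach_bytes: int, os_cache_bytes: int) -> str:
    """Name the smallest level that still holds the rewritten hot spot,
    following the order given in the IOZONE description."""
    if spot_bytes <= cpu_cache_bytes:
        return "cpu-data-cache"
    if spot_bytes <= tlb_reach_bytes:
        return "tlb"
    if spot_bytes <= os_cache_bytes:
        return "os-cache"
    return "uncached"


if __name__ == "__main__":
    # Illustrative sizes only, not measured on the test machines.
    levels = dict(cpu_cache_bytes=512 * KIB,
                  tlb_reach_bytes=tlb_reach(512),  # 512 entries x 4KB = 2MB
                  os_cache_bytes=256 * MIB)
    for spot in (64 * KIB, 1 * MIB, 32 * MIB, 1024 * MIB):
        print(spot // KIB, "KB ->", hot_spot_tier(spot, **levels))
```

Each step down the list is a distinct performance plateau, which is why record rewrite results tend to shift in steps rather than smoothly as file and record sizes grow.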

Overall, NCP-on-Linux appears to beat out CIFS-on-NetWare. The big exception is the Re-Read test, where NCP-on-Linux doesn't even attempt to cache and the results are raw I/O. On a client station with small amounts of RAM these results may be different, since the caching being tested here is a function of the local machine rather than the server. The servers do play a role, however, so this does need to be included.

Part 5: Comparing Uncached, NCP-on-NetWare vs CIFS-on-Linux



Benchmark results part 3: NCP performance

Summary

As with the CIFS test, caching was present in one half of the environment, so analysis isn't straightforward. NCP-on-Linux involved local caching where NCP-on-NetWare apparently did not. This was a confusing result, since identical client settings were used for both environments. Also interesting to note is that the results of NCP-on-Linux were similar to the results for CIFS-on-NetWare. There are differences, but the general trends were similar.

As with the CIFS tests, the two tests that give the best uncached results are the Reader and Backward Reader tests.
(Chart: NCP Reader comparison) Reader Test, comparing NetWare vs. Linux. The value is the multiplier by which NetWare is faster than Linux. Units are in KB.

As with the CIFS test, the key value here was the record size. The sense is inverted from the CIFS test, in that it is the Linux environment that performs faster than the NetWare environment. This is a surprising result, considering that NCP is a native NetWare protocol and NCP on Linux is a relative newcomer. The magnitude of the improvement is comparable to that of CIFS-on-NetWare over CIFS-on-Linux, which is an interesting result by itself. Also, as with CIFS, the trend reverses at the larger record sizes, where the improvement of NetWare over Linux is quite visible. The improvement is on the order of 5%, which comes close to the 7% improvement at larger record-sizes reported by CIFS-on-Linux.
(Chart: NCP Backward Reader, file size view) Backward Read Test, File Size view, NetWare vs. Linux. The value is the multiplier by which NetWare is faster than Linux. Units are in KB.

This test showed a difference between the CIFS and NCP tests. Unlike the CIFS test, file-size was not a strong determiner of performance. Record size was more closely associated.
(Chart: NCP Backward Reader, record size view) Backward Read Test, Record Size view, NetWare vs. Linux. The value is the multiplier by which NetWare is faster than Linux. Units are in KB.

As you can see from the chart, record size is what separates performance. The break comes between the 64K and 128K record sizes. Unlike the CIFS results on this test, the improvement for NCP-on-Linux is not of the same magnitude as the improvement for CIFS-on-NetWare. As has proven common with cached vs. uncached access, access to larger file-sizes is a little faster with the uncached method. In this case it is about 5%, which isn't close to the 14% gain reported by CIFS-on-Linux.

As with the CIFS tests, the cache mechanism makes it hard to measure the true performance of the file system at certain sizes. However, the cache only provides performance boosts below certain file-size and record-size levels, so we do have some data to play with. Not as much as I'd like, but it is still there.

The 'Record Rewrite' test shows how the effectiveness of caching drops off as the record size grows.
(Chart: NCP Record Rewrite, NetWare vs Linux comparison) Record Rewrite Test, average improvement per record-size, NetWare vs. Linux. The value is the average multiplier of NetWare performance over Linux performance, averaged across all file-sizes.

That is a very clear curve, and it shows rather well that caching only handles the first 256K of a record rewrite; the rest is handled through normal methods. The point where NCP-on-NetWare pulls ahead of NCP-on-Linux is between 2MB and 4MB. The curve suggests that rewrites larger than 16MB would pull even farther ahead, but that sort of file-access is rather rare, all things considered.
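The chart's "average multiplier per record-size" can be reproduced with a few lines of Python. This is a sketch under assumptions: results are held in a hypothetical dict keyed by (file size KB, record size KB), and the ratios are averaged with a plain arithmetic mean, since the post does not say which mean was used. The second function finds the first record size at which NetWare's average multiplier reaches 1.0, which is the crossover point discussed above.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple

Grid = Dict[Tuple[int, int], float]  # hypothetical layout: (file_kb, record_kb) -> KB/s


def average_multiplier_by_record(netware: Grid, linux: Grid) -> Dict[int, float]:
    """Mean NetWare/Linux throughput ratio for each record size, over all file sizes."""
    ratios: Dict[int, List[float]] = defaultdict(list)
    for (file_kb, record_kb), nw_kbps in netware.items():
        lx_kbps = linux.get((file_kb, record_kb))
        if lx_kbps:  # skip cells missing from the Linux run or reported as zero
            ratios[record_kb].append(nw_kbps / lx_kbps)
    return {rec: mean(vals) for rec, vals in sorted(ratios.items())}


def crossover_record_kb(averages: Dict[int, float]) -> Optional[int]:
    """Smallest record size whose average multiplier reaches 1.0 (NetWare ahead)."""
    for record_kb in sorted(averages):
        if averages[record_kb] >= 1.0:
            return record_kb
    return None
```

A geometric mean of the ratios would be a reasonable alternative, since it treats "twice as fast" and "half as fast" symmetrically; with ratios this close to 1.0 the two give nearly the same answer.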

The 'Random Write' test does not show an improvement for the uncached method like it did with the CIFS tests. The multiplier climbs linearly starting at 128K but never breaks 1.00. Once the record size gets beyond 32MB there may be a point where it does, but again, this sort of file-access is rather rare.

The 'Writer' test is cached, as documented in the Novell Client spec. The improvements are for file sizes larger than 32MB and record sizes larger than 2MB. In that range the improvement of NCP-on-NetWare is about 7%. In the FileSize > 32MB range, regardless of record-size, the improvement is 1%. In the RecordSize > 2MB range, regardless of file-size, the improvement is 7%. This tells us that record size is again the biggest determiner of performance.
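The three Writer figures differ only in which cells get averaged. Here is a Python sketch of that selection, assuming the same kind of hypothetical (file KB, record KB) grid; the predicate names are illustrative. Note that the "larger than 32MB" filter ignores record size, and the "larger than 2MB" record filter ignores file size.

```python
from statistics import mean
from typing import Callable, Dict, Tuple

Grid = Dict[Tuple[int, int], float]  # hypothetical layout: (file_kb, record_kb) -> KB/s
MB_IN_KB = 1024


def subrange_improvement(faster: Grid, slower: Grid,
                         keep: Callable[[int, int], bool]) -> float:
    """Mean percent improvement of `faster` over `slower` on the kept cells."""
    gains = [(faster[cell] / slower[cell] - 1.0) * 100.0
             for cell in faster
             if slower.get(cell) and keep(*cell)]
    if not gains:
        raise ValueError("no cells fall inside the selected sub-range")
    return mean(gains)


def large_files(file_kb: int, record_kb: int) -> bool:
    return file_kb > 32 * MB_IN_KB  # any record size


def large_records(file_kb: int, record_kb: int) -> bool:
    return record_kb > 2 * MB_IN_KB  # any file size


def large_both(file_kb: int, record_kb: int) -> bool:
    return large_files(file_kb, record_kb) and large_records(file_kb, record_kb)
```

If the large-file band averages out near zero while the large-record band keeps its full gain, the gain is coming from record size, which is the reasoning behind the paragraph's last sentence.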

Like the CIFS tests, the data here show that NCP-on-Linux is the better bet. This has much more to do with NCP-on-Linux being cached better than NCP-on-NetWare. I am suspicious of this result, since NCP-on-NetWare should have been doing local caching as well, but it wasn't. Since the majority of file-access on our file-servers is going to be with files under 64K in size, NCP-on-Linux is the better bet from a pure performance perspective.

Part 4: Comparing Cached, NCP-on-Linux vs CIFS-on-NetWare



Benchmark results part 2: CIFS performance

Summary

Because caching was a big difference between the CIFS-on-NetWare and CIFS-on-Linux runs, analysis is a bit difficult. CIFS-on-NetWare had local caching involved, so apparent performance will clearly be better for most usages when connected to CIFS-on-NetWare. True performance is another story, and that's what I'm trying to define in this section.

A couple of tests don't involve the cache mechanism: the "Reader" and "Backward Read" tests. The IOZONE documentation defines them as:
Reader: This test measures the performance of reading an existing file.

Backward Read: This test measures the performance of reading a file backwards. Although many operating systems have special features that enable them to read a file forward more rapidly, there are very few operating systems that detect and enhance the performance of reading a file backwards.

(Chart: CIFS Reader comparison) Reader Test, comparing NetWare vs. Linux performance. The value is the multiplier by which NetWare is better than Linux. Units in KB.

As you can see from the graph, record size is the key determiner of performance for this test. For smaller record sizes, NetWare is clearly better than Linux at CIFS performance. This is on the first read; the 'Re-Read' test had caching enabled, and CIFS-on-NetWare was vastly better than CIFS-on-Linux as a result.

For record-sizes larger than 64K, CIFS-on-Linux provided an average improved performance of about 7%.
(Chart: CIFS Backward Read comparison) Backward Read Test, comparing NetWare vs. Linux performance. The value is the multiplier by which NetWare is faster than Linux. Units in KB.

As you can see from the graph, it is file-size that determines performance, not record size. The correlation here is less clear than it was for the Reader test, but it is present. For large files, performance on Linux is somewhat better than on NetWare. When the chart is rotated to present the RecordSize view, there is some improvement for small records, but only at really small file-sizes.

For file sizes larger than 8MB, CIFS-on-Linux provided about 14% better performance.

For the tests that do involve the cache mechanism, we can only compare results for the data-sets that don't involve the cache: specifically, large file-sizes and large record-sizes. For the 'Record Rewrite' test, which rewrites sub-sets of larger files, Linux provides about a 10% improvement over the same test on NetWare. For the 'Random Write' test, which writes to random locations within the file, the improvement for Linux is also about 10%. For the 'Writer' test, which just lays down the file, the improvement is about 15% over NetWare. In all three cases, the larger the sub-range, the better Linux performs relative to NetWare.

In the end, for the data most likely to be used by an end-user at WWU, CIFS-on-NetWare is the better choice of the two. Larger Access databases, big PowerPoint slideshows, and GIS maps may perform slower, but most file-access will be faster.

Part 3: NCP



Benchmark results part 1: caching features

This'll go over a few posts, just due to the nature of the data and how I'm analyzing it.

Without going into detailed analysis of the data, a certain structure leaps out. Both CIFS-on-NetWare and NCP-on-Linux show clear signs of a local caching mechanism in use. This is odd, since I could have sworn I had NCP-on-NetWare enabled for local caching, but the data does not support that. What I have found is a sort of rule for caching.

For file operations on files 32MB or less, and in nibbles of 256K or less, the caching features strongly affect performance.

For file operations on files between 32MB and 64MB, and in nibbles of between 256K and 512K, caching features weakly affect performance.

For file operations on files larger than 64MB, or in nibbles of 1MB or larger, caching features do not affect performance.
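The caching rule of thumb above can be written as a small classifier. In this Python sketch sizes are in KB, and one assumption is flagged: the three rules leave some combinations unnamed (for example a small file accessed in 512K-1MB nibbles), and the sketch lumps those in with "weak".

```python
MB = 1024  # sizes below are in KB, so 1MB = 1024KB


def caching_influence(file_kb: int, record_kb: int) -> str:
    """Classify an IOZONE (file size, record size) cell by the caching rule of thumb.

    Assumption: combinations the three rules don't name are treated as 'weak'.
    """
    if file_kb > 64 * MB or record_kb >= 1 * MB:
        return "none"    # files larger than 64MB, or nibbles of 1MB or larger
    if file_kb <= 32 * MB and record_kb <= 256:
        return "strong"  # files of 32MB or less, in nibbles of 256K or less
    return "weak"        # the in-between band


if __name__ == "__main__":
    for cell in [(64, 4), (48 * MB, 384), (1024, 512), (128 * MB, 4)]:
        print(cell, caching_influence(*cell))
```

The "none" check comes first because the rule for it uses "or": a single 1MB record defeats the cache even on a tiny file.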

Caching does not improve performance on all tests. Two tests very clearly show no influence of the caching mechanism: the Reader test and the Backward Reader test. The Reader test is the initial read of a file, so it makes sense that caching would not affect performance. The Backward Reader test reads a file backwards, which is something the caching mechanisms do not seem to pick up.

When you look at the rules and compare them to file-system statistics, you very clearly see that caching should improve performance on almost all operations the average user will perform. Inventories of our User-directory volumes show that 70% of files, by file-count, are 64K or smaller. Files larger than 32MB are a tiny, tiny percentage of files.

The situation on our big FacStaff shared volume is a little different. There, 66% of files are 64K or smaller. In both cases, files larger than 64K make up the majority of the data on the volumes. The shared volume also has more files, as a percentage of total files, larger than the 32MB caching cut-off.
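The post doesn't say how the volume inventories were taken. One way to get the same kind of numbers from a machine with the volume mounted is sketched below in Python; the 64K and 32MB thresholds come from the text, while the function names and the mount path in the usage line are hypothetical.

```python
import os
from typing import Dict, Iterable, Iterator

SMALL_LIMIT = 64 * 1024         # "64K or smaller"
LARGE_LIMIT = 32 * 1024 * 1024  # the 32MB caching cut-off


def file_sizes(root: str) -> Iterator[int]:
    """Yield the size in bytes of every file under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                yield os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # vanished or unreadable; skip it


def summarize(sizes: Iterable[int]) -> Dict[str, float]:
    """Share of files at or under 64K (by count and by bytes) and over 32MB."""
    n = n_small = n_large = 0
    total_bytes = small_bytes = 0
    for size in sizes:
        n += 1
        total_bytes += size
        if size <= SMALL_LIMIT:
            n_small += 1
            small_bytes += size
        elif size > LARGE_LIMIT:
            n_large += 1
    return {
        "small_by_count": n_small / n if n else 0.0,
        "small_by_bytes": small_bytes / total_bytes if total_bytes else 0.0,
        "large_by_count": n_large / n if n else 0.0,
    }
```

Usage would look like `summarize(file_sizes("/mnt/facstaff"))`. Reporting both the count share and the byte share shows the split the paragraph describes: most files are small, yet most of the bytes sit in the larger files.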

The conclusion you can draw from the above data is that NCP-on-Linux will be perceived as faster than NCP-on-NetWare. NCP-on-NetWare didn't show any caching behavior, so it suffers a major setback when compared to NCP-on-Linux, which did exhibit that behavior. The above data does not show what the 'true performance' of the two setups is. That'll come later, but the first-look data is that NCP-on-NetWare performs better in a no-caching state than NCP-on-Linux does.

Part 2: CIFS



Friday, January 20, 2006

W-a-y early benchmark results

I have an NCP and CIFS benchmark under my belt against a NW65SP4a box, and the results are weird.

First, the environment:
The Server:
OES SP1 a.k.a. NW65SP4a
NCP Caching Enabled
OPLOCK 2 Enabled
CIFS with Oplock2 enabled
2x 3.2GHz Intel CPU, Hyperthreading OFF
2GB RAM
100Mb Ethernet

The Client:
WinXP SP2
Novell Client, Caching Enabled
1x 3.0GHz Intel CPU, Hyperthreading ON
1GB RAM
100Mb Ethernet


The thing that stands out most clearly is that CIFS/SMB's caching mechanism is far better than NCP's. In several of the test types, throughputs were reported in the 'jaw dropping' range for CIFS, and that can only be attributed to pretty aggressive caching. Though once file-sizes get much above 128M, caching only goes so far and you start getting a feel for the efficiency of the base filesystem and network I/O.

That said, probably the best way to test the base system is what IOZONE calls the 'Backward Read' test. The test reads the file backward, so caching mechanisms would have to be specifically designed to handle that case. This is the only test where NCP-on-NW stomped CIFS-on-NW (mostly) across the board, and even there the performance increase was on the order of 5-15%. The one area of that test where CIFS-on-NW beat out NCP-on-NW was the 64K file-size with 4K records, where CIFS-on-NW was on average 13% faster.

The performance of the network caching is interesting. It is STILL common in the support forums for the Sysops to recommend turning off NetWare's file-caching features due to ongoing bugs. Yet in a benchmark I read a couple of years ago comparing NetWare against the then just-released Windows 2003 Server, NetWare only beat the Windows server on file-system performance with caching and oplocks turned on. At the time, that configuration was known in the support forums to be unstable.

Another thing to note in the data I have now is that network I/O is more of a bottleneck than raw disk I/O. Results in the graph that are higher than the theoretical 100Mb Ethernet maximum have to be, by definition, the result of client-side caching. This is an important distinction, since our file-servers' performance will be judged by how zippy they seem to end-users on mapped drives, not by the performance of web/db applications hosted on the file-server.
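The "higher than the theoretical 100Mb Ethernet max" test is simple arithmetic: 100 megabits per second is 12,500,000 bytes per second, or about 12,207 KB/s if a KB is 1024 bytes (an assumption about IOZONE's units). A Python sketch of the check follows. The real ceiling is lower once Ethernet, TCP/IP and NCP or SMB overhead are counted, so this threshold is generous.

```python
def wire_ceiling_kb_per_s(link_mbit: float = 100.0) -> float:
    """Raw line rate in KB/s (1 KB = 1024 bytes), before any protocol overhead."""
    return link_mbit * 1_000_000 / 8 / 1024


def must_be_client_cached(result_kb_per_s: float, link_mbit: float = 100.0) -> bool:
    """True if a result beats what the wire can carry, so it cannot be network I/O."""
    return result_kb_per_s > wire_ceiling_kb_per_s(link_mbit)


if __name__ == "__main__":
    print(wire_ceiling_kb_per_s())  # 12207.03125
    for kbps in (9_000.0, 250_000.0):
        print(kbps, must_be_client_cached(kbps))
```

A result below the ceiling isn't proof that no cache was involved; the check only rules results in as cached, never out.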

Keep in mind, this is just the very early look at the data. I haven't done nearly enough work to draw conclusions. For instance, our Novell Client build may turn off client-side caching in a way I'm not familiar with. These things need checking.


