Wednesday, December 23, 2009

That TCP Windowing fault

Here is the smoking gun, let me show you it (new window).

That's an entire TCP segment. Packet 339 there is the end of the TCP window as far as the NetWare side is concerned. Packet 340 is a delayed ACK, which is a normal TCP timeout. Then follows a somewhat confusing series of packets and the big delay in packet 345.

That pattern, the 200ms delay, and 5 packets later a delay measurable in full seconds, is common throughout the capture. They seem to happen on boundaries between TCP windows. Not all windows, but some windows. Looking through the captures, it seems to happen when the window has an odd number of packets in it. The Windows server is ACKing after every two packets, which is expected. It's when it has to throw a Delayed ACK into the mix, such as the odd packet at the end of a 27 packet window, is when we get our unstable state.

The same thing happened on a different server (NW65SP8) before I turned off "Receive Window Auto Tuning" on the Server 2008 server. After I turned that off, the SP8 server stopped doing that and started streaming at expectedly high data-rates. The rates still aren't as good as they were when doing the same backup to the Server 2003 server, but at least it's a lot closer. 28 hours for this one backup versus 21, instead of over 5 days before I made the change.

The packets you see are for an NW65 SP5 server after the update to the Windows server. Clearly there are some TCP/IP updates in the later NetWare service-packs that help it talk to Server 2008's TCP/IP stack.

Labels: , ,


Saturday, December 19, 2009

Sniffing packets

When I first started this sysadmin gig 'round about 1997, Windows based packet sniffers were still in their infancy. In fact, the word 'sniffer' was (and probably still is) a trademarked term for the software and hardware package for, er, sniffing packets. Sniffer. So when I needed to figure out a problem on the network, I went to the Network Guys who plugged their Sniffer into any available port on the 10baseT hub I needed analysis on and went to work. They told me what was wrong. Like a JetDirect card transmitting packets whenever it sensed a packet on the wire, thus bringing the network to is knees. Things like that.

Time passed and Sniffer was bought by Network Associates. Who then added a zero to the price because that package really did have a lock on the market. The next rev then more than doubled the already inflated price. So when it came time to renew/upgrade, our Sniffer couldn't handle Fast Ethernet, the price was eye watering. So. On came the free sniffers.

At first I was using Ether Boy, a now long lost packet sniffer. But eventually I found Ethereal (now WireShark), and I went to work. By the time I left my old job in 2003 I already had a rep for knowing WTF I was looking at, and the network guys didn't bat an eyelash when I asked for a span port. This ability was very handy when diagnosing slow Novell logins.

Fast forward to now. Right now I'm trying to figure out why the heck a certain NetWare server is so slow talking to the Data Protector media agent. It isn't obviously a TSA problem, but I've had problems with DP and NW talking to each other on the TCP level so that's where I'm looking now. Unfortunately for me, the desktop-grade GigE nic I have on the span isn't, shall we say, resourced enough to sniff a full GigE stream without at least a few buffer overruns. So I'm not getting ALL of the packets.

When I asked for the span port, the telecom guy said he greatly respected my ability to dig in to TCP issues. And said it in the voice of, "I think you're better at that kind of troubleshooting than we are." Which is a bit disconcerting to hear from your telecom router-gods. But there it is. What it means is that I can't very well ask for help interpreting these traces.

So far I've been able to determine that there is something hinky going on with network delays. There are some 200ms delays in there, which hints strongly at a failed protocol negotiation somewhere. But there are some rather longer delays, and it could be due to window size negotiation problems. Server 2008, the media-agent server, has a much newer TCP/IP stack than NetWare so it is entirely possible that they just don't work well together. I don't understand that quite well enough to manually deconstruct what's going on, so that's what I'm googling on right now.

And why Saturday? Because of course the volume that's doing this is our single largest and it is on the weekend where it is in the failed state where I can pry the hood off and look. Who knows, I may resort to posting packets and crowd sourcing the problem.

Update 12/23/09: Found it.

Labels: ,


Thursday, December 10, 2009

Old hardware

Watching traffic on the opensuse-factory mailing list has brought home one of the maxims of Linuxdom that has been true for over a decade: People run Linux on some really old crap. And really, it makes sense. How much hardware do you really need for a router/firewall between your home network and the internet? Shoving packets is not a high-test application if you only have two interfaces. Death and fundamental hardware speed-limits are what kills these beasts off.

This is one feature that Linux shares with NetWare. Because NetWare gets run on some really old crap too, since it just works, and you don't need a lot of hardware for a file-server for only 500 people. Once you get over a 1000 or very large data-sets the problem gets more interesting, but for general office-style documents... you don't need much. This is/was one of the attractions for NetWare, you need not much hardware and it runs for years.

On the factory mailing list people have been lamenting recent changes in the kernel and entire environment that has been somewhat deleterious for really old crap boxes. The debate goes back and forth, but at the end of the day the fact remains that a lot of people throw Linux on hardware they'd otherwise dispose of for being too old. And until recently, it has just worked.

However, the diaspora of hardware over the last 15 years has caught up to Linux. Supporting everything sold in the last 15 years requires a HELL of a lot of drivers. And not only that, but really old drivers need to be revised to keep up with changes in the kernel, and that requires active maintainers with that ancient hardware around for testing. These requirements mean that more and more of these really old, or moderately old but niche, drivers are drifting into abandonware-land. Linux as an ecosystem just can't keep up anymore. The Linux community decries Windows for its obsession with 'backwards compatibility' and how that stifles innovation. And yet they have a 12 year old PII box under the desk happily pushing packets.

NetWare didn't have this problem, even though it's been around longer. The driver interfaces in the NetWare kernel changed a very few times over the last 20 years (such as the DRV to HAM conversion during the NetWare 4.x era, and the introduction of SMP later on) which allowed really old drivers to continue working without revision for a really long time. This is how a 1998 vintage server could be running in 2007, and running well.

However, Linux is not NetWare. NetWare is a special purpose operating system, no matter what Novell tried in the late 90's to make it a general purpose one (NetWare + Apache + MySQL + PHP = a LAMP server that is far more vulnerable to runaway thread based DoS). Linux is a general purpose operating system. This key difference between the two means that Linux got exposed to a lot more weird hardware than NetWare ever did. SCSI attached scanners made no sense on NetWare, but they did on Linux 10 years ago. Putting any kind of high-test graphics card into a NetWare server is a complete waste, but on Linux it'll give you those awesome wibbly-windows.

There comes a time when an open source project has to cut away the old stuff. Figuring this out is hard, especially when the really old crap is running under desks or in closets entirely forgotten. It is for this reason that Smolt was born. To create a database of hardware that is running Linux, as a way to figure out driver development priorities. Both in creating new, missing drivers, and keeping up old but still frequently used drivers.

If you're running a Pentium 2-233 machine as your network's NTP server, you need to let the Linux community know about it so your platform maintains supportability. It is no longer good enough to assume that if it worked in Linux once, it'll always work in Linux.

Labels: , , ,


Monday, December 07, 2009

Account lockout policies

This is another area where how Novell and Microsoft handle a feature differ significantly.

Since NDS was first released back at the dawn of the commercial internet (a.k.a. 1993) Novell's account lockout policies (known as Intruder Lockout) were set-able based on where the user's account existed in the tree. This was done per Organizational-Unit or Organization. In this way, users in .finance.users.tree could have a different policy than .facilities.users.tree. This was the case in 1993, and it is still the case in 2009.

Microsoft only got a hierarchical tree with Active Directory in 2000, and they didn't get around to making account lockout policies granular. For the most part, there is a single lockout policy for the entire domain with no exceptions. 'Administrator' is subjected to the same lockout as 'Joe User'. With Server 2008 Microsoft finally got some kind of granular policy capability in the form of "Fine Grained Password and Lockout Policies."

This is where our problem starts. You see, with the Novell system we'd set our account lockout policies to lock after 6 bad passwords in 30 minutes for most users. We kept our utility accounts in a spot where they weren't allowed to lock, but gave them really complex passwords to compensate (as they were all used programatically in some form, this was easy to do). That way the account used by our single-signon process couldn't get locked out and crash the SSO system. This worked well for us.

Then the decision was made to move to a true blue solution and we started to migrate policies to the AD side where possible. We set the lockout policy for everyone. And we started getting certain key utility accounts locked out on a regular basis. We then revised the GPOs driving the lockout policy, removing them from the Default Domain Policy, creating a new "ILO polcy" that we applied individually to each user container. This solved the lockout problem!

Since all three of us went to class for this 7-9 years ago, we'd forgotten that AD lockout policies are monolithic and only work when specified in Default Domain Policy. They do NOT work per-user the way they are in eDirectory. By doing it the way we did, no lockout policies were being applied anywhere. Googling on this gave me the page for the new Server 2008-era granular policies. Unfortunately for us, it requires the domain to be brought to the 2008 Functional Level, which we can't do quite yet.

What's interesting is a certain Microsoft document that suggested settings of 50 bad logins every 30 minutes as a way to avoid DoSing your needed accounts. That's way more that 6 every 30.

Getting the forest functional level raised just got more priority.

Labels: , , , , , ,


Thursday, October 22, 2009

Windows 7 releases!

Or rather, its retail availability is today. We're on a Microsoft agreement, so we've had it since late August. And boy do I know that. I've been having a trickle of calls and emails ever since the beta released about various ways Win7 isn't working in my environment and whether I have any thoughts about that. Well, I do. As a matter of fact, Technical Services and ATUS both have thoughts on that:

Don't use it yet. We're not ready. Things will break. Don't call us when it does.

But as with any brand new technology there is demand. Couple that with the loose 'corporate controls' inherent in a public Higher Ed institution and we have it coming in anyway. And I get calls when people can't get to stuff.

The main generator of calls is our replacement of the Novell Login Script. I've spoken about how we feel about our login script in the past. Back on July 9, 2004 I had a long article about that. The environment has changed, but it still largely stands. Microsoft doesn't have a built in login script the same way NetWare/OES has had since the 80's, but there are hooks we can leverage. One of my co-workers has built a cunning .VBS file that we're using for our login script, and does the kinds of things we need out of a login script:
  • Run a series of small applications we need to run, which drive the password change notification process among other things.
  • Maps drives based on group membership.
  • Maps home directories.
  • Allows shelling out to other scripts, which allows less privileged people to manage scripts for their own users.
A fair amount of engineering did go into that script, but it works. Mostly. And that's the problem. It works good enough that at least one department on campus decided to put Vista in their one computer lab and rely on this script to get drive mappings. So I got calls shortly after quarter-start to the effect of, "your script don't work, how can this be fixed." To which my reply was (summarized), "You're on Vista and we told y'all not to do that. This isn't working because of XYZ, you'll have to live with it." And they have, for which I am greatful.

Which brings me to XYZ and Win7.

The main incompatibility has to do with the NetWare CIFS stack. Which I describe here. The NetWare CIFS stack doesn't speak NTLMv2, only LM and NTLM. In this instance, it makes it similar to much older Samba versions. This conflicts with Vista and Windows 7, which both default their LAN Manager Authentication Level to "NTLMv2 Responses Only." Which means that out of the box both Vista and Win7 will require changes to talk to our NetWare servers at all. This is fine, so long as they're domained we've set a Group Policy to change that level down to something the NetWare servers speak.

That's not all of it, though. Windows 7 introduced some changes into the SMB/CIFS stack that make talking to NetWare a bit less of a sure thing even with the LAN Man Auth level set right. Perhaps this is SMB2 negotiations getting in the way. I don't know. But for whatever reason, the NetWare CIFS stack and Win7 don't get along as well as the Vista's SMB/CIFS stack did.

The main effect of this is that the user's home-directory will fail to mount a lot more often on Win7 than on Vista. Also, other static drive mappings will fail more often. It is reasons like these that we are not recommending removing the Novell Client and relying on our still in testing Windows Login Script.

That said, I can understand why people are relying on the crufty script rather than the just-works Novell Login Script. Due to how our environment works, The Vista/Win7 Novell Client is dog slow. Annoyingly slow. So annoyingly slow that not getting some drives when you log in is preferable to dealing with it.

This will all change once we move the main file-serving cluster to Windows 2008. At that point, the Windows script should Just Work (tm). At that point, getting rid of the Novell Client will allow a more functional environment. We are not at that point yet.

Labels: , , ,


Tuesday, September 01, 2009

NetWare and Snow Leopard

In case you hadn't heard, the early release of Snow Leopard has tripped up Novell a bit.

http://www.novell.com/products/openenterpriseserver/snowleopard.html

What's interesting is that they'll be releasing a fix for NetWare too, not just OES. This suggests that the breakage isn't something like depreciating older authentication protocols, rather changing how such protocols are handled. That way the amount of engineering required is a lot less than trying to get Diffie Helman into NetWare.

Labels: ,


Thursday, August 06, 2009

Permission differences

In part, this blog post could have been written in 1997. We haven't exactly beaten down the door migrating away from NetWare.

Anyway, there are two areas that are vexing me regarding the different permissioning models between how Novell does it, and how Microsoft does it. The first has been around since the NT days, and relates to the differences (vast differences) between NTFS and the Trustee model. The second has to do with Active Directory permissions.

First, NTFS. As most companies contemplating a move from NetWare to Microsoft undoubtedly find out, Microsoft does permissions differently. First and foremost, NTFS doesn't have the concept of the 'visibility list', which is what allows NetWare to do this:

Grant a permission w-a-y down a directory tree.
Members of that rights grant will be able to browse from volume-root to that directory. They will see each directory entry along the path, and nothing else. Even if they have no rights to the intervening directories.

NTFS doesn't do that. In order to fake it you need two things:
  • Access Based Enumeration turned on on the share (default in Server 2008, and add-on option in Server 2003)
  • A specific rights grant on each directory between the share and the directory with the rights grant. The "Read" simple right granted to "this directory only".
Unfortunately, the second one is tricky. In order to grant it you have to add an Advanced right, because the "read" simple right grants read to, "This directory, files, and subdirectories," when what you want is, "this directory only". What this does is grant you the right to see that directory-entry in the previous directory's list.

Example: if I grant the group "StateAuditors" the write access to this directory:

\\accountServ\SharedFiles\Accounting\StandardsOffice\DocTeam\Procedures\

If I just grant the right directly on "Procedures", the StateAuditors won't be able to get to that directory by way of that share. I could just create a new share on that spot, and it'd work. Otherwise, I'll have to grant the above mentioned rights to each of DocTeam, StandardsOffice, and Accounting.

It can be done, and it can even be scripted, but it represents a significant change in thinking required when it comes to handling permissions. As most permissions are handled by our Desktop group, this will require retraining on their part.

Second, AD permissions. AD, unlike eDirectory, does not allow the permissions short-cut of assigning a right to an OU. In eDirectory, this allowed anything in that OU access to the whatever. In AD, you can't grant the permission in the first place without a lot of trouble, and it won't work like you expect even if you do manage to assign it.

This is going to be a problem with printers. In the past, when creating new print objects for Faculty/Staff printers, I'd grant the users.wwu OU rights to use the printer. As students aren't in the access list, they can't print to it unless they're in a special printer-access group. All staff can print, but only special students can. As it should be. No biggie.

AD doesn't allow that. In order to allow "all staff but no students" to print to a printer, I'd have to come up with a group of some kind that contains all staff. That's going to be too unwieldy for words, so we have to go to the 'printer access group for everyone' model. Since I'm the one that sets up printer permissions, this is something *I* have to keep in mind.

Labels: , ,


Tuesday, July 21, 2009

Digesting Novell financials

It's a perennial question, "why would anyone use Novell any more?" Typically coming from people who only know Novell as "That NetWare company," or perhaps, "the company that we replaced with Exchange." These are the same people who are convinced Novell is a dying company who just doesn't know it yet.

Yeah, well. Wrong. Novell managed to turn the corner and wean themselves off of the NetWare cash-cow. Take the last quarterly statement, which you can read in full glory here. I'm going to excerpt some bits, but it'll get long. First off, their description of their market segments. I'll try to include relevant products where I know them.

We are organized into four business unit segments, which are Open Platform Solutions, Identity and Security Management, Systems and Resource Management, and Workgroup. Below is a brief update on the revenue results for the second quarter and first six months of fiscal 2009 for each of our business unit segments:



Within our Open Platform Solutions business unit segment, Linux and open source products remain an important growth business. We are using our Open Platform Solutions business segment as a platform for acquiring new customers to which we can sell our other complementary cross-platform identity and management products and services. Revenue from our Linux Platform Products category within our Open Platform Solutions business unit segment increased 25% in the second quarter of fiscal 2009 compared to the prior year period. This product revenue increase was partially offset by lower services revenue of 11%, such that total revenue from our Open Platform Solutions business unit segment increased 18% in the second quarter of fiscal 2009 compared to the prior year period.

Revenue from our Linux Platform Products category within our Open Platform Solutions business unit segment increased 24% in the first six months of fiscal 2009 compared to the prior year period. This product revenue increase was partially offset by lower services revenue of 17%, such that total revenue from our Open Platform Solutions business unit segment increased 15% in the first six months of fiscal 2009 compared to the prior year period.

[sysadmin1138: Products include: SLES/SLED]



Our Identity and Security Management business unit segment offers products that we believe deliver a complete, integrated solution in the areas of security, compliance, and governance issues. Within this segment, revenue from our Identity, Access and Compliance Management products increased 2% in the second quarter of fiscal 2009 compared to the prior year period. In addition, services revenue was lower by 45%, such that total revenue from our Identity and Security Management business unit segment decreased 16% in the second quarter of fiscal 2009 compared to the prior year period.

Revenue from our Identity, Access and Compliance Management products decreased 3% in the first six months of fiscal 2009 compared to the prior year period. In addition, services revenue was lower by 40%, such that total revenue from our Identity and Security Management business unit segment decreased 18% in the first six months of fiscal 2009 compared to the prior year period.

[sysadmin1138: Products include: IDM, Sentinal, ZenNAC, ZenEndPointSecurity]



Our Systems and Resource Management business unit segment strategy is to provide a complete “desktop to data center” offering, with virtualization for both Linux and mixed-source environments. Systems and Resource Management product revenue decreased 2% in the second quarter of fiscal 2009 compared to the prior year period. In addition, services revenue was lower by 10%, such that total revenue from our Systems and Resource Management business unit segment decreased 3% in the second quarter of fiscal 2009 compared to the prior year period. In the second quarter of fiscal 2009, total business unit segment revenue was higher by 8%, compared to the prior year period, as a result of our acquisitions of Managed Object Solutions, Inc. (“Managed Objects”) which we acquired on November 13, 2008 and PlateSpin Ltd. (“PlateSpin”) which we acquired on March 26, 2008.

Systems and Resource Management product revenue increased 3% in the first six months of fiscal 2009 compared to the prior year period. The total product revenue increase was partially offset by lower services revenue of 14% in the first six months of fiscal 2009 compared to the prior year period. Total revenue from our Systems and Resource Management business unit segment increased 1% in the first six months of fiscal 2009 compared to the prior year period. In the first six months of fiscal 2009 total business unit segment revenue was higher by 12% compared to the prior year period as a result of our Managed Objects and PlateSpin acquisitions.

[sysadmin1138: Products include: The rest of the ZEN suite, PlateSpin]



Our Workgroup business unit segment is an important source of cash flow and provides us with the potential opportunity to sell additional products and services. Our revenue from Workgroup products decreased 14% in the second quarter of fiscal 2009 compared to the prior year period. In addition, services revenue was lower by 39%, such that total revenue from our Workgroup business unit segment decreased 17% in the second quarter of fiscal 2009 compared to the prior year period.

Our revenue from Workgroup products decreased 12% in the first six months of fiscal 2009 compared to the prior year period. In addition, services revenue was lower by 39%, such that total revenue from our Workgroup business unit segment decreased 15% in the first six months of fiscal 2009 compared to the prior year period.

[sysadmin1138: Products include: Open Enterprise Server, GroupWise, Novell Teaming+Conferencing,

The reduction in 'services' revenue is, I believe, a reflection in a decreased willingness for companies to pay Novell for consulting services. Also, Novell has changed how they advertise their consulting services which seems to also have had an impact. That's the economy for you. The raw numbers:


Three months ended


April 30, 2009

April 30, 2008

(In thousands)


Net revenue
Gross
profit


Operating
income (loss)


Net revenue
Gross
profit


Operating
income (loss)

Open Platform Solutions


$ 44,112
$ 34,756

$ 21,451

$ 37,516
$ 26,702

$ 12,191

Identity and Security Management



38,846

27,559


18,306


46,299

24,226


12,920

Systems and Resource Management



45,354

37,522


26,562


46,769

39,356


30,503

Workgroup



87,283

73,882


65,137


105,082

87,101


77,849

Common unallocated operating costs





(3,406 )

(113,832 )



(2,186 )

(131,796 )























Total per statements of operations


$ 215,595
$ 170,313

$ 17,624

$ 235,666
$ 175,199

$ 1,667



























Six months ended


April 30, 2009

April 30, 2008

(In thousands)


Net revenue
Gross
profit


Operating
income (loss)


Net revenue
Gross
profit


Operating
income (loss)

Open Platform Solutions


$ 85,574
$ 68,525

$ 40,921

$ 74,315
$ 52,491

$ 24,059

Identity and Security Management



76,832

52,951


35,362


93,329

52,081


29,316

Systems and Resource Management



90,757

74,789


52,490


90,108

74,847


58,176

Workgroup



177,303

149,093


131,435


208,840

173,440


155,655

Common unallocated operating costs





(7,071 )

(228,940 )



(4,675 )

(257,058 )























Total per statements of operations


$ 430,466
$ 338,287

$ 31,268

$ 466,592
$ 348,184

$ 10,148

So, yes. Novell is making money, even in this economy. Not lots, but at least they're in the black. Their biggest growth area is Linux, which is making up for deficits in other areas of the company. Especially the sinking 'Workgroup' area. Once upon a time, "Workgroup," constituted over 90% of Novell revenue.
Revenue from our Workgroup segment decreased in the first six months of fiscal 2009 compared to the prior year period primarily from lower combined OES and NetWare-related revenue of $13.7 million, lower services revenue of $10.5 million and lower Collaboration product revenue of $6.3 million. Invoicing for the combined OES and NetWare-related products decreased 25% in the first six months of fiscal 2009 compared to the prior year period. Product invoicing for the Workgroup segment decreased 21% in the first six months of fiscal 2009 compared to the prior year period.
Which is to say, companies dropping OES/NetWare constituted the large majority of the losses in the Workgroup segment. Yet that loss was almost wholly made up by gains in other areas. So yes, Novell has turned the corner.

Another thing to note in the section about Linux:
The invoicing decrease in the first six months of 2009 reflects the results of the first quarter of fiscal 2009 when we did not sign any large deals, many of which have historically been fulfilled by SUSE Linux Enterprise Server (“SLES”) certificates delivered through Microsoft.
Which is pretty clear evidence that Microsoft is driving a lot of Novell's Operating System sales these days. That's quite a reversal, and a sign that Microsoft is officially more comfortable with this Linux thing.

Labels: , , , , , , , ,


Monday, May 11, 2009

Rebuilds and nwadmin

Friday afternoon the Kala server, one of our three primary eDirectory replica servers, died. In event I've never seen before, one hard drive of a mirrored pair failed in such a way that bad data got committed. This server had to be rebuilt.

Happily for me, this is a procedure I can do without having to look things up in the Novell KB. This is part of the reason the letters "CNE" follow my name. The procedure is pretty straight-forward and I've done it before.
  1. Remove dead server's objects from the tree
  2. Designate a new server as the Master for any replica this server was the master of (all of them, as it happened)
  3. Install server fresh
The details change somewhat over time, but that's the same workflow it has been since the NetWare 4 days. In my case I did hit the KB to see if there was a way to do step 2 in iMonitor. I couldn't find one, so I did it through DSREPAIR which works just fine.

As for the install... this server is an HP BL20P G3, which means I used the procedure I documented a while back (Novell, local copy). A few minor steps changed (the INSERT Linux I used then now correctly handles SmartArray cards), but otherwise that's what I did. Still works.

For a wonder, our SSL administrator still had the custom SSL certificate we created for this server three years ago. That saved me the step of creating a CSR and setting up all the Subject Alternate Names we needed.

And today I fired up NWADMIN for the first time in not nearly long enough to associate the SLP scope to this server, since it was one of our two DA's. I could probably have done the same thing in iManager with "Other" attributes, but... why risk not getting all the right attributes associated when I have a tool that has all the built-in rules. This is the one thing that I still have NWAdmin around for. SLP-on-NetWare management.

Labels: , , ,


Wednesday, May 06, 2009

Windows 7 and NetWare CIFS

Now that RC1 is out we're trying things. Aaaaaand it doesn't work, even when set to the least restrictive Lan Man authentication level. Also? Windows 7 has a lot more NTLM tweakables in the policy settings that we don't understand. But one thing is clear, Windows 7 will not talk to NetWare CIFS out of the box. The Win7 workstation will need some kind of tweaks.

I may need to break out Wireshark and see what the heck is going on at the packet level.


Life on the bleeding edge, I tell you.

Update: Know what? It was a name-resolution issue. It seems that once you went to the resource with its FQDN rather than the short name, the short names started working. Kind of odd, but that's what did it. A bit of packet sniffing may illuminate why the short-name method didn't work at first (it should) which just might illuminate either a bug in Win7, or a simple feature of Windows name-resolution protocols.

The only change that needed to be made was drop the LAN Manager Authentication Level to not offer NTLMv2 responses unless negotiated for it.

Labels: , , ,


Monday, May 04, 2009

Cooperative multitasking and security, a browser perspective

An article over on Ars Technica goes into some detail about a dispute between two Firefox extensions that has gotten nasty. It seems that the extension environment inside Firefox (and that other Mozilla browser, SeaMonkey) is not sandboxed to any significant degree. The developer of NoScript was able to write in a complete disabling of the AdBlock Pro extension.

In some ways this reminds me strongly of how NetWare works. NetWare uses cooperative multitasking, rather than the preemptive multitasking used is pretty much every other modern server-class OS. This is part of the reason NetWare can squeeze out the performance it does under high load. The Firefox extensions run as children of the main Firefox process, can freely interact with each other, and when they crash hard can take the whole environment down with it.

Another way it reminds me of NetWare is the seeming lack of memory protection. In NetWare, unless you specifically declare that a module is to run in a protected memory space, it runs in the Kernel's memory space. This means that one program can access the memory of another program, so long as they're in the same memory space. This stands in contrast to other operating systems which provide a protected memory space for each process. It sounds to me like Firefox has a single memory space equivalent, and all process have access to it. This allows the extension-war described above to be possible without resorting to outright hacking around security features.

Moving away from a cooperative multitasking model made Windows at lot more stable (Win3.11 to Win95) as did the introduction of true memory protection (Win98 to Win2K). Memory protection is a major security barrier, and is something that Firefox seems to lack. If a Firefox install is unlucky enough to have an evil extension added to it, all the data that passes through that browser could be copied to persons nefarious, much like the Browser Helper Objects in IE.

Does it seem odd to you that I'm talking about Operating System protection features being built into browsers? These days browsers are in large part operating systems on their own, albeit ones missing the key feature of having exclusive control of the hardware.

These problems to appear to be on the verge of changing. Both Chrome and IE8 launch each browser tab in its own process, which adds a barrier between processes spawned in those tabs. That way when a flash game starts consuming all available CPU, you can kill the tab and keep the rest of your browser session running. Unfortunately, process separation still won't stop NoScript from killing AdBlocker. For that, more work needs to be done.

Labels: ,


Thursday, April 30, 2009

Windows 7 RC is out

And they're saying that Win7 will ship well before the Jan-2010 Vista timeframe mentioned before. Well, we kind of expected that. We've also been doing a lot with our network to make it more Win7 (and Vista) friendly, since we know we'll get a LOT of Win7 once it shows up for real.

The biggest concern is that Microsoft still hasn't fixed the issue that makes the Novell Client for Vista so darned slow. This is a major deal-breaker for us, so we've been informed from on high to Do Something so our Vista/Win7 clients can have fast file-serving and printing.

That "Something" has been to turn on the CIFS stack on our NetWare servers, with domain integrated login. The Vista and Win7 clients will have to turn their LanMan Authentication Level from the default (and secure) setting of, "Send NTLMv2 Response Only" to at most, "Send NTLM Response Only." The NetWare CIFS stack can't handle NTLMv2, nor will it ever. Those people who have been suffering through the NCV get downright bouncy when they see how fast it is.

Printing... we'll see. A LOT of the printing in fac/staff land is direct-IP which has no Novell dependencies. There are a few departments out there that have enough print volume that a print-server is a good idea, so I'm hoping there is an iPrint client for Win7 out pretty fast.

All in all, we're expecting uptake of Win7 to be a lot faster than Vista ever was. In this sense Win7 is a lot like Win98SE. All the press saying that Win7 is a lot better than Vista will help drive the push away from WinXP.

Labels: , , ,


Wednesday, April 22, 2009

A new version of BIND

I saw on the SANS log today that the ISC is starting work on BIND10. A list of the new stuff can be found here. A couple of those items are very interesting to me. Specifically the Modularity and Clustering items.

Modularity:
...the selection of a variety of back-ends for data storage, be it the current in-memory database, a traditional SQL-based server, an embedded database engine or back-ends for specific applications such as a high performance, pre-compiled answer database.
Which makes me think of eDirectory backed DNS. Novell has had this for ages with NetWare, and from what I recall it was based on BIND. But... BIND8. BIND10 would formalize this in the linux base, which would further allow Novell to publish a more 'pure' eDir-integrated BIND.

Clustering:
run on multiple but related systems simultaneously, using a pluggable, open-source architecture to enable backbone communications between individual members of the cluster. These coordination services would enable a server farm to maintain consistency and coherence.
This is exactly what AD-integrated DNS and the DNS on NetWare has been doing for over 8 years now. Glad to see BIND catch up.

The big thing about using a database of some kind as the back-end for DNS is that you no longer have to create Secondary servers and muck about with Zone Transfers. For domains that change on a second by second basis, such as an AD DNS domain with dynamic updates enabled and thousands of computers during morning power-on, it is entirely possible for a BIND secondary-server to be missing many, many DNS updates. Microsoft has known about this issue, which is why they have their own directory-integrated DNS service.

This also shows just how creaky the NetWare DNS service really is. That's based on BIND8 code, which is now over 10 years old. Very creaky.

I'm looking forward to BIND10. It is a needed update that addresses DNS as it is done today, and would better enable BIND to handle large Active Directory domains.

Labels: , , , ,


Wednesday, April 15, 2009

Windows 7 forces major change

I've said before that you'll have to pry the login-script out of our cold dead hands. The simple Novell login-script is the single most pervasive workstation management tool we have, since EVERYONE needs the Novell Client to talk to their file servers. Its one reason we have computer labs when others are paring down or getting rid of theirs. People can live without the Zen agents if they work at it, but they can't live without the Novell Client. Therefore, we do a lot of our workstation management through the login-script.

The Vista client has been vexing in this regard since it is so painfully slow in our clustered environment. The reason it is slow is the same reason the first WinXP clients were slow, the Microsoft and Novell name-resolution processes conmpete in bad ways. As each drive letter we map is its own virtual-server, every time you attempt to display a Save/Open box or open Windows Explorer it has to resolve-timeout-resolve each and every drive letter. This means that opening a Save/Open box on a Vista machine running the Novell client can take upwards of 5 minutes to display thanks to the timeouts. Novell knows about this issue, and has reported it to Microsoft. This is something Microsoft has to fix, and they haven't yet.

This is vexing enough that certain highly influential managers want to make sure that the same thing doesn't happen again for Windows 7. As anyone who follows any piece of the tech media knows, Windows 7 has been deemed, "Vista done right," and we expect a lot faster uptake of Win7 than WinVista. So we need to make sure our network can accommodate that on release-day. Make it so, said the highly placed manager. Yessir, we said.

So last night I turned CIFS on for all the file services on the cluster. It was that or migrate our entire file-serving function to Windows. The choice, as you can expect, was an easy one.

This morning our Mac users have been decidedly gleeful, as CIFS has long password support where AFP didn't. The one sysadmin here in techservices running Vista as his primary desktop has uninstalled the Novell Client and is also cheerful. Happily for us, the directive from said highly placed manager was accompanied by a strong suggestion to all departments that domaining PCs into the AD domain would be a Really Good Idea. This allows us to use the AD login-script, as well as group-policies, for those Windows machines that lack a Novell Client.

Ultimately, I expect the Novell Client to slowly fade away as a mandatory install. So that clientless-future I said we couldn't take part in? Microsoft managed to push us there.

Labels: , , , ,


Tuesday, February 17, 2009

tsatest and incrementals

Today I learned how to tell TSATEST to do an incremental backup. I also learned that the /path parameter requires the DOS namespace name. Example:

tsatest /V=SHARE: /path=FACILI~1 /U=.username.for.backup /c=2

That'll do an incremental (files with the Archive bit set) backup of that specific directory, on that specific volume.

Labels: , , ,


Wednesday, February 11, 2009

Performance tuning Data Protector for NetWare

HP Data Protector has a client for NetWare (and OES2, but I'm not backing up any of those yet). This is proving to take a bit of TSA tuning to work out right. I haven't figured out where the problem exactly is, but I've worked around it.

The following settings are what I've got running right now, and seems to work. I may tweak later:

tsafs /readthreadsperjob=1
tsafs /readaheadthrottle=1

This seems to get around a contention issue I'm seeing with more aggressive settings, where the TSAFS memory will go the max allowed by the /cachememorythreshold setting and sit there, not passing data to the DP client. This makes backups go really long. The above setting somehow prevent this from happening.

If these prove stable, I may up the readaheadthrottle setting to see if it halts on that. This is an EVA6100 after all, so I should be able to go up to at least 18 if not 32 for that setting.

Labels: , ,


High availability

64-bit OES provides some options to highly available file serving. Now that we've split the non-file services out of the main 6-node cluster, all that cluster is doing is NCP and some trivial other things. What kinds of things could we do with this should we get a pile of money to do whatever we want?

Disclaimer: Due to the budget crisis, it is very possible we will not be able to replace the cluster nodes when they turn 5 years old. It may be easier to justify eating the greatly increased support expenses. Won't know until we try and replace them. This is a pure fantasy exercise as a result.

The stats of the 6-node cluster are impressive:
  • 12 P4 cores, with an average of 3GHz per core (36GHz).
  • A total of 24GB of RAM
  • About 7TB of active data
The interesting thing is that you can get a similar server these days:
  • HP ProLiant DL580 (4 CPU sockets)
  • 4x Quad Core Xeon E7330 Processors (2.40GHz per core, 38.4GHz total)
  • 24 GB of RAM
  • The usual trimmings
  • Total cost: No more than $16,547 for us
With OES2 running in 64-bit mode, this monolithic server could handle what six 32-bit nodes are handling right now. The above is just a server that matches the stats of the existing cluster. If I were to really replace the 6 node cluster with a single device I would make a few changes to the above. Such as moving to 32GB of RAM at minimum, and using a 2-socket server instead of a 4-socket server; 8 cores should be plenty for a pure file-server this big.

A single server does have a few things to recommend it. By doing away with the virtual servers, all of the NCP volumes would be hosted on the same server. Right now each virtual-server/volume pair causes a new connection to each cluster node. Right now if I fail all the volumes to the same cluster node, that cluster node will legitimately have on the order of 15,000 concurrent connections. If I were to move all the volumes to a single server itself, the concurrent connection count would drop to only ~2500.

Doing that would also make one of the chief annoyances of the Vista Client for Novell much less annoying. Due to name cache expiration, if you don't look at Windows Explorer or that file dialog in the Vista client once every 10 minutes, it'll take a freaking-long time to open that window when you do. This is because the Vista client has to enumerate/resolve the addresses of each mapped drive. Because of our cluster, each user gets no less than 6 drive mappings to 6 different virtual servers. Since it takes Vista 30-60 seconds per NCP mapping to figure out the address (it has to try Windows resolution methods before going to Novell resolution methods, and unlike WinXP there is no way to reverse that order), this means a 3-5 minute pause before Windows Explorer opens.

By putting all of our volumes on the same server, it'd only pause 30-60 seconds. Still not great, but far better.

However, putting everything on a single server is not what you call "highly available". OES2 is a lot more stable now, but it still isn't to the legendary stability of NetWare 3. Heck, NetWare 6.5 isn't at that legendary stability either. Rebooting for patches takes everything down for minutes at a time. Not viable.

With a server this beefy it is quite doable to do a cluster-in-a-box by way of Xen. Lay a base of SLES10-Sp2 on it, run the Xen kernel, and create four VMs for NCS cluster nodes. Give each 64-bit VM 7.75GB of RAM for file-caching, and bam! Cluster-in-a-box, and highly available.

However, this is a pure fantasy solution, so chances are real good that if we had the money we would use VMWare ESX instead XEN for the VM. The advantage to that is that we don't have to keep the VM/Host kernel versions in lock-step, which reduces downtime. There would be some performance degradation, and clock skew would be a problem, but at least uptime would be good; no need to perform a CLUSTER DOWN when updating kernels.

Best case, we'd have two physical boxes so we can patch the VM host without having to take every VM down.

But I still find it quite interesting that I could theoretically buy a single server with the same horsepower as the six servers driving our cluster right now.

Labels: , , , , , , ,


This page is powered by Blogger. Isn't yours?