September 2007 Archives

Novell client for Linux, part 2

I'm doing a test with the vanilla openSUSE kernel and the results are VASTLY BETTER. A snippet:
    KB  reclen   write  rewrite    read   reread
 51200     256    9178     9189    9130     8898
Compare that with the earlier numbers for that record size and you can see the problem.
    KB  reclen   write  rewrite    read   reread
 51200     256     595      587    8551     8491
See? It has to be something in the Xen Dom0 environment.

This would be a lot easier if Wireshark would stop freezing hard when I try to save a capture.

Novell Client for Linux

I'm running the beta of it, but keep in mind I'm also running it on openSUSE, which is not what it has been specifically designed for. That said, I do have some performance numbers. Yesterday I ran a very simple IOZONE test on a 50MB file. This isn't a great test since the file fits entirely inside the server's cache. But I'm not worried about that; I just want to see how fast this code can pedal.
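For the curious, a run like that can be reproduced with something along these lines, driven from Python so the output lands somewhere reusable. The flags and the mount path are assumptions from memory, not gospel:

# Rough sketch of the IOZONE run: one 50MB file, auto mode over the record sizes.
# The flags (-a auto mode, -s file size, -f target file) and the NCP mount path
# are from memory; verify against `iozone -h` before trusting them.
import subprocess

result = subprocess.run(
    ["iozone", "-a", "-s", "50m", "-f", "/mnt/ncpvol/iozone.tmp"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)  # the KB/reclen/write/rewrite/read/reread table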

Performance was... not good. Unfortunately, right now I can't tell if that is a side effect of running it on openSUSE 10.2 inside a Dom0 Xen domain. I want to re-run it with the standard kernel and see if the performance also doth suck.
    KB  reclen   write  rewrite    read   reread
 51200       4    3259     3218    3875     4026
 51200       8    5144     4925    5404     5215
 51200      16     193      196    6980     6788
 51200      32     593      599    8371     8309
 51200      64     609      594    8213     8437
 51200     128     589      590    8399     8432
 51200     256     595      587    8551     8491
 51200     512     598      595    8528     8439
 51200    1024     596      588    8580     8295
 51200    2048     594      596    8595     8542
 51200    4096     597      609    8258     8548
 51200    8192     595      599    8683     8683
 51200   16384     608      599    8673     8585
As you can see, performance at record sizes over 8KB is not great for writes. Reads, on the other hand, are quite zippy. Still not as zippy as the Windows client, which I've seen get up to 11000 on those tests. But still, zippy. I don't know where the problem is. I did some sniffing to try to figure it out, but nothing really stuck out as a cause. I'm seeing some 160ms delays in ACKs, but they're coming from the server. I can't tell what condition of the client-side write is causing the delayed ACK. Need more testing.
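When I do that testing, a quick pass with scapy (sketched below) could flag the slow ACKs without hand-scrolling through Wireshark. The capture file name and server address are placeholders, and the 150ms threshold just sits under the delays mentioned above:

# Flag ACKs from the server that arrive suspiciously long after the previous
# packet. Capture file and server IP are placeholders.
from scapy.all import rdpcap, IP, TCP

SERVER = "192.168.1.10"           # placeholder for the NCP server's address
packets = rdpcap("ncp-write-test.pcap")

prev_time = None
for pkt in packets:
    if IP in pkt and TCP in pkt:
        is_server_ack = pkt[IP].src == SERVER and pkt[TCP].flags & 0x10  # 0x10 = ACK bit
        if prev_time is not None and is_server_ack:
            gap = float(pkt.time) - prev_time
            if gap > 0.150:       # anything slower than ~150ms is interesting
                print(f"ACK at t={float(pkt.time):.3f}s came {gap*1000:.0f} ms after the previous packet")
        prev_time = float(pkt.time)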

OES2 release date

Just got out of the WebCast they had. First, the important stuff:

  • OES2 will be released on October 5th.
  • OES2-SP1 is targeted for mid-April, 2008.
  • AFP integration will be in SP1.

I sooooooooo hope they don't push SP1 past July. If that happens, my main migration of our cluster will have to be pushed to 2009. Ick. We're already running out of effective file-cache in 32-bit memory space. I need 64-bit to really give good performance. Hope hope hope.

A few other minor points:
  • Around the release of SP1, Prosoft and Condrey Consulting (Kanaka) will release an NCP client for Mac.
  • The clearing of throats next to a mic is a sign of someone who doesn't do a lot of work in front of mics.
  • OES2 is fully 64-bit optimized (on Linux)
  • They claim EVEN BETTER NSS performance on OES2. I hope to try that out, soon as I can figure out how to get SLES10/OES2-beta5 to talk to my SAN LUNs. It hates me.

OES2 Web-chat tomorrow

This isn't exactly widely publicized, but here it is:

Open Enterprise Server 2 Live Webcast

Tomorrow, September 26th at 11AM PDT.

They'll be talking about all the spiffy stuff that's in OES2, and some new info about code releases. I think this is the 'event' they mentioned a while back.

The perils of a manual process

Yesterday I found the root cause of a rather perplexing problem. We had a user, happily for me one of the other sysadmins at WWU, who couldn't get his eDir password changed. No matter how many times he ran the identity management process, his AD password would change, but his eDir password would not, even though the event reported success.

A word of note:

We do not use Novell Identity Management. We've built our own. When Novell first came out with DirXML 1.0, we already had the foundation of what we have right now. So, when I talk about IDM, I'm actually referring to our own self-built system, not Novell's IDM.

To troubleshoot, I ran many tests. The longest one was to turn on dstrace logging on the root replica server, and push changes to the object. I'd push a change, watch the logs, then parse through the log for the user's object.
  • Changing it via LDAP made a sync.
  • Changing it via the IDM did not make a sync.
  • Changing it via iManager made a sync.
  • Changing it via ConsoleOne on the IDM server made a sync.
This would point to some flaw in the IDM process. This is unlikely, as the password change logic has been largely unchanged for close to 7 years. The underlying libraries have also been unchanged for close to 3 years. Very unlikely to be that. What it could be, though, is some odd-ball untrapped error.

To figure out what that is, I needed to sniff packets. PKTSCAN to the rescue. On the IDM server I turned off connections to all but the server holding the Master replicas of everything. Then on the master replica server I loaded PKTSCAN. I turned on sniffing, made the change, waited 5 seconds just to be safe, turned off the sniff, saved it, and loaded it into Wireshark. The two cases I tested:
  • Change the concurrent connections attribute through IDM
  • Change the concurrent connections attribute through ConsoleOne on the IDM server
This is what showed my problem. When I did it through IDM, it was attempting to change the Concurrent Connections attribute of T=WWU. Ahem. When I did it through ConsoleOne, it was attempting to change the Concurrent Connections attribute of CN=[username].OU=Users.O=WWU. AHAH!

Looking at the details of T=WWU, I saw that it had an aux class associated with it. It was posixAccount. Thus, was I illuminated.
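A quick LDAP audit along these lines would have flagged the oddball right away: ask the tree which objects carry posixAccount and eyeball the list. This sketch uses python-ldap; the server name, credentials, and base DN are placeholders, and the tree-root object itself may still need iManager to inspect, so treat it as a first pass:

# First pass: list everything carrying the posixAccount aux class.
# Hostname, bind DN, password, and base DN are placeholders.
import ldap

conn = ldap.initialize("ldaps://edir-master.example.edu")
conn.simple_bind_s("cn=admin,o=WWU", "not-the-real-password")

results = conn.search_s(
    "o=WWU",                        # placeholder base DN
    ldap.SCOPE_SUBTREE,
    "(objectClass=posixAccount)",
    ["cn", "uidNumber", "loginShell"],
)

for dn, attrs in results:
    print(dn, attrs.get("uidNumber"))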

This particular sysadmin had requested to have his account 'turned on for Linux', which is code for having the posixAccount aux-class associated and the uid, gid, cn, and shell attributes added. This is still a manual process for us, since requests are few and far between, though that is changing. It would seem that when I did it, I right-clicked on the wrong object. Whoopsie poo! Easily fixed, though.

I removed the aux-class from the tree root object, and suddenly... IDM changes started applying to the right object! Hooray! I think the IDM code was keying off of commonName rather than CN for some reason, which is why the aux-class got in the way.

Neat eDir trick

One thing that I learned at BrainShare years ago is that eDir 8.7 permits LDAP clients to register for event notifications. Probably the most widely applicable devnet piece is the LDAP Classes for Java. From my understanding, this sort of technology is used in both Novell Identity Manager and NSure Audit.

So, what the heck is it? From the documentation:
The event system extension allows the client to specify the events for which it wants to receive notification. This information is sent in the extension request. If the extension request specifies valid events, the LDAP server keeps the connection open and uses the intermediate extended response to notify the client when events occur. Any data associated with an event is also sent in the response. If an error occurs when processing the extended request or during the subsequent processing of events, the server sends an extended response to the client containing error information and then terminates the processing of the request.
It's an extension to LDAP that Novell created to permit event monitoring. It monitors events in eDirectory, from object changes, to internal eDirectory statuses like obituary processing. For example, you can set up a connection and tell the LDAP server to tell you of all changes to the "member" attribute, and track all group modifications. Or track the "last login time" attribute, and create a robust login audit log.

Stuff like this is downright handy in identity management situations. If a change is made to "phoneNumber" in the Identity tree, that change can be trapped by the monitor, and propagated to the production eDir tree, Active Directory, and NIS+. What's now a batch process can be event based.
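A minimal sketch of the kind of dispatcher such an event feed could drive. Everything in it is hypothetical; the event dict and the push_* functions just stand in for whatever the extension actually delivers and whatever pushes changes downstream:

# Hypothetical event dispatcher: map a changed attribute to the downstream
# systems that care about it, one event at a time instead of a nightly batch.

def push_to_ad(dn, attr, value):
    print(f"AD: set {attr} on {dn} -> {value}")

def push_to_nis(dn, attr, value):
    print(f"NIS+: set {attr} on {dn} -> {value}")

ROUTES = {
    "phoneNumber": [push_to_ad, push_to_nis],
    "member": [push_to_ad],
}

def handle_event(event):
    """Called once per change notification from the LDAP event monitor."""
    for handler in ROUTES.get(event["attribute"], []):
        handler(event["dn"], event["attribute"], event["value"])

# Example event, shaped roughly like what a monitor might hand back:
handle_event({"dn": "cn=bob,ou=Users,o=WWU",
              "attribute": "phoneNumber",
              "value": "360-650-0000"})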

I'm not a Java programmer, so I'm limited in what *I* can do with it. However, I have coworkers who DO speak Java, and they can probably do wonderful things with it.

Virtualization and Security

It's been quite a few days for this topic.

Two BrainShares ago, when I first heard about AppArmor, the guy giving the demo was very, very clear that virtualization is not a security barrier. Especially AppArmor. This may seem a bit contradictory, considering what AppArmor is supposed to do. What he meant was that you should not rely on AppArmor to provide separation between two applications with very different security postures. Physical separation is best.

That extends to full virtualization products like VMWare or XenSource. On Saturday the Internet Storm Center had a nice diary entry on this very topic. To summarize, malware already detects virtual machines and changes its behavior accordingly. Last Friday, VMWare released a patch for ESX Server that fixes some very interesting security problems. The patch also links to CVE-2007-4496, which is well worth a read. In short, an administrative user in a guest OS can corrupt memory or possibly execute code in the host OS. These are the kinds of vulnerabilities that I'm worried about.

Any time you run on shared hardware, the possibility exists of 'leaking' across instances. Virtualization on x86 is still primitive enough that the barriers between guest OS instances aren't nearly as high as they are on, say, IBM mainframes, which have been doing this sort of thing since the 1960s. I fully expect Intel (and AMD, if they can keep up) to make the x86 CPU ever more virtualization-friendly. But until we get robust hardware enforcement of separation between guest OS instances, we'll have to do the heavy lifting in software.

Which means that a good best-practice is to restrict the guests that can run on a specific virtualization host or cluster to servers with similar security postures. Do not mix the general web-server with the credit-card processing server (PCI). Or the credit-card processing server (PCI) with the web interface to your medical records (HIPAA). Or the bugzilla web-server for internal development (trade secrets) with the general support web-server.

Yes, this does reduce the 'pay-back' for using virtualization technology in the first place. However, it is a better posture. Considering the rate of adoption of VM technology in the industry, I'm pretty sure the black-hat crowd is actively working on ways to subvert VM hosts through the guests.

Mod_edir issues again

As I mentioned last week, I'm seeing a LOT of connections related to mod_edir. Late Friday I updated to the 1.0.13 build, which replaces the 1.0.12 that ships with SP6. That doesn't seem to have fixed the problem. On the plus side, it would seem that the mod_edir developers know about this problem. On the down side, I don't see a fix.

Right now I'm suspecting libc, as that's been my problem in the past. Perhaps the connection tear-down code in mod_edir isn't "taking" somehow.

Unfortunately, I'm not sure if I can call in an incident against mod_edir, or if I'll have to work with the devs (somehow) and call in against libc. Rebooting the web-servers every couple of days closes the connections, but that is not a fun solution.

That... is a lot of connections!

Oh, NetWare fans, take a look at this:
[Image: MONITOR snapshot of an impressive number, showing Concurrent Connections = 13075]

Yep. Check the Concurrent Connections number. That is a very big number. During term we run between 1500 and 4000 concurrent connections. Yet... that is way above that. What's more, going into the Novell Remote Manager, I find this pair of very interesting numbers:

Connection Slots Allocated: 44000
Connection Slots Being Used: 43982

Looking at the connections shows me what the problem is. All those 'extra' connections are for the user account that allows MyWeb (what you're reading this through ultimately) to work. Somehow... and this is a guess... mod_edir seems to be creating a new connection for each request coming in, rather than reusing the old ones. Or perhaps it isn't cleaning up after itself. Probably since I put SP6 in.

This would explain why this particular server has an unreasonably high memory allocation to CONNMGR. Must Poke More.

OES2: clustering

I made a cluster inside Xen! Two NetWare VMs inside a Xen container. I had to use a SAN LUN as the shared device, since I couldn't make it work with a plain file as the backing store. Not sure what's up with that. But it's a cluster, and the volume moves between the two just fine.

Another thing about speeds, now that I have some data to play with. I copied a bunch of user directory data over to the shared LUN. It's a piddly 10GB LUN so it filled quick. That's OK, it should give me some ideas of transfer times. Doing a TSATEST backup from one cluster-node to the other (i.e. inside the Xen bridge) gave me speeds on the order of 1000MB/Min. Doing a TSATEST backup from a server in our production tree to the cluster node (i.e. over the LAN) gave me speeds of about 350MB/Min. Not so good.

For comparison, doing a TSATEST backup from the same host, only drawing data from one of the USER volumes on the EVA (highly fragmented, but much faster, storage), gives a rate of 550 MB/Min.

I also discovered the VAST DIFFERENCES between our production eDirectory tree, which has been in existence since 1995 if the creation timestamp on the tree object is to be believed, and the brand new eDir 8.8 tree the OES2 cluster is living in. We have a heckova lot more attributes and classes in the prod tree than in this new one. Whoa. It made for some interesting challenges when importing users into it.

OES2-beta progress

As mentioned before, I have the OES2 beta. Right now I have two NetWare servers parked in Xen VM's on SLES10SP1. This is how it is supposed to work!

I haven't gotten very far in my testing, but a few things are showing. I managed to do a TSATEST-based throughput run of a backup of SYS. That's about a gig of data. Throughput for just one stream to one of the servers was around 500 MB/min, which is passable and within the realm of real performance for slower hardware. The downside is that the CPU reported by "xm top" was around 45%, while the CPU reported in MONITOR was closer to 25%. That's way higher than I expected, but could be related to all the disk I/O ops. This I/O was to a file in the file-system, not a physical device like a LUN on the SAN (that comes later).

Now I'm trying to get Novell Cluster Services installed. I want to get a weensy 2-node cluster set up to prove that it can be done. I suspect it can, but actually seeing it will be very nice.

Email encryption

The last time I seriously took a look at email encryption was at my old job, using GroupWise 5.5. I did some poking around here with Exchange/Outlook and made it work, but it wasn't a serious look. Back then there was still real doubt about which standard would reign supreme: PGP (or GPG) vs S/MIME. PGP had been around for ages, where S/MIME used the same PKI infrastructure used by banks for secure online banking.

Outlook and GroupWise both had S/MIME built in. Both used the Microsoft crypto API. Remember, this was GW 5.5 so there was no Linux version yet.

If you look at posts on Bugtraq, clearly PGP is reigning supreme. A lot of posts there tend to be signed, and almost all of the signatures are GPG (GnuPG) or PGP. So that would tend to suggest that PGP-style stuff is winning. Except... Bugtraq is primarily a Linux list that also bashes Microsoft, so the preference for the old-school secure email (PGP) is easy to understand.

Yet why are the major email systems shipping with S/MIME built in?

There are several reasons why digitally signed email hasn't caught on. First and foremost, it requires active use on the part of the user, in the form of explicitly stating "I trust this user and their certificate". Second, managing certificate renewals and changes adds work. Third, S/MIME certificates are subject to the same problem web-site SSL certificates have: price. Fourth, the certificates (be they PGP or S/MIME) are generally only usable on a single operating-system instance, which makes portability challenging.

Thawte.com still offers free email SSL certificates for personal use. I haven't read the details, but I suspect that 'professional use' is excluded, which would prevent WWU from going to them wholesale. I'll have to look.

The very nature of secure email makes it something only people who want it will strive for. This is not something that can be pushed down from On High unto the masses for enterprise deployment. As with sites that have bad SSL certificates, Outlook will throw a Warning! message when it receives an email signed by a certificate it doesn't trust or know about. End users are notorious for being annoyed by pop-ups they view as superfluous. As with SSL certificates, we have the Trusted Certificate Authority problem, which means that any externally signed communication needs to be signed with a certificate already known by everyone (i.e. VeriSign, or similar).

And ALL of this doesn't look at the problem of digitally signed email in web clients like gmail. I have many friends who use their web browser as their primary email interface. AJAX can do a lot, but I don't know if it can do secure decryption/validation of email. I'm pretty sure AJAX can do insecure decryption/validation, which sort of violates the point. Right now, in order to do actual secure email you have to use a full mail client with support for the relevant protocol(s). Which means that, as above, only people serious about email security will take the steps to do email securely; it can't be mandated and invisible to the user.

So, things haven't changed much in the 4-5 years since I last looked at it.

Portability could be solved through creative use of a directory service. I know for sure that eDir can store SSL certificates just peachy; the trick is getting them out and integrated into a mail client by way of LDAP. Active Directory has similar capabilities, but even Microsoft hasn't implemented AD/SMIME integration.

That said, directory integration brings its own problems. I, with my god-like powers, can create and export private keys for generic users and through that securely impersonate them. This creates a non-repudiation problem, and is the reason that Microsoft's SecureAPI has a setting to require a password to be entered before using a certificate for signing. That password is currently set on the local machine, not in AD, which is how god-like-me can be foiled in my quest to forge emails.

Still, email security remains the purview of those to whom it is important. Lawyers and security professionals are the groups I run into most often that use it. I know some hobbyists that use the technology between themselves, but that's all it is, a way to prove that they can make the technology work in the first place. It still isn't ready for "the masses".

An annoying phish

This one sailed right through our borders. It is a CapitalOne phish. The interesting parts:
  1. From: "CapitalOne Update Department"
  2. Return-Path: fl@tihw0035.totalit.dk
We have Sender Auth turned on for CapitalOne. This is the Sender Policy Framework (SPF) thing that has been talked about so much. CapitalOne has the DNS records for it. It turns out the border appliances are applying the SPF policies to 'totalit.dk' and not 'capitalone.com'. This, in my opinion, is a bug.
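For the curious, pulling the published SPF records for both domains is a one-minute job with dnspython (version 2.x; older releases spell resolve() as query()):

# Compare what each domain publishes for SPF. Assumes dnspython 2.x is installed.
import dns.resolver

for domain in ("capitalone.com", "totalit.dk"):
    try:
        answers = dns.resolver.resolve(domain, "TXT")
    except Exception as exc:
        print(f"{domain}: lookup failed ({exc})")
        continue
    spf = [r.to_text() for r in answers if "v=spf1" in r.to_text()]
    print(f"{domain}: {spf if spf else 'no SPF record published'}")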

OES2: virtualization

I have the beta up and running. I have a pair of OES2-NW servers running in Xen on SLES10SP1. And it loads just spiffy. Haven't done any performance testing on it, kind of hard to really interpret results at this point anyway.

What I HAVE been spending time on is seeing if it is possible to get a cluster set up. Clusters, of course, rely on shared storage. And if it works the way I need it to work, I need multiple Xen machines talking to the same LUNs. It may be doable, but I'm having a hard time figuring it out. The documentation on Xen isn't what you'd call complete. Novell has some in the SLES10SP1 documentation, but the stuff in the OES2 documentation is... decidedly overview-oriented. This is the most annoying thing, as I can't just put my nose to a manual and find it.

So, looking for Xen manual. It has to be around somewhere. Google-foo failed me today.
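In the meantime, here is a sketch of what the shared-disk line might look like in an xm-style guest config. The LUN path is a placeholder, and the 'w!' mode (which is supposed to mark the block device as shared-writable between guests) is exactly the kind of detail that missing manual would need to confirm:

# Sketch of the disk stanza in each guest's xm config (Xen configs are Python
# syntax). The LUN path is a placeholder; the trailing '!' is meant to allow
# shared-write access, so verify it against the Xen documentation first.

# First cluster node's config:
disk = ['phy:/dev/disk/by-id/scsi-SHARED_CLUSTER_LUN,hdb,w!']

# Second cluster node's config (same LUN, same shared-write mode):
disk = ['phy:/dev/disk/by-id/scsi-SHARED_CLUSTER_LUN,hdb,w!']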

What's been keeping me up late

Unfortunately, I haven't had time to crack open the OES2 beta. I'd really like to, but no time for that.

Instead what's been getting my attention are dsrepair.log entries like this:

ERROR: Inconsistent: Transitive Vector on partitionID: 00008204, DN: OU=[not that one].O=wwu.T=WWU
=>> Purging: 6 invalid entries from partition: OU=[not that one].O=wwu.T=WWU
=>> on server: CN=HERA
(1)Time stamp: 6-27-2006 3:12:46 pm; rep # = 0004; event = 0001
(2)Time stamp: 7-08-2004 9:00:57 am; rep # = 0005; event = 0001
(3)Time stamp: 5-09-2055 8:31:14 pm; rep # = 13F7; event = 0000
(4)Time stamp: 10-09-2002 1:37:16 pm; rep # = 0006; event = 0000
(5)Time stamp: 2-24-2070 7:43:44 am; rep # = 0070; event = 0034
(6)Time stamp: 8-25-1999 4:03:01 pm; rep # = 0007; event = 0000

=>> Updated: Transitive Vector on partition: OU=[not that one].O=wwu.T=WWU,
=>> with: 3 entries for server: CN=HERA
(1)Time stamp: 8-16-2007 12:02:19 pm; rep # = 0001; event = 0004
(2)Time stamp: 9-10-2007 9:56:11 pm; rep # = 0002; event = 0028
(3)Time stamp: 9-10-2007 10:00:03 pm; rep # = 0003; event = 0003

Those weren't alone. In fact, all of our partitions had some invalid transitive vectors. I get to go on another stomp fest tonight. Maybe this time they'll clear.
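Since there will be more log-staring tonight, a little scan like this one makes it easy to see which partitions are still complaining after a repair pass. The log path is a placeholder and the pattern is based on the entries above:

# Tally "Inconsistent: Transitive Vector" complaints per partition in a
# dsrepair.log. The log path is a placeholder.
import re
from collections import Counter

pattern = re.compile(r"Inconsistent: Transitive Vector on partitionID: (\S+), DN: (.+)")
hits = Counter()

with open("dsrepair.log", errors="replace") as log:
    for line in log:
        match = pattern.search(line)
        if match:
            hits[match.group(2).strip()] += 1

for partition, count in hits.most_common():
    print(f"{count:3d}  {partition}")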

OES2 public beta is out

Jason Williams said so.

This looks to be Beta5. They released both the Linux and NetWare parts of it. The NW65SP7 overlay iso is 1.1GB in size. I sooooooooooooooooooooo gotta get DVD drives into my servers.

Rumor has it release is now mid-October. So who knows what's going on with the 'launch' on the 26th.

This thing on?

If so, the SAN update worked.

Update: Yep, it worked!

The mystery of the OES2 release date

Various sources have pointed at evidence that Novell will be launching OES2 on the 26th. As has been pointed out, "Launch" and "Release" are different things. And yet, at the same time, rumor has it to "watch for events this Monday".

I don't know what to make of that.

It COULD be that the open beta will be out Monday. I have doubts about that, as that leaves very little time for reports to come back from the field for incorporation into OES2-release, presumably on or about the 26th.

It COULD be that it'll be released Monday, and the major PR push for the launch will come two and a half weeks later. I have my doubts about that, since Novell would get scooped by the likes of me as we put the new product to the test, but it could happen.

It COULD be that Monday is a red herring, and Novell will announce a ship date on the 26th along with the opening of the beta. I put more stock in this possibility. The likes of me will swoop up the beta code, run it through its paces, and send feedback about what we manage to break, for a presumed ship of OES2 in November or so.

Or it could be none of these. I guess we'll find out Monday or something.

Expanding the EVA

Our EVA3000 is full. All shelves have disks in them. In order to add space we need to replace our existing 143GB drives with 300GB drives. This is a rather expensive way to gain more space, as that extra 157GB of space costs the same as 300GB of space. But, that's what we have to do.

And wow does it take a while.

First I have to ungroup the disk. This can take up to two days. Then I pull the drive and put the new one in. Then I regroup on top of the new drive, which takes up to another two days. All the group/ungroup operations are competing for I/O with regular production.

Total time to add 157GB to the SAN? Looks to be 3 days and change.

We need a newer EVA.

The RIAA and us

Ars Technica had another article out lately about the RIAA and universities. Ars posits that universities are just like ISPs such as Comcast, Qwest, or, more locally, CSS Communications. To a certain extent that is true; we hold very little central control over our users.

However, there is one key difference between us and the likes of CSS. We're a closed-access ISP. In order to have an account with us, you have to be a user of specific status with WWU. The rules are long and complex and buried in Banner, but the short version is that in order to have an internet connection with us you have to be staff, student, or faculty. How does that impact the DMCA 'safe harbor' provisions? I don't know. I do know that K-20, our upstream provider, doesn't get RIAA take-down notices.

What if WWU and/or ResTek went 'open access', in that anyone who forked over the $39.99/mo could get an internet account with us? Would that change who got the 'pre-settlement letters'? Way back in the day, universities were the only ISPs in a lot of areas, so there is some history here.

What if WWU separated the telecom/network function into a fully 'self supporting' entity, where WWU was the sole customer? Would the Telecom org get the take-down notices, or would WWU? Or would the "John Doe at 140.160.129.43 @ 11:43am, September 21st, 2009, you are going to get sued" letters still come?

Hard to say. I don't think we'll become an open-access ISP, as there are some security concerns that need to be addressed first. Right now our WLAN/LAN interface isn't quite robust enough for that sort of access from Joe Public. Also, our Telecom section is a 'self supporting' entity already, and they also field RIAA notices. ResTek has been an independent agency the whole time I've been here, and they do their own RIAA/MPAA notice handling.