July 2008 Archives

If you've been getting core files generated by ndsd on your Linux servers, and want to call Novell Support about it, there are a few things you can do to maximize what Novell will get out of the files themselves. You may not get much, but these will help the people with the debug symbols figure out what's going on.

Packaging the Core


First and foremost, the tool that packages core files for delivery to Novell is already on your system. TID3078409 describes the details of how to use 'novell-getcore.sh'. It is included with both 8.7.3.x and 8.8.x installations.

Running it looks like this:
edirsrv1:~ # novell-getcore -b /var/opt/novell/eDirectory/data/dib/core.31448 /opt/novell/eDirectory/sbin/ndsd
Novell GetCore Utility 1.1.34 [Linux]
Copyright (C) 2007 Novell, Inc. All rights reserved.


[*] User specified binary that generated core: /opt/novell/eDirectory/sbin/ndsd
[*] Processing '/var/opt/novell/eDirectory/data/dib/core.31448' with GDB...
[*] PreProcessing GDB output...
[*] Parsing GDB output...
[*] Core file /var/opt/novell/eDirectory/data/dib/core.31448 is a valid Linux core
[*] Core generated by: /opt/novell/eDirectory/sbin/ndsd
[*] Obtaining names of shared libraries listed in core...
[*] Counting number of shared libraries listed in core...
[*] Total number of shared libraries listed in core: 72
[*] Corefile bundle: core_20080725_092227_linux_ndsd_edirsrv1
[*] Generating GDBINIT commands to open core remotely...
[*] Generating ./opencore.sh...
[*] Gathering package info...
[*] Creating core_20080725_092227_linux_ndsd_edirsrv1.tar...
[*] GZipping ./core_20080725_092227_linux_ndsd_edirsrv1.tar...
[*] Done. Corefile bundle is ./core_20080725_092227_linux_ndsd_edirsrv1.tar.gz


Once you have the packaged core, you can upload it to ftp.novell.com/incoming as part of your service request.
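
If you haven't pushed a file up to incoming before, the session looks roughly like this (a sketch from memory; check with your support engineer whether they want the SR number worked into the file name):

laptop:~ # ftp ftp.novell.com
[log in as 'anonymous', with your email address as the password]
ftp> cd incoming
ftp> binary
ftp> put core_20080725_092227_linux_ndsd_edirsrv1.tar.gz
ftp> bye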

Including More Data


If you're lucky enough to be able to cause the core file to drop on demand, or it just plain happens often enough that repetition isn't a problem, there is one more thing you can do to get better data into the core you ship to Novell. TID3113982 describes a setting you can add to the ndsd launch script (/etc/init.d/ndsd). The TID describes what is being done pretty well. In essence, you're telling glibc to run extra consistency checks on its memory allocations, so when something stomps on the heap the process fails with better information than it normally would. You don't want to run with this set for very long, especially in busy environments, as it impacts performance. But if you have a repeatable core, the information it can provide is better than a 'naked' core. Setting MALLOC_CHECK_=2 is my recommendation.
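
For reference, the change amounts to exporting the variable from the launch script before ndsd gets started, something along these lines (a sketch only; follow the exact placement TID3113982 calls for):

# near the top of /etc/init.d/ndsd, before the daemon gets launched
MALLOC_CHECK_=2
export MALLOC_CHECK_

Then restart ndsd via the init script so it picks the variable up.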

Be sure to unset this once you're done troubleshooting. As I said, it can impact the performance of your eDirectory server.

Overthrowing Blackboard

The most recent issue of the Western Front, our student newspaper, ran an article about looking for a replacement for Blackboard. You can read the article online here.

And now, a warning.

This is my personal opinion; it in no way reflects the official view of this department or any WWU entity.

There.

The Computer Science department is apparently evaluating Moodle as a possible Blackboard killer. I personally cheer this research, since I don't particularly like Blackboard. Plus, it could shave a few months off of any replacement project that may come of this. That said, there are a few bits of the article that need some amplification or clarification.
Western’s computer science department is in the process of testing a new e-learning software this summer quarter that could potentially replace Blackboard sometime next year.
Eh, not really. And the reason for this is actually alluded to in the next paragraph:
“I think students tend to find [Blackboard] more frustrating especially when it goes down and they have something due for a class and cannot access it,” said David Bover, chair of the computer science department.
If Blackboard goes down for even a single day, mayhem ensues on campus. Blackboard is, for lack of any other way to put it, critical path for us. When it goes down, the learning function of this University takes a significant hit. Any instabilities get noticed, as Dr. Bover pointed out. This is a system that has to be rock solid and always there when you need it. We have very few systems in that class of service; SCT Banner (our ERP solution) is one of the others.

What this means is that any replacement for Blackboard has to be at least as stable as Blackboard, and provably so. It needs to come with enterprise-level support with a rapid-response option, something I'm not sure Moodle has. It also needs to handle the load we throw at it, and provide at least Blackboard-equivalent functionality at the same or lower resource cost.

Our Blackboard infrastructure right now includes seven physical servers (only three of which are VM candidates) and two network load-balancers. Also involved are a large number of people on the back end who handle the Banner integration that happens behind the scenes to do things like create new courses, manage enrollments, and maintain the user accounts inside Blackboard. This Banner-integration piece is where the second-largest engineering challenge will be for any presumed Blackboard-to-Moodle migration project. It is also the piece whose work-flow is hardest for the CompSci group to evaluate.

What's more, due to various requirements, students who challenge grades need access to the course-work and grade-book for the class in question. We need something like three years of archive for this, so we will have to run dual-stacked for up to three years after migration go-live in order to handle challenges for courses taught while we were still on Blackboard. This archival Blackboard install will require us to have software and at least two servers to support it.

Moving to Moodle will also require us to be a bit more nimble in responding to user requests. As the article says:
With Moodle, professors will be able to install plugins or create their own to fit specific needs for their course.
This will ultimately require a dedicated Moodle programmer somewhere within ITS. That's a staff position, which means budget. Due to how WWU's accounting works, we can't just take the hardware and software savings from Blackboard and convert them into a new FTE. Whether this FTE is a lateral transfer from somewhere already within ITS is up to the migration-project people, whoever they may ultimately be.

In short, any Blackboard-to-Moodle project will not run to completion by Fall 2009, even if CompSci comes up with a Moodle config that reaches feature parity with Blackboard and looks to be just as stable. The Banner integration alone will require significant engineering on the part of ITS departments, and, to be blunt, the group that 'owns' Blackboard is seriously short-staffed right now and can't even think about a migration project yet. Load testing and the sheer re-education of Blackboard users will take a lot of resources and time all by themselves.

The somewhat ironic part of this is that Moodle was brought to my attention a couple of Brainshares ago. At the time we were having serious stability problems with Blackboard, so several of us got kinda wistful-eyed at the thought of giving Blackboard the ole heave-ho. Since then I've heard that Moodle is beginning to eat into Blackboard's installed base, especially in cash-strapped Community Colleges. One can hope.

And in closing, a few hints to any of those CompSci people working on this project:
  • For the love of Richard Stallman, make sure the database behind Moodle is either MSSQL2005 or Oracle. We already run two RDBMSes; we will not be running a third *cough*mysql*cough*, no matter how politely you ask. Asking us to put a brand-new RDBMS onto the critical path is simply too much. There is a slight preference for Oracle, since that's what Banner runs on and it eases certain integration tasks.
  • Please answer the question, "The system is in flaming ruins, and we're all flummoxed. Who do we call?" The reply we give to irate professors wondering why they can't report grades matters a lot. "We're working closely with the vendor right now" sounds way better and more professional than, "We've asked some people in the community who know this stuff better than we do, and hope to have some answers soon." We have way more professors (and students, for that matter) who don't give a wet noodle for the ethics behind their software, just so long as it works right.
  • Please account for a test environment that functions identically to production. Since Blackboard is critical path, we actually have a test environment where we do things like work through upgrade problems, validate configs, and troubleshoot errors outside of production. Right now this is a single box with MSSQL2005 and Blackboard installed on it, compared to production where the web-server, content-server, and database-server roles are all on different machines. Blackboard allows this, but not all software plays nice like that.
Thank you. And remember, this is just me blowing wind, it doesn't represent anything like an official viewpoint of this university, this department, or this office.

Twitter & HTTPS

One thing that twitter has done right is the use of HTTPS. If you go to the secure version of their login page, https://twitter.com/login, your session will stay SSL. Unlike certain other sites that bump you back to http, twitter will keep the whole session, and the all-important login cookies, secure end to end. I like this.

On insecure wireless networks, or even secure ones with malicious people legitimately on them (such as what you find at any security conference), it is possible to side-jack those cookies with simple network tools. With those cookies in hand, all too many sites let you impersonate the person who logged in. Some sites, like Livejournal, offer the ability to bind a login to an IP address, but that only works if you're not behind a NAT gateway such as you find at ye-olde-coffee-hut.

By allowing users to keep their entire twitter web session in SSL, twitter does security right. Yes, SSL is an expensive thing to terminate, especially as user uptake increases. But that they offer it at all is a very good thing.
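
If you're curious whether a particular site behaves the same way, a quick spot-check with curl (assuming you have it handy) will show whether the login page hands its cookies back over the SSL connection and whether it tries to bounce you to plain http:

curl -sI https://twitter.com/login | egrep -i 'set-cookie|location'

Set-Cookie headers arriving over the https connection, and no Location header pointing back at an http:// URL, is what you want to see.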

Patching SLES

Last night I attempted to patch one of our OES2 servers. This particular server is an elderly beast, a P3 1GHz machine. So I wasn't expecting anything like fastness out of it. Especially with rug.

But still, it was painful!
normandy: ~#: rug lu
Waking up ZMD...
[8 minutes later]
[list of one update, libzypp]
normandy: ~#: rug update
Resolving Dependencies....
[8 minutes later]
Install this update? (y/N)
y
[12 minutes later]
Restarting ZMD...
[8 minutes later]
normandy: ~#: rug lu
[list of updates. No need to wait 8 minutes this time.]
normandy: ~#: rug update
Resolving Dependencies...
[8 minutes later]
Dependency resolution failed for bind-util and bind-libs. libdns-whatzihoozit required by bind-util is provided by bind-libs. Please fix you hoser.
[insert swearing here]
normandy: ~#: rug in bind-util bind-libs
Resolving Dependencies....
[8 minutes later]
Install these updates? (y/N)
y
[12 minutes later]
normandy: ~#: exit

As this had taken far longer than even I was expecting, I stopped. I'll finish up tonight. Since this is an OES2 server, it's running SLES10-SP1. I can attest that SLES10-SP2 on identical hardware is MUCH faster. I can't wait until OES2-SP1 comes out and this dinosaur can get faster patching.

Your spam-checker ate my email

This is a question I get a fair amount, which is understandable: the spam-checker is the software whose entire job is to eat email, so naturally that's the first place people think to check when mail gets sent but not received.

I also hate dealing with this kind of question. The spam appliances we use have a search feature, which is critical for figuring out whether some email is being eaten incorrectly. Unfortunately, the search feature is devilishly slow. I swear, it is grepping hundreds of files tens of megabytes in size and post-processing the output. It generally takes five minutes to come back with an answer every time I hit the 'search' button. And just like google, it can take a few tries to phrase my search terms correctly to get what I want.
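
For what it's worth, my mental model of what happens when I hit 'search' is something like the line below. Pure speculation on my part, and the path and address are made up, but it would explain the five-minute waits:

# grind through every message-tracking log on disk, tens of MB apiece,
# then post-process whatever comes out
grep -h 'someone@example.com' /var/log/mailgw/tracking-*.log | sort | uniq -c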

Right now we have a complaint that all email sent to us from a certain domain never arrives. This is false: on the day in question we received, and passed on to Exchange, about 20 messages from that domain. As it happens, the Edge server is having a problem with that domain, and that needs attention. But I had to sit through about 30 minutes of waiting for search results to really determine this.

An Exchange 2007 problem

While I was on vacation we had a few more instances of email going into a black hole. This is not good. I had suspected this was happening, but proof accumulated while I was broiling in the Midwest.

After doing a lot of message tracing in Exch2007, I noticed one trend. When an email addressed to a group hits the Hub server, it attempts to dereference the group into a list of mailboxes to deliver to, and it uses Global Catalogs for this. When the GC it used was one in our empty root, rather than in the child domain that everything lives in, this one group didn't expand to any people. The tracking code was "dereferenced, 0 recipients". Which is a fail-by-success.

After a LOT of digging, I threw an LDAP browser at the GCs. What I noticed is that the GC entry for this one group was subtly different between the empty-root GC and the child-domain GC. Specifically, on the empty-root GC the object had no "member" attributes.
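
The LDAP-browser poking boils down to running the same query against the GC port (3268) on each server and comparing what comes back. The server names, bind DN, and group name below are placeholders, not our real ones:

# against the empty-root GC
ldapsearch -x -H ldap://rootdc.example.edu:3268 -D "cn=lookup,dc=example,dc=edu" -W \
    -b "dc=example,dc=edu" "(cn=ProblemGroup)" member

# against a child-domain GC; compare the 'member' values that come back
ldapsearch -x -H ldap://childdc.child.example.edu:3268 -D "cn=lookup,dc=example,dc=edu" -W \
    -b "dc=example,dc=edu" "(cn=ProblemGroup)" member

In our case the empty-root GC returned the group with no 'member' values at all.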

It turns out the problem was that the group in question was set as a Global group, rather than a Universal group. Aha! Global groups apparently don't publish member info to GCs outside their own domain, just in the domain itself. Universal groups are just that, universal, and publish enterprise-wide. Right. Gotcha.
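
You can check which flavor a group is straight from LDAP by pulling its groupType attribute. If I'm remembering the bit values right (treat them, and the placeholder names, as assumptions to verify): a Global security group shows 0x80000002 (-2147483646) and a Universal security group shows 0x80000008 (-2147483640).

# same placeholder server and names as above
ldapsearch -x -H ldap://childdc.child.example.edu:389 -D "cn=lookup,dc=example,dc=edu" -W \
    -b "dc=example,dc=edu" "(cn=ProblemGroup)" groupType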

Exch2003 did not manifest this, as it stayed in the domain pretty solidly. I don't know how many of our groups are still Global groups, but this one is going to take some clean-up to fix.