November 2010 Archives

User-factors again

Today I ran into a post by a developer working on software for Diaspora, yet another social media aggregator, which famously opened up to private alpha testing last week. A friend of mine managed to get on it; I haven't been as lucky. This developer was announcing that he was stopping development on his Diaspora app. The last straw?

Well, for the Diaspora profile, gender is a free-text field, not a pick-list. In his own words. He's getting some flak for this position, so it's possible that link won't always work. I saved a copy of the page for posterity in case it comes to that.

His main argument against a free-text gender field boils down to two main points:
  • Gender is a linguistic construct.
  • If you're writing something that will present the user with pronouns (he/she) a free-text field is useless.
Looking at it from the technical side, he's right. If the gender field is optional, or contains something other than "male/female", you can't programmatically determine which pronoun set to use in text you present to the user or their friends. If the field is optional, or contains something like "other", the programmer is stuck with writing neuter text or picking a default gender. In English at least, neuter text sounds decidedly less personal than gendered text, which can be a problem for someone attempting to develop a brand.

As for gender being a linguistic construct, yes it is. However, that's but a subset of the overall concept of gender. On a social media site the Gender field on a profile is less about whether that person is a him or her and more about how they identify themselves. For 80-90% of social media users the idea that they could be something other than male/female has never crossed their minds, but then there are those edge cases.

The link Avery had in his post, which is where he learned about the free-text gender thing, goes into some of the edge cases.

As much as not being able to programmatically determine which pronoun set to use annoys developers, having a text field for gender is a nice user-factors feature. If they really wanted to be dev-friendly, a second field for pronoun preference could be presented during the initial profile build:

Preferred Pronoun (for applications, will not show on profile):
This is also useful for languages other than English that have multiple pronoun markers (maybe there are different pronouns for different age clades, or castes, or whatnot; not a linguist, it shows). For English, it allows someone to state their gender as 'Yes' and pick the neuter option, while allowing someone else to put 'Lady Gaga' as their gender and get presented with masculine pronouns in linked apps. This works FAR better if it is guaranteed to never be displayed on the profile itself. This is not a new concept.
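The two-field idea can be sketched in a few lines. To be clear, this is hypothetical code, not anything Diaspora actually does; the field names and pronoun sets are my own assumptions:

```python
# Hypothetical sketch: a free-text gender field for display, plus a separate
# pronoun-preference pick-list that linked apps use but that is never shown.
PRONOUN_SETS = {
    "masculine": ("he", "him", "his"),
    "feminine": ("she", "her", "hers"),
    "neuter": ("they", "them", "theirs"),
}

class Profile:
    def __init__(self, gender_text, pronoun_pref="neuter"):
        self.gender = gender_text          # free text, shown on the profile
        self.pronoun_pref = pronoun_pref   # pick-list value, never displayed

    def pronouns(self):
        # Apps key off the pick-list, so free-text gender never breaks them.
        return PRONOUN_SETS[self.pronoun_pref]

# Gender 'Yes' with the neuter option; gender 'Lady Gaga' with masculine:
print(Profile("Yes").pronouns()[0])                     # they
print(Profile("Lady Gaga", "masculine").pronouns()[0])  # he
```

The display field stays a user-factors feature, while apps get a deterministic pronoun set to work with.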

The whole user-factors vs. automation fight has been going on as long as there has been paperwork with check-boxes. But still, this is one thing we can definitely improve on as an industry.

Larcenous information leakage

Or, losing company data through laptop theft. ServerFault had an interesting question on this topic crop up the other day. Most of the answers were focused on private industry, but this is a topic that affects us governmental/educational types as well. In different ways, of course.

Unlike a private business that has business methods and data that are intellectual property, us governmental types have to live with variations on the Freedom of Information Act. Here in Washington State, it's called a Public Records Request. Either way, it is entirely probable that a correctly worded PRR would be able to retrieve any source-code we have. There are some regulations that limit what we can let out, such as FERPA (Family Educational Rights and Privacy Act), but mere business process is open for citizen review.

Because of FERPA, we're quite paranoid about student data. That kind of information doesn't tend to wander on laptops, but we still don't want to get listed. We have policies about this.

That said, while our budget realities mean that very few people have work-supplied laptops, a lot of private laptops do end up in the office. These laptops generally do not connect to the wired Ethernet; they connect via the same wireless networks all of our students use. They can't get directly at our Banner data there, but they can get at pretty much everything else.

I believe I've mentioned before now that Higher Ed networks do not look like Corporate networks.

  • We do not have 'whole disk encryption' policies, though those might be coming.
  • We're currently updating our email policies to make it even more clear that University business conducted in private email (ahem, gmail) is still subject to Public Records Requests and archiving requirements.
  • For a while our use of Blackberries exploded, but the iPhone/Android revolution is rapidly reducing that. However, the number of people reading work-email over these devices has only gone up (see also, revised email policy).
  • Due to internal politics, putting USB-drive-blocking GPOs and similar restrictions into place is exceedingly hard. The same holds true for blocking access to off-campus WebMail and social media sites.
In short, it's hard to keep our data from wandering.

There is a very good reason why our Security Audits are interesting reading. We're a kind of unholy cross between an ISP network and a corporate network.

LIO-Target on OpenSUSE 11.3

I mentioned I was playing with it, but now I have it working. So I'm sharing! Yay!

LIO-Target is one of several iSCSI modules available for Linux. As of the 2.6.38 kernel it'll be baked in. They even have a handy feature-comparison chart to explain why. For those with Microsoft or ESX environments, LIO-Target supports SCSI-3 persistent reservation, which is needed for clustering in both environments. It is nifty.

Disclaimer: There are some steps in this guide that I'm not going to give command-by-command guides to. If you don't know how to do that step, you shouldn't be doing this at all. I know it's unfriendly, but not being able to do that means you don't really know what you're doing, and this kind of thing really isn't for you.

Anyway, this is how it works for 11.3, probably 11.2, and maybe not 11.4. 11.4 is still baking at the time of this post, and I'm reasonably certain that the LIO-Target stuff won't be mainline in time for feature freeze, but hey, won't know until it ships. It'll almost definitely be there for the next OpenSUSE version, be it 11.5 or 12.0.

Until it is baked into OpenSUSE, getting LIO-Target working will require compiling custom kernel modules and hand-editing certain key config files. Apparently there are some advanced UI tools available from Rising Tide Systems, called 'rtsadmin', but I have not evaluated them.

In case you don't give a fig for this, I'm putting the guide under the fold.

Getting dirty with iscsi

I'm working on a low-cost storage solution again. This is the same thing I was working on earlier this year, but the budget demons have eaten the proposal that would have required this thing to be replicated on another array, so I can actually move on it. Since my last round of software evals was some months ago, I'm taking another look at things. And really, it's different.

The criteria I'm dealing with right now:

  1. Must not cost anything more.
  2. It would be really, really nice to support SCSI 3 Persistent Reservation, as systems that require that are where most of my storage demand is these days.
  3. Since the Windows iSCSI initiator doesn't auto-reconnect when the connection fails (unlike Linux), the iSCSI target software must not require a service restart to make config changes.
This limits things.

Also, if point number 3 above can be configured away somehow, I haven't found it yet. Though I'd be happy (really happy) to be wrong. Do let me know if you know differently.

OpenFiler, my previous best bet, uses the Linux IET iSCSI target, which unfortunately requires a restart for config changes. Therefore, I can't use it. The alternative is to shim the newer LIO-Target system onto OpenFiler, but if I'm going to do that I may as well use something with a newer kernel (like OpenSUSE) to get at the newer packages.

LIO-Target has taken me quite some time to crowbar onto OpenSUSE 11.3, but I finally found the right pry points. It states on the box that it does SCSI 3 PR, and I've just proven that it can make config changes without requiring a restart. JOY.

As it happens, LIO-Target will be replacing the current kernel iSCSI system as of 2.6.38. This also means that it is a fast-moving target.

Unfortunately, the need for a crowbar means that if I decide to go production with this, the effort needed to, shall we say, keep things current will be all on me. Right now it requires a module recompile after every kernel update, which makes it a significant support burden. Also, a UI doesn't exist yet; I'll have to create the management scripts from scratch.

One alternative is to wait until OpenSUSE 11.4, which should have a newer kernel. Unfortunately, at this point it looks like that'll be 2.6.37. So if I want to use 2.6.38, I'll have to do the kernel-dance m'self. Grar.

I should probably factor the time I spend dealing with this thing into our cost-per-GB.

The risk of email interception

Anyone who does email knows that it is really easy to intercept in flight. Unless TLS is in use, messages are transmitted in plain text, and the SMTP protocol is designed around the assumption that untrusted third parties may handle messages between source and destination (a holdover from the UUCP days, as it happens). The appliance and cloud anti-spam industries are built around this very capability.

But how much of a risk is illicit interception? Or worse, monitoring? Everyone knows you don't send passwords or credit-card information in email, but we also send password reset messages in email. Some web-sites still send your password when asked for a 'reminder', so clearly some reset-system designers consider email secure-enough. Or maybe it's just convenience trumping security again.

To figure out how much of a risk it is we need to know 2 things:
  1. How can email be intercepted?
  2. How likely are these methods to be used?
Interception can be accomplished two ways:
  1. Catch the messages in-flight by way of a sniffer.
  2. Catch the messages in the mail-spools of the mailers handling the message.
There is another vector that is even more damaging, though: catching the message in the final mailbox. That isn't interception, strictly speaking, but it so affects the security of email that I'll be including it. Under the fold.
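The in-flight and mail-spool vectors are both visible in any message's headers: every mailer that handles a message stamps a Received: header onto it, and each of those hops is a spool where the plain-text message briefly sits. A quick sketch with Python's standard-library email parser; the hostnames and message are entirely made up:

```python
# Illustrative only: count the relay hops recorded in a (fabricated) message.
# Each Received: header marks one mailer, and therefore one spool where the
# message could be read unless TLS protected that hop.
from email import message_from_string

raw = """\
Received: from mx2.example.edu (mx2.example.edu [192.0.2.25])
\tby mailbox.example.edu; Mon, 15 Nov 2010 10:02:11 -0800
Received: from smtp-out.example.com (smtp-out.example.com [198.51.100.9])
\tby mx2.example.edu; Mon, 15 Nov 2010 10:02:09 -0800
From: sender@example.com
To: user@example.edu
Subject: Password reset

Click here to reset your password.
"""

msg = message_from_string(raw)
hops = msg.get_all("Received")
print(len(hops))  # 2 hops, i.e. two spools between sender and final mailbox
```

A real message that crossed an outsourced anti-spam service would show even more hops, each one a place the message sat in readable form.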

Creating AI/Life

The quest for artificial intelligence has some parallels with the quest to create new fully artificial life-forms. We've already done the latter with the advent of a fully synthetic genome for bacteria, but the former still eludes us. Or does it? For all we know, that synthetic genome is the biological equivalent of a program that takes input and echoes it to output: 'close', but still not there yet.

With biology we've had a couple of centuries' experience reverse-engineering how things work, and there is still LOTS we don't yet know. With computation we've engineered things from the ground up from base principles, and therefore understand rather well how they work. Human-like intelligence is more than just a computational problem, and more than just figuring out how and why synapses signal each other.

With the reverse-engineering effort we've been approaching the task from both sides of the problem: top-down functional analysis and bottom-up biochemical analysis. There is still a lot in the middle that is undiscovered country. We know it is there, and we know it produces certain functions, which in turn constrains what the middle bits can be, but we don't know for certain.

With computing we're getting to the complexity levels where systems can mimic non-determinism. A perfect example of this is understood by anyone who has had to maintain Windows desktops used by minimally clued users: at some point things will break in a way that boggles the mind. And yet, if you dig deep enough, you can determine why it broke in just that way. Managing this complexity is one of the challenges of modern computing environments.

We've gotten to the point where we can program life, if we're given an existing cellular structure to work with. The instruction set is mind-bogglingly vast; it contains instructions that seemingly do the same thing but have different side effects in certain circumstances; the documentation is being written on the fly by the programmers attempting to use it; and it was developed through unintelligent evolutionary processes. If you thought your mind broke when using LISP, that's peanuts to DNA/RNA. And we haven't yet determined if there are biological equivalents to instruction-set architectures.

With AI we're... working on it. Attempts to create neural networks that mimic the brain's structure, and to educate the dynamic network by throwing inputs at it, didn't do what we expected. We haven't finished reverse-engineering how brains work yet, so making a digital version is still based on guesswork. We have expert systems, but they're not independent decision makers yet.

Biology and computational theory will converge at some point, probably, but we've got a long way to go.

Surviving a freak wave

Author and fellow Movable Type user Charles Stross recently survived a massive surge of load on his server. He describes the experience here. He gets regularly Slashdotted and listed on Reddit, so surges of load are nothing new. However, what happened to him was an order of magnitude stronger than that. And he has some helpful tips on surviving that kind of freak wave.

The first lesson of which is, Static HTML.

My choice of Movable Type was clinched by the fact that it can use static HTML for its pages and doesn't require a DB hit for every page load the way WordPress does. Such pages can scale f-a-r longer than their dynamic brethren.

The second lesson, no or minimal graphics.

Bandwidth and connection-duration will both be better handled by serving small files. I have one image, and it is 1.18KB in size.

The third lesson, even basic machines can handle a Slashdotting these days.

Less important for me since I'm currently using shared hosting rather than dedicated hosting, but the point is taken.

The fourth lesson, design your site for a single order of magnitude wave, and plan for a 2nd order of magnitude.

This is why I've minimized dynamic content as much as I can. I don't even live-publish comments, that's done every couple of minutes just to save load in the case of a major wave. Since I'm on shared hosting I'll hit maximums well before a dedicated site like Stross' would, so I need to load-shed a lot sooner. As it happens, a slashdotting WOULD be a 2nd order wave for me. If it started happening regularly, I'd have to change hosting.

That said, if I were blogging from the cloud (i.e. Blogger), I wouldn't have to worry about any of this. But then, this is a sysadmin blog.