The MSA1500cs we've had for a while has developed a bad habit. You can see it by connecting a serial cable to the management port on the MSA1000 controller and running a "show perf" after starting performance tracking. The line in question is "Avg Command Latency:", a measure of how long it takes to execute an I/O operation. Under normal circumstances this metric stays between 5-30ms. When things go bad, I've seen it as high as 270ms.
This is a problem for our cluster nodes. Our cluster nodes can see LUNs on both the MSA1500cs and the EVA3000. The EVA is where the cluster has been housed since it started, and the MSA has taken on two low-I/O-volume volumes to free up space on the EVA.
IF the MSA is in the high Avg Command Latency state, and
IF a cluster node is doing a large Write to the MSA (such as a DVD ISO image, or B2D operation),
THEN "Concurrent Disk Requests" in Monitor go north of 1000.
This is a dangerous state. If this particular cluster node is housing some higher-trafficked volumes, such as FacShare:, the laggy I/O competes with regular (fast) I/O to the EVA. If this sort of mostly-Read I/O is concurrent with the above heavy-Write situation, it can cause the cluster node to miss its timely write to the Cluster Partition and trigger a poison-pill from the Split Brain Detector. In short, the storage heart-beat to the EVA (where the Cluster Partition lives) gets starved out in the face of all the writes to the laggy MSA.
Users definitely noticed when the cluster node was in such a heavy usage state. Writes and Reads took a loooong time on the LUNs hosted on the fast EVA. Our help-desk recorded several "unable to map drive" calls when the nodes were in that state, simply because a drive-mapping involves I/O and the server was too busy to do it in the scant seconds it normally does.
This is sub-optimal. This also doesn't seem to happen on Windows, but I'm not sure of that.
This is something a very new feature in the Linux kernel could help with: the concept of 'priority I/O' in the storage stack. I/O with a high priority, such as cluster heart-beats, gets serviced faster than I/O of regular priority. That could prevent SBD abends. Unfortunately, as the NetWare kernel is no longer under development and is just under Maintenance, this is not likely to be ported to NetWare.
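On Linux, this shows up as the I/O priority classes exposed by the CFQ scheduler and driven with the ionice tool from util-linux. As a sketch of how it would map to the situation above (the heart-beat daemon path here is a made-up example, not a real NetWare or Linux component):

```shell
# Run a (hypothetical) cluster heart-beat daemon in the realtime I/O
# class, so its writes get serviced ahead of everything else. Realtime
# class requires root.
ionice -c 1 -n 0 /usr/sbin/heartbeatd

# Demote a bulk job, like a backup-to-disk run, to the idle class so it
# only gets disk time when nothing else wants it.
ionice -c 3 tar -cf /backup/b2d.tar /data

# Check the I/O priority of a running process by PID.
ionice -p 1234
```

With the B2D job in the idle class, the big sequential writes that pile up "Concurrent Disk Requests" would never be able to starve the heart-beat writes.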
I/O starvation. This shouldn't happen, but neither should 270ms I/O command latencies.
As some of you know, the UK has passed a law that authorizes jail time for people who refuse to turn over encryption keys. If I'm remembering right, 2-3 years. This is a bill that had been making the rounds for quite some time, and got passage as a terror bill. Nefarious elements have figured out that modern encryption technologies really can flummox even the US National Security Agency's deep-crack mainframes, and they therefore use them. There was a reason that encryption technologies were classified as a munition and therefore export-restricted.
Those of you who've been with Novell/NetWare long enough remember this. Back in the day the NICI and other PKI components came in three flavors, Domestic (strong, 128bit), International (weak, 40bit? 56bit?), and basic (none). Things have loosened up since then.
Part of the problem with encryption is that while the private keys may be strong, securing them is tricky. When the feds raid your house and grab every device capable of both digital computation and communication to throw into the evidence locker, their computer forensics people can get at your private keys. However, if your private keys are further locked away, as with a PGP key-ring, that won't do them much good. To gain access to your key-ring they'll need the pass-phrase.
That's where the new law in the UK comes in. Police have two options to figure out your pass-phrase. They can intercept it somehow, or they can point a jail term at your head and demand the pass-phrase.
That doesn't work in the US thanks to the Bill of Rights, and the 5th Amendment. This is the amendment that states you have a right against self-incrimination, and by extension that police can't force you to divulge information that could be detrimental to you. As it happens, the people who wrote this amendment had the English legal system in mind when they came up with the idea, what with us being an ex-colony and all that. So if you performed safe encryption handling, didn't write the pass-phrase anywhere, and made a point of making sure it never hit disk in the clear, the US Government can't penalize you for not telling them the pass-phrase. A US law similar to the UK law would face a much harder judicial battle than it got in the UK.
Which isn't the case in the UK. As one crypto expert I spoke with once said, the UK law amounts to, "rubber-hose cryptography." Which is an allusion to the fact that a sufficient application of pain (i.e. torture) can cause someone to fork over their own encryption keys, which is a concern in certain totalitarian regimes.
The accepted response to 'rubber-hose' crypto methods is to use a 'duress key'. This key will either destroy the crypted data, or reveal harmless data (40GB of soft porn!). The problem with such a key is that it works best if it is not known to exist. Forensics analysis can show what kind of crypto is in use, and if that particular type supports the use of a duress key, the interrogators can work that into their own information-extraction methods. Also, any forensics person worth their salt works on a COPY of the data (as the RIAA knows all too freaking well, digital data is very easy to duplicate), so having the duress key destroy the data isn't a loss to them. In a judicial framework, having the given key destroy the (copy of the) data can earn the person a "hampering a lawful investigation" charge and even more jail time.
All that said and done, there are still PLENTY of ways for the US Government to gain access to pass-phrases. I've heard of at least one case where a key-logger was installed on a machine for the express purpose of intercepting the key-strokes of the pass-phrase. If the pass-phrase exists in the physical realm in any way (outside of your head), they can execute search warrants on it. Some crypto programs don't handle pass-phrases well. Also, if you have a Word document that was crypted, then decrypted so you could view it, the temp files Word saves every 5-10 minutes are in-the-clear and recoverable through sufficient disk analysis. The end-user needs to know about safe handling of in-the-clear data.
All of which is expensive work. If the Government can save several thousand dollars in tech time by simply asking you the pass-phrase and throwing you in the clink if you don't give it, that's what they'll do. If the person under investigation is known to be very crypto-savvy (uses a Linux machine, with an encrypted file-system that requires a hand-entered password to even load, and uses PGP or similar on top of that to defend against attacks when the file-system is mounted), it becomes WAY cheaper to go the Judicial route than the tech route.
Yeah, 2-3 years may be much better than the 20-life you'd face on a terrorism charge. But you'd be in custody the whole time, and they'd be spending that 2-3 years going over your encrypted data the hard way. And if actual actionable evidence surfaces to support a terrorism charge, you can bet your bippy that you'd be hauled into a court-house for a new trial, only this time facing 20-life. If you're in the UK. Here in the US they'll just keep you under surveillance until they get the pass-phrase, or enough other evidence to hold you in custody and give them an excuse to throw everything you've ever touched into evidence lock-up.
At home I've been noticing some persistent connections have been getting resets. A couple of times now I'll be VPNed into work here, and the connection will drop. Other times I've noticed telnet connections to weird ports will get reset sporadically. What's going on?
At home I'm on that network that's gotten some grief about discriminating against BitTorrent users, which I won't name here but you probably know.
Calling in to their Customer Support was pointless, as they wanted me to go through fault-isolation methods to see where the problem was. My router, their cable-modem, or what? Right, then.
As I no longer have a working 10Mb hub, I can't just drop a laptop in the unprotected segment between the cable-modem and my router and do some sniffing. So I have to get creative. I remembered yesterday that the new desktop gaming system has two ethernet ports on the back. Ahah. A bit of googling brought me to the 'brctl' command in Linux for creating ethernet bridges.
This is exactly what I wanted. Turn the (w-a-y more powerful than this function needs) gaming machine into a simple ethernet bridge, just so I can sniff traffic. I downloaded the latest Knoppix DVD ISO in the hopes that it'd have ethernet drivers for my motherboard. You see, this is a gaming PC that I built for Windows gaming. I did not build it for anything resembling Linux compatibility, so I had real fears that the LAN ports wouldn't be supported. Happily, Knoppix had a module for my ethernet ports and away we go.
ifconfig eth1 0.0.0.0
ifconfig eth2 0.0.0.0
brctl addbr whitehat
brctl addif whitehat eth1
brctl addif whitehat eth2
ifconfig whitehat up
ifconfig eth1 up
ifconfig eth2 up
In my case, eth0 is the Firewire "lan" port that seems to be on every new machine these days. Once the bridge is up, I can run Wireshark on it with a ring-buffer. Once I see a spurious connection reset, I can stop the sniff and see what exactly happened to the connection. I didn't get any spurious disconnects last night when I was monitoring, but I may tonight. We'll see where things are going. I did see some RSTs come in, but it wasn't clear if that was normal or not, as it was almost always on HTTP traffic. This machine has 2GB of RAM in it, so the Knoppix RAMdisk is quite large. I don't have to worry about having my ring-buffers starved for space and having the reset fall off the back of the buffer.
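For leaving the sniff running unattended, a headless alternative to the Wireshark GUI is tcpdump's built-in ring buffer. A sketch, with made-up file names and sizes:

```shell
# Capture on the bridge interface built above, rotating through ten
# 100MB files so old traffic ages out of the ring:
#   -i whitehat   the bridge interface
#   -C 100        roll to a new file every ~100MB
#   -W 10         keep at most ten files, overwriting the oldest
# The capture filter keeps only packets with the RST flag set, so the
# files stay small and the interesting packets are easy to find later.
tcpdump -i whitehat -C 100 -W 10 -w /tmp/rst-hunt.pcap \
    'tcp[tcpflags] & tcp-rst != 0'
```

Filtering down to just the RSTs at capture time trades away the surrounding conversation context, so for actually diagnosing a dropped VPN session the unfiltered ring-buffer capture is still the better tool.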
If I can prove that the RSTs are coming from the ISP end of the connection and not my router, I can go to customer service and tell them so. They'll try to tell me that since the RSTs are coming from the internet IP, the server there must be issuing them. Then I'll tell them that I have multiple internet IPs showing the exact same behavior, all of it starting around the same time, and really, I find the possibility that all three (or so) servers got updated to exactly the same buggy TCP stack at the same time to be much less likely than this particular ISP's traffic-shaper catching my traffic as collateral damage.
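One tell for injected resets is the IP TTL. If the RST packets supposedly from a given server arrive with a noticeably different remaining TTL than that server's ordinary data packets, they were almost certainly forged by a box closer to me than the real endpoint. A sketch of the check against a saved capture (the file name and the 203.0.113.10 address are hypothetical):

```shell
# Read back a saved capture and print full IP headers (-v shows the ttl
# field) for RST packets claiming to come from one particular server.
# Compare those TTLs against the TTLs on the server's normal ACK/data
# packets; a mismatch points at an injecting middlebox.
tcpdump -nn -v -r /tmp/rst-hunt.pcap \
    'tcp[tcpflags] & tcp-rst != 0 and src host 203.0.113.10'
```

It's circumstantial rather than proof, but combined with the "many unrelated servers, same behavior, same start date" argument it makes a pretty solid case to put in front of customer service.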
They'll just shrug and say, "oh well," and that'll be that. It won't get fixed. But my call will be logged! My own minuscule vote will be in their tracking system by golly. Maybe it'll be the straw that causes them to tweak their shaper to be less aggressive.