Recently in microsoft Category

Today I'm spending most of it sheepdogging a vendor installing an application. This vendor is VPNing in, and such access is a key part of the product's support contract.

This is something I've noticed recently. Several of the server-based off the shelf apps I've installed lately have had a requirement that the vendor have access to the server in some way. Some of it is so they can do the install. Some of it is so they can update it so we don't have to. Some of it is just in case we ever call for support and need their help.

I have a theory for why this is. I have a sneaking suspicion that its because that's how these vendors support installs in environments where the sysadmin is a desktop person who got handed a server and was asked, "make it work." This kind of vendor-based hand-holding makes the ongoing maintenance of applications lower on the client side of the equation, which can lead to more sales. But, I'm not sure if that's it or not.

This is causing some grumbling in the ranks, since it means untrusted parties have to be allowed to log in to servers in the domain. Before this recent spate of applications, vendors demanding such access had their apps relegated to servers not in the domain at all. This doesn't work when the app requires domain access. Console access to servers is a sensitive thing for us, so we don't like to hand it out on demand to vendors.

Especially when we weren't involved in the purchase process to begin with. Many a time we've been told:

Client: We spent umpty thousand dollars on this ap. Install it.
Us: *reads install document, cringes* They need Administrator access to the whole box and a tunnel into the inner Banner fortress. I don't want to.
Client: What part of umpty thousand dollars don't you understand? Make it work.
Us to Management: Insecure! Violates best practices!
Management to Us: It's too late to get a refund, and upper management was involved in the decision. Make it work.
Us: Wilco.
Or words to that effect.

Ahem.

How are y'all handling this kind of thing, presuming you're also seeing it and it isn't just me getting lucky.
Over half a year ago I mentioned that I was going to write up a script that sucks down Windows security log info and deposits the summarized data into a database. That script has been pretty much done for several months now, and the data is finally getting some attention on campus. And with attention comes...

...feature requests.

It's good to have them, but tricky to deal with. There is a quirk in how these logs are gathered and how Microsoft records the data.
  • Login events record the User and the IP address the request came from. The machine they're logging into is the Domain Controller in this case.
  • Account Lockout events record the machine that performed the lockout. This would be things like their workstation, or the Outlook Web Access servers.
Since we use DHCP on campus, the Machine : IP association is not static. Our desktop support users like having machine name wherever possible, with machine name AND IP being preferred. This will prove tricky to provide. What doesn't help is that a lot of login events come from IPs that trace to our hardware load-balancers, which generally means a login via LDAP by way of our single-sign-on product (CAS). That's not a domained machine, obviously.

There are a couple of ways to solve this problem, but none of them are terribly good.
  • During event-parse, use WMI to query the IP address to find out what it thinks it is. Should work fast for domained machines, but will be horribly slow for undomained machines.
  • Use a reverse IP lookup. Except for the fact that we use BIND for our reverse-DNS records so we'll get a lot of useless dhcp134-bh.bh.wwu.edu style addresses that won't correspond to the 'machine login' events I'm already tracking.
  • Do DB queries to find out which machine logged in with that IP address most recently and use that. But all those lookups will HORRIBLY slow down parsing, even if I keep a lookup table during the parsing run.
Slowing down parsing is not a good thing, since we chew between 70K-300K events every 15 minutes during busy periods. Even small efficiency dings can be horribly magnified in such an environment.

Nope, looks like the best solution here is to make the IP address a clickable link that leads to another query that'll populate with the most recent machines to hold that IP address. If any show up at all.

I wonder what they'll ask for next?
One of the more annoying problems with password and account-lockout policies in Active Directory has been that they apply to every account universally. I you want to force your users to change passwords every 90 days, with account lockout after a certain number of bad login attempts, then the same policies apply to your 'Administrator' user. Account lock-out was a really great way to DoS yourself in really critical ways.

In a way, that's what account-lockout is all about. It's to keep bad people from coming in, but its also a way for bad people from preventing legitimate people from using their own accounts. You need to take the good with the bad.

Since we were a NetWare shot for y-e-a-r-s we're very used to Intruder Lockout (ILO), and losing it during the move to Windows was seen as a loss of a key security feature. We had accounts that had to be exempted from lockout, which was dead easy in eDirectory but very difficult in AD.

Happily, Server 2008 introduces a way to do this. It's called "Fine-Grained Password Policy", and is NOT group-policy based. This was somewhat surprising. Getting this requires raising the domain and forest functional levels to the 2008 level. What it allows is setting password policy based on group memberships, with conflict resolution handled by a priority setting on the policy itself. Interestingly, the actual policies are created through ASDI Edit, so they're not beginner-friendly.

For instance, we can set a 'lock out after 6 tries in 30 minutes' setting to the Domain Users group at a Priority of 30, and a second 'never lock out ever' policy to the Domain Admins group at a Priority of 20. That way 'Administrator' will have the never-lock policy apply to it, but Joe User will have the lock-after-6-in-30 policy apply. This works best if the password policy specifies that Domain Admins need to have very complex and long passwords, which makes a brute-force cracking attempt take unreasonably long amounts of time.

We put this in place a few weeks ago, and it is working as we expected. SO GLAD to have this.

More than OpenFiler

| 7 Comments | No TrackBacks
I've received better requirements than I had before, and OpenFiler by itself doesn't meet them. The requirements are, roughly:
  • Must support both file-based and block-based storage serving.
  • Must have some kind of non-hierarchical backup capability.
  • Able to create a mirror copy of the storage in a remote location.
This distills down to:
  • Must support both iSCSI and SMB serving.
  • Must have snapshots, or some other copy-on-write technology.
  • DRBD or some other replication technology.
Since OpenFiler's SMB integration just doesn't work in our environment, I can't use just that. Also, Samba's annoying habit of requiring a smb-daemon reset to add shares makes it annoying to work with. We can't risk pissing off the Access database users (not to mention PST users) who'd be most peeved when they have to do DB recovery on their files after a reset. Nothing a little change-management can't fix, but our users are already used to instant gratification.

Another option, less free, is to use a combination of Windows Server 2008 and KernSafe iStorage. It has the features we need, and the entire environment is still cheaper per GB than the other storage options we already have.

A second potential is the combination of OpenFiler in pure iSCSI mode and then a Windows Server 2008 instance in the ESX cluster to front-end iSCSI storage for SMB sharing. This has its problems as well, as filers are memory hungry, and we're currently bandwidth-constrained in the ESX cluster right now (this is changing, but we're still a month or two out from fixing that). Once you amortize utilized resources for this ESX-based filer you get a price that's pretty close to the KernSafe/Windows combo if not a bit more expensive.

I'm open to other ideas, but in the mean time KernSafe's free option has enough of the right features that I can at least test the thing.

Read-only databases

| No Comments | No TrackBacks
I've been reading up on Active Directory read-only domain controllers (RODC), new in Server 2008. When I first glanced at them, they looked an awful lot like NDS read-only replicas which have been around since the advent of NetWare 4.0 too many years ago. Novell put r/o replicas into NDS in large part for complete X.500 compliance. However, their real use case was never made clear. The only case I could ever come up with is a kind of disaster-recovery site, where that R/O replica could be promoted to a R/W replica in an emergency. So why was Microsoft finally putting the last X.500 piece in now?

Turns out, it wasn't X.500, it was to solve a somewhat intractable problem with Active Directory domains; the satellite office problem. The Small Business Development Center is a part of the College of Business Education, and actually offices in downtown Bellingham. Before they got a reliable WAN connection to campus, they needed to be able to work when their internet connection was down. What we did was put all of those users into a single OU, made that OU a partition in eDirectory, and gave their NetWare server a copy of that replica. That way, only those security principles were ever at threat, and they could still log in and use resources local to them when their WAN link was down.

The same problem with AD is much trickier to solve, since you can't partition the AD database that way. You really had three options:

  1. Tell the users to live with the outage.
  2. Put a Domain Controller down there.
  3. Declare the site a new Domain in the forest and put Domain Controllers down there.
Putting a DC down there meant that the site would have a full copy of your entire authentication database, which can represent a major security vulnerability if the site lacks any way to truly secure the DC's physical existence. AD Sites allow for more efficient use of WAN resources, but that doesn't change the fact that a full and complete copy of the domain was hosted there.

A Read-Only Domain Controller is NOT a full copy of the domain; it does not contain any passwords by default. Unlike a R/O NDS replica, users can actually authenticate against it; the server proxies the authentication against a normal DC if it can find one. You can set a password-caching policy to tell it which passwords to keep local copies of, so branch-local users can still log in when the WAN is down. That's... not useless You're still having to keep the entire AD database down there complete with GPO SYSVOL goodness and all those groups, but at least if thieves run off with the RODC they'll only fully compromise local users.

It still isn't as robust as how eDirectory handles it, but at least it's a lot better than it used to be. Especially if politics prevent you from being able to declare a new domain.

Centralized IT

| 2 Comments | No TrackBacks
I've had quite a bit of experience with the process of centralizing IT. At my last job I was at ground zero as I was on the committee that was charged with rationalizing an IT job family structure that was grounded in the early 1980's (key clue, the phrase, "electronic data processing" was slathered across many job titles, a phrase not at all in vogue in the 1990's). This particular consolidation event was driven from a directive from on high, above the CIO. So, as it were, it happened in spite of the grumbling.

WWU has gone through some of its own consolidations, but there are natural barriers to complete consolidation in the Higher Ed market. I'll get to those in a bit. The one thing acting as a serious barrier to consolidation in any organization are departments that are large enough to support their own multi-person IT departments. Departments with one or two people effectively doing the full IT stack (stand-alone sysadmins who also do desktop support, database maintenance, to-the-desk network wiring, and maybe a bit of app-dev along the side) are most vulnerable to being consolidated into the central Borg.

Some departments are all too happy to join the central IT infrastructure, as they see it as a way to shed costs onto another business unit. Others are happy because their own IT people are so overworked, the idea of getting them help is seen as a cost-free mercy; or put another way agreeing to consolidation is seen as a cost-free way to increase IT investment. Still others are happy to join because they want some nifty new technology and their stick-in-the-mud IT people keep saying, "no," and view the central Borg as a way to get that thing.

The big reason departments don't want their IT people consolidated away from them is personalized service. These are people who know the business intimately, something those central-office folk don't. The cost of maintaining an independent IT infrastructure is seen as a perfectly valid business investment in operational efficiency. Any centralization initiative will have to deal with this concern.

The other big reason shows up less often, but is very hard to overcome without marching orders delivered from On High: distrust of central IT in specific. If the business unit that contains central IT is seen to be less competent as compared to the local IT people, that business unit will not consent to centralization. If the people in central IT are collectively viewed as a bunch of idiots, or run by idiots, the only way that unit is centralizing is if a metaphorical gun is held to their heads.

My last job handled the all of the above and eventually came to an agreement. First and foremost, it was a fiat from On High that IT centralization would happen. All IT job titles started being paid out of the same budget. We then spent the next four years hammering out the management structure, which meant that for a long time a whole bunch of people had their salary paid by people with 0% influence on their work direction.

Many departments gleefully joined the central infrastructure, driven in large part by their own IT people. They'd been overworked, you see, and the idea of gaining access to a much wider talent pool, and a significantly deeper one as well, was hard to not take advantage of. These were the departments with 1-3 IT people. In almost every case the local IT people stayed in their areas as the local IT contact, which maintained the local knowledge they'd developed over the years.

There was one small department that was a holdout until the very end. An attempt to merge some 5 years earlier had gone horribly wrong, and institutional memory remembered that very clearly. It wasn't until that department got a new director that an agreement was reached. The one IT guy up there stayed up there after the merger and stopped doing server and desktop support in favor of department-specific app-dev work, what he was hired to do in the first place as it happened.

Then the arm-wrestling over the bigger departments took place. For the most part they kept near complete control over their own IT staffs, but their top level IT managers were regularly hauled back to the home IT office for 'management team meetings'. This ended up being a good move, since it reduced the barriers for communication at the very top level, and ultimately lead to some better efficiencies overall; especially in the helpdesk area as staff started to move between stacks after a while. Also, the departments that had been deeply skeptical of this whole centralized IT thing started working with other IT managers and getting their concerns heard, which reduced some of the inherent distrust.

With Higher Ed, there is an additional factor or two that my previous job didn't face. First of all, the historic independence of specific Colleges. Second, Universities are generally a lot less command-and-control than their .com or even .gov brethren. This means that centralization relies far more on direct diplomacy between IT business units than it does on direct commands from on high. Distrust in this environment is much more hard to overcome as coercion is not a readily available option.

Back in the day, WWU had 7 separate NDS trees. 7. That's a lot. Obviously, there wasn't much in the way of cross-departmental access of data. Over the course of around 5 years we consolidated down to a single 'WWU' NDS tree. Some departments happily stopped spending IT time on account maintenance tasks and let central IT do it all. Some departments gave up their servers all together. Time passed and still more areas decided they really didn't need to bother keeping local replicas, and let central IT handle that problem.

In the end, handling IT in Higher Ed means dealing with a more heterogeneous environment than is otherwise cost-effective. I've mentioned before how network management on Higher Ed networks resembles ISPs more than it does corporate networks, and that unfortunately applies to things like server and storage purchases. Now that we're in the process of migrating off of NetWare and onto Windows, it means we're now in the process of wrangling over rules governing Active Directory management.

We wrangled NDS control back in the 90's and early 00's, and now it's Microsoft's turn. As with the last round of NDS wrangling, some departments have gleefully turned over control (GPOs and file-server management specifically) of their department over to us in ITS. Others, specifically one with a large local IT presence, is really holding out for complete control of their area. They're clearly angling to just use us as an authentication provider and they'll do the rest, something that... well... negotiations are ongoing.

My crystal ball says we have somewhere between 5 to 10 years before the next wave of 'directory' upgrade forces another consolidation. That consolidation just might involve consolidating with a State agency of some kind. Perhaps the State will force us to use a directory rooted in the wa.gov DNS domain (wwu.univ.wa.gov perhaps), and our Auth servers will be based in Olympia rather than on our local network. Don't know. What is true, is that we'll be going through this again, probably within the next decade.

Migration headaches

| 2 Comments | No TrackBacks
Today marked the day we cut over the largest volume we have, the Fac/Staff shared volume. That's 1.9TB of 'disorganized file data' (a.k.a. bog standard file-server) to migrate. This is the last of the major volumes to move, and this was done intentionally. Because of this, we have our system down. Unfortunately, a wrench was thrown into the works. But before I get to the wrench, a description of how we migrated this puppy from NetWare.
  1. At M-18 days, we performed an initial sync of the data via robocopy.
  2. At M-16 days, when the first sync completed (it took about 29 hours) we performed a delta-sync.
  3. At M-17 days we performed another delta sync, 24 hours after the previous, so we could get a feel for how long a daily 'copy the changed files' job would take.
  4. M-16 days, create a daily copy-job (robocopy source dest /mir /r:1 /xo /log:e:somewhere)
  5. M-14 days, we perform the rights migration, and open up the new share to everyone with sufficient rights to change permissions on the volume. Inform these people to fix broken rights on the Microsoft share.
  6. M-12 days, after feedback from the techs, release guidance for how to re-organize directories to better work with Microsoft permissions.
  7. M-12 to M-1 day, Technicians reorganize data and repermission as needed, with our assistance.
  8. M-12 hours, we do a delta sync
  9. Migration: Change login scripts, kick off terminal delta-sync to get net-change.
  10. M+2 hours, 8am arrives, script is done, we are done. Yay! Start working problems as reported.
The problem occurred between steps 8 and 9. One department decided that migration-night was the perfect time to reorganize over 150GB of data. They would have struggled to find a worse time for it. The result of this is that the terminal delta-sync in step 9 will end up taking far, far longer than the 2 hours budgeted.

The problem here is that when people start logging in at 8am, all of their data isn't there. There were some people who worked right up until the M-12 hour mark reorganizing data and were surprised when it wasn't on the new system yet. These people were alphabetically below the department that moved 150GB of data last night, so they hadn't been synced yet. So they're seeing and working with old files while the new ones copy in.

The worry for me is PST and MDB files that have a tendency to be open all day. The copy script will not be able to replace these open files, so they will in effect experience data-loss because of this department. There is not much we can do about that. We can troll through the log file for the files listed as failed-to-copy-due-to-lock and hand copy them afterwards, after clearing locks. In which case they'll lose whatever data they committed to these files during the morning. So these files? There WILL be data-loss, guaranteed.

The other problem we ran into is one department set up their rights to lock us godlike admins out of certain directories, something you can do on Microsoft filesystems since there is no equivalent to Novell's "Supervisor" trustee right. We didn't notice this until step 9 when the log-files filled up with 'access denied' errors, and the 30 second retry it causes, which further delayed execution of the terminal sync script. Obviously, those files will not get synced.

I hate hate hate it when this kind of thing happens.

Evolving best-practice

| No Comments | No TrackBacks
As of this morning, everyone's home-directory is now on the Microsoft cluster. The next Herculean task is to sort out the shared volume. And this, this is the point where past-practice runs smack into both best-practice, and common-practice.

You see, since we've been a NetWare shop since, uh, I don't know when, we have certain habits ingrained into our thinking. I've already commented on some of it, but that thinking will haunt us for some time to come.

The first item I've touched on already, and that's how you set permissions at the top of a share/volume. In the Land of NetWare, practically no one has any rights to the very top level of the volume. This runs contrary to both Microsoft and Posix/Unix ways of doing it, since both environments require a user to have at least read rights to that top level for anything to work at all. NetWare got around this problem by creating traverse rights based on rights granted lower down the directory structure. Therefore, giving a right 4 directories deep gave an inplicit 'read' to the top of the volume. Microsoft and Posix both don't do this weirdo 'implicit' thing.

The second item is the fact that Microsoft Windows allows you to declare a share pretty much anywhere, and NetWare was limited to the 'share' being the volume. This changed a bit when Novell introduced CIFS to NetWare, as they introduced the ability to declare a share anywhere; however, NCP networking still required root-of-volume only. At the same time, Novell also allowed the 'map root' to pretend there is a share anywhere but it isn't conceptually the same. The side-effect of being able to declare a share anywhere is that if you're not careful, Windows networks have share-proliferation to a very great extent.

In our case, past-practice has been to restrict who gets access to top-level directories, greatly limit who can create top-level directories, and generally grow more permissive/specific rights-wise the deeper you get in a directory tree. Top level is zilch, first tier of directories is probably read-only, second tier is read/write. Also, we have one (1) shared volume upon which everyone resides for ease of sharing.

Now, common-practice among Microsoft networks is something I'm not that familiar with. What I do know is that shares proliferate, and many, perhaps most, networks have the shares as the logical equivalent of what we use top-level directories for. Where we may have a structure like this, \\cluster-facshare\facshare\HumRes, Microsoft networks tend to develop structures like \\cluster-facshare\humres instead. Microsoft networks rely a lot on browsing to find resources. It is common for people to browse to \\cluster-facshare\ and look at the list of shares to get what they want. We don't do that.

One thing that really gets in the way of this model is Apple OSX. You see, the Samba version on OSX machines can't browse cluster-shares. If we had 'real' servers instead of virtual servers this sort of browse-to-the-resource trick would work. But since we have a non-trivial amount of Macs all over the place, we have to pay attention to the fact that all a Mac sees when they browse to \\cluster-facshare\ is a whole lot of nothing. We're already running into this, and we only have our user-directories migrated so far. We have to train our Mac users to enter the share as well. For this reason, we really need to stick to the top-level-directory model as much as possible, instead of the more commonly encountered MS-model of shares. Maybe a future Mac-Samba version will fix this. But 10.6 hasn't fixed it, so we're stuck for another year or two. Or maybe until Apple shoves Samba 4 into OSX.

Since we're on a fundamentally new architecture, and can't use common-practice, our sense of best-practice is still evolving. We come up with ideas. We're trying them out. Time will tell just how far up our heads are up our butts, since we can't tell from here just yet. So far we're making extensive use of advanced NTFS permissions (those permissions beyond just read, modify, full-control) in order to do what we need to do. Since this is a deviation from how the Windows industry does things, it is pretty easy for someone who is not completely familiar with how we do things to mess things up out of ignorance. We're doing it this way due to past-practice and all those Macs.

In 10 years I'm pretty sure we'll look a lot more like a classic Windows network than we do now. 10 years is long enough for even end-users to change how they think, and is long enough for industry-practice to erode our sense of specialness more into a compliant shape.

In the mean time, as the phone ringing off the hook today foretold, there is a LOT of learning, decision-making, and mind-changing to go through.

Migrating knowledge bases

| 4 Comments | No TrackBacks
This morning we moved the main class volume from NetWare to Windows. We knew we were going to have problems with this since some departments hadn't migrated key groups into AD yet, so the rights-migration script we wrote just plain missed bits. Those have been fixed all morning.

However, it is becoming abundantly clear that we're going to have to retrain a large portion of campus Desktop IT in just what it means to be dealing with Windows networking. We'd thought we'd done a lot of it, but it turns out we were wrong. It doesn't help that some departments had delegated 'access control' rights to professors to set up creative permissioning schemes, this morning the very heated calls were coming in from the professors and not the IT people.

There are two things that are tripping people up. One has been tripping people up on the Exchange side since forever, but the second one is new.
  1. In AD, you have to log out and back in again for new group-memberships to take.
  2. NTFS permissions do not grant the pass-through right that NSS permissions do. So if you grant a group rights to \Science\biology\BIOL1234, members of that group will NOT be able to pass through Science and Biology to get to BIOL1234.
We have a few spots here and there where for one reason or another rights were set at the 2nd level directories instead of the top level dirs. Arrangements like that just won't work in NTFS without busting out the advanced permissions.

An area we haven't had problems yet, but I'm pretty certain we will have some are places where rights are granted and then removed. With NSS that could be done two ways: an Inherited Rights Filter, or a direct trustee grant with no permissions. With NTFS the only way to do that is to block rights inheritance, copy the rights you want, and remove the ones you don't. That sounds simple, but here is the case I'm worried about:

\Humres\JobReview\VPIT\ITS\JohnSmith\

At 'HumRes' the group grp.hr is granted 'read' rights, and the HR director is granted Modify directly to their user (bad practice, I know. But it's real-world).
At 'JobReview' the group grp.hr.jobreclass is granted 'Modify'
At 'VPIT' Inheritance is Blocked and rights copied.
At 'JohnSmith' the HR user AngieSmith is granted the DENY right due to a conflict of interest.

Time passes. The old director retires, the new director comes in. IT Person gets informed that the new director can't see everything even though they have Modify to the entire \Humres tree. That IT person will go to us and ask, "WTH?" and we will reply with, "Inheritance is blocked at that level, you will need to explicitly grant Modify for the new director on that directory."

So this is a bit of a sleeper issue.

Meanwhile, we're dealing with a community of users who know in their bones that granting access to 'JohnSmith' means they can browse down from \HumRes to that directory just on that access-grant alone. Convincing them that it doesn't work that way, and working with them to rearrange directory structures to accommodate that lack will take time. Lots of time.

The things you learn

| 3 Comments | No TrackBacks
We had cause to learn this one the hard way this past week. We didn't know that Windows Server 2008 (64-bit) and Symantec Endpoint Protection just don't mix well. It affected SMBv1 clients, SMBv2 clients (Vista, Win7) were unaffected.

The presentation of it at the packet-level was pretty specific, though. XP clients (and Samba clients) would get to the second step of the connection setup process for mapping a drive and time out.

  1. -> Syn
  2. <- Syn/Ack
  3. -> NBSS, Session Request, to $Server<20> from $Client<00>
  4. <- NBSS, Positive Session Response
  5. -> SMB, Negotiate Protocol Request
  6. <- Ack
  7. [70+ seconds pass]
  8. -> FIN
  9. <- FIN/Ack
Repeat two more times, and 160+ seconds later the client times out. The timeouts between the retries are not consistent so the time it takes varies. Also sometimes the server issues the correct "Protocol Request Reply" packet and the connection continues just fine. There was no sign in any of the SEP logs that it was dropping these connections, and the Windows Firewall was quiet as well.

In the end it took a call to Microsoft. Once we got to the right network person, they knew immediately what the problem was.

ForeFront is now going on those servers. It really should have been on a month ago, but because these cluster nodes were supposed to go live for fall quarter they were fully staged up in August, before we even had the ForeFront clients. We never remembered to replaced SEP with ForeFront.

Other Blogs

My Other Stuff

About this Archive

This page is an archive of entries from June 2010 listed from newest to oldest.

May 2010 is the previous archive.

July 2010 is the next archive.

Find recent content on the main index or look in the archives to find all content.