There is more to an on-call rotation than a shared calendar with names on it and an agreement to call whoever is on the calendar if something goes wrong.

People are people, and you have to take that into consideration when setting up a rotation. That means compromise, setting expectations, and consequences for not meeting them. Here are a few policies every rotation should have written down somewhere, preferably somewhere easy to find.

The rotation should be published well in advance, and easy to find.

This seems like an obvious thing, but it needs to be said. People need to know in advance when they're going to be obligated to pay attention to work in their usual time off. This allows them to schedule their lives around the on-call schedule, and you, as the on-call manager, will have to deal with fewer shift-swaps as a result. You're looking to avoid...

Um, I forgot I was on-call next week, and I'm going to be in Peru to hike the Andes. *sheepish look.*

This is less likely to happen if the shift schedule is advertised well in advance. For bonus points, add the shift schedules to their work calendars.

(US) Monday Holiday Law is a thing. Don't do shift swaps on Monday.

If you're doing weekly shifts, it's a good idea to not do your shift swap on Monday. Due to the US Monday Holiday Law there are five weeks in a year (10% of the total!) where your shift change will happen on an official holiday. Two of those are days that almost everyone gets off: Labor Day and Memorial Day.

Whether or not you need to avoid shift swaps on a non-work day depends a lot on how hand-offs work for your organization.

Set shift-handoff expectations.

When one watch-stander is relieved by the next, there needs to be a handoff. For some organizations it could be as simple as making sure the other person is there and responsive before stepping down. For others, it can be complicated as they have more state to transfer. State such as:

  • Ongoing issues being tracked.
  • Hardware replacements due during the next shift period.
  • Maintenance tasks not completed by the outgoing watch-stander.
  • Escalation engineers that won't be available.

And so on. If your organization has state to transfer, be sure you have a policy in place to ensure it is transferred.

Acknowledgement (ACK) time must be defined for alarms.

The maximum time a watch-stander is allowed to wait before ACKing an alarm must be defined by policy, and failure to meet that must be noticed. If the ACK time expires, the alarm should escalate to the next tier of on-call.

This is a critical policy to define, as it lets watch-standers predict how much of a life they can have while on-call. If the ACK time is 10 minutes, a watch-stander who is driving has to pull over to ack the alarm. A 10-minute ACK also likely means they can't do anything long, like go to the movies or their kid's band recital.

This also goes for people who are on the escalation schedule. It may be different times, but it should still be defined.
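
To make the escalation part concrete, here is a minimal sketch of the ACK-then-escalate flow, written in Python. The tier names, timeout, and paging functions are hypothetical stand-ins; in most shops this logic lives in the paging service's escalation policy rather than in code you write yourself.

```python
# Minimal sketch of ACK-timeout escalation. The alert source, paging
# transport, and tier list are hypothetical; a real paging service
# (PagerDuty and friends) implements this for you.
import time

ACK_TIMEOUT_MINUTES = 10                                  # policy: max time to ACK
ESCALATION_TIERS = ["primary", "secondary", "manager"]    # hypothetical tiers

def page(who: str, alert: str) -> None:
    print(f"paging {who}: {alert}")            # stand-in for SMS/push/phone call

def wait_for_ack(who: str, timeout_minutes: int) -> bool:
    # Stand-in for polling the paging system's ACK state until the timeout.
    time.sleep(1)
    return False                               # pretend nobody acknowledged

def dispatch(alert: str) -> None:
    for tier in ESCALATION_TIERS:
        page(tier, alert)
        if wait_for_ack(tier, ACK_TIMEOUT_MINUTES):
            return                             # acknowledged; stop escalating
    print(f"UNACKED after all tiers: {alert}") # this should be a noticed Event

if __name__ == "__main__":
    dispatch("disk full on db01")
```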

Response time must be defined for alarms.

The time to first response, actually working on the reported problem, must be defined by policy, and failure to meet that must be noticed. This may vary based on the type of alarm received, but it should still be defined for each alarm.

This is another critical policy to define, and it has an even bigger impact on the watch-stander's ability to do anything other than be at work. The watch-stander pretty much has to stay within N minutes of a reliable Internet connection for their entire watch. If their commute to and from work is longer than N minutes, they can't be on-watch during their commute, and so on.

  • If 10 minutes or less, it is nearly impossible to do anything out of the house. Such as picking up prescriptions, going to the kid's band recital, soccer games, and theatre rehearsals.
  • If 20 minutes or less, quick errands out of the house may be doable, but that's about it.

As with ACK time, this should also be defined for the people on the escalation schedule.

Consequences for escalations must be defined.

People are people, and sometimes we can't get to the phone for some reason. Escalations will happen for both good reasons (ER visits) and bad (slept through it). The severity of a missed alarm varies based on the organization and what got missed, so this will be a highly localized policy. If an alarm is missed, or a pattern of missed alarms develops, there should be defined consequences.

This is the kind of thing that can be brought up in quarterly/annual reviews.


This gets its own section because it's very important:

The right shift length

How long your shifts should be is a function of several variables:

  • ACK time.
  • Response time.
  • Frequency of alarms.
  • Average time-to-resolution (TTR) for the alarms.

People need to sleep, and watch-standers are more effective if they're not sleep-deprived. Chronically sleep-deprived people have poor job satisfaction and are more likely to leave for greener pastures. Here is a table showing, for combinations of alarm frequency and TTR, how many minutes on average a watch-stander will have between periods of on-demand attention:

Alarm Freq   TTR <5 min   TTR 5-10   TTR 10-20   TTR up to 30   TTR up to 60
15 min           10            5           0              0              0
30 min           25           20          10              0              0
60 min           55           50          40             30              0
90 min           85           80          70             60             30
2 hour          115          110         100             90             60
4 hour          235          230         220            210            180
6 hour          355          350         340            330            300

Given that the average sleep cycle is 45 minutes, any combination with less slack than that is a shift during which the watch-stander will not be able to sleep. If you allow people time to actually get to sleep, say 20 minutes, that rules out anything at or under 70 minutes of slack as well. For people who don't fall asleep easily (such as me), even the 80 and 85 minute slack times would be too little. The remaining combinations are the ones where sleeping while on-call could actually happen.
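
For the curious, the table comes from one piece of arithmetic: the slack between alarms is the alarm interval minus the worst-case TTR, floored at zero. Here is a minimal sketch of that calculation, using the 45-minute sleep-cycle and 70-minute cutoffs described above; those thresholds are the ones from this post, not universal constants.

```python
# Minimal sketch of the arithmetic behind the table: slack = alarm interval
# minus worst-case TTR, floored at zero, then compared against sleep needs.
ALARM_INTERVALS = [15, 30, 60, 90, 120, 240, 360]   # minutes between alarms
TTR_BUCKETS = [5, 10, 20, 30, 60]                    # worst-case minutes to resolve

SLEEP_CYCLE = 45    # average sleep cycle; less slack than this means no sleep

for interval in ALARM_INTERVALS:
    for ttr in TTR_BUCKETS:
        slack = max(interval - ttr, 0)
        if slack < SLEEP_CYCLE:
            verdict = "no sleep"
        elif slack <= 70:                            # the "at or under 70 minutes" cutoff
            verdict = "marginal at best"
        else:
            verdict = "sleep is possible"
        print(f"alarm every {interval:3d} min, TTR {ttr:2d} min: "
              f"{slack:3d} min slack ({verdict})")
```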

If your alarms are frequent enough, and take long enough to resolve, that handling them consumes 30% or more of the watch-stander's attention, you don't have an on-call rotation, you have a distributed NOC, and being on watch is a full-time job. Don't expect them to do anything else.

If your slack is at or under that 70-minute line, shifts shouldn't be longer than a day, and probably should be shorter.

If you're only a little above that line, you probably shouldn't ask anyone to do a whole week of it, so shift lengths should be less than 7 days.

Shift lengths longer than these guidelines risk burnout and creating rage-demons. We have too many rage-demons as it is, so please have a heart.


I believe this policy set provides the groundwork for a well-defined on-call rotation. The on-call engineers know what is expected of them, and know the or-else if they don't live up to it.

Ratios

In an effort to better understand the challenges facing the ops team of a particular project here at $DayJob, a project manager asked this question:

How many users per [sysadmin] can our system support?

The poor lead sysadmin on that side of the house swiveled her chair over and said to me, "There is no answer to this question!" And we had a short but spirited discussion about the various staffing ratios at the places we've been. Per-user is useless, we agreed. Machine/Instance count per admin? Slightly better. But even then. Between us we compiled a short list of places we've been and places we've read about.

  • Company A: 1000:1. Most of that 1 FTE was parts-monkey work to keep the install-base running; the engineer-to-system ratio was closer to 10K:1. User count: the global internet.
  • Company B: 200:1. Desperately understaffed, as the ops team was frantically trying to keep up with a runaway application and a physical plant that was rotting under the load. User count: most of the US.
  • Company C: 150:1. Which was just right! User count: none, it was a product still in development.
  • Company D: 60:1. And the admin was part-time, because there wasn't enough work. User count: 200.
  • Company E: 40:1. Largely because 25-30 of those 40 systems were one-offs. It was a busy team. Monocultures are for wimps. User count: 20K.

This list was used to explain to the project manager in question the "it depends" nature of admin staffing levels, and that you can't rely on industry norms to determine the target we should be hitting. Everyone wants to be like Company A. Almost no one gets there.

What are the ratios you've worked with? Let me know @sysadm1138

The sysadmin skills-path.

Tom Limoncelli posted a question today.

What is the modern rite of passage for sysadmins? I want to know.

That's a hard one, but it got me thinking about career-paths and skills development, and how it has changed since I did it. Back when I started, the Internet was just becoming a big source of information. If it wasn't on Usenet, the vendor's web-site might have a posted knowledge-base. You could learn a lot from those. I also learned a lot from other admins I was working with.

One of the big lamentations I hear on ServerFault is that kids these days expect a HOWTO for everything.

Well, they're right. I believe that's because friendly bloggers like myself have trained everyone to expect a write-up for how to do stuff. So I posit this progression of skill-set for a budding sysadmin deploying a NewThing.

  1. There is always a checklist if you google hard enough. If that one doesn't work, look for another one.
    • And if that doesn't work, ask a batch of likely experts (or bother the expert in the office) to make one for you. It works sometimes.
    • And if that doesn't work, give up in disgust.
  2. Google for checklists. Find one. Hit a snag. Look for another one. Hit another snag. Titrate between the two to get a good install/config.
    • If that doesn't work, follow the step-1 progression to get a good config. You'll have better luck with the experts this time.
  3. Google for checklists. Find a couple. Analyze them for failure points and look for gotcha-docs. Build a refined procedure to get a good install/config.
  4. Google for checklists. Find a few. Generalize a good install/config procedure out of them and write your own checklist.
    • If it works, blog about it.
  5. Google for checklists. Find a few, and some actual documentation. Make decisions about what settings you need to change and why, based on documentation evidence and other people's experience. Install it.
    • If it works, write it up for the internal wiki.
  6. [Graduation] Plunder support-forums for problem-reports to see where installs have gone wrong. Revise your checklist accordingly.
    • If it works, go to local Meetups to give talks about your deploy experience.

That seems about right. When you get to the point where your first thought about deploying a new thing is, "what can go wrong that I need to know about," you've arrived.

A minimum vacation policy

A, "dude, that's a cool idea," wave has passed through the technology sector in the wake of an article about a minimum vacation policy. This was billed as an evolution of the Unlimited Vacation Policy that is standard at startups these days. The article correctly points out some of the social features of unlimited-vacation-polices that aren't commonly voiced:

  • No one wants to be the person who takes the most vacation.
  • No one wants to take more vacation than others do.
  • Devaluing vacation means people don't actually take it, instead opting for low-work working days in which they only do 2 hours remotely instead of a normal 10 in the office.

These points mean that people under an unlimited policy end up taking less actual vacation than people at workplaces with an explicit 15-days-a-year policy. Some of the social side-effects of a discrete max-vacation policy are not often spelled out, but they include:

  • By counting it, you are owed it. If you have a balance when you leave, you're owed the pay for those earned days.
  • By counting it, it has more meaning. When you take a vacation day, you're using a valuable resource and are less likely to cheapen it by checking in at work.
  • There is never any doubt that you can use those days, just on what days you can use them (maintain coverage during the holidays/summer, that kind of thing).

Less stress all around, so long as a reasonable amount is given. To me, this looks like a better policy than unlimited.

But what about minimum-vacation? What's that all about?

The idea seems to be a melding of the best parts of unlimited and max. Employees are required to take a certain number of days off a year, and those days have to be full-disconnect days in which no checking in on work is done. Instead of using scarcity to urge people to take real vacations, it explicitly states you will take these days and you will not do any work on them. For the employer it means you do have to track vacation again, but the days are required, they don't create the vacation-cash-out liability that max-vacation policies create, and you only have to track up to the defined amount. If an employee takes 21 days in a year, you don't care, since you stopped tracking once they hit 15.

The social factors here are much healthier than unlimited:

  • Explicit policy is in place saying that vacations are no-work days. People get actual down-time.
  • Explicit policy is in place that N vacation days shall be used, so everyone expects to use at least those days. Which is probably more than they'd use with an unlimited policy.
  • Creates the expectation that when people are on vacation, they're unreachable. Which improves cross-training and disaster resilience.

I still maintain that a max-vacation policy working hand-in-hand with a liberal comp-time policy is best for workers, but I can't have everything. I like min-vacation a lot better than unlimited-vacation. I'm glad to see it begin to take hold.

"Over the next few decades demand in the top layer of the labor market may well centre on individuals with high abstract reasoning, creative, and interpersonal skills that are beyond most workers, including graduates."
-Economist, vol413/num8907, Oct 4, 2014, "Special Report: The Third Great Wave. Productivity: Technology isn't Working"

The rest of the Special Report lays out a convincing argument that people who have automation-creation as part of their primary job duties are in for quite a bit of growth, and that people in industries subject to automation are going to have a hard time of it. This has a direct impact on sysadminly career direction.

In the past decade Systems Administration has been moving away from mechanics who deploy hardware, install software, and fix problems, and towards Engineers who are able to build automation for provisioning new computing instances, installing application frameworks, and who know how to troubleshoot problems with all of that. In many ways we're a specialized niche of Software Engineering now, and that means we can ride the rocket with them. If you want to continue to have a good job in the new industrial revolution, keep plugging along and don't become the dragon in the datacenter people don't talk to.

Abstract Reasoning

Being able to comprehend how a complex system works is a prime example of abstract reasoning. Systems Administration is more than just knowing the arcana of init, grub, or WMI; we need to know how systems interact with each other. This is a skill that has been a prerequisite for Senior Sysadmins for several decades now, so it isn't new. It's already on our skill-path. This is where Systems Engineers make their names, and sometimes become Systems Architects.

Creative

This has been less on our skill-path, but is definitely something we've been focusing on in the past decade or so. Building large automation systems, even with frameworks such as Puppet or Chef, takes a fair amount of both abstract reasoning and creativity. If you're good at this, you've got 'creative' down.

This has impacts for the lower rungs of the sysadmin skill-ladder. Brand new sysadmins are going to be doing less racking-and-stacking and more parsing and patching ruby or ruby-like DSLs.

Interpersonal Skills

This is where sysadmins tend to fall down. A lot of us got into this gig because we didn't have to talk to people who weren't other sysadmins. Technology made sense, people didn't.

This skill is more a reflection of the service-oriented economy, and sysadmins are only sort of that, but our role in product creation and maintenance is ever more social these days. If you're one of two sysadmin-types in a company with 15 software engineers, you're going to have to learn how to have a good relationship with software engineers. In olden days, only very senior sysadmins had to have the Speaker to Management skill, now even mid-levels need to be able to speak coherently to technical and non-technical management.

It is no coincidence that many of the tutorials at conferences like LISA are aimed at building business and social skills in sysadmins. It's worth your time to attend them, since your career advancement depends on it.


Yes, we're well positioned to do well in the new economy. We just have to make a few changes we've known about for a while now.

And if there isn't a stipend...

Sysadmin-types, we kind of have to have a phone. It's what the monitoring system makes vibrate when our attention is needed, and we also tend to be "always on-call", even if it's tier 4 emergency last resort on-call. But sometimes we're the kind of on-call where we have to pay attention any time an alert comes in, regardless of hour, and that's when things get real.

So what if you're in that kind of job, or applying for one, and it turns out that your employer doesn't provide a cell phone and doesn't provide reimbursement? Some Bring Your Own Device policies are written this way. Or maybe your employer moves to a BYOD policy and the company-paid phones are going away.

Can they do that?

Yes they can, but.

As with all labor laws, the rules vary based on where you are in the world. However, in August 2014 (a month and a half ago!) Schwan's Home Service, Inc. lost an appeal in California appellate court. This is important because California contains Silicon Valley, and what happens there tends to percolate out to the rest of the tech industry. This ruling held that employees who do company business on personal phones are entitled to reimbursement.

The ruling didn't provide a legal framework for how much reimbursement is required, just that some is.

This thing is so new that the ripples haven't been felt everywhere yet. No-reimbursement policies are not legal in California, that much is clear, but beyond that, not much is. For companies based outside California, such as those in tech hot-spots like Seattle, New York, or the DC area, this is merely a warning that the legal basis for such no-reimbursement policies is not firm. As the California-based companies revise policies in light of this ruling, accepted practice in the tech field will shift without legal action elsewhere.

My legal google-fu is too weak to tell if this thing can be appealed to the state Supreme Court, though it looks like it might have already toured through there.

Until then...

I strongly recommend against using your personal phone for both work and private life. Having two phones, even phones you pay for, provides an affirmative separation between your work identity, subject to corporate policies and liability, and your private identity. This is more expensive than just getting an unlimited voice/text plan with lots of data and dual-homing, but you face fewer risks that way. No-reimbursement BYOD policies are unfair to tech-workers in the same way that employers who require a uniform but don't provide a uniform allowance are unfair; for some of us, that phone is essential to our ability to do our jobs and should be expensed to the employer. Laws and precedent always take a while to catch up to business reality, and BYOD is getting caught up.

When it comes to things to send alarming emails about, CPU, RAM, Swap, and Disk are the four everyone thinks of. If something seems slow, check one or all of those four to see if it really is slow. This sets up a causal chain...

It was slow, and CPU was high. Therefore, when CPU is high it is slow. QED.

We will now alarm on high CPU.

It may be true in that one case, but high CPU is not always a sign of trouble. In fact, high CPU is a perfectly normal occurrence in some systems:

  1. Render farms are supposed to run that high all the time.
  2. Build servers are supposed to be running that high a lot of the time.
  3. Databases chewing on long-running queries.
  4. Big-data analytics that can run for hours.
  5. QE systems grinding on builds.
  6. Test-environment systems being ground on by QE.

Of course, not all CPU checks are created equal. Percent-CPU is one thing, Load Average is another. If Percent-CPU is 100% and your load-average matches the number of cores in the system, you're probably fine. If Percent-CPU is 100% and your load-average is 6x the number of cores in the system, you're probably not fine. If your monitoring system only grabs Percent-CPU, you won't be able to tell what kind of 100% event it is.
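
As a rough illustration of that distinction, here is a minimal sketch comparing load average to core count. The thresholds are illustrative assumptions, not recommendations, and os.getloadavg() is Unix-only.

```python
# Minimal sketch: 100% CPU with load near the core count is busy-but-healthy;
# load far above the core count means work is queuing. Thresholds here are
# illustrative only.
import os

def cpu_pressure() -> str:
    cores = os.cpu_count() or 1
    load1, load5, load15 = os.getloadavg()      # Unix-only
    ratio = load5 / cores
    if ratio <= 1.0:
        return f"probably fine: load {load5:.1f} on {cores} cores"
    if ratio < 3.0:
        return f"worth a look: load {load5:.1f} on {cores} cores"
    return f"probably not fine: load {load5:.1f} on {cores} cores"

if __name__ == "__main__":
    print(cpu_pressure())
```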

As a generic, apply-it-to-everything alarm, High-CPU is a really poor thing to pick. It's easy to monitor, which is why it gets selected for alarming. But, don't do that.

Cases where a High-CPU alarm won't actually tell you that something is going wrong:

  • Everything in the previous list.
  • If your app is single-threaded, the actual high-CPU event for that app on a multi-core system is going to be WELL below 100%. It may even be as low as 12.5%.
  • If it's a single web-server in a load-balanced pool of them, it won't be a BOTHER HUMANS RIGHT NOW event.
  • During routine patching. Alarms should be snoozed during a maintenance window anyway, but sometimes that doesn't happen.
  • Initializing a big application. Some things normally chew lots of CPU when spinning up for the first time.

CPU/Load Average is something you probably should monitor, since there is value in retroactive analysis and aggregate analysis. Analyzing CPU trends can tell you it's time to buy more hardware, or to turn up the max-instances value in your auto-scaling group. These are the kinds of things you look at in retrospect; they're not things that you want waking you up at 2:38am.

Only turn on CPU alarms if you know that high CPU is an error condition worthy of waking up a human. Turning it on for everything just in case is a great way to train yourself to ignore high-CPU alarms, which means you'll miss the ones you actually care about. Human factors, they're part of everything.

Redundancy in the Cloud

Strange as it might be to contemplate, imagine what would happen if AWS went into receivership and was shut down to liquidate assets. What would that mean for your infrastructure? Project? Or even startup?

It would be pretty bad.

Startups have been deploying preferentially on AWS or other cloud services for some time now, in part due to venture-capitalist pressure not to have physical infrastructure to liquidate should the startup go *pop*, and to scale fast should a much-desired rocket-launch happen. If AWS shut down fully for, say, a week, the impact on pretty much everything would be tremendous.

Or what if it were Azure? Fully debilitating for those that are on it, but the industry-wide impact would be smaller.

Cloud vendors are big things. In the old physical days we used to deal with the all-our-eggs-in-one-basket problem by putting eggs in multiple places. If you're on AWS, Amazon is very big on making sure you deploy across multiple Availability Zones, and on helping you become multi-region in the process if that's important to you. See? More than one basket for your eggs. I have to presume Azure and the others are similar, since I haven't used them.
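
As a small illustration of the more-than-one-basket idea within a single cloud vendor, here is a hedged sketch that spreads identical instances across every Availability Zone in an AWS region. It assumes boto3 and valid credentials; the region, AMI ID, and instance type are placeholders.

```python
# Minimal sketch: one instance per Availability Zone in a region.
# Assumes boto3 is installed and credentials are configured; the AMI ID
# and instance type below are placeholders, not recommendations.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

zones = [z["ZoneName"]
         for z in ec2.describe_availability_zones()["AvailabilityZones"]
         if z["State"] == "available"]

for az in zones:
    ec2.run_instances(
        ImageId="ami-00000000",        # placeholder AMI
        InstanceType="t2.micro",
        MinCount=1,
        MaxCount=1,
        Placement={"AvailabilityZone": az},
    )
```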

Do you put your product on multiple cloud-vendors as your more-than-one-basket approach?

It isn't as easy as it was with datacenters, that's for sure.

This approach can work if you treat the Cloud vendors as nothing but Virtualization and block-storage vendors. The multiple-datacenter approach worked in large part because colos sell only a few things that impact the technology (power, space, network connectivity, physical access controls), though pricing and policies may differ wildly. Cloud vendors are not like that; they differentiate in areas that are technically relevant.

Do you deploy your own MySQL servers, or do you use RDS?
Do you deploy your own MongoDB servers, or do you use DynamoDB?
Do you deploy your own CDN, or do you use CloudFront?
Do you deploy your own Redis queues, or do you use SQS?
Do you deploy your own Chef, or do you use OpsWorks?

The deeper down the hole of Managed Services you dive, and Amazon is very invested in pushing people to use them, the harder it is to take your toys and go elsewhere. Or run your toys on multiple Cloud infrastructures. Azure and the other vendors are building up their own managed service offerings because AWS is successfully differentiating from everyone else by having the widest offering. The end-game here is to have enough managed services offerings that virtual private servers don't need to be used at all.

Deploying your product on multiple cloud vendors requires either eschewing managed-services entirely, or accepting greater management overhead due to very significant differences in how certain parts of your stack are managed. Cloud vendors are very much Infrastructure-as-Code, and deploying on both AWS and Azure is like deploying the same application in Java and .NET; it takes a lot of work, the dialect differences can be insurmountable, and the expertise required means different people are going to be working on each environment which creates organizational challenges. Deploying on multiple cloud-vendors is far harder than deploying in multiple physical datacenters, and this is very much intentional.

It can be done, it just takes drive.

  • New features will be deployed on one infrastructure before the others, and the others will follow on as the integration teams figure out how to port it.
  • Some features may only ever live on one infrastructure as they're not deemed important enough to go to all of the effort to port to another infrastructure. Even if policy says everything must be multi-infrastructure, because that's how people work.
  • The extra overhead of running in multiple infrastructures is guaranteed to become a target during cost-cutting drives.

The ChannelRegister article's assertion that AWS is now in "too big to fail" territory, and would thus require governmental support to prevent wide-spread industry collapse, is a reasonable one. It just plain costs too much to plan for that kind of disaster in corporate disaster-response planning.

A taxonomy of IT users

Over the years I've seen a small collection of fake-names crop up in the sysadmin space. Here is a list:

BOFH
An oldie, but a Sysadmin who has gone over to the dark-side.

Fred
Originally coined by Laura Chappell, Fred is the User From Hell. Or, The Power User who Isn't. Fred knows everything, or rather, thinks they do. They're wrong, but don't know it, and it makes your life all too interesting. Fred may be a manager, a peer, or a frequent-flier in the ticket queue.

Leeroy
Originally from a famous World of Warcraft video, this is the peer who just deploys stuff because it's cool. They... haven't learned (the hard way) how this can go wrong, so aren't naturally suspicious. This could be the rose-colored glasses of youth and exuberance, or it could be a trusting nature. They'll learn.

Brent
Coined by The Phoenix Project, Brent is the person that ends up with their hands in everything one way or the other. They may be a single-point-of-knowledge, the only person who knows anything about topic X, or just the person that gets handed the weird stuff because, well, "Brent probably knows". A lot of us are a Brent, and it sure as heck makes getting long vacations approved difficult. There may be more than one of them, depending on topics.


I used to be a Leeroy, then I learned better.

I've been a Brent (oddball-stuff troubleshooter variety) at my current and last three jobs.

Right now people have figured out that I know how to use Wireshark to track down oddball problems, so I'm having to do a lot of packet analysis lately to rule them out. This isn't something I can cross-train on very well, but I'm going to have to find a way; people's eyes tend to glaze over when you get into TCP RFCs, and it's easier for them to make me do it than to learn it for themselves.

I set up an on-call rotation at a previous job, so here are a few tips that will make it easier on everyone.

Plan at least 4, preferably 6, weeks out.

This gives your watch-standers the chance to arrange their lives around the schedule. At 6 weeks out, they'll be rearranging their lives around the schedule, rather than rearranging the watch-schedule around their lives. As the one managing the schedule, this means you'll be doing fewer weekend-swaps and people are more likely to just know who is on call.

Send out calendar events for the rotations.

This is more of a weekend-watch thing, but putting calendar events in their calendars will further cement that they're obligated for that period. Also, it's a nice reminder in email (or whatever) that their shift has been scheduled.
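
One low-effort way to do this is to generate an iCalendar (.ics) file for the whole rotation and let people import it into whatever calendar they use. A minimal sketch follows; the roster, start date, and shift length are hypothetical.

```python
# Minimal sketch: emit an iCalendar (.ics) file with one all-day event per
# on-call shift. The roster, start date, and shift length are made-up examples.
from datetime import date, timedelta

ROSTER = ["alice", "bob", "carol"]       # hypothetical watch-standers
SHIFT_DAYS = 7
START = date(2014, 10, 14)               # a Tuesday, per the swap-day advice

def rotation_ics(weeks: int) -> str:
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//oncall//rotation//EN"]
    for i in range(weeks):
        start = START + timedelta(days=i * SHIFT_DAYS)
        end = start + timedelta(days=SHIFT_DAYS)
        who = ROSTER[i % len(ROSTER)]
        lines += [
            "BEGIN:VEVENT",
            f"UID:oncall-{start.isoformat()}@example.com",
            f"DTSTAMP:{START.strftime('%Y%m%dT000000Z')}",
            f"DTSTART;VALUE=DATE:{start.strftime('%Y%m%d')}",
            f"DTEND;VALUE=DATE:{end.strftime('%Y%m%d')}",
            f"SUMMARY:On-call: {who}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"

if __name__ == "__main__":
    print(rotation_ics(weeks=6))         # publish at least 6 weeks out
```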

Have the call list posted somewhere mobile-friendly.

Many times, the watch-stander is merely the first responder; it's their job to figure out what domain the problem sits in (app/database/storage/hypervisor/facilities/etc) and call the person who can actually fix the thingy. Having the call-list easily accessible from mobile is a really nice thing to have. This can be a Google Doc, or an app like PagerDuty. An Excel spreadsheet on Sharepoint is not so much.

Have the duties of the watch-stander clearly defined.

This seems obvious, but... it isn't. There are some questions you need answers to, otherwise you're going to experience sadness:

  • How fast must they respond to automated alerts?
  • Do they need to always answer the phone, or is voice-mail acceptable so long as the response is within a window?
  • How fast must they respond to emails?

The answers to these questions tell the watch-stander how much of a life they can fit in around the schedule. A movie is probably Right Out, but nipping out to the grocery store for a few things... maybe. Do they need to turn on bluetooth while driving, or can they wait until they stop (or just not drive at all)? How much 'response' can happen on a phone will greatly affect the quality-of-life questions.

What kind of sadness can you expect?

Missed alerts mostly. Without clearly defined response guidelines your watch-standers are going to sleep through their phones, miss emails, and otherwise fail to meet performance expectations. If you write those expectations down, they're far more likely to stick to them!

If you're doing automatic alert assignment, have an escalation policy.

You need a backstop in case the watch-stander sleeps through something. The backstop tier should never get called, but when they do get called, it's an Event. An event people try to avoid, because something failed. Knowing that someone will notice if an automated alert gets ignored makes people more likely to respond in time.

If you're doing a 7-day watch, swap shifts on something other than Monday or Friday.

Depends on locality, but for US locations the Monday Holiday Law means that Mondays are occasionally days off, and you don't want to do a watch-swap on a non-business day. In the same vein, many organizations have a rule in place stating that when a holiday (New Year's, for example) lands on a Saturday, the observed day off is the Friday before.

At the same time, there is a US holiday that camps on top of Thursday (see next item). Tuesday or Wednesday are good choices.

If you're doing a weekend watch, have a policy in place for handling long weekends.

The 4 day Thanksgiving Holiday in the US is a great example, as that duty schedule is double what a normal one would be. Decide if you're going to create two shifts for it or allow one person to cover the whole thing, and decide well in advance. For some organizations the Friday after Thanksgiving is a major production day so this may be moot ;).
