Recently in ediscovery Category

There is something that not many people seem to realize about how your personal email can get sucked into a lawsuit filed against your company. It all comes down to ediscovery...

A rare post in which I talk about my day-job.

We're doing a few things that are either really freaking cool or head-scratchy depending on your point of view. We've kicked a Software-as-a-Service product out the door that's aimed at the business market, specifically the legal bits of the business market.

Not supporting all IE versions.

A B2B product that doesn't support all IE versions? We're nuts.

And yet... there are enough law-firms, corporate legal departments, and civil-service law entities out there that don't have Mandatory IE policies that we're finding quite a lot of business. Yes, it does prevent some potential clients who are otherwise enthusiastic about the product from being able to use it. But we feel strongly enough about not having to support older IE versions (anything IE8 and older) that we're willing to let those sales go elsewhere.

We feel that those entities that do have flexible browser policies gain a significant competitive advantage by using us, so we're OK with letting the inflexible slide. They'll get the clue eventually. We'll be there for them.

Embracing collaboration between entities, not just within the entity.

The nice thing about a SaaS product like the one we've built is that it allows people across the world to use it without having to have their own local install. This is a surprisingly revolutionary thing in this market because this market is filled to the brim with ultra-conservatives when it comes to data handling. Collaboration is the product of careful negotiation between entities over exactly what kind of data must be shared, what meta-data about the data will also be shared, and who will have access to it.

Historically the workflow for this has been:

  1. Negotiate.
  2. Build an export of the data and meta-data using industry standard(-ish) formats.
  3. Export data (possibly to disk).
  4. FedEx / FTP data to the other entity.
  5. Import the data into the system.
  6. Dicker over incompatibilities (those industry standards are only standards-ish).
  7. Repeat steps 2-6 until it imports right.

Total run-time, 1-5 days.

Our product can support this old-school workflow (up to step 3, same as the old days, and we're working on step 5), but we're really pushing for a new model based on online collaboration.

  1. Negotiate.
  2. Build and tag a dataset for sharing.
  3. Add a user to the Project with access to just that restricted set.
  4. Communicate with the other entity and get them logged in.
  5. They start working.

You can do that in a day. And if you can skip the first step, such as with an Expert Witness, it goes even faster.
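For the curious, here is a minimal sketch of the "tag a dataset, then add a user who can see only that set" idea from steps 2 and 3. It is not our product's code. The `Document` and `Project` classes, the tag name, and the user name are all hypothetical, made up purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    tags: set = field(default_factory=set)


@dataclass
class Project:
    documents: list = field(default_factory=list)
    # Maps a user name to the one tag that user is allowed to see (hypothetical model).
    grants: dict = field(default_factory=dict)

    def tag_for_sharing(self, doc_ids, tag):
        """Step 2: mark the negotiated documents with a sharing tag."""
        wanted = set(doc_ids)
        for doc in self.documents:
            if doc.doc_id in wanted:
                doc.tags.add(tag)

    def add_restricted_user(self, user, tag):
        """Step 3: add a user whose view is limited to one tag."""
        self.grants[user] = tag

    def visible_to(self, user):
        """Return the ids of the documents this user may see, in project order."""
        tag = self.grants.get(user)
        if tag is None:
            return []  # unknown users see nothing
        return [d.doc_id for d in self.documents if tag in d.tags]


project = Project([Document("memo-1"), Document("memo-2"), Document("contract-7")])
project.tag_for_sharing(["memo-1", "contract-7"], "shared-set")
project.add_restricted_user("outside_expert", "shared-set")
print(project.visible_to("outside_expert"))
```

The point of the model is that nothing gets exported: the other side only ever sees the tagged subset, so there is no export/import loop to repeat.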

The people we've shown this to have been blown away, even though this workflow is present in other markets. I believe I mentioned our market is deeply conservative? Yeah. We're working on changing that.

30 days to pay your bill or you get locked out

Anyone who has ever done business with certain types of big business or government knows they're big fans of the "pay you eventually" model of finance. You'll get your check, but only after a long wait (60-180 days) or whenever they feel like paying. The SaaS movement has been doing a lot to break this since so many such services operate on the pay-in-30-or-else model.

[Which we ran into ourselves. One of the company credit-cards expired in April and we didn't do a good enough job of tracking what all was using it for auto-pay. A whole bunch of services locked us out at the end of May.]

A client of ours who could be a major customer is learning this right now. They're a pay-in-3-6-months kind of shop and figured that bit of the usage agreement didn't apply to them. They're learning about or-else right now.

This pay-eventually model is rife in the non-SaaS market, which we've been in for many years. We know what these entities will pull because they've pulled it on us before (one memorable now-fired client went a year and a half between payments). Going SaaS and embracing the SaaS payment model allows us to lever sanity into our finances.

Yes, it'll lose us clients. But...

Not all clients are worth having.

This is something all small businesses figure out, and we're embracing it.

  • Technologically backward clients aren't worth the trouble to backport our stuff to.
  • Clients who are assholes about money aren't worth our time to bother with.
  • Clients who are endless fonts of special needs aren't worth the trouble (though they may be a good source of feature-requests).
  • The SaaS market we're in is wide open; there is always another client.

Yes, we may only be able to 'reach' 60% of our potential market (that number is made up), but those that do work with us will help bootstrap the industry into the modern era. Especially after the network-effects of collaboration kick in.

What's involved in eDiscovery?

Having spent last week at LISA '12, I ended up describing what my company does to people who asked what I do. Few people outside of the legal industry know what all is involved. Sysadmins are familiar with only one stage of it, if they're familiar with any of it at all: collection.

If you drop "ediscovery flowchart" into your search-engine of choice you'll get a wide selection of graphics. To save load, I'll give it in outline form:

  1. Records Management
  2. Identification
  3. Preservation
  4. Collection
  5. Processing
    • Early Case Assessment
    • Review
  6. Production
  7. Presentation

Now to go into a bit more detail.

The cloud will happen

Like many olde tyme sysadmins, I look at 'cloud' and shake my head. It's just virtualization the way we've always been doing it, but with yet another abstraction layer on top to automate deploying certain kinds of instances really fast.

However... it's still new to a lot of entities. The concept of an outsourced virtualization plant is very new. For entities that use compliance audits for certain kinds of vendors it is most definitely causing something of a quandary. How much data-assurance do you mandate for such suppliers? What kind of 3rd party audits do you mandate they pass? Lots of questions.

Over on 3 Geeks and a Law Blog, they recently covered this dynamic in a post titled "The Inevitable Cloud" as it relates to the legal field. In many ways, the law field shares information handling requirements similar to the health-care field, though we don't have HIPAA. We handle highly sensitive information, and who had access to what, when, and what they did with it can be extremely relevant details (destroying or altering evidence is called spoliation). Because of this, certain firms are very reluctant to go for cloud solutions.

Some of their concerns:

  • Who at the outsourcer has access to the data?
  • What controls exist to document what such people did with the data?
  • What guarantees are in place to ensure that any modification is both detectable and auditable?

For an entity like Amazon AWS (a.k.a. Faceless Megacorp) the first may not be answerable without lots of NDAs being signed. The answer to the second may not even be given by Amazon unless the contract is really big. The answer to the third? How about this nice third-party audit report we have...
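To make "detectable and auditable" a bit more concrete, here is a minimal sketch of one common technique: a hash-chained access log, where each entry's hash covers the previous entry's hash. This is an illustration only, not something any particular provider is known to offer. The function names and record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash, record):
    # Serialize with sorted keys so the same record always hashes the same way.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append_entry(log, who, action, target):
    """Append an access record whose hash depends on everything before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    record = {"who": who, "action": action, "target": target}
    log.append({"record": record, "hash": _entry_hash(prev_hash, record)})


def verify(log):
    """Return True if no record has been altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        if entry["hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, "admin7", "read", "case-123/exhibit-4.pdf")
append_entry(log, "admin7", "modify", "case-123/exhibit-4.pdf")
print(verify(log))
```

A chain like this only helps if the most recent hash is also kept somewhere the operator can't quietly rewrite it, which is exactly the sort of thing a compliance officer would ask about.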

The pet disaster for such compliance officers is a user with elevated access deciding to get curious and exploiting a maintenance-only access method to directly access data files or network streams. The ability of an entity to respond to such fears to satisfaction means they can win some big contracts.

However, the costs of such systems are rather high; and as the 3 Geeks point out, not all revenue is profit-making. Firms that insist on end-to-end transport-mode IPSec and universally encrypted local storage all with end-user-only key storage are going to find fewer and fewer entities willing to play ball. A compromise will be made.

However, at the other end of the spectrum you have the 3-person law offices of the world, and there are a lot more of them out there. These are offices that don't have enough people to bother with a Compliance Officer. They may very well be using Dropbox to share files with each other (though possibly TrueCrypted), and are practically guaranteed to be using outsourced email of some kind. These are the firms that are going into the cloud first, pretty much by default. The rest of the market will follow along, though at a remove of some years.

Exciting times.

We're hiring!

My employer is looking to hire a Software Engineer. (The position is closed) Interesting details:

  • 0-5 years experience (more is better, but we totally hire eager people just out of college)
  • Technologies that will make us sit up and take notice if they're in your history:
    • Ruby on Rails
    • ElasticSearch
    • Dynamic web-site engineering
  • PC, Mac, whichever platform works for you (so long as git works on it). Go for it.
  • Full self+1 health coverage

And you don't even have to move to the Washington, DC area! We do remote development too.

A couple more things to help decide if this is the job for you:

  • We have a major product in "coming soon!" status, so the pace of development is really kicking up.
  • We're in startup-mode again, but we have a profitable existing product. No worrying about the burn-rate!
  • We don't use the following words in our job-announcements:
  • We have beer in the office. Heck, we even have a keg.
  • Our ping-pong table is well used.

And finally, as we're a startup with a new "coming soon!" product looming, faster is better when it comes to applying. As is your start-date if hired.

If you do apply, drop a comment here. Comments are screened so I'll know it happened even if you don't see it. Candidates with internal advocates tend to fare better. That whole "networking" thingy.

Kind of an obvious statement, but an object lesson has been provided. In the linked case, the discovery request included the phrase "unallocated space" and included keywords with general meanings.

The result? An unmanageable wad of data, only scant slivers of which were 'responsive', and finding those slivers would cost well over a million dollars.

"Unallocated space" is what gives me the shivers. That would require sorting through the empty spots of partitions looking for complete or partial files and producing those files and fragments. I know WWU wasn't equipped for that kind of discovery request, and we'd be knock-kneed about how to handle "unallocated blocks" on the SAN arrays themselves. It would suck a lot.

And in this case it cost quite a bit to produce in the first place.
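To give a feel for what "sorting through the empty spots" means, here is a rough sketch of the simplest form of it: scanning a raw chunk of bytes for the well-known starting signatures of a few file types. Real tools do far more (fragment reassembly, validation, file-system awareness). The function name and the choice of three signatures are mine, for illustration only.

```python
# Well-known "magic" byte sequences that mark the start of some file types.
SIGNATURES = {
    "pdf": b"%PDF-",
    "jpeg": b"\xff\xd8\xff",
    "zip": b"PK\x03\x04",  # also the container for .docx/.xlsx/.pptx
}


def find_candidate_files(raw: bytes):
    """Return sorted (offset, kind) pairs where a known signature appears."""
    hits = []
    for kind, magic in SIGNATURES.items():
        start = 0
        while True:
            offset = raw.find(magic, start)
            if offset == -1:
                break
            hits.append((offset, kind))
            start = offset + 1  # keep scanning past this hit
    return sorted(hits)


# A fake slice of "unallocated space": zeroes with two file starts buried in it.
blob = b"\x00" * 16 + b"%PDF-1.4 junk" + b"\x00" * 8 + b"PK\x03\x04rest"
print(find_candidate_files(blob))
```

Every hit is only a candidate. Someone (or something) still has to work out where each file ends, whether it is intact, and whether it is responsive, which is where the cost piles up.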

But this also shows the other side of the discovery request: the expense of sorting through the discovered data. My current employer does just that, paring down discovered data into the responsive parts (or making the responsive parts much easier to find during manual review). And yes, it costs a lot.

Pricing is somewhat complicated, but the dominant model is based on price-per-GB with modifiers for what exactly you want done with the data. OCR costs extra. Transformation into various industry-common formats costs extra. That kind of thing. The price has been dropping a lot lately, but it's still quite common to find prices over $200/GB, and very recently prices were hovering around $1,000/GB.
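As a back-of-the-envelope illustration of the per-GB model, here is a small sketch. The $200/GB and $1,000/GB base rates are the figures mentioned above. The add-on surcharges are hypothetical numbers I made up, since real add-on pricing varies by vendor.

```python
# Hypothetical per-GB surcharges in USD; real add-on prices vary by vendor.
ADDON_RATES_PER_GB = {
    "ocr": 25,
    "format_conversion": 40,
}


def estimate_cost(size_gb, base_rate_per_gb, addons=()):
    """Estimate the processing cost in USD: size times (base rate plus add-ons)."""
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    per_gb = base_rate_per_gb + sum(ADDON_RATES_PER_GB[name] for name in addons)
    return size_gb * per_gb


# A 50 GB collection at $200/GB with OCR added.
print(estimate_cost(50, 200, ["ocr"]))
# The same collection at the older $1,000/GB rate, no add-ons.
print(estimate_cost(50, 1000))
```

Because everything scales with the size of the data, an overly broad request multiplies straight through to the bill.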

Many sysadmins I know pride themselves on their ability to phrase Google search queries to get what they're looking for. It doesn't take long to locate exactly what we're looking for, or some hint on where to look next.

Lawyers have to get the search query right on the first try. Laziness (being overly broad) costs everyone.

The data we work with

For a good run-down of the type of data we most commonly work with, there is a very nice write-up over here on DiscoveryBrain. Those are the top twelve file-types we run into, and six of the twelve are Microsoft-specific file-types.

There is a long tail of other file-types we work with, which is where we get into how we're competitive versus other companies. We won a major contract a while back because we could natively handle Lotus Notes archives, rather than converting them to PST before processing like some other vendors. Things like that.

Processing all of those MS-Office files can be tricky to do with pure open-source tools. OpenOffice is very good at a lot of things, but there are some corner cases (or in some instances corner offices) where it doesn't yield very good results. So we may process with actual MS-Office, which in turn means we need Windows around.

Once in a great while we'll run into some Mac-specific formats. We can handle those too, though we don't do so with Macs.

We've even run into some Unix-specific formats. But the OSS support for those is rather strong, so those are pretty well handled.

But still. The vast majority of our processing is those twelve formats.