Recently in ediscovery Category

What's involved in eDiscovery?

Having spent last week at LISA '12, I ended up describing what my company does to a lot of people who asked what I do. Few people outside of the legal industry know everything that's involved. The part of the industry where sysadmins live is only familiar with one stage of it, if they're familiar with any of it at all: collection.

If you drop "ediscovery flowchart" into your search-engine of choice you'll get a wide selection of graphics. To save load, I'll give it in outline form:

  1. Records Management
  2. Identification
  3. Preservation
  4. Collection
  5. Processing
    • Early Case Assessment
    • Review
  6. Production
  7. Presentation

Now to go into a bit more detail.

The cloud will happen

Like many olde tyme sysadmins, I look at 'cloud' and shake my head. It's just virtualization the way we've always been doing it, but with yet another abstraction layer on top to automate deploying certain kinds of instances really fast.

However... it's still new to a lot of entities. The concept of an outsourced virtualization plant is very new. For entities that use compliance audits for certain kinds of vendors it is most definitely causing something of a quandary. How much data-assurance do you mandate for such suppliers? What kind of 3rd party audits do you mandate they pass? Lots of questions.

Over on 3 Geeks and a Law Blog, they recently covered this dynamic in a post titled The Inevitable Cloud as it relates to the legal field. In many ways, the law field shares information handling requirements similar to the health-care field, though we don't have HIPAA. We handle highly sensitive information, and who had access to what, when, and what they did with it can be extremely relevant details (improperly altering or destroying evidence is called spoliation). Because of this, certain firms are very reluctant to go for cloud solutions.

Some of their concerns:

  • Who at the outsourcer has access to the data?
  • What controls exist to document what such people did with the data?
  • What guarantees are in place to ensure that any modification is both detectable and auditable?

For an entity like Amazon AWS (a.k.a. Faceless Megacorp) the first may not be answerable without lots of NDAs being signed. The answer to the second may not even be given by Amazon unless the contract is really big. The answer to the third? How about this nice third-party audit report we have...

The pet disaster for such compliance officers is a user with elevated access deciding to get curious and exploiting a maintenance-only access method to directly access data files or network streams. An entity that can answer such fears to a compliance officer's satisfaction can win some big contracts.

However, the costs of such systems are rather high; and as the 3 Geeks point out, not all revenue is profit-making. Firms that insist on end-to-end transport-mode IPSec and universally encrypted local storage all with end-user-only key storage are going to find fewer and fewer entities willing to play ball. A compromise will be made.




However, at the other end of the spectrum you have the 3-person law offices of the world, and there are a lot more of them out there. These are offices that don't have enough people to bother with a Compliance Officer. They may very well be using Dropbox to share files with each other (though possibly TrueCrypted), and are practically guaranteed to be using outsourced email of some kind. These are the firms that are going into the cloud first, pretty much by default. The rest of the market will follow along, though at a remove of some years.

Exciting times.

We're hiring!

My employer is looking to hire a Software Engineer. (The position is closed) Interesting details:

  • 0-5 years experience (more is better, but we totally hire eager people just out of college)
  • Technologies that will make us sit up and take notice if they're in your history:
    • Ruby on Rails
    • ElasticSearch
    • Dynamic web-site engineering
  • PC, Mac, whichever platform works for you (so long as git works on it). Go for it.
  • Full self+1 health coverage

And you don't even have to move to the Washington, DC area! We do remote development too.

A couple more things to help decide if this is the job for you:

  • We have a major product in "coming soon!" status, so the pace of development is really kicking up.
  • We're in startup-mode again, but we have a profitable existing product. No worrying about the burn-rate!
  • We don't use the following words in our job-announcements:
  • We have beer in the office. Heck, we even have a keg.
  • Our ping-pong table is well used.

And finally, as we're a startup with a new "coming soon!" product looming, faster is better when it comes to applying. As is your start-date if hired.

If you do apply, drop a comment here. Comments are screened so I'll know it happened even if you don't see it. Candidates with internal advocates tend to fare better. That whole "networking" thingy.


Kind of an obvious statement, but an object lesson has been provided. In the linked case, the discovery request included the phrase "unallocated space" and included keywords with general meanings.

The result? An unmanageable wad of data, only scant slivers of which were 'responsive', and finding those slivers would cost well over a million dollars.

"Unallocated space" is what gives me the shivers. That would require sorting through the empty spots of partitions looking for complete or partial files and producing those files and fragments. I know WWU wasn't equipped for that kind of discovery request, and we'd be knock-kneed about how to handle "unallocated blocks" on the SAN arrays themselves. It would suck a lot.

And in this case it cost quite a bit to produce in the first place.

But this also shows the other side of the discovery request: the expense of sorting through the discovered data. My current employer does just that, paring down discovered data into the responsive parts (or making the responsive parts much easier to find during manual review). And yes, it costs a lot.

Pricing is somewhat complicated, but the dominant model is based on price-per-GB with modifiers for what exactly you want done with the data. OCR costs extra. Transformation into various industry-common formats costs extra. That kind of thing. The price has been dropping a lot lately, but it's still quite common to find prices over $200/GB, and very recently prices were hovering around $1,000/GB.
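To make the per-GB model concrete, here is a rough sketch of the arithmetic. The $200/GB base rate echoes the figure above, but the surcharge amounts and the `estimate_cost` helper are made-up illustrations, not anyone's actual price list.

```python
# Illustrative only: the surcharge values are assumptions, not real vendor rates.
BASE_RATE_PER_GB = 200.0  # in line with the "over $200/GB" figure above

SURCHARGES_PER_GB = {
    "ocr": 50.0,                # hypothetical extra for OCR
    "format_conversion": 25.0,  # hypothetical extra for format transformation
}


def estimate_cost(size_gb, options=()):
    """Estimate a processing bill as size times (base rate plus per-GB extras)."""
    rate = BASE_RATE_PER_GB + sum(SURCHARGES_PER_GB[name] for name in options)
    return size_gb * rate


if __name__ == "__main__":
    print(estimate_cost(500))                                # base processing only
    print(estimate_cost(500, ["ocr", "format_conversion"]))  # with both extras
```

Under these assumed numbers, a 500 GB collection comes to $100,000 before any extras, which is why the size of what gets collected matters so much.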

Many sysadmins I know pride themselves on their ability to phrase search queries into Google to get what they're looking for. It doesn't take long to locate exactly what we're looking for, or some hint on where to look next.

Lawyers have to get the search query right on the first try. Laziness (being overly broad) costs everyone.
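A toy illustration of what "overly broad" means in practice. The five sample documents and both search terms are invented for the example; real review platforms do far more than a regex match, so treat this as a sketch of the idea only.

```python
import re

# Hypothetical sample documents, invented for this example.
DOCUMENTS = [
    "Lunch contract with the caterer is signed.",
    "Please review the Acme supply contract before Friday.",
    "My phone contract renews next month.",
    "Acme supply contract amendment attached.",
    "Team offsite agenda.",
]


def count_hits(documents, pattern):
    """Count how many documents contain at least one match for the pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return sum(1 for doc in documents if rx.search(doc))


if __name__ == "__main__":
    broad = count_hits(DOCUMENTS, r"\bcontract\b")
    narrow = count_hits(DOCUMENTS, r"\bacme supply contract\b")
    print(f"broad keyword: {broad} of {len(DOCUMENTS)} documents")
    print(f"narrow phrase: {narrow} of {len(DOCUMENTS)} documents")
```

Every extra hit from the broad keyword is a document somebody has to process and review, and pay for.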

The data we work with

For a good run-down of the type of data we most commonly work with, there is a very nice write-up over here on DiscoveryBrain. Those are the top twelve file-types we run into, and six of the twelve are Microsoft-specific file-types.

There is a long tail of other file-types we work with, which is where we get into how we're competitive versus other companies. We won a major contract a while back because we could natively handle Lotus Notes archives, rather than converting them to PST before processing like some other vendors. Things like that.

Processing all of those MS-Office files can be tricky to do with pure open-source tools. OpenOffice is very good at a lot of things, but there are some corner cases (or in some instances corner offices) where it doesn't yield very good results. So we may process with actual MS-Office, which in turn means we need Windows around.

Once in a great while we'll run into some Mac-specific formats. We can handle those too, though we don't do so with Macs.

We've even run into some Unix-specific formats. But the OSS support for those is rather strong, so those are pretty well handled.

But still. The vast majority of our processing is those twelve formats.
