The data we work with

For a good run-down of the type of data we most commonly work with, there is a very nice write-up over here on DiscoveryBrain.Those are the top twelve file-types we run into, and six of the twelve are Microsoft-specific file-types.

There is a long tail of other file-types we work with, which is where we get into how we're competitive versus other companies. We won a major contract a while back because we could natively handle Lotus Notes archives, rather than converting them to PST before processing like some other vendors. Things like that.

Processing all of those MS-Office files can be tricky to do with pure open-source tools. OpenOffice is very good at a lot of things, but there are some corner cases (or in some instances corner offices) where it doesn't yield very good results. So we may process with actual MS-Office, which in turn means we need Windows around.

Once in a great while we'll run into some Mac-specific formats. We can handle those too, though we don't do so with Macs.

We've even run into some Unix-specific formats. But the OSS support for those is rather strong, so those are pretty well handled.

But still. The vast majority of our processing is those twelve formats.