Index spam

We had a humorous event happen today which underlined a problem some people face with the scourge known as 'index spam'. As anyone who has used Google Desktop or let Mac Spotlight troll through our Shared volume knows, just because someone thinks you should be able to see it doesn't mean you care to know about it. One of the biggest draws of these tools is that they search YOUR STUFF for the things you're looking for. Dragging in a bunch of things that you can see but don't care about dilutes the usefulness of the search results.

We have several volumes that have globally-readable data in them. Some of them are system directories we need everyone to be able to see; others are folders whose managers figured there was no need for privacy. Whatever the case, the amount of 'everyone can read it' data is not what you'd call trivial.

This is the sort of thing that security managers cringe at. But then, we're a governmental agency that:
  1. Is subject to Freedom of Information Requests
  2. Does not handle classified data
  3. Over the years has had several unflattering stories in the local paper using data obtained by FOIA requests.
So people tend to be a bit blasé about data security. Why bother, since the paper will find out about it anyway? That said, we do have some data that is subject to other standards (PCI, HIPAA, etc.), and that data is locked down.

Still, when people look at the results of their index spam and try to 'fix it', things can get... messy. Some applications, and I think Google Desktop is one of them, allow you to set indexing blacklists. Others just assume that if you can see it, you'll need it sometime. But when an end-user doesn't know about those features, has just had to page through four pages of results before the document they were actually looking for turned up, and then goes about deleting the files that are 'in the way', the presence of 'write' privs suddenly becomes much more important.
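
For anyone wanting to see how much 'everyone can read it' (or worse, 'everyone can write it') data they have, here is a rough sketch of an audit. It assumes the volume is mounted somewhere with POSIX-style permission bits, which may well not match your setup; file servers with their own rights models need their own tools. The function name find_world_accessible and the path in the usage note are made up for illustration.

```python
import os
import stat
from typing import Iterator, Tuple


def find_world_accessible(root: str) -> Iterator[Tuple[str, bool, bool]]:
    """Yield (path, world_readable, world_writable) for entries under root.

    Assumption: POSIX "other" permission bits stand in for "everyone".
    Only entries that everyone can read or write are reported.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                # Entry vanished or cannot be inspected; skip it.
                continue
            if stat.S_ISLNK(mode):
                # Symlink mode bits say nothing useful about the target.
                continue
            readable = bool(mode & stat.S_IROTH)
            writable = bool(mode & stat.S_IWOTH)
            if readable or writable:
                yield path, readable, writable
```

Running it over a mount point (say, a hypothetical /mnt/shared) and filtering for the entries where the third value is True gives a list of the things an annoyed end-user could delete, which is the list worth looking at first.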