How can I reduce the amount of disk space that is used by a search collection without, of course, reducing the number of documents being crawled?
Here are a few of the big ones:
- disable cached content types (this means no HTML preview)
- don't store the text in the index (this means no dynamic summaries - but everything is still indexed)
- leverage the light crawler - or at least a subset of its settings (this will reduce the size of the crawler database, but it has some tradeoffs you need to be aware of)
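Before changing any of these settings, it helps to know where the space is actually going. A minimal sketch, assuming a hypothetical data directory layout (substitute your own collection's path; the subdirectory names will vary by product):

```shell
# Hypothetical path - point this at your collection's data directory.
COLLECTION_DATA="${COLLECTION_DATA:-/opt/search/data/my-collection}"

# Summarise disk usage of each subdirectory (index, cached content,
# crawler database, logs, etc.), largest consumers first.
du -sh "$COLLECTION_DATA"/* 2>/dev/null | sort -rh
```

Running this before and after each change shows which of the options above gives the biggest win for your particular collection.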
A full merge will remove deleted documents.
A new crawl will reduce the size of a crawl log that was bloated by refreshes.
Misconfiguring distributed indexing can cause updates that can't be sent to a client to be saved for later (written to disk), which also consumes space.