It is time for a few short tips on FAST Search Server 2010 for SharePoint and FAST Search Server 2010 for Internet Sites.

Tip One: Included with FAST Search Server 2010 for SharePoint is the Advanced Filter Pack. This pack enables text and metadata extraction from hundreds of file formats, augmenting the document formats that are supported by Microsoft’s Filter Pack. This feature is not enabled by default during the installation of FAST Search Server 2010 for SharePoint. It can be enabled easily using Windows PowerShell and the included script, “AdvancedFilterPack.ps1”.

With the Advanced Filter Pack enabled, content and metadata from the additional conversions are available for mapping into your managed properties.

To read more about Filter Packs, go to Microsoft Office 2010 Filter Packs download and overview. The Advanced Filter Packs are described here.

Tip Two: Do you really want to enable “Stopword Removal” on Queries?

When the subject of “Stopword Removal (SWR)” is broached with my colleague, Leonardo Souza, you just know he has a well-honed opinion. He smiles and then says: “Just say No”. By default, SWR is disabled on FAST Search Server 2010 for SharePoint and FAST Search Server 2010 for Internet Sites. By way of reference, stopwords are words that occur very frequently and are thus assumed not to carry much meaning. Traditionally in the Information Retrieval field, they are removed from the user's query during query processing. His reason for not using SWR is:

“Removing terms from the user query can have a lot of undesirable effects. Just consider the queries below and what they look like after common stopword removal practices (removing articles, prepositions, pronouns, etc.):

The Who -> <nothing>

Somewhere over the rainbow -> <rainbow>

Are they the same queries? Do they still match what the user wants? I would bet not.

And this is why FS4SP by default will not remove any terms from the user query, but instead will use relevancy to define how important a term is. So, if the term happens to be very frequent (the common definition of stopwords), then that term may not have any ranking associated with it, but it will NOT be removed from the list of query terms, which, as you can see in the examples above, is a very good thing.”

It is hard to argue with his logic. Eliminating stopwords might make it impossible to match the query to the documents that contain the phrase entered.
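
For readers who want to see the effect on the two example queries, here is a minimal Python sketch of naive stopword removal. The stopword list and both function names are made up for illustration; this is not how FAST Search Server 2010 processes queries internally, only a toy model of the traditional practice described above.

```python
# Hypothetical stopword list, chosen for illustration only.
# It is NOT the list used by any FAST or SharePoint product.
STOPWORDS = {"a", "an", "the", "who", "of", "to", "in", "over", "somewhere"}


def remove_stopwords(query):
    """Return the query terms that survive naive stopword removal."""
    return [term for term in query.lower().split() if term not in STOPWORDS]


def phrase_still_matchable(query):
    """True only if no term was dropped, so the full phrase can still be matched."""
    return len(remove_stopwords(query)) == len(query.split())


if __name__ == "__main__":
    for query in ("The Who", "Somewhere over the rainbow"):
        print(repr(query), "->", remove_stopwords(query))
```

With this list, “The Who” is left with no terms at all and “Somewhere over the rainbow” is reduced to the single term “rainbow”, so neither query can be matched as the phrase the user typed.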

We look forward to your comments. If you have a technical tip you would like to share with your colleagues, send it to phelsel@microsoft.com to be included in our next Technical Tips blog.

By Phillip E. Helsel