SQL Full-text

Andrew Cencini is blogging on SQLJunkies. In this post he points to new MSDN articles on full-text search.

I'd be interested in hearing more about custom document filters and using image columns to store documents. Are there filters out there for fairly standard document types, like XML?

It would be cool to write a doc filter for XML documents, where you can specify the XSD that the documents must conform to, and perhaps even a transformation to run before passing the content on to the FT indexer.
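To make the idea concrete, here is a minimal sketch in Python of what such a filter might do. It is not a real IFilter (that is a COM interface), and it does not do XSD validation, since the Python standard library has no XSD validator; a simple root-element check stands in for it. The names `extract_text`, `required_root`, `transform` and `drop_metadata` are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

# Hypothetical type for the optional pre-indexing transformation step.
Transform = Callable[[ET.Element], ET.Element]


def extract_text(xml_source: str,
                 required_root: Optional[str] = None,
                 transform: Optional[Transform] = None) -> str:
    """Return the plain text an indexer would see for one XML document."""
    root = ET.fromstring(xml_source)

    # Stand-in for schema validation: only checks the root element name.
    if required_root is not None and root.tag != required_root:
        raise ValueError(f"expected <{required_root}>, got <{root.tag}>")

    # Optional transformation before the text is handed over.
    if transform is not None:
        root = transform(root)

    # Collect all text nodes, dropping whitespace-only pieces.
    chunks = (piece.strip() for piece in root.itertext())
    return " ".join(chunk for chunk in chunks if chunk)


def drop_metadata(root: ET.Element) -> ET.Element:
    """Example transformation: remove top-level <meta> elements."""
    for child in root.findall("meta"):
        root.remove(child)
    return root
```

A call such as `extract_text(doc, required_root="doc", transform=drop_metadata)` would then yield only the text that should be searchable. A real filter would presumably stream chunks rather than build one string.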

Also, he talks about specifying a language attribute on full-text columns so that the word-breaker can be language-specific and know about inflected forms (stemming). Is it possible to have a different language per row? Or to do this in a filter (i.e., the filter determines the language before passing the content off to the indexer)? Now that we are supporting multiple languages in a single database, it would seem that this capability should be there somehow, or be coming in Yukon.
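Roughly what I have in mind for per-row language, sketched in Python. This is a toy and says nothing about how SQL Server actually works: the `Row` type, the `WORD_BREAKERS` table and the "English" rule (dropping a trailing 's) are all invented for illustration.

```python
import re
from typing import Callable, Dict, List, NamedTuple, Set


class Row(NamedTuple):
    doc_id: int
    language: str  # hypothetical per-row language code, e.g. "en"
    text: str


def break_default(text: str) -> List[str]:
    """Language-neutral fallback: lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def break_english(text: str) -> List[str]:
    """Toy English breaker: also strips a trailing possessive 's."""
    words = re.findall(r"[\w']+", text.lower())
    return [re.sub(r"'s$", "", word) for word in words]


# Hypothetical registry mapping a language code to its word breaker.
WORD_BREAKERS: Dict[str, Callable[[str], List[str]]] = {
    "en": break_english,
}


def index_rows(rows: List[Row]) -> Dict[str, Set[int]]:
    """Build a term -> document ids map, choosing the breaker per row."""
    index: Dict[str, Set[int]] = {}
    for row in rows:
        breaker = WORD_BREAKERS.get(row.language, break_default)
        for term in breaker(row.text):
            index.setdefault(term, set()).add(row.doc_id)
    return index
```

The point is only that the language is looked up per row rather than fixed per column; whether that decision lives in a column value or in the filter is exactly the open question.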

Mike

1 Comment

  • Answers to many of the questions above should come in the forthcoming Yukon Full-Text Search paper due out in a few weeks.



    There are filters out there (third-party and Microsoft) for other document types: a number of XML filters, a PDF filter from Adobe, apparently a WordPerfect filter from Corel, and supposedly filters for compressed files and lesser-known file types.



    A few people have ventured to write sample filters in C#, which I think is great. I've done similar work, but more in reverse -- putting together code to get data _out_ of IFilters, word breakers and stemmers using C# (for testing purposes). I'll be posting that shortly.



    There are some more exciting multilingual capabilities coming in Yukon, but I'd like to let the paper describe them :)



    Thanks,

    --andrew
