The DBA Blog by Tomer Lev

SQL information, scripts, tricks and more…

Full Text Search Analysis


Full-Text Search Overview

Full-text queries perform linguistic searches against text data in full-text indexes by operating on words and phrases based on rules of a particular language such as English or Japanese. Full-text queries can include simple words and phrases or multiple forms of a word or phrase.

Full-text search is applicable in a wide range of business scenarios such as e-businesses—searching for items on a web site; law firms—searching for case histories in a legal-data repository; or human resources departments—matching job descriptions with stored resumes. The basic administrative and development tasks of full-text search are equivalent regardless of business scenarios. However, in a given business scenario, full-text index and queries can be honed to meet business goals. For example, for an e-business maximizing performance might be more important than ranking of results, recall accuracy (how many of the existing matches are actually returned by a full-text query), or supporting multiple languages. For a law firm, returning every possible hit (total recall of information) might be the most important consideration.

Support Forms of Query Terms (Full-Text Search)

http://technet.microsoft.com/en-us/library/cc879300.aspx – this topic introduces the forms of full-text queries supported by SQL Server, and summarizes the support provided for each form of query by the full-text predicates and rowset-valued functions. Full-text queries can be run only on columns that have been full-text indexed.


Full-Text Search Tests

The following topic will describe the tests have been made on full-text search. The tests have been made to English and Hebrew languages.

The following tests have been made:

  1. Inflectional forms of a specific word (generation term) – The inflectional forms are the different tenses of a verb or the singular and plural forms of a noun. For example, search for the inflectional form of the word “drive”. If various rows in the table include the words “drive”, “drives”, “drove”, “driving”, and “driven”, all would be in the result set because each of these can be inflectionally generated from the word drive.
  1. Synonymous forms of a specific word (thesaurus) – A thesaurus defines user-specified synonyms for terms. For example, if an entry, “{car, automobile, truck, van}”, is added to a thesaurus, you can search for the thesaurus form of the word “car”. All rows in the table queried that include the words “automobile”, “truck”, “van”, or “car”, appear in the result set because each of these words belong to the synonym expansion set containing the word “car”.
  1. A word or a phrase where the words begin with specified text (prefix term) – A prefix term refers to a string that is affixed to the front of a word to produce a derivative word or an inflected form.

For a single prefix term, any word starting with the specified term will be part of the result set. For example, the term “auto*” matches “automatic”, “automobile”, and so forth.

For a phrase, each word within the phrase is considered to be a prefix term. For example, the term “auto tran*” matches “automatic transmission” and “automobile transducer”, but it does not match “automatic motor transmission”.

  1. A word or phrase close to another word or phrase (proximity term) – A proximity term indicates words or phrases that are in close proximity to another word or phrase or two words or phrases in any order.

For example, you want to find the rows in which the word “ice” is near the word “hockey” or in which the phrase “ice skating” is near the phrase “ice hockey”. Whether two terms or phrases are considered to be near to each other is calculated internally and is not configurable by the user. Many data points are considered when calculating nearness.

Each of the above will be detailed in the next topics; dividing tests have been made for each one.

Test Number #1: Inflectional forms of a specific word

The test has been made for checking matching between words like Cow-Cows.

The test has been made like the following:

4.1.   An row had been inserted inside AlgoData table using the following procedure:

alter PROCEDURE [dbo].[LoadAlgoData]

@xmlFileName nvarchar(300),

@AlgoDataid bigint,

@Algotype int,

@createuser nvarchar(50)= NULL,

@updateuser nvarchar(50) = null,

@AlgoDataExtension varchar(10)

AS

BEGIN

declare @strsql varchar(8000)

OPENROWSET

set @strsql = ‘INSERT INTO AlgoData(AlgoDataID,AlgoType, createdate, AlgoDataExtension,XmlFilename,AlgoData) ‘

set @strsql = @strsql + ‘SELECT ‘ + cast(@AlgoDataid as varchar) + ‘,’ + cast(@Algotype as varchar) + ‘,getdate(),’ +  ”” + @AlgoDataExtension + ”” + ‘,’ + ”” + @xmlFileName + ”’, xmlData

FROM

(

SELECT  *

FROM    OPENROWSET (BULK ”’ + @xmlFileName + ”’ , SINGLE_BLOB) AS XMLDATA

) AS FileImport (XMLDATA)

print @strsql

exec(@strsql)

END

5.2   The following execution has been invoked:

exec [dbo].[LoadAlgoData] ‘C:\Users\tomer\Downloads\Backup\XMLAQuery3\XMLAQuery11.xml’,11,1,’tomer’,’tomer’,’XML’

  • First test – The XML contined the words ‘cow’ and ‘cows’ in it.
  • Second test – The XML contined the words תפוח and תפוחים in it.
  • The same test had been done on: ‘go’,’went’,’going’ words.

5.3  The next query has been invoked:

SELECT *

FROM AlgoData

WHERE CONTAINS (Algodata, ‘FORMSOF(INFLECTIONAL, “cow”)’);

Result: A record set has been returned, finding the match.

Test Number #2: Synonymous forms of a specific word

Checking verbs match has been done on the following House, Home, and apartment.

This is done using a thesaurus defines user-specified synonyms for terms.

1.1.   An row had been inserted inside AlgoData table using the above  procedure

1.2.   The following execution has been invoked:

exec [dbo].[LoadAlgoData] ‘C:\Users\tomer\Downloads\Backup\XMLAQuery3\XMLAQuery11.xml’,11,1,’tomer’,’tomer’,’XML’

  • First test – The XML contined the words ‘House’ where as the search has been made for ‘Home’.
  • Second test – The XML contined the words דירה in it, where as the search has been made for בית.
  • The dictionary of synonims words can be enlarged and needs to be configured in 1.3 below.

1.3.   A thesaurus file has been edited to determine these verbs as the following:

<XML>

<thesaurus xmlns=”x-schema:tsSchema.xml”>

<diacritics_sensitive>0</diacritics_sensitive>

<expansion>

<sub>Home</sub>

<sub>House</sub>

<sub>Appartment</sub>

</expansion>

</thesaurus>

</XML>

1.4    The next query has been invoked:

SELECT *

FROM AlgoData

WHERE CONTAINS (AlgoData, ‘FORMSOF(THESAURUS , house)’);

Go

SELECT *

FROM AlgoData

WHERE CONTAINS (AlgoData, ‘FORMSOF(THESAURUS , “דירה”)’);

Result: A record set has been returned, finding the match.

Test Number #3: A word or a phrase where the words begin with specified text

The test done in this topic was for the words:

I love to take my dog out, I love my dogs

1.1.   An row had been inserted inside AlgoData table using the above  procedure

1.2.   The following execution has been invoked:

exec [dbo].[LoadAlgoData] ‘C:\Users\tomer\Downloads\Backup\XMLAQuery3\XMLAQuery11.xml’,11,1,’tomer’,’tomer’,’XML’

  • The XML contined the phrases ‘I love to take my dog out’ and ‘I love my dogs’.
     1.3 

SELECT *
FROM AlgoData
WHERE CONTAINS (Description, ' "I love*" ' );

Result: A record set has been returned, finding the match.


Test Number #4: A word or phrase close to another word or phrase (proximity term)

The test has been made here search for words or phrases in close proximity to another word or phrase.

For example:

SELECT AlgoDataID, AlgoData
FROM AlgoData AS L1 INNER JOIN
CONTAINSTABLE(AlgoData, AlgoData, '(love ~ my ~ dog)' ) AS KEY_TBL
ON L1.AlgoDataID = KEY_TBL.[KEY]
ORDER BY KEY_TBL.RANK;
  • The XML contined the phrases ‘I love to take my dog out’ and ‘I love my dogs’.

This example searches for the word “love” close to the word “my” close to word “dog”.

** Need to mention that all of the tests have been done also for Hebrew and also for English.

Advertisements

June 23, 2009 - Posted by | full text search | ,

No comments yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: