1
0
mirror of synced 2024-12-13 22:56:04 +03:00
doctrine2/manual/docs/en/searching.txt
2007-10-17 21:19:51 +00:00

153 lines
6.7 KiB
Plaintext

++ Introduction
Searching is a huge topic, hence an entire chapter has been devoted to a plugin called Doctrine_Search. Doctrine_Search is a fulltext indexing and searching tool. It can be used for indexing and searching both database and files.
Consider we have a class called NewsItem with the following definition:
<code type="php">
class NewsItem extends Doctrine_Record
{
public function setTableDefinition()
{
$this->hasColumn('title', 'string', 200);
$this->hasColumn('content', 'string');
}
}
</code>
Now lets say we have an application where users are allowed to search for different news items, an obvious way to implement this would be building a form and based on that form build DQL queries such as:
<code type="sql">
SELECT n.* FROM NewsItem n WHERE n.title LIKE ? OR n.content LIKE ?
</code>
As the application grows these kind of queries become very slow. For example
when using the previous query with parameters '%framework%' and '%framework%'
(this would be equivalent of 'find all news items whose title or content
contains word 'framework') the database would have to traverse through each row in the table, which would naturally be very very slow.
Doctrine solves this with its search component and inverse indexes. First lets alter our definition a bit:
<code type="php">
class NewsItem extends Doctrine_Record
{
public function setTableDefinition()
{
$this->hasColumn('title', 'string', 200);
$this->hasColumn('content', 'string');
}
public function setUp()
{
$this->actAs('Searchable', array('fields' => array('title', 'content')));
}
}
</code>
Here we tell Doctrine that NewsItem class acts as searchable (internally Doctrine loads Doctrine_Template_Searchable) and fields title and content are marked as fulltext indexed fields. This means that everytime a NewsItem is added or updated Doctrine will:
1. Update the inverse search index or
2. Add new pending entry to the inverse search index (sometimes it can be efficient to update the inverse search index in batches)
Sometimes you may want to alter the search object options afterwards. The search object can be accessed as follows:
<code type="php">
$search = $conn->getTable('NewsItem')
->getTemplate('Searchable')
->getPlugin();
</code>
++ Index structure
The structure of the inverse index Doctrine uses is the following:
[ (string) keyword] [ (string) field ] [ (integer) position ] [ (mixed) [foreign_keys] ]
* **keyword** is the keyword in the text that can be searched for
* **field** is the field where the keyword was found
* **position** is the position where the keyword was found
* **[foreign_keys]** either one or multiple fields depending on the owner component (here NewsItem)
In the NewsItem example the [foreign_keys] would simply contain one field id with foreign key references to NewsItem(id) and with onDelete => CASCADE constraint.
An example row in this table might look something like:
|| keyword || field || position || id ||
|| database || title || 3 || 1 ||
In this example the word database is the third word of the title field of NewsItem 1.
++ Index building
Whenever a searchable record is being inserted into database Doctrine executes the index building procedure. This happens in the background as the procedure is being invoked by the search listener. The phases of this procedure are:
1. Analyze the text using a Doctrine_Search_Analyzer based class
2. Insert new rows into index table for all analyzed keywords
Sometimes you may not want to update the index table directly when new searchable entries are added. Rather you may want to batch update the index table in certain intervals. For disabling the direct update functionality you'll need to set the batchUpdates option to true.
<code type="php">
$search->setOption('batchUpdates', true);
</code>
The actual batch updating procedure can be invoked with the batchUpdateIndex() method. It takes two optional arguments: limit and offset. Limit can be used for limiting the number of batch indexed entries while the offset can be used for setting the first entry to start the indexing from.
<code type="php">
$newsItem = new NewsItem();
$newsItem->batchUpdateIndex();
</code>
++ Text analyzers
By default Doctrine uses Doctrine_Search_Analyzer_Standard for analyzing the text. This class performs the following things:
1. Strips out stop-keywords (such as 'and', 'if' etc.)
As many commonly used words such as 'and', 'if' etc. have no relevance for the search, they are being stripped out in order to keep the index size reasonable.
2. Makes all keywords lowercased
When searching words 'database' and 'DataBase' are considered equal by the standard analyzer, hence the standard analyzer lowercases all keywords.
3. Replaces all non alpha-numeric marks with whitespace
In normal text many keywords might contain non alpha-numeric chars after them, for example 'database.'. The standard analyzer strips these out so that 'database' matches 'database.'.
4. Replaces all quotation marks with empty strings so that "O'Connor" matches "oconnor"
You can write your own analyzer class by making a class that implements Doctrine_Search_Analyzer_Interface. This analyzer can then be applied to the search object as follows:
<code type="php">
$search->setOption('analyzer', new MyAnalyzer());
</code>
++ Query language
Doctrine_Search provides a query language similar to Apache Lucene. The parsed behind Doctrine_Search_Query converts human readable, easy-to-construct search queries to their complex sql equivalents.
++ File searches
As stated before Doctrine_Search can also be used for searching files. Lets say we have a directory which we want to be searchable. First we need to create an instance of Doctrine_Search_File which is a child of Doctrine_Search providing some extra functionality needed for the file searches.
<code type="php">
$search = new Doctrine_Search_File();
</code>
Second thing to do is to generate the index table. By default Doctrine names the database index class as FileIndex.
<code type="php">
$search->buildDefinition(); // builds to table and record class definitions
$conn->export->exportClasses(array('FileIndex'));
</code>
Now we can start using the file searcher. First lets index some directory:
<code type="php">
$search->indexDirectory('myfiles');
</code>
The indexDirectory() iterates recursively through given directory and analyzes all files within it updating the index table as necessary.
Finally we can start searching for pieces of text within the indexed files:
<code type="php">
$resultSet = $search->search('database orm');
</code>