130 lines
5.7 KiB
Plaintext
130 lines
5.7 KiB
Plaintext
++ Introduction
|
|
|
|
Searching is a huge topic, hence an entire chapter has been devoted to a plugin called Doctrine_Search. Doctrine_Search is a fulltext indexing and searching tool. It can be used for indexing and searching both database and files.
|
|
|
|
Consider we have a class called NewsItem with the following definition:
|
|
|
|
<code type="php">
|
|
class NewsItem extends Doctrine_Record
|
|
{
|
|
public function setTableDefinition()
|
|
{
|
|
$this->hasColumn('title', 'string', 200);
|
|
$this->hasColumn('content', 'string');
|
|
}
|
|
}
|
|
</code>
|
|
|
|
Now lets say we have an application where users are allowed to search for different news items, an obvious way to implement this would be building a form and based on that form build DQL queries such as:
|
|
<code type="sql">
|
|
SELECT n.* FROM NewsItem n WHERE n.title LIKE ? OR n.content LIKE ?
|
|
</code>
|
|
|
|
As the application grows these kind of queries become very slow. For example
|
|
when using the previous query with parameters '%framework%' and '%framework%'
|
|
(this would be equivalent of 'find all news items whose title or content
|
|
contains word 'framework') the database would have to traverse through each row in the table, which would naturally be very very slow.
|
|
|
|
Doctrine solves this with its search component and inverse indexes. First lets alter our definition a bit:
|
|
|
|
<code type="php">
|
|
class NewsItem extends Doctrine_Record
|
|
{
|
|
public function setTableDefinition()
|
|
{
|
|
$this->hasColumn('title', 'string', 200);
|
|
$this->hasColumn('content', 'string');
|
|
}
|
|
public function setUp()
|
|
{
|
|
$this->actAs('Searchable', array('fields' => array('title', 'content')));
|
|
}
|
|
}
|
|
</code>
|
|
|
|
Here we tell Doctrine that NewsItem class acts as searchable (internally Doctrine loads Doctrine_Template_Searchable) and fields title and content are marked as fulltext indexed fields. This means that everytime a NewsItem is added or updated Doctrine will:
|
|
|
|
1. Update the inverse search index or
|
|
2. Add new pending entry to the inverse search index (its efficient to update the inverse search index in batches)
|
|
|
|
++ Index structure
|
|
|
|
The structure of the inverse index Doctrine uses is the following:
|
|
|
|
[ (string) keyword] [ (string) field ] [ (integer) position ] [ (mixed) [foreign_keys] ]
|
|
|
|
* **keyword** is the keyword in the text that can be searched for
|
|
* **field** is the field where the keyword was found
|
|
* **position** is the position where the keyword was found
|
|
* **[foreign_keys]** either one or multiple fields depending on the owner component (here NewsItem)
|
|
|
|
In the NewsItem example the [foreign_keys] would simply contain one field id with foreign key references to NewsItem(id) and with onDelete => CASCADE constraint.
|
|
|
|
An example row in this table might look something like:
|
|
|
|
|| keyword || field || position || id ||
|
|
|| database || title || 3 || 1 ||
|
|
|
|
In this example the word database is the third word of the title field of NewsItem 1.
|
|
|
|
++ Index building
|
|
|
|
Whenever a searchable record is being inserted into database Doctrine executes the index building procedure. The phases of this operation are:
|
|
|
|
1. Analyze the text using a Doctrine_Search_Analyzer based class
|
|
2. Insert new rows into index table for all analyzed keywords
|
|
|
|
++ Text analyzers
|
|
|
|
By default Doctrine uses Doctrine_Search_Analyzer_Standard for analyzing the text. This class performs the following things:
|
|
|
|
1. Strips out stop-keywords (such as 'and', 'if' etc.)
|
|
As many commonly used words such as 'and', 'if' etc. have no relevance for the search, they are being stripped out in order to keep the index size reasonable.
|
|
2. Makes all keywords lowercased
|
|
When searching words 'database' and 'DataBase' are considered equal by the standard analyzer, hence the standard analyzer lowercases all keywords.
|
|
3. Replaces all non alpha-numeric marks with whitespace
|
|
In normal text many keywords might contain non alpha-numeric chars after them, for example 'database.'. The standard analyzer strips these out so that 'database' matches 'database.'.
|
|
4. Replaces all quotation marks with empty strings so that "O'Connor" matches "oconnor"
|
|
|
|
You can write your own analyzer class by making a class that implements Doctrine_Search_Analyzer_Interface. This analyzer can then be applied to the search object as follows:
|
|
|
|
<code type="php">
|
|
$search->setOption('analyzer', new MyAnalyzer());
|
|
</code>
|
|
|
|
++ Query language
|
|
|
|
Doctrine_Search provides a query language similar to Apache Lucene. The parsed behind Doctrine_Search_Query converts human readable, easy-to-construct search queries to their complex sql equivalents.
|
|
|
|
++ File searches
|
|
|
|
As stated before Doctrine_Search can also be used for searching files. Lets say we have a directory which we want to be searchable. First we need to create an instance of Doctrine_Search_File which is a child of Doctrine_Search providing some extra functionality needed for the file searches.
|
|
|
|
<code type="php">
|
|
$search = new Doctrine_Search_File();
|
|
</code>
|
|
|
|
Second thing to do is to generate the index table. By default Doctrine names the database index class as FileIndex.
|
|
|
|
<code type="php">
|
|
$search->buildDefinition(); // builds to table and record class definitions
|
|
|
|
$conn->export->exportClasses(array('FileIndex'));
|
|
</code>
|
|
|
|
Now we can start using the file searcher. First lets index some directory:
|
|
|
|
<code type="php">
|
|
$search->indexDirectory('myfiles');
|
|
</code>
|
|
|
|
The indexDirectory() iterates recursively through given directory and analyzes all files within it updating the index table as necessary.
|
|
|
|
Finally we can start searching for pieces of text within the indexed files:
|
|
|
|
<code type="php">
|
|
$resultSet = $search->search('database orm');
|
|
</code>
|
|
|
|
|