Wiki
Sphinx
Sphinx is an open source search engine that allows full-text searches. It is best known for performing searches over large data very efficiently. The data to be indexed can generally come from very different sources: SQL databases, plain text files, HTML files, mailboxes, and so on.
Official Website sphinxsearch.com
SphinxQL
The Sphinx search daemon supports MySQL binary network protocol and can be accessed with the regular MySQL API. Sphinx also supports a subset of SQL (SphinxQL). It supports standard querying of all index types with SELECT, modifying RealTime indexes with INSERT, REPLACE, and DELETE, and more.
Query Builder for SphinxQL github.com/FoolCode/SphinxQL-Query-Builder
Some key features of Sphinx are
- High indexing and searching performance
- Advanced indexing and querying tools
- Advanced result set post-processing
- Proven scalability up to billions of documents, terabytes of data, and thousands of queries per second
- Easy integration with SQL and XML data sources, and SphinxQL, SphinxAPI, or SphinxSE search interfaces
- Easy scaling with distributed searches
In this post I will set up Sphinx with MySQL server using the example SQL file included in the distribution package. It's a basic idea of how to use Sphinx for my project. Before installing Sphinx, I already have MySQL server installed on Ubuntu 14.04 x64 linux.
Install Sphinx
$ sudo apt-get install sphinxsearch
Now I have successfully installed Sphinx on my server. Before starting the Sphinx daemon, I should configure it.
Creating the Test Database
In this section, I will set up a database using the example SQL file provided with the package.
Import an example SQL file into the database.
Log in to the MySQL server shell.
$ mysql -u root -p
Enter the password for the MyQL root user when asked. The prompt will change to mysql>
.
Create a database named test
.
> CREATE DATABASE test;
Import the SQL file.
> SOURCE /etc/sphinxsearch/example.sql;
Then exit the MySQL shell.
> quit
Now I should have the test database filled with data.
Configuring Sphinx
In this section, I will configure the Sphinx configuration file.
Create the sphinx.conf
file.
$ sudo vim /etc/sphinxsearch/sphinx.conf
Sphinx configuration consists of 3 main blocks that are essential to run. They are index
, searchd
, and source
. Each of these blocks is described below, and at the end of this step, the entirety of sphinx.conf
is included for you to paste into the file.
The source
block contains the type of source, username and password to the MySQL server. The first column of the SQL query should be a unique id. The SQL query will run on every index and dump the data to Sphinx index file. Below are descriptions of each field and the source block itself.
sql_host
: Hostname for the MySQL host. In our example, this is the localhost
. This can be a domain or IP address.
sql_user
: Username for the MySQL login. In our example, this is root
.
sql_pass
: Password for the MySQL user. In our example, this is the root
MySQL user's password
sql_db
: Name of the database that stores data. In our example, this is test
.
sql_query
: This is the query thats dumps data to index.
source src1 { type = mysql sql_host = localhost sql_user = root sql_pass = password sql_db = test sql_port = 3306 # optional, default is 3306 sql_query = \ SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \ FROM documents sql_attr_uint = group_id sql_attr_timestamp = date_added sql_query_info = SELECT * FROM documents WHERE id=$id }
The index
component contains the source and the path to store the data.
source
: Name of the source block. In our example, this is src1
.
path
: This path to save the index.
charset_type
: This is the charset of the index. You also set this to utf-8
.
index test1 { source = src1 path = /var/lib/sphinxsearch/data/test1 docinfo = extern charset_type = sbcs }
The searchd
component contains the port and other variables to run the Sphinx daemon.
listen
: This is the port which sphinx daemon will run. In our example, this is 9312.
query_log
: This path to save the query log.
pid_file
: This is path to PID file of Sphinx daemon.
max_matches
: Maximum number matches to return per search term.
seamless_rotate
: Prevents searchd stalls while rotating indexes with huge amounts of data to precache.
preopen_indexes
: Whether to forcibly preopen all indexes on startup.
unlink_old
: Whether to unlink old index copies on successful rotation.
searchd { listen = 127.0.0.1:9306:mysql41 # Port to listen on log = /var/log/sphinxsearch/searchd.log query_log = /var/log/sphinxsearch/query.log read_timeout = 5 max_children = 30 pid_file = /var/run/sphinxsearch/searchd.pid max_matches = 1000 seamless_rotate = 1 preopen_indexes = 1 unlink_old = 1 binlog_path = /var/lib/sphinxsearch/data }
The full configuration to copy and paste is below. The only variable you need to change below is the sql_pass
in the source block, which is highlighted.
source src1 { type = mysql sql_host = localhost sql_user = root sql_pass = password # change this to your root users MySQL password sql_db = test sql_port = 3306 sql_query = \ SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \ FROM documents sql_attr_uint = group_id sql_attr_timestamp = date_added sql_query_info = SELECT * FROM documents WHERE id=$id } index test1 { source = src1 path = /var/lib/sphinxsearch/data/test1 docinfo = extern charset_type = sbcs } searchd { listen = 127.0.0.1:9306:mysql41 log = /var/log/sphinxsearch/searchd.log query_log = /var/log/sphinxsearch/query.log read_timeout = 5 max_children = 30 pid_file = /var/run/sphinxsearch/searchd.pid max_matches = 1000 seamless_rotate = 1 preopen_indexes = 1 unlink_old = 1 binlog_path = /var/lib/sphinxsearch/data }
Adding Data to the Index
In this section, we'll add data to the Sphinx index.
Add data to index using the config we created earlier.
$ sudo indexer --all
I get something that looks like the following.
Sphinx 2.0.4-id64-release (r3135) Copyright (c) 2001-2012, Andrew Aksyonoff Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com) using config file '/etc/sphinxsearch/sphinx.conf'... indexing index 'test1'... WARNING: collect_hits: mem_limit=0 kb too low, increasing to 25600 kb collected 4 docs, 0.0 MB sorted 0.0 Mhits, 100.0% done total 4 docs, 193 bytes total 0.003 sec, 59696 bytes/sec, 1237.24 docs/sec total 3 reads, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg total 9 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
If you got following error, that means searchd is running, stop it. Then delete the .spl
file.
FATAL: failed to lock /var/lib/sphinxsearch/data/test1.spl: Resource temporarily unavailable, will not index. Try --rotate option.
To keep the index up to date, lets create a cronjob.
Open crontab.
$ crontab -e
You may be asked which text editor you want to use. Choose whichever you prefer; in this post, I've used vim.
Type the following at the end of the file, then save and close it.
@hourly /usr/bin/indexer --rotate --config /etc/sphinxsearch/sphinx.conf --all
The above cronjob will run on every hour and add new data to the index using the default config file.
Starting Sphinx
In this section, I'll start the Sphinx daemon.
By default, the Sphinx daemon is tuned off. To enable Sphinx, first open /etc/default/sphinxsearch
.
$ sudo vim /etc/default/sphinxsearch
Find the line START=no
and set it to yes.
START=yes
Then, save and close the file.
Finally, start the Sphinx daemon.
$ sudo service sphinxsearch start
Testing Search
In this section, I will test my search. Enter the following command.
$ search this is my test document number
I get something that looks like the following.
Sphinx 2.0.4-id64-release (r3135) Copyright (c) 2001-2012, Andrew Aksyonoff Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com) using config file '/etc/sphinxsearch/sphinx.conf'... index 'test1': query 'this is my test document number ': returned 2 matches of 2 total in 0.006 sec displaying matches: 1. document=1, weight=7431, group_id=1, date_added=Fri Aug 14 16:44:37 2015 id=1 group_id=1 group_id2=5 date_added=2015-08-14 16:44:37 title=test one content=this is my test document number one. also checking search within phrases. 2. document=2, weight=7431, group_id=1, date_added=Fri Aug 14 16:44:37 2015 id=2 group_id=1 group_id2=6 date_added=2015-08-14 16:44:37 title=test two content=this is my test document number two words: 1. 'this': 4 documents, 4 hits 2. 'is': 4 documents, 4 hits 3. 'my': 2 documents, 2 hits 4. 'test': 3 documents, 5 hits 5. 'document': 2 documents, 2 hits 6. 'number': 3 documents, 3 hits
Use Query Builder for SphinxQL with PHP
Install package via composer
$ composer require foolz/sphinxql-query-builder
Create test script file
<?php require_once __DIR__ . '/vendor/autoload.php'; use Foolz\SphinxQL\SphinxQL; use Foolz\SphinxQL\Connection; // create a SphinxQL Connection object to use with SphinxQL $conn = new Connection(); $conn->setParams(array('host' => '127.0.0.1', 'port' => '9306')); $query = SphinxQL::create($conn)->select('id') ->from('test1') ->where('id', '=', 1); try { $result = $query->execute(); } catch (Exception $e){ var_dump($e); }
Output
array(1) { [0] => array(2) { ["id"] => string(1) "1" ["weight"] => string(1) "1" } }
By using Sphinx, we can easily add a custom search to your site.