
How_to_install_Sphinx

RussH edited this page Sep 3, 2015 · 5 revisions

### How to install Sphinx

## Please note: new guidance for installing Sphinx is available at https://github.com/opencats/OpenCATS/tree/master/optional-updates/latest-sphinx-search. It supersedes the instructions below.

There are two installation methods:

  • using the Sphinx package from the original CATS files, and
  • manually installing the latest Sphinx.

#### Using the Sphinx package

Download the Sphinx package and untar it in a local directory on your CATS server:

```
wget http://www.opencats.org/downloads/sphinx_for_cats.tar.gz
tar -xvf sphinx_for_cats.tar.gz
```

Run the install script (as a normal user):

```
chmod 755 install.sh
./install.sh
```

#### Manually using the latest Sphinx

Install and Configure Sphinx by mabdalla » Fri Jul 02, 2010 3:18 am

These instructions apply to a SUSE Linux installation of CATS in which the CATS package has been installed in /srv/www/htdocs/cats. If your installation differs, adjust the paths referenced in this document accordingly.

##### Basic Requirements

  • Installed and operational CATS system
  • Functional cron
  • LSB-compatible init.d for run control

##### What it Does

The Sphinx package consists of two primary parts: the indexer, which creates the search indexes and is run periodically to rebuild them, and the searchd daemon, which handles the queries sent by the sphinxapi.php library.

The CATS integration design calls for a primary index, cats, that is rebuilt once per day via cron.daily. This daily rebuild picks up all candidates/resumes in the database and completely reindexes the resume text, key skills, and candidates' first and last names. It also resets sph_counter to a high-water mark: the highest attachment ID present at the time of the run.

A second, delta index, catsdelta, handles additions to the database during the business day. A cron script rebuilds only this delta index, picking up attachments added since the high-water mark recorded at the last run of the primary index. The intent is to run this script every 20 or 30 minutes during the business day so that the delta index stays current with recent additions to the database.
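The split between the two indexes comes down to a single comparison against the high-water mark stored in sph_counter. The sketch below mimics that comparison in plain shell so the behavior is easy to see; it does not query MySQL or Sphinx, and the function name and the sample mark of 4700 are hypothetical.

```shell
#!/bin/sh
# Illustration only: mimic the WHERE clauses that split documents between
# the main index "cats" (attachment_id <= max_doc_id) and the delta index
# "catsdelta" (attachment_id > max_doc_id). No MySQL or Sphinx involved.

# classify_attachment MAX_DOC_ID ATTACHMENT_ID
# Prints the name of the index whose query would select the attachment.
classify_attachment() {
    max_doc_id=$1
    attachment_id=$2
    if [ "$attachment_id" -le "$max_doc_id" ]; then
        echo "cats"
    else
        echo "catsdelta"
    fi
}

# Hypothetical example: the nightly run stored max_doc_id = 4700.
# classify_attachment 4700 4650   # prints: cats
# classify_attachment 4700 4701   # prints: catsdelta
```

An attachment whose ID equals the recorded mark lands in the main index, because the main query uses `<=` and the delta query uses `>`, so no document is indexed twice.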

##### Installation Instructions

A Sphinx 0.9.7-rc2 installation places the following files:

  • /usr/local/bin/indexer
  • /usr/local/bin/searchd
  • /usr/local/man/man8/searchd.8.gz
  • /usr/local/etc/sphinx.conf.dist
  • /usr/share/doc/packages/sphinx-0.9.7-rc2
  • /usr/share/doc/packages/sphinx-0.9.7-rc2/COPYING
  • /usr/share/doc/packages/sphinx-0.9.7-rc2/doc
  • /usr/share/doc/packages/sphinx-0.9.7-rc2/doc/mk.cmd
  • /usr/share/doc/packages/sphinx-0.9.7-rc2/doc/sphinx.css
  • /usr/share/doc/packages/sphinx-0.9.7-rc2/doc/sphinx.html
  • /usr/share/doc/packages/sphinx-0.9.7-rc2/doc/sphinx.txt
  • /usr/share/doc/packages/sphinx-0.9.7-rc2/doc/sphinx.xml
  • /usr/share/doc/packages/sphinx-0.9.7-rc2/INSTALL
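Before continuing, you may want to confirm that the indexer and searchd binaries from the list above are in place. The following is a minimal sketch of such a check; check_executables is a hypothetical helper, not part of Sphinx or CATS.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sphinx or CATS): succeed only if every
# path given is an executable regular file; name the first one that is not.
check_executables() {
    for path in "$@"; do
        # -x alone is also true for directories, so rule those out
        if [ ! -x "$path" ] || [ -d "$path" ]; then
            echo "missing or not executable: $path" >&2
            return 1
        fi
    done
    return 0
}

# Usage, with the binary paths from the file list above:
# check_executables /usr/local/bin/indexer /usr/local/bin/searchd
```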

##### Copy sphinxapi.php

  • Copy the api/sphinxapi.php file from the Sphinx distribution to /srv/www/htdocs/cats/lib/sphinxapi.php.

##### Create the Indexer Cron Script

  • Create the following cron script as /etc/cron.daily/indexer to run the indexer daily:

```
#!/bin/sh
/usr/local/bin/indexer --all --rotate --config /srv/www/htdocs/cats/modules/search/sphinx.conf
```

  • Make the script owned by root and executable only by root:

```
chown root:root /etc/cron.daily/indexer
chmod 700 /etc/cron.daily/indexer
```

##### Create the Searchd init.d Script

  • Create an /etc/init.d/searchd script for the searchd daemon. The example below works well on SUSE; you may need to alter it for other distributions.

```
#! /bin/sh
# Copyright (c) 1995-2004 SUSE Linux AG, Nuernberg, Germany.
# All rights reserved.
#
# Author: Kurt Garloff
# Please send feedback to http://www.suse.de/feedback/
#
# /etc/init.d/searchd
#   and its symbolic link
# /(usr/)sbin/rcsearchd
#
#    This program is free software; you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation; either version 2 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program; if not, write to the Free Software
#    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
#
### BEGIN INIT INFO
# Provides:          searchd
# Required-Start:    $syslog $remote_fs mysql
# Should-Start:      $time ypbind sendmail
# Required-Stop:     $syslog $remote_fs
# Should-Stop:       $time ypbind sendmail
# Default-Start:     3 5
# Default-Stop:      0 1 2 6
# Short-Description: searchd daemon for sphinx search
# Description:       Starts the Sphinx searchd daemon
### END INIT INFO

# Check for missing binaries (stale symlinks should not happen)
# Note: Special treatment of stop for LSB conformance
LOGFILE=/var/log/searchd.log
SEARCHD=/usr/local/bin/searchd
test -x $SEARCHD || { echo "$SEARCHD not installed";
    if [ "$1" = "stop" ]; then exit 0;
    else exit 5; fi; }

# Source LSB init functions
. /etc/rc.status

# Reset status of this service
rc_reset

case "$1" in
    start)
        echo -n "Starting $SEARCHD "
        ## Start daemon with startproc(8). If this fails
        ## the return value is set appropriately by startproc.
        startproc -l $LOGFILE $SEARCHD --config /srv/www/htdocs/cats/modules/search/sphinx.conf

        # Remember status and be verbose
        rc_status -v
        ;;
    stop)
        echo -n "Shutting down $SEARCHD "
        ## Stop daemon with killproc(8) and if this fails
        ## killproc sets the return value according to LSB.
        killproc -TERM $SEARCHD

        # Remember status and be verbose
        rc_status -v
        ;;
    try-restart|condrestart)
        ## Do a restart only if the service was active before.
        ## Note: try-restart is now part of LSB (as of 1.9).
        ## RH has a similar command named condrestart.
        if test "$1" = "condrestart"; then
            echo "${attn} Use try-restart ${done}(LSB)${attn} rather than condrestart ${warn}(RH)${norm}"
        fi
        $0 status
        if test $? = 0; then
            $0 restart
        else
            rc_reset    # Not running is not a failure.
        fi
        # Remember status and be quiet
        rc_status
        ;;
    restart)
        ## Stop the service and regardless of whether it was
        ## running or not, start it again.
        $0 stop
        $0 start

        # Remember status and be quiet
        rc_status
        ;;
    force-reload)
        ## Signal the daemon to reload its config. Most daemons
        ## do this on signal 1 (SIGHUP).
        ## If it does not support it, restart.
        echo -n "Reload service $SEARCHD "
        ## if it supports it:
        killproc -HUP $SEARCHD
        rc_status -v

        ## Otherwise:
        #$0 try-restart
        #rc_status
        ;;
    reload)
        ## Like force-reload, but if daemon does not support
        ## signaling, do nothing (!)

        # If it supports signaling:
        echo -n "Reload service $SEARCHD "
        killproc -HUP $SEARCHD
        rc_status -v

        ## Otherwise if it does not support reload:
        #rc_failed 3
        #rc_status -v
        ;;
    status)
        echo -n "Checking for service $SEARCHD "
        ## Check status with checkproc(8), if process is running
        ## checkproc will return with exit status 0.

        # Return value is slightly different for the status command:
        # 0 - service up and running
        # 1 - service dead, but /var/run/  pid  file exists
        # 2 - service dead, but /var/lock/ lock file exists
        # 3 - service not running (unused)
        # 4 - service status unknown :-(
        # 5--199 reserved (5--99 LSB, 100--149 distro, 150--199 appl.)

        # NOTE: checkproc returns LSB compliant status values.
        checkproc $SEARCHD
        # NOTE: rc_status knows that we called this init script with
        # "status" option and adapts its messages accordingly.
        rc_status -v
        ;;
    *)
        echo "Usage: $0 {start|stop|status|try-restart|restart|force-reload|reload}"
        exit 1
        ;;
esac
rc_exit
```

  • Save the script as /etc/init.d/searchd, make it executable, then install it as a service with insserv searchd.
  • Create a run-control symlink for the init.d script:

```
ln -s /etc/init.d/searchd /usr/sbin/rcsearchd
```

##### Create sphinx.conf

Create the sphinx.conf configuration file and save it to /srv/www/htdocs/cats/modules/search. You will need to create the `search` directory, since it doesn't exist yet. Be sure to enter your correct MySQL user and password on the sql_user and sql_pass lines. Also create the index directory named in the index `path` settings if it doesn't already exist (/srv/www/htdocs/cats/modules/search/index).
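Both directories can be created in one step. The sketch below assumes the SUSE layout used on this page and takes the CATS root as an optional argument; make_sphinx_dirs is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: create modules/search and modules/search/index under
# the CATS root. The default root is the SUSE path used on this page.
make_sphinx_dirs() {
    cats_root=${1:-/srv/www/htdocs/cats}
    # -p creates both levels at once and succeeds if they already exist
    mkdir -p "$cats_root/modules/search/index"
}

# Usage (as root):
# make_sphinx_dirs /srv/www/htdocs/cats
```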

```
#
# sphinx configuration file for CATS
#

#############################################################################
## data source definition
#############################################################################

source catsdb
{
    type                = mysql
    strip_html          = 0
    index_html_attrs    =

    # some straightforward parameters for 'mysql' source type
    sql_host            = localhost
    # enter your CATS MySQL user name and password on the next two lines
    sql_user            =
    sql_pass            =
    sql_db              = cats
    sql_port            = 3306    # optional, default is 3306

    # record the current highest attachment_id as the high-water mark
    sql_query_pre       = REPLACE INTO sph_counter SELECT 1, MAX(attachment_id) from attachment
    sql_query           = \
        SELECT attachment_id, data_item_id, UNIX_TIMESTAMP(attachment.date_created) AS date_added, title, text, \
        last_name, first_name, notes, key_skills \
        FROM attachment left join candidate on data_item_id = candidate_id \
        where resume = 1 and attachment.site_id = 1 and data_item_type = 100 \
        and attachment_id <= (SELECT max_doc_id from sph_counter where counter_id = 1)

    sql_group_column    = data_item_id
    sql_date_column     = date_added
    sql_query_post      =
    sql_query_info      = SELECT * FROM attachment WHERE attachment_id=$id
}

source delta : catsdb
{
    # override the inherited pre-query so the delta run does not move
    # the high-water mark in sph_counter
    sql_query_pre       =
    sql_query           = \
        SELECT attachment_id, data_item_id, UNIX_TIMESTAMP(attachment.date_created) AS date_added, title, text, \
        last_name, first_name, notes, key_skills \
        FROM attachment left join candidate on data_item_id = candidate_id \
        where resume = 1 and attachment.site_id = 1 and data_item_type = 100 \
        and attachment_id > (SELECT max_doc_id from sph_counter where counter_id = 1)
}

#############################################################################
## index definition
#############################################################################

index cats
{
    source              = catsdb

    # this is path and index file name without extension
    #
    # indexer will append different extensions to this path to
    # generate names for both permanent and temporary index files
    #
    # .tmp* files are temporary and can be safely removed
    # if indexer fails to remove them automatically
    #
    # .sp* files are fulltext index data files. specifically,
    # .spa contains attribute values attached to each document id
    # .spd contains doclists and hitlists
    # .sph contains index header (schema and other settings)
    # .spi contains wordlists
    #
    # MUST be defined
    path                = /srv/www/htdocs/cats/modules/search/index/cats

    docinfo             = extern
    morphology          = none
    stopwords           =
    min_word_len        = 1
    charset_type        = sbcs
}

index catsdelta : cats
{
    source              = delta
    path                = /srv/www/htdocs/cats/modules/search/index/cats_delta
}

#############################################################################
## indexer settings
#############################################################################

indexer
{
    mem_limit           = 32M
}

#############################################################################
## searchd settings
#############################################################################

searchd
{
    address             = 127.0.0.1
    port                = 3312
    log                 = /var/log/searchd.log
    query_log           = /var/log/query.log
    read_timeout        = 5
    max_children        = 30
    pid_file            = /var/run/searchd.pid

    # default is 1000 (just like with Google)
    max_matches         = 1000
}

# --eof--
```
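Because sql_user and sql_pass are left blank in the file above, it is easy to run the indexer before filling them in. A small check like the following sketch flags those lines while they still have no value; check_sql_credentials is a hypothetical helper. A blank sql_pass is only correct if your MySQL user really has no password.

```shell
#!/bin/sh
# Hypothetical helper: report sql_user / sql_pass lines in sphinx.conf that
# still have nothing after the "=". Returns non-zero if any are blank.
check_sql_credentials() {
    conf=${1:-/srv/www/htdocs/cats/modules/search/sphinx.conf}
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 1
    fi
    result=0
    for key in sql_user sql_pass; do
        # Matches e.g. "    sql_user            =" with optional trailing spaces
        if grep -Eq "^[[:space:]]*$key[[:space:]]*=[[:space:]]*\$" "$conf"; then
            echo "$key is empty in $conf" >&2
            result=1
        fi
    done
    return "$result"
}

# Usage:
# check_sql_credentials /srv/www/htdocs/cats/modules/search/sphinx.conf
```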

##### Add sph_counter to the CATS Database

Create the sph_counter table in the CATS database:

```
# in MySQL
use cats
CREATE TABLE sph_counter (
    counter_id INTEGER PRIMARY KEY NOT NULL,
    max_doc_id INTEGER NOT NULL
);
```

##### Try Creating Your Index

Index your CATS database by running the indexer from the command line:

```
helphand:~ # /usr/local/bin/indexer --all --config /srv/www/htdocs/cats/modules/search/sphinx.conf
Sphinx 0.9.7-RC2
Copyright (c) 2001-2006, Andrew Aksyonoff

using config file '/srv/www/htdocs/cats/modules/search/sphinx.conf'...
indexing index 'cats'...
collected 4668 docs, 20.7 MB
sorted 2.1 Mhits, 100.0% done
total 4668 docs, 20663522 bytes
total 3.324 sec, 6216481.50 bytes/sec, 1404.34 docs/sec
helphand:~ #
```

##### Start the Searchd Daemon

If the indexer ran without errors, your installation is in good shape, so start the searchd daemon:

```
helphand:~ # rcsearchd start
```
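On SUSE, `rcsearchd status` uses checkproc to report whether the daemon is running. On systems without that init script, a rough equivalent is to test the PID recorded in the pid_file set in sphinx.conf (/var/run/searchd.pid). The helper below is a hypothetical sketch of that check, not part of Sphinx.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the process whose PID is stored in
# searchd's pid file (pid_file in sphinx.conf) is still alive.
searchd_running() {
    pidfile=${1:-/var/run/searchd.pid}
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only checks that the process exists and
    # that we may signal it, so run this as root when searchd runs as root.
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Usage:
# if searchd_running; then echo "searchd is up"; else echo "searchd is not running"; fi
```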

##### Test Search from the Command Line

Now test the search from the command line by searching for a resume keyword:

```
helphand:~ # search --config /srv/www/htdocs/cats/modules/search/sphinx.conf controller
[lots of returned output snipped for brevity]

       date_created=2006-10-11 13:38:19        date_modified=2006-10-11 13:38:19

words:
1. 'controller': 402 documents, 776 hits
helphand:~ #
```
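If you want to script this check, for example after a reindex, the document count can be pulled from the `words:` summary. This sketch assumes the output format shown above (`1. 'word': N documents, M hits`); other Sphinx versions may format it differently, and first_word_doc_count is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read search output on stdin and print the document
# count from the first entry of the "words:" summary, e.g.
#   1. 'controller': 402 documents, 776 hits   ->   402
first_word_doc_count() {
    sed -n "s/^1\. '[^']*': \([0-9][0-9]*\) documents,.*/\1/p" | head -n 1
}

# Usage:
# n=$(search --config /srv/www/htdocs/cats/modules/search/sphinx.conf controller | first_word_doc_count)
# [ "${n:-0}" -gt 0 ] && echo "index returned $n documents"
```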

##### Set Up Cron for Regular Updates

If the search returned the expected results, you are almost finished with the Sphinx install. All that remains is a cron entry that rebuilds the catsdelta index periodically throughout the business day, so that candidate resumes added during the day become searchable. Create the following file as /etc/cron.d/cats:

```
# use /bin/sh to run commands, no matter what /etc/passwd says
SHELL=/bin/sh
# mail any output to `root', no matter whose crontab this is
MAILTO=root
PATH=/usr/local/bin
#
# Business Days, Business Hours
20,50 7-17 * * Mon,Tue,Wed,Thu,Fri  root  $PATH/indexer --rotate --config /srv/www/htdocs/cats/modules/search/sphinx.conf catsdelta >>/dev/null
```

Then restrict its permissions:

```
chmod 600 /etc/cron.d/cats
```
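To see what the `20,50 7-17` minute and hour fields mean in practice, the sketch below expands them into clock times. It is an illustration only and is not needed for the install; list_delta_run_times is a hypothetical name. The entry fires at 20 and 50 minutes past each hour from 07:00 through 17:00, so the delta index is rebuilt every 30 minutes from 07:20 to 17:50, 22 times per business day.

```shell
#!/bin/sh
# Illustration only: expand the minute field "20,50" and hour field "7-17"
# of the /etc/cron.d/cats entry into HH:MM run times for one business day.
list_delta_run_times() {
    hour=7
    while [ "$hour" -le 17 ]; do
        for minute in 20 50; do
            printf '%02d:%02d\n' "$hour" "$minute"
        done
        hour=$((hour + 1))
    done
}

# list_delta_run_times | head -n 2   # 07:20 and 07:50
# list_delta_run_times | tail -n 1   # 17:50
# list_delta_run_times | wc -l       # 22
```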

You’re done!

If you have any questions about this procedure, or would like to try it out on other Unix/Linux flavors, drop me a line at mabdalla [at] meait.com. I'd be happy to help.

Regards,

-MA
