Currently, the default Drupal/Backdrop search matches only exact search terms. Several solutions exist for Drupal, but given our target audience of small businesses and shared hosting environments, we should not look into anything that requires a separate search server, such as Apache Solr.

This issue also aims to solve backdropcms.org issues such as the odd search results on the modules search page: https://github.com/backdrop-ops/backdropcms.org/issues/95#issuecomment-1...

Here are some acceptable solutions that I believe could be merged/implemented.

N-Gram

Respective d.org issue, filed against D8 (it has patches, but only for D6, and was closed as wontfix for performance reasons): https://www.drupal.org/node/103548

Search is working, but I noticed that it doesn't pick up on partial words. For example, if you search on 'quake' you would expect to get back results containing the term 'earthquakes' but there are no results.

This behavior is also the case with plurals: Searching on 'engineer' when a node only includes 'engineers' will not return that node in the results. It is pretty standard searching behaviour for people to omit plurals and expect to see them in results. For example, searching on 'engineer' should return:

engineers engineering engineer's

...

It implements N-Gram searching. Basically, N-Gram will break the words into N-size pieces, and those pieces are what gets added to the index. Thus standard search queries will work as is but will also find partial words. The actual code that does all the N-Gram work is really small; the stop words take up most of the lines of code in that example.

For example, with N = 3, it will take Bart and break it into Bar, art. And it will take Bart@simpson.com and break it into Bar, art, rt@, t@s, ...

And the way search api is written, all queries will be broken into 3-grams as well. So when you search for Bart, it will really search for words in the search index matching Bar, art, thus returning bart@simpson.com
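The indexing and querying steps described above can be sketched as follows. This is a minimal illustration in Python, not the code of the module being discussed: the function names (`ngrams`, `build_index`, `search`), the sample documents, and the choice to lowercase everything are all assumptions made for the example.

```python
def ngrams(word, n=3):
    """Split a word into overlapping n-character pieces (lowercased)."""
    word = word.lower()
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def build_index(docs, n=3):
    """Map each n-gram to the set of document ids whose words contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            for gram in ngrams(word, n):
                index.setdefault(gram, set()).add(doc_id)
    return index


def search(index, query, n=3):
    """Return ids of documents that contain every n-gram of the query."""
    result = None
    for word in query.split():
        for gram in ngrams(word, n):
            hits = index.get(gram, set())
            # Intersect: a document must contain all of the query's n-grams.
            result = set(hits) if result is None else result & hits
    return result or set()


# Hypothetical sample content, for illustration only.
docs = {
    1: "Contact bart@simpson.com today",
    2: "Earthquakes shook the coast",
    3: "Lisa plays saxophone",
}
index = build_index(docs)

print(ngrams("Bart"))          # the two 3-grams of "Bart"
print(search(index, "Bart"))   # finds the document with bart@simpson.com
print(search(index, "quake"))  # finds the document with "Earthquakes"
```

Note that this sketch only checks that all of the query's n-grams occur somewhere in a document, so it can return false positives when the pieces come from different words. A real implementation would need some way to rank or verify the candidates.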

The same is true for earthquake and quake. The module works by trading off the size of the search index for query speed, which in my experience is acceptable. Much, MUCH better than LIKE queries (/me shivers). I use this on a couple of small sites that I have deployed, with strong results. Pretty elegant module, simple and small.

N-Grams are language independent as well (whereas stemming is very language dependent), though the stop words part of that module is language dependent.

Partial Word Search

Respective d.org issue: https://www.drupal.org/node/498752

It was at some point marked as a duplicate of the N-Gram issue, but it is a different approach and it has a working D7 patch (according to other people's comments; I have not tested it myself).

Porter-Stemmer

There is a D7 module: https://www.drupal.org/project/porterstemmer

I have tried this myself in the past, but it is language-specific:

... Limitations and Notes - The Porter stemming algorithm has a few parts that work better with American English than British English, so some British spellings will not be stemmed correctly. It is also definitely English-specific, and non-English content will not be stemmed correctly. ...

...and has other limitations too that the N-Gram/partial-word search does not have, such as:

... - The Porter stemming algorithm attempts to reduce words to their linguistic root words -- it does not do general substring matching. So, for instance, it should make "walk", "walking", "walked", and "walks" all match in searching, but it will not make "walking" a match for "king".
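To make the difference between stemming and substring matching concrete, here is a deliberately simplified sketch in Python. It is not the Porter algorithm and not the porterstemmer module's code: `naive_stem` is a hypothetical stand-in that strips only three English suffixes, and it borrows just one idea from Porter (a suffix is removed only if the remaining stem still contains a vowel).

```python
VOWELS = set("aeiou")


def naive_stem(word):
    """Strip a few common English suffixes; a toy stand-in, not Porter."""
    word = word.lower()
    for suffix in ("ing", "ed"):
        stem = word[:-len(suffix)]
        # Only strip when what remains still looks like a word (has a vowel).
        if word.endswith(suffix) and any(c in VOWELS for c in stem):
            return stem
    # Strip a plural "s", but leave words such as "glass" alone.
    if word.endswith("s") and not word.endswith("ss") and len(word) > 2:
        return word[:-1]
    return word


for w in ("walk", "walking", "walked", "walks", "king"):
    print(w, "->", naive_stem(w))

# Words match in a stemmed search when their stems are equal.
print(naive_stem("walking") == naive_stem("walks"))
print(naive_stem("walking") == naive_stem("king"))
```

In this sketch the four forms of "walk" share one stem, while "king" keeps its own stem even though it is a substring of "walking". That is the limitation the quoted note describes.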

Fuzzy Search

D7 module: https://www.drupal.org/project/fuzzysearch

This module provides Drupal sites with a fuzzy search engine to allow for broader keyword matches including partial or misspelled keywords.

This one seems to be an implementation of the N-Gram solution as a module instead of a core patch:

Fuzzy matching is implemented by using ngrams. Each word in a node is split into pieces of 3 letters (the default length), so 'apple' gets indexed as 3 smaller strings: 'app', 'ppl', 'ple'. The effect of this is that as long as your search matches X percent (configurable in the admin settings) of the word, the node will be pulled up in the results.
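The percentage-based matching described in that quote might look roughly like this. It is a sketch in Python under my own assumptions, not the Fuzzy Search module's actual scoring: the names `match_ratio` and `fuzzy_match` and the 0.5 default threshold are made up for illustration.

```python
def ngrams(word, n=3):
    """Split a word into overlapping n-character pieces (lowercased)."""
    word = word.lower()
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def match_ratio(query_word, indexed_word, n=3):
    """Fraction of the query word's n-grams that occur in the indexed word."""
    query_grams = ngrams(query_word, n)
    indexed_grams = set(ngrams(indexed_word, n))
    found = sum(1 for gram in query_grams if gram in indexed_grams)
    return found / len(query_grams)


def fuzzy_match(query_word, indexed_word, threshold=0.5, n=3):
    """True when enough of the query's n-grams are found (assumed threshold)."""
    return match_ratio(query_word, indexed_word, n) >= threshold


print(ngrams("apple"))                # the three pieces from the quote
print(match_ratio("applr", "apple"))  # a typo still shares most n-grams
print(fuzzy_match("aple", "apple"))   # a misspelling can still match
print(fuzzy_match("banana", "apple")) # an unrelated word does not
```

Lowering the threshold makes the search more forgiving of typos but returns more unrelated results, which is presumably why the module exposes it as an admin setting.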

Features (highlighting and links to the respective Backdrop issues added by me):

  • Misspellings and typos still provide relevant results.
  • External scoring factor hooks exposed so contrib modules can give administrators options for scoring.
  • Re-index function available to allow modules to specifically call a certain node for re-indexing at next cron run.
  • Indexing of CCK textfield field types and taxonomy terms.
  • Implements hook_nodeapi's 'update index' op, so current modules integrating with core search will work the same.
  • Works independently of core search.
  • Block provides related content type results from url query (#1317).
  • Ngram length is configurable.
  • Content types may be excluded from results.
  • Fuzzy highlighting of misspelled search terms.
GitHub Issue #: 1320