[UX] Implement N-Gram/Porter-Stemmer/Fuzzy/Partial-Word search in core.

As it is now, the default Drupal/Backdrop search matches only exact search terms. There are a few solutions for Drupal out there, but given our target audience of small businesses and affordable shared hosting environments, options like an Apache Solr implementation are not always possible.

As part of the solution to this issue, a nice-to-have would be solving some of the search-related issues on backdropcms.org, or at least improving the situation. One example is the funny search results on the modules search page: https://github.com/backdrop-ops/backdropcms.org/issues/95#issuecomment-1...

Here are some acceptable solutions that I believe could be merged/implemented.

Partial Word Search

Respective d.org issue: https://www.drupal.org/node/498752

It was at some point marked as a duplicate of the N-Gram issue (see below), but it is a different approach and it has a working D7 patch (according to other people's comments - I have not tested it myself).

Allow wildcards like * in search

Respective d.org issue: https://www.drupal.org/project/drupal/issues/487764

Started as a D6/D7 request - has a starting patch for D8 - currently slated for 11.x, but very few comments and little activity overall.

Allow users to use advanced search without keywords entered

Respective issue in d.org: https://www.drupal.org/project/drupal/issues/1126688

Problem/Motivation

When using Advanced Search on a Node search page, you have to enter keywords even if there are other valid search conditions chosen. You shouldn't have to.

This has been shifting between D7 and D8, and has patches for both. No recent activity though.

N-Gram

Respective d.org issue: https://www.drupal.org/node/103548

Started as D6/D7 but had patches only for D6 - was then moved to D8 and later closed as wontfix for performance reasons. At some point it was re-opened, and it has some recent activity and patches against D10.

Search is working, but I noticed that it doesn't pick up on partial words. For example, if you search on 'quake' you would expect to get back results containing the term 'earthquakes' but there are no results.

This behavior is also the case with plurals: Searching on 'engineer' when a node only includes 'engineers' will not return that node in the results. It is pretty standard searching behaviour for people to omit plurals and expect to see them in results. For example, searching on 'engineer' should return:

engineers, engineering, engineer's

...

It implements N-Gram searching. Basically, N-Gram will break the words into N-size pieces and those pieces are what gets added to the index. Thus standard search queries will work as is but will find partial words. The actual code to do all the N-Gram stuff is really small; the stop words take up most of the lines of code in that example.

For example, with N = 3, it will take Bart and break it into Bar, art. And it will take Bart@simpson.com and break it into Bar, art, rt@, t@s, ...

And the way Search API is written, all queries will be broken into 3-grams as well. So when you search for Bart, it will really search for words in the search index matching Bar, art, thus returning bart@simpson.com.

The same is true for earthquake and quake. The module works by trading off the size of the search index for query speed, which in my experience is acceptable. Much MUCH MUCH better than LIKE queries /me shivers. I make use of this on a couple of small sites that I have deployed, with strong results. Pretty elegant module, simple and small.

N-Grams are language independent as well (whereas stemming is very language dependent), though the stop words part of that module is language dependent.
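The N-Gram behaviour described above can be illustrated with a minimal sketch. This is not the code from the d.org patch or from Search API; the function names (`ngrams`, `build_index`, `search`) and the in-memory dictionary index are assumptions made purely for illustration, and stop words are left out.

```python
def ngrams(word, n=3):
    """Return the overlapping n-character pieces of a word, lowercased."""
    word = word.lower()
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def build_index(docs, n=3):
    """Map every n-gram to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            for gram in ngrams(word, n):
                index.setdefault(gram, set()).add(doc_id)
    return index


def search(index, query, n=3):
    """Break the query into n-grams too and keep documents matching all of them."""
    grams = [g for word in query.split() for g in ngrams(word, n)]
    if not grams:
        return set()
    result = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        result &= index.get(gram, set())
    return result


# Hypothetical sample content, keyed by node id.
docs = {
    1: "Bart@simpson.com",
    2: "earthquakes hit the coast",
    3: "nothing relevant here",
}
index = build_index(docs)
print(ngrams("Bart"))           # the pieces 'bar' and 'art'
print(search(index, "Bart"))    # finds node 1
print(search(index, "quake"))   # finds node 2
```

Note that requiring every gram to be present does not guarantee the grams are adjacent in the same word, so a sketch like this can return false positives; the index is also several times larger than a whole-word index, which is the size-for-speed trade-off mentioned above.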

Porter-Stemmer

There is a D7 module: https://www.drupal.org/project/porterstemmer

I have tried this myself in the past, but it is language-specific:

... Limitations and Notes - The Porter stemming algorithm has a few parts that work better with American English than British English, so some British spellings will not be stemmed correctly. It is also definitely English-specific, and non-English content will not be stemmed correctly. ...

...and has other limitations too that the N-Gram/partial-word search does not have, such as:

... - The Porter stemming algorithm attempts to reduce words to their linguistic root words -- it does not do general substring matching. So, for instance, it should make "walk", "walking", "walked", and "walks" all match in searching, but it will not make "walking" a match for "king".
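To make the difference between stemming and substring matching concrete, here is a deliberately naive suffix stripper. It is not the Porter algorithm and not the porterstemmer module's code; the suffix list, the minimum stem length, and the name `naive_stem` are all assumptions chosen only to reproduce the "walk"/"king" example quoted above.

```python
# Assumed, heavily simplified rules: real Porter stemming has many more steps.
SUFFIXES = ("ing", "ed", "s")
VOWELS = set("aeiou")


def naive_stem(word):
    """Strip one common English suffix if a plausible stem remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[:-len(suffix)]
        # Only strip when the remaining stem is at least 3 letters and has a vowel,
        # so that "king" is not reduced to "k".
        if word.endswith(suffix) and len(stem) >= 3 and VOWELS & set(stem):
            return stem
    return word


for w in ("walk", "walking", "walked", "walks", "king"):
    print(w, "->", naive_stem(w))
```

All four forms of "walk" reduce to the same stem and therefore match each other, while "king" keeps its own stem and does not match "walking". An N-Gram index would instead treat "king" as a partial match for "walking", which is the limitation the module notes are pointing at.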

Fuzzy Search

D7 module: https://www.drupal.org/project/fuzzysearch

This module provides drupal sites with a fuzzy search engine to allow for broader keyword matches including partial or misspelled keywords.

This one seems to be an implementation of the N-Gram solution as a module instead of a core patch:

Fuzzy matching is implemented by using ngrams. Each word in a node is split into pieces of 3 letters (the default length), so 'apple' gets indexed as 3 smaller strings: 'app', 'ppl', 'ple'. The effect of this is that as long as your search matches X percent of the word (configurable in the admin settings), the node will be pulled up in the results.
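The percentage-based matching described above might look roughly like the following sketch. How the fuzzysearch module actually computes its score is not stated here, so the ratio used (the share of the query word's n-grams that also occur in the indexed word), the function names, and the 0.5 threshold are assumptions for illustration only.

```python
def ngrams(word, n=3):
    """Return the set of overlapping n-character pieces of a word, lowercased."""
    word = word.lower()
    if len(word) <= n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def match_ratio(query_word, indexed_word, n=3):
    """Fraction of the query word's n-grams that also occur in the indexed word."""
    query_grams = ngrams(query_word, n)
    shared = query_grams & ngrams(indexed_word, n)
    return len(shared) / len(query_grams)


def is_fuzzy_match(query_word, indexed_word, threshold=0.5, n=3):
    """Treat the words as matching when enough n-grams overlap (threshold is assumed)."""
    return match_ratio(query_word, indexed_word, n) >= threshold


print(sorted(ngrams("apple")))          # the three pieces: app, ple, ppl
print(match_ratio("aple", "apple"))     # misspelling still shares 'ple'
print(is_fuzzy_match("aple", "apple"))
print(is_fuzzy_match("orange", "apple"))
```

Raising the threshold makes matching stricter (fewer typos tolerated, fewer unrelated hits); lowering it does the opposite, which is presumably why the module exposes the percentage as a setting.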

Features (the highlighting and the links to the respective Backdrop issues are mine):

Misspellings and typos still provide relevant results.

External scoring factor hooks exposed so contrib modules can give administrators options for scoring.

Re-index function available to allow modules to specifically call a certain node for re-indexing at next cron run.

Indexing of CCK textfield field types and taxonomy terms.

Implements hook_nodeapi's 'update index' op, so current modules integrating with core search will work the same.

Works independently of core search.

Block provides related content type results from url query (#1317).

Ngram length is configurable.

Content types may be excluded from results.

Fuzzy highlighting of misspelled search terms.

GitHub Link:

https://github.com/backdrop/backdrop-issues/issues/1320

GitHub Issue #:

1320
