Char filters

Introduction

Char filters are filters applied to texts before they reach the Tokenizer and they become Tokens.

HTML Strip

The HTML Strip removes all HTML from the analyzed text. You can use it by calling the stripHTML method on a NewAnalyzer instance or by passing a new instance of the HTMLStrip class to the charFilter method.

use Sigmie\Index\Analysis\CharFilter\HTMLStrip;
 
$newAnalyer->charFilter(new HTMLStrip);
 
// OR
 
$newAnalyer->stripHTML();

Here is an example of how a text containing the <span> HTML tags is transformed using the Strip HTML char filter.

"<span>Some people are worth melting for.</span>"
-------------------------------------------------
Strip HTML
-------------------------------------------------
"Some people are worth melting for."

Char Mapping

The Char Mapping filter replaces any occurrences of the passed string with another one. To use it either pass the replacements to the mapChars method on the NewIndex builder or use the charFilter method passing a Mapping class instance.

use Sigmie\Index\Analysis\CharFilter\Mapping;
 
$newAnalyer->charFilter(new Mapping(
name: 'mapping_char_filter',
mappings: [
':)' => 'happy',
':(' => 'sad',
]
));
 
// OR
 
$newAnalyzer->mapChars([':)'=> 'happy']);

In the below example you can see how a text containing a happy emoji (:)) will look like after we replace it with the word happy using the Map Chars filter.

"Even miracles take a little time. :)"
-----------------------------------------
Map Chars ":)" -> "happy"
-----------------------------------------
"Even miracles take a little time. happy"

Pattern replace

For handling any edge case you can use the Pattern replace filter, which will match a regex pattern and replace it with the given string.

You can use it by calling the patternReplace method on the NewIndex builder instance, or by passing an instance of the Pattern class to the charFilter method.

use Sigmie\Index\Analysis\CharFilter\Pattern;
 
$newAnalyer->charFilter(new Pattern(
name: 'pattern_replace_char_filter',
pattern: ':D|:\)',
replace: 'happy'
));
 
// OR
 
$newAnalyer->patternReplace(pattern: ':D|:\)', replace:'happy');

Here is an example of how your text will look after we apply the Pattern Replace filter to replace any match of the :D|:) pattern with the happy word.

"This is the perfect time to panic! :D :)"
------------------------------------------------
Pattern Replace ":D|:\)" -> "happy"
------------------------------------------------
"This is the perfect time to panic! happy happy"