Semantic Search
- Semantic Search & AI Features
Semantic Search & AI Features
Sigmie provides powerful semantic search capabilities using vector embeddings and AI-powered features that go beyond traditional keyword-based search.
Introduction
Semantic search allows you to find documents based on meaning and context rather than just exact keyword matches. This is particularly useful for:
- Finding similar content even with different wording
- Handling synonyms and related concepts naturally
- Improving search relevance for natural language queries
- Creating recommendation systems
- Multilingual search capabilities
Setting Up Semantic Fields
Basic Semantic Field
To enable semantic search on a field, simply add the semantic()
method:
use Sigmie\Mappings\NewProperties; $properties = new NewProperties;$properties->title('title')->semantic();$properties->text('description')->semantic();$properties->shortText('tags')->semantic(); $sigmie->newIndex('articles') ->properties($properties) ->create();
Advanced Semantic Configuration
You can fine-tune semantic fields with various parameters:
use Sigmie\Enums\VectorSimilarity;use Sigmie\Enums\VectorStrategy; // Configure accuracy and dimensions$properties->text('content') ->semantic(accuracy: 3, dimensions: 512); // Choose similarity function$properties->text('description') ->semantic(similarity: VectorSimilarity::Cosine); // Different similarity options$properties->text('title') ->semantic(similarity: VectorSimilarity::Euclidean); $properties->text('summary') ->semantic(similarity: VectorSimilarity::DotProduct); $properties->text('abstract') ->semantic(similarity: VectorSimilarity::MaxInnerProduct);
Vector Strategy Options
Control how vectors are used in search:
use Sigmie\Enums\VectorStrategy; // Default HNSW (Hierarchical Navigable Small World)$properties->text('content')->semantic(); // Script score strategy for more control$properties->shortText('experience') ->semantic() ->vectorStrategy(VectorStrategy::ScriptScore); // Concatenate strategy (requires elastiknn plugin)$properties->title('title') ->semantic() ->vectorStrategy(VectorStrategy::Concatenate);
Accuracy Levels
Accuracy affects the balance between search quality and performance:
// Accuracy 1 (fastest, less accurate)$properties->text('content')->semantic(accuracy: 1);// Results in: m=16, ef_construction=80, dims=256 // Accuracy 3 (balanced)$properties->text('content')->semantic(accuracy: 3);// Results in: m=64, ef_construction=300, dims=256 // Accuracy 5 (high quality)$properties->text('content')->semantic(accuracy: 5);// Results in: m=64, ef_construction=400, dims=256 // Accuracy 7 (script score for highest quality)$properties->text('content')->semantic(accuracy: 7);// Uses script score strategy with exact vectors
Multiple Vector Representations
You can have multiple vector representations for the same field:
$properties->text('job_description') ->semantic(accuracy: 3, dimensions: 512, similarity: VectorSimilarity::Cosine) ->semantic(accuracy: 5, dimensions: 512, similarity: VectorSimilarity::Euclidean);
Advanced Semantic Field Builder
For more control, use the NewSemanticField
builder:
use Sigmie\Mappings\NewSemanticField; $properties->text('content') ->newSemantic(function (NewSemanticField $semantic) { $semantic->cosineSimilarity(); // Other options: // $semantic->euclideanSimilarity(); // $semantic->dotProductSimilarity(); // $semantic->maxInnerProductSimilarity(); });
Using Semantic Search
Basic Semantic Search
Enable semantic search in your queries:
$response = $sigmie->newSearch('articles') ->semantic() ->properties($properties) ->queryString('artificial intelligence machine learning') ->get(); $hits = $response->hits();
Combining Keyword and Semantic Search
By default, Sigmie combines both keyword and semantic search for best results:
$response = $sigmie->newSearch('articles') ->semantic() // Enable semantic search ->properties($properties) ->queryString('AI technology') // Will use both keyword and vector search ->fields(['title', 'description']) ->get();
Semantic-Only Search
Disable keyword search to rely purely on semantic matching:
$response = $sigmie->newSearch('articles') ->semantic() ->disableKeywordSearch() // Pure semantic search ->properties($properties) ->queryString('machine learning algorithms') ->get();
Handling Empty Queries
Control behavior when query string is empty:
$response = $sigmie->newSearch('articles') ->semantic() ->noResultsOnEmptySearch() // Return no results for empty queries ->properties($properties) ->queryString('') ->get();
Document Indexing with Embeddings
Automatic Embedding Generation
When you index documents with semantic fields, Sigmie automatically generates embeddings:
$documents = $sigmie->collect('articles', refresh: true) ->properties($properties) // Properties with semantic fields ->merge([ new Document([ 'title' => 'Introduction to Machine Learning', 'content' => 'Machine learning is a subset of artificial intelligence...', ]), new Document([ 'title' => 'Deep Learning Fundamentals', 'content' => 'Neural networks form the basis of deep learning...', ]), ]);
Working with Pre-computed Embeddings
If you have pre-computed embeddings, you can include them directly:
new Document([ 'title' => 'AI Research Paper', 'content' => 'Artificial intelligence has evolved significantly...', 'embeddings' => [ 'title_vector' => [0.1, 0.2, 0.3, ...], // Your pre-computed vectors 'content_vector' => [0.4, 0.5, 0.6, ...], ]]);
AI Providers
Sigmie supports different AI providers for generating embeddings:
Sigmie AI (Default)
use Sigmie\Semantic\Providers\SigmieAI; // Default provider - automatically configured$response = $sigmie->newSearch('articles') ->semantic() ->properties($properties) ->queryString('natural language processing') ->get();
No-op Provider (for Testing)
use Sigmie\Semantic\Providers\Noop; // For testing - doesn't generate real embeddings$noop = new Noop();
Custom AI Providers
You can implement custom AI providers by extending the AbstractAIProvider
:
use Sigmie\Semantic\Providers\AbstractAIProvider;use Sigmie\Semantic\Contracts\AIProvider; class CustomAIProvider extends AbstractAIProvider implements AIProvider{ public function embeddings(array $texts): array { // Your custom embedding logic return $this->callCustomEmbeddingAPI($texts); }}
Advanced Features
Reranking
Improve search results with AI-powered reranking:
use Sigmie\Semantic\Reranker; $response = $sigmie->newSearch('articles') ->semantic() ->properties($properties) ->queryString('quantum computing applications') ->get(); // Apply reranking$reranker = new Reranker();$rerankedResults = $reranker->rerank($response, $queryString);
Handling Arrays and Multiple Values
Semantic fields work with array values:
new Document([ 'experience' => [ 'Artist', 'Graphic Design', 'Creative Director' ],]); // Search will find semantically related terms$response = $sigmie->newSearch('professionals') ->semantic() ->properties($properties) ->queryString('drawing illustration') // Finds "Artist" ->get();
Field-Specific Semantic Search
Limit semantic search to specific fields:
$response = $sigmie->newSearch('articles') ->semantic() ->properties($properties) ->fields(['title']) // Only search in title semantically ->queryString('deep learning neural networks') ->get();
Plugin Requirements
Some advanced features require additional Elasticsearch plugins:
Elastiknn Plugin
For certain vector strategies:
use Sigmie\Sigmie; // Register pluginsSigmie::registerPlugins([ 'elastiknn']); // Now you can use Concatenate strategy$properties->title('title') ->semantic() ->vectorStrategy(VectorStrategy::Concatenate);
Checking Plugin Availability
Test for plugin availability in your tests:
$this->skipIfElasticsearchPluginNotInstalled('elastiknn');
Vector Similarity Functions
Cosine Similarity (Default)
Best for general text similarity, handles different text lengths well:
$properties->text('content')->semantic(similarity: VectorSimilarity::Cosine);
Euclidean Distance (L2 Norm)
Good for precise distance measurements:
$properties->text('content')->semantic(similarity: VectorSimilarity::Euclidean);
Dot Product
Efficient for normalized vectors:
$properties->text('content')->semantic(similarity: VectorSimilarity::DotProduct);
Max Inner Product
Optimized for maximum inner product similarity:
$properties->text('content')->semantic(similarity: VectorSimilarity::MaxInnerProduct);
Performance Considerations
Index Parameters
The accuracy level affects these HNSW parameters:
- m: Number of bi-directional links for each node
- ef_construction: Size of the dynamic candidate list
- dims: Vector dimensions (typically 256, 512, or 768)
// Low accuracy = faster indexing, less memory$properties->text('content')->semantic(accuracy: 1); // m=16, ef_construction=80 // High accuracy = better quality, more resources$properties->text('content')->semantic(accuracy: 5); // m=128, ef_construction=800
Memory Usage
Higher accuracy levels use more memory:
// Memory efficient$properties->text('content')->semantic(accuracy: 1, dimensions: 256); // More memory intensive$properties->text('content')->semantic(accuracy: 5, dimensions: 768);
Search Speed vs Quality
Use script score for highest quality when search speed is less critical:
$properties->text('content')->semantic(accuracy: 7); // Uses script score
Common Patterns
E-commerce Product Search
$properties = new NewProperties;$properties->name('name')->semantic();$properties->text('description')->semantic();$properties->tags('features')->semantic();$properties->category('category');$properties->price('price'); $response = $sigmie->newSearch('products') ->semantic() ->properties($properties) ->queryString('comfortable running shoes') // Finds semantically similar products ->filters('price<=100') ->get();
Content Recommendation
$properties = new NewProperties;$properties->title('title')->semantic();$properties->longText('content')->semantic(accuracy: 3);$properties->tags('tags')->semantic(); // Find similar articles$response = $sigmie->newSearch('articles') ->semantic() ->disableKeywordSearch() // Pure similarity search ->properties($properties) ->queryString('machine learning deep learning neural networks') ->size(5) ->get();
Multilingual Search
$properties = new NewProperties;$properties->text('content')->semantic(); // Embeddings handle language naturally $response = $sigmie->newSearch('documents') ->semantic() ->properties($properties) ->queryString('machine learning') // Finds "apprentissage automatique", "机器学习", etc. ->get();
Job Matching
$properties = new NewProperties;$properties->shortText('skills')->semantic();$properties->longText('experience')->semantic();$properties->text('job_description')->semantic(accuracy: 3, dimensions: 512); $response = $sigmie->newSearch('candidates') ->semantic() ->properties($properties) ->queryString('python data science machine learning') ->get();
Troubleshooting
Empty Results with Semantic Search
If semantic search returns no results:
- Check if embeddings are being generated
- Verify AI provider configuration
- Use keyword search as fallback
- Check for empty query handling
$response = $sigmie->newSearch('articles') ->semantic() ->properties($properties) ->queryString($query ?: 'fallback query') // Prevent empty queries ->get(); if ($response->total() === 0) { // Fallback to keyword search $response = $sigmie->newSearch('articles') ->properties($properties) ->queryString($query) ->get();}
Performance Issues
- Lower accuracy for better performance
- Reduce vector dimensions
- Use appropriate similarity function
- Consider batch operations
// Performance optimized$properties->text('content')->semantic(accuracy: 1, dimensions: 256);
Memory Errors
- Reduce accuracy level
- Use smaller dimensions
- Implement proper index management
// Memory efficient configuration$properties->text('content')->semantic( accuracy: 2, dimensions: 256, similarity: VectorSimilarity::Cosine);
Best Practices
- Start Simple: Begin with basic semantic fields and default settings
- Test Accuracy Levels: Experiment with different accuracy levels for your use case
- Monitor Performance: Watch memory usage and search latency
- Combine Strategies: Use both keyword and semantic search for best results
- Field Selection: Apply semantic search to the most important fields
- Batch Indexing: Index documents in batches for better performance
- Error Handling: Always handle empty queries and fallback scenarios
// Well-balanced configuration$properties = new NewProperties;$properties->title('title')->semantic(accuracy: 3); // Important field, good accuracy$properties->text('content')->semantic(accuracy: 2); // Large field, moderate accuracy$properties->tags('tags')->semantic(accuracy: 1); // Many values, fast processing $response = $sigmie->newSearch('articles') ->semantic() ->noResultsOnEmptySearch() ->properties($properties) ->queryString($userQuery) ->get();