Indices
Create, configure, and update Elasticsearch indices with Sigmie — schema, analysis, sharding, replication, zero-downtime updates, and deletion.
On this page
An index is a container for related documents — closer to a database table than a folder. Unlike a relational database, you don’t have to define columns before inserting rows: Elasticsearch will create the index and infer field types on first write. But for serious applications you almost always define a schema first.
Create an index
The simplest possible index:
$sigmie->newIndex('movies')->create();
That works, but Elasticsearch will guess at field types as you index documents. For control over how fields are stored and searched, pass properties:
use Sigmie\Mappings\NewProperties; $props = new NewProperties;$props->title('title');$props->name('director');$props->category('genre');$props->number('year')->integer();$props->date('release_date'); $sigmie->newIndex('movies') ->properties($props) ->create();
Configure analysis
Index-level analysis controls how text is tokenized and normalized at index time:
$sigmie->newIndex('movies') ->properties($props) ->tokenizeOnWhitespaces() // split on whitespace ->lowercase() // normalize to lowercase ->trim() // strip surrounding whitespace ->create();
The same analyzer runs on query strings at search time, so a query for Matrix matches a document storing matrix.
See Analysis and Token Filters for the full pipeline.
Language analyzers
use Sigmie\Languages\English\English; $sigmie->newIndex('articles') ->properties($props) ->language(new English) ->englishStemmer() ->englishStopwords() ->englishLowercase() ->create();
See Languages for English, German, and Greek builders.
Autocomplete
$props = new NewProperties;$props->title('title');$props->text('description');$props->autocomplete(); $sigmie->newIndex('movies') ->properties($props) ->autocomplete(['title', 'description']) ->create();
Sharding and replication
$sigmie->newIndex('movies') ->shards(3) ->replicas(1) ->create();
A shard is a smaller index that holds a subset of documents. An index with 3 shards and 8 documents distributes like:
movies├─ shard 1│ ├─ document 1│ ├─ document 2│ └─ document 3├─ shard 2│ ├─ document 4│ ├─ document 5│ └─ document 6└─ shard 3 ├─ document 7 └─ document 8
A replica is a copy of a shard on another node, for fault tolerance. With 3 primaries and 2 replicas across 3 nodes:
cluster├─ node 1│ ├─ primary 1│ ├─ replica of primary 2│ └─ replica of primary 3├─ node 2│ ├─ primary 2│ ├─ replica of primary 1│ └─ replica of primary 3└─ node 3 ├─ primary 3 ├─ replica of primary 1 └─ replica of primary 2
If a node fails, the surviving nodes still hold every document. Replicas are promoted to primaries automatically.
For most workloads, keep each shard under 30 GB.
Add documents
To write documents into an index, get a collection for it:
use Sigmie\Document\Document; $sigmie->collect('movies') ->merge([ new Document(['title' => 'Cinderella']), new Document(['title' => 'Snow White']), new Document(['title' => 'Sleeping Beauty']), ]);
To make documents immediately searchable (for tests), pass refresh: true:
$sigmie->collect('movies', refresh: true)->merge($documents);
See Documents for the full collection API.
Update an index
Elasticsearch indices are immutable: once analysis is applied to a document, you can’t re-analyze it without re-indexing. Sigmie provides an update() method that handles this transparently using aliases:
use Sigmie\Index\UpdateIndex; $sigmie->index('movies')->update(function (UpdateIndex $update) { $update->properties($newProperties); $update->lowercase();});
Behind the scenes, update():
- Creates a new physical index with a timestamp suffix.
- Reindexes every document into the new index.
- Switches the
moviesalias to point at the new index. - Deletes the old index.
Step 1: Create new indexmovies (alias) ──► movies_20221122210823379774 ├─ Cinderella ├─ Snow White └─ Sleeping Beauty movies_20221222210823379774 (empty) Step 2: Reindexmovies (alias) ──► movies_20221122210823379774 ├─ Cinderella ├─ Snow White └─ Sleeping Beauty movies_20221222210823379774├─ Cinderella├─ Snow White└─ Sleeping Beauty Step 3: Swap aliasmovies_20221122210823379774 (orphaned) movies (alias) ──► movies_20221222210823379774 ├─ Cinderella ├─ Snow White └─ Sleeping Beauty Step 4: Delete old indexmovies (alias) ──► movies_20221222210823379774
To clients, the index name is unchanged. There’s no downtime.
Warning: Index settings are not merged. Anything you don’t re-declare in the
update()callback is dropped. Re-set everything you want to keep.
Inspect an index
$index = $sigmie->index('movies'); $index->mappings; // index mappings$index->mappings->properties(); // property definitions$index->raw; // raw Elasticsearch response$index->analyze('The Matrix'); // run text through the analyzer
Delete an index
$sigmie->index('movies')->delete();
Advanced settings
Apply any Elasticsearch index module setting with config():
$sigmie->newIndex('movies') ->config('index.max_ngram_diff', 3) ->create();