Documents
Index documents into Elasticsearch with Sigmie collections — add, merge, update, iterate with each() and lazy(), and stream large result sets.
On this page
A Document is a JSON object stored in an index. Sigmie treats every index as a writable collection: you add Document instances, iterate over them, query them, and update them.
Create documents
use Sigmie\Document\Document; $doc = new Document(['title' => 'The Matrix', 'year' => 1999]);
Documents can hold any JSON-serializable structure:
$doc = new Document([ 'title' => 'Inception', 'director' => [ 'name' => 'Christopher Nolan', 'born' => 1970, ], 'cast' => [ ['name' => 'Leonardo DiCaprio', 'role' => 'Cobb'], ['name' => 'Marion Cotillard', 'role' => 'Mal'], ], 'metadata' => [ 'runtime' => 148, 'budget' => 160_000_000, ],]);
Custom document IDs
Pass an ID as the second argument:
$doc = new Document(['title' => 'The Matrix'], 'matrix_1999');
Custom IDs let you re-index a document later by writing the same ID — Elasticsearch overwrites it in place.
Get a collection
$movies = $sigmie->collect('movies');
For tests, refresh: true makes documents immediately searchable:
$movies = $sigmie->collect('movies', refresh: true);
Warning: Don’t use
refresh: truein production. It forces a costly refresh on every write.
Add documents
A single document:
$movies->add(new Document(['title' => 'Mickey Mouse']));
Many documents (much faster than calling add() in a loop):
$movies->merge([ new Document(['title' => 'Snow White']), new Document(['title' => 'Cinderella']), new Document(['title' => 'Sleeping Beauty']),]);
Validate with properties
Pass properties to the collection and Sigmie validates each document against the schema before indexing:
use Sigmie\Mappings\NewProperties; $props = new NewProperties;$props->title('title');$props->date('release_date');$props->number('rating')->float(); $movies = $sigmie->collect('movies')->properties($props); $movies->merge([ new Document([ 'title' => 'The Matrix', 'release_date' => '1999-03-31T00:00:00Z', 'rating' => 8.7, ]),]);
Invalid data (a non-numeric rating, an unparseable release_date) is caught at indexing time.
Update a document
To update, write the same ID:
$movies->add(new Document([ 'title' => 'The Matrix', 'year' => 1999, 'rating' => 8.7,], 'matrix_1'));
Elasticsearch indexes the new version under the same _id, replacing the previous one.
Iterate over a collection
each() streams every document without loading the index into memory. Sigmie pages through results internally using a Point-in-Time and search_after, so writes during iteration don’t corrupt the cursor.
$movies->each(function (Document $doc): void { echo $doc['title'] . "\n";});
The default page size is 500. Override it with chunk():
$movies->chunk(100)->each(function (Document $doc): void { processOne($doc);});
For iteration over a subset (filtered, sorted), use NewSearch::each() or NewSearch::lazy() instead. See Iterating over all matching hits.
Other collection methods
$movies->count(); // total document count$movies->has('matrix_1'); // does this ID exist$movies->get('matrix_1'); // fetch one by ID$movies->getMany(['matrix_1', 'inception_1']); // fetch many by ID$movies->random(5); // 5 random documents$movies->remove('matrix_1'); // delete by ID$movies->clear(); // delete every document$movies->toArray(); // load all into memory (small indices only)
Document implements ArrayAccess:
$doc['title'] = 'New Title';$title = $doc['title'];isset($doc['year']);unset($doc['description']);
Complex data types
Dates
$props->date('created_at'); new Document([ 'title' => 'New Movie', 'created_at' => '2023-04-07T12:38:29.000000Z',]);
Geo points
$props->geoPoint('location'); new Document([ 'venue' => 'Cinema Downtown', 'location' => ['lat' => 40.7128, 'lon' => -74.0060],]);
Nested arrays of objects
$props->nested('cast', function (NewProperties $props) { $props->name('actor'); $props->keyword('role');}); new Document([ 'title' => 'Avengers', 'cast' => [ ['actor' => 'Robert Downey Jr.', 'role' => 'Iron Man'], ['actor' => 'Chris Evans', 'role' => 'Captain America'], ],]);
Nested fields preserve the relationship between sibling values during search — see Filter Parser for nested filtering syntax.
When are writes visible to search
By default Elasticsearch operates in “near real-time” — writes become searchable about a second later:
$movies = $sigmie->collect('movies');$movies->add(new Document(['title' => 'Snow White']));$movies->count(); // 0 (immediately)sleep(1);$movies->count(); // 1
refresh: true makes them visible immediately:
$movies = $sigmie->collect('movies', refresh: true);$movies->add(new Document(['title' => 'Snow White']));$movies->count(); // 1
For batch processing, use the default and refresh once when you’re done:
$movies = $sigmie->collect('movies');$movies->merge($manyDocuments);$sigmie->index('movies')->refresh();