The Inpsyde Elasticsearch Plugin 2/2

In my first blog post yesterday, I showed some internal processes and “how to create a concept at Inpsyde”. Today we’re going to dig a little deeper into the concepts, with some code examples showing how we implemented everything.

In his post yesterday, Inpsyder Christian described how he, as the main developer, and his team started the “Inpsyde Elasticsearch Plugin” project. In this tenth Advent Calendar post, he goes one level deeper: he shows the concepts behind our Inpsyde Elasticsearch Plugin and gives some code examples of how the team implemented everything.


Table of Contents

1. A Name
2. Define and Build Modules
2.1. App Module
2.2. Client Module
2.3. Debug Module
2.4. Index-Mapping-Property Module
2.5. Document Module
2.6. CLI Module
2.7. Queue Module
3. Current State and Future


1. A Name

I think this was probably the funniest part of the whole conception and work on the plugin. There were a ton of funny names in our brainstorming session, but in the end we settled on:

ElasticWP

2. Define and Build Modules

Once our baby had a name, we started writing down the conceptual parts: defining all requirements and how to solve them. To keep this post short, I’ll only give you a brief insight into the end result and the decisions made along the way, and show how we implemented everything.

The whole process consisted of multiple steps:

  1. Define required modules
  2. Conceptual work on those modules
  3. Review and discussions
  4. Finalize concept
  5. Define the MVP (minimum viable product)
  6. Implement it

The implementation itself was done via rapid prototyping, by creating a working proof of concept which contained the following:

  1. Register modules via providers
  2. Configure a client
  3. Create an API to create an index via configuration
  4. Transform data from WP_Post, WP_Comment, WP_User and WP_Term via documents to Elasticsearch
  5. Add unit tests
  6. Provide a local setup via docker-compose

I created the prototype over a weekend, and it was in a very early state. But it allowed us to work on it continuously, iterating for a few days with reviews and rewrites of some modules until all core modules were done.

Let’s dive into the results:

2.1. App Module

The app module is the main part of the plugin. It provides a PSR-11 container implementation with a provider interface, which allows registering classes or configuration in the container. It also has a BootableProvider interface, which allows modules to actually listen to WordPress hooks.

The main Plugin-Container:

<?php declare(strict_types=1); # -*- coding: utf-8 -*-
namespace ElasticWP;

use ElasticWP\App\BootableProvider;
use ElasticWP\App\Provider;
use Psr\Container\ContainerInterface;

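final class ElasticWP implements ContainerInterface
{
    // Stores a value or a factory closure under the given ID.
    public function set(string $id, $value): self { /* snip */ }

    // Lets a Provider register its services in the container.
    public function register(Provider $provider) { /* snip */ }

    // Boots all registered BootableProviders (e.g. to add WordPress hooks).
    public function boot(): bool { /* snip */ }

    // PSR-11: resolves an entry by its ID.
    public function get($id) { /* snip */ }

    // PSR-11: checks whether an entry exists.
    public function has($id) { /* snip */ }
}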

The Provider:

<?php declare(strict_types=1); # -*- coding: utf-8 -*-

namespace ElasticWP\App;

use ElasticWP\ElasticWP;

interface Provider
{

   public function register(ElasticWP $plugin);
}

The BootableProvider:

<?php declare(strict_types=1); # -*- coding: utf-8 -*-

namespace ElasticWP\App;

use ElasticWP\ElasticWP;

interface BootableProvider extends Provider
{

   public function boot(ElasticWP $plugin);
}

As you can see, the BootableProvider extends the Provider. This means a bootable module always has to register its services first, before it can be booted.
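To illustrate the register-then-boot order, here is a minimal sketch of how such a container could behave. It is not ElasticWP’s actual implementation: MiniContainer and GreetingProvider are hypothetical names, the PSR-11 interface is left out so the example runs without Composer, and treating closures as lazily resolved factories is an assumption based on how factories are passed to set() in the client example of this post.

```php
<?php declare(strict_types=1);

// Hypothetical stand-ins for ElasticWP\App\Provider and BootableProvider.
interface Provider
{
    public function register(MiniContainer $container): void;
}

interface BootableProvider extends Provider
{
    public function boot(MiniContainer $container): void;
}

// Minimal container: stores values or factory closures, resolves lazily and caches.
final class MiniContainer
{
    private array $values = [];
    private array $resolved = [];
    /** @var Provider[] */
    private array $providers = [];
    private bool $booted = false;

    public function set(string $id, $value): self
    {
        $this->values[$id] = $value;
        unset($this->resolved[$id]);
        return $this;
    }

    // Every provider registers immediately; booting happens later.
    public function register(Provider $provider): self
    {
        $provider->register($this);
        $this->providers[] = $provider;
        return $this;
    }

    // Boots only the BootableProviders, and only once.
    public function boot(): bool
    {
        if ($this->booted) {
            return false;
        }
        foreach ($this->providers as $provider) {
            if ($provider instanceof BootableProvider) {
                $provider->boot($this);
            }
        }
        return $this->booted = true;
    }

    public function has(string $id): bool
    {
        return array_key_exists($id, $this->values);
    }

    public function get(string $id)
    {
        if (!$this->has($id)) {
            throw new OutOfBoundsException("No entry for '{$id}'.");
        }
        if (!array_key_exists($id, $this->resolved)) {
            $value = $this->values[$id];
            // Closures are treated as factories and receive the container.
            $this->resolved[$id] = $value instanceof Closure ? $value($this) : $value;
        }
        return $this->resolved[$id];
    }
}

// Example provider: registers a service first, then uses it on boot.
final class GreetingProvider implements BootableProvider
{
    public array $log = [];

    public function register(MiniContainer $container): void
    {
        $container->set('greeting', fn (MiniContainer $c): string => 'Hello ' . $c->get('name'));
    }

    public function boot(MiniContainer $container): void
    {
        // In WordPress, add_action()/add_filter() calls would go here.
        $this->log[] = $container->get('greeting');
    }
}
```

As long as boot() is called after all providers have been registered, a bootable provider can safely resolve services that other providers registered.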

Providers, as well as specific configuration values and classes, can be registered in the container via a bootstrap hook. It looks like this:

<?php declare(strict_types=1); # -*- coding: utf-8 -*-

use ElasticWP\ElasticWP;

add_action(
   'ElasticWP.boot',
   function (ElasticWP $plugin) {
        // $plugin->set(string $key, mixed $value);
        // $value = $plugin->get(string $key);
        // $plugin->register(Provider $provider);
   }
);

2.2. Client Module

The client module provides a way to configure and create an Elasticsearch\Client from the “elasticsearch-php” package.

We decided to provide an ElasticWP\Client\ClientConfigurationBuilder which reads your configuration automatically from either a defined constant or an environment variable.

This way, you can configure your connection to Elasticsearch globally. Additionally, we wanted an invalid configuration to fail. We also set some default settings, such as the logger shipped with the plugin. The minimum requirement for creating a client instance is at least one host.

Here’s an example for your wp-config.php:

<?php # -*- coding: utf-8 -*-

$config = ['hosts' => ['localhost']];
$config = base64_encode(serialize($config));

// v1 - via constant
define('ELASTICWP_CLIENT_CONFIG', $config);

// v2 - via env var
putenv('ELASTICWP_CLIENT_CONFIG=' . $config);
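On the reading side, a builder could decode such a value roughly like this. This is a hypothetical sketch, not the code of ElasticWP\Client\ClientConfigurationBuilder: the function name readClientConfig() and the error messages are made up, and passing allowed_classes => false to unserialize() is an extra safety measure assumed here, since the configuration should only contain arrays and scalars.

```php
<?php declare(strict_types=1);

// Hypothetical helper: reads the serialized client configuration from a
// constant or, as a fallback, from an environment variable of the same name.
function readClientConfig(string $name = 'ELASTICWP_CLIENT_CONFIG'): array
{
    $raw = defined($name) ? constant($name) : getenv($name);
    if (!is_string($raw) || $raw === '') {
        throw new RuntimeException("{$name} is not set.");
    }

    // Strict Base64 decoding, then unserialize without instantiating objects.
    $decoded = base64_decode($raw, true);
    $config = $decoded === false
        ? false
        : unserialize($decoded, ['allowed_classes' => false]);
    if (!is_array($config)) {
        throw new RuntimeException("{$name} does not contain a valid configuration.");
    }

    // Minimum requirement: at least one host.
    if (empty($config['hosts']) || !is_array($config['hosts'])) {
        throw new RuntimeException("{$name} must define at least one host.");
    }

    return $config;
}
```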

Alternatively, you can create a client manually and register it in the app module:

<?php declare(strict_types=1); # -*- coding: utf-8 -*-

use ElasticWP\ElasticWP;
use Elasticsearch\Client;
use Elasticsearch\ClientBuilder;

add_action(
   'ElasticWP.boot',
   function (ElasticWP $plugin) {
      $plugin->set(
         Client::class,
         function (ElasticWP $plugin): Client {
             // Build the client yourself; the host is only an example value.
             return ClientBuilder::create()
                 ->setHosts(['localhost:9200'])
                 ->build();
         }
      );
   }
);

Read more about the client in official documentation: https://www.elastic.co/guide/en/elasticsearch/client/php-api/current/_configuration.html

2.3. Debug Module

The Debug Module provides an implementation of the PSR-3 LoggerInterface. The module itself contains a single class, which internally fires a custom action:

do_action( "ElasticWP.{errorLevel}", string $message, array $context );

By building this implementation on top of WordPress actions, we’re free to handle log entries however we want.

[!] ProTip: We’re using Inpsyde\Wonolog to listen to those actions and push the data to a logging service.
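A minimal sketch of such a logger, assuming the “ElasticWP.{level}” action naming shown above. ActionLogger is a hypothetical name, and the do_action() stub exists only so the example runs outside WordPress. A production version would more likely extend Psr\Log\AbstractLogger from the psr/log package, which implements all level methods on top of log().

```php
<?php declare(strict_types=1);

// Stub so the sketch runs outside WordPress; WordPress provides the real do_action().
if (!function_exists('do_action')) {
    function do_action(string $hook, ...$args): void
    {
        global $firedActions;
        $firedActions[] = [$hook, $args];
    }
}

// Hypothetical PSR-3-style logger forwarding every entry to "ElasticWP.{level}".
final class ActionLogger
{
    private const LEVELS = [
        'emergency', 'alert', 'critical', 'error',
        'warning', 'notice', 'info', 'debug',
    ];

    public function log(string $level, string $message, array $context = []): void
    {
        if (!in_array($level, self::LEVELS, true)) {
            throw new InvalidArgumentException("Unknown log level '{$level}'.");
        }
        do_action("ElasticWP.{$level}", $message, $context);
    }

    // Shortcuts such as $logger->error('...') are mapped onto log().
    public function __call(string $level, array $args)
    {
        $this->log($level, ...$args);
    }
}
```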

2.4. Index-Mapping-Property Module

Now we continue with a big one: actually three modules that depend heavily on each other. Therefore, and also to avoid jumping between sections, I’ll focus on the whole concept of creating an index with mapping and properties.

But before we start …

[!] Important to know: An index created in Elasticsearch 6.x only allows a single type per index. Any name can be used for the type, but there can be only one. The preferred type name is _doc, so that index APIs have the same path as they will have in 7.0: PUT {index}/_doc/{id} and POST {index}/_doc.

We decided to remove the complete “type”-definition in our index. Instead, we always use “_doc” to ensure compatibility with future releases.

To configure a complete index, you provide all required information in one array. The ElasticWP\Index\IndexBuilder creates an index from it, which is then registered in the ElasticWP\Index\IndexRegistry.

Here’s a short example index schema with comments:

<?php declare(strict_types=1); # -*- coding: utf-8 -*-

use ElasticWP\Configuration\IndexConfiguration;
use ElasticWP\Mapping\Property\PropertyInterface;

$indexSchema = [
   'index' => 'name of your index',    // string - unique name
   'settings' => [],                   // array - optional
   'mappings' => [
       '_meta' => [
           'dataSource' => IndexConfiguration::DATA_SOURCE_POST,
           'objectTypes' => [],        // array - optional
           'version' => '1.0.0',       // string
       ],
       'properties' => [],             // PropertyInterface[]
   ]
];

The format and structure of this array are similar to an Elasticsearch index definition. Let’s have a look at all fields step by step.

index

The name of the actual index in Elasticsearch.

settings

Here you can define your own settings for the current index.

See also: https://www.elastic.co/guide/en/elasticsearch/guide/current/_index_settings.html

mappings

Since only one type per index is allowed, we’ve completely removed the “type” from our index schema and always use “_doc” instead.
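On Elasticsearch 6.x, the create-index request still expects the mapping to be nested under a type name, so “_doc” has to be added back at some point. The following sketch shows one way to do that, assuming the property definitions have already been collected as arrays. toCreateIndexParams() is a hypothetical helper; the resulting ['index' => ..., 'body' => ...] structure is what elasticsearch-php’s $client->indices()->create() accepts.

```php
<?php declare(strict_types=1);

// Hypothetical helper: converts an ElasticWP-style schema plus resolved
// property definitions into create-index parameters for Elasticsearch 6.x.
function toCreateIndexParams(array $schema, array $propertyDefinitions): array
{
    $properties = [];
    foreach ($propertyDefinitions as $definition) {
        // Each definition is keyed by field name, e.g. ['author' => [...]].
        $properties = array_replace($properties, $definition);
    }

    $body = [
        'mappings' => [
            // The schema has no type, so the single "_doc" type is added here.
            '_doc' => [
                '_meta' => $schema['mappings']['_meta'] ?? [],
                'properties' => $properties,
            ],
        ],
    ];
    if (!empty($schema['settings'])) {
        $body['settings'] = $schema['settings'];
    }

    return ['index' => $schema['index'], 'body' => $body];
}
```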

mappings._meta

The _meta field in Elasticsearch is optional. It can hold additional information that Elasticsearch itself does not use. We decided to use it as configuration for automatically hooking into the right WordPress actions to provide data to the index.

It contains the following fields:

1. dataSource

The dataSource is a required field and is defined by one of the available ElasticWP\Configuration\IndexConfiguration::DATA_SOURCE_* constants:

  • IndexConfiguration::DATA_SOURCE_POST – PostType as entry point
  • IndexConfiguration::DATA_SOURCE_TERM – Taxonomy Terms as entry point
  • IndexConfiguration::DATA_SOURCE_COMMENT – Comments as entry point
  • IndexConfiguration::DATA_SOURCE_USER – User as entry point
2. objectTypes

This configuration restricts the dataSource even further. For example, if you only want pages (PostType “page”) from dataSource=IndexConfiguration::DATA_SOURCE_POST, this is where you restrict it.

3. version

The third field, version, is used to detect changes in the index and update its mapping. It’s up to you how you version your index, but keep in mind that we use version_compare() with “greater than” to detect changes.
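The objectTypes restriction and the version check can be sketched as two small functions. Both names are hypothetical, and treating an empty objectTypes array as “no restriction” is an assumption; the version check uses version_compare() with the “>” operator as described above.

```php
<?php declare(strict_types=1);

// Hypothetical: does an object of the given type belong to this index?
// An empty objectTypes list is assumed to mean "no restriction".
function matchesObjectTypes(array $meta, string $objectType): bool
{
    $objectTypes = $meta['objectTypes'] ?? [];
    return $objectTypes === [] || in_array($objectType, $objectTypes, true);
}

// Hypothetical: the mapping is only updated if the configured version is
// greater than the version stored for the existing index.
function mappingNeedsUpdate(string $configuredVersion, string $storedVersion): bool
{
    return version_compare($configuredVersion, $storedVersion, '>');
}
```

Note that version_compare() treats 1.10.0 as greater than 1.9.0, which a plain string comparison would get wrong.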

mappings.properties

Properties are the main part of your mapping. Since we’re processing data based on the defined dataSource and objectTypes, we need to transform that data into the right format before pushing it to Elasticsearch.

Therefore, we cannot simply use the multidimensional array as Elasticsearch does. Instead, we provide our own interface, ElasticWP\Mapping\Property\PropertyInterface:

<?php declare(strict_types=1); # -*- coding: utf-8 -*-

namespace ElasticWP\Mapping\Property;

use ElasticWP\Document\DocumentInterface;

interface PropertyInterface
{
  // Used to sort processors before executing them.
  public function priority(): int;

  // Processing data to the given Document.
  public function transform(DocumentInterface $document): DocumentInterface;

  // Contains the array of property definition.
  public function definition(): array;
}

Read more about properties in the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/properties.html

Here’s a short example of what a PostAuthorProperty with email, login, name and id looks like:

<?php declare(strict_types=1); # -*- coding: utf-8 -*-

use ElasticWP\Document\DocumentInterface;
use ElasticWP\Mapping\Property\PropertyInterface;

class PostAuthorProperty implements PropertyInterface
{

   public function priority(): int
   {
       return 1;
   }

   public function definition(): array
   {
       return [
           'author' => [
               'type' => 'object',
               'properties' => [
                   'email' => [
                       'type' => 'keyword',
                   ],
                   'login' => [
                       'type' => 'keyword',
                   ],
                   'name' => [
                       'type' => 'keyword',
                   ],
                   'id' => [
                       'type' => 'long',
                   ],
               ],
           ],
       ];
   }

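   public function transform(DocumentInterface $document): DocumentInterface
   {
       // post_author is stored as a numeric string on WP_Post.
       $userId = (int) $document->object()->post_author;
       $user = get_userdata($userId);

       // get_userdata() returns false if the user does not exist (anymore).
       if (!$user instanceof \WP_User) {
           return $document;
       }

       $document->set(
           'author',
           [
               'email' => $user->user_email,
               'login' => $user->user_login,
               'name' => $user->display_name,
               'id' => $userId,
           ]
       );

       return $document;
   }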
}

To actually build the index and make it known to the plugin, register it in the IndexRegistry during boot:

<?php declare(strict_types=1); # -*- coding: utf-8 -*-

use ElasticWP\ElasticWP;
use ElasticWP\Index\IndexBuilder;
use ElasticWP\Index\IndexRegistry;

add_action(
   'ElasticWP.boot',
   // $indexSchema is the array from the index schema example above.
   function (ElasticWP $plugin) use ($indexSchema) {
        /** @var IndexBuilder $indexBuilder */
        $indexBuilder = $plugin->get(IndexBuilder::class);

        /** @var ElasticWP\Index\IndexInterface $index */
        $index = $indexBuilder->fromArray($indexSchema);

        /** @var IndexRegistry $indexRegistry */
        $indexRegistry = $plugin->get(IndexRegistry::class);

        // Register the index with the plugin.
        $indexRegistry->register($index);
   }
);

That’s it. Our ElasticWP plugin now has a new index: it is created automatically in Elasticsearch, and the plugin listens to the right WordPress hooks to transform the data into your schema and push it to Elasticsearch.

2.5. Document Module

The document module does the main work: for a given index, it generates a document from the given dataSource (e.g. “WP_Post”), optionally restricted to an objectType (e.g. PostType “page”), and then creates/updates or deletes that document depending on the current action.

The ElasticWP\Document\DocumentInterface looks like this:

<?php declare(strict_types=1); # -*- coding: utf-8 -*-

namespace ElasticWP\Document;

interface DocumentInterface
{

   // Unique ID which represents the Document in Index
   public function id(): string;

   // Contains the type of dataSource like \WP_Post|Comment|Term|User
   public function dataSource(): string;

   // Contains the type of object like the CPT, CommentType or Taxonomy.
   public function objectType(): string;

   // Returns the complete entity which is present to build the data.
   public function object();

   // Returns the ID of the object from WordPress.
   public function objectId(): int;

   // The blog ID (current_blog_id) the Document belongs to.
   public function blogId(): int;

   // Array of all data which is set to Document and used to insert/update.
   public function body(): array;

   // Returns true, if the current Document has a valid object, otherwise false.
   public function isValid(): bool;

   public function set(string $key, $value);

   public function get(string $key);

   public function remove(string $key): bool;
}

In the background, once the right hook fires, our ElasticWP\Document\DocumentSyncInterface implementation uses an ElasticWP\Document\DocumentDataGenerator to loop over all defined ElasticWP\Mapping\Property\PropertyInterface instances and build a document, which is then created, updated or deleted in Elasticsearch.
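The loop inside such a generator could look roughly like the following sketch. Property, Document and DataGenerator are simplified, hypothetical stand-ins for PropertyInterface, DocumentInterface and DocumentDataGenerator, and sorting by ascending priority is an assumption, since the post only says that priority() is used for sorting.

```php
<?php declare(strict_types=1);

// Simplified, hypothetical stand-ins for PropertyInterface and DocumentInterface.
interface Property
{
    public function priority(): int;

    public function transform(Document $document): Document;
}

final class Document
{
    private object $object;
    private array $body = [];

    public function __construct(object $object)
    {
        $this->object = $object;
    }

    // The WordPress object (e.g. a WP_Post) the document is built from.
    public function object(): object
    {
        return $this->object;
    }

    public function set(string $key, $value): void
    {
        $this->body[$key] = $value;
    }

    public function body(): array
    {
        return $this->body;
    }
}

// Hypothetical generator: sorts properties by priority (ascending is an
// assumption) and lets each one add its data to the document.
final class DataGenerator
{
    /** @var Property[] */
    private array $properties;

    public function __construct(array $properties)
    {
        $this->properties = $properties;
    }

    public function generate(Document $document): Document
    {
        $properties = $this->properties;
        usort($properties, fn (Property $a, Property $b): int => $a->priority() <=> $b->priority());

        foreach ($properties as $property) {
            $document = $property->transform($document);
        }

        return $document;
    }
}

// Usage: the "id" property has the lower priority value, so it runs first.
$properties = [
    new class implements Property {
        public function priority(): int { return 10; }

        public function transform(Document $document): Document
        {
            $document->set('title', $document->object()->post_title);
            return $document;
        }
    },
    new class implements Property {
        public function priority(): int { return 1; }

        public function transform(Document $document): Document
        {
            $document->set('id', $document->object()->ID);
            return $document;
        }
    },
];

// A stdClass mimics the relevant WP_Post fields.
$post = (object) ['ID' => 42, 'post_title' => 'Hello Elasticsearch'];
$body = (new DataGenerator($properties))->generate(new Document($post))->body();
```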

Since the container resolves the sync implementation via ElasticWP\Document\DocumentSyncInterface, it’s easy to replace it completely if you want to handle saving, updating or deleting documents on your own.

2.6. CLI Module

The CLI module exposes the core API of the “elasticsearch-php” package by parsing the doc blocks of its Client, IndicesNamespace and ClusterNamespace classes:

NAME

  wp elasticwp

SYNOPSIS

  wp elasticwp <command>

SUBCOMMANDS

  client       
  cluster      
  indices      
  reindex      Reindex all Documents to a given Index.
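The idea of deriving subcommands from doc blocks can be sketched with PHP’s Reflection API. This is a hypothetical illustration, not the plugin’s CLI code: DemoIndices stands in for a class such as IndicesNamespace, describeMethods() is a made-up helper, and registering the result with WP-CLI is left out.

```php
<?php declare(strict_types=1);

// Hypothetical stand-in for an elasticsearch-php namespace class.
final class DemoIndices
{
    /**
     * Creates an index.
     */
    public function create(array $params = []) {}

    /**
     * Checks whether an index exists.
     */
    public function exists(array $params = []) {}

    // A plain comment is not a doc block, so this gets an empty description.
    public function flush(array $params = []) {}
}

/** @return array<string, string> method name => first line of its doc block */
function describeMethods(string $class): array
{
    $commands = [];
    foreach ((new ReflectionClass($class))->getMethods(ReflectionMethod::IS_PUBLIC) as $method) {
        if ($method->isConstructor() || $method->isStatic()) {
            continue;
        }
        $summary = '';
        $doc = $method->getDocComment();
        if ($doc !== false) {
            foreach (preg_split('/\R/', $doc) as $line) {
                // Strip "/**", leading "*" and a trailing "*/".
                $line = trim(preg_replace(['#^\s*/?\*+/?#', '#\*/\s*$#'], '', $line));
                // Keep the first line that is neither empty nor a tag like @param.
                if ($line !== '' && $line[0] !== '@') {
                    $summary = $line;
                    break;
                }
            }
        }
        $commands[$method->getName()] = $summary;
    }
    return $commands;
}
```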

Additionally, we’ve implemented a custom WP-CLI command, “reindex”, which allows us to rebuild a complete index in bulk:

NAME

  wp elasticwp reindex

DESCRIPTION

  Reindex all Documents to a given Index.

SYNOPSIS

  wp elasticwp reindex <indexName>

OPTIONS

  <indexName>
    The name of the Index

EXAMPLES

    wp elasticwp reindex <indexName>
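A bulk reindex typically sends documents in batches instead of one request per document. The sketch below shows a hypothetical helper that builds such batches in the action/source-pair format accepted by elasticsearch-php’s $client->bulk(); the function name and the default batch size of 500 are assumptions, and the “_doc” type follows the single-type convention described in the index section.

```php
<?php declare(strict_types=1);

/**
 * Hypothetical helper: splits documents into batches of bulk requests.
 *
 * @param array<string, array> $documents document ID => document body
 * @return array<int, array> one bulk request per batch
 */
function buildBulkRequests(string $index, array $documents, int $batchSize = 500): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $requests = [];
    // preserve_keys = true keeps the document IDs as array keys.
    foreach (array_chunk($documents, $batchSize, true) as $batch) {
        $body = [];
        foreach ($batch as $id => $document) {
            // Action line first, then the document source.
            $body[] = ['index' => ['_index' => $index, '_type' => '_doc', '_id' => (string) $id]];
            $body[] = $document;
        }
        $requests[] = ['body' => $body];
    }

    return $requests;
}
```

Each returned array can then be passed to $client->bulk($params) in turn.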

2.7. Queue Module

The main problem with existing plugins shows up when you have to deal with a ton of data and hundreds of editors working in parallel. We need to be 100% sure that clicking “save post” or “delete post” actually gets synchronized with Elasticsearch.

By default, we listen to the specific update/delete hooks and try to communicate with Elasticsearch directly. But we also support a complete message queue implementation for the case that Elasticsearch is not reachable: by default, it pushes the current document into a WordPress cron job, which re-runs until the push succeeds. This implementation (Message, Handler, Producer and Consumer) can easily be replaced, for example with RabbitMQ, to reduce the load on WordPress to a minimum.
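The “retry until the push succeeds” behavior can be illustrated with a small in-memory queue. This is only a sketch: RetryQueue is a hypothetical class, a plain array replaces the WordPress cron storage, and the handler is any callable that returns true once the document has been pushed.

```php
<?php declare(strict_types=1);

// Hypothetical in-memory queue: failed messages stay queued for the next run.
final class RetryQueue
{
    /** @var array<int, array> */
    private array $messages = [];
    /** @var callable */
    private $handler;

    // The handler receives a payload and returns true on success.
    public function __construct(callable $handler)
    {
        $this->handler = $handler;
    }

    // Producer side: enqueue a document that should be pushed.
    public function produce(array $payload): void
    {
        $this->messages[] = $payload;
    }

    // Consumer side: one run (e.g. one cron event). Returns the number of successes.
    public function consume(): int
    {
        $pending = $this->messages;
        $this->messages = [];
        $succeeded = 0;
        foreach ($pending as $payload) {
            if (($this->handler)($payload)) {
                $succeeded++;
            } else {
                $this->messages[] = $payload;
            }
        }
        return $succeeded;
    }

    public function pending(): int
    {
        return count($this->messages);
    }
}
```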

Here’s a short example of how to provide your own implementation that works asynchronously and sends your queue to, e.g., RabbitMQ:

<?php declare(strict_types=1); # -*- coding: utf-8 -*-

use ElasticWP\ElasticWP;
use ElasticWP\Queue\ProducerInterface;
use ElasticWP\Document\AsyncDocumentSync;
use ElasticWP\Document\DocumentSyncInterface;

add_action(
   'ElasticWP.boot',
   function (ElasticWP $plugin) {
        // Set the AsyncDocumentSync - default is synchronous
        $plugin->set(
            DocumentSyncInterface::class,
            function(ElasticWP $plugin): DocumentSyncInterface
            {
                return $plugin->get(AsyncDocumentSync::class);
            }
        );
        
        // Queue - send to RabbitMQ.
        // YourAmqpProducer is your own ProducerInterface implementation.
        $plugin->set(
            ProducerInterface::class,
            function(): ProducerInterface
            {
                return new YourAmqpProducer( ... );
            }
        );

   }
);

3. Current State and Future

In short: The plugin works.

We’ve already been using this plugin for a few months in customer projects with custom search integrations, and it has proven stable, reliable and performant.

We’ve also created some custom properties that are reused across different indices, and made some performance improvements in the past weeks.

Creating an index for an objectType takes less than a minute. And with full multisite support and WP-CLI integration, all documents can be reindexed in a wink.

Currently, the plugin is only available internally, via a private repository. The documentation is complete and we have pretty good test coverage. We’re currently planning some quite nice features on top of this awesome plugin. So, if you want more information, leave a comment or contact us. We’ll get in touch with you. 🙂

Finally, some stats for geeks:

Time invested: 160 working hours (total)

Commits: 153 (total)

Lines of code written: 21,663 (total)

Lines of code deleted: 9,834 (total)

Unit Tests:

[Image: Inpsyde Elasticsearch Plugin Unit Tests]

Integration Tests:

[Image: Inpsyde Elasticsearch Plugin Integration Tests]

5 comments

  1. George Mamadashvili

    Hello,

    Thanks for great article.

    Is there a plan to release this plugin in future? or is it for internal usage only?

    1. Sebastian Pajor

      Hi George,

      thanks for reading and for your interest in our Elastic plugin. We are currently testing it in client projects. This was the initial reason why we built it.

      But as we currently see, the interest is rising to have a “public version”. We definitely have that in mind.

      The current roadmap foresees that we will expand this solution to have a much bigger benefit if you aren’t a dev and still wanna use elastic.

      So to sum it up:

      Yes, we are currently considering that option, but it needs work to become something that is usable for a “non-hardcore dev”, MAYBE in late 2019.

      Cheers,
      Sebastian

      PS
      If you don’t wanna miss announcements like this please check out this page regularly or https://twitter.com/inpsyde

  2. Bilal

    This post and part one are amazing! Would be great if you guys released this public.

  3. Leho Kraav @lkraav

    Thanks for writing this up.

    ElasticPress 3.0 released in May 2019 seems to have made a strong move by providing Indexables concept, which allows you to now index almost anything(?) in WP.

    What is your take on this advancement? Is ElasticPress maybe again better than your own plugin, or?

    It’d be really useful if all these genius brainpowers worked on one superproduct, instead of fragmenting know-how and attention. What do you think?

    1. Christian Leucht

      Howdy!

      Thanks a lot for your response! 🙂

      Yes, ElasticPress 3.0 has made a big move forward. But the plugin is still doing “magic auto-replacing” of WordPress queries. With the new version of ElasticPress, you can create an index for posts and for users. But they are created automatically; basically, it’s more or less a 1:1 mapping of the SQL database tables in Elasticsearch. Also, taxonomy terms and comments are missing. Not really flexible, but an interesting step.

      Our approach is, first of all, to do “less”, but provide “more” possibilities when you’re using it. We’re not creating anything automatically and we’re not automatically replacing queries. Elasticsearch is for searching, not “an additional cache layer”.

      “It’d be really useful if all these genius brainpowers worked on one superproduct, instead of fragmenting know-how and attention. What do you think?”

      Yes and no. There will never be a “super product” out there that covers everything. I would even say that comparing ElasticPress with our ElasticWP is not possible, or even right.
      Our focus is on building strong, fast and stable searches for WordPress with Elasticsearch, while ElasticPress generally wants more control over the normal SQL queries made by WP, providing (in my opinion) a new layer of “caching”, which also has the side effect that the normal WordPress search becomes more accurate.
