Magento 2 Indexing Architecture Issue

Yegor Shytikov
3 min read · May 15, 2021


Indexing is how Magento transforms data such as products and categories to improve the performance of your storefront. As data changes, the transformed data must be updated, or reindexed. Magento has a very obfuscated legacy architecture that stores lots of merchant data (including catalog data, prices, users, and stores) in many database tables. To obfuscate and break Magento completely, Adobe introduced indexes: Magento accumulates the final data into special tables using indexers.

For example, if you change the price of an item from $5.99 to $7.99, Magento must reindex the price change before it is displayed on your storefront.

Without indexing, Magento would have to calculate the price of every product on the fly, taking into account shopping cart price rules, bundle pricing, discounts, tier pricing, etc.
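The trade-off can be sketched in a few lines of Python. This is only an illustration: the dictionaries, the pricing rules, and the function names are hypothetical stand-ins, not Magento's real schema or API. The "index" is a precomputed lookup table that the storefront reads instead of recalculating the price on every request.

```python
from decimal import Decimal

# Hypothetical source data; the real Magento schema spreads this across many tables.
BASE_PRICES = {"sku-1": Decimal("5.99"), "sku-2": Decimal("20.00")}
TIER_PRICES = {"sku-2": {"wholesale": Decimal("15.00")}}  # sku -> customer group -> price
RULE_DISCOUNTS = {"wholesale": Decimal("0.10")}           # customer group -> fractional discount


def compute_final_price(sku, group):
    """Calculate a price on the fly from every source that can affect it."""
    price = BASE_PRICES[sku]
    tier = TIER_PRICES.get(sku, {}).get(group)
    if tier is not None:
        price = min(price, tier)
    discount = RULE_DISCOUNTS.get(group, Decimal("0"))
    return (price * (Decimal("1") - discount)).quantize(Decimal("0.01"))


def build_price_index(groups):
    """Precompute the final price for every (sku, group) pair."""
    return {(sku, g): compute_final_price(sku, g) for sku in BASE_PRICES for g in groups}


GROUPS = ["general", "wholesale"]
PRICE_INDEX = build_price_index(GROUPS)


def storefront_price(sku, group):
    """The storefront only reads the index; it never recalculates."""
    return PRICE_INDEX[(sku, group)]
```

If you set BASE_PRICES["sku-1"] to Decimal("7.99"), storefront_price("sku-1", "general") keeps returning the old value until the index is rebuilt with build_price_index(GROUPS). That stale window is exactly what reindexing closes.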

Index Trigger Events

Reindexing triggers:

Product Prices:

  • Add customer group
  • Change configuration settings

Flat catalog product data:

  • Add store
  • Add store group
  • Add, edit, or delete attribute (for searching and filtering)

Flat catalog category data:

  • Add store
  • Add store group
  • Add, edit, or delete attribute (for searching and filtering)

Catalog category/product index:

  • Add, edit, or delete products (single, mass, and import)
  • Change product-to-category relations
  • Add, edit, or delete categories
  • Add or delete stores
  • Delete store groups
  • Delete websites

Catalog search index:

  • Add, edit, or delete products (single, mass, and import)
  • Add or delete stores
  • Delete store groups
  • Delete websites

Stock status index:

  • Change inventory configuration settings

Category permissions index:

  • Add store
  • Add store group
  • Add, delete, or update attribute (for searching and filtering)

An indexer can run in either Update on Save or Update on Schedule mode. Update on Save reindexes immediately whenever your catalog or other data changes. This mode assumes a low intensity of update and browsing operations in your store; under high load it can lead to significant delays and data unavailability. Magento recommends Update on Schedule in production, because it records which data changed and reindexes in batches in the background through a dedicated cron job. You can change the mode of each indexer separately on the System > Tools > Index Management page.
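The difference between the two modes can be modelled roughly as follows. This is a toy sketch, assuming a simple in-memory changelog; the Indexer class, its method names, and the batch size are hypothetical and do not correspond to Magento classes. Magento's real scheduled mode is considerably more involved.

```python
class Indexer:
    """Toy model of an indexer that works in 'save' or 'schedule' mode."""

    def __init__(self, mode="save", batch_size=2):
        self.mode = mode              # "save" or "schedule" (hypothetical names)
        self.batch_size = batch_size
        self.changelog = []           # ids of changed entities waiting for the cron run
        self.reindexed = []           # every batch that was actually reindexed

    def on_entity_saved(self, entity_id):
        if self.mode == "save":
            # Update on Save: reindex right away, inside the save operation.
            self._reindex([entity_id])
        else:
            # Update on Schedule: only remember that the entity changed.
            self.changelog.append(entity_id)

    def run_cron(self):
        """Called by the background job: process pending changes in batches."""
        pending = list(dict.fromkeys(self.changelog))  # drop duplicates, keep order
        self.changelog.clear()
        for start in range(0, len(pending), self.batch_size):
            self._reindex(pending[start:start + self.batch_size])

    def _reindex(self, ids):
        # Stand-in for the expensive work of rebuilding index rows.
        self.reindexed.append(list(ids))
```

In "save" mode, three saves of products 1, 2, and 1 cause three separate reindex operations. In "schedule" mode the same saves cause none until run_cron() fires, and the repeated save of product 1 is collapsed into a single entry.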

Magento cron design issues

We incurred a lot of operational overhead with cron. We had nearly two hundred production jobs that ran thousands of times a day at different frequencies (e.g., minutely, hourly, weekly). Job failure was common. The on-call person had to manually restart failed jobs several times a day, sometimes after midnight.

We had little visibility into production jobs at runtime. There was no easy way to know which jobs were running or whether they had succeeded.

Cron tasks are fine for what they’re good for: relatively cheap single tasks that run at a specific time or times. Platform.sh has supported custom cron jobs since the stone age. (That’s about 3 years in Internet time.) They have a number of limitations, though:

  • Cron tasks on Platform.sh run on the same container as your application instance, which means they’re competing for resources.
  • A cron task succeeds or fails entirely, making it a poor fit for a task that can be worked on incrementally.
  • Platform.sh limits cron jobs to running no more often than every 5 minutes, which means a task that needs to be done “now, but not in the web request” may happen as long as 5 minutes later.
  • A running cron task blocks a new code deploy.
  • If a cron task is still running when it’s next triggered, the two runs may step on each other’s toes and confuse the application state unless you’re very, very careful about how the task is written.
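One common way to guard against overlapping runs is a lock file. The sketch below is not something Magento or Platform.sh ships; it assumes a Unix-like system (it uses the standard fcntl module), and the function names and the lock path are made up for the example. A run that cannot get the lock skips its turn instead of colliding with the one still in progress.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Yield True if this run got the lock, False if another run still holds it."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # Non-blocking exclusive lock: fails at once if someone else holds it.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def run_job(lock_path, job):
    """Run job() unless a previous run is still in progress."""
    with single_instance(lock_path) as acquired:
        if not acquired:
            return "skipped"
        job()
        return "ran"


# Hypothetical lock location for one cron job.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "hypothetical_indexer_cron.lock")
```

Because the operating system drops the lock when the process exits, a crashed job does not leave a stale lock behind, which is the usual weakness of "create a file and delete it afterwards" schemes.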

One way to improve cron in Magento is to move scheduling into a separate Node.js microservice built on the node-cron library. The library uses crontab syntax, which will be familiar to anyone who has used cron on Unix-like operating systems. Node.js handles scheduling with little overhead, so the scheduler itself does not become a bottleneck.
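For readers who have not met crontab syntax, here is a simplified matcher for the classic five-field form (minute, hour, day of month, month, day of week). It is a teaching sketch, not node-cron's implementation: the function names are made up, and it supports only numbers, "*", lists, ranges, and "/" steps, with Sunday written as 0.

```python
from datetime import datetime


def _field_matches(field, value, lo, hi):
    """Check one crontab field, e.g. '*/5', '1-5', or '0,30', against a value."""
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            a, b = part.split("-")
            start, end = int(a), int(b)
        else:
            start = end = int(part)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def cron_matches(expr, when):
    """Return True if the five-field expression fires at the given minute."""
    minute, hour, dom, month, dow = expr.split()
    if not (_field_matches(minute, when.minute, 0, 59)
            and _field_matches(hour, when.hour, 0, 23)
            and _field_matches(month, when.month, 1, 12)):
        return False
    dom_ok = _field_matches(dom, when.day, 1, 31)
    dow_ok = _field_matches(dow, when.isoweekday() % 7, 0, 6)  # Sunday -> 0
    if dom != "*" and dow != "*":
        # Classic cron rule: when both day fields are restricted, either may match.
        return dom_ok or dow_ok
    return dom_ok and dow_ok
```

For example, "*/5 * * * *" fires every five minutes, and "0 2 * * 0" fires at 02:00 every Sunday.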
