How to upgrade my sharded environment to Liferay 7.x?

Two approaches with the same goal

Hi Liferay Community,

Before answering this question, I would like to explain what sharding is. To overcome the horizontal scalability concerns of open source databases at the time (circa 2008), Liferay implemented physical partitioning support. The solution allowed administrators to configure portal instances to be stored in different database instances and database server processes.

This feature was originally named "sharding" although "data partitioning" is more accurate since it requires a small amount of information sharing to coordinate partitions.

Beginning in 7.0, Liferay removed its own physical partitioning implementation in favor of the capabilities provided natively by database vendors. Please note that logical partitioning via the "portal instance" concept (a logical set of data grouped by the companyId column, with data security at portal level) is not affected by this change and is still available in current Liferay versions.

Having explained this, the answer to the question is simple: just follow the official procedure:
https://dev.liferay.com/discover/deployment/-/knowledge_base/7-0/upgrading-sharded-environment

So Liferay 7.x provides a process which converts all shards into independent database schemas after the upgrade. This can be suitable for those cases where you need to keep information separated for legal reasons. However, if you cannot afford to maintain one complete environment for each of those independent databases, you could try another approach: disable sharding by merging all shards into just one database schema before performing the upgrade to Liferay 7.x.

Merging all shard schemas into the default one is feasible because sharding generates ids that are unique for every row across all databases. These are the steps you should follow to achieve this:

  1. Create a backup for the shard database schemas in the production environment.
  2. Copy the content of every table in the non-default shards into the default shard. It's recommended to create an SQL script to automate this process.
  3. If a unique index is violated, analyze the data of the two records which cause the issue and remove one of them, since it is no longer necessary (different reasons could have caused data to be created in the wrong shard in the past, such as a wrong configuration, a bug, issues with custom developments, etc.).
  4. Resume the copy from the last point of failure.
  5. Repeat steps 3 and 4 until the default shard database contains all data from the other shards.
  6. Clean up the Shard table, keeping only the default shard record.
  7. Start up a Liferay server using this database, without the sharding settings in portal.properties:
    1. Remove all database connections except for the default one.
    2. Comment out the line META-INF/shard-data-source-spring.xml in the spring.configs property.
  8. Ensure that everything works well and that you can access the different portal instances.
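
Steps 2 to 5 could be automated along these lines. This is only a minimal sketch, not a tested migration tool: it uses Python's built-in sqlite3 module as a stand-in so that it can be run anywhere, the names merge_shard and demo_merge and the sample User_ rows are hypothetical, and it assumes both schemas define each table with the same column order. Against a real database you would use that database's own driver and its own unique-constraint error.

```python
import sqlite3


def merge_shard(default_db, shard_db, tables):
    """Copy every row of each table from shard_db into default_db.

    Rows that violate a unique constraint are not copied; they are
    returned as (table, row) pairs so they can be reviewed by hand
    (step 3) before the copy is resumed (step 4).
    """
    conflicts = []
    for table in tables:
        rows = shard_db.execute(f'SELECT * FROM "{table}"').fetchall()
        for row in rows:
            placeholders = ", ".join("?" for _ in row)
            try:
                default_db.execute(
                    f'INSERT INTO "{table}" VALUES ({placeholders})', row
                )
            except sqlite3.IntegrityError:
                # Keep a record of the clash instead of aborting the copy.
                conflicts.append((table, row))
    default_db.commit()
    return conflicts


def demo_merge():
    """Build two tiny in-memory 'shards' and merge them (sample data only)."""
    default_db = sqlite3.connect(":memory:")
    shard_db = sqlite3.connect(":memory:")
    for db in (default_db, shard_db):
        db.execute(
            "CREATE TABLE User_ ("
            "userId INTEGER PRIMARY KEY, companyId INTEGER, screenName TEXT)"
        )
    default_db.execute("INSERT INTO User_ VALUES (1, 10, 'alice')")
    shard_db.executemany(
        "INSERT INTO User_ VALUES (?, ?, ?)",
        [(1, 20, "stray"), (2, 20, "bob")],
    )
    conflicts = merge_shard(default_db, shard_db, ["User_"])
    return default_db, conflicts


if __name__ == "__main__":
    _, found = demo_merge()
    for table, row in found:
        print("needs review:", table, row)
```

Collecting the conflicting rows instead of stopping at the first one also gives you a written record of what had to be fixed by hand, which is useful when the merge has to be repeated for the go-live.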

It is recommended that you keep a record of the changes made in steps 3 and 6, since you will need to repeat this process when you go live with all databases merged into the default shard. It is also advisable to do this as a separate project before performing the upgrade to Liferay 7.x. Once you have completed this process, you just need to execute the upgrade as for a regular non-sharded environment:
https://dev.liferay.com/en/discover/deployment/-/knowledge_base/7-1/upgrading-to-liferay-71

This alternative way of upgrading sharded environments is not officially supported, but it has been executed successfully in a couple of installations. For that reason, if you have any questions about it, please write a comment on this blog entry or open a new thread in the community forums. Other members of the community and I will try to assist you during this process.


Hello Alberto,

I tested out your step-by-step guide for merging the sharded databases and almost got to the finish line. I did hit a speed bump at the ResourcePermission table, though. When comparing the tables from my two shards, I see that roughly the first 200 entries have the same resourcePermissionId but different data in all the other columns. So it isn't clear which ones I can delete without removing important data.

Do you have any advice on how to decide which ones need to go? How important are these entries to the Liferay portal?

 

Thank you for any input in this matter!

Regards,

Andreas

Hi Andreas,

 

Apologies for the delay. I hadn't subscribed to the comments and I have only just noticed this message. In the case you mention, I would insert those 200 entries into the ResourcePermission table with a different resourcePermissionId. For that, you can create a script which reads those records and generates the new ones, using the Counter table to get the new ids.
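
A script of that kind could look roughly like the sketch below. It is only an illustration under several assumptions: it uses Python's built-in sqlite3 module as a stand-in database, it assumes the Counter table has name and currentId columns and that the id is the first column of the target table, and the counter name 'example.counter' and the function names are hypothetical (check which counter name your Liferay version actually uses for this table). It should be run with the portal stopped, since the portal may keep counter values cached in memory.

```python
import sqlite3


def next_id(db, counter_name):
    """Reserve and return the next id from a Counter-style table."""
    row = db.execute(
        "SELECT currentId FROM Counter WHERE name = ?", (counter_name,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no counter named {counter_name!r}")
    new_id = row[0] + 1
    db.execute(
        "UPDATE Counter SET currentId = ? WHERE name = ?",
        (new_id, counter_name),
    )
    return new_id


def insert_with_new_ids(db, table, rows, counter_name):
    """Insert rows, replacing the first column (the id) with a fresh one.

    Returns a dict mapping each old id to its new id, so the change
    can be recorded and repeated later.
    """
    id_map = {}
    for row in rows:
        new_id = next_id(db, counter_name)
        values = (new_id,) + tuple(row[1:])
        placeholders = ", ".join("?" for _ in values)
        db.execute(f'INSERT INTO "{table}" VALUES ({placeholders})', values)
        id_map[row[0]] = new_id
    db.commit()
    return id_map


def demo_reid():
    """Re-insert two clashing sample rows under new ids (sample data only)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Counter (name TEXT PRIMARY KEY, currentId INTEGER)")
    db.execute(
        "CREATE TABLE ResourcePermission ("
        "resourcePermissionId INTEGER PRIMARY KEY, companyId INTEGER, name TEXT)"
    )
    db.execute("INSERT INTO Counter VALUES ('example.counter', 500)")
    db.executemany(
        "INSERT INTO ResourcePermission VALUES (?, ?, ?)",
        [(1, 10, "a"), (2, 10, "b")],
    )
    # Rows from the other shard whose ids clash with the rows above.
    clashing = [(1, 20, "c"), (2, 20, "d")]
    id_map = insert_with_new_ids(db, "ResourcePermission", clashing, "example.counter")
    return db, id_map


if __name__ == "__main__":
    _, mapping = demo_reid()
    print(mapping)
```

The returned mapping is kept in case anything else turns out to reference the old ids.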

 

I hope it helps. It may be too late for you :-( but at least it may be useful for others.

 

Best regards.