Closed
Description
Refs DEV-2528
We have a user report concerning slow performance for the deleteLogDataForDeletedSites scheduled task when more than one measurable has been deleted.
One of the queries that took really long to execute was the following:
DELETE log_form_field FROM matomo_log_form_field log_form_field LEFT JOIN matomo_log_form log_form ON log_form_field.idlogform = log_form.idlogform LEFT JOIN matomo_log_visit log_visit ON log_form.idvisit = log_visit.idvisit WHERE log_visit.idsite in (15,11)
This ran for over an hour and a half before they decided to kill it and instead run the following query for each ID site individually:
MariaDB [matomo]> DELETE log_form_field FROM matomo_log_form_field log_form_field LEFT JOIN matomo_log_form log_form ON log_form_field.idlogform = log_form.idlogform LEFT JOIN matomo_log_visit log_visit ON log_form.idvisit = log_visit.idvisit WHERE log_visit.idsite = 15;
Query OK, 0 rows affected (4.264 sec)
MariaDB [matomo]> DELETE log_form_field FROM matomo_log_form_field log_form_field LEFT JOIN matomo_log_form log_form ON log_form_field.idlogform = log_form.idlogform LEFT JOIN matomo_log_visit log_visit ON log_form.idvisit = log_visit.idvisit WHERE log_visit.idsite = 11;
Query OK, 0 rows affected (0.010 sec)
As you can see the above queries for each individual site executed very quickly.
They tried again after this to manually execute the query that was previously running for both ID sites, after 10 minutes they killed that manual query again because it was clearly a lot slower than just doing it individually for each ID site.
Activity
bx80 commentedon Jan 17, 2022
Looking at the query plan differences between the two statements, the
IN (15,11)
clause is causing the use of a join buffer. Since this is an infrequently executed task, a simple solution would be to adjust thedeleteLogDataForDeletedSites
task to execute individual delete statements for each site.