Improve performance in deleteLogDataForDeletedSites when deleting data for more than one site · Issue #18618 · matomo-org/matomo · GitHub
Skip to content

Improve performance in deleteLogDataForDeletedSites when deleting data for more than one site #18618

Closed
@Starker3

Description

@Starker3

Refs DEV-2528

We have a user report concerning slow performance for the deleteLogDataForDeletedSites scheduled task when more than one measurable has been deleted.
One of the queries that took really long to execute was the following:

DELETE log_form_field FROM matomo_log_form_field log_form_field LEFT JOIN matomo_log_form log_form ON log_form_field.idlogform = log_form.idlogform LEFT JOIN matomo_log_visit log_visit ON log_form.idvisit = log_visit.idvisit WHERE log_visit.idsite in (15,11)

This ran for over an hour and a half before they decided to kill it and instead run the following query for each ID site individually:

MariaDB [matomo]> DELETE log_form_field FROM matomo_log_form_field log_form_field LEFT JOIN matomo_log_form log_form ON log_form_field.idlogform = log_form.idlogform LEFT JOIN matomo_log_visit log_visit ON log_form.idvisit = log_visit.idvisit WHERE log_visit.idsite = 15;
Query OK, 0 rows affected (4.264 sec)

MariaDB [matomo]> DELETE log_form_field FROM matomo_log_form_field log_form_field LEFT JOIN matomo_log_form log_form ON log_form_field.idlogform = log_form.idlogform LEFT JOIN matomo_log_visit log_visit ON log_form.idvisit = log_visit.idvisit WHERE log_visit.idsite = 11;
Query OK, 0 rows affected (0.010 sec)

As you can see the above queries for each individual site executed very quickly.

They tried again after this to manually execute the query that was previously running for both ID sites, after 10 minutes they killed that manual query again because it was clearly a lot slower than just doing it individually for each ID site.

Activity

added
Potential BugSomething that might be a bug, but needs validation and confirmation it can be reproduced.
on Jan 14, 2022
bx80

bx80 commented on Jan 17, 2022

@bx80
Contributor

Looking at the query plan differences between the two statements, the IN (15,11) clause is causing the use of a join buffer. Since this is an infrequently executed task, a simple solution would be to adjust the deleteLogDataForDeletedSites task to execute individual delete statements for each site.

added
BugFor errors / faults / flaws / inconsistencies etc.
and removed
Potential BugSomething that might be a bug, but needs validation and confirmation it can be reproduced.
on Jan 17, 2022
added
c: PerformanceFor when we could improve the performance / speed of Matomo.
and removed
BugFor errors / faults / flaws / inconsistencies etc.
on Jan 17, 2022
added this to the 4.8.0 milestone on Jan 17, 2022
modified the milestones: 4.8.0, 4.9.0 on Feb 9, 2022
changed the title deleteLogDataForDeletedSites slow when deleting data for more than one site Improve performance in deleteLogDataForDeletedSites when deleting data for more than one site on Apr 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    c: PerformanceFor when we could improve the performance / speed of Matomo.

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      Improve performance in deleteLogDataForDeletedSites when deleting data for more than one site · Issue #18618 · matomo-org/matomo