Faster User ID reports processing: use ranking query by diosmosis · Pull Request #14551 · matomo-org/matomo · GitHub
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster User ID reports processing: use ranking query #14551

Merged
merged 6 commits into from
Jun 26, 2019
Merged

Conversation

diosmosis
Copy link
Member

No description provided.

@diosmosis diosmosis added the Needs Review PRs that need a code review label Jun 21, 2019
);

if ($query === false) {
return;
}

$rowsCount = 0;
while ($row = $query->fetch()) {
foreach ($query as $row) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't know that worked 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither did I :)

@tsteur
Copy link
Member

tsteur commented Jun 21, 2019

haven't tested it but looks good.
A test is still failing: https://travis-ci.org/matomo-org/matomo/jobs/548604098#L801

This test seems unrelated https://travis-ci.org/matomo-org/matomo/jobs/548604100#L925

@mattab mattab added this to the 3.10.0 milestone Jun 22, 2019
@mattab
Copy link
Member

mattab commented Jun 22, 2019

Be great if we can merge this soon and release rc2 👍

@mattab mattab added the c: Performance For when we could improve the performance / speed of Matomo. label Jun 22, 2019
$configLimit = $configGeneral['archiving_ranking_query_row_limit'];
$limit = $configLimit == 0 ? 0 : max(
$configLimit,
$configGeneral['datatable_archiving_maximum_rows_standard']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • let's introduce a different setting eg datatable_archiving_maximum_rows_userid
  • set this to a higher value like 5000 rows (i expect users will want to archive more rows in this report.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattab changed & build fixed. Had to add another order by to make the results deterministic.

<nb_visits_converted>0</nb_visits_converted>

</row>
<row>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be useful if the tests could set a lower limit like 10, and show the behavior works (show the row Others)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// deal w/ ranking query summary row
$rankingQuerySummaryRow = $dataTable->getRowFromLabel(DataTable::LABEL_SUMMARY_ROW);
if ($rankingQuerySummaryRow) {
$rankingQuerySummaryRowId = $dataTable->getRowIdFromLabel(DataTable::LABEL_SUMMARY_ROW);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain why this code is needed, and do we do this kind of logic in other archivers too (which ones)? if not, why is it needed in this archiver?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ranking query creates a row w/ label -1, this is added to the DataArray which is just an array, not a datatable so it has no concept of a summary row. When DataArray::asDataTable() is called, it adds the row as a regular row and not as a summary row.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, wondering why is this logic not needed for other plugins?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is required, Actions does it here: https://github.com/matomo-org/matomo/blob/3.x-dev/plugins/Actions/ArchivingHelper.php#L383

It's not used in Events/Contents, but that's likely a bug that wasn't dealt with since it's not tested in BlobReportLimitingTest. I'll add it to my list to fix that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diosmosis do you mind creating an issue for this bug in case it later slips off the list? unless you will open the PR this week or so. Thanks!

@mattab mattab merged commit b107e19 into 3.x-dev Jun 26, 2019
@mattab mattab deleted the user-id-max-rows branch June 26, 2019 02:43
@mattab mattab changed the title Use ranking query on userid reports. Faster User ID reports processing: use ranking query Jun 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: Performance For when we could improve the performance / speed of Matomo. Needs Review PRs that need a code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants