Tweak `MetricsUpdateTask` performance #1481

nscuro · 2022-03-19T23:14:37Z

I was profiling the MetricsUpdateTask class due to some folks reporting performance concerns and managed to identify a few screws we could tighten to make the task more resource efficient.

Profiling

Setup

DT 4.4.2 running on Temurin 11 on macOS.
Postgres database running in Docker, using the postgres:14.2-alpine image.
50 projects, all using the DT API Server 4.4.2 BOM. 7850 components in total.
MetricsUpdateTask running in a JerseyTest without any resources deployed. The task is running pretty much isolated.
Because no other tasks are running, no data is changing, thus the metrics update task won't perform any INSERTs, just UPDATEs.
Using Async Profiler in IntelliJ.

Results

The current implementation is performing a lot of allocations in the updateComponentMetrics method. Profiling results showed that the biggest chunk is allocated in AbstractAlpineQueryManager.persist. More specifically, in DataNucleus' preCommit method, where DN maintains its L1 cache:

This is also where a lot of CPU time is spent:

The DN docs say WRT the L1 cache:

Each PersistenceManager maintains a cache of the objects that it has encountered (or have been "enlisted") during its lifetime. [...]
Objects are placed in the L1 cache (and updated there) during the course of the transaction. This provides rapid access to the objects in use in the users application and is used to guarantee that there is only one object with a particular identity at any one time for that PersistenceManager. When the PersistenceManager is closed the L1 cache is cleared.

Because DT currently uses a single PersistenceManager for the entire portfolio metrics update run, the L1 cache keeps on growing, and DN has to do a lot of work in order to maintain it. Performance deteriorates with every project being processed. So far, this matches what @hostalp was reporting in #1210 (comment).

Another factor here is what AbstractAlpineQueryManager.persist does with objects passed to it:

Start a transaction
Make the object persistent (it may or may not be transient until now)
Commit the transaction (this detaches the object again)
Refresh the detached object, making it persistent again

Especially for larger objects, that's quite a bit of work being performed. In cases like metrics updates, we only want to INSERT or UPDATE something, but we don't care about the object after that, we don't need it to be persistent.

There are cases in the current logic where persist is called even if it isn't necessary, causing some additional overhead. One of these cases is when updating the lastInheritedRiskScore field of Component, when the metrics themselves didn't change (see here). Semantically, this is (almost?) always a no-op. But in the background, the persistence layer is producing garbage.

When working on objects that already are persistent we could achieve the desired effect using something like this as well:

qm.getPersistenceManager().currentTransaction().begin();
component.setLastInheritedRiskScore(last.getInheritedRiskScore());
qm.getPersistenceManager().currentTransaction();

That safes us the unnecessary calls to makePersistent and refresh.

Tweaks

I didn't want to change too much here, so I only introduced two changes:

Every updateProjectMetrics method gets its own QueryManager. This ensures that resources like the L1 cache are regularly cleared. We do want to take advantage of DN's caching to some degree, so using a different QueryManager for each updateComponentMetrics method is not reasonable. According to DN's docs, creating PersistenceManager instances is a cheap operation though.
lastInheritedRiskScore fields of projects and components are only updated when they actually changed. In the majority of cases, this won't be the case and we're saving quite a bit of computing power.

Results

Allocations are reduced by more than 50%:

CPU time spent is drastically reduced as well:

With the original code, the task ran for 3m30s. With the changes outlined above, it finished just under 2m30s!

There are probably a few more places throughout the codebase where similar minor changes could be made to achieve such results.

…update run This ensures that the L1 and L2 cache of the DN persistence manager are regularly cleared and other related resources are released. Profiling indicates that this cuts the amount of object allocations roughly in half. CPU time spent on cache maintenance is also drastically reduced. Signed-off-by: nscuro <nscuro@protonmail.com>

…actually changed. Especially for components, this is the most common scenario. By not performing this operation when nothing changed, we're saving unnecessary transactions and persist->refresh cycles. Signed-off-by: nscuro <nscuro@protonmail.com>

nscuro · 2022-03-19T23:25:14Z

Will hopefully also have an impact on #1370.

nscuro added 2 commits March 19, 2022 21:24

stevespringett merged commit 7c483b5 into DependencyTrack:master Mar 31, 2022

nscuro mentioned this pull request Apr 29, 2022

Decouple metrics update into standalone service #1210

Open

github-actions bot locked as resolved and limited conversation to collaborators Apr 30, 2022

nscuro deleted the tweak-metrics-update-performance branch May 8, 2022 13:28

nscuro added this to the 4.5 milestone May 10, 2022

nscuro added the enhancement New feature or request label May 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tweak `MetricsUpdateTask` performance #1481

Tweak `MetricsUpdateTask` performance #1481

nscuro commented Mar 19, 2022

nscuro commented Mar 19, 2022

Tweak MetricsUpdateTask performance #1481

Tweak MetricsUpdateTask performance #1481

Conversation

nscuro commented Mar 19, 2022

Profiling

Setup

Results

Tweaks

Results

nscuro commented Mar 19, 2022

Tweak `MetricsUpdateTask` performance #1481

Tweak `MetricsUpdateTask` performance #1481