Refs #373 -- Added Model._is_pk_set(). #18450

csirmazbendeguz · 2024-08-05T14:20:31Z

[GSoC-2024]

Trac ticket number

ticket-373

Branch description

Added the Model._is_pk_set() function as suggested by @charettes and @LilyFoote (#18056 (comment)).

It's a simple utility function for checking if the primary key has been set on an object.

It's a pre-requisite for composite primary keys (#18056).

This function will get modified by the composite primary keys patch.

Checklist

This PR targets the main branch.
The commit message is written in past tense, mentions the ticket number, and ends with a period.
I have checked the "Has patch" ticket flag in the Trac system.
I have added or updated relevant tests.
I have added or updated relevant docs, including release notes if applicable.
I have attached screenshots in both light and dark modes for any UI changes.

nessita · 2024-08-05T19:42:02Z

Thank you @csirmazbendeguz for this PR. I have read the referenced comment and while I understand the rationale, I have some concerns about the chosen name (though I acknowledge that this PR follows the proposed name).

When I read this, I thought that is_set was referring to "isinstance(..., set)", i.e. that it answers "is this a set" (instead of the current semantic which is "is this set"). Risking some bikeshedding, I was thinking that perhaps a better name would be:

is_defined
value_set
is_value_set
has_value_set

cc/ @LilyFoote @charettes

csirmazbendeguz · 2024-08-06T05:07:01Z

Thanks @nessita! It's not bikeshedding at all, I submitted this PR separately so it's easier to review and so we can have a discussion about it without worrying about all the other details of composite primary keys.

I like is_value_set. 👍

shangxiao · 2024-08-06T05:52:25Z

When I read this, I thought that is_set was referring to "isinstance(..., set)", i.e. that it answers "is this a set" (instead of the current semantic which is "is this set"). Risking some bikeshedding, I was thinking that perhaps a better name would be:

I haven't been part of the composite thread but when I saw this my first thought was similar: "has the value on this field been set?" … though when I glanced at the is_value_set() I still thought it had the same semantic just named differently 😅

There are 2 hard problems in computer science:

Naming
Caching things
Off-by-1 errors

csirmazbendeguz · 2024-08-06T06:45:40Z

Yes it feels a little awkward.
This function is supposed to be an abstraction for checking if the pk has been set on an object.
So maybe:

has_null?
is_null?

sarahboyce · 2024-08-07T15:50:54Z

We do already have a Field.is_set() so is_set() would be consistent naming 🤷

I think I like is_null as I could see a similar is_empty definition which checks whether a value is in Field.empty_values

django/db/models/base.py

csirmazbendeguz · 2024-08-07T18:43:15Z

Thanks @sarahboyce ! I adjusted it.

If we can't come up with a good name everybody likes then maybe this is not the best abstraction in which case I would be open to alternatives.

The problem I'm trying to solve is:

Since composite primary keys are tuples, checking if pk is None is not enough. If a model has a composite primary key, we must check if the pk tuple contains a null value instead.
Since there are multiple pk null checks, it makes sense to put this in a function. The question is where do we place this function and what do we call it?

I hope that makes sense.

I'm happy to discuss ideas.

LilyFoote

is_null sounds good to me (though I don't have a problem with the other names either).

nessita · 2024-08-08T18:10:08Z

Thank you everyone for engaging in the naming conversation. I agree that is_null is a good option! +1 to that choice.

csirmazbendeguz · 2024-08-09T01:09:18Z

I wonder if this could be mistaken for the getter of Field.null.
Since this function is for checking the pk value, we could include pk or pk_value in the name.
There's already a closely related Field.get_pk_value_on_save function.

Here's a couple of other options:

is_none
is_pk_none
is_null_value
is_pk_null_value
is_pk_value_null
is_pk_value_none
is_pk_value_set
is_pk_value_defined
is_nullval
is_pk_nullval
contains_null
pk_contains_null
pk_value_contains_null
contains_none
pk_contains_none
pk_value_contains_none
pk_value_has_none
pk_value_has_null
has_none
pk_has_none

I think I like is_pk_value_set the most. I also like pk_value_contains_null, pk_value_contains_none.

Also, it doesn't have to be a classmethod, since get_pk_value_on_save is not a classmethod either. It's more flexible.

sarahboyce

Agreed, given the context and what it is used for, also like is_pk_value_set

nessita · 2024-08-09T13:29:07Z

After a more throrough review I have posted this in PR #18056:

I'm honestly confused about why we would put the "responsibility" of knowing about a PK value in a Field and not in the Model instance. More so, the new method as proposed does not use anything from the Field class (and I think it will not either in the CompositePrimaryKey override).
Could we do something like this for ModelBase (hand wavy untested code)?

for main:

    def is_pk_set(self):
        return self.pk is not None

this branch:

    def is_pk_set(self):
        if isinstance(self.pk, CompositePrimaryKey):
            return all(value is not None for value in self.pk)
        return self.pk is not None

csirmazbendeguz · 2024-08-09T13:45:36Z

Yes @nessita , that could work. I agree it looks more readable as a Model function.

LilyFoote · 2024-08-10T10:57:34Z

One theoretical downside of the Model function I just thought of is that third-party fields can't opt-in to custom handling here. That said, I can't think of a third-party field that would want to do that.

csirmazbendeguz · 2024-08-10T13:23:48Z

One theoretical downside of the Model function I just thought of is that third-party fields can't opt-in to custom handling here. That said, I can't think of a third-party field that would want to do that.

Yes, I also brought this up on Discord, this is called the open-closed principle. I agree with @nessita though, in this case the Model function is more intuitive, which is more important than extensibility.

nessita

Thank you @csirmazbendeguz! I have thoroughly reviewed this proposal. I think it looks good but I do have a few questions:

Shall the if self.pk is None: in __hash__ (db/models/base.py:604) be replaced as well?
Same question for these in the same file:

if obj.pk is None: in _prepare_related_fields_for_save
if obj and obj.pk is None: in the same method _prepare_related_fields_for_save
if self.pk is None: in delete
if self.pk is None in prepare_database_save
model_class_pk is not None which comes from model_class_pk = self._get_pk_val(model_class._meta) in _perform_unique_checks
self.pk is not None in _perform_date_checks

I see more matches in other files (when running grep -nR "pk.*None"), I know not all of them are suitable to use the helper but could you please confirm if you have reviewed each of these matches to evaluate replacement on a case by case basis?

Thank you!

nessita · 2024-08-15T16:17:06Z

Yes @collinanderson, that's correct. We could add this to the release notes maybe? @nessita wdyt?

I'm a little torn by this, let me explain where my head is at:

This PR started as a refactor to isolate the logic for checking if a Model's PK is set or not
My understanding is that this helper would be used internally only
Later on, in the Composite PK branch, this helper would be extended to "learn" how to handle composite PKs

The point raised by @collinanderson makes sense in that I can see how third party apps would benefit from using "a blessed method to check if a PK, compound or not, is set". With this information, I see two options:

We leave this work as is, considering _is_pk_set an internal, undocumented method, that is used to facilitate the progression of Composite PKs. Then, once composite PK is ready/landed, we revisit this goal of allowing third party apps to easily and robustly check for set PKs. Pro: we would expose a new method we know it works since it has already a IRL proof of usage with Composite PKs. Con is that this could delay adoption of the feature for third party apps.

Or:

We create a new feature ticket for providing a public and documented method to check if a PK is set and fully flesh the design, API, name, docs, etc in the ticket, for then repurpose this PR to solve that ticket. Pro: API for public consumption is ready faster and earlier. Con: this may be a "slower" path for composite PK feature as a whole.

@LilyFoote @charettes @carltongibson Would you have an opinion on the above?

LilyFoote · 2024-08-15T16:26:33Z

I think we can land this now as an internal API and also open a new ticket about a public API, which can progress independently of the composite PK work.

csirmazbendeguz · 2024-08-15T17:17:32Z

We can rename this _is_pk_set -> is_pk_set, add some docs and that's it right? I can open a new ticket if it's necessary, but I'm not sure if there's much else to flesh out.

I assume it's going to take a while for maintainers to review the composite primary keys PR, so we might as well make this function a public API.

Also don't let this not being merged discourage you from reviewing #18056 , the commits are separated so you can review it separately.

carltongibson · 2024-08-16T05:34:29Z

I'm struggling to see an easy way around the issue here. (Reminder to self: Issue is that I can't keep using obj.pk to test whether the object has been saved.)

Elsewhere I've used obj._state.adding but I don't think documenting that is the way forward.

So, I'd suggest just adding is_pk_set() a public documented property, and telling folks code will need to adopt to that if they want to support models with composite keys. It's Refs #373 ... — surely there has to be some disruption 😅

@collinanderson Would that be OK for you do you think? (I don't want to have missed your concern.)

Having read the thread here I'd say, add the new method as public to begin. I've likely missed the reason for the initial private phase there though. (Please continue if that's so.)

csirmazbendeguz · 2024-08-16T10:47:13Z

@carltongibson , I agree, but note that

Issue is that I can't keep using obj.pk to test whether the object has been saved.

This is close to the truth, but it can also used for other purposes, e.g. to check if default should be used for the pk field.
It's not a matter of obj._state.adding vs obj.is_pk_set(), the two are sometimes used together in checks.

`obj._state.adding`	`obj.is_pk_set()`	`obj.save()`
`True`	`True`	`obj` is created with provided `pk`
`True`	`False`	`obj` is created with default `pk`
`False`	`True`	`obj` is updated
`False`	`False`	`obj` is cloned

collinanderson · 2024-08-16T17:07:43Z

@collinanderson Would that be OK for you do you think? (I don't want to have missed your concern.)

Yeah I'm realizing I don't actually have a super strong or informed opinion, and yes, it helps to keep in mind that obj.pk is None isn't a reliable way to catch new/creating/inserting cases. I suspect a lot of my code that generically checks for obj.pk is None should probably be checking obj._state.adding. I mentioned 3rd party libraries, but I don't actually have one in mind or a specific use case, so there's a chance a public api isn't actually needed.

Maybe just try to make it clear in the Changes to *pk* documentation (again, out of scope for this PR), that for composite fields obj.pk will always be a tuple (even in a (None, None) case) and doesn't ever magically appear as obj.pk == None or a falsey value.

And now that I say that, I'm thinking of another example of where doing some magic with pk could help, like setting obj.pk = None to reliably clear out the pk and use database defaults instead, (often used for copying objects). https://hackernoon.com/the-smart-way-to-clone-django-instance . But even that is unclear how copying an object generically should work in a composite field case, like a many to many table, because you might only want to clear out one of the fields. And again, that's separate from this is_pk_set() case.

I'm trying not to get too involved here but just wanted to encourage backward compatibility when possible, to help composite fields play well with the existing ecosystem. I'm happy with whatever happens, and I don't want to get in the way of progress on Django's oldest open ticket. This is awesome and really exciting. Thank you all.

csirmazbendeguz · 2024-08-16T17:20:45Z

Thanks for this @collinanderson !

It's good that you brought this up, because some backends do depend on this. Here's an example from microsoft/mssql-django.

There is a special case for obj.pk = None (composite.py:30), this is to retain compatibility with regular pks and to avoid changes to deletes. I agree with the sentiment that backward compatibility is good.

To set the defaults, there's get_pk_value_on_save (composite.py:105).

csirmazbendeguz · 2024-08-16T20:13:30Z

pk would need to return None if any of the fields is None. It's not ideal IMHO.

I was against doing this at first, but now that @collinanderson made me think about it, I can't come up with any arguments against it other than "I don't like it".

On one hand:
```
>>> obj_1.pk
(None, None)
>>> obj_2.pk
(123, None)
```
This is more consistent with the idea of composite primary keys being tuples.
On the other hand:
```
>>> obj_1.pk
None
>>> obj_2.pk
None
```
This gives us compatibility with obj.pk is None checks. We can document that for compatibility reasons, obj.pk is always None unless all fields are set.

So I would like to ask everyone to give me your opinion. What do you think? Which one is better?

collinanderson · 2024-08-16T20:20:30Z

Could you ever have a composite-primary-key where one or both of the columns are nullable? In which case my idea of returning None would probably be bad. Too much "magic" can bad.

csirmazbendeguz · 2024-08-17T06:33:58Z

Could you ever have a composite-primary-key where one or both of the columns are nullable?

A primary key column cannot be nullable (composite or not).

LilyFoote · 2024-08-19T20:25:21Z

I think if we do special-case here we should only do so for the case where every field is None, not when some are set.

Beyond that, I can see arguments either way and I'm not sure which is better.

csirmazbendeguz · 2024-08-20T05:33:23Z

Thanks @LilyFoote !

If we returned None only if e.g. obj.pk == (None, None) that wouldn't be enough, we would still need to add the Model.is_pk_set() function.

carltongibson · 2024-08-21T07:07:57Z

So I would like to ask everyone to give me your opinion. What do you think? Which one is better?

I can't say definitively but, I'm leaning to the less magic, more explicit tuple response, even for the (None, None) cases, despite that needing the new API (which probably should be used anyway right...? — If I followed that last twist in the conversation right.)

I can't shake the feeling that if we convert to None just in the one case (or at all TBH) we'll be saving up some horrible need for a workaround much later.

🤷

csirmazbendeguz · 2024-08-21T07:54:24Z

despite that needing the new API (which probably should be used anyway right...? — If I followed that last twist in the conversation right.

What I meant is if we're going to move this is_pk_set logic to the new descriptor (composite.py:24: CompositeAttribute.__get__) we gotta check if any of the fields is None, not only if all the fields are None.

charettes · 2024-08-25T03:55:35Z

Thanks for the ping and sorry for the late answer. I'll try to summarize all the points brought so far to

Managing to preserve pk is None behavior would reduce backward compatibility concerns at the cost of possibly unintuitive magic (if any member of the composition are None)
Should we have a public of private _?is_pk_set method (assuming we can agree on a name) instead

While I think that we should strive to reduce the backward compatibility burden imposed by the introduction of composite primary keys I don't think that focusing on making pk return None in this case is worthy for two reasons.

The first one being that one way or the other third-party app that ought to support composite primary keys will have to be adapted to expect pk to return a tuple under some conditions, in this sense pk is None is just one pattern that will require adjustments.

The second one is that pk is None has always been an implicit way through convention to get an idea of how to treat a Model instance. A pattern, just like checking for _state.adding, that has been replicated as a symptom of a lack of a blessed way to identify the persistence state of a Model instance. In this sense I think that we should see the introduction of composite primary keys as an opportunity to direct the ecosystem towards a blessed way of doing things instead of encouraging this convention further.

I really like the table that @csirmazbendeguz put together in #18450 (comment) as I believe it is the closest to what I would expect a proper public API to look like if we want to commit to one; something that resolves both the pk is None and _state.adding ambiguity at the same time instead of focusing solely on the notion of pk.

The approach I would favor here is to start with a _is_pk_set API merged into main to validate its merit through documentation but mention it is likely to be superseded eventually through a deprecation period. If we're able to come up with a solution that supersedes both pk is None and _state.adding checks in a nice way before its release in a distinct work stream it won't be an issue otherwise we'll get to learn through before committing to a solution.

nessita

Thank you @csirmazbendeguz! This is almost there! 🏆

I added a few comments and I have this "philosophical" concern after close code inspection:

There are so many occurrences of if NOT obj._is_pk_set() (in fact the majority of this diff is about checking "is not set" vs "is set) which makes me wonder if we should also provide:

def _is_pk_unset(self):
    return self.pk is None  # pseudocode

thinking also in the future implementation for composite PKs being (note usage of all vs any):

def _is_pk_unset(self):
    return any(i is None for i in self.pk)  # pseudocode

which would not require to check every element in the whole tuple, just to negate the result in the call site. Does this make sense? What do you think?

django/db/models/base.py

django/db/models/fields/related.py

docs/ref/models/instances.txt

docs/releases/5.2.txt

collinanderson · 2024-08-29T13:52:28Z

I'm also realizing it could be more clear to call this method all_pk_set() to be clear it's not any_pk_set() but kinda bikeshedding at some point.

csirmazbendeguz · 2024-08-29T13:56:20Z

Thanks @nessita !

I addressed your comments.

This is just my opinion but _is_pk_set reads better, I wouldn't add _is_pk_unset.

which would not require to check every element in the whole tuple

My plan is to have the following logic in the composite pk patch (it won't check every element):

def _is_pk_set(self, meta=None):
    pk_val = self._get_pk_val(meta)
    return not (
        pk_val is None
        or (isinstance(pk_val, tuple) and any(f is None for f in pk_val))
    )

I'm also realizing it could be more clear to call this method all_pk_set() to be clear it's not any_pk_set() but kinda bikeshedding at some point.

I personally like _is_pk_set more - the words all or any wouldn't make sense until composite primary keys are merged.

… Model's PK is set.

nessita

Thank you @csirmazbendeguz for this work and your patience bearing with us in the developing of the final solution. 🌟

I have pushed minor edits and I will merge once CI is green. Onward to the composite PK PR!

csirmazbendeguz · 2024-09-10T05:18:49Z

Thanks @nessita ! This is great news. 🎉 I rebased the CompositePrimaryKey patch, it's ready for review. This is the PR that's going to lay the foundation for composite primary keys in Django. I'm not sure if you're gonna have the time to review it though, it's quite a large patch. But any feedback is appreciated if someone here has the time for it.

csirmazbendeguz mentioned this pull request Aug 5, 2024

Fixed #373 -- Added CompositePrimaryKey. #18056

Open

6 tasks

csirmazbendeguz changed the title ~~Refs #373 - Added Field.is_set().~~ Refs #373 - Added Field.is_value_set(). Aug 6, 2024

sarahboyce reviewed Aug 7, 2024

View reviewed changes

django/db/models/base.py Outdated Show resolved Hide resolved

LilyFoote approved these changes Aug 7, 2024

View reviewed changes

csirmazbendeguz changed the title ~~Refs #373 - Added Field.is_value_set().~~ Refs #373 - Added Field.is_null(). Aug 8, 2024

csirmazbendeguz changed the title ~~Refs #373 - Added Field.is_null().~~ Refs #373 - Added Field.is_pk_value_set(). Aug 9, 2024

csirmazbendeguz force-pushed the ticket_373_is_set branch from 8c333a6 to c567fbe Compare August 9, 2024 01:36

sarahboyce force-pushed the ticket_373_is_set branch from c567fbe to 90ddaa7 Compare August 9, 2024 08:30

sarahboyce changed the title ~~Refs #373 - Added Field.is_pk_value_set().~~ Refs #373 -- Added Field.is_pk_value_set(). Aug 9, 2024

sarahboyce approved these changes Aug 9, 2024

View reviewed changes

csirmazbendeguz force-pushed the ticket_373_is_set branch from 90ddaa7 to 57d0086 Compare August 9, 2024 12:55

csirmazbendeguz changed the title ~~Refs #373 -- Added Field.is_pk_value_set().~~ Refs #373 -- Added Model.is_pk_set(). Aug 9, 2024

csirmazbendeguz force-pushed the ticket_373_is_set branch 3 times, most recently from b256dfa to 98ba84b Compare August 9, 2024 16:30

csirmazbendeguz changed the title ~~Refs #373 -- Added Model.is_pk_set().~~ Refs #373 -- Added Model._is_pk_set(). Aug 9, 2024

nessita reviewed Aug 12, 2024

View reviewed changes

csirmazbendeguz force-pushed the ticket_373_is_set branch from f27df26 to 4681952 Compare August 13, 2024 03:59

csirmazbendeguz force-pushed the ticket_373_is_set branch from dd0c3d5 to ddfc01e Compare August 16, 2024 15:40

csirmazbendeguz changed the title ~~Refs #373 -- Added Model._is_pk_set().~~ Refs #373 -- Added Model.is_pk_set(). Aug 16, 2024

csirmazbendeguz force-pushed the ticket_373_is_set branch 3 times, most recently from abca59d to e8a88d5 Compare August 16, 2024 16:20

csirmazbendeguz force-pushed the ticket_373_is_set branch from e8a88d5 to c11d18b Compare August 16, 2024 17:30

nessita reviewed Aug 29, 2024

View reviewed changes

csirmazbendeguz changed the title ~~Refs #373 -- Added Model.is_pk_set().~~ Refs #373 -- Added Model._is_pk_set(). Sep 5, 2024

Refs django#373 -- Added Model._is_pk_set() abstraction to check if a…

02d901f

… Model's PK is set.

nessita force-pushed the ticket_373_is_set branch from 11485c1 to 02d901f Compare September 9, 2024 19:29

nessita approved these changes Sep 9, 2024

View reviewed changes

nessita merged commit 5865ff5 into django:main Sep 9, 2024
36 checks passed

Refs #373 -- Added Model._is_pk_set(). #18450

Refs #373 -- Added Model._is_pk_set(). #18450

Conversation

csirmazbendeguz commented Aug 5, 2024 • edited Loading

Trac ticket number

Branch description

Checklist

nessita commented Aug 5, 2024

csirmazbendeguz commented Aug 6, 2024

shangxiao commented Aug 6, 2024

csirmazbendeguz commented Aug 6, 2024

sarahboyce commented Aug 7, 2024

csirmazbendeguz commented Aug 7, 2024 • edited Loading

LilyFoote left a comment

Choose a reason for hiding this comment

nessita commented Aug 8, 2024

csirmazbendeguz commented Aug 9, 2024 • edited Loading

sarahboyce left a comment

Choose a reason for hiding this comment

nessita commented Aug 9, 2024

csirmazbendeguz commented Aug 9, 2024

LilyFoote commented Aug 10, 2024

csirmazbendeguz commented Aug 10, 2024

nessita left a comment • edited Loading

Choose a reason for hiding this comment

nessita commented Aug 15, 2024

LilyFoote commented Aug 15, 2024

csirmazbendeguz commented Aug 15, 2024 • edited Loading

carltongibson commented Aug 16, 2024

csirmazbendeguz commented Aug 16, 2024 • edited Loading

collinanderson commented Aug 16, 2024 • edited Loading

csirmazbendeguz commented Aug 16, 2024 • edited Loading

csirmazbendeguz commented Aug 16, 2024 • edited Loading

collinanderson commented Aug 16, 2024

csirmazbendeguz commented Aug 17, 2024

LilyFoote commented Aug 19, 2024

csirmazbendeguz commented Aug 20, 2024 • edited Loading

carltongibson commented Aug 21, 2024

csirmazbendeguz commented Aug 21, 2024 • edited Loading

charettes commented Aug 25, 2024 • edited Loading

nessita left a comment

Choose a reason for hiding this comment

collinanderson commented Aug 29, 2024

csirmazbendeguz commented Aug 29, 2024 • edited Loading

nessita left a comment

Choose a reason for hiding this comment

csirmazbendeguz commented Sep 10, 2024 • edited Loading

csirmazbendeguz commented Aug 5, 2024 •

edited

Loading

csirmazbendeguz commented Aug 7, 2024 •

edited

Loading

csirmazbendeguz commented Aug 9, 2024 •

edited

Loading

nessita left a comment •

edited

Loading

csirmazbendeguz commented Aug 15, 2024 •

edited

Loading

csirmazbendeguz commented Aug 16, 2024 •

edited

Loading

collinanderson commented Aug 16, 2024 •

edited

Loading

csirmazbendeguz commented Aug 16, 2024 •

edited

Loading

csirmazbendeguz commented Aug 16, 2024 •

edited

Loading

csirmazbendeguz commented Aug 20, 2024 •

edited

Loading

csirmazbendeguz commented Aug 21, 2024 •

edited

Loading

charettes commented Aug 25, 2024 •

edited

Loading

csirmazbendeguz commented Aug 29, 2024 •

edited

Loading

csirmazbendeguz commented Sep 10, 2024 •

edited

Loading