Refs #373 -- Added Model._is_pk_set(). by csirmazbendeguz · Pull Request #18450 · django/django · GitHub
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refs #373 -- Added Model._is_pk_set(). #18450

Merged
merged 1 commit into from
Sep 9, 2024

Conversation

csirmazbendeguz
Copy link
Contributor

@csirmazbendeguz csirmazbendeguz commented Aug 5, 2024

[GSoC-2024]

Trac ticket number

ticket-373

Branch description

Added the Model._is_pk_set() function as suggested by @charettes and @LilyFoote (#18056 (comment)).

It's a simple utility function for checking if the primary key has been set on an object.

It's a pre-requisite for composite primary keys (#18056).

This function will get modified by the composite primary keys patch.

Checklist

  • This PR targets the main branch.
  • The commit message is written in past tense, mentions the ticket number, and ends with a period.
  • I have checked the "Has patch" ticket flag in the Trac system.
  • I have added or updated relevant tests.
  • I have added or updated relevant docs, including release notes if applicable.
  • I have attached screenshots in both light and dark modes for any UI changes.

@nessita
Copy link
Contributor

nessita commented Aug 5, 2024

Thank you @csirmazbendeguz for this PR. I have read the referenced comment and while I understand the rationale, I have some concerns about the chosen name (though I acknowledge that this PR follows the proposed name).

When I read this, I thought that is_set was referring to "isinstance(..., set)", i.e. that it answers "is this a set" (instead of the current semantic which is "is this set"). Risking some bikeshedding, I was thinking that perhaps a better name would be:

  • is_defined
  • value_set
  • is_value_set
  • has_value_set

cc/ @LilyFoote @charettes

@csirmazbendeguz
Copy link
Contributor Author

Thanks @nessita! It's not bikeshedding at all, I submitted this PR separately so it's easier to review and so we can have a discussion about it without worrying about all the other details of composite primary keys.

I like is_value_set. 👍

@csirmazbendeguz csirmazbendeguz changed the title Refs #373 - Added Field.is_set(). Refs #373 - Added Field.is_value_set(). Aug 6, 2024
@shangxiao
Copy link
Contributor

When I read this, I thought that is_set was referring to "isinstance(..., set)", i.e. that it answers "is this a set" (instead of the current semantic which is "is this set"). Risking some bikeshedding, I was thinking that perhaps a better name would be:

I haven't been part of the composite thread but when I saw this my first thought was similar: "has the value on this field been set?" … though when I glanced at the is_value_set() I still thought it had the same semantic just named differently 😅

There are 2 hard problems in computer science:

  1. Naming
  2. Caching things
  3. Off-by-1 errors

@csirmazbendeguz
Copy link
Contributor Author

Yes it feels a little awkward.
This function is supposed to be an abstraction for checking if the pk has been set on an object.
So maybe:

  • has_null?
  • is_null?

@sarahboyce
Copy link
Contributor

We do already have a Field.is_set() so is_set() would be consistent naming 🤷

I think I like is_null as I could see a similar is_empty definition which checks whether a value is in Field.empty_values

django/db/models/base.py Outdated Show resolved Hide resolved
@csirmazbendeguz
Copy link
Contributor Author

csirmazbendeguz commented Aug 7, 2024

Thanks @sarahboyce ! I adjusted it.

If we can't come up with a good name everybody likes then maybe this is not the best abstraction in which case I would be open to alternatives.

The problem I'm trying to solve is:

  1. Since composite primary keys are tuples, checking if pk is None is not enough. If a model has a composite primary key, we must check if the pk tuple contains a null value instead.
  2. Since there are multiple pk null checks, it makes sense to put this in a function. The question is where do we place this function and what do we call it?

I hope that makes sense.

I'm happy to discuss ideas.

Copy link
Contributor

@LilyFoote LilyFoote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_null sounds good to me (though I don't have a problem with the other names either).

@csirmazbendeguz csirmazbendeguz changed the title Refs #373 - Added Field.is_value_set(). Refs #373 - Added Field.is_null(). Aug 8, 2024
@nessita
Copy link
Contributor

nessita commented Aug 8, 2024

Thank you everyone for engaging in the naming conversation. I agree that is_null is a good option! +1 to that choice.

@csirmazbendeguz
Copy link
Contributor Author

csirmazbendeguz commented Aug 9, 2024

I wonder if this could be mistaken for the getter of Field.null.
Since this function is for checking the pk value, we could include pk or pk_value in the name.
There's already a closely related Field.get_pk_value_on_save function.

Here's a couple of other options:

  • is_none
  • is_pk_none
  • is_null_value
  • is_pk_null_value
  • is_pk_value_null
  • is_pk_value_none
  • is_pk_value_set
  • is_pk_value_defined
  • is_nullval
  • is_pk_nullval
  • contains_null
  • pk_contains_null
  • pk_value_contains_null
  • contains_none
  • pk_contains_none
  • pk_value_contains_none
  • pk_value_has_none
  • pk_value_has_null
  • has_none
  • pk_has_none

I think I like is_pk_value_set the most. I also like pk_value_contains_null, pk_value_contains_none.

Also, it doesn't have to be a classmethod, since get_pk_value_on_save is not a classmethod either. It's more flexible.

@csirmazbendeguz csirmazbendeguz changed the title Refs #373 - Added Field.is_null(). Refs #373 - Added Field.is_pk_value_set(). Aug 9, 2024
@sarahboyce sarahboyce changed the title Refs #373 - Added Field.is_pk_value_set(). Refs #373 -- Added Field.is_pk_value_set(). Aug 9, 2024
Copy link
Contributor

@sarahboyce sarahboyce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, given the context and what it is used for, also like is_pk_value_set

@csirmazbendeguz csirmazbendeguz changed the title Refs #373 -- Added Field.is_pk_value_set(). Refs #373 -- Added Model.is_pk_set(). Aug 9, 2024
@nessita
Copy link
Contributor

nessita commented Aug 9, 2024

After a more throrough review I have posted this in PR #18056:

I'm honestly confused about why we would put the "responsibility" of knowing about a PK value in a Field and not in the Model instance. More so, the new method as proposed does not use anything from the Field class (and I think it will not either in the CompositePrimaryKey override).
Could we do something like this for ModelBase (hand wavy untested code)?

  • for main:
    def is_pk_set(self):
        return self.pk is not None
  • this branch:
    def is_pk_set(self):
        if isinstance(self.pk, CompositePrimaryKey):
            return all(value is not None for value in self.pk)
        return self.pk is not None

@csirmazbendeguz
Copy link
Contributor Author

Yes @nessita , that could work. I agree it looks more readable as a Model function.

@csirmazbendeguz csirmazbendeguz force-pushed the ticket_373_is_set branch 3 times, most recently from b256dfa to 98ba84b Compare August 9, 2024 16:30
@csirmazbendeguz csirmazbendeguz changed the title Refs #373 -- Added Model.is_pk_set(). Refs #373 -- Added Model._is_pk_set(). Aug 9, 2024
@LilyFoote
Copy link
Contributor

One theoretical downside of the Model function I just thought of is that third-party fields can't opt-in to custom handling here. That said, I can't think of a third-party field that would want to do that.

@csirmazbendeguz
Copy link
Contributor Author

One theoretical downside of the Model function I just thought of is that third-party fields can't opt-in to custom handling here. That said, I can't think of a third-party field that would want to do that.

Yes, I also brought this up on Discord, this is called the open-closed principle. I agree with @nessita though, in this case the Model function is more intuitive, which is more important than extensibility.

Copy link
Contributor

@nessita nessita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @csirmazbendeguz! I have thoroughly reviewed this proposal. I think it looks good but I do have a few questions:

  1. Shall the if self.pk is None: in __hash__ (db/models/base.py:604) be replaced as well?
  2. Same question for these in the same file:
  • if obj.pk is None: in _prepare_related_fields_for_save
  • if obj and obj.pk is None: in the same method _prepare_related_fields_for_save
  • if self.pk is None: in delete
  • if self.pk is None in prepare_database_save
  • model_class_pk is not None which comes from model_class_pk = self._get_pk_val(model_class._meta) in _perform_unique_checks
  • self.pk is not None in _perform_date_checks

I see more matches in other files (when running grep -nR "pk.*None"), I know not all of them are suitable to use the helper but could you please confirm if you have reviewed each of these matches to evaluate replacement on a case by case basis?

Thank you!

@nessita
Copy link
Contributor

nessita commented Aug 15, 2024

Yes @collinanderson, that's correct. We could add this to the release notes maybe? @nessita wdyt?

I'm a little torn by this, let me explain where my head is at:

  • This PR started as a refactor to isolate the logic for checking if a Model's PK is set or not
  • My understanding is that this helper would be used internally only
  • Later on, in the Composite PK branch, this helper would be extended to "learn" how to handle composite PKs

The point raised by @collinanderson makes sense in that I can see how third party apps would benefit from using "a blessed method to check if a PK, compound or not, is set". With this information, I see two options:

  • We leave this work as is, considering _is_pk_set an internal, undocumented method, that is used to facilitate the progression of Composite PKs. Then, once composite PK is ready/landed, we revisit this goal of allowing third party apps to easily and robustly check for set PKs. Pro: we would expose a new method we know it works since it has already a IRL proof of usage with Composite PKs. Con is that this could delay adoption of the feature for third party apps.

Or:

  • We create a new feature ticket for providing a public and documented method to check if a PK is set and fully flesh the design, API, name, docs, etc in the ticket, for then repurpose this PR to solve that ticket. Pro: API for public consumption is ready faster and earlier. Con: this may be a "slower" path for composite PK feature as a whole.

@LilyFoote @charettes @carltongibson Would you have an opinion on the above?

@LilyFoote
Copy link
Contributor

I think we can land this now as an internal API and also open a new ticket about a public API, which can progress independently of the composite PK work.

@csirmazbendeguz
Copy link
Contributor Author

csirmazbendeguz commented Aug 15, 2024

We can rename this _is_pk_set -> is_pk_set, add some docs and that's it right? I can open a new ticket if it's necessary, but I'm not sure if there's much else to flesh out.

I assume it's going to take a while for maintainers to review the composite primary keys PR, so we might as well make this function a public API.

Also don't let this not being merged discourage you from reviewing #18056 , the commits are separated so you can review it separately.

@carltongibson
Copy link
Member

I'm struggling to see an easy way around the issue here. (Reminder to self: Issue is that I can't keep using obj.pk to test whether the object has been saved.)

Elsewhere I've used obj._state.adding but I don't think documenting that is the way forward.

So, I'd suggest just adding is_pk_set() a public documented property, and telling folks code will need to adopt to that if they want to support models with composite keys. It's Refs #373 ... — surely there has to be some disruption 😅

@collinanderson Would that be OK for you do you think? (I don't want to have missed your concern.)

Having read the thread here I'd say, add the new method as public to begin. I've likely missed the reason for the initial private phase there though. (Please continue if that's so.)

@csirmazbendeguz
Copy link
Contributor Author

csirmazbendeguz commented Aug 16, 2024

@carltongibson , I agree, but note that

Issue is that I can't keep using obj.pk to test whether the object has been saved.

This is close to the truth, but it can also used for other purposes, e.g. to check if default should be used for the pk field.
It's not a matter of obj._state.adding vs obj.is_pk_set(), the two are sometimes used together in checks.

obj._state.adding obj.is_pk_set() obj.save()
True True obj is created with provided pk
True False obj is created with default pk
False True obj is updated
False False obj is cloned

@csirmazbendeguz csirmazbendeguz changed the title Refs #373 -- Added Model._is_pk_set(). Refs #373 -- Added Model.is_pk_set(). Aug 16, 2024
@csirmazbendeguz csirmazbendeguz force-pushed the ticket_373_is_set branch 3 times, most recently from abca59d to e8a88d5 Compare August 16, 2024 16:20
@collinanderson
Copy link
Contributor

collinanderson commented Aug 16, 2024

@collinanderson Would that be OK for you do you think? (I don't want to have missed your concern.)

Yeah I'm realizing I don't actually have a super strong or informed opinion, and yes, it helps to keep in mind that obj.pk is None isn't a reliable way to catch new/creating/inserting cases. I suspect a lot of my code that generically checks for obj.pk is None should probably be checking obj._state.adding. I mentioned 3rd party libraries, but I don't actually have one in mind or a specific use case, so there's a chance a public api isn't actually needed.

Maybe just try to make it clear in the Changes to *pk* documentation (again, out of scope for this PR), that for composite fields obj.pk will always be a tuple (even in a (None, None) case) and doesn't ever magically appear as obj.pk == None or a falsey value.

And now that I say that, I'm thinking of another example of where doing some magic with pk could help, like setting obj.pk = None to reliably clear out the pk and use database defaults instead, (often used for copying objects). https://hackernoon.com/the-smart-way-to-clone-django-instance . But even that is unclear how copying an object generically should work in a composite field case, like a many to many table, because you might only want to clear out one of the fields. And again, that's separate from this is_pk_set() case.

I'm trying not to get too involved here but just wanted to encourage backward compatibility when possible, to help composite fields play well with the existing ecosystem. I'm happy with whatever happens, and I don't want to get in the way of progress on Django's oldest open ticket. This is awesome and really exciting. Thank you all.

@csirmazbendeguz
Copy link
Contributor Author

csirmazbendeguz commented Aug 16, 2024

Thanks for this @collinanderson !

It's good that you brought this up, because some backends do depend on this. Here's an example from microsoft/mssql-django.

There is a special case for obj.pk = None (composite.py:30), this is to retain compatibility with regular pks and to avoid changes to deletes. I agree with the sentiment that backward compatibility is good.

To set the defaults, there's get_pk_value_on_save (composite.py:105).

@csirmazbendeguz
Copy link
Contributor Author

csirmazbendeguz commented Aug 16, 2024

pk would need to return None if any of the fields is None. It's not ideal IMHO.

I was against doing this at first, but now that @collinanderson made me think about it, I can't come up with any arguments against it other than "I don't like it".

  • On one hand:

    >>> obj_1.pk
    (None, None)
    >>> obj_2.pk
    (123, None)
    

    This is more consistent with the idea of composite primary keys being tuples.

  • On the other hand:

    >>> obj_1.pk
    None
    >>> obj_2.pk
    None
    

    This gives us compatibility with obj.pk is None checks. We can document that for compatibility reasons, obj.pk is always None unless all fields are set.

So I would like to ask everyone to give me your opinion. What do you think? Which one is better?

@collinanderson
Copy link
Contributor

Could you ever have a composite-primary-key where one or both of the columns are nullable? In which case my idea of returning None would probably be bad. Too much "magic" can bad.

@csirmazbendeguz
Copy link
Contributor Author

Could you ever have a composite-primary-key where one or both of the columns are nullable?

A primary key column cannot be nullable (composite or not).

@LilyFoote
Copy link
Contributor

I think if we do special-case here we should only do so for the case where every field is None, not when some are set.

Beyond that, I can see arguments either way and I'm not sure which is better.

@csirmazbendeguz
Copy link
Contributor Author

csirmazbendeguz commented Aug 20, 2024

Thanks @LilyFoote !

If we returned None only if e.g. obj.pk == (None, None) that wouldn't be enough, we would still need to add the Model.is_pk_set() function.

@carltongibson
Copy link
Member

So I would like to ask everyone to give me your opinion. What do you think? Which one is better?

I can't say definitively but, I'm leaning to the less magic, more explicit tuple response, even for the (None, None) cases, despite that needing the new API (which probably should be used anyway right...? — If I followed that last twist in the conversation right.)

I can't shake the feeling that if we convert to None just in the one case (or at all TBH) we'll be saving up some horrible need for a workaround much later.

🤷

@csirmazbendeguz
Copy link
Contributor Author

csirmazbendeguz commented Aug 21, 2024

despite that needing the new API (which probably should be used anyway right...? — If I followed that last twist in the conversation right.

What I meant is if we're going to move this is_pk_set logic to the new descriptor (composite.py:24: CompositeAttribute.__get__) we gotta check if any of the fields is None, not only if all the fields are None.

@charettes
Copy link
Member

charettes commented Aug 25, 2024

Thanks for the ping and sorry for the late answer. I'll try to summarize all the points brought so far to

  1. Managing to preserve pk is None behavior would reduce backward compatibility concerns at the cost of possibly unintuitive magic (if any member of the composition are None)
  2. Should we have a public of private _?is_pk_set method (assuming we can agree on a name) instead

While I think that we should strive to reduce the backward compatibility burden imposed by the introduction of composite primary keys I don't think that focusing on making pk return None in this case is worthy for two reasons.

The first one being that one way or the other third-party app that ought to support composite primary keys will have to be adapted to expect pk to return a tuple under some conditions, in this sense pk is None is just one pattern that will require adjustments.

The second one is that pk is None has always been an implicit way through convention to get an idea of how to treat a Model instance. A pattern, just like checking for _state.adding, that has been replicated as a symptom of a lack of a blessed way to identify the persistence state of a Model instance. In this sense I think that we should see the introduction of composite primary keys as an opportunity to direct the ecosystem towards a blessed way of doing things instead of encouraging this convention further.

I really like the table that @csirmazbendeguz put together in #18450 (comment) as I believe it is the closest to what I would expect a proper public API to look like if we want to commit to one; something that resolves both the pk is None and _state.adding ambiguity at the same time instead of focusing solely on the notion of pk.

The approach I would favor here is to start with a _is_pk_set API merged into main to validate its merit through documentation but mention it is likely to be superseded eventually through a deprecation period. If we're able to come up with a solution that supersedes both pk is None and _state.adding checks in a nice way before its release in a distinct work stream it won't be an issue otherwise we'll get to learn through before committing to a solution.

Copy link
Contributor

@nessita nessita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @csirmazbendeguz! This is almost there! 🏆

I added a few comments and I have this "philosophical" concern after close code inspection:

There are so many occurrences of if NOT obj._is_pk_set() (in fact the majority of this diff is about checking "is not set" vs "is set) which makes me wonder if we should also provide:

def _is_pk_unset(self):
    return self.pk is None  # pseudocode

thinking also in the future implementation for composite PKs being (note usage of all vs any):

def _is_pk_unset(self):
    return any(i is None for i in self.pk)  # pseudocode

which would not require to check every element in the whole tuple, just to negate the result in the call site. Does this make sense? What do you think?

django/db/models/base.py Outdated Show resolved Hide resolved
django/db/models/fields/related.py Outdated Show resolved Hide resolved
docs/ref/models/instances.txt Outdated Show resolved Hide resolved
docs/ref/models/instances.txt Outdated Show resolved Hide resolved
docs/releases/5.2.txt Outdated Show resolved Hide resolved
docs/releases/5.2.txt Outdated Show resolved Hide resolved
@collinanderson
Copy link
Contributor

I'm also realizing it could be more clear to call this method all_pk_set() to be clear it's not any_pk_set() but kinda bikeshedding at some point.

@csirmazbendeguz
Copy link
Contributor Author

csirmazbendeguz commented Aug 29, 2024

Thanks @nessita !

I addressed your comments.

This is just my opinion but _is_pk_set reads better, I wouldn't add _is_pk_unset.

which would not require to check every element in the whole tuple

My plan is to have the following logic in the composite pk patch (it won't check every element):

def _is_pk_set(self, meta=None):
    pk_val = self._get_pk_val(meta)
    return not (
        pk_val is None
        or (isinstance(pk_val, tuple) and any(f is None for f in pk_val))
    )

I'm also realizing it could be more clear to call this method all_pk_set() to be clear it's not any_pk_set() but kinda bikeshedding at some point.

I personally like _is_pk_set more - the words all or any wouldn't make sense until composite primary keys are merged.

@csirmazbendeguz csirmazbendeguz changed the title Refs #373 -- Added Model.is_pk_set(). Refs #373 -- Added Model._is_pk_set(). Sep 5, 2024
Copy link
Contributor

@nessita nessita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @csirmazbendeguz for this work and your patience bearing with us in the developing of the final solution. 🌟

I have pushed minor edits and I will merge once CI is green. Onward to the composite PK PR!

@nessita nessita merged commit 5865ff5 into django:main Sep 9, 2024
36 checks passed
@csirmazbendeguz
Copy link
Contributor Author

csirmazbendeguz commented Sep 10, 2024

Thanks @nessita ! This is great news. 🎉 I rebased the CompositePrimaryKey patch, it's ready for review. This is the PR that's going to lay the foundation for composite primary keys in Django. I'm not sure if you're gonna have the time to review it though, it's quite a large patch. But any feedback is appreciated if someone here has the time for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants