An exercise to help build the right mental model for Python data.

The “Solution” link visualizes execution and reveals what’s actually happening using memory_graph: https://github.com/bterwijn/memory_graph

  • JATothrim_v2@programming.dev

    For immutable types it is the same though.

    The most twisted thing I learned is that all ints below a fixed limit share the same id() result, so

    >>> x = 1
    >>> id(x)
    135993914337648
    >>> y = 1
    >>> id(y)
    135993914337648
    

    But then suddenly:

    >>> x = 1000000
    >>> id(x)
    135993893250992
    >>> y = 1000000
    >>> id(y)
    135993893251056
    

    Using id() as a key in dict() may get you into trouble.
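    A minimal sketch of the safer habit: compare values with == and reserve `is` (equivalently, comparing the id() of two live objects) for asking whether two names refer to the same object. The ints are built at runtime with int() so the compiler cannot share one constant between them. The `is` result is a CPython implementation detail, not a language guarantee.

```python
# Values built at runtime, so the compiler cannot fold them into one constant.
big_a = int("1000000")
big_b = int("1000000")

print(big_a == big_b)  # equal values: True
print(big_a is big_b)  # same object? False in CPython, but don't rely on either answer

# While both objects are alive, `is` and comparing id() always agree.
print((big_a is big_b) == (id(big_a) == id(big_b)))  # True
```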

    • AckPhttt@beehaw.org

      Using id() as a key in dict() may get you into trouble.

      IMO, using id() as a key would never be a good idea under any circumstance.

      Two different (and even unequal) objects can have the same id():

      >>> x = [1]
      >>> id(x)
      4527263424
      >>> del x
      >>> x = [2]
      >>> id(x)
      4527263424
      >>> del x
      >>> y = [3]
      >>> id(y)
      4527263424
      
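      The reuse above only happens because each list is freed before the next one is created. The documented guarantee for id() is that it is unique and constant for an object during its lifetime, so objects that are alive at the same time never share an id. A small check of that guarantee:

```python
# Keep 1000 distinct lists alive at once by holding them in a container.
alive = [[i] for i in range(1000)]

# Every simultaneously-alive object has its own id().
ids = {id(obj) for obj in alive}
print(len(ids) == len(alive))  # True
```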

      Note: in CPython, once the hash matches, a dictionary lookup already checks whether the stored key is the very same object (the identity test id() would give you) before falling back to ==, so there’s no need to try keying on id() as an optimization.
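      A quick way to see that identity shortcut, assuming CPython: the hypothetical `Noisy` class below counts calls to `__eq__` and even claims to be unequal to itself, yet looking the same object up in a dict succeeds without `__eq__` ever running. The familiar float NaN case shows the same effect.

```python
class Noisy:
    """Hypothetical key type that counts equality checks."""

    def __init__(self):
        self.eq_calls = 0

    def __hash__(self):
        return 1  # constant hash, so lookups must rely on the key comparison

    def __eq__(self, other):
        self.eq_calls += 1
        return False  # claims to be unequal to everything, even itself


key = Noisy()
table = {key: "found"}

value = table[key]                # identity matches, so __eq__ is skipped
calls_during_lookup = key.eq_calls
print(value, calls_during_lookup)  # found 0
print(key == key)                  # False: the == operator has no identity shortcut

nan = float("nan")
print(nan == nan, nan in {nan: 1})  # False True
```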

      Edit: in case it wasn’t clear above, the objects with the same id() don’t all exist at the same time. But if you store their ids as keys, you’d have to make sure each object stays alive for as long as its id is stored; otherwise a new object can reuse that id and be mistaken for the old one. The dictionary does this for you when you use the object itself as the key (it holds a reference to it), but nothing does it automatically when you key on the id.
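      If you genuinely need to key on identity (for example, to attach data to unhashable objects such as lists), the fix implied above is to keep each object alive for as long as its id is stored. A minimal sketch with a hypothetical `IdentityMap` class; the standard library’s `copy.deepcopy` memo is keyed by id() in a similar spirit.

```python
class IdentityMap:
    """Hypothetical mapping keyed by object identity rather than equality.

    Each entry stores a strong reference to its key object, so that object
    cannot be freed (and its id() reused) while the entry exists.
    """

    def __init__(self):
        self._entries = {}  # id(obj) -> (obj, value)

    def __setitem__(self, obj, value):
        self._entries[id(obj)] = (obj, value)

    def __getitem__(self, obj):
        stored, value = self._entries[id(obj)]
        # stored is obj: the held reference keeps that id from being recycled
        return value

    def __contains__(self, obj):
        return id(obj) in self._entries

    def __delitem__(self, obj):
        del self._entries[id(obj)]

    def __len__(self):
        return len(self._entries)


a = [1]
b = [1]  # equal to a, but a different object
labels = IdentityMap()
labels[a] = "first"
labels[b] = "second"
print(labels[a], labels[b], len(labels))  # first second 2
```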

      Other note: since you phrased it as “all ints below a fixed limit share the same id() result”, I’d suggest a better way to think of it semantically: certain constant objects are pre-allocated, so they behave somewhat like singletons. There is usually only one int(1) object, because CPython keeps a pre-allocated pool of common small ints (-5 through 256), which are used constantly for indexing anyway.
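      A sketch to see where CPython’s small-int pool ends. Each int is parsed from a string at runtime, so outside the pool every call produces a fresh object. The scan only checks the upper boundary, and the exact range is a CPython implementation detail that other interpreters or future versions may change.

```python
def shares_object(n):
    """True if two ints with value n, each built at runtime, are the same object."""
    return int(str(n)) is int(str(n))


# Scan non-negative values past the expected end of the pool.
cached = [n for n in range(0, 300) if shares_object(n)]
print(cached[-1])  # 256 on current CPython
```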

      Similarly, many short string constants are ‘interned’ (in CPython, mostly those that look like identifiers). They aren’t pre-created at startup; instead, because string constants are known at byte-compile time, CPython can check them then and keep only one copy of each, which saves memory. But if you construct a string with code at runtime, it doesn’t bother to check all existing strings for an identical one. So for example:

      >>> x = 'ab'
      >>> y = 'ab'
      >>> id(x) == id(y)
      True
      >>> s = 'a'
      >>> s = s + 'b'
      >>> id(s) == id(x)
      False
      >>> s == x == y
      True
      

      But you can force that check with sys.intern(). Python 3 moved it from the builtin intern() into the sys module, which makes it a bit more tedious to reach, since it’s really an implementation detail:

      >>> import sys
      >>> s = sys.intern(s)
      >>> id(s) == id(x)
      True
      
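      Where sys.intern() can pay off in practice is when a program builds many equal strings at runtime, such as field names parsed out of records; interning them keeps one shared copy per distinct value. A minimal sketch with hypothetical record data:

```python
import sys

# Hypothetical parsed records: each key string is built at runtime by join().
raw_rows = [("user", "_id"), ("user", "_id"), ("created", "_at")]
keys = ["".join(parts) for parts in raw_rows]
print(keys[0] == keys[1], keys[0] is keys[1])  # True False

# sys.intern returns one canonical object per distinct string value.
interned = [sys.intern(k) for k in keys]
print(interned[0] is interned[1])      # True
print(len({id(k) for k in interned}))  # 2 distinct objects for 2 distinct values
```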

      Sorry for the verbose reply; hope it helped.