Here's a fuller interactive session that will help me explain what's going on (Python 2.6 on Windows XP 32-bit, but it doesn't matter really):
>>> import sys
>>> sys.getsizeof([])
36
>>> sys.getsizeof([1])
40
>>> lst = []
>>> lst.append(1)
>>> sys.getsizeof(lst)
52
>>>
Note that the empty list is a bit smaller than the one with [1]
in it. When an element is appended, however, it grows much larger.
The reason for this is the implementation details in Objects/listobject.c
, in the source of CPython.
Empty list
When an empty list []
is created, no space for elements is allocated - this can be seen in PyList_New
. 36 bytes is the amount of space required for the list data structure itself on a 32-bit machine.
List with one element
When a list with a single element [1]
is created, space for one element is allocated in addition to the memory required by the list data structure itself. Again, this can be found in PyList_New
. Given size
as argument, it computes:
nbytes = size * sizeof(PyObject *);
And then has:
if (size <= 0)
op->ob_item = NULL;
else {
op->ob_item = (PyObject **) PyMem_MALLOC(nbytes);
if (op->ob_item == NULL) {
Py_DECREF(op);
return PyErr_NoMemory();
}
memset(op->ob_item, 0, nbytes);
}
Py_SIZE(op) = size;
op->allocated = size;
So we see that with size = 1
, space for one pointer is allocated. 4 bytes (on my 32-bit box).
Appending to an empty list
When calling append
on an empty list, here's what happens:
PyList_Append
calls app1
app1
asks for the list's size (and gets 0 as an answer)
app1
then calls list_resize
with size+1
(1 in our case)
list_resize
has an interesting allocation strategy, summarized in this comment from its source.
Here it is:
/* This over-allocates proportional to the list size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
*/
new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
/* check for integer overflow */
if (new_allocated > PY_SIZE_MAX - newsize) {
PyErr_NoMemory();
return -1;
} else {
new_allocated += newsize;
}
Let's do some math
Let's see how the numbers I quoted in the session in the beginning of my article are reached.
So 36 bytes is the size required by the list data structure itself on 32-bit. With a single element, space is allocated for one pointer, so that's 4 extra bytes - total 40 bytes. OK so far.
When app1
is called on an empty list, it calls list_resize
with size=1
. According to the over-allocation algorithm of list_resize
, the next largest available size after 1 is 4, so place for 4 pointers will be allocated. 4 * 4 = 16 bytes, and 36 + 16 = 52.
Indeed, everything makes sense :-)