Python 源码剖析 - PyListObject 的创建与维护

时间：12-14来源：作者：点击数：

2.1 创建

Python 中只提供了唯一一种创建 PyListObject 对象的方法— PyList_New：

[listobject.c] 
PyObject* PyList_New(int size)
{
    PyListObject *op;
    size_t nbytes;
 
    nbytes = size * sizeof(PyObject *);
    /* Check for overflow */
    if (nbytes / sizeof(PyObject *) != (size_t)size)
        return PyErr_NoMemory();
 
    // 为 PyListObject 申请空间
    if (num_free_lists) {
        // 使用缓冲池
        num_free_lists--;
        op = free_lists[num_free_lists];
        _Py_NewReference((PyObject *)op);
    } else {
        // 缓冲池中没有可用的对象，创建对象
        op = PyObject_GC_New(PyListObject, &PyList_Type);
    }
    // 为PyListObject对象中维护的元素列表申请空间
    if (size <= 0)
        op->ob_item = NULL;
    else {
        op->ob_item = (PyObject **) PyMem_MALLOC(nbytes);
        memset(op->ob_item, 0, nbytes);
    }
    op->ob_size = size;
    op->allocated = size;
    _PyObject_GC_TRACK(op);
    return (PyObject *) op;
}

非常清晰的创建过程，首先进行一些例行检查；然后为 PyListObject 申请内存空间，创建 PyListObject 对象。再成功创建 PyListObject 对象之后，就需要为这个对象申请存储PyObject*的内存空间，内存空间的大小由传入的参数确定，传入的参数决定了新创建的 PyListObject 可以容纳多少个 PyObject*。最后将 PyListObject 对象的存储区域清空，并将 ob_size 和allocated都设置为size，为内存管理做好准备。

我们看到，在 list 的实现中，同样用到了缓冲池的技术，创建PyListObject的时候会首先检查free_lists中是否还有没有使用的PyListObject，如果有就直接使用，只有在 free_lists中的 PyListObject 全部用完之后才会通过malloc在堆上创建新的PyListObjec。free_lists的默认大小为80，对于一般的小程序而言，基本上只会使用到 PyListObject 缓冲池。

/* Empty list reuse scheme to save calls to malloc and free */ 
#define MAXFREELISTS 80 
static PyListObject *free_lists[MAXFREELISTS]; 
static int num_free_lists = 0;

有一点非常奇特的是，在free_lists中缓存的只是PyListObject*，那么这个缓冲池里的PyListObject究竟是在什么地方被创建的呢？

列位看官，花开两朵，各表一只。我们先把这个问题放一放，看一看，在Python开始运行后，当第一个PyListObject对象被创建时的情形。嗯，这有点像上帝创世纪，挺有趣的，不是吗？

在第一个PyListObject创建的时候，我们看到这时num_free_lists是0，所以会调用PyObject_GC_New在堆上创建一个新的PyListObject对象，假设我们创建的PyListObject是包含6个元素的PyListObject。

一个什么东西都没有的 List 当然是很无趣的，我们来尝试向里边添加一点东西，把一个整数对象100放到第3个位置上去：

[listobject.c] 
int PyList_SetItem(register PyObject *op, register int i, register PyObject   *newitem)
{
    register PyObject *olditem;
    register PyObject **p;
    if (!PyList_Check(op)) {
        ……
    }
    if (i < 0 || i >= ((PyListObject *)op) -> ob_size) {
        Py_XDECREF(newitem);
        PyErr_SetString(PyExc_IndexError,
                "list assignment index out of range");
        return -1;
    }
    p = ((PyListObject *)op) -> ob_item + i;
    olditem = *p;
    *p = newitem;
    Py_XDECREF(olditem);
    return 0;
}

先是进行类型检查，然后进行索引的有效性检查，当一切都OK后，将待加入的PyObject指针放到指定的位置，然后将这个位置原来存放的对象的引用计数减1。这里的olditem很可能会是NULL，比如向一个新创建的PyListObject对象加入元素，就会碰到这样的情况，所以这里必须使用Py_XDECREF。

好了，现在我们的PyListObject对象再不是当年那个一穷二白的可怜虫了：

2.2 添加

接下来我们再试着向这个PyListObject中插入一个元素，好吧，就在100之前插入99，99确实是在100之前的，这个地球人都知道

[listobject.c] 
int PyList_Insert(PyObject *op, int where, PyObject *newitem)
{
    ......//类型检查
    return ins1((PyListObject *)op, where, newitem);
}
static int ins1(PyListObject *self, int where, PyObject *v)
{
    int i, n = self->ob_size;
    PyObject **items;
    ......
    if (list_resize(self, n+1) == -1)
        return -1;
    if (where < 0) {
        where += n;
        if (where < 0)
            where = 0;
    }
    if (where > n)
        where = n;
    items = self->ob_item;
    for (i = n; --i >= where; )
        items[i+1] = items[i];
    Py_INCREF(v);
    items[where] = v;
    return 0;
}

首先当然是检查指针的有效性，然后是检查当前PyListObject中已经有多少个元素了，如果这个元素个数已经达到了INT_MAX，那么很遗憾，再不能插入任何元素了。

在通过了检查之后，我们看到，调用了一个list_resize函数，从函数名我们可以想象，这个函数改变了PyListObject所维护的PyObject*列表的大小：

[listobject.c]
static int list_resize(PyListObject *self, int newsize)
{
    PyObject **items;
    size_t new_allocated;
    int allocated = self->allocated;
    /* Bypass realloc() when a previous overallocation is large enough
       to accommodate the newsize.  If the newsize falls lower than half
       the allocated size, then proceed with the realloc() to shrink the list.
    */
    if (allocated >= newsize && newsize >= (allocated >> 1)) {
        assert(self->ob_item != NULL || newsize == 0);
        self->ob_size = newsize;
        return 0;
    }
    /* This over-allocates proportional to the list size, making room
     * for additional growth.  The over-allocation is mild, but is
     * enough to give linear-time amortized behavior over a long
     * sequence of appends() in the presence of a poorly-performing
     * system realloc().
     * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
     */
    new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6) + newsize;
    if (newsize == 0)
        new_allocated = 0;
    items = self->ob_item;
    if (new_allocated <= ((~(size_t)0) / sizeof(PyObject *)))
        PyMem_RESIZE(items, PyObject *, new_allocated);
    else
        items = NULL;
    if (items == NULL) {
        PyErr_NoMemory();
        return -1;
    }
    self->ob_item = items;
    self->ob_size = newsize;
    self->allocated = new_allocated;
    return 0;
}

插入的时候，Python 分四种情况处理：

newsize > ob_size && newsize < allocated ：简单调整 ob_size 值。
newsize < ob_size && newsize > allocated/2 ：简单调整 ob_size 值。
newsize < ob_size && newsize < allocated/2 ：调用 realloc，重新分配空间。
newsize > ob_size && newsize > allocated ：调用 realloc，重新分配空间。

可以看出，Python 对内存可谓是殚精竭虑了，在某种newsize < ob_size的情况下还会重新申请内存空间。

将 PyListObject 的空间调整之后，接着就应该搬动元素了，才好挪出一个位置，插入我们想要插入的元素。在Python中，list支持一个很有趣的特性，就是负值索引，比如一个 n 个元素的 list，lst[n]，那么 lst[-1] 就是 lst[n-1]。这一点在插入元素时得到了体现。

static int ins1(PyListObject *self, int where, PyObject *v)
{
    ......
    if (where < 0) {
        where += n;
        if (where < 0)
            where = 0;
    }
    if (where > n)
        where = n;
    items = self->ob_item;
    for (i = n; --i >= where; )
        items[i+1] = items[i];
    Py_INCREF(v);
    items[where] = v;
    return 0;
}

可以看到，不管你插入在什么位置，对于Python来说，都是合法的，它会自己调整插入的位置。在确定了插入的位置之后，会将插入点之后的所有元素向下挪动一个位置，这样，在插入点就能空出一个位置来，于是大功告成，我们想插入的元素有了容身之地了。

熟悉C++的朋友一定看出来了，这种处理插入的方法实际上与C++中的vector完全一致。实际上，Python中的PyListObject与C++中的vector非常相似，相反，它和C++中的list却是大相径庭的。

值得注意的是，通过与vector类似的内存管理机制，现在，PyListObject的allocated已经变成10了，而ob_size却只有7。

Python 中，list 有一个被广泛使用的操作append。这个操作与上面所描述的插入操作非常类似：

[listobject.c]
int PyList_Append(PyObject *op, PyObject *newitem)
{
    if (PyList_Check(op) && (newitem != NULL))
        return app1((PyListObject *)op, newitem);
    PyErr_BadInternalCall();
    return -1;
}
static PyObject* listappend(PyListObject *self, PyObject *v)
{
    if (app1(self, v) == 0)
        Py_RETURN_NONE;
    return NULL;
}
static int app1(PyListObject *self, PyObject *v)
{
    int n = PyList_GET_SIZE(self);
......
    if (list_resize(self, n+1) == -1)
        return -1;
    Py_INCREF(v);
    PyList_SET_ITEM(self, n, v);
    return 0;
}

只是需要注意的是，在进行 append 动作的时候，添加的元素是添加在第ob_size个位置上的，而不是第allocated个位置上。下图展示了append元素101之后的PyListObject对象：

在 app1 中调用 list_resize 时，由于 newsize（8）在7和10之间，所以不需要再分配内存空间。直接将101放置在第8个位置上即可。

2.3 删除

除了创建，插入这些操作，一个容器至少还应该有删除操作，PyListObject自然也不会例外。图5展示了一个使用PyListObject中删除元素功能的例子：

当在 Python 交互环境中输入 l.remove(3) 时，PyListObject 中的 listremove 操作会被激活：

[listobject.c] 
static PyObject * listremove(PyListObject *self, PyObject *v)
{
    int i;
    for (i = 0; i < self->ob_size; i++) {
        int cmp = PyObject_RichCompareBool(self->ob_item[i], v, Py_EQ);
        if (cmp > 0) {
            if (list_ass_slice(self, i, i+1,(PyObject *)NULL) == 0)
                Py_RETURN_NONE;
            return NULL;
        }
        else if (cmp < 0)
            return NULL;
    }
    PyErr_SetString(PyExc_ValueError, "list.remove(x): x not in list");
    return NULL;
}

在遍历 PyListObject 中所有元素的过程中，将待删除的元素与PyListObject中的每个元素一一进行比较，比较操作是通过PyObject_RichCompareBool完成的。如果发现了匹配，则调用 list_ass_slice 进行删除操作：

[listobject.c]
static int list_ass_slice(PyListObject *a, int ilow, int ihigh, PyObject *v)
{
   PyObject *recycle_on_stack[8];
    PyObject **recycle = recycle_on_stack; /* will allocate more if needed */ 
    PyObject **item;
    int n; /* # of elements in replacement list */
    int norig; /* # of elements in list getting replaced */
    int d; /* Change in size */
    #define b ((PyListObject *)v)
    if (v == NULL)
        n = 0;
    else {
    ……
    }
    norig = ihigh - ilow;
    d = n - norig;
    item = a->ob_item;
    //[1] 
    s = norig * sizeof(PyObject *); 
    if (s > sizeof(recycle_on_stack)) { 
        recycle = (PyObject **)PyMem_MALLOC(s); 
        if (recycle == NULL) { 
            PyErr_NoMemory(); 
            goto Error; 
        } 
    } 
    memcpy(recycle, &item[ilow], s); 
    //[2] 
        if (d < 0) { /* Delete -d items */
            memmove(&item[ihigh+d], &item[ihigh],
    (a->ob_size - ihigh)*sizeof(PyObject *));
            list_resize(a, a->ob_size + d);
            item = a->ob_item;
        }
    ……
    //[3] 
    for (k = norig - 1; k >= 0; --k) 
        Py_XDECREF(recycle[k]);
    #undef b
}

当传入的v是NULL时，就会进行删除的动作，可以看到，这正是listremove期望的动作。首先会获得需要删除的元素个数，这是通过ihigh-ilow得到的，在删除元素这种情况下，这个值显然是1。在获得了需要删除的元素个数之后，在代码的[2]处，list_ass_slice通过memmove来达到删除元素的目的。

很明显了，在 PyListObject 中，如果是在对象维护的列表中部删除元素的话，一定会引起内存的搬移动作，这一点跟C++中的vector是完全一致的，而与C++中的list则完全不同。

图6展示了删除元素 100 的过程：

值得注意的一点是，实际上，list_ass_slice不仅仅是用做删除元素，它还可以进行插入元素的动作。它的完整功能如下：

a[ilow:ihigh] = v if v != NULL. 
del a[ilow:ihigh] if v == NULL.

图7展示了一个 list_ass_slice 进行插入元素操作的例子，有兴趣的朋友可以对list_ass_slice进行深入研究。

方便获取更多学习、工作、生活信息请关注本站微信公众号 城东书院微信服务号

来顶一下

返回首页

上一篇:Python 源码剖析 - 对象机制下一篇:Python 源码剖析 - PyListObject 对象

考了这些证书的可享个	怎样用 PS 把电子文档
记录在本地电脑部署自	围棋入门教程十天即