python - PyArray_Check gives Segmentation Fault with Cython/C++

Thank you all in advance.

I am wondering what the right way is to #include all the numpy headers and how to correctly use Cython and C++ to parse numpy arrays. Below is my attempt:

// cpp_parser.h 
#ifndef _FUNC_H_
#define _FUNC_H_

#include <Python.h>
#include <numpy/arrayobject.h>

void parse_ndarray(PyObject *);

#endif

I know this might be wrong; I also tried other options, but none of them worked.

// cpp_parser.cpp
#include "cpp_parser.h"
#include <iostream>

using namespace std;

void parse_ndarray(PyObject *obj) {
    if (PyArray_Check(obj)) { // this throws seg fault
        cout << "PyArray_Check Passed" << endl;
    } else {
        cout << "PyArray_Check Failed" << endl;
    }
}

The PyArray_Check routine causes a segmentation fault. PyArray_CheckExact doesn't crash, but it is not exactly what I want. My Cython declaration file (parser.pxd) is:

# parser.pxd
cdef extern from "cpp_parser.h": 
    cdef void parse_ndarray(object)

and the implementation file is:

# parser.pyx
import numpy as np
cimport numpy as np

def py_parse_array(object x):
    assert isinstance(x, np.ndarray)
    parse_ndarray(x)

The setup.py script is:

# setup.py
from distutils.core import setup, Extension
from Cython.Build import cythonize

import numpy as np

ext = Extension(
    name='parser',
    sources=['parser.pyx', 'cpp_parser.cpp'],
    language='c++',
    include_dirs=[np.get_include()],
    extra_compile_args=['-fPIC'],
)

setup(
    name='parser',
    ext_modules=cythonize([ext]),
)

And finally the test script:

# run_test.py
import numpy as np
from parser import py_parse_array

x = np.arange(10)
py_parse_array(x)

I have created a git repo with all the scripts above: https://github.com/giantwhale/study_cython_numpy/

1 Answer

Quick Fix (read on for more details and a more sophisticated approach):

You need to initialize the variable PyArray_API in every cpp file in which you use numpy functionality by calling import_array():

//a trick to ensure import_array() is called when the *.so is loaded;
//it runs exactly once
int init_numpy(){
    import_array(); // raises a Python error if unsuccessful
    return 0;
}

const static int numpy_initialized = init_numpy();

void parse_ndarray(PyObject *obj) { // can be called any number of times
    if (PyArray_Check(obj)) {
        cout << "PyArray_Check Passed" << endl;
    } else {
        cout << "PyArray_Check Failed" << endl;
    }
}

One could also use _import_array, which returns a negative number on failure, to implement custom error handling. See here for the definition of import_array.
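For example, a sketch of such custom error handling could look like this (it assumes Python.h and numpy/arrayobject.h are already included, as in cpp_parser.h above; the helper name init_numpy_checked is just illustrative):

//sketch: initialize numpy's C-API with explicit error reporting
//instead of relying on the bare import_array() macro
#include <iostream>

static int init_numpy_checked(){
    if (_import_array() < 0) {          // negative return value means failure
        PyErr_Print();                  // print the underlying Python error
        std::cerr << "numpy C-API could not be imported" << std::endl;
        return -1;
    }
    return 0;
}

const static int numpy_initialized_checked = init_numpy_checked();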

Warning: As pointed out by @isra60, _import_array()/import_array() can only be called once Python is initialized, i.e. after Py_Initialize() has been called. This is always the case for an extension, but not necessarily when the Python interpreter is embedded, because numpy_initialized would then be initialized before main() starts. In that case the "initialization trick" should not be used; instead, call init_numpy() after Py_Initialize().
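For the embedded case, a minimal sketch could look like this (it assumes you compile against the Python and numpy headers and link against libpython; the file name embed_main.cpp is just illustrative):

//embed_main.cpp: initialize the interpreter first, then numpy's C-API
#include <Python.h>
#include <numpy/arrayobject.h>

int main(){
    Py_Initialize();            // the interpreter must be up before any numpy call
    if (_import_array() < 0){   // sets up PyArray_API; negative on failure
        PyErr_Print();
        return 1;
    }
    // ... PyArray_Check and friends are now safe to use ...
    Py_Finalize();
    return 0;
}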


Sophisticated solution:

NB: For why setting PyArray_API is needed at all, see this SO answer: it makes it possible to postpone symbol resolution until run time, so numpy's shared objects aren't needed at link time and don't have to be on the dynamic-library path (Python's system path is enough).

The proposed solution is quick, but if more than one cpp file uses numpy, you end up with many separate instances of PyArray_API, each of which has to be initialized.

This can be avoided if PyArray_API isn't defined as static but as extern in all but one translation unit. For those translation units the NO_IMPORT_ARRAY macro must be defined before numpy/arrayobject.h is included.

However, we need one translation unit in which this symbol is actually defined; for that translation unit the macro NO_IMPORT_ARRAY must not be defined.

However, without defining the macro PY_ARRAY_UNIQUE_SYMBOL we only get a static symbol, i.e. one that is not visible to other translation units, so the linker will fail. The reason for this design: if two libraries each defined a global PyArray_API, we would have multiple definitions of the same symbol and the linker would fail, i.e. we could not use both libraries together.

Thus, by defining PY_ARRAY_UNIQUE_SYMBOL as MY_FANCY_LIB_PyArray_API prior to every include of numpy/arrayobject.h, we get our own PyArray_API name that does not clash with other libraries.

Putting it all together:

A: use_numpy.h - your header for including numpy functionality, i.e. numpy/arrayobject.h:

//use_numpy.h

//your fancy name for the dedicated PyArray_API symbol
#define PY_ARRAY_UNIQUE_SYMBOL MY_PyArray_API

//NO_IMPORT_ARRAY must be defined in every translation unit
//except the one that actually initializes the API
#ifndef INIT_NUMPY_ARRAY_CPP
    #define NO_IMPORT_ARRAY //for usual translation units
#endif

//now everything is set up, just include the numpy arrays:
#include <numpy/arrayobject.h>

B: init_numpy_api.cpp - a translation unit for initializing the global MY_PyArray_API:

//init_numpy_api.cpp

//first make clear that this is the translation unit
//which initializes MY_PyArray_API
#define INIT_NUMPY_ARRAY_CPP

//now include arrayobject.h, which defines
//void **MY_PyArray_API
#include "use_numpy.h"

//now the old trick with initialization:
int init_numpy(){
    import_array(); // raises a Python error if unsuccessful
    return 0;
}
const static int numpy_initialized = init_numpy();

C: just include use_numpy.h whenever you need numpy; it will declare extern void **MY_PyArray_API:

//example
#include "use_numpy.h"

...
PyArray_Check(obj); // works, no segmentation error

Warning: It should not be forgotten that, for the initialization trick to work, Py_Initialize() must already have been called.
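Applied to the code from the question, cpp_parser.cpp would then look roughly like this (a sketch; it assumes cpp_parser.h now includes use_numpy.h instead of numpy/arrayobject.h directly, and that init_numpy_api.cpp is added to the sources list in setup.py):

// cpp_parser.cpp (sketch): relies on init_numpy_api.cpp to set up MY_PyArray_API
#include "cpp_parser.h"   // assumed to pull in "use_numpy.h"
#include <iostream>

using namespace std;

void parse_ndarray(PyObject *obj) {
    if (PyArray_Check(obj)) {   // safe now: the API table was initialized elsewhere
        cout << "PyArray_Check Passed" << endl;
    } else {
        cout << "PyArray_Check Failed" << endl;
    }
}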


Why do you need it (kept for historical reasons):

When I build your extension with debug symbols:

extra_compile_args=['-fPIC', '-O0', '-g'],
extra_link_args=['-O0', '-g'],

and run it with gdb:

 gdb --args python run_test.py
 (gdb) run
  --- Segmentation fault
 (gdb) disass

I can see the following:

   0x00007ffff1d2a6d9 <+20>:    mov    0x203260(%rip),%rax       
       # 0x7ffff1f2d940 <_ZL11PyArray_API>
   0x00007ffff1d2a6e0 <+27>:    add    $0x10,%rax
=> 0x00007ffff1d2a6e4 <+31>:    mov    (%rax),%rax
   ...
   (gdb) print $rax
   $1 = 16

We should keep in mind that PyArray_Check is only a macro defined as:

#define PyArray_Check(op) PyObject_TypeCheck(op, &PyArray_Type)

It seems that &PyArray_Type somehow uses a part of PyArray_API, which is not initialized (has value 0).
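Indeed, in numpy's generated C-API header, PyArray_Type itself is looked up through the PyArray_API table, roughly like this (an approximate excerpt; the exact slot index can vary between numpy versions):

//approximate form of the definition in numpy's generated __multiarray_api.h
#define PyArray_Type (*(PyTypeObject *)PyArray_API[2])

With PyArray_API == NULL, taking &PyArray_Type reads PyArray_API[2], i.e. the memory at address 16, which matches the faulting access and the value of $rax shown in the disassembly above.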

Let's take a look at cpp_parser.cpp after the preprocessor run (compiled with the flag -E):

 static void **PyArray_API= __null
 ...
 static int
_import_array(void)
{
  PyArray_API = (void **)PyCapsule_GetPointer(c_api,...

So PyArray_API is static and is initialized via _import_array(void). That also explains the warning I got during the build that _import_array() was defined but not used: we never initialized PyArray_API.

Because PyArray_API is a static variable, it must be initialized in every compilation unit, i.e. in every cpp file.

So we just need to do that; import_array() seems to be the official way.

