SciPy implements standard N-dimensional convolutions, so the array to be convolved and the kernel must both be N-dimensional.
A quick fix would be to add an extra dimension to `Y` so that `Y` is 3-dimensional:

```python
result = signal.convolve(X, Y[..., None], 'valid')
```
I'm assuming here that the last axis corresponds to the image index, as in your example `[width, height, image_idx]` (or `[height, width, image_idx]`). If it is the other way around and the images are indexed along the first axis (as is more common for C-ordered arrays), replace `Y[..., None]` with `Y[None, ...]`.
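For concreteness, here is a small sketch of the two layouts (the array sizes are made up for illustration):

```python
import numpy as np
from scipy import signal

Y = np.random.randn(11, 11)          # a single 2-D kernel

# Channel-last layout: images stacked along the last axis.
X_last = np.random.randn(100, 100, 5)
print(signal.convolve(X_last, Y[..., None], 'valid').shape)   # (90, 90, 5)

# Channel-first layout: images indexed along the first axis.
X_first = np.random.randn(5, 100, 100)
print(signal.convolve(X_first, Y[None, ...], 'valid').shape)  # (5, 90, 90)
```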
The expression `Y[..., None]` adds an extra axis to `Y`, making it 3-dimensional with shape `[kernel_width, kernel_height, 1]` and thus turning it into a valid 3-dimensional convolution kernel.
NOTE: This assumes that all images in your input mini-batches have the same `width x height`, which is standard in CNNs.
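If you want to convince yourself that this batched 3-dimensional convolution is really equivalent to convolving each image separately, a quick sanity check along these lines should do (sizes chosen arbitrarily):

```python
import numpy as np
from scipy import signal

X = np.random.randn(32, 32, 4)   # 4 images, channel-last layout
Y = np.random.randn(5, 5)

batched = signal.convolve(X, Y[..., None], 'valid')
per_image = np.stack([signal.convolve(X[..., i], Y, 'valid')
                      for i in range(X.shape[-1])], axis=-1)
print(np.allclose(batched, per_image))  # True
```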
EDIT: Some timings, as @Divakar suggested. The testing framework is set up as follows:
```python
import numpy as np
from scipy import ndimage, signal

def test(S, N, K):
    """S: image size, N: number of images, K: kernel size."""
    a = np.random.randn(S, S, N)
    b = np.random.randn(K, K)
    # ndimage.convolve returns an output the same size as its input,
    # so these slices trim the border down to the valid region.
    valid = (slice(K//2, -K//2 + 1), slice(K//2, -K//2 + 1))
    %timeit signal.convolve(a, b[..., None], 'valid')
    %timeit signal.fftconvolve(a, b[..., None], 'valid')
    %timeit ndimage.convolve(a, b[..., None])[valid]
```
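Note that `%timeit` is an IPython magic, so the function above only runs in an IPython/Jupyter session. If you want to reproduce the benchmark from a plain Python script, a rough equivalent using the standard `timeit` module could look like this (a sketch; the `bench` helper and its output format are mine):

```python
import timeit
import numpy as np
from scipy import ndimage, signal

def bench(S, N, K, repeats=3):
    """Time the three convolution variants outside IPython (a sketch)."""
    a = np.random.randn(S, S, N)
    b = np.random.randn(K, K)
    valid = (slice(K//2, -K//2 + 1), slice(K//2, -K//2 + 1))
    candidates = {
        'signal.convolve': lambda: signal.convolve(a, b[..., None], 'valid'),
        'signal.fftconvolve': lambda: signal.fftconvolve(a, b[..., None], 'valid'),
        'ndimage.convolve': lambda: ndimage.convolve(a, b[..., None])[valid],
    }
    for name, fn in candidates.items():
        t = timeit.timeit(fn, number=repeats) / repeats
        print(f'{name}: {t:.4f} s per loop')
```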
Below are the tests for different configurations:
Varying image size `S`:
```
>>> test(100, 50, 11)  # 100x100 images
1 loop, best of 3: 909 ms per loop
10 loops, best of 3: 116 ms per loop
10 loops, best of 3: 54.9 ms per loop

>>> test(1000, 50, 11)  # 1000x1000 images
1 loop, best of 3: 1min 51s per loop
1 loop, best of 3: 16.5 s per loop
1 loop, best of 3: 5.66 s per loop
```
Varying number of images `N`:
```
>>> test(100, 5, 11)  # 5 images
10 loops, best of 3: 90.7 ms per loop
10 loops, best of 3: 26.7 ms per loop
100 loops, best of 3: 5.7 ms per loop

>>> test(100, 500, 11)  # 500 images
1 loop, best of 3: 9.75 s per loop
1 loop, best of 3: 888 ms per loop
1 loop, best of 3: 727 ms per loop
```
Varying kernel size `K`:
```
>>> test(100, 50, 5)  # 5x5 kernels
1 loop, best of 3: 217 ms per loop
10 loops, best of 3: 100 ms per loop
100 loops, best of 3: 11.4 ms per loop

>>> test(100, 50, 31)  # 31x31 kernels
1 loop, best of 3: 4.39 s per loop
1 loop, best of 3: 220 ms per loop
1 loop, best of 3: 560 ms per loop
```
So, in short, `ndimage.convolve` is the fastest in every configuration except when the kernel is very large (as with `K = 31` in the last test), where `signal.fftconvolve` takes the lead.
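As a side note, if you are on SciPy 0.19 or newer, `signal.convolve` accepts a `method` argument (`'auto'` by default) that estimates whether the direct or the FFT implementation will be faster for your input sizes, and `signal.choose_conv_method` reports which one it would pick; a minimal sketch:

```python
import numpy as np
from scipy import signal

a = np.random.randn(100, 100, 50)
b = np.random.randn(31, 31)

# Ask SciPy which implementation it would choose for these shapes.
print(signal.choose_conv_method(a, b[..., None], mode='valid'))

# 'auto' (the default) picks between the direct and FFT code paths.
result = signal.convolve(a, b[..., None], 'valid', method='auto')
```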