Is there a way to vectorize recursive formulations using NumPy?

mamckay · July 27, 2017, 7:28am

Is there a way to push the following for loop down to the numpy level in the case of generating a sequence based on previous values?

n = int(10**5)
x = np.empty(n+1)
x[0] = 0.1
for t in range(n):
    x[t+1] = 4 * x[t] * (1 - x[t])

I have been looking at np.nditer and @guvectorize decorator as one possible solution. This example above is taken from the need_for_speed python lecture.

There are a lot of cases where it would be nice to pre-allocate an empty array in memory and then role through the same array populating values based on a function and some initial values for generating sequences.

As demonstrated by the lecture, jit is a very useful tool in this case to speed up and optimize the loop. Is it the case then that I should always use jit for recursive formulations? Is this a good general rule of thumb?

I am thinking about this question as @natashawatkins sent me an interesting SciPy video that demonstrated the use of @vectorize from the numba package and got some good performance values using generated custom ufuncs here. I guess even if you could generate generalized ufuncs that loop over arrays, they probably won’t be faster than jit solution anyway.

mamckay · July 28, 2017, 7:31am

I have been doing some research on this today and here is what I have concluded so far.

Vectorization is used relatively loosely to describe two main scenarios:

undertake an operation on an array in low level code (i.e. C etc) to remove the loop at the python level.
make use of vectorized compiler instructions to process more than one number on a single cpu at a time in the registers.

According to this link numpy makes use of vectorized instructions at the CPU level. Therefore recursive formulations (like the one above) would either a) not be possible if these cpu performance strategies are used as previous values are required to compute future values in the sequence and this can’t be done in parallel, or b) it might run a loop at the c level but without being able to use cpu level vectorization benefits which may be faster but isn’t really all that different to what jit would do.

My Conclusions So Far:
While it might be possible to come up with a numpy formulation to remove the for t in range(n) loop at the python level to populate a pre-allocated array recursively it probably isn’t worth doing. It may run the loop faster using C but it wouldn’t be able to use cpu vectorization due to the inability compute the problem in parallel. jit compilation brings the loop closer to the machine level using llvm in a more straight forward way.

To understand vectorization benefits I found this little example very interesting:

Have a look at float32. This is from this stackoverflow question

Other resources:

http://www.nersc.gov/users/computational-systems/edison/programming/vectorization/ (thanks @natashawatkins)

john.stachurski · July 29, 2017, 7:24pm

Thanks @mamckay, very useful.

Topic		Replies	Views
Nopython mode inside loop	1	793	September 4, 2019
@jit with parallelization in Python	8	3550	August 16, 2021
Efficient value function iteration in Julia Lectures	9	2352	January 24, 2022
MultiDimensional Interpolation with Numba	3	2511	November 26, 2020
MPI.jl or pmap Lecture? Lectures	6	2239	May 29, 2017

Is there a way to vectorize recursive formulations using NumPy?

Related Topics