Is there a way to vectorize recursive formulations using NumPy?


#1

Is there a way to push the following for loop down to the numpy level in the case of generating a sequence based on previous values?

n = int(10**5)
x = np.empty(n+1)
x[0] = 0.1
for t in range(n):
    x[t+1] = 4 * x[t] * (1 - x[t])

I have been looking at np.nditer and @guvectorize decorator as one possible solution. This example above is taken from the need_for_speed python lecture.

There are a lot of cases where it would be nice to pre-allocate an empty array in memory and then role through the same array populating values based on a function and some initial values for generating sequences.

As demonstrated by the lecture, jit is a very useful tool in this case to speed up and optimize the loop. Is it the case then that I should always use jit for recursive formulations? Is this a good general rule of thumb?

I am thinking about this question as @natashawatkins sent me an interesting SciPy video that demonstrated the use of @vectorize from the numba package and got some good performance values using generated custom ufuncs here. I guess even if you could generate generalized ufuncs that loop over arrays, they probably won’t be faster than jit solution anyway.


#2

I have been doing some research on this today and here is what I have concluded so far.

Vectorization is used relatively loosely to describe two main scenarios:

  1. undertake an operation on an array in low level code (i.e. C etc) to remove the loop at the python level.
  2. make use of vectorized compiler instructions to process more than one number on a single cpu at a time in the registers.

According to this link numpy makes use of vectorized instructions at the CPU level. Therefore recursive formulations (like the one above) would either a) not be possible if these cpu performance strategies are used as previous values are required to compute future values in the sequence and this can’t be done in parallel, or b) it might run a loop at the c level but without being able to use cpu level vectorization benefits which may be faster but isn’t really all that different to what jit would do.

My Conclusions So Far:
While it might be possible to come up with a numpy formulation to remove the for t in range(n) loop at the python level to populate a pre-allocated array recursively it probably isn’t worth doing. It may run the loop faster using C but it wouldn’t be able to use cpu vectorization due to the inability compute the problem in parallel. jit compilation brings the loop closer to the machine level using llvm in a more straight forward way.

To understand vectorization benefits I found this little example very interesting:

Have a look at float32. This is from this stackoverflow question

Other resources:

http://www.nersc.gov/users/computational-systems/edison/programming/vectorization/ (thanks @natashawatkins)


#3

Thanks @mamckay, very useful.