# Is there a way to vectorize recursive formulations using NumPy?

Is there a way to push the following `for` loop down to the `numpy` level in the case of generating a sequence based on previous values?

``````n = int(10**5)
x = np.empty(n+1)
x = 0.1
for t in range(n):
x[t+1] = 4 * x[t] * (1 - x[t])
``````

I have been looking at np.nditer and @guvectorize decorator as one possible solution. This example above is taken from the `need_for_speed` python lecture.

There are a lot of cases where it would be nice to `pre-allocate` an empty array in memory and then role through the same array populating values based on a function and some initial values for generating sequences.

As demonstrated by the lecture, `jit` is a very useful tool in this case to speed up and optimize the `loop`. Is it the case then that I should always use `jit` for recursive formulations? Is this a good general rule of thumb?

I am thinking about this question as @natashawatkins sent me an interesting SciPy video that demonstrated the use of `@vectorize` from the `numba` package and got some good performance values using generated custom `ufuncs` here. I guess even if you could generate generalized `ufuncs` that loop over arrays, they probably won’t be faster than `jit` solution anyway.

I have been doing some research on this today and here is what I have concluded so far.

`Vectorization` is used relatively loosely to describe two main scenarios:

1. undertake an operation on an array in low level code (i.e. C etc) to remove the `loop` at the python level.
2. make use of vectorized compiler instructions to process more than one number on a single `cpu` at a time in the registers.

According to this link numpy makes use of vectorized instructions at the CPU level. Therefore recursive formulations (like the one above) would either a) not be possible if these `cpu` performance strategies are used as previous values are required to compute future values in the sequence and this can’t be done in parallel, or b) it might run a loop at the `c` level but without being able to use `cpu` level vectorization benefits which may be faster but isn’t really all that different to what `jit` would do.

My Conclusions So Far:
While it might be possible to come up with a numpy formulation to remove the `for t in range(n)` loop at the python level to populate a pre-allocated array recursively it probably isn’t worth doing. It may run the loop faster using C but it wouldn’t be able to use `cpu vectorization` due to the inability compute the problem in parallel. `jit` compilation brings the loop closer to the machine level using `llvm` in a more straight forward way.

To understand `vectorization` benefits I found this little example very interesting:

Have a look at `float32`. This is from this stackoverflow question

Other resources:

Thanks @mamckay, very useful.