Is there a way to push the following for
loop down to the numpy
level in the case of generating a sequence based on previous values?
n = int(10**5)
x = np.empty(n+1)
x[0] = 0.1
for t in range(n):
x[t+1] = 4 * x[t] * (1 - x[t])
I have been looking at np.nditer and @guvectorize decorator as one possible solution. This example above is taken from the need_for_speed
python lecture.
There are a lot of cases where it would be nice to pre-allocate
an empty array in memory and then role through the same array populating values based on a function and some initial values for generating sequences.
As demonstrated by the lecture, jit
is a very useful tool in this case to speed up and optimize the loop
. Is it the case then that I should always use jit
for recursive formulations? Is this a good general rule of thumb?
I am thinking about this question as @natashawatkins sent me an interesting SciPy video that demonstrated the use of @vectorize
from the numba
package and got some good performance values using generated custom ufuncs
here. I guess even if you could generate generalized ufuncs
that loop over arrays, they probably won’t be faster than jit
solution anyway.
I have been doing some research on this today and here is what I have concluded so far.
Vectorization
is used relatively loosely to describe two main scenarios:
- undertake an operation on an array in low level code (i.e. C etc) to remove the
loop
at the python level.
- make use of vectorized compiler instructions to process more than one number on a single
cpu
at a time in the registers.
According to this link numpy makes use of vectorized instructions at the CPU level. Therefore recursive formulations (like the one above) would either a) not be possible if these cpu
performance strategies are used as previous values are required to compute future values in the sequence and this can’t be done in parallel, or b) it might run a loop at the c
level but without being able to use cpu
level vectorization benefits which may be faster but isn’t really all that different to what jit
would do.
My Conclusions So Far:
While it might be possible to come up with a numpy formulation to remove the for t in range(n)
loop at the python level to populate a pre-allocated array recursively it probably isn’t worth doing. It may run the loop faster using C but it wouldn’t be able to use cpu vectorization
due to the inability compute the problem in parallel. jit
compilation brings the loop closer to the machine level using llvm
in a more straight forward way.
To understand vectorization
benefits I found this little example very interesting:
Have a look at float32
. This is from this stackoverflow question
Other resources:
http://www.nersc.gov/users/computational-systems/edison/programming/vectorization/ (thanks @natashawatkins)
Thanks @mamckay, very useful.