Modified Policy Iteration


I would like to solve the income fluctuation problem using modified policy iteration, i.e. where the algorithm iterates over both the Bellman Operator and the Coleman Operator, whose outputs update each other’s inputs.

Can anyone please give me guidance on how to link up these two operators?

The Bellman Operator can output both the policy function and the value function, so I can use BO’s policy function guess as an input to the Coleman Operator. However, the Coleman Operator, as written in the IFP lecture, only outputs a policy function. How can I transform this output to update the inputs into the Bellman Operator?

NOTE: There is a discussion of this idea here and here, but there is no associated code.

Additional question: Has anyone applied the endogenous grid method to the income fluctuation problem? There is a paper published in the Journal of Economic Dynamics and Control that discusses applying the method of endogenous grid points to problems with occasionally binding constraints.


Hi @skeet.singleton,

my understanding is that combining the Bellman operator and the Coleman operator is different to modified policy iteration. Those two operators are topologically conjugate in a certain sense that essentially means they are equivalent up to the numerical implementation. When the Coleman operator is feasible it should probably be preferred, since the numerical implementation is more efficient, at least in principle.

Regarding your second question, I really want to put together a lecture on solving the income fluctuation problem using the endogenous grid method! It can certainly be done. I just need to find the time.


Hi John!

I am so psyched you replied to my question. I love QuantEcon, and I just ordered a copy of your textbook from the library (sorry you won’t get any royalties on that…).

So, the Coleman Operator as written in the lecture on income fluctuation only returns the policy function. An intermediate question I have is: how do I get the optimal value function from the Coleman Operator below?

The approximate Coleman operator.

Iteration with this operator corresponds to policy function
iteration. Computes and returns the updated consumption policy
c.  The array c is replaced with a function cf that implements
univariate linear interpolation over the asset grid for each
possible value of z.

##### Arguments

- `cp::CareerWorkerProblem` : Instance of `CareerWorkerProblem`
- `c::Matrix`: Current guess for the policy function
- `out::Matrix` : Storage for output

##### Returns

None, `out` is updated in place to hold the policy function

function coleman_operator!(cp::ConsumerProblem, c::Matrix, out::Matrix)
    # simplify names, set up arrays
    R, Pi, bet, b = cp.R, cp.Pi,, cp.b
    asset_grid, z_vals = cp.asset_grid, cp.z_vals
    z_idx = 1:length(z_vals)
    gam = R * bet
    # policy function when the shock index is z_i
    cf = interp(asset_grid, c)

    # compute lower_bound for optimization
    opt_lb = 1e-8

    for (i_z, z) in enumerate(z_vals)
        for (i_a, a) in enumerate(asset_grid)
            function h(t)
                cps = cf.(R*a+z-t, z_idx) # c' for each z'
                expectation = dot(du.(cps), Pi[i_z, :])
                return abs(du(t) - max(gam * expectation, du(R*a+z+b)))
            opt_ub = R*a + z + b  # addresses issue #8 on github
            res = optimize(h, min(opt_lb, opt_ub - 1e-2), opt_ub,
            out[i_a, i_z] = Optim.minimizer(res)

Furthermore, I am under the impression that if one iterates back and forth between updating the policy function and the value function, then one can increase the speed of convergence (if it exists). I think this is what Kenneth Judd is saying on pages 416-417 in Numerical Methods in Economics. Here is a section of example code somebody else wrote (that I don’t entirely understand):

i = 0;
maxiter = 1000;
vdiff = 100000;
maxdiff = 1e-7;
while (vdiff>maxdiff) & (i<=maxiter)
  i = i+1
  display(['Iteration ',num2str(i)])
  vprime = v';
  for j = 1:nstates
    expectation = BETA*P(j,:)*vprime;
    for k=1:npoints
      maximand = R(k,:,j) + expectation;
      [Tv(k,j) u(k,j)] = max(maximand,[],2);
  v = Tv;
  if howard_switch
    for kk=1:50
        for j = 1:nstates
            for k = 1:npoints
                 vnew(k,j) = R(k,u(k,j),j)+BETA*P(j,:)*transpose(v(u(k,j),:));
        v = vnew;
  vdiff = max(max(abs(v-vprime')));
  display(['Max change in V is ' num2str(vdiff,14)])

Really, where I am trying to get is to solve a portfolio choice problem with a risk free-rate, two idiosyncratically risky assets and an aggregate shock to inflation. Because of the curse of dimensionality, this problem is computationally intensive, so I am trying to learn how to speed up the solution to the simpler IFP problem before implementing a similar solution on a harder problem.

Also, I would like to contribute a sub-lecture to QuantEcon on this problem once I am done writing this paper for a class I am taking. I am not sure how you handle accepting contributions.



how do I get the optimal value function from the Coleman Operator below?

I just have time for a quick answer right now, but one way to do this is to repeatedly apply a "pseudo-"Bellman operator where instead of taking the max to find the policy you always use the policy returned by the Coleman operator. If you do this until you reach a fixed point in the value function (remember the bellman operator takes a guess for the value function tomorrow and returns a value function today), you will have obtained the value associated with the policy given you by the Coleman operator.


So does this mean, instead of solving the optimization (see below), I will plug the output from the Coleman Operator into the objective(?). Then I take the resulting value function to update my guess for the value function, iterating to convergence?

    for i_z = 1:length(grid_z)
        % Given level of productivity.
        z = grid_z(i_z);
        for i_a = 1:length(grid_a)
            % Given level of assets.
            a = grid_a(i_a);
            % Find acceptable range for c.
            cMin = 1e-6;
            cMax = R*a+z+b;

            % Solve for extremal value of functional, 
            % i.e. V(a_t) = max( u(c) + betta*E[V(a_t+1)] ),
            % given a particular level of assets and productivity.
            [X, FVAL] = fminbnd(@(c) objective(c,w_func,u,betta,a,R,z,grid_z,i_z,PI), cMin, cMax);

            % Return values.
            sigma(i_a,i_z) = X;     % Policy function.
            Tw(i_a,i_z)    = -FVAL; % Value function.
        end % End asset-level for-loop.
    end % End productivity-level for-loop.

function out = objective(c,w_func,u,betta,a,R,z,grid_z,i_z,PI)
    EV  = w_func({R*a+z-c, grid_z}) * PI(i_z,:)';
    out = -u(c) - betta * EV;

How can I use this process to increase solution time?

By the way, how do I specify the language of the code I copy into a post, e.g. MATLAB?


Hi @skeet.singleton, I’m really glad that the lectures are useful to you ;-)

Your problem sounds interesting. I’m eager to get more sophisticated dynamic programming problems up on the lecture site. I’ve been tied up with research of late — excited about new problems — but I’ll get back to writing new lectures soon. (In the meantime various people on the team have been working on the infrastructure side, introducing code testing and so on.)

So, the idea with modified value function iteration is to combine value function iteration with Howard’s policy iteration routine. While time iteration with the Coleman operator is essentially isomorphic to value function iteration (there’s a discussion of that fact in this lecture — it’s not widely known, and I thought about publishing that discussion as a research paper but never found the time), Howard’s policy iteration routine is substantially different. The basic idea is:

  1. take a guess of the policy function
  2. compute the lifetime value obtained by following that policy forever
  3. treat that lifetime value as a guess of the value function and compute the corresponding optimal policy
  4. go to step one, taking this policy as the new guess.

Modified policy iteration, which is what Ken Judd is referring to, modifies step 2 of the algorithm above, computing only an approximation of the lifetime value.

There’s a (fairly theoretical) discussion of these algorithms here.

Howard’s policy iteration routine converges faster than value function in the neighborhood of the true optimal policy, so it’s definitely worth thinking about for your problem. At the same time, if you can figure out how to employ the endogenous grid method combined with the Coleman operator that should be very fast too.

Please let us know how you get on.