Cass-Koopmans Model: Setting up the "Planning Problem"

Hello! I am having trouble following the setup in Cass-Koopmans Model - Planning Problem. In particular:

  • why are \mu_t required to be non-negative and what’s their interpretation?
  • why are \beta^t multiplied by \mu_t \left(F(K_t,1) + (1-\delta) K_t- C_t - K_{t+1} \right) in Lagrangian? Isn’t it a utility discount factor? How can it interact with the allocation constraint?
  • why do we get a min-max problem, i.e. what’s the reason for minimizing the Lagrangian with respect to \vec{\mu}?

Thank you!

(edit)

  • And a further question: what’s the interpretation of requiring K_{T+1} = 0? Couldn’t we set the boundary condition to K_{T+1} = K_{T} (or something similar) so that capital is not forced to be used up at the end of the planning period?

Hi artemsolod,

I might rearrange the order of your questions to follow the logic I used.

why do we get a min-max problem, i.e. what’s the reason for minimizing the Lagrangian with respect to \vec{\mu}?

The min-max problem is a way to enforce the feasibility constraint

C_t + K_{t+1} \leq F(K_t,N_t) + (1-\delta) K_t \quad \text{for all } t \in \{0, 1, \ldots, T\}.

The Lagrangian

\mathcal{L}(\vec{C} ,\vec{K} ,\vec{\mu} ) = \sum_{t=0}^T \beta^t\left\{ u(C_t)+ \mu_t \left(F(K_t,1) + (1-\delta) K_t- C_t - K_{t+1} \right)\right\}

is maximized over the allocation \{\vec{C},\vec{K}\} and, at the same time, minimized over \vec{\mu}. The minimization is what enforces the constraint (everything on the right of the inequality): if an allocation violates the constraint at some t, the term in parentheses is negative, so raising \mu_t lowers the Lagrangian and penalizes that allocation.
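To make the role of the inner minimization explicit, here is a short sketch. The shorthand g_t for the slack in the period-t constraint is notation introduced here for convenience, not the lecture's.

```latex
g_t \equiv F(K_t,1) + (1-\delta) K_t - C_t - K_{t+1}

\min_{\vec{\mu} \geq 0} \mathcal{L}(\vec{C},\vec{K},\vec{\mu})
=
\begin{cases}
\sum_{t=0}^T \beta^t u(C_t) & \text{if } g_t \geq 0 \text{ for all } t \quad (\text{attained with } \mu_t g_t = 0), \\
-\infty & \text{if } g_t < 0 \text{ for some } t \quad (\text{let } \mu_t \to \infty).
\end{cases}

\max_{\vec{C},\vec{K}} \; \min_{\vec{\mu} \geq 0} \mathcal{L}
=
\max_{\vec{C},\vec{K}} \left\{ \sum_{t=0}^T \beta^t u(C_t) \;:\; g_t \geq 0 \text{ for all } t \right\}
```

Read this way, the maximizing side never picks an infeasible allocation, and on feasible allocations the value of the min-max problem coincides with discounted utility.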

why are \mu_t required to be non-negative and what’s their interpretation?

The multipliers capture the rate of change of the optimal value of the constrained maximization problem as the constraint changes. Here, \mu_t measures the shadow price, i.e. the marginal utility of relaxing the constraint slightly at time t. It therefore makes economic sense that it should be non-negative. In terms of the math, the dual feasibility condition of KKT requires \mu_t to be non-negative.

why are \beta^t multiplied by \mu_t \left(F(K_t,1) + (1-\delta) K_t- C_t - K_{t+1} \right) in Lagrangian? Isn’t it a utility discount factor? How can it interact with the allocation constraint?

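Yes, it is a utility discount factor. Multiplying the constraint term by \beta^t is a normalization rather than an extra restriction: it only rescales the multiplier, so that \mu_t is measured in period-t utility units while \beta^t \mu_t is its value discounted to period 0. The constraint itself is unchanged, and the sum still runs over the finite horizon t = 0, \ldots, T.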
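One way to see that the \beta^t in front of the constraint is harmless is to rewrite the Lagrangian with a rescaled multiplier. The symbols \lambda_t and g_t below are introduced only for this sketch and do not appear in the lecture.

```latex
g_t \equiv F(K_t,1) + (1-\delta) K_t - C_t - K_{t+1}

\sum_{t=0}^T \beta^t \left\{ u(C_t) + \mu_t\, g_t \right\}
= \sum_{t=0}^T \left\{ \beta^t u(C_t) + \lambda_t\, g_t \right\},
\qquad \lambda_t \equiv \beta^t \mu_t

\lambda_t \geq 0 \iff \mu_t \geq 0 \qquad (\text{since } \beta^t > 0)
```

Under this reading, \lambda_t is a present-value shadow price and \mu_t is the same price expressed in period-t units. The optimal \vec{C} and \vec{K} are the same in either form.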

what’s the interpretation of requiring K_{T+1} = 0?

I assume that you are referring to the section on the shooting algorithm. It is derived from the slackness condition of KKT in (41.13).

Please feel free to post more, and happy to discuss further.

I can only post two links in one post, so I migrated some links here : )

I think most of your questions are related to KKT. You can read more about it here, and a simpler example can be found here.

Hope these answers help.

Thank you so much for your answer! I have looked into KKT a little more and have some follow up questions:

  • Section 41.3.2 is titled “First-order necessary conditions” however it appears to be solving using just Lagrange multipliers and simply setting derivative wrt \mu to 0 (41.11). As for the KKT conditions wouldn’t those be:
    \nabla U(\vec{C}) - \sum_{t=0}^T \mu_t \nabla \left(F(K_t,1) + (1-\delta) K_t- C_t - K_{t+1} \right) = \vec{0}
    plus corresponding feasibility and slackness conditions? With those we do not immediately get capital constraint transforming into equality which to me makes sense as in principle we could have had optimal \vec{C} and \vec{K} inside the constraint region and not on the boundary.

  • Opposite point on a higher level note: why do we plan both \vec{C} and \vec{K}? There appears to be no 3rd option for output to go (like burn it down). Even if there is, we can judge by monotonicity of the utility function that it would be strictly better to consume and, hence, get a (simpler?) problem that has equality constraint on capital and simple dynamic on consumption (i.e. output - investment).

  • Another thing: \beta is not a part of the constraints so I do not see why we have it in the Lagrangian. Without it 4.11 becomes \mu_t\left[(1-\delta)+f'(K_t)\right] - \mu_{t-1}=0

  • I now see that K_{T+1} = 0 makes sense but is this what we want from the modeling perspective? This creates consumption spikes at the end of the period a la “There is no tomorrow”. In the concluding section we are shooting towards steady state - couldn’t we have tried to do that from the very start by setting the boundary condition K_{T+1} = (1-\delta) * K_T?

Thank you again!
PS I have sent a typo fix pull request for this lecture to lecture-python.myst repository - hope that’s the right place.

Hi artemsolod,

It’s great to hear that the KKT conditions helped : )

As for the KKT conditions wouldn’t those be: \nabla U(\vec{C}) - \sum_{t=0}^T \mu_t \nabla \left(F(K_t,1) + (1-\delta) K_t- C_t - K_{t+1} \right) = \vec{0}

For question 1: in the first-order conditions, we are not (just) taking derivatives with respect to \mu but with respect to each variable as well as \mu. This is actually one part of the KKT conditions (think about how this relates to the stationarity condition you listed here).

There appears to be no 3rd option for output to go (like burn it down). Even if there is, we can judge by monotonicity of the utility function that it would be strictly better to consume and, hence, get a (simpler?) problem that has equality constraint on capital and a simple dynamic on consumption (i.e. output - investment).

For this question, I am not sure I understand it correctly, so my answer might be slightly off. I guess what you are asking is why we are not using all the budget we have to increase our utility, so that we have an equality constraint with consumption equal to the budget (please correct me if I understand it incorrectly). In the model section, you can find our setup and see that monotonicity is not the only thing that shapes the utility function. There is also curvature associated with the utility function and the production function. It is less obvious that we can use an equality to capture optimality. Using Lagrange multipliers allows for a more general and systematic approach that can handle a variety of constraints, including both equality and inequality constraints, simultaneously.

Without it 4.11 becomes \mu_t\left[(1-\delta)+f'(K_t)\right] - \mu_{t-1}=0

I think it is 4.10? 4.10 is the first-order condition with respect to K_t. To find it, we differentiate the Lagrangian with respect to K_t and set the result equal to zero. Note that K_t appears in two constraints: the period t-1 constraint (as next period's capital), weighted by \beta^{t-1}, and the period t constraint (as an input to production), weighted by \beta^t. This is why \beta shows up here.
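The differentiation described above can be written out as follows. This is a sketch based on the Lagrangian quoted earlier in the thread, with f'(K_t) denoting the derivative of F(K_t,1) with respect to K_t and 1 \leq t \leq T.

```latex
% Terms of the Lagrangian that contain K_t:
\beta^{t-1} \mu_{t-1} \left( \cdots - K_t \right)
+ \beta^{t} \mu_{t} \left( F(K_t,1) + (1-\delta) K_t - \cdots \right)

% Differentiate with respect to K_t and set to zero:
\frac{\partial \mathcal{L}}{\partial K_t}
= \beta^{t} \mu_t \left[ (1-\delta) + f'(K_t) \right] - \beta^{t-1} \mu_{t-1} = 0

% Divide by \beta^{t-1} > 0:
\beta \mu_t \left[ (1-\delta) + f'(K_t) \right] - \mu_{t-1} = 0
```

The single \beta that survives is the ratio of the two discount weights attached to the adjacent constraints.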

but is this what we want from the modeling perspective? This creates consumption spikes at the end of the period a la “There is no tomorrow”.

First of all, this is a finite-horizon case. The terminal condition K_{T+1}=0 is derived from the first-order necessary condition involving the Lagrange multiplier. It also makes economic sense, as it is not optimal to leave any leftover capital that isn’t converted to utility. We want to spend what we have and convert it into utility before the end.
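A sketch of how the terminal condition can come out of the first-order conditions, assuming that the only restriction on terminal capital is K_{T+1} \geq 0 and that marginal utility is strictly positive (both are assumptions of this sketch):

```latex
% K_{T+1} enters the Lagrangian only through the period-T constraint:
\frac{\partial \mathcal{L}}{\partial K_{T+1}} = -\beta^T \mu_T \leq 0,
\qquad K_{T+1} \geq 0,
\qquad \beta^T \mu_T \, K_{T+1} = 0

% First-order condition for C_T:
u'(C_T) - \mu_T = 0
\;\Rightarrow\; \mu_T = u'(C_T) > 0
\;\Rightarrow\; K_{T+1} = 0
```

In words: leftover capital has a strictly positive shadow value in period T but yields no utility afterwards, so holding any of it cannot be optimal.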

I understand that you might not be happy with the consumption pattern (i.e., spending too much at the end). You can add some other factors into the model, for example, adding a bequest to the next generation and giving the bequest some utility so that it gives some incentives for agents to hold back consumption and preserve some capital at the end. There are other ways you can adjust the model so that it can capture features that you would like to have.

Hope this helps. Very happy to discuss further.

Thank you again for looking into my questions! Apologies for bothering you so much, I really want to understand this. Maybe we can look at one thing at a time?

It would be super instructive to see how (41.11)
\mu_t:\qquad F(K_t,1)+ (1-\delta) K_t - C_t - K_{t+1}=0 \qquad \text{for all } \quad t=0,1,\dots,T
is derived using KKT.
I can understand how it could have been \mu_t \times (F(K_t,1)+ (1-\delta) K_t - C_t - K_{t+1})=0 - looks exactly like complementary slackness from wiki to me – but don’t get how KKT allowed us to drop \mu_t – some could have been 0 it seems.

PS I’m having problems posting links for some reason. Karush–Kuhn–Tucker_conditions#Necessary_conditions is what I mean by “complementary slackness from wiki”.

Hi artemsolod,

First of all, let’s clarify one thing:

\mu_t:\qquad F(K_t,1)+ (1-\delta) K_t - C_t - K_{t+1}=0 \qquad \text{for all } \quad t=0,1,\dots,T

is the FOC with respect to \mu_t at time t.

So, we can solve it using

\mathcal{f}(\mu_t) = u(C_t) + \mu_t \left(F(K_t,1) + (1-\delta) K_t- C_t - K_{t+1} \right)
\frac{\partial \mathcal{f}(\mu_t)}{\partial \mu_t} = F(K_t,1) + (1-\delta) K_t- C_t - K_{t+1} = 0

This gives the FOC of \mu_t. Here, we did not use complementary slackness.
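For readers who prefer to start from the KKT conditions, the same equality can also be reached through complementary slackness. The sketch below assumes strictly positive marginal utility, u'(C_t) > 0, which is an assumption of this sketch about the utility function.

```latex
% KKT conditions attached to the period-t feasibility constraint:
\mu_t \geq 0, \qquad
F(K_t,1) + (1-\delta) K_t - C_t - K_{t+1} \geq 0, \qquad
\mu_t \left( F(K_t,1) + (1-\delta) K_t - C_t - K_{t+1} \right) = 0

% Stationarity with respect to C_t:
\beta^t \left( u'(C_t) - \mu_t \right) = 0
\;\Rightarrow\; \mu_t = u'(C_t) > 0

% A strictly positive multiplier forces the constraint to bind:
F(K_t,1) + (1-\delta) K_t - C_t - K_{t+1} = 0
```

So the multiplier can be dropped from the slackness condition only because it is known to be strictly positive at every t.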

Why do we need to derive FOC w.r.t. each variable in the lecture? As I hinted in my previous answer (not fully elaborated), the KKT condition Lagrange stationarity states that any optimal primal point minimizes the partial Lagrangian, and thus, it must be equal to the unique minimizer.

You can refer to the lecture note here and page 18-19 of this nice survey paper below:

It has some notes on the role of complementary slackness, and primal and dual optimality conditions in Lagrangian.

Hope this helps. Very happy to discuss this further.

Thank you HumphreyYang!

I will give it the last chance for now, I must be missing something obvious. From my understanding of the problem and KKT :

  • The stationarity condition only has gradients with respect to decision variables, not w.r.t. Lagrange multipliers. This is written explicitly as Lagrangian stationarity (13.2) in the lecture notes on KKT you have shared.
  • \mu is a Lagrange multiplier in the lecture, decision variable is \{K, C\} (x in the notes on KKT)
  • Hence, I fail to see how F(K_t,1)+ (1-\delta) K_t - C_t - K_{t+1}=0 can be derived by Lagrangian stationarity alone. Setting gradient \frac{\partial \mathcal{f}(\mu_t)}{\partial \mu_t} = F(K_t,1) + (1-\delta) K_t- C_t - K_{t+1} = 0 does not seem justified from the stationarity condition as it is presented in these lecture notes.

Could you please clarify?

Hi artemsolod,

Thanks for the follow-up question. Apologies, I should have given the entire setup and separated the variables more clearly into two sets : )

So, we are facing a dual optimization problem here.

First, you are right about the Lagrangian stationarity with respect to primal variables. It is related to why we need FOC for primal variables in the formula. I should have noted this in my previous answer.

For the dual variable, if you refer to section 4.7 of the paper, specifically formulas (69) - (72), you can find that the optimum can also be computed using a FOC, which “always gives back the corresponding constraints in the primal optimization problem” (p.19).

Note that here, we are not computing the optimal allocations and Lagrange multipliers by hand. We aim to derive equations that help us compute a sequence of optimal allocations. So, you might find that the lecture does not strictly follow the procedure of section 4.8 of the paper.

Apologies that I cannot give a full explanation of KKT and the Lagrangian in a forum post, so I am relying on external material in my explanations. Hope this clarifies the point you raised. Let me know if you would like to discuss this further.

Thank you HumphreyYang for your help and patience! I feel like I’ve narrowed down the point of the problem quite a lot.

Now I guess I do not see how “we can ignore the constraint of the dual problem” from explanation of (69) in 4.7 of the tutorial can be justified. I think after we find dual variables with zero gradients we a) need to check they are non-negative; b) if there are no non-negative duals, we have to check duals equal to 0.

I have a toy example to illustrate this. I believe it is convex and satisfies Slater’s condition. Say we try to minimize f(x) = x^2 subject to g(x) = x - 1 \leq 0. We know the solution is x=0 and it lies inside the feasibility region. Now the Lagrangian is L = f(x) + \mu g(x) = x^2 + \mu (x-1).
Following the tutorial we set \frac{\partial{L}}{\partial{x}}=0 to get x^* = -\frac{\mu}{2}. Now we plug x^* into L and get L(\mu, x^*) = -\frac{\mu^2}{4}-\mu. Differentiate w.r.t. \mu and set it to zero, arriving at \mu=-2. This fails dual feasibility test. Now neither the procedure in the tutorial nor the lecture handles this issue explicitly or at least I do not see it.

If we were to simply set partials of the Lagrangian w.r.t. \mu and x to zero it would also yield the same result: \mu^*=-2 and x^*=1.

Thank you!

Hi artemsolod,

Thanks for the follow-up questions. I think it is actually a two-part question. I will address these two questions separately.

The first question is about:

how “we can ignore the constraint of the dual problem” from explanation of (69) in 4.7 of the tutorial can be justified.

If you refer to section 4.4, you can see that the constraint in this problem is already covered by the dual feasibility condition in the KKT conditions, which specifies that the dual variables for inequality constraints must be nonnegative.

The second part is about the negative multiplier you get in the toy example. The negative value of \mu^* signals that the constraint does not affect the optimal solution: the unconstrained minimizer of x^2, x=0, already satisfies x \leq 1. Because the constraint is inactive, \mu^* should be set to zero, and with \mu^*=0 the problem is minimized at x^*=0. This is what the complementary slackness condition \mu^* g(x^*) = 0 captures: for each constraint, either the constraint binds or its dual variable is zero, so a slack constraint always has a zero multiplier. (Think about whether this makes economic sense using the interpretation of Lagrange multipliers we discussed before.)
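The case analysis for the toy problem can be sketched in a few lines of Python. The function name `solve_toy_kkt` and the case labels are made up for this illustration. The two candidates come from complementary slackness: either \mu = 0, or the constraint binds.

```python
def solve_toy_kkt():
    """Minimize f(x) = x**2 subject to g(x) = x - 1 <= 0 by enumerating KKT cases."""
    candidates = []

    # Case 1: constraint inactive, so complementary slackness forces mu = 0.
    # Stationarity 2*x + mu = 0 then gives x = 0.
    mu = 0.0
    x = -mu / 2.0 + 0.0  # "+ 0.0" normalizes -0.0 to 0.0
    candidates.append(("inactive", x, mu))

    # Case 2: constraint binding, so x = 1.
    # Stationarity 2*x + mu = 0 then gives mu = -2.
    x = 1.0
    mu = -2.0 * x
    candidates.append(("binding", x, mu))

    # Keep candidates that are primal feasible (x - 1 <= 0) and dual feasible (mu >= 0).
    valid = [(name, x, mu) for name, x, mu in candidates
             if x - 1.0 <= 0.0 and mu >= 0.0]
    return candidates, valid


candidates, valid = solve_toy_kkt()
print("all candidates:", candidates)
print("satisfy KKT:   ", valid)
```

Setting both partial derivatives of the Lagrangian to zero only ever produces the "binding" candidate, which is why it has to be screened with the dual feasibility check afterwards.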

Hope this helps. Let me know if you have further questions.

Thank you as always HumphreyYang! Does the lecture in question (41. Cass-Koopmans Model — Intermediate Quantitative Economics with Python) explicitly verify the \mu's are non-negative? I know they are positive due to the shape of the particular utility function used but I guess it should be explicitly verified (as in your reply to my toy example).

Hi artemsolod,

Yes, the lecture specifies that the Lagrange multipliers are non-negative in the second line of the section on model setup. It can also be verified visually in the graphs below. Moreover, if the constraints are active, as they are in the setup of the lecture, the multipliers should be positive.
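For anyone who wants to check the sign of the multipliers numerically, here is a minimal sketch of a shooting-style computation. It is not the lecture's code: the CRRA utility, the Cobb-Douglas technology, the parameter values, and the names `shoot` and `solve_c0` are all assumptions made for this illustration. It bisects on C_0 until K_{T+1} is approximately zero and then evaluates \mu_t = u'(C_t).

```python
# Assumed primitives (illustrative only): u(c) = c**(1-GAMMA)/(1-GAMMA), F(K,1) = A*K**ALPHA
BETA, DELTA, GAMMA, ALPHA, A = 0.95, 0.02, 2.0, 0.33, 1.0


def f(k):
    return A * k ** ALPHA


def f_prime(k):
    return ALPHA * A * k ** (ALPHA - 1.0)


def u_prime(c):
    return c ** (-GAMMA)


def shoot(c0, k0, T):
    """Iterate the binding feasibility constraint and the Euler equation forward.

    Returns (C, K, ok). ok is False if capital would turn negative, or hit
    zero before the terminal period, i.e. the guess c0 is too high.
    """
    C, K = [c0], [k0]
    for t in range(T + 1):
        k_next = f(K[t]) + (1.0 - DELTA) * K[t] - C[t]
        if k_next < 0.0 or (k_next == 0.0 and t < T):
            return C, K, False
        K.append(k_next)
        if t < T:
            # Euler equation: u'(C_t) = BETA * u'(C_{t+1}) * (f'(K_{t+1}) + 1 - DELTA)
            growth = (BETA * (f_prime(k_next) + 1.0 - DELTA)) ** (1.0 / GAMMA)
            C.append(C[t] * growth)
    return C, K, True


def solve_c0(k0=0.3, T=10, iters=200):
    """Bisect on initial consumption so that terminal capital K_{T+1} is about 0."""
    lo, hi = 0.0, f(k0) + (1.0 - DELTA) * k0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        _, _, ok = shoot(mid, k0, T)
        if ok:
            lo = mid  # path is feasible with capital left over: consume more
        else:
            hi = mid  # capital runs out too early: consume less
    return shoot(lo, k0, T)


C, K, ok = solve_c0()
mu = [u_prime(c) for c in C]  # mu_t = u'(C_t) from the first-order condition for C_t
print("all multipliers positive:", all(m > 0.0 for m in mu))
print("terminal capital K_{T+1}:", K[-1])
```

Because \mu_t = u'(C_t) and consumption stays strictly positive along the computed path, the multipliers are positive by construction under these assumed functional forms. The check is a sanity test of that reasoning, not a substitute for it.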