I would like to ask a question about Dynamic Programming Squared (I want to be sure, that I understand this concept properly).
My understanding is, that Dynamic Programming Squared problems include two or more agents, whose behavior is described by Bellman equations, and some variables in each agent’s Bellman equation are results of optimization behavior of another agent, which makes both problems interdependent. I am thinking about a model with non-convex choice (two agents, whose choices are interdependent) set, so I can’t use usual approach, that combines first-order conditions with market-clearing conditions and law of motion equations, also I don’t want to use LQ approximation.
I plan to solve model directly using dynamic programming/value function iteration, firstly build “nested Bellman equation” -> pick one agent, and substitute the “strategic” variables in his optimization problem with argmax of another agent’s Bellman equation, discretize whole problem, and for each choice of “outer” agent solve “inner” Bellman equation. However, I am not sure, how to represent “all choices” of the outer agent. I think, that each “choice” to be whole policy function (to get the law of motion, subject to which “inner” agent optimize). Should I use full discretization, and then try all possible permutations of mapping from state (point) to choice (in discrete setting, number of these permutations will be finite)? Is this a correct way, how to solve a multiagent problem, or did I understand “DP^2” in an incorrect way?
Any guidance or reference will be very welcome!