Hi there, thanks for all your useful replies! I’ve been trying to solve my problem for a couple of weeks now. I finally, ended up going for a “write all my code” approach. So far it’s been very successful for finding an optimal policy. I plan to put to solution in github… not that is gonna be really elegant. Anyway, in my approach after I find the optimal policy, I plan on finding the stationary distribution. For this I simulate a bunch of agents, during a lot of years. After that I look at the distribution, and assume it should be stationary. The problem is that different runs give me slightly different distributions, and this should not be the case.

This is the simulation function:

n, is the number of states

time = number of periods that i’m simulating (its currently fixed a 10_000)

The states are employed and unemployed with a decision of how much to save for the next period(a_t), the indexing is as follows:

(a_0,employed)

(a_0,unemployed)

…

(a_n/2, unemployed)

P, its an array in which each element is the the probability of being employed the next period given the current state (it is not constant) P[0] will then be:

P(employed| a_0, employed)

and P[n-1] will be:

P(employed| a_n/2, unemployed)

Policy its a policy vector that tells the agent how much to save given his current state.

So the elements range from 0 to n/2 (the number of savings states).

Agents is an array of length n*10 in which each element represent an agent by a number that is the current state he is in.

```
@njit
def simulate(n, time, P, policy):
no_agents_total = 10 * n #10 agents per state
agents = np.zeros(no_agents_total) #every agent starts at state 0
for t in range(time):
randoms = np.random.uniform(0,1,no_agents_total)
for i in range(no_agents_total):
current_state = int(agents[i])
current_random = randoms[i]
if current_state % 2 == 0: # The agent is employed
if current_random > P[current_state]:
current_state += 1 #she became unemployed
else:
if current_random < P[current_state]: #the agent is unemployed
current_state -= 1 #she became employed
agents[i] = policy[current_state]*2 + current_state % 2 #the agents go to the new state that the policy tells him but keeps his employment status.
return agents
#simple function to count the number of agents in each state.
def distribution(agents,n):
distribution_a = np.zeros(n)
for i in agents:
current_i = int(i)
distribution_a[current_i] +=1
return (distribution_a/len(agents))
```

Anyway… Do you see why it does not convege to a constant stationary state?

Sorry for the sloppy code but this is the first time im doing anything in Python!

Thanks!