Limited support for microeconometrics in Julia

I’m impressed with the support QuantEcon provides for macro-oriented analyses in Julia but less so for micro-oriented analyses.

This would be less of an issue if Julia had a more developed ecosystem. Julia packages appropriate for panel data analysis are either very limited or non-existent. For example, to my knowledge, there is no package in Julia that supports lag operators on panel data. And, as far as I can tell, it is non-trivial to develop a module that extends the DataFrame type (from DataFrames.jl) and lag functions (from ShiftedArrays.jl) to handle panel data.
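To make the gap concrete, here is roughly the sort of thing I end up writing by hand today (the column names :id, :t, and :x are made up for illustration). Note that it silently assumes consecutive time periods with no gaps, which is exactly the bookkeeping a dedicated panel type should handle:

```julia
using DataFrames, ShiftedArrays

# Hypothetical panel: unit identifier :id, time :t, variable :x.
df = DataFrame(id = [1, 1, 1, 2, 2],
               t  = [1, 2, 3, 1, 2],
               x  = [1.0, 2.0, 3.0, 10.0, 20.0])

# Lag x within each unit: sort by (id, t), then shift within groups.
# Assumes consecutive periods within each unit; gaps are not handled.
sort!(df, [:id, :t])
transform!(groupby(df, :id), :x => (v -> ShiftedArrays.lag(v, 1)) => :x_lag)
```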

Julia also seems to suffer from the “scheme curse”, where there may be a large number of different packages all meant to solve the same problem, each suffering from distinct shortcomings.

So I have three questions:

(1) Are there plans for QuantEcon to extend coverage towards microeconometric topics, particularly for Julia?

(2) Might QuantEcon help to organize efforts towards developing definitive packages (thereby helping to avoid the “scheme curse”)?

(3) Can the QuantEcon community recommend a suite of Julia packages for microeconometrics?

Thanks!


That isn’t entirely true. FixedEffectModels.jl (https://github.com/matthieugomez/FixedEffectModels.jl), which does fast estimation of linear models with IV and high-dimensional categorical variables, is pretty great for multi-way fixed effects.
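For instance, here is a minimal sketch of a two-way fixed effects regression; the DataFrame `df` and the columns y, x, firm, and year are hypothetical:

```julia
using DataFrames, FixedEffectModels

# Absorb the high-dimensional fixed effects fe(firm) and fe(year),
# with standard errors clustered by firm.
m = reg(df, @formula(y ~ x + fe(firm) + fe(year)), Vcov.cluster(:firm))
```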

But, to be a devil’s advocate, why is Julia an appropriate language for general microeconometrics? R (and Stata) are spectacular for pre-packaged microeconometrics, while Julia is a great choice for more computationally intensive work (e.g. the two-way fixed effects), Bayesian methods exploiting auto-differentiation, or more structural models where algorithms are required.

I think that for Julia to become competitive with R for cookie-cutter econometrics, it would require a huge amount of investment that might be better spent on more specialized packages.

But that is just one opinion!


That isn’t entirely true. FixedEffectModels.jl (https://github.com/matthieugomez/FixedEffectModels.jl), which does fast estimation of linear models with IV and high-dimensional categorical variables, is pretty great for multi-way fixed effects.

That’s true. FixedEffectModels.jl works for fixed-effect models. But suppose I want to construct transition matrices for a categorical variable in panel data. It’s not hard to think about what a solution would look like, but it is hard to implement using standard data structures like DataFrames, and FixedEffectModels.jl doesn’t really help with that. What Julia needs is a Panel <: AbstractDataFrame type on which all functions defined for the DataFrame type work, plus additional functions specific to Panel types. I suspect there is something out there that can already accomplish this, but sifting through the various similar packages can be very time consuming.
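For concreteness, this is roughly what I do by hand today (again with hypothetical column names :id, :t, and :state, and again assuming consecutive periods within each unit):

```julia
using DataFrames, ShiftedArrays

# df has one row per (id, t); :state is the categorical variable of interest.
sort!(df, [:id, :t])
transform!(groupby(df, :id), :state => (v -> ShiftedArrays.lead(v, 1)) => :state_next)

states = sort(unique(df.state))
index  = Dict(s => i for (i, s) in enumerate(states))
k = length(states)

# Count transitions, skipping the last observation of each unit
# (its successor is missing).
counts = zeros(Int, k, k)
for row in eachrow(df)
    ismissing(row.state_next) && continue
    counts[index[row.state], index[row.state_next]] += 1
end

P = counts ./ sum(counts, dims = 2)   # row-normalized transition matrix
```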

This is an appropriate use case for Julia, as the data I am working with is too large for Python or R to be fast enough.

But, to be a devil’s advocate, why is Julia an appropriate language for general microeconometrics?

There are two big problems with using a variety of different languages for different components of research:

(1) Using multiple languages can be bad for project organization.

(A) It makes projects messier.

(B) A research project using many programming languages is less reproducible, as it requires anyone using your code to use all of the programming languages involved.

(2) Using multiple languages can be bad for developing expertise.

(A) It is better to be an expert in one language and passable in others than to be mediocre in many. This is part of the appeal of Julia: it avoids the “two language” problem. I would rather invest heavily in mastering one language than try to navigate the idiosyncrasies of Stata for this, R for that, Julia for this other thing, Python for a fourth thing, etc. All of these languages have pitfalls that only an expert would know to steer clear of. If I am not an expert, I am more likely to make big mistakes.

(B) Similarly, when working in a research organization where there is turnover in RAs, getting RAs to contribute to projects written in several different languages can be unworkable.

Realistically, Julia is a programming language that could replace all of these other languages, but its package ecosystem isn’t there yet. Why deal with the nightmarish syntactic quirks and performance issues of R and the severe limitations of Stata when, in principle, I could do everything in Julia?

If you are doing linear stuff and working with big data, I don’t see why Julia would be faster than R and Python. For that sort of work, Python and R use heavily optimized packages implemented in C, and they have a great deal more support than Julia for larger-than-memory operations and the like. Julia’s DataFrames are perfectly fine for what they do, but querying them and so on is frequently much slower than in R. None of these are fundamental issues with Julia; it is just that the investment into Julia’s DataFrames is infinitesimal compared to the investment into R or Pandas. Furthermore, since they all end up calling compiled C code in the background for that sort of thing, you shouldn’t expect much performance difference for those tasks.

Not to mention that Pandas and dplyr will remain better for data manipulation pipelines in the near future. And while RegressionTables.jl (https://github.com/jmboehm/RegressionTables.jl) is great for journal-style regression tables, the specialized microeconometric packages in R (or Stata) give you the exact sorts of tables and output that journals for heavily empirical microeconometric research expect.

Of course, that all depends on whether all you are doing is a bunch of (linear) microeconometrics, or whether that is a small part of a bigger project where Julia’s benefits would start to shine (e.g. structural estimation, anything nonlinear, differential equations, dynamic programming, anything where you write an algorithm, etc.).

Sure. But (linear) microeconometrics isn’t really about learning a language; it is about learning a bunch of packages and enough glue to hold them together. If you are running a bunch of regressions from packages people have already written, and manipulating and cleaning data, there is no reason you need a “serious” programming language.

I think the main issue with the two language problem arises when writing algorithms where things are just too slow in higher-level languages, so you end up writing lower-level kernels in C/Fortran. If you are doing linear microeconometrics, then people have already done that work for you and you are basically just stitching together packages.

Hopefully that changes in the long run, and I sympathize with all of your reasons for wanting to double down on one language; I would love to use fewer languages too. But once you know two, it is pretty easy to pick up a third (especially if you don’t intend to write algorithms in it!).

Julia is great at many things, but for now I feel there are plenty of places where you are better off using the more specialized tools that have the best packages (R if you are doing heavy econometrics/statistics, Python if you are doing web scraping/neural networks/glue code, Julia for anything nonlinear or where you actually write your own algorithms, Stan for Bayesian stuff that fits into its framework, Dynare for DSGE stuff, etc.).

But, as I said, just take that as one opinion. The reason I feel the need to state it is that there is a limited amount of researcher time that can be put towards investing in Julia packages. Any time spent reproducing things that R or Python already do very well is maintenance and development effort taken away from areas where Julia is the state of the art.


This was helpful and what you have said makes sense.

On different projects, my team estimates machine learning models and structural models. A typical research project will also include some simpler models for descriptive statistics. We’ve mainly used Python for data cleaning and neural nets, and we’ve used other languages for structural models. Julia is such a beautiful language that it would be nice if it did everything well, but, alas, you’ve convincingly argued that developing the Julia ecosystem for run-of-the-mill econometrics is not worth the opportunity cost, at least for now.

Thank you for your comments.

Thank you; these are very important questions to discuss.

For now. But the nice thing about Julia is that there is no reason it can’t get there in the medium term. That is where the solution to the “two language” problem kicks in, and the flexibility of Julia’s metaprogramming means there is no fundamental limitation preventing it from providing a better implementation of all of these things! Lots of people will chip away at these sorts of packages, and eventually I think your goal will become possible.

I disagree. JuMP.jl is a “killer app” for GMM and ML estimation when you have closed-form expressions, and I’m not aware of a good substitute in R or Python. It also works beautifully with Knitro or Ipopt. For structural micro where you minimize black-box functions, it’s a good substitute for Fortran or C (though probably no better).
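To give a sense of what I mean, here is a toy maximum likelihood example with made-up data (the model, data, and names are purely illustrative, and the legacy @NLobjective syntax is just one way to write it): the logit log-likelihood is written down in closed form and Ipopt does the rest.

```julia
using JuMP, Ipopt

# Simulated data, purely illustrative.
n, k = 500, 3
X = randn(n, k)
β_true = [0.5, -1.0, 2.0]
y = Float64.(rand(n) .< 1 ./ (1 .+ exp.(-X * β_true)))

model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, β[1:k])
# Logit log-likelihood written directly as the objective.
@NLobjective(model, Max,
    sum(y[i] * sum(X[i, j] * β[j] for j in 1:k) -
        log(1 + exp(sum(X[i, j] * β[j] for j in 1:k))) for i in 1:n))
optimize!(model)
value.(β)   # should be in the neighborhood of β_true
```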

I do agree that the ecosystem is not as well developed as in R or python. But I hope that we’ll get there.

Yes, 100%. My point only applies to bread-and-butter linear microeconometrics. For nonlinear and structural stuff that is not pre-built (a logistic estimation, for example, is already handled well in R), Julia is great, and the warts in the general data ecosystem are not an issue. I should point out that even in the linear world, Julia can be the ideal solution for big and non-cookie-cutter problems where you need to implement your own algorithms.


Agreed.

Although the original post is already a bit old, I wanted to point to Econometrics.jl (even though I haven’t really tried it myself). I also recently wrote a package that estimates GLMs with high-dimensional fixed effects (GLFixedEffectModels.jl), which may be useful for applied micro people. It builds on the same package as the excellent FixedEffectModels.jl.
