Limited support for microeconometrics in Julia

croberts · October 25, 2019, 6:36pm

I’m impressed with the support QuantEcon provides for macro-oriented analyses in Julia but less so for micro-oriented analyses.

This would be less of an issue if Julia had a more developed ecosystem. Julia packages appropriate for panel data analysis are either very limited or non-existent. For example, to my knowledge, there is no package in Julia that supports lag operators on panel data. And, as far as I can tell, it is non-trivial to develop a module that extends the DataFrame type (from DataFrames.jl) and lag functions (from ShiftedArrays.jl) to handle panel data.
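To make the request concrete, here is a minimal sketch of what a panel-aware lag has to do: shift a variable back within each unit, never across units, and return a missing value when the earlier period is absent. It is written in plain Python (standard library only) purely to pin down the logic. The function name `panel_lag`, the row layout, and the column names are illustrative assumptions, not the API of DataFrames.jl, ShiftedArrays.jl, or any other package, and it assumes integer time periods.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def panel_lag(rows: List[Row], unit: str, time: str, var: str,
              periods: int = 1) -> List[Any]:
    """For each row, return `var` for the same unit `periods` time steps
    earlier, or None if that (unit, time) pair was not observed."""
    # Index observed values by (unit, time) so that gaps in the panel
    # produce None instead of silently borrowing a neighbouring row.
    lookup = {(r[unit], r[time]): r[var] for r in rows}
    return [lookup.get((r[unit], r[time] - periods)) for r in rows]


# Hypothetical unbalanced panel: unit "a" is missing the year 2002.
rows = [
    {"id": "a", "year": 2000, "wage": 10.0},
    {"id": "a", "year": 2001, "wage": 11.0},
    {"id": "a", "year": 2003, "wage": 13.0},
    {"id": "b", "year": 2000, "wage": 20.0},
    {"id": "b", "year": 2001, "wage": 21.0},
]
lagged = panel_lag(rows, "id", "year", "wage")
```

The point of keying on the (unit, time) pair rather than on row position is that a naive positional shift would hand unit "b" the last wage of unit "a", and would treat 2001 as the period before 2003.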

Julia also seems to suffer from the “scheme curse”, where there may be a large number of different packages all meant to solve the same problem, each suffering from distinct shortcomings.

So I have three questions:

(1) Are there plans for QuantEcon to extend coverage towards microeconometric topics, particularly for Julia?

(2) Might QuantEcon help to organize efforts towards developing a single definitive package for each task (thereby helping to avoid the “scheme curse”)?

(3) Can the QuantEcon community recommend a suite of Julia packages for microeconometrics?

Thanks!

jlperla · October 25, 2019, 6:54pm

That isn’t entirely true. FixedEffectModels.jl (matthieugomez/FixedEffectModels.jl on GitHub: Fast Estimation of Linear Models with IV and High Dimensional Categorical Variables) is pretty great for multi-way fixed effects.

But, to be a devil’s advocate, why is Julia an appropriate language for general microeconometrics? R (and Stata) are spectacular for pre-packaged microeconometrics, while Julia remains a great choice for more computationally intensive work (e.g. the 2-way fixed effects), Bayesian estimation exploiting auto-differentiation, or more structural models where algorithms are required.

I think that for Julia to become competitive with R for cookie-cutter econometrics, it would take a huge amount of investment that might be better spent on more specialized packages.

But that is just one opinion!

croberts · October 25, 2019, 7:24pm

jlperla wrote: “That isn’t entirely true. FixedEffectModels.jl (matthieugomez/FixedEffectModels.jl on GitHub: Fast Estimation of Linear Models with IV and High Dimensional Categorical Variables) is pretty great for multi-way fixed effects.”

That’s true. FixedEffectModels.jl works for fixed-effect models. But suppose I want to construct transition matrices for a categorical variable in panel data. It’s not hard to think about what a solution would look like, but it is hard to implement using standard data structures, like DataFrames. FixedEffectModels.jl doesn’t really help with that. What Julia needs is a Panel <: AbstractDataFrame type, on which all functions defined for a DataFrame type work and on which additional functions for Panel types would work. I suspect there is something out there that can already accomplish this, but sifting through the various similar packages can be very time consuming.
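For readers unfamiliar with the task, the transition-matrix computation can be sketched as follows: pair each observation with the same unit’s observation one period earlier, count the origin-to-destination moves, and normalize each origin row. This is plain Python (standard library only) meant to illustrate the logic, not a proposal for the Panel type. The function name `transition_matrix`, the column names, and the choice to skip gaps in the panel are assumptions made for the example.

```python
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]


def transition_matrix(rows: List[Row], unit: str, time: str,
                      state: str) -> Dict[Any, Dict[Any, float]]:
    """Estimate one-period transition probabilities of a categorical
    variable from (possibly unbalanced) panel data."""
    # State observed for each (unit, time) pair.
    observed = {(r[unit], r[time]): r[state] for r in rows}
    states = sorted(set(observed.values()))

    # Count moves between consecutive periods of the same unit only;
    # a gap in a unit's history contributes no transition.
    counts: Counter = Counter()
    for (u, t), destination in observed.items():
        if (u, t - 1) in observed:
            counts[(observed[(u, t - 1)], destination)] += 1

    # Row-normalize; an origin state with no observed moves gets zeros.
    matrix: Dict[Any, Dict[Any, float]] = {}
    for origin in states:
        total = sum(counts[(origin, dest)] for dest in states)
        matrix[origin] = {
            dest: (counts[(origin, dest)] / total if total else 0.0)
            for dest in states
        }
    return matrix


# Hypothetical panel: unit 2 is not observed in 2002.
rows = [
    {"id": 1, "year": 2000, "status": "employed"},
    {"id": 1, "year": 2001, "status": "employed"},
    {"id": 1, "year": 2002, "status": "unemployed"},
    {"id": 2, "year": 2000, "status": "unemployed"},
    {"id": 2, "year": 2001, "status": "employed"},
    {"id": 2, "year": 2003, "status": "employed"},
]
P = transition_matrix(rows, "id", "year", "status")
```

In this toy panel there are three usable transitions, so `P["employed"]` splits evenly between staying employed and becoming unemployed, and `P["unemployed"]` puts all its mass on "employed". The 2003 observation for unit 2 is dropped because its predecessor is missing.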

This is an appropriate use-case for Julia as the data I am working with is too large for python or [R] to be fast enough.

jlperla wrote: “But, to be a devil’s advocate, why is Julia an appropriate language for general microeconometrics?”

There are two big problems with using a variety of different languages for different components of research:

(1) Using multiple languages can be bad for project organization.

(A) It makes projects messier.

(B) A research project using many programming languages is less reproducible, as it requires anyone using your code to use all of the programming languages involved.

(2) Using multiple languages can be bad for developing expertise.

(A) It is better to be an expert in one language and passable in others than it is to be mediocre in many. This is part of the appeal of Julia – it avoids the “two language” problem. I would rather invest heavily in mastering one language than try to navigate the idiosyncrasies of Stata for this, [R] for that, Julia for this other thing, Python for this fourth thing, etc. All of these languages have problems that only an expert would know to steer clear of. If I am not an expert I am more likely to make big mistakes.

(B) Similarly, when working in a research organization where there is turnover in RAs, getting RAs to contribute to projects written in several different languages can be unworkable.

Realistically, Julia is a programming language that could replace all of these other languages, but its package ecosystem isn’t there yet. Why deal with the nightmarish syntactical quirks and performance issues of [R] and the severe limitations of Stata when, in principle, I could do everything in Julia?

jlperla · October 25, 2019, 8:36pm

If you are doing linear stuff and working with big data, I don’t see why Julia would be faster than R and Python. For that sort of stuff, Python and R use heavily optimized packages implemented in C, and have a great deal more support than Julia for too-big-for-memory operations, etc. Julia’s DataFrames are perfectly fine for what they do, but querying them, etc. is frequently much slower than in R. None of these are fundamental issues with Julia; it is just that the investment into Julia’s DataFrames is infinitesimal compared to the investment into R or Pandas. Furthermore, since they all end up using compiled C code in the background for that sort of thing, you shouldn’t expect much performance difference for those tasks.

Not to mention that Pandas and dplyr will remain better for a data manipulation pipeline in the near future. And while RegressionTables.jl (jmboehm/RegressionTables.jl on GitHub: Journal-style regression tables) is great, the specialized microeconometric packages in R (or Stata) give you the exact sorts of tables and output that journals publishing heavily empirical microeconometric research would expect.

Of course, that all depends on whether all you are doing is a bunch of (linear) microeconometrics, or whether that is a small part of a bigger project where Julia’s benefits would start to shine (e.g. structural estimation, anything nonlinear, differential equations, dynamic programming, anything where you write an algorithm, etc.).

Sure. But (linear) microeconometrics isn’t really about learning a language; it is about learning a bunch of packages and enough glue to hold them together. If you are running a bunch of regressions from packages people have already written, and manipulating/cleaning data, there is no reason you need a “serious” programming language.

I think the main issue with the two language problem is about writing algorithms where things are just too slow with higher level languages, so you end up writing lower-level kernels in C/Fortran. If you are doing linear microeconometrics, then people have already done that work for you and you are basically just stitching together packages.

Hopefully in the long run. I sympathize with all of your reasons for wanting to double down on one language, and I would love to use fewer languages. But once you know 2 it is pretty easy to pick up a 3rd (especially if you don’t intend to write algorithms in it!).

Julia is great at many things, but for now I feel there are plenty of places where you are better off using the more specialized tools that have the best packages (R if you are doing heavy econometrics/statistics, Python if you are doing webscraping/neural networks/glue code, Julia for anything nonlinear or where you actually write your own algorithms, Stan for Bayesian stuff that fits into its framework, Dynare for DSGE stuff, etc.).

But, as I said, just take that as one opinion. The reason I feel the need to state it is that there is a limited number of people, and a limited amount of researcher time, that can be put towards investing in Julia packages. Any time spent on reproducing things that R or Python already do very well takes maintenance and development resources away from things where Julia is the state of the art.

croberts · October 25, 2019, 8:57pm

This was helpful and what you have said makes sense.

On different projects, my team estimates machine learning models and structural models. A typical research project will also include some simpler models for descriptive statistics. We’ve mainly used python for data cleaning and neural nets, and we’ve used other languages for structural models. Julia is such a beautiful language, it would be nice if it did everything well, but, alas, you’ve convincingly argued that developing the Julia ecosystem for run-of-the-mill econometrics is not worth the opportunity cost, at least for now.

Thank you for your comments.

jlperla · October 25, 2019, 9:01pm

Thank you as well; these are very important questions to discuss.

For now. But the nice thing about Julia is that there is no reason that it can’t be in the medium-term. That is where the solution to the “two language” problem kicks in, and the flexibility of Julia’s metaprogramming means there is no fundamental reason why it can’t provide a better implementation of all of these things! Lots of people will chip away at these sorts of packages and eventually I think your goal will become possible.

jmboehm · January 18, 2020, 4:48pm

I disagree. JuMP.jl is a “killer app” for GMM and ML estimation when you have closed-form expressions; and I’m not aware of a good substitute in R or python. It also works beautifully with Knitro or Ipopt. For structural micro where you minimize black-box functions, it’s a good substitute for Fortran or C (though probably no better).

I do agree that the ecosystem is not as well developed as in R or python. But I hope that we’ll get there.

jlperla · January 18, 2020, 6:01pm

Yes, 100%. My point only applies to bread-and-butter linear microeconometrics. For nonlinear and structural stuff that is not pre-built (e.g. a logistic estimation is handled great in R), Julia is great, and the warts in the general data ecosystem are not an issue. I should point out that even in the linear world, Julia can be the ideal solution for big and non-cookie-cutter problems where you need to implement your own algorithms.

jmboehm · January 18, 2020, 11:07pm

Agreed.

Although the original post is already a bit older, I wanted to point to Econometrics.jl (even though I haven’t really tried it myself). I also recently wrote a package that estimates GLMs with high-dimensional fixed effects (GLFixedEffectModels.jl) which may be useful for applied micro people. It builds on the same package as the excellent FixedEffectModels.jl.
