Hi all,
I’ve been using Julia for model estimation in a couple of papers, but I always found myself going back and forth between Stata (for data manipulation) and Julia (for accessing JuMP or the solvers), sometimes with very ugly hacks.
So I’ve decided to write a little package, Douglass.jl, that implements a Stata-like syntax for basic data manipulation on Julia DataFrames. It parses each command and calls a macro that returns the corresponding DataFrames.jl or DataFramesMeta.jl code for the task. Because that code lives in the current scope, you can use any functions or variables in your expressions (think `gen myvariable = myfunction(x)`). Beyond that, the syntax is very similar to Stata’s:
```julia
using Douglass, RDatasets
df = dataset("datasets", "iris")
# set the active DataFrame
Douglass.set_active_df(:df)
# create a variable `z` that is the sum of `SepalLength` and `SepalWidth`, for each row
d"gen :z = :SepalLength + :SepalWidth"
# replace `z` by the row index for the first 10 observations
d"replace :z = _n if _n <= 10"
# drop a variable
d"drop :z"
# construct the within-group mean for a subset of the observations
d"bysort :Species : egen :z = mean(:SepalLength) if :SepalWidth .> 3.0"
```
and so on.
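For readers who know DataFrames.jl, here is a rough, hand-written sketch of what the four commands above amount to in plain DataFrames.jl (assuming DataFrames.jl 1.x). It approximates the Stata semantics and is not necessarily the code Douglass.jl generates; it also uses a small made-up stand-in for the iris data so that it runs without RDatasets, and the helper name `masked_group_mean` is hypothetical.

```julia
using DataFrames, Statistics

# Small made-up stand-in for the iris data (assumption, so no RDatasets is needed).
df = DataFrame(
    Species     = ["setosa", "setosa", "setosa", "virginica", "virginica"],
    SepalLength = [5.1, 4.9, 4.7, 6.3, 5.8],
    SepalWidth  = [3.5, 3.0, 3.2, 3.3, 2.7],
)

# gen :z = :SepalLength + :SepalWidth  (row-wise sum into a new column)
df.z = df.SepalLength .+ df.SepalWidth

# replace :z = _n if _n <= 10  (`_n` is the row number; this toy data has fewer than 10 rows)
n = min(10, nrow(df))
df.z[1:n] .= 1:n

# drop :z
select!(df, Not(:z))

# bysort :Species : egen :z = mean(:SepalLength) if :SepalWidth .> 3.0
# Hypothetical helper: within one group, average `x` over the rows where `w > 3.0`
# and return `missing` for the rows that fail the condition.
function masked_group_mean(x, w)
    keep = w .> 3.0
    m = any(keep) ? mean(x[keep]) : missing
    return [k ? m : missing for k in keep]
end
transform!(groupby(df, :Species), [:SepalLength, :SepalWidth] => masked_group_mean => :z)
```

The `egen ... if` case is where Stata's semantics matter most: the group mean is computed only over rows that satisfy the condition, and rows that fail it get a missing value, which is why the helper returns `missing` for them.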
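The idea of a string macro that expands into DataFrames code in the caller's scope can be illustrated with a toy sketch. This is hypothetical and far simpler than Douglass.jl: the macro name `toy_str`, the helper `rewrite_columns`, the hard-coded DataFrame name `df`, and the restriction to `gen` are all assumptions made for illustration.

```julia
using DataFrames

# Hypothetical helper: replace every quoted symbol `:col` in `ex` with `dfname[!, :col]`.
function rewrite_columns(ex, dfname::Symbol)
    if ex isa QuoteNode && ex.value isa Symbol
        return Expr(:ref, dfname, :!, ex)
    elseif ex isa Expr && ex.head == :. && ex.args[2] isa QuoteNode
        # property access such as `x.field`: leave the field name alone
        return Expr(:., rewrite_columns(ex.args[1], dfname), ex.args[2])
    elseif ex isa Expr
        return Expr(ex.head, (rewrite_columns(a, dfname) for a in ex.args)...)
    else
        return ex
    end
end

# Hypothetical toy string macro: toy"gen :newcol = <expr>", always acting on a variable named `df`.
macro toy_str(cmd::String)
    m = match(r"^\s*gen\s+(.+)$", cmd)
    m === nothing && error("this toy only supports `gen`")
    ex = Meta.parse(m.captures[1])
    (ex isa Expr && ex.head == :(=)) || error("expected `gen :col = expr`")
    lhs, rhs = ex.args
    lhs isa QuoteNode || error("left-hand side must be a column such as :z")
    # Build `df[!, :newcol] = <rewritten rhs>` and escape it, so that `df` and any
    # functions or variables used in <expr> resolve in the caller's scope.
    return esc(Expr(:(=), Expr(:ref, :df, :!, lhs), rewrite_columns(rhs, :df)))
end

df = DataFrame(a = [1.0, 2.0, 3.0], b = [10.0, 20.0, 30.0])
scale(x) = 2 .* x                 # an ordinary user-defined function
toy"gen :c = scale(:a) .+ :b"     # expands to: df[!, :c] = scale(df[!, :a]) .+ df[!, :b]
```

Escaping the whole generated expression is what lets `scale` and `df` be looked up where the command is written, which is the "lives in the current scope" property described above.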
The package is still at a very early stage and hence not yet ready for use in research papers. I’m trying to gauge whether it would be useful to people other than me, and whether I invest more time in it will depend on that. Please consider giving it a ‘thumbs up’, a GitHub star, etc. if you think it could be useful. Of course, I would also appreciate people trying it out and giving feedback: please file bugs under the ‘Issues’ tab of the GitHub repo, or post your thoughts below.
The package is not yet registered, so you have to install it with
```
] add https://github.com/jmboehm/Douglass.jl.git
```
Best, Johannes