Automating Your Work

Why and when would we want to use automation tools such as macros, loops, stored results, or even programs we write ourselves? Automating your work saves you from repeating very similar code over and over again. It also reduces the chance of mistakes, because a change is made once instead of being retyped at every step. Time spent learning the programming basics is well repaid.

Overview of Stata programming
UCLA: Statistical Consulting Group, Introduction to Stata Programming
Gabriel Rossman, Introduction to Stata Programming

Macros

Local and global macros

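Macros are named strings that stand in for variable names, values, text, commands, or parts of commands. When Stata sees a macro reference, it substitutes the macro's contents before running the line. Macros can be local or global.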

local macro

Local macros only work in the current do-file or program.
local localmacro = exp

This is how you refer to a local macro: `localmacro'. Note that the opening quote is a backtick (`), usually found at the top left corner of the keyboard, and the closing quote is an apostrophe (').

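The contents of a macro can usually be given with or without double quotation marks. However, if the contents themselves contain double quotes, as in a list of quoted phrases that include spaces, we need to enclose the whole list in compound double quotes `" "'. For instance,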


local reason `" "Work on a class assignment/paper" "Use specialized databases (e.g. Bloomberg, Wind)" "'
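
As a quick check of how compound double quotes behave, the sketch below stores the two quoted phrases from above and then loops over them. The : word count extended macro function and the loop name r are additions for illustration only; the point is that each quoted phrase should be treated as a single element.

```stata
* Store two quoted phrases in one local macro using compound double quotes
local reason `" "Work on a class assignment/paper" "Use specialized databases (e.g. Bloomberg, Wind)" "'

* Count the elements: each quoted phrase counts as one word
local n : word count `reason'
display "number of elements: `n'"

* Loop over the phrases; compound quotes keep each phrase intact when displayed
foreach r of local reason {
    display `"`r'"'
}
```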

global macro

Global macros are visible to every do-file and program in the current Stata session.
global globalmacro = exp

We use a dollar sign to refer to a global macro: $globalmacro.

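Be careful with global macros. Because they are accessible from all do-files and programs, a global defined in one place can silently overwrite, or be overwritten by, a global of the same name defined elsewhere. Keep track of every global macro you create.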
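
A minimal sketch of defining, using and cleaning up a global macro follows. The macro name project and its contents are hypothetical; the braces in ${project} mark where the macro name ends when other text follows it directly.

```stata
* Define a global macro (the name and contents are made up for illustration)
global project gpa_study

* Refer to it with a dollar sign
display "$project"

* Use braces when text follows the macro name directly
display "${project}_clean.dta"

* List the global, then drop it once it is no longer needed
macro list project
macro drop project

* After dropping, the reference expands to nothing
display "[$project]"
```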

modifying and dropping macros

To change the contents of a macro, we simply define it again.
. local container apple orange
. local container apple melon papaya
The first line defines the local macro container; the second replaces its contents with a new list.

To drop a macro, use macro drop. For a global macro, give its name; for a local macro, prefix the name with an underscore.
. macro drop _container

Below we discuss some common scenarios where one can use macros to automate the analyses.

The many uses of macros

defining variable lists

A common use of macros is to hold variable lists for later use.

Below we create a local macro control to hold all control variables, and the macros application, open_day and placement to hold the variables on the application, open day performance and placement tests.

We can then use the macros in an OLS regression: application, open_day and placement supply the independent variables of interest, and control supplies the control variables.

. local control gender country
. local application personality academic
. local open_day writing interview participation
. local placement math english
. reg gpa `application' `open_day' `placement' `control'

When a variable list is long, we can build the macro up in stages by having each new definition refer to the macro's current contents:
. local control gender country
. local control `control' high_school entrance_exam admission_track
. local control `control' major class
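
The same idea can be tried on the auto dataset that ships with Stata. This is only a sketch: the macro name rhs and the choice of variables are assumptions made so that the example runs as is.

```stata
* Load an example dataset shipped with Stata
sysuse auto, clear

* Build the list of regressors in stages
local rhs mpg weight
local rhs `rhs' length
local rhs `rhs' foreign

* Check the accumulated contents, then use them in a regression
display "`rhs'"
regress price `rhs'
```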

storing commands

Macros can hold commands.

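To use a macro to hold a condition (note the straight double quotes; curly quotes would be stored as part of the macro and break the command):
. local condition "if level == 1 & track != 2"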
. local placement math english
. summarize `placement' `condition'
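
A runnable version of this pattern, sketched on the auto dataset; the condition and the variables are stand-ins chosen for illustration, not the variables used above.

```stata
sysuse auto, clear

* Hold an if qualifier in a macro (straight double quotes delimit the contents)
local condition "if foreign == 1 & rep78 != 2"

* Hold the variable list in another macro
local vars price mpg

* Stata expands both macros before running the command:
* summarize price mpg if foreign == 1 & rep78 != 2
summarize `vars' `condition'
```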

To use macros to hold multiple options to make a graph:
. local option1 msymbol(o) mcolor(cranberry) clcolor(cranberry) connect(l)
. local option2 msymbol(th) mcolor(cranberry) clcolor(cranberry) connect(l)
. local option3 msymbol(o) mcolor(dknavy) clcolor(dknavy) connect(l)
. local option4 msymbol(th) mcolor(dknavy) clcolor(dknavy) connect(l)
. graph twoway (scatter length_video week [w = click_video] if group == 1, xlabel(0 (1) 10) `option1') ///
(scatter length_video week [w = click_video] if group == 2, xlabel(0 (1) 10) `option2') ///
(scatter length_text week [w = click_text] if group == 1, xlabel(0 (1) 10) `option3') ///
(scatter length_text week [w = click_text] if group == 2, xlabel(0 (1) 10) `option4'), ///
legend(order(1 "Average User: Video" 2 "Heavy User: Video" 3 "Average User: Text" 4 "Heavy User: Text"))

storing values

Macros can store values, especially constants, for later use.

In calculations:
. local a = 1.232425
. local b = 3.899878
. local c = 2.566556
. display (-`b'+sqrt((`b')^2 - 4*`a'*`c'))/(2*`a')

When creating new variables:
. local 8hrs = 1000*60*60*8
. gen double gmt = utc + `8hrs'
which is equivalent to
. gen double gmt = utc + msofhours(8)
The macro is a handy fallback when a built-in function such as msofhours() does not come to mind.
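
To confirm that the hand-computed constant and the built-in function agree, a short check such as the following can be run. The example timestamp is made up for illustration.

```stata
* Eight hours expressed in milliseconds, computed by hand
local 8hrs = 1000*60*60*8

* Compare with the built-in function; the last line shows 1 if they are equal
display `8hrs'
display msofhours(8)
display `8hrs' == msofhours(8)

* Shift a hypothetical datetime value by eight hours
display %tc clock("2020-01-01 00:00:00", "YMDhms") + `8hrs'
```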

model specifications

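Using macros gives model specifications a clear structure.

When we need to add groups of variables in nested regression models (eststo and esttab belong to the user-written estout package, which can be installed with ssc install estout):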
. local control gender country
. local application personality academic
. local open_day writing interview participation
. local placement math english
. eststo clear
. eststo: reg gpa `control' `application'
. eststo: reg gpa `control' `application' `open_day'
. eststo: reg gpa `control' `application' `open_day' `placement'
. esttab, b(%9.1f) t(%9.1f) r2(%9.6f)

When we fit the same specification with different estimators:
. reg offer `rubric1' `rubric2' `rubric3' `control'
. logit offer `rubric1' `rubric2' `rubric3' `control'
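
If the estout package is not installed, a similar nested comparison can be sketched with Stata's built-in estimates store and estimates table commands. This is a substitute technique, not the eststo/esttab workflow shown above, and the auto dataset, macro names and variable groupings are assumptions for illustration.

```stata
sysuse auto, clear

* Groups of regressors held in macros
local size weight length
local quality rep78 foreign

* Fit nested models, storing each set of estimates under a name
quietly regress price mpg
estimates store m1
quietly regress price mpg `size'
estimates store m2
quietly regress price mpg `size' `quality'
estimates store m3

* Show the models side by side with the sample size and R-squared
estimates table m1 m2 m3, b(%9.1f) stats(N r2)
```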

We will see below how macros can be useful in loops and programs.

Loops

foreach

foreach stores the list and loops over the items.

foreach loopname in list{
...
}

foreach loopname of varlist varlist{
...
}

foreach loopname of numlist numlist{
...
}

foreach loopname of local localmacro{
...
}

foreach loopname of global globalmacro{
...
}

For each item in the list, foreach sets the local macro loopname to that item (a variable, a number, an element of a macro, etc.) and executes the commands between the braces.

When we refer to loopname inside the loop, we must enclose it in the local macro quotes: `loopname'. Note the difference between the backtick (`) and the apostrophe (').

a quick note on the loop format

Each loop starts with an open curly bracket that must stay on the same line as foreach. Nothing except a comment may follow the bracket on that line.

Another curly bracket should appear on a line by itself to conclude the loop.

Between the brackets are the commands. You may notice the indentation before the commands: this is not mandatory for the commands to run, but it is a good programming habit to give yourself and readers a clear structure of the commands’ logic.

We will see examples below.
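
As a first illustration, here is a sketch of three of the forms above run on the auto dataset. The loop names v, k and word, and the lists they run over, are arbitrary choices.

```stata
sysuse auto, clear

* Loop over a variable list
foreach v of varlist price mpg weight {
    quietly summarize `v'
    display "`v': mean = " %9.2f r(mean)
}

* Loop over a number list
foreach k of numlist 1/3 10 {
    display "k = `k'"
}

* Loop over an arbitrary list of words
foreach word in apple melon papaya {
    display "`word'"
}
```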

t-tests with the variable names displayed for each loop

Quite often we need to perform t-tests on a group of variables with very similar commands. Instead of changing the variable names one by one, we could simply write a loop to do the work for us.

. local course calculus writing1 writing2
. foreach var in `course'{
. display _newline "ttest `var', by(gender)"
. ttest `var', by(gender)
. }

Alternatively, we can say:
. local course calculus writing1 writing2
. foreach var of local course{
. display _newline "ttest `var', by(gender)"
. ttest `var', by(gender)
. }

The two forms produce the same output. The difference is that the second form is faster and uses less memory. In addition, if we want to change the contents of the local macro course, the second form allows new elements to be added to the end of the list inside the loop.
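
Both forms can be tried side by side on the auto dataset. In this sketch foreign plays the role of gender and the macro outcomes plays the role of course; both are stand-ins, and the second loop prints only the stored test statistic and p-value instead of the full output.

```stata
sysuse auto, clear
local outcomes price mpg weight

* Form 1: the macro is expanded on the command line before foreach runs
foreach var in `outcomes' {
    display _newline "ttest `var', by(foreign)"
    ttest `var', by(foreign)
}

* Form 2: foreach reads the list directly from the local macro
foreach var of local outcomes {
    quietly ttest `var', by(foreign)
    display "`var': t = " %6.2f r(t) "  p = " %6.4f r(p)
}
```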

appending many files

In some cases we have hundreds of files to append. A loop that appends every file in the directory saves the time we would otherwise spend appending them one by one.

. local dtafiles: dir . files "*.dta"
. foreach file of local dtafiles{
. preserve
. use "`file'", clear
. save temp, replace
. restore
. append using temp
. }
. rm temp.dta
. save filename, replace

In this example, local dtafiles: dir . files "*.dta" lists all Stata files in the current working directory.

dir ["]dir["] {files|dirs|other} ["]pattern["] is the extended macro function for file names. Type help extended_fcn to find out more.

erase ["]filename["] removes a file stored on disk and works on all platforms. rm ["]filename["] is a synonym available to Mac and Unix users.

We will explain preserve and restore below.
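
The following self-contained sketch first writes three small files and then appends them back together, so it can be run without any files of your own. The part_ file names are hypothetical, the files are written to the current working directory, and any other file there matching part_*.dta would be picked up as well. Because the loop starts from an empty dataset, each file is appended directly.

```stata
sysuse auto, clear

* Create three small example files, one per repair record value
forvalues r = 3/5 {
    preserve
    keep if rep78 == `r'
    quietly save part_`r', replace
    restore
}

* Start from an empty dataset and append every matching file
clear
local parts : dir . files "part_*.dta"
foreach f of local parts {
    append using "`f'"
}
count

* Remove the example files
foreach f of local parts {
    erase "`f'"
}
```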

forvalues

forvalues loops over consecutive values.

forvalues loopname = range{
...
}

Suppose we want to run a t-test on a single variable separately for each level of another variable:
. forvalues level = 1/3{
. display _newline "ttest calculus, level = `level'"
. ttest calculus if cal_placement == `level', by(gender)
. }
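
A runnable sketch of the same loop on the auto dataset, with price, rep78 and foreign standing in for calculus, cal_placement and gender. The range is restricted to 3/5 because in this dataset levels 1 and 2 of rep78 contain only domestic cars, and ttest needs two groups.

```stata
sysuse auto, clear

* One t-test of price by foreign for each of the levels 3, 4 and 5 of rep78
forvalues r = 3/5 {
    display _newline "ttest price if rep78 == `r', by(foreign)"
    ttest price if rep78 == `r', by(foreign)
}
```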

For this specific example there is a more convenient way to loop over each level: levelsof, which collects the levels of a variable for us. We will see it at work in the nested loops section and again where we discuss levelsof.

a quick note on number lists

#1(#2)#3    from #1 to #3 in steps of #2; #2 can be negative
#1/#2       #1 through #2 in steps of 1
#1 #2 #3    #1, #2 and #3
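
The number-list forms can be checked with two short loops. This is a sketch; the macro seen is a hypothetical helper that collects the values visited by the second loop.

```stata
* Count down from 10 to 4 in steps of 2: 10 8 6 4
forvalues i = 10(-2)4 {
    display `i'
}

* Mix the forms in one number list: 1 2 3 7 10 15 20
local seen
foreach i of numlist 1/3 7 10(5)20 {
    local seen `seen' `i'
}
display "`seen'"
```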

Nested loops

Loops can be nested.

To explain how it works let's perform another t-test.

. local course calculus writing1 writing2
. levelsof HSrank, local(level)
. foreach x of local course{
. foreach y of local level{
. display _newline "ttest `x' if HSrank == `y', by(gender)"
. ttest `x' if HSrank == `y', by(gender)
. }
. }

Here we performed a t-test by gender for every course at each level of HSrank.

We will explain how levelsof works below.
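
The nesting can be tried on the auto dataset as sketched below. The variables are stand-ins, and summarize replaces ttest because some levels of rep78 contain only one group of cars in this dataset.

```stata
sysuse auto, clear
local outcomes price mpg

* Collect the distinct values of rep78 in the local macro levels
levelsof rep78, local(levels)

* Outer loop over variables, inner loop over levels
foreach x of local outcomes {
    foreach y of local levels {
        quietly summarize `x' if rep78 == `y'
        display "`x' at rep78 == `y': mean = " %9.2f r(mean) " (n = " r(N) ")"
    }
}
```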

Let’s take a look at another example using nested loops to clean each file before appending, expanding on the example we have seen above.

. local csvfiles: dir . files "tracking.log-*.csv"
. foreach file of local csvfiles {
. preserve
. import delimited "`file'", clear
. foreach id in "a" "b" "c" "d" "e" "f" "g"{
. drop if username == "`id'"
. }
. drop event* context* page session host
. save temp.dta, replace
. restore
. append using temp
. }
. rm temp.dta
. save test, replace

while

while runs and repeats the commands as long as the condition specified is true.

while exp{
...
}

Here we have an example in the simplest form of how we may use while.
. local i = 1
. while `i'<=3{
. display _newline "`i'"
. sum gpa if placement == `i'
. local i = `i'+1
. }
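
A runnable sketch of the same counter pattern on the auto dataset, with rep78 standing in for placement. The shorthand local ++i increments the counter and is equivalent to local i = `i'+1.

```stata
sysuse auto, clear

* Start the counter, then loop while it is at most 5
local i = 1
while `i' <= 5 {
    quietly count if rep78 == `i'
    display "rep78 == `i': " r(N) " cars"
    local ++i
}
```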

We can further include if/else statements inside the loop.

while exp{
    if exp{
        ...
    }
    else{
        ...
    }
    ...
}

To expand on the last example:
. local i = 1
. while `i'<=5{
. if `i'<=3{
. sum writing if interview == `i'
. }
. else{
. sum application if interview == `i'
. }
. display _newline "`i'"
. local i = `i'+1
. }

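Note the difference between the branching if in if `i'<=3{}, which decides which block of commands runs, and the if qualifier in sum writing if interview == `i', which restricts a single command to the observations that satisfy the condition.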
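
The two kinds of if can be seen together in the following sketch on the auto dataset. The variables and the helper macro what are assumptions for illustration.

```stata
sysuse auto, clear

local i = 1
while `i' <= 5 {
    * Branching if: evaluated once per pass, decides which block runs
    if `i' <= 2 {
        * if qualifier: restricts summarize to the matching observations
        quietly summarize price if rep78 == `i'
        local what price
    }
    else {
        quietly summarize mpg if rep78 == `i'
        local what mpg
    }
    display "rep78 == `i': mean `what' = " %9.2f r(mean)
    local ++i
}
```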

We can include if/else statements within if/else statements.

while exp{
    if exp{
        ...
    }
    else{
        if exp{
            ...
        }
        else{
            ...
        }
        ...
    }
    ...
}

Stored Results

r-class and e-class commands

Results of calculations are stored by Stata commands so that they can be accessed in other commands and calculations later.

There are five classes of Stata commands:
r-class    general commands that store results in r()
e-class    estimation commands that store results in e()
s-class    parsing commands, used by programmers, that store results in s()
n-class    commands that do not store results in r(), e(), or s()
c-class    system parameters and settings, stored in c()

Commands producing statistical results are either r-class or e-class: e-class for estimation results and r-class otherwise.

After running an r-class or e-class command, we can list all of its stored results by typing return list or ereturn list, respectively.

Stored results are replaced the next time you execute a command of the same class. To keep a result for later use, copy it into a macro first. We will see an example below.
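
A short sketch of how stored results are overwritten, using the auto dataset; the macro name price_mean is an arbitrary choice.

```stata
sysuse auto, clear

* summarize is r-class: its results are stored in r()
summarize price
return list

* Copy the mean into a local macro before it is overwritten
local price_mean = r(mean)

* A second r-class command replaces the contents of r()
summarize mpg
display "saved mean of price: `price_mean'"
display "current r(mean):     " r(mean)
```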

types of results

There are four types of results:
Scalars: numbers
Macros: strings
Matrices: e(b) coefficient vector and e(V) variance–covariance matrix of the estimates (VCE)
Functions: the only function is e(sample), which evaluates to 1 (true) if the observation was used in the previous estimation and to 0 (false) otherwise.

To see what the result lists actually look like, try typing
. sysuse auto
. regress price mpg
. ereturn list
which lists all the results that regress stores.

Let’s review some examples below where we use the stored results for various purposes.

centering on the mean

. sysuse auto
. sum mpg
. gen mpg_c = mpg-r(mean)

The results are lost as soon as you run another r-class command. To be able to use the mean of mpg later, we can save it in a local macro:
. sysuse auto
. sum mpg
. local mpg_mean = `r(mean)'
. gen mpg_c = mpg-`mpg_mean'
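
The same stored results can be used to standardize a variable. In this sketch the new variable name mpg_z is an assumption; r(sd) is stored by summarize alongside r(mean).

```stata
sysuse auto, clear

* summarize stores the mean in r(mean) and the standard deviation in r(sd)
quietly summarize mpg

* Use both results right away, before another r-class command replaces them
generate mpg_z = (mpg - r(mean)) / r(sd)

* The standardized variable should have mean close to 0 and sd close to 1
summarize mpg_z
```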

displaying and using the stored matrices

To list the matrices, we can type
. sysuse auto
. reg price mpg
. matrix list e(b)
. matrix list e(V)

To use the matrices later, we need to first store the matrices:
. matrix b = e(b)
. local increment = b[1,1]
. display "increment `increment'"

using saved results in graphs

. sysuse auto
. reg mpg weight
. ereturn list
. local r2 = e(r2)
. twoway (lfitci mpg weight) (scatter mpg weight), note(R-squared =`r2')

adding stats from postestimation in tables

. sysuse auto
. reg mpg weight foreign
. test foreign weight
. outreg2 using myfile, adds(F-test, r(F), Prob > F, r(p)) replace

predicted values on the sample used in the previous estimation

. sysuse auto
. reg mpg weight if foreign & rep78 <=4
. predict fv if e(sample)

Coefficients and standard errors

We briefly touched on how to use matrix to access the coefficients from a regression.

Alternatively, we can use Stata system variables to obtain coefficients and standard errors:
_b[var] coefficient
_se[var] standard error

For instance,
. sysuse auto
. reg price weight
. dis _b[weight]
. dis _se[weight]
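
These values can be combined into other statistics. The sketch below rebuilds the t statistic and a 95% confidence interval for weight by hand, which should match the regress output; the macro names are arbitrary, and e(df_r) is the residual degrees of freedom stored by regress.

```stata
sysuse auto, clear
quietly regress price weight

* t statistic: coefficient divided by its standard error
local t = _b[weight] / _se[weight]

* Critical value of the t distribution for a two-sided 95% interval
local crit = invttail(e(df_r), 0.025)

* Lower and upper bounds of the confidence interval
local lb = _b[weight] - `crit' * _se[weight]
local ub = _b[weight] + `crit' * _se[weight]

display "t = " %6.2f `t'
display "95% CI: [" %6.3f `lb' ", " %6.3f `ub' "]"
```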

levelsof

In loops, levelsof is useful to store the values of a variable in a local macro for later use by specifying the local() option.
. levelsof writing_level, local(wlevel)
. foreach x of local wlevel{
. ttest writing_gpa if writing_level == `x', by(gender)
. }

Outside loops, levelsof can display the unique values of a variable.
. levelsof writing_level
. display "`r(levels)'"
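
A runnable sketch on the auto dataset, with rep78 standing in for writing_level. The : word count extended macro function is used here only to count how many levels were found.

```stata
sysuse auto, clear

* Store the distinct nonmissing values of rep78 in the local macro levels
levelsof rep78, local(levels)

* Count how many levels were found
local n : word count `levels'
display "rep78 has `n' distinct values: `levels'"

* Loop over the levels without typing them out
foreach x of local levels {
    quietly summarize mpg if rep78 == `x'
    display "rep78 == `x': mean mpg = " %5.1f r(mean)
}
```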

Macros

See storing commands and storing values using macros.

More on using stored results
UCLA: Statistical Consulting Group, How Can I Access Information Stored after I Run a Command in Stata (Returned Results)? | Stata FAQ

Useful Commands in Programs

quietly

quietly runs the command but suppresses the output.

Try running the commands below. Compare
. sysuse auto
. quietly reg price mpg if foreign
. predict fv if e(sample)

and
. reg price mpg if foreign
. predict fv2 if e(sample)

In this example we do not care so much about the regression outputs but rather the predictions in the estimated sample. Thus we asked Stata to suppress the reg outputs.

preserve; restore

Used as a pair, preserve and restore save a copy of the data and later bring it back. This is useful when a program has to alter the data along the way (for example by collapsing or dropping observations) but we want the original data intact when it ends.

Compare what the dataset would be like with and without preserve and restore.
. sysuse citytemp
. preserve
. collapse (mean) heatdd, by(region division)
. list heatdd
. restore

Without preserve and restore, collapse replaces the data in memory with the summary statistics, so the original dataset is no longer available until it is loaded again.
. collapse (mean) heatdd, by(region division)
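
One way to convince yourself that restore brings the data back is to compare the number of observations before and after. This is a sketch; the macro name n_before is an arbitrary choice.

```stata
sysuse citytemp, clear

* Remember how many observations the original data have
local n_before = _N

* Collapse inside a preserve/restore pair
preserve
collapse (mean) heatdd, by(region)
list
restore

* The original data are back: same number of observations as before
assert _N == `n_before'
display "observations after restore: " _N
```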