Creating Dummy Variables

A dummy variable takes the value 1 when something is true and 0 when it is false. Dummy variables are also called indicator variables.

For instance, foreign in Stata's auto dataset is a dummy variable: 1 if the car is foreign made and 0 if domestic made.
. sysuse auto
. codebook foreign

foreign                                                                Car type

                  type:  numeric (byte)
                 label:  origin

                 range:  [0,1]                        units:  1
         unique values:  2                        missing .:  0/74

            tabulation:  Freq.   Numeric  Label
                            52         0  Domestic
                            22         1  Foreign

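Because foreign is coded 0/1, it can be used on its own as a condition, without spelling out a comparison. Stata treats any nonzero value as true (a missing value also counts as true), so if foreign means if foreign != 0 and if ~foreign means if foreign == 0. Since foreign has no missing values, these are equivalent to if foreign == 1 and if foreign == 0.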
. list make if foreign
. list make if ~ foreign
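As a quick check, assuming the auto dataset is loaded as above, counting observations both ways should match the codebook tabulation (22 foreign, 52 domestic). This is only a sketch to confirm the shorthand, not a required step.
. sysuse auto, clear
. * shorthand and explicit form select the same cars
. count if foreign
. count if foreign == 1
. * negated shorthand and explicit form
. count if ~foreign
. count if foreign == 0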


One way to create a dummy variable is to use generate with a logical expression and an if qualifier.
. sysuse auto
. gen rep78In = rep78 >= 5 if !missing(rep78)
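The result is 1 where the condition is true (repair record is 5 or more), 0 where it is false, and missing where rep78 is missing. Without the if qualifier, observations with missing rep78 would be coded 1, because Stata treats missing values as larger than any number.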
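To see how missing values are handled, the sketch below builds the same indicator with and without the if qualifier and compares the two. The variable name rep78Bad is made up for this illustration.
. sysuse auto, clear
. gen rep78In = rep78 >= 5 if !missing(rep78)
. * hypothetical comparison variable: no guard against missing rep78
. gen rep78Bad = rep78 >= 5
. * where rep78 is missing, rep78In is missing and rep78Bad is 1
. list make rep78 rep78In rep78Bad if missing(rep78)
. * cross-tabulate, showing missing values as their own category
. tab rep78In rep78Bad, missing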

In hierarchical data, generate and egen can be combined with the by prefix to create indicator variables at different levels of the hierarchy.
. sysuse citytemp
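. by region (division), sort: gen heat_Ind1 = heatdd > 8000
flags each observation whose heating degree days exceed 8000. Because generate evaluates the expression observation by observation, the indicator is defined at the lowest level of the data, within each division, a subcategory under a region.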

. sysuse citytemp
. by region (division), sort: egen heat_Ind2 = max(heatdd > 8000)
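flags every observation in a region that has at least one observation with heating degree days larger than 8000. Here max() takes the maximum of the 0/1 expression within each region, so the indicator is constant within a region.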
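A small check of the difference between the two indicators: the first can vary within a region, while the second should be constant within each region. The missing-value guard in the last command is an addition of this sketch (heat_Ind3 is a made-up name), since a missing heatdd would otherwise satisfy heatdd > 8000.
. sysuse citytemp, clear
. by region (division), sort: gen heat_Ind1 = heatdd > 8000
. by region (division), sort: egen heat_Ind2 = max(heatdd > 8000)
. * observation-level flag, tabulated by region
. tab region heat_Ind1
. * region-level flag: each region falls entirely in one column
. tab region heat_Ind2
. * hypothetical variant that ignores missing heatdd
. by region, sort: egen heat_Ind3 = max(heatdd > 8000 & !missing(heatdd))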


We can also use tabulate var, generate(newvar) to create a series of indicator variables.
. tab foreign, gen(import)
generates two new variables: import1, indicating whether the car is domestic, and import2, indicating whether the car is foreign made.
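The same approach works for variables with more than two categories. As a sketch, assuming the auto dataset and a made-up stub name repair, tabulate creates one indicator per nonmissing value of rep78:
. sysuse auto, clear
. * creates repair1 through repair5, one per level of rep78
. tab rep78, gen(repair)
. * the indicators are missing where rep78 is missing
. summarize repair1-repair5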

More on creating indicator variables:
William Gould, StataCorp, How do I create dummy variables?

Factor variables

Factor variables create indicator variables from categorical variables.

The example below contains several factor variables:
. sysuse auto
. reg price mpg c.weight##c.weight ib3.rep78 i.foreign

c.weight##c.weight gives us the squared weight, in addition to the main effect of weight.

ib3.rep78 sets the base value at rep78=3 and creates indicators at each value of rep78.

i.foreign creates indicators at each value of foreign.
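For comparison, the factor-variable terms can be written out by hand. The sketch below, with made-up variable names weight2 and rep_1 through rep_5, should reproduce the regression above. It is shown only to make the expansion concrete, since the point of factor variables is to avoid creating these variables.
. sysuse auto, clear
. * squared weight, the interaction part of c.weight##c.weight
. gen weight2 = weight^2
. * one indicator per level of rep78
. tab rep78, gen(rep_)
. * leave out rep_3 so that rep78=3 is the base level
. reg price mpg weight weight2 rep_1 rep_2 rep_4 rep_5 foreign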


i. treats a variable as categorical and creates an indicator for each of its levels
c. treats a variable as continuous
o. omits a variable or indicator
# specifies an interaction
## specifies a full factorial interaction: main effects plus the interaction

what factor variables actually mean

i.foreign indicator variables for each level of foreign

i.foreign#i.rep78 indicator variables for all combinations of each value of foreign and rep78

i.foreign##i.rep78 the same as i.foreign i.rep78 i.foreign#i.rep78

i.foreign#i.rep78#i.make indicator variables for all combinations of each value of foreign, rep78 and make (not saying i.make would make sense, since it has 74 unique levels)

i.foreign##i.rep78##i.make the same as i.foreign i.rep78 i.make i.foreign#i.rep78 i.rep78#i.make i.foreign#i.make i.foreign#i.rep78#i.make

i.rep78#c.mpg one variable for each level of rep78. Each variable equals mpg for observations at that level of rep78 and 0 otherwise.

For instance, if the first observation has rep78=3 and mpg=22, then 3.rep78#c.mpg will be 22 and it will be 0 for 1b.rep78#c.mpg, 2.rep78#c.mpg, 4.rep78#c.mpg and 5.rep78#c.mpg.
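The expansion can be inspected directly. A minimal sketch, assuming that list accepts factor-variable notation in its variable list (as it does in recent versions of Stata):
. sysuse auto, clear
. * show rep78, mpg, and the level-by-mpg variables for the first five cars
. list rep78 mpg i.rep78#c.mpg in 1/5
. * summarize accepts the same notation
. summarize i.rep78#c.mpg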

selecting levels

3.rep78 selects the level where rep78=3

i3.rep78 the same as above

i(2/4).rep78 selects the levels from rep78=2 through rep78=4

i(1 5).rep78 selects the levels where rep78=1 and rep78=5

o(1 5).rep78 omits the levels where rep78=1 and rep78=5
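Level selection can be tried out in the same way. The sketch below assumes the auto dataset and only illustrates the selectors listed above:
. sysuse auto, clear
. * indicators for rep78=2, 3 and 4 only
. summarize i(2/4).rep78
. * indicators for rep78=1 and rep78=5 only
. summarize i(1 5).rep78
. * regress price on the single indicator for rep78=3
. reg price 3.rep78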

changing the base level

ib#.var changes the base level of var: b marks the base, and # is the value to use as the base level.

For instance, ib3.rep78 sets the base value at rep78=3.
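A short sketch of base-level choices, assuming the auto dataset. Besides a specific value, the ib. operator also accepts the keywords first, last and freq in parentheses, and fvset base stores a default base for a variable. Neither is covered in the text above.
. sysuse auto, clear
. * base level rep78=3: coefficients are relative to rep78=3
. reg price ib3.rep78
. * base level is the highest value of rep78
. reg price ib(last).rep78
. * base level is the most frequent value of rep78
. reg price ib(freq).rep78
. * make rep78=3 the default base for later i.rep78 terms
. fvset base 3 rep78
. reg price i.rep78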

Author: Yun Dai, 2018