The do-file below is self-contained. You can copy and paste it into your do-file editor to produce all the example tables.
For a PDF showing all the output, click here: tables_examples.pdf.
/* tables_examples.do
The table command was changed substantially in Stata Version 17.
The new version can do **almost anything**, but it takes time
to master. These examples should help you get started with it.
There is also a *table editor*, which can be used to prepare tables
for publication (changing margins, fonts, separator lines, etc). This
do-file just illustrates how to get the table outline and content.
For more information, you can find videos on YouTube from StataCorp
and other users.
Paul Jargowsky, August 2022
Revised October 2022
Revised April 2023
*/
version 17
cls
clear
webuse nhanes2l
label var highbp "Blood Pressure"
label define highbp 0 "Normal" 1 "High"
label values highbp highbp
* General structure:
* table (row stuff) (column stuff) (subtables), statistic(...)
* Note: (subtables) repeats the row_x_column table for each
* level of the variables specified
* 1. Frequencies (counts) — the default
* race by sex, then table is repeated by region and total
table (race) (sex) (region)
* same variables, but in one table with region on rows
table (region) (race sex), nototals
* same variables, but in one table with region on columns
table (race sex) (region), nototals
* 2. Percentages: stat(percent, across(variables))
* In the table specification, "var" represents variables
* and "result" represents different statistics
* a) Cell percents across the entire table
* Adds to 100 at extreme lower right (last subtable)
table (race) (sex) (region), ///
statistic(percent) ///
nformat(%9.1f percent ) sformat(%s%% percent )
* Cell percents within the table/subtables
* Adds to 100 at lower right of each subtable
table (race) (sex) (), ///
statistic(percent) /// <- is default
nformat(%9.1f percent ) sformat(%s%% percent )
/* same as:
table (race) (sex) (region), ///
statistic(percent, across(race#sex)) ///
nformat(%9.1f percent ) sformat(%s%% percent )
*/
* b) Row percents (over the columns)
* Adds to 100 at right of every row
table (race) (sex) (region), ///
statistic(percent, across(sex)) ///
nformat(%9.1f percent ) sformat(%s%% percent )
* c) Column percents (over all rows)
* Adds to 100 at bottom of every column
* i) When there is one table
table (race) (sex) (), ///
statistic(percent, across(race)) ///
nformat(%9.1f percent ) sformat(%s%% percent )
* ii) when there are subtables:
* Column percents over all subtables
* Adds to 100 in total row of total subtable
table (race) (sex) (region), ///
statistic(percent, across(race#region)) ///
nformat(%9.1f percent ) sformat(%s%% percent )
* iii) Column percents over the rows w/in subtables
* Adds to 100 in total row of all subtables
table (race) (sex) (region), ///
statistic(percent, across(race)) ///
nformat(%9.1f percent ) sformat(%s%% percent )
* Note: If multiple vars on a row or column, can specify
* row or column percents over one or both
* Row percentages over race only
table (region) (sex race) (), ///
statistic(percent, across(race)) ///
nformat(%9.1f percent ) sformat(%s%% percent )
* Row percentages over sex only
table (region) (race sex) (), ///
statistic(percent, across(sex)) ///
nformat(%9.1f percent ) sformat(%s%% percent )
* Row percentages over both race and sex
table (region) (race sex) (), ///
statistic(percent, across(race#sex)) ///
nformat(%9.1f percent ) sformat(%s%% percent )
* 3. Descriptive Statistics
* Stats on variables (race on rows; results over vars in columns)
table (race) (result var) (), ///
statistic(mean age height weight) ///
statistic(sd age height weight) ///
nformat(%9.1f mean) nformat(%9.2f sd)
* Stats on variables (vars over results in columns )
table (race) (var result) (), ///
statistic(mean age height weight) ///
statistic(sd age height weight) ///
nformat(%9.1f mean) nformat(%9.2f sd)
* Descriptive Statistics on variables
* Stats on variables (on rows, organized by stat)
table (result var) (sex) (), ///
statistic(mean age height weight) ///
statistic(sd age height weight) ///
nformat(%9.1f mean) nformat(%9.2f sd)
* Stats on variables (on rows, organized by var)
table (var result) (sex) (), ///
statistic(mean age height weight) ///
statistic(sd age height weight) ///
nformat(%9.1f mean) nformat(%9.2f sd)
* Stats on variables (vars on rows, sex over results on cols)
table (var) (sex result) (), ///
statistic(mean age height weight) ///
statistic(sd age height weight) ///
nformat(%9.1f mean) nformat(%9.2f sd)
* Stats on variables, stats by table
table (var) (sex) (result), ///
statistic(mean age height weight) ///
statistic(sd age height weight) ///
nformat(%9.1f mean) nformat(%9.2f sd)
* Stats on variables, vars by table
table (result) (sex) (var), ///
statistic(mean age height weight) ///
statistic(sd age height weight) ///
nformat(%9.1f mean) nformat(%9.2f sd)
* results moved to column
table (race) (sex result) (), ///
statistic(mean age height weight) ///
statistic(sd age height weight) ///
nformat(%9.1f mean) nformat(%9.2f sd)
* variables moved to columns
table (race sex) (var result) (), ///
statistic(mean age height weight) ///
statistic(sd age height weight) ///
nformat(%9.1f mean) nformat(%9.2f sd)
* 4. Combine frequencies, percentages, stats
* Percents = % with & without HBP by Sex
* Uses a built-in "style". There are others.
table (var result) (sex highbp) (), totals(sex) ///
statistic(frequency) ///
statistic(percent, across(highbp)) ///
statistic(mean age height weight) ///
statistic(sd age height weight) ///
nformat(%9.1f percent ) sformat(%s%% percent ) ///
nformat(%9.1f mean) nformat(%9.1f sd) style(table-1)
* Just those with high blood pressure: specifying
* highbp[1] shows only those with highbp==1
table (var result) ( highbp[1] sex ) (), totals(sex) ///
statistic(frequency) ///
statistic(percent, across(highbp)) ///
statistic(mean age height weight) ///
statistic(sd age height weight) ///
nformat(%9.1f percent ) sformat(%s%% percent ) ///
nformat(%9.1f mean) nformat(%9.1f sd)
* Note: in this table, the percents are
* percent of males and females *with* high
* blood pressure (compare to previous table).
* It doesn’t add to 100 in any direction.
* The percent without HBP is implied (100 minus the
* percent shown), but it is not displayed.
* This table can't be done using "if highbp==1", because
* then the base (all males, all females) is excluded
* and the percentages would not be calculated correctly.
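* Added cross-check (not in the original file): the row percents from
* -tabulate- use the same base (all males, all females), so the
* "High" column here should match the percents in the table above.
tabulate sex highbp, row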
* Use "if" to limit to people with HBP only
table (var result) (sex) () if highbp == 1, totals(sex) ///
statistic(frequency) ///
/* statistic(percent, across(highbp)) */ /// <- causes error
statistic(mean age height weight) ///
statistic(sd age height weight) ///
nformat(%9.1f percent) sformat(%s%% percent ) ///
nformat(%9.1f mean) nformat(%9.1f sd)
* You can’t get the incidence of HBP, but the
* statistics are correct. (Compare to above.)
* 5. (More advanced) Table of Hypothesis Tests
table (command) (result), ///
command(Males=r(P1) Females=r(P2) Difference=r(P_diff) r(p): ///
prtest diabetes, by(sex)) ///
command(Males=r(P1) Females=r(P2) Difference=r(P_diff) r(p): ///
prtest heartatk, by(sex)) ///
command(Males=r(P1) Females=r(P2) Difference=r(P_diff) r(p): ///
prtest highbp, by(sex)) ///
nformat(%5.3f) style(table-right)
* Fix up labels.
* “collect” command used to change labels,
* appearance, titles, etc.
collect label levels command 1 "Diabetes" 2 "Heart attack" 3 "High BP", modify
collect label levels result p "p-value", modify
collect title "Tests of Proportions by Gender"
collect preview
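* Added sketch (not in the original file): once the labels and title
* look right in the preview, the current collection can probably be
* written to a file with -collect export-. The filename below is a
* placeholder chosen for illustration; the file extension sets the format.
collect export "prtest_by_sex.docx", replace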
* 6. More about totals
* a) No totals
table race (sex highbp) (), nototals
* b) Row totals (mention row variable)
* i) just one row total
table race (sex highbp) (), totals(race)
* ii) separate row totals by highbp combining male + female
table race (sex highbp) (), ///
totals(race#highbp )
* iii) separate row totals for highbp w/in male and female
table race (sex highbp) (), ///
totals(race#sex )
* iv) combining the above
table race (sex highbp) (), ///
totals(race#sex race#highbp race)
* c) Column totals (intersection of column vars)
table race (sex highbp) (), totals(sex#highbp)
* d) Both row and column totals
table race (sex highbp) (), ///
totals(race race#sex#highbp sex#highbp)
* Note the hole where the grand total should be!
* How to fix that is shown below.
* Four-way table
* Table with no totals
table ( sex race ) ( highbp diabetes ) (), ///
nototals
* Row total across both race and sex
table ( sex race ) ( highbp diabetes ) (), ///
totals(sex#race)
* Column total across both highbp and diabetes
table ( sex race ) ( highbp diabetes ) (), ///
totals(highbp#diabetes)
* Both row and column totals
table ( sex race ) ( highbp diabetes ) (), ///
totals(sex#race highbp#diabetes )
* But again there is a hole where the grand total should be!
* "_cons" is secret code for the grand total in multi-way tables.
table ( sex race ) ( highbp diabetes ) (), ///
totals(sex#race highbp#diabetes _cons)
* I say secret because it's undocumented, though
* Stata tech support says they will document it "soon"
* (see the email from Stata tech support below).
* You can also use this code to get percents
* or statistics for the total sample
table ( sex race ) ( highbp diabetes ) (), ///
totals(sex#race highbp#diabetes _cons) ///
stat(mean age) nformat(%9.1f mean)
table ( sex race ) ( highbp diabetes ) (), ///
totals(sex#race highbp#diabetes _cons) ///
stat(percent)
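* Added cross-check (not in the original file): the grand-total cell of
* the mean-age table above should agree with the overall mean, assuming
* -table- leaves out observations that are missing on any of the four
* classification variables (its default when missing is not specified).
summarize age if !missing(sex, race, highbp, diabetes)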
/*
Stata Technical Support <tech-support@stata.com>
Paul Jargowsky
Dear Paul,
We would like to thank you for bringing this issue to our attention. Our
editorial staff will include the -_cons- suboption with the -totals()-
in the PDF manual entry for -table-.
Again, we appreciate you bring this issue to our attention. Please let
us know if you have further questions.
Best regards,
Pei-Chun
*****************************
Pei-Chun Lai, Ph.D.
Staff Statistician I
tech-support@stata.com
StataCorp LLC
*****************************
*/