Metadata record for AP VoteCast 2018
109687
Inter-university Consortium for Political and Social Research
ICPSR metadata records are licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.
V1
AP VoteCast 2018
109687
http://doi.org/10.3886/E109687V1
Trevor Tompson
Jennifer Benz
Please see full citation.
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.
The Associated Press
Ann Arbor, MI: Inter-university Consortium for Political and Social Research
Tompson, Trevor, and Benz, Jennifer. AP VoteCast 2018. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2019-05-14. https://doi.org/10.3886/E109687V1
voters
elections
AP VoteCast is a survey of the American electorate conducted in all 50 states by NORC at the University of Chicago for The Associated Press and Fox News. The survey is funded by AP. The survey of 138,929 registered voters was conducted October 29 to November 6, 2018, concluding as polls closed on Election Day. Interviews were conducted via phone and web, with 11,059 completing by phone and 127,870 completing by web. AP VoteCast combines interviews with a random sample of registered voters drawn from state voter files; with self-identified registered voters conducted using NORC's probability-based AmeriSpeak® panel, which is designed to be representative of the U.S. population; and with self-identified registered voters selected from nonprobability online panels. Interviews were conducted in English and Spanish. Respondents received a small monetary incentive for completing the survey. Participants selected from state voter files were contacted by phone and mail, and had the opportunity to take the survey by phone or online. Note that the data file(s), codebook, and questionnaire used are all included in the .zip file.
50 U.S. states
State
Individuals
Registered voters in the 50 U.S. states
survey data
Probability-based
Registered Voter Sample
In each of the 25 states in which
VoteCast includes a probability-based sample, NORC obtained a sample of
registered voters from Catalist LLC’s registered voter database. This database
includes demographic information, as well as addresses and phone numbers for
registered voters, allowing potential respondents to be contacted via mail and
telephone. The sample was stratified by state, partisanship, age and race. In
addition, NORC attempted to match sampled records to a registered voter
database maintained by L2, which provided additional phone numbers and
demographic information. After the matching, NORC had phone numbers for 86 percent
of sampled records, including cell phone numbers for 60 percent of records with
a phone number. Prior to dialing, all probability sample records are mailed a
postcard inviting them to complete the survey either online using a unique PIN
or via telephone by calling a toll-free number. Postcards are addressed by name
to the sampled registered voter if that individual is under age 35; postcards
are addressed to “registered voter” in all other cases. Telephone interviews
are conducted with the adult who answers the phone. Both online and telephone
respondents provided confirmation of registered voter status in the state.
Nonprobability
Sample
Nonprobability participants were
provided via the Harris Panel, including members of its third-party panels. Digital
fingerprint software and panel-level ID validation are used to prevent
respondents from completing the VoteCast survey multiple times. Nonprobability
respondents provided confirmation of registered voter status in the state.
AmeriSpeak
Sample
During the initial recruitment
phase of the AmeriSpeak panel, randomly selected U.S. households were sampled
with a known, non-zero probability of selection from the NORC National Sample
Frame and then contacted by U.S. mail, email, telephone and field interviewers
(face-to-face). The panel provides sample coverage of approximately 97 percent
of the U.S. household population. Those excluded from the sample include people
with P.O. Box-only addresses, some addresses not listed in the USPS Delivery
Sequence File and some newly constructed dwellings. AmeriSpeak panelists
provided confirmation of registered voter status in the state.
computer-assisted telephone interview (CATI)
web-based survey
Registered voter self-report
Weighting Details
VoteCast employs a four-step
weighting approach that combines the probability sample with the nonprobability
sample, and refines estimates at a subregional level within each state. The 50
state surveys and the AmeriSpeak survey are weighted separately and then
combined into a survey representative of voters in all 50 states.
State
Surveys
First, weights are constructed
separately for the probability sample (when available) and the nonprobability
sample for each state survey. These weights are adjusted to population totals
to correct for demographic imbalances of the responding sample compared to the
population of registered voters in each state. The adjustment targets are
derived from a combination of data from the U.S. Census Bureau’s November 2016
Current Population Survey Voting and Registration Supplement, Catalist’s voter
file and the Census Bureau’s 2017 American Community Survey. The variables used
were:
- Sex (male, female)
- Age (18-34, 35-64, 65+)
- Race/ethnicity (Hispanic, NH-White,
NH-Black, All Other)
- Education (less than high school/high school
grad, some college, 4-year college grad, post-graduate)
- Age * race/ethnicity (18-34, 35-54, 55+ *
NH-White, All Other)
- Education * race/ethnicity (less than HS/HS
grad, some college, 4-year college grad+ * NH-White, All Other)
- Partisanship model score (strong Republican,
lean Republican, lean Democrat, strong Democrat). Probability sample only
- Income (<= 25K, 25-50K, 50-75K, 75-100K,
100+K). Non-probability sample only
- County grouping using AP’s party grouping
(variable “AP_PARTY_REGION”). Non-probability sample only
Prior to adjusting to population
totals, the probability-based registered voter list sample weights are adjusted
for differential non-response related to factors such as availability of phone
numbers, age, race, and partisanship.
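Adjusting weights to population totals on several variables at once is commonly done by raking (iterative proportional fitting). The record does not name the algorithm NORC used, so the following is only a minimal sketch of that general technique. The function name, the toy sample, and the population totals are all made up for illustration.

```python
def rake(rows, targets, max_iter=100, tol=1e-8):
    """Iterative proportional fitting over categorical margins.

    rows: list of dicts holding category values plus a starting "weight".
    targets: {variable: {category: population_total}}.
    Returns a list of weights whose weighted margins match the targets.
    """
    weights = [float(r["weight"]) for r in rows]
    for _ in range(max_iter):
        max_change = 0.0
        for var, totals in targets.items():
            # Current weighted total for each category of this variable.
            current = {cat: 0.0 for cat in totals}
            for r, w in zip(rows, weights):
                current[r[var]] += w
            # Scale each respondent so this margin hits its target.
            for i, r in enumerate(rows):
                factor = totals[r[var]] / current[r[var]]
                weights[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    return weights


# Hypothetical toy sample (young men over-represented) and made-up totals.
sample = [
    {"sex": "male", "age": "18-34", "weight": 1.0},
    {"sex": "male", "age": "18-34", "weight": 1.0},
    {"sex": "male", "age": "35-64", "weight": 1.0},
    {"sex": "male", "age": "65+", "weight": 1.0},
    {"sex": "female", "age": "18-34", "weight": 1.0},
    {"sex": "female", "age": "35-64", "weight": 1.0},
    {"sex": "female", "age": "65+", "weight": 1.0},
]
targets = {
    "sex": {"male": 48.0, "female": 52.0},
    "age": {"18-34": 30.0, "35-64": 50.0, "65+": 20.0},
}
raked = rake(sample, targets)
```

After raking, the weighted totals for each sex and each age group agree with the targets, even though no single cell total was specified.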
Second, all non-probability sample
respondents receive a calibration weight. The calibration weight is designed to
ensure the non-probability sample is similar to a probability sample in regard
to variables that are predictive of vote choice that cannot be fully captured
through the prior demographic adjustments. The calibration benchmarks are based
on county-level estimates from a multilevel regression and poststratification
model that incorporates all probability and non-probability cases
nationwide. A national-level logistic
regression model was fitted using data from all states (both probability and
non-probability samples) and AmeriSpeak to make predictions for registered
voters at the state level for Party ID (Democrat, Independent, Republican) and
Country on Right/Wrong Track. These state-level predicted estimates are used as
calibration benchmarks for the non-probability sample for all states. For Party ID, separate models were fitted for
predicting the proportion of Democrats and proportion of Republicans. In
addition, five separate models were fitted based on how the county voted in the
2016 Presidential election (i.e., based on % Trump vote for county/town).
Models included the following individual-level variables and county/town-level
variables:
- Flag for 18-34 year old registered voter
- Flag for 65+ year old registered voter
- Flag for female registered voter
- Flag for voting for Trump in 2016
Presidential election
- Proportion of non-Hispanic non-White in
county/town
- Proportion 25+ years who are college
educated in county/town
- Population density in county/town
- Median household income in county/town
Third, all respondents in each
state are weighted to improve estimates for substate geographic regions. This
weight combines the weighted probability sample (if available) and the
calibrated non-probability sample, and then uses a small area model to improve
the estimate within subregions of a state. We created between 8 and 30 regions
(county groupings) for each state based on AP political and geographic strata,
vote choice in previous elections, demographics, and the number of expected
survey completes in each county. We then used these groupings to generate
model-based estimates of vote choice among likely voters. For states with two
or more statewide races, the small domain model was applied to the primary
race.
For each state, there were two
models: 1) predicting percent of vote share that goes for either of the two
major parties’ candidates, 2) predicting percent of major party vote share that
goes for the Democratic/Republican candidate. The following variables were used
as potential covariates in the model: 2016 Presidential election results,
population density, median income, percent below poverty line, percent
unemployed, percent college degree, portion on public assistance, percent
insurance coverage, percent nonwhite, percent citizen, percent 18-34 years old,
percent 65 and older, and percent who have not moved in last year. For each
state, we included in the models: 1) the 2016 presidential vote choice, and
based on model fit, 2) a measure of socioeconomic status, 3) at least one
demographic or geographic measure.
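The record does not say how the outputs of the two models are combined. One natural reading, offered here only as an assumption, is that the major-party share of the total vote is multiplied by the Democratic share of the major-party vote. The function name and the input values below are hypothetical.

```python
def candidate_shares(major_party_share, dem_share_of_major):
    """Combine the two modeled quantities into shares of the total vote.

    major_party_share: modeled share of all votes going to either
        major-party candidate (model 1).
    dem_share_of_major: modeled share of the major-party vote going to
        the Democratic candidate (model 2).
    """
    dem = major_party_share * dem_share_of_major
    rep = major_party_share * (1.0 - dem_share_of_major)
    other = 1.0 - major_party_share
    return {"dem": dem, "rep": rep, "other": other}


# Made-up inputs for one region: 96% major-party vote, 52% of it Democratic.
shares = candidate_shares(0.96, 0.52)
```

Under this reading the three shares always sum to one, which is one reason to model the two quantities separately rather than each candidate's share directly.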
Fourth, the survey results are
weighted to the actual vote count following the completion of the election.
This weighting is done in 8-30 sub-state regions within each state.
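As a rough illustration of this final step, the sketch below rescales weights so that weighted totals match actual vote counts within each region. Matching separately by vote choice is an assumption, as are the function name, the field names, and all numbers. The record states only that results are weighted to the actual vote count within sub-state regions.

```python
from collections import defaultdict


def adjust_to_vote_count(rows, actual_votes):
    """Scale weights so weighted totals equal actual counts per (region, vote)."""
    # Sum the current weights within each (region, vote choice) cell.
    totals = defaultdict(float)
    for r in rows:
        totals[(r["region"], r["vote"])] += r["weight"]
    # Multiply every weight by target / current for its cell.
    adjusted = []
    for r in rows:
        key = (r["region"], r["vote"])
        adjusted.append(r["weight"] * actual_votes[key] / totals[key])
    return adjusted


# Hypothetical respondents and made-up vote counts for two regions.
respondents = [
    {"region": "R1", "vote": "dem", "weight": 1.0},
    {"region": "R1", "vote": "dem", "weight": 3.0},
    {"region": "R1", "vote": "rep", "weight": 2.0},
    {"region": "R2", "vote": "dem", "weight": 1.0},
    {"region": "R2", "vote": "rep", "weight": 4.0},
]
actual = {
    ("R1", "dem"): 1000, ("R1", "rep"): 900,
    ("R2", "dem"): 400, ("R2", "rep"): 700,
}
final_weights = adjust_to_vote_count(respondents, actual)
```

Relative weights among respondents in the same cell are preserved, so the earlier demographic and calibration adjustments still shape who counts for more within each region.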
National
Survey
The national survey is weighted to
combine the 50 state surveys with the nationwide AmeriSpeak survey. Each of the
state surveys is weighted as described. The AmeriSpeak survey receives a
nonresponse-adjusted weight that is then adjusted to national totals for
registered voters derived from the U.S. Census Bureau’s November 2016 Current
Population Survey Voting and Registration Supplement, the Catalist voter file
and the Census Bureau’s 2017 American Community Survey. The state surveys are
further adjusted to represent their appropriate proportion of the registered
voter population for the country and combined with the AmeriSpeak survey. After
all votes are counted, the national data file is adjusted to match the national
vote for members of the U.S. House of Representatives within each state.
Using Weights
AP VoteCast is designed to be analyzed
using weighted data. The data file includes different weights for different
types of analyses.
- To produce estimates at the state level (e.g.,
percent of Californians who approve of President Trump), the state weights
should be used.
- To produce estimates at the national level
(e.g., the percent of registered voters nationwide who voted for a Democratic
candidate for the House), the national-level weights should be used.
Additionally, the data file
includes weights that represent results at two different stages of data
collection.
- The FINALVOTE weights should be used to produce
estimates that are adjusted to reflect the final vote counts in addition to
demographic, geographic, and calibration adjustments. Certified vote count data
was provided by AP. AP VoteCast recommends using these weights for most
analyses.
- The POLLCLOSE weights can be used to produce
estimates prior to any adjustments to final vote counts. These weights are
provided for transparency of the methodology to permit comparison of the survey’s
estimates at poll close but prior to adjusting the survey outcome to match the
final vote count.
To reproduce
estimates in AP’s publicly available VoteCast crosstabs of voters and
estimates of voter demographics nationwide, limit analysis to LIKELYVOTER=1 and
cases that are not missing RACE5_VOTE. The FINALVOTE_NATIONAL_WEIGHT variable
should be used for weights.
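The filter and weight described above can be applied as in the following sketch. The variable names LIKELYVOTER, RACE5_VOTE, and FINALVOTE_NATIONAL_WEIGHT come from this record. Everything else is assumed for illustration: the function name, the toy records, the use of None for a missing value, and the integer codes for RACE5_VOTE. Consult the codebook in the .zip file for the actual coding.

```python
def weighted_share(rows, column, value, weight="FINALVOTE_NATIONAL_WEIGHT"):
    """Weighted proportion of analyzed cases where rows[column] == value."""
    # Keep likely voters that are not missing RACE5_VOTE.
    kept = [
        r for r in rows
        if r["LIKELYVOTER"] == 1 and r["RACE5_VOTE"] is not None
    ]
    total = sum(r[weight] for r in kept)
    hits = sum(r[weight] for r in kept if r[column] == value)
    return hits / total


# Hypothetical records; the RACE5_VOTE codes 1 and 2 are placeholders.
records = [
    {"LIKELYVOTER": 1, "RACE5_VOTE": 1, "FINALVOTE_NATIONAL_WEIGHT": 2.0},
    {"LIKELYVOTER": 1, "RACE5_VOTE": 2, "FINALVOTE_NATIONAL_WEIGHT": 1.0},
    {"LIKELYVOTER": 0, "RACE5_VOTE": 1, "FINALVOTE_NATIONAL_WEIGHT": 5.0},
    {"LIKELYVOTER": 1, "RACE5_VOTE": None, "FINALVOTE_NATIONAL_WEIGHT": 3.0},
    {"LIKELYVOTER": 1, "RACE5_VOTE": 1, "FINALVOTE_NATIONAL_WEIGHT": 1.0},
]
share_code_1 = weighted_share(records, "RACE5_VOTE", 1)
```

The third and fourth records are excluded by the filter, so only the remaining three contribute to the estimate.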
4.2 percent for the probability sample drawn from the state voter files.