A New Era of Selection Bias
Predicting 2022, Part 2: We are in The Awareness Era of Public Opinion. Technology has made polling cheap and abundant, media outlets have woven that polling into political narratives, and social media has created a world where many voters are aware of those public opinion numbers and narratives. The result is amplified selection bias, which has been the primary culprit behind the "bad polls" of the past 10 years.
Why do people take polls? The answer depends on the type of poll. For probability polls, polling can represent society’s voice to the current/future elected officials and the unelected bureaucrats who make up the government. So, the main incentive is simple: be heard and achieve better representation. Pew Research Center, which I would recommend for first-rate public opinion content, sums it up very well in the FAQ section of their website:
Why Should I Participate in Surveys?
Polls are a way for you to express your opinions to the nation’s leaders and the country as a whole. Public officials and other leaders pay attention to the results of polls and often take them into account in their decision-making. If certain kinds of people do not participate in the surveys, then the results won’t represent the full range of opinions in the nation.
For non-probability polling conducted using panels or online/mobile app intercepts, the incentive to take a poll is to receive immediate compensation in the form of money or content. When predicting elections, pollsters historically have been hesitant to use non-probability data collection methods for various reasons, but mainly because pollsters generally don’t trust these methods to create a representative sample of the population. (We will explore this more later in the article.)
So, why have polls (that have largely used probability sampling) been so wrong for the last decade? Our theory is that new technology has fueled a type of incentive to participate in polls that is more attractive to a certain subset of the population, driving higher participation rates among a group of people that is very hard to define and control with the traditional stratified probability sampling most pollsters employ in their methodologies.
The Awareness Era of Public Opinion
Have you witnessed the media use polling on an issue or an election to contribute to a narrative that supports its particular worldview, either on the right or left? If you are thinking “yes,” then you are with the majority (80%) of Americans who said they agree that this is happening. Now imagine that you went back in time to the mid-1990s and were asked the same question. At that time, there was no social media or smartphones, and polls were slow, expensive, and didn’t grow on trees as they do on sites like FiveThirtyEight and Real Clear Politics today. Would most voters back then have considered polling data as an ever-present ingredient in daily political narratives?
The why behind this travel-back-in-time exercise is to illustrate the point that many millions of voters are currently exposed to political narratives that are supported by polling and that this is a reality that didn’t exist in the golden days of landline-based, probability polling. This new reality matters because it has changed the incentive structure for responding to probability polling.
Imagine you identify as an Independent and you are very upset about the Dobbs ruling that sent abortion policy to the states. You see a headline that reads, "The majority of Americans do not approve of the Dobbs Ruling," and the following day, "Due to Dobbs, Democrats are surging in polls and may keep the Senate." You might feel validated, right? That feeling is the new incentive to contribute to the narrative the next time you receive a text message or a phone call asking you to take a poll. Likewise, if you are a Republican, do headlines about Biden's record-low approval ratings or the forthcoming Red Wave have value to you? Is that feeling an incentive to participate in a poll? These moments are the first turn of a Flywheel of Expression that builds on its own momentum.
We are going to focus mostly on this Flywheel of Expression theory for the rest of the article, but it's important to quickly address two other forces of selection bias being amplified in this new era.

The first is the Spiral of Silence theory: people with minority opinions are less likely to express those opinions. This was on full display over the last couple of years, but we don't think it is a large force driving selection bias or lower participation rates among people with minority opinions. On the other hand, the Spiral of Silence shouldn't be confused with what we call the rebel mentality, in which the original incentive to achieve better representation is eliminated altogether for certain groups of voters. We wrote about this force in selection bias in 2020, and it has more recently been discussed by Trafalgar Group and other pollsters as the "Submerged Trump Voter." The idea is that people who truly feel they are the enemy of the state because of their beliefs begin to lose the incentive to communicate those beliefs through polling to achieve better representation. We will discuss this force more in Part 4 of this series; we think it is a major factor in states like Michigan and Pennsylvania but plays a smaller role in states like Georgia and Florida.
How to Handle the Flywheel of Expression
As election day nears, the force of this flywheel is mitigated to some extent because almost all groups respond to surveys at higher rates in October, when the energy of the election is in the air. This change in participation rates is why we see "Polling actually wasn't that terrible" articles after an election, articles that conveniently forget the August and September polls that were 12 points off from the final results.
But let’s take a look at how to correct this bias problem for the other 102 weeks of the election cycle.
What's important to understand about the Flywheel of Expression's impact on respondent participation rates is that it attracts a very specific type of person. Ultimately, some people get more value out of seeing their beliefs confirmed online because (a) they get a bigger reward from the idea of being in the majority viewpoint, and/or (b) they place a higher value on the media outlets building the narratives around those viewpoints, and/or (c) they place higher trust in polling "experts" when they see poll data online.
This type of person exists in all demographic groups, geographic groups, and even political groups like party affiliation. The fact that the variables we need to control for relate more to a person's psychological profile than to their demographics makes the flywheel a tricky problem for pollsters. At Wick Insights, we think there are two possible approaches to handling the problem:
Control the incentives: Incorporate non-probability polling methods so the primary incentives are the same for everyone. This would only be a solution for statewide and national research, because the number of respondents required for local, state house, state senate, and congressional races currently rules out panels and app-intercept solutions as a major piece of the solution.
Control the symptoms: Since psychological profiles and "the way you respond to information online" are not variables that can be used for quota sampling, our best option is to find variables tied to the choices people make, such that (1) state and national statistics are available for setting quotas, and (2) we think people with this psychological profile make those choices at higher or lower rates. For example, are they more likely to get a post-graduate degree, work in the public/private sector, be a parent, attend church, get vaccinated and/or boosted, etc.?
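To make the quota idea concrete, here is a minimal sketch of quota enforcement on proxy "choice" variables. The cells and target counts are hypothetical, purely for illustration; real quotas would come from state or national statistics.

```python
# Sketch of quota sampling on proxy "choice" variables: once a quota
# cell fills, additional respondents from that cell are turned away.
# All cells and targets below are hypothetical, for illustration only.

quota_targets = {
    ("postgrad", "attends_church"): 60,
    ("postgrad", "no_church"): 90,
    ("no_postgrad", "attends_church"): 200,
    ("no_postgrad", "no_church"): 250,
}
completes = {cell: 0 for cell in quota_targets}

def accept(cell):
    """Accept a respondent only if their quota cell is still open."""
    if completes[cell] < quota_targets[cell]:
        completes[cell] += 1
        return True
    return False
```

In practice the cell definitions matter far more than the bookkeeping: the bet is that "choice" variables like church attendance correlate with the psychological profile the flywheel selects for.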
Approach 1: Control the incentives by using non-probability sampling.
Disclaimer: Non-probability research such as panels, app intercepts, and other convenience samples can be very valuable, and sometimes Wick Insights will even recommend it over probability methods when the benefits of speed, affordability, and flexibility outweigh the weaknesses highlighted in this article.
Eventually, we hope to add non-probability data collection methods such as application intercepts and online panels to the mix for election polling, but we just haven't been able to get there yet. In our September Georgia poll (n = 2,022), we split the sample: a 50% probability sample recruited via text-to-web and IVR, and a 50% non-probability sample recruited via panels and application intercept.
Despite using a number of techniques, we couldn't get the results to pass the eyeball test. We will provide a more detailed report in October for subscribers, so be sure to subscribe to our email list at the end of this article. But to illustrate the point, let's take a quick look at urbanicity. For respondents who said they live in a city, the results looked pretty good: there was only a one-point variance on key measures like the Kemp vs. Abrams ballot question. In rural areas, however, there was a much larger problem, as you can see in Table 1 below.
Table 1: Variance in Method by Urbanicity
Could it be that people in rural areas who have joined panels and who are frequent mobile app users are not an apples-to-apples representation of the rural population as a whole? We think this is probably the case. Now, advocates of non-probability sampling may say, “you are just doing it wrong… add party as a nested quota underneath each demographic to help correct for this.” But that also didn’t work for a variety of reasons that we will get into in a future article.
Approach 2: Control the symptoms.
There is no easy and direct way to control for the fact that certain people are incentivized to respond at higher or lower rates than others. And there is no cure for it unless social media and the press stop using polls to support particular worldviews. But we can identify symptoms of the problem and start to control them by finding segments of society (segments that historically might be afterthoughts) who are responding to polls at higher or lower rates. If logic (or instinct) supports the idea that we should set quotas or use weights to correct for the over- or under-representation, then we do it.

The most obvious segment, one most pollsters have already recognized, is voters with post-graduate degrees, who are answering surveys at much higher rates than they did ten or fifteen years ago. This was missed by a lot of pollsters in 2020 who weren't looking closely at education, or who were grouping bachelor's and post-graduate degrees together in a "4 years of college or more" segment, but this case of over-representation has been easy to correct. The real questions pollsters should be asking are:
- Since education level as a variable says something about a person, happens to be a choice measured in publicly available data, and, when improperly controlled for, was one of the reasons polling was so bad in 2020, shouldn't we identify other choices that might help set smarter quotas and make sure other segments aren't entering surveys at rates that are too high or too low?
For Wick, the answer is yes… which leads to the next question:
- What is it about someone who chose to get a postgraduate degree that makes them respond to surveys at such a higher rate than someone with an associate's or bachelor's degree, compared to 10 years ago? Can this offer a hint as to what variables to use moving forward?
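The correction for an over-represented segment like postgraduates can be sketched as simple cell weighting: each respondent gets a weight equal to their segment's population share divided by its sample share. The shares below are hypothetical, purely for illustration:

```python
# Minimal cell-weighting sketch: down-weight an over-represented
# education segment so weighted results match population targets.
# All shares below are hypothetical, for illustration only.

population_share = {"postgrad": 0.14, "bachelors": 0.22, "no_degree": 0.64}
sample_share     = {"postgrad": 0.28, "bachelors": 0.24, "no_degree": 0.48}

# Weight per respondent = population share / sample share for their cell.
weights = {seg: population_share[seg] / sample_share[seg]
           for seg in population_share}

for seg, w in sorted(weights.items()):
    print(f"{seg}: weight = {w:.2f}")
```

Here the over-represented postgraduates are cut to half weight, while under-represented non-degree respondents are weighted up. Real pollsters typically rake across many such variables at once rather than weighting one cell in isolation.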
In Part 4 of this series, we walk through the choices people make that we decided to include in our 2022 battleground polling and how incorporating those choices in our definition of a “representative sample” in these states impacted the results of the poll.
Get The Whole Truth
Subscribe to The Whole Truth, our email newsletter that makes sure you don’t miss a single insight and gets you special access to deeper dives and exclusive content.
Let’s get in touch.
Are you a media representative wishing to interview our CEO and Chief Pollster, David Burrell? Let's get something scheduled. Email: email@example.com