Is my Methodology Valid?
Last night I had a brief Twitter discussion with Ace about whether what I am doing has any validity. To be fair, he is being reasonably skeptical of whether poll rebalancing is viable, or even possible. Do we have enough information to come up with valuable conclusions at all? Here is his final tweet from last night:
I believe most polls are tilted too Democrat. What I doubt is that it is possible to post-facto rebalance missing data. - Ace
Let me make my argument for why I believe this is a valid methodology. To be sure, we won't know whether I am right until the election, when we will know the actual vote totals and the partisan split of the electorate. However, I think we have some data that indicates this methodology might be valid.
First of all, my basic belief is that elections are based on two factors only.
- How many partisans of each side can the parties get to the polls?
- What is the opinion of the non-partisans regarding the two candidates?
Democrats vote for Democrats or don't vote. Republicans vote for Republicans or don't vote. Independents will vote for one or the other candidate, based on the persuasiveness of the arguments and the facts on the ground (e.g. I hate the war, the economy sucks, I want my free stuff).
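Put as simple arithmetic (a rough framing of the same idea, not a formal model), a candidate's expected share of the vote is a turnout-weighted sum: expected share = (Democratic share of the electorate × support among Democrats) + (Republican share × support among Republicans) + (independent share × support among independents). The first two terms are turnout questions; only the third is a persuasion question.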
I fundamentally do not believe that Democrats voting for Romney, or whites supporting Obama, have any practical value in predicting voter behavior. People self-identify either as partisans (which does not mean they are necessarily registered with their preferred party) or as non-partisans willing to change their vote election by election.
Partisans move from one party to another slowly, and once they move it is a personal decision based on core philosophy. They will not change their minds based on the candidate and vote for the other team. But they might choose not to vote, which is reflected by enthusiasm.
Enthusiasm is ultimately measured through turnout. If Republicans are more enthusiastic than Democrats, they will show up at the polls in higher numbers. This was demonstrated in 2008. As I've mentioned before, most people misunderstand the 2008 election. They think 7% more Democrats showed up to vote for Obama. That isn't true. Democrats got their base to show up enthusiastically at a rate 2% higher than in 2004 (which was also a good turnout for them). The real difference is that 5% of Republicans didn't bother to vote, because McCain didn't enthuse them and did not run an effective GOTV operation. The final factor is that non-partisans supported Obama overwhelmingly.
The other day, Rush discussed his belief that people don't change their minds as quickly as the polls seem to indicate. He believes that core voting decisions are established, and that voters do not swing wildly between the candidates. I think he is right about this. The poll fluctuations we are seeing come from sampling variations, and from the fact that RCP does not correct for those fluctuations and skews. They are comparing apples to oranges, and trying to tell us that we really were expecting a fruit salad.
I also have a problem with RCP in that it can be easily "gamed". A few polls that are purposely biased can skew the average and show momentum for one of the candidates. We have proof that the Obama campaign is purposely disabling credit card validation in their online fundraising operations. Why is it so hard to believe that some polls are being purposely manipulated to affect the RCP average?
My argument to Ace is that the poll internals do provide all of the information needed to get a good view of what the election results will be, given a specific turnout model. They provide the partisan split in the votes, and they (usually) provide the non-partisan preference between the candidates. Using this information, it is possible to adjust the poll results so that every poll in an average is measured against the same turnout baseline. We are comparing apples to apples, and getting applesauce.
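To make the mechanics concrete, here is a minimal sketch of that adjustment in code. Everything in it — the function name, the internals, the turnout shares — is a made-up placeholder for illustration; it is not my actual spreadsheet, just the arithmetic the spreadsheet performs.

```python
def rebalance(poll, turnout):
    """Re-weight a poll's topline to an assumed partisan turnout model.

    poll:    candidate support within each partisan group, e.g.
             {"D": {"Obama": 0.92, "Romney": 0.06},
              "R": {"Obama": 0.05, "Romney": 0.93},
              "I": {"Obama": 0.44, "Romney": 0.49}}
    turnout: assumed share of the electorate for each group, e.g.
             {"D": 0.35, "R": 0.35, "I": 0.30}
    Returns the rebalanced topline share for each candidate.
    """
    return {
        candidate: sum(turnout[group] * poll[group][candidate]
                       for group in ("D", "R", "I"))
        for candidate in ("Obama", "Romney")
    }
```

The key point is that the partisan composition of the original sample drops out entirely; only the within-group support numbers from the internals survive, and the chosen turnout model supplies the weights.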
Consider the following chart:
This chart tracks the RCP average, the Rasmussen daily tracking poll, and my 2010-turnout average from September 26th to October 23rd. Notice a couple of trends. From the 26th to the 1st, Romney was gradually sliding; then came the first debate, his support began to move up, and on October 13th the race hit a very stable spot. In the 2010 turnout model, Romney's lead stayed nearly constant for 9 days. During that period there were no real changes in the state of the race, no gaffes, nothing that was really changing the momentum. Yet during that same period, Rasmussen stayed pretty even, bouncing between R+1 and R+2, while RCP began to drop down to an Obama lead.
My point is that there was no reason for this drop other than the introduction of new polls with widely divergent samples. The reweighted polls showed a stability in the race that reflected our general sense of where the race stood during that time.
The RCP average did not.
Now I am not saying that the election will resemble 2004 in turnout. I am only saying that rebalancing to potential turnout scenarios gives a better picture of the state of the race over time. This is why I'm offering 5 different turnout models. Voter enthusiasm and GOTV operations will determine which of the models ends up being the valid view of the election.
What my models do provide is a good view of what results we can expect if the electorate resembles a specific model. Since we know that 2008-level turnout is highly unlikely, we can then say "If we see a D+3, then the polls say this" and "If the GOP repeats the 2004 turnout, then the polls say that." I find this more useful than looking at a D+9 poll that says Obama is ahead by 5, calling it BS, and ignoring it. It also helps in places like Ohio, where we get a series of polls, 7 out of 10 of which sample Democrats at between D+6 and D+11. We are able to get value out of those polls anyway, rather than ignoring them and getting a badly skewed view of what is really happening in Ohio.
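To show what that "if this turnout, then this result" reading looks like in practice, here is the same sketch run against several turnout scenarios. The poll internals and turnout splits below are illustrative numbers for the sake of the example, not any specific poll or exit poll.

```python
# Hypothetical poll internals: candidate support within each partisan group.
poll = {
    "D": {"Obama": 0.92, "Romney": 0.06},
    "R": {"Obama": 0.05, "Romney": 0.93},
    "I": {"Obama": 0.44, "Romney": 0.49},
}

# Illustrative turnout scenarios (share of the electorate for each group).
scenarios = {
    "2004-style (even)": {"D": 0.37, "R": 0.37, "I": 0.26},
    "D+3":               {"D": 0.37, "R": 0.34, "I": 0.29},
    "2008-style (D+7)":  {"D": 0.39, "R": 0.32, "I": 0.29},
    "2010-style (even)": {"D": 0.35, "R": 0.35, "I": 0.30},
}

for name, turnout in scenarios.items():
    share = {c: sum(turnout[g] * poll[g][c] for g in ("D", "R", "I"))
             for c in ("Obama", "Romney")}
    margin = 100 * (share["Romney"] - share["Obama"])
    print(f"{name:18s} Romney margin: {margin:+.1f}")
```

With these made-up internals, the same poll yields anything from a modest Romney lead under an even turnout to an Obama lead under a 2008-style electorate, which is exactly why the scenario table is more informative than a single topline number.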