Tuesday, October 23, 2012

Is my Methodology Valid?

Last night I had a brief Twitter discussion with Ace regarding whether what I am doing has any validity.  To be fair, he is being reasonably skeptical of whether poll rebalancing is viable, or can even be done.  Do we have enough information to reach valuable conclusions?  Here is his final tweet from last night:

I believe most polls are tilted too Democrat.  What I doubt is that it is possible to post-facto rebalance missing data.  - Ace

Let me make my argument for why I believe this is a valid methodology.  To be sure, we won't know if I am right until the election; that is when we will know the actual vote totals and the partisan split of the electorate.  However, I think we have some data indicating that this methodology might be valid.

First of all, my basic belief is that elections are based on two factors only.
  1. How many partisans of each side can the parties get to the polls?
  2. What is the opinion of the non-partisans regarding the two candidates?
Democrats vote for Democrats or don't vote.  Republicans vote for Republicans or don't vote.  Independents will vote for one or the other candidate, based on the persuasiveness of the arguments and the facts on the ground (e.g. I hate the war, the economy sucks, I want my free stuff).

I fundamentally do not believe that crosstabs like Democrats voting for Romney, or whites supporting Obama, are of any practical value in predicting voter behavior.  People self-identify either as partisan (which does not mean they are necessarily registered in their preferred party), or as non-partisan and willing to change their vote election by election.

Partisans move from one party to another slowly, and once they move it is a personal decision based on core philosophy.  They will not change their minds based on the candidate and vote for the other team.  But they might choose not to vote, which is reflected by enthusiasm.

Enthusiasm ultimately reveals itself through turnout.  If Republicans are more enthusiastic than Democrats, they will show up at the polls in higher numbers.  This was demonstrated in 2008.  As I've mentioned before, most people misunderstand the 2008 election.  They think 7% more Democrats showed up to vote for Obama.  That isn't true.  Democrats got their base to enthusiastically show up at a 2% higher rate than in 2004 (which was also a good turnout year for them).  The real difference is that 5% of Republicans didn't bother to vote, because McCain didn't enthuse them and did not run an effective GOTV operation.  The final factor is that non-partisans supported Obama overwhelmingly.

The other day, Rush discussed his belief that people don't change their minds as quickly as the polls seem to indicate.  He believes that core voting decisions are established early, and that voters do not swing wildly between the candidates.  I think he is right on this.  The poll fluctuations we are seeing are based on sampling variations, and on the fact that RCP does not correct for these fluctuations and skews.  They are comparing apples to oranges, and trying to tell us that we really were expecting a fruit salad.

I also have a problem with RCP in that it can be easily "gamed".  A few polls that are purposely biased can skew the average and show momentum for one of the candidates.  We have proof that the Obama campaign is purposely disabling credit card validation in their online fundraising operations.  Why is it so hard to believe that some polls are being purposely manipulated to affect the RCP average?

My argument to Ace is that the poll internals do provide all of the information needed to get a good view of what the election results will be, given a specific turnout model.  They provide the partisan split in the votes, and they (usually) provide the non-partisan preference between the candidates.  Using this information, it is possible to adjust the poll results so that every poll in the average is compared against the same turnout baseline.  We are comparing apples to apples, and getting applesauce.
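To make the mechanics concrete, here is a minimal sketch in Python of the kind of reweighting described above. Every number is invented for illustration; none of them come from a real poll or from the models on this site.

```python
# Minimal sketch of rebalancing a poll topline to a chosen turnout
# model.  Assumes the poll reports candidate support broken out by
# party ID.  All numbers below are invented for illustration.

def rebalance(support_by_party, turnout_model):
    """Recompute a poll's topline under a different partisan split.

    support_by_party: {party: (share_for_A, share_for_B)}
    turnout_model:    {party: share_of_electorate}, summing to 1.0
    """
    a = sum(turnout_model[p] * support_by_party[p][0] for p in turnout_model)
    b = sum(turnout_model[p] * support_by_party[p][1] for p in turnout_model)
    return a, b

# Illustrative internals: partisans vote their party 95/5,
# independents split 55/45 for candidate A.
internals = {"R": (0.95, 0.05), "D": (0.05, 0.95), "I": (0.55, 0.45)}

d_plus_9   = {"R": 0.29, "D": 0.38, "I": 0.33}  # a D+9 sample
even_split = {"R": 0.35, "D": 0.35, "I": 0.30}  # a 2004-style even split

print(rebalance(internals, d_plus_9))    # A trails under D+9
print(rebalance(internals, even_split))  # A leads under an even split
```

The same internals produce opposite leaders under the two turnout assumptions, which is the whole point: the toplines differ only because the assumed electorate differs.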

Consider the following chart:

This chart tracks the RCP average, the Rasmussen daily tracking poll, and my 2010-turnout average from September 26th to October 23rd.  Notice a couple of trends.  From the 26th to the 1st, Romney was gradually sliding, until the first debate.  After the debate, his support began to move up, and then on October 13th we hit a very stable spot in the race.  In the 2010 turnout model, the Romney lead stayed essentially constant for 9 days.  During that period, there were no real changes in the state of the race, no gaffes, nothing that was really changing the momentum.  Yet over that same period, Rasmussen stayed pretty even, bouncing between R+1 and R+2, while RCP began to drop toward an Obama lead.

My point is that there was no reason for this drop, other than the introduction of new polls with widely divergent samples.  The reweighted polls showed a stability in the race that reflected our general sense of where the race stood during that time.  The RCP average did not.

Now I am not saying that the election will resemble 2004 in turnout.  I am only saying that rebalancing to potential turnout scenarios gives a better picture of the state of the race over time.  This is why I'm offering 5 different turnout models.  Voter enthusiasm and GOTV operations will determine which of the models ends up being the valid view of the election.

What my models do provide is a good view of what results we can expect if the electorate resembles a specific model.  Since we know that 2008-level turnout is highly unlikely, we can then say "If we see a D+3, then the polls say this" and "If the GOP repeats the 2004 turnout, then the polls say that".  I find this more useful than looking at a D+9 poll that says Obama is ahead by 5, calling it BS, and ignoring it.  It also helps in places like Ohio, where we get a series of 7 out of 10 polls that sample Democrats between D+6 and D+11.  We are able to get value out of those polls anyway, rather than ignoring them and getting a badly skewed view of what is really happening in Ohio.


  1. Dave,

    I think your method is very logical, and the proof is that when you analyze the different polls from different firms, they all seem to "say" the same things; the variation comes almost purely from which turnout model you're using.

    I don't think it's an "exact" formula (i.e., more Democrat voters turning out doesn't necessarily mean more Obama votes, though it more than likely does), but I think it gives a MUCH more accurate picture of the race than ignoring this factor and just accepting whatever absurd model the pollster is pushing. If we accept that Democrat turnout is greater than in 2008, at something like +9 over Republicans, the campaign and candidates essentially become irrelevant, because it would be nearly impossible to overcome that makeup of the electorate.

    The real polling "bombshell" that no news outlets seem to be talking about is the fact that Romney is absolutely dominating when it comes to voters that identify themselves as independents. To me, that's the biggest sign that Obama is likely finished.

    1. "The real polling "bombshell" that no news outlets seem to be talking about is the fact that Romney is absolutely dominating when it comes to voters that identify themselves as independents. To me, that's the biggest sign that Obama is likely finished."

      Yeah. +9 in PPP, +19 in Monmouth, +11 in IBD/TIPP. Strangely -4 in CBS but that's another reason for thinking there's something very wrong with that poll.

  2. Excellent analysis, Dave. I think you are on to something; Jay Cost parlayed election analysis into a writing gig in 2004, so maybe you will be the next election day star.

  3. Dave:
    Your analysis, and especially the way you lay out all the turnout scenarios, is very logical.

    Any poll that says Romney is winning independents and then says that Obama is leading does not pass the common sense test.

    RCP and Nate, IMO, are analyzing the wrong inputs. You can do all kinds of analysis on data, but as they say: Garbage In, Garbage Out. No matter how sophisticated one's analytical methods and historical performance, it has to pass the common sense test.

    Thanks for diligently doing this work. Very useful.

  4. I agree -- seems to me the race was never really shifting as much as poll top-lines might have suggested, and is hugely determined by the deeper, slower-moving factors of turnout+indies you point to.

    It seems to me that what media cycle shifts *may* produce pretty quickly are shifts in the *poll* turnout, which could be why (e.g.) we suddenly started to see Rs represented in Gallup. With response rates in the single digits, just a fraction of GOPers deciding that it's safe to register one's preference to the hostile media can probably turn these numbers from nonsense into sense.

    Another point in your favor is that Ras uses this exact sort of turnout normalization (though to a *single* fudged number)... and has good predictive success with it. You're just putting all the other pollsters on the same field with Ras.

  5. I want to believe...

    But I just don't understand how we can get from "poll toplines show X" to "poll toplines are actually Y" without any historical evidence that RCP's poll-of-polls methodology is flawed. It was within a point of the actual result in 2004 and 2008 (nat'l popular vote).

    However, state polls were more variable in 2008. The 2008 final result's variance from the last RCP poll average in battleground states: IN D+2.5, MO D+0.6, OH D+2.1, FL D+1.0, VA D+2.1, for an average variance of about 1.7 points. That's more significant, but not by much.

    Also, a lot of the late swing in 2008 came from undecideds breaking heavily for Obama; but on this day in '08, there were 7.3% undecided, compared to 4.9% now.

    I guess what I'm saying is I haven't seen any compelling historical evidence to suggest that poll results will not reflect eventual turnout.

    1. The flaw in the RCP average is that you could submit the same data, reworked under different assumptions about turnout, to RCP, and their average would move all over the map.

      Take any two of Dave's 'corrections' and average them:

      2010 + 2004 = big Romney win
      2008 + D+5 = Obama win.

      But have we added any information in the averaging? No, and worse than none.
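The commenter's point can be checked directly. Because this kind of rebalancing is linear in the turnout weights, averaging two rebalanced toplines is exactly the same as rebalancing once to the midpoint turnout model, so the average carries no information beyond a single implied blended model. A toy check in Python, with invented numbers:

```python
# Toy demonstration (invented numbers): averaging two turnout-model
# "corrections" equals applying one averaged model, so the averaging
# step adds no new information.

def topline(internals, model):
    """Candidate A's share under a given partisan turnout model."""
    return sum(model[p] * internals[p] for p in model)

# Candidate A's support within each group (illustrative).
internals = {"R": 0.95, "D": 0.05, "I": 0.55}

m2010 = {"R": 0.36, "D": 0.34, "I": 0.30}  # GOP-friendly model
m2008 = {"R": 0.32, "D": 0.39, "I": 0.29}  # Dem-friendly model

avg_of_results = (topline(internals, m2010) + topline(internals, m2008)) / 2
midpoint_model = {p: (m2010[p] + m2008[p]) / 2 for p in m2010}

# The two computations agree to floating-point precision.
print(abs(avg_of_results - topline(internals, midpoint_model)) < 1e-12)  # True
```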

    2. I don't think you got my point. I understand that you can project a Romney victory instead of an Obama victory if you assume a different turnout model, and I tend to agree on a gut level that Romney's turnout will be significantly better.
      What actual evidence do we have that turnout will be different, however? Why, exactly, do all the D+X samples keep popping up? Do R's just not talk to pollsters as often as D's?

    3. I can think of four reasons.

      1) The response rate is now so low that it is inducing errors in the sampling. Democrats are much more motivated to tell pollsters how much they want to vote for Obama. Republicans want to be left alone.

      2) Pollsters have a built in bias toward a specific turnout that they expect, and are adjusting their sample accordingly. They keep sampling until they have a sample split that matches what they expect the electorate to look like.

      3) Pollsters are actively working with the Obama campaign to boost his numbers to make him look inevitable by skewing their samples.

      4) Respondents are lying to the pollsters.

      Pick your desired level of cynicism and paranoia.

    4. Thanks for this.

      Number 1 feels most likely, though I suspect it's not party ID but another correlated factor (age?). Numbers 2 and 3 require conspiratorial intent, which is always possible but IMO doubtful. Number 4 is also likely, though of course impossible to quantify.

      Either way, I find it hard to believe any of these, because you would have seen a consistent pattern of polls/poll averages being off in favor of the Democratic candidate in '08/'10, and that doesn't appear to be the case.

  6. I think I'll be able to take a close look at this later, but I'm already impressed that Ace has been so cautious about this, and that you have chosen to defend your claims with arguments, rather than (as others might) getting defensive, or worse. It's all a very good sign. That, and some things about these polls just "smell bad".

    1. Well, I respect Ace's instincts. He's not a math guy, I am. But his "feelings" about things are usually pretty astute. So when he was critical, I decided to look deeper into my results to see if I could defend my methodology. Again, proof will come after election day. But that 9 day trend line is a strong indicator of statistical stability.

      I started doing this to dispense with BS, not to add to it :)

    2. OK, I finally had time to go through this, including the comments. You've outlined some of the basic assumptions, and they seem sound, as far as they go. You also show a chart where your output makes more sense than what we see elsewhere (and the answer it comes up with is especially satisfactory!)

      But the validity of this depends on the details, and getting the details wrong on something like this - where the spread is such a small fraction - could lead to erroneous results.

      So the validity depends on what data is available for you to use (what are the inputs?), and what operations you perform on the data to combine them into a new result. Without a decent description of what you're actually doing, at a greater level of detail, I can't tell whether your results give the best possible answer or not. There are right ways and wrong ways to do statistics, and I've seen people do it wrong lots of times. It's not for the faint of heart! Realistically, it should only take a few equations that aren't all THAT complex.

      -Optimizer (A number-crunching fiend)

  7. Your method has some validity if the only, or major skewing they are doing is a final adjustment of the partisan split.

    If they are doing something trickier, then your method might not help at all.

    1. That actually isn't quite true.

      The explanation needs some set-up, so hang with me!

      Imagine you have a square subdivided into red and blue pixels in unknown relative amounts, and we want to know the average color over the square. We can't poll every pixel, so pollsters must make educated guesses about which regions of the square are most representative, and what sampling patterns to use to decide which pixels to ask: are you red or blue?

      This is the sampling stage. We really need to look at polling analysis as a pipeline of filtering and weighting. At every step, the goal is to narrow the possibility space that a potential outcome could exist in.

      But the sampling stage basically just acts as an input, taking in raw data sampled from an unknown underlying population: in our example, asking a few pixels spread around the square what color they are.

      Pollsters then take these color values, correct them a bit, average them to find the square's inferred color, and go to the bank.

      Dave is addressing a secondary filtering stage. Basically, he's looking at the correlations between different elections: the patterns of red and blue pixels in squares from different years. This is what I mean when I say he's transformed the data into partisan-affiliation space: you take the 2D square of pixels from this year, stack and align the last four or five years' squares on top of it to form a volume (a cube), and then start looking for correlations across them that can serve as an additional source of information to further constrain the space this year's average can be in. We can do this because partisan/party ID is very stable, for a number of reasons; basically, people's ideology and psychology are set early and evolve slowly.

      So, in summary, doing something tricky isn't really a problem insofar as it is filtered out by the simple filtering in the first stage; Dave's work is indifferent to it, since he basically assumes that each polling firm is relatively 'fair' and that its data has usable information in it.

      NB: That last point is actually interesting; as long as they are actually reliably polling people, we don't want all the pollsters to be the same. We want them each to poll a little differently. This is actually optimal. Nate Silver messes up what is effectively a stochastic sampling algorithm by correcting for a 'House Effect,' but I've already gone on too long.

    2. This is a good explanation, I wish I had thought of it :)

    3. If all they are doing is a scaling manipulation of the raw data, then you can undo it with no loss.

      But if they actually manipulate the raw data instead, like oversampling certain kinds of Republicans or independents, you might be able to see in other crosstabs that the data is odd, but you wouldn't be able to fix it this way.

      I think this is what Ace was implying about 'lost' information.

      For example, in the recall election here in CA years ago, the LA Times had a +20 poll for the Democrat, Cruz Bustamante. Arnold wiped them out, and I doubt you could find the 'real' result by shifting D/R/I. Everyone wanted Davis gone, and Arnold crushed Bustamante.
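For what it's worth, the first case above, where the distortion is purely a reweighting of the partisan split, really is lossless to undo, as long as the support-by-party crosstabs survive. A hypothetical Python sketch (all numbers invented): skew a "true" electorate to a D+9 sample, then recover the original topline exactly by reweighting back.

```python
# Hypothetical sketch: a topline distorted only by reweighting the
# partisan split can be undone exactly, provided the poll still
# reports support broken out by party.  All numbers are invented.

def topline(by_party, weights):
    """Candidate share under a given set of partisan weights."""
    return sum(weights[p] * by_party[p] for p in weights)

by_party     = {"R": 0.93, "D": 0.07, "I": 0.52}  # candidate support by group
true_split   = {"R": 0.35, "D": 0.35, "I": 0.30}  # "real" electorate
skewed_split = {"R": 0.29, "D": 0.38, "I": 0.33}  # a D+9 sample

published = topline(by_party, skewed_split)  # what the poll reports
recovered = topline(by_party, true_split)    # reweighted back, no loss

print(round(published, 3), round(recovered, 3))
```

If the raw responses themselves were manipulated, the by-party numbers would already be corrupted, and this undo would recover nothing: that is the commenter's point about the second case.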