Methodology
Warning: Math ahead...
When it comes to public polling, the 2012 election is turning into a joke of epic proportions. Public polls are supposed to predict voter behavior, not serve as instruments of propaganda. While I personally support Mitt Romney, I am not interested in cherry-picking polls or pushing a specific result. I am interested in knowing where my chosen candidate stands with the electorate.
Four and eight years ago, we could look at the Real Clear Politics (RCP) average and get a sense of the state of the race. In 2012, however, the RCP average does not reflect where the race stands.
If you take the polls used in the RCP average and look at their internals, you find that their samples are not grounded in reality. In 2008 the Democrats had their best year in several generations, making up 39% of the electorate compared to 32% for Republicans (D+7), a partisan turnout gap unmatched in modern times. Both 2004 and 2010, by contrast, had Even turnout. Many of the polls in the RCP average use samples with a wider Democratic advantage than 2008's: D+10 or D+11 is not unusual, and one of the Pew polls was even D+19!
What I am doing is similar to what unskewedpolls.com does, except that I offer a range of results instead of a single one. I go into the internals of the public polls and reweight them to produce alternative polling averages based on 2008 actual turnout, 2010 actual turnout, 2004 actual turnout, and a 3-month rolling average of the Rasmussen Party Identification poll.
Turnout Models
We start with a predicted makeup of the voting population on election day. Three of the turnout models match the actual party identification of voters in three recent elections: 2004, 2008, and 2010.
2004 (Even)
37% - Democrat
37% - Republican
26% - Independent
2008 (D+7)
39% - Democrat
32% - Republican
29% - Independent
2010 (Even)
36% - Democrat
36% - Republican
28% - Independent
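A minimal Python sketch of the three turnout models as data, assuming the three lists appear in the order 2004, 2008, 2010 named in the text. It checks that each model covers the whole electorate and prints the partisan gap in the D+/R+/Even notation used here. The names `TURNOUT_MODELS` and `partisan_margin` are illustrative, not part of any existing tool.

```python
# Turnout models: (Democrat, Republican, Independent) shares in percent.
# Assumption: the three lists are in the order 2004, 2008, 2010.
TURNOUT_MODELS = {
    "2004": (37, 37, 26),
    "2008": (39, 32, 29),
    "2010": (36, 36, 28),
}


def partisan_margin(dem: float, rep: float) -> str:
    """Format the Democrat-minus-Republican gap as 'D+7', 'R+2.2' or 'Even'."""
    gap = round(dem - rep, 1)
    if gap > 0:
        return f"D+{gap:g}"
    if gap < 0:
        return f"R+{-gap:g}"
    return "Even"


for year, (d, r, i) in TURNOUT_MODELS.items():
    # Each model should account for 100% of the electorate.
    assert d + r + i == 100, year
    print(year, partisan_margin(d, r))
```

Running it prints Even for 2004 and 2010 and D+7 for 2008, matching the figures quoted earlier.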
One interesting point, with important implications for analyzing polls, is that in 2008 Democrats turned out only 2-3 points above their normal levels. The real difference was that Republicans didn't vote. If you believe the polls used in the RCP average, then you must believe either that Democrats will surge four times as much as they did in their record 2008 year, or that Republicans will stay home in even greater numbers than in 2008. Does either assumption make sense in the age of Obama?
Rasmussen Party ID poll
Every month, Rasmussen conducts a poll on partisan identification, asking 15,000 people whether they identify as Democrat, Republican, or Independent. In polling terms, 15,000 is a huge sample, giving a margin of error (MOE) of less than 1%. Interestingly, he doesn't use this information to weight his daily tracking poll; instead, he weights its internals based on responses received in the same poll over the previous few weeks. The Rasmussen party ID poll has proven a good indicator of party performance in recent elections: in 2004 it reported D+1 and in 2008 D+8, each within 1 point of the actual turnout (both leaning slightly toward Democrats).
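The "less than 1%" figure can be checked with the standard margin-of-error formula for a proportion. This sketch assumes a simple random sample at 95% confidence (z = 1.96) and the worst case p = 0.5; real polls are weighted, which typically widens the effective margin somewhat.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Margin of error in percentage points for a proportion p from a simple
    random sample of size n. p = 0.5 is the worst case; z = 1.96 is 95% confidence."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


print(f"{margin_of_error(15_000):.2f}")  # one monthly poll: 0.80
print(f"{margin_of_error(45_000):.2f}")  # three months pooled: 0.46
```

Pooling three months of 15,000 interviews shrinks the margin by a factor of about the square root of 3.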
I use a 3-month average of this poll as a fourth turnout model in my reweighting. I average over three months because the August poll showed a huge jump in Republican identification, and I want to smooth out the numbers. Here are the current results:
Month 1 (15,000 samples)
34.0% - Democrat
35.4% - Republican
30.5% - Independent
Month 2 (15,000 samples)
34.0% - Democrat
34.9% - Republican
31.1% - Independent
Month 3 (15,000 samples)
33.3% - Democrat
37.6% - Republican
29.2% - Independent
Average (3 months, 45,000 samples)
33.8% - Democrat
36.0% - Republican
30.3% - Independent
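The average is a plain per-party mean of the three monthly results. Because every month has the same 15,000-person sample, an unweighted mean is equivalent to pooling all 45,000 responses. A short sketch (the helper name `rolling_average` is illustrative):

```python
# Monthly Rasmussen party ID results listed above: (Democrat, Republican, Independent) in percent.
MONTHLY = [
    (34.0, 35.4, 30.5),
    (34.0, 34.9, 31.1),
    (33.3, 37.6, 29.2),
]


def rolling_average(months):
    """Unweighted mean of each party's share across the months, rounded to 0.1 point."""
    return tuple(round(sum(col) / len(months), 1) for col in zip(*months))


print(rolling_average(MONTHLY))  # (33.8, 36.0, 30.3)
```

The shares sum to 100.1 rather than 100 only because of rounding.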
Reweighting Polls
To reweight a poll, I start with its topline result. For example, the AP/GfK poll reports a result of Romney -1 (that is, Obama +1).
Next, I look at the internals of the poll to get the partisan breakdown. In the AP/GfK poll, the Democrat/Republican/Independent (D/R/I) split is 50/37/13. Obviously this sample is way off, at D+13.
Next, I calculate the adjustment to the topline number under each turnout model. The formula is ((Target Party ID) - (Reported Party ID)) x (Target Party ID / 50). In English: the difference between the expected and the reported share for a party, multiplied by the expected share divided by 50. This gives how much the topline result should be adjusted for each party identification. For Democrats under the 2008 model, the adjustment is (39 - 50) x (39/50) = -8.58, which moves the topline from Romney -1 to Romney +7.58 (-1 + 8.58 = 7.58).
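A minimal Python sketch of the per-party adjustment, reproducing the Democratic example for AP/GfK under the 2008 model. The Republican step is an assumption: it applies the same formula to 37% reported versus 32% expected, and treats a Republican as worth as much to Romney as a Democrat is to Obama. The 4.38 intermediate comes before the Independent and undecided adjustments, so it is not meant to match the final 2008 result.

```python
def party_adjustment(target: float, reported: float) -> float:
    """Per-party adjustment: (target - reported) * (target / 50).
    Shares are in percent; the result is in points of that party's candidate's margin."""
    return (target - reported) * (target / 50)


# AP/GfK: topline Romney -1, sample D/R/I = 50/37/13; 2008 model D/R/I = 39/32/29.
topline_romney = -1.0
dem_adj = party_adjustment(39, 50)  # negative: the poll oversampled Democrats
rep_adj = party_adjustment(32, 37)  # negative: the poll also oversampled Republicans

# Shrinking the Democratic share helps Romney; shrinking the Republican share helps Obama.
after_dem = topline_romney - dem_adj
after_dem_and_rep = after_dem + rep_adj  # assumption: symmetric treatment of the two parties

print(round(dem_adj, 2))            # -8.58
print(round(after_dem, 2))          # 7.58
print(round(after_dem_and_rep, 2))  # 4.38
```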
I do not adjust for partisan splits. Some polls report that x% of Republicans will support Obama, and vice versa; I assume that a Republican is worth as much to Romney as a Democrat is to Obama. I also don't try to adjust for voter enthusiasm. If you want a sense of that, pay attention to the turnout models: 2004 was a good Republican year, and the Rasmussen Party ID model would be an unprecedented Republican year.
I also adjust Independents to match each model. By default I treat an Independent as neutral, worth 50% to Romney and 50% to Obama. However, if a poll reports a preference among Independents, I weight the result to account for the difference.
Finally, I include a modification factor for the undecideds reported by each poll, which I maintain as an independent variable. If I assume undecideds split 50/50, the poll is unaffected. Most of the time I assume a 2:1 split toward the challenger, Romney, since that matches historical trends. There are exceptions, though: in 2004 undecideds split 2:1 for the incumbent.
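As a sketch of the arithmetic, assuming the undecided share is simply allocated to the two candidates and added to the margin: a 50/50 split changes nothing, while a 2:1 split moves the margin by one third of the undecided share. The 6% undecided figure below is hypothetical, chosen only for illustration, and `undecided_shift` is an illustrative name.

```python
def undecided_shift(undecided_pct: float, share_to_romney: float = 2 / 3) -> float:
    """Net change in Romney's margin (points) if undecideds break with the given share to Romney.
    0.5 means an even split (no change); 2/3 is a 2:1 split toward the challenger."""
    to_romney = undecided_pct * share_to_romney
    to_obama = undecided_pct * (1 - share_to_romney)
    return to_romney - to_obama


# Hypothetical poll with 6% undecided.
print(round(undecided_shift(6), 2))         # 2.0  (2:1 for the challenger)
print(round(undecided_shift(6, 0.5), 2))    # 0.0  (even split)
print(round(undecided_shift(6, 1 / 3), 2))  # -2.0 (2:1 for the incumbent, as in 2004)
```

Keeping the split as a parameter mirrors treating it as an independent variable rather than a fixed rule.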
The reweighting of the AP/GfK poll used as an example thus results in:
Original - Obama +1
2008 - Romney +6
2010 - Romney +10
2004 - Romney +10
Rasmussen - Romney +11
As a final step, I average all of these reweighted polls and report the result. Note that I do not include polls whose internals I can't see; doing so would force apples-to-oranges comparisons, which defeats the point of this analysis. This means I can't include Gallup in my average, so I drop it from the RCP average as well. This is why the RCP average I report will differ from the one posted on the RCP web site.
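A minimal sketch of the final averaging step, assuming each poll is reduced to a Romney margin per turnout model (negative means Obama leads) and that a poll without published internals is recorded as `None` and skipped. Only the AP/GfK numbers come from this post; the Gallup entry carries no figures and is there only to show the exclusion. The structure and names are illustrative.

```python
from statistics import mean
from typing import Optional

MODELS = ("Original", "2008", "2010", "2004", "Rasmussen")

# Romney margin per model; None marks a poll whose internals are not published.
REWEIGHTED: dict[str, Optional[dict[str, float]]] = {
    "AP/GfK": {"Original": -1, "2008": 6, "2010": 10, "2004": 10, "Rasmussen": 11},
    "Gallup": None,  # no internals, so excluded from every average, including "Original"
}


def model_averages(polls):
    """Average each model's margin over the polls whose internals are available."""
    usable = [margins for margins in polls.values() if margins is not None]
    return {model: mean(m[model] for m in usable) for model in MODELS}


print(model_averages(REWEIGHTED))
```

Because excluded polls drop out of the "Original" column too, that column is the RCP-style average without Gallup, which is why it can differ from the figure RCP posts.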