Thursday, September 27, 2012

Methodology

Warning: Math ahead...

The election of 2012 is turning into a joke of epic proportions where public polling is concerned.  Public polls are supposed to be predictive of voter behavior, not instruments of propaganda.  While I personally support Mitt Romney, I am not interested in cherry-picking polls or pushing a specific result.  I am interested in knowing where my chosen candidate stands with the electorate.


Four and eight years ago we were able to look at the Real Clear Politics (RCP) average and get a sense of the state of the race.  In 2012, however, the RCP average is not indicative of where the race stands.


If you take the polls used in the RCP average and look at their internals, you find that the sampling is not based in reality.  In 2008 the Democrats had their best year in several generations, with 39% of the electorate compared to 32% Republican (D+7).  This was a turnout proportion unmatched in modern times.  In both 2010 and 2004 we saw an even turnout.  Many of the polls in the RCP average use samples with a larger Democratic advantage than the 2008 turnout.  D+10 or D+11 is not unusual, and one of the Pew polls was even D+19!


What I am doing is similar to what is being done at unskewedpolls.com, except I am offering a range of results instead of a single one.  I am going into the internals of the public polls and reweighting them to give a different polling average based on 2008 actual turnout, 2010 actual turnout, 2004 actual turnout, and a three-month rolling average of the Rasmussen Party Identification poll.

Turnout Models

We start with a predicted makeup of the voting population on election day.  Three turnout models are used, matching the actual party identification of voters in three recent elections: 2004, 2008, and 2010.

2004

37% - Democrat
37% - Republican
26% - Independent

2008
39% - Democrat
32% - Republican
29% - Independent 

2010
36% - Democrat
36% - Republican
28% - Independent 

One interesting thing to note, which has important implications for the analysis of polls, is that in 2008 Democrats turned out only 2-3% above their normal levels.  The real difference is that Republicans didn't vote.  If you are to believe the polls used in the RCP average, then you must believe either that Democrats will turn out at levels four times higher than their record surge, or that Republicans are going to stay home at even higher rates than in 2008.  Does either of these assumptions make any sense in the age of Obama?

Rasmussen Party ID poll

Every month, Rasmussen conducts a poll on partisan identification.  He asks 15,000 people whether they identify themselves as Democrat, Republican, or Independent.  In polling terms, a sample of 15,000 is huge, resulting in a margin of error (MOE) of less than 1%.  Interestingly, he doesn't use this information in weighting his daily tracking poll, instead using a weighting based on responses received in the same poll over the last few weeks.  The Rasmussen party ID poll has proven to be a good indicator of party performance in recent elections: in 2004 it reported D+1 and in 2008 D+8, each within 1 point of the actual turnout (leaning slightly in favor of Democrats).

What I am doing is using a three-month average of this poll to provide a fourth turnout model in my reweighting.  My reasoning for using a three-month average is that his August poll produced a huge jump in Republican identification, and I want to even out the numbers.  Here are the current results:


June

34.0% - Democrat
35.4% - Republican
30.5% - Independent

July

34.0% - Democrat
34.9% - Republican
31.1% - Independent

August
33.3% - Democrat
37.6% - Republican
29.2% - Independent

Average (3 month - 45,000 samples)
33.8% - Democrat
36.0% - Republican
30.3% - Independent
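
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the three-month average. It assumes a simple unweighted mean of the three monthly figures (reasonable if each month has about the same sample size); the function and variable names are only illustrative.

```python
# Monthly Rasmussen party ID figures quoted above (percent).
monthly = {
    "June":   {"D": 34.0, "R": 35.4, "I": 30.5},
    "July":   {"D": 34.0, "R": 34.9, "I": 31.1},
    "August": {"D": 33.3, "R": 37.6, "I": 29.2},
}


def rolling_average(months):
    """Unweighted mean of each party's share, rounded to one decimal.

    Assumption: every month counts equally (equal sample sizes).
    """
    parties = ["D", "R", "I"]
    return {
        p: round(sum(m[p] for m in months.values()) / len(months), 1)
        for p in parties
    }


average = rolling_average(monthly)
print(average)
```

Under that assumption the result matches the 33.8 / 36.0 / 30.3 figures listed above.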

Reweighting Polls

In order to reweight the polls, I start with the top line result for the poll.  For example, the AP/GfK poll is reporting a result of Romney -1%.

Next I look at the internals of the poll to get the partisan breakdown.  In the AP/GfK poll the Democrat/Republican/Independent (D/R/I) split is 50/37/13.  With a D+13 sample, this poll is obviously way off.

Next I calculate the modification to the top line number that should be made under each of the turnout models.  The formula for this is ((Target Party ID) - (Reported Party ID)) x (Target Party ID/50).  In English, this is the difference between what is expected and what was reported for each identification, times the expected identification divided by 50.  It gives how much the top line result should be adjusted for each party identification.  Taking the Democrats under the 2008 model, the adjustment is (39-50)*(39/50) = -8.58 to the top line result (-1 + 8.58 = 7.58 toward Romney).
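
The Democratic step of that example can be sketched in a few lines of Python. This is only a sketch of the one term worked out in the text; the names are illustrative, and the sign convention (a Democratic adjustment moves Romney's margin by the same amount in the opposite direction) is read off the worked example rather than stated as a general rule.

```python
def party_adjustment(target_pct, reported_pct):
    """(target - reported) * (target / 50), the formula given in the text."""
    return (target_pct - reported_pct) * (target_pct / 50)


# AP/GfK example: the poll sample is 50% Democrat, the 2008 model is 39%.
dem_adjustment = party_adjustment(target_pct=39, reported_pct=50)

# Top line is Romney -1.  Assumption: subtracting the Democratic adjustment
# from Romney's margin reproduces the "-1 + 8.58" step in the text.
romney_topline = -1.0
romney_after_dem_step = romney_topline - dem_adjustment

print(f"Democratic adjustment: {dem_adjustment:.2f}")
print(f"Romney margin after Democratic step: {romney_after_dem_step:+.2f}")
```

The Republican and Independent terms are left out here because the text does not spell out their sign conventions, so this intermediate figure is not the final reweighted result.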


I do not adjust for partisan splits.  Some polls report x% of Republicans will support Obama, and vice versa.  I assume the value of a Republican in votes for Romney is equal to the value of a Democrat in votes for Obama.  I also don't try to adjust for voter enthusiasm.  If you want a sense of that, then pay attention to the turnout models: 2004 is a good Republican year, while the Rasmussen Party ID model would be an unprecedented Republican year.


I also adjust Independents to match the specific model.  By default I assume an Independent to be a neutral choice, worth 50% to Romney and 50% to Obama.  However, if a poll reports a preference among Independents, then I will weight the result to account for the differential.


Finally, I am including a modification factor for the undecideds reported by the poll.  I maintain this as an independent variable.  When I assume a 50% split of undecideds, there is no effect on the poll.  Most of the time I will assume a 2:1 split toward the challenger, Romney, since that matches historical trends.  Note that there are exceptions, though: in 2004 they split 2:1 for the incumbent.
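
To make the undecided factor concrete, here is a rough Python sketch. The text does not give an explicit formula for it, so this assumes the simplest reading: the margin shifts by the undecided share times the difference between the two sides' portions of it. The 6% undecided figure is hypothetical and the names are illustrative.

```python
def undecided_shift(undecided_pct, challenger_share):
    """Points added to the challenger's margin when undecideds break.

    Assumption (not from the text): `challenger_share` of the undecideds go
    to the challenger and the remainder go to the incumbent.
    """
    incumbent_share = 1.0 - challenger_share
    return undecided_pct * (challenger_share - incumbent_share)


hypothetical_undecided = 6.0  # percent; made up for illustration

even_split = undecided_shift(hypothetical_undecided, 0.5)
two_to_one = undecided_shift(hypothetical_undecided, 2 / 3)

print(f"50/50 split shifts the margin by {even_split:.1f}")
print(f"2:1 split shifts the margin by {two_to_one:.1f}")
```

On this reading an even split leaves the margin unchanged, and a 2:1 split moves it by one third of the undecided share.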


The reweighting of the AP/GfK poll used as an example thus results in:

Original - Obama +1
2008 - Romney +6
2010 - Romney +10
2004 - Romney +10
Rasmussen - Romney +11

As a final step I am averaging all of these reweighted polls and giving the result.  Note that I do not include polls whose internals I can't see.  Doing so would force apples-to-oranges comparisons, which would negate the point of this analysis.  This means I can't include Gallup in my average, and I drop it from the RCP average as well.  This is why my reporting of the RCP average will differ from what they post on their web site.









3 comments:

  1. I finally spotted this item on your blog, and had a chance to review it in detail. I can follow what unskewedpolls.com, which you reference, is doing, but I don't see where your own formula comes from, how you can make your adjustments using just one party affiliation, and how your stated assumptions come into it.

    Taking the example from unskewedpolls.com, it seems like a better assumption is that 90% of R voters will go Romney, while 90% of D voters will go for Obama. If we let "R" be the percentage of voters that are Republican, and likewise for "D", then Romney would get .9*R+.1*D votes from affiliated voters, while Obama would get .1*R+.9*D. The difference comes out to 0.8*(R-D). If we define the turnout differential, T=R-D, (the difference between the Republicans that vote and the Democrats that vote), that formula comes down to 0.8*T.

    Anyone could use that formula to make a first-cut at "adjusting" a skewed poll. If a poll is "D+10", for example, but you think the turnout will be "D+0", you could simply subtract 0.8*10=8.0 from the poll's published result.

    But what about Independents? It seems clear that they are about 30% of the voters. That means that the contribution to the final result due to Independents is 0.3*dI, where "dI" is the percentage of Independent voters going for Romney minus the percentage going for Obama.

    Putting it all together, the voting result (making the 90% assumption above) is V = 0.8*T + 0.3*dI. For example, if you go with a voter mix of R/D/I of 36/34/30 (roughly what Rasmussen says it is), and the Romney/Obama split among Independents was 55/45, you would get V = 0.8*(36-34)+0.3*(55-45) = 0.8*2 + 0.3*10 = 1.6+3 = +4.6 (favoring Romney).

    It seems to me that the way to do this would be to first estimate dI, by calculating dI = (Poll_result-0.8*T_poll)/(fraction of independents in poll), where T_poll is the turnout reflected in the poll (the negative of the "D+" number). Basically, you just need the split among Independents. If the internals give you that more directly, that would be best (rather than doing the above).

    Then calculate V = 0.8*T + 0.3*dI (where, again, T=R-D).

    Anyway, this formula shows the importance of GOTV. Improving your turnout by 1% gives you +0.8%, while improving your split among independents by 1% only gives you +0.3%.

    -Optimizer
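
A minimal Python sketch of the formula proposed in the comment above, V = 0.8*T + 0.3*dI. The function and parameter names are illustrative, and the 90% party loyalty and 30% Independent share are the commenter's assumptions, not measured values.

```python
def projected_margin(rep_pct, dem_pct, ind_romney_pct, ind_obama_pct,
                     loyalty=0.9, ind_fraction=0.3):
    """Romney-minus-Obama margin, V = 0.8*T + 0.3*dI with the defaults."""
    turnout_diff = rep_pct - dem_pct             # T = R - D
    ind_diff = ind_romney_pct - ind_obama_pct    # dI
    partisan_weight = loyalty - (1 - loyalty)    # 0.9 - 0.1 = 0.8
    return partisan_weight * turnout_diff + ind_fraction * ind_diff


def implied_independent_split(poll_margin, poll_turnout_diff,
                              loyalty=0.9, ind_fraction=0.3):
    """Back out dI from a poll's margin: dI = (V - 0.8*T_poll) / fraction."""
    partisan_weight = loyalty - (1 - loyalty)
    return (poll_margin - partisan_weight * poll_turnout_diff) / ind_fraction


# The commenter's example: R/D/I of 36/34/30, Independents split 55/45.
margin = projected_margin(36, 34, 55, 45)
print(f"Projected margin: {margin:+.1f}")

# Round trip: recover the 10-point Independent split from that margin.
split = implied_independent_split(margin, 36 - 34)
print(f"Implied Independent split: {split:+.1f}")
```

With the example inputs this reproduces the +4.6 figure from the comment.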
