After my last blog post, a few of my friends were curious about how I reduced all those votes down to two dimensions. I did kind of gloss over the details last time, so I figured I’d take a stab at explaining my methodology.
Last time around, I pulled all the roll call votes for the Washington State Senate in 2016, and I made a big matrix of senators and their votes. A matrix with over 25,000 cells is not exactly easy to process, so I reduced the dimensions to two, resulting in this scatterplot showing the partisan divide:
How can you take votes on 402 bills and turn them into a two-dimensional graph? Essentially, we’re converting each senator’s list of 402 votes into two new “features.” Those features are all their votes smushed together to give us new points that fall into a lower-dimensional space.
How do we reduce dimensions and figure out which votes to use in our new features? Well, the key thing to consider is this: not all bills are created equal. Some bills (and the votes on those bills) tell us more about a person’s politics than others. Before you read on, consider this: if the senate votes unanimously for (or against) a bill, does that tell us much about the differences between the senators who voted on it?
A Delicious Detour to Pizzaville
Let’s take a trip to Pizzaville, a lovely nation with a unicameral legislature consisting of five members: Monroe, Washington, Jefferson, Adams, and Madison. They’ve voted on four bills this session: each one establishing a topping on the national pizza. One for cheese, one for pepperoni, one for tofu (which didn’t pass the floor), and one for jalapeños. What a productive year!
If we want to understand how Pizzaville’s legislators fall on a partisan continuum, we need to figure out what votes tell us the most about the members. Take a look at the votes and tell me – which votes seem like they tell you the most about each member of the legislature?
Can we learn much from the way a member voted on Cheese? No, it doesn’t tell us much – they all voted for it. Clearly no one in Pizzaville is lactose-intolerant. If we’re looking for information about a member’s politics, the Cheese vote is useless to us. We could leave this out of our model entirely and be no worse for it.
What about Pepperoni? This is a more interesting result: two members voted Nay and the other three voted Yea. Do you think there’s information here? I would argue that there is. There’s a lot more variance in this vote, meaning the body is much more divided. On top of that, every person who voted for Pepperoni voted against Tofu, and everyone who voted against Pepperoni voted for Tofu. If we know someone’s Pepperoni vote, then we also know their Tofu vote (and the other way around)! That makes Pepperoni a high-information vote. The two votes are perfectly correlated, just in opposite directions. When we look for high-information votes, we’re looking for votes that have high variance and are highly correlated with other votes.
And the Jalapeño vote? There’s more variance than cheese, but it doesn’t map as neatly to division on other toppings as Pepperoni and Tofu. It would be good to include this in our model, but perhaps not weighted as highly as the previous two.
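If you'd like to check the "variance" and "correlation" claims with actual numbers, here is a minimal sketch in plain Python. The vote lists are my reading of the Pizzaville votes described in this post (+1 for Yea, -1 for Nay), and the helper names `variance` and `correlation` are just illustrative.

```python
from math import sqrt

# Pizzaville votes as described in this post (+1 = Yea, -1 = Nay).
# Order within each list: Monroe, Washington, Jefferson, Adams, Madison.
VOTES = {
    "Cheese":    [1,  1,  1,  1,  1],
    "Pepperoni": [1, -1,  1, -1,  1],
    "Tofu":      [-1, 1, -1,  1, -1],
    "Jalapeños": [1,  1, -1,  1, -1],
}

def variance(xs):
    """Population variance; 0 means everyone voted the same way."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def correlation(xs, ys):
    """Pearson correlation, or None when either vote has zero variance."""
    vx, vy = variance(xs), variance(ys)
    if vx == 0 or vy == 0:
        return None  # a unanimous vote cannot be correlated with anything
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / sqrt(vx * vy)

# Print each bill's variance and its correlation with the Tofu vote.
for bill, votes in VOTES.items():
    print(bill, round(variance(votes), 2), correlation(votes, VOTES["Tofu"]))
```

Cheese has zero variance, Pepperoni and Tofu are perfectly opposed, and Jalapeños lands at a correlation of about 0.67 with Tofu: related, but not a perfect match.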
Let’s turn those observations into a weight for each vote:
- Cheese, while delicious, tells us nothing. We’ll just throw this bit of information away by weighting this vote at 0.
- Pepperoni was one of our high-information votes. Let’s put those pepperoni-eaters on the left of our graph, so we’ll give it a weight of -1.
- Tofu is opposite of Pepperoni, so it gets a weight of 1.
- Jalapeños has some information to it, but not quite as much. Since a “yea” on Jalapeños means there’s a 2/3 chance you’re pro-tofu and anti-pepperoni, let’s give it a weight somewhere closer to tofu, like .7.
(If you’re thinking that choosing these values seems arbitrary, you’re right. We’ll dig into a more rigorous process shortly.)
Let’s use these weights to generate new partisan scores for Pizzaville’s legislators. To do this, we convert each Yea/Nay into a +1 or -1, multiply each vote by its weight, and add them up together. Here’s a very messy table of all that arithmetic:
| Legislator | Cheese (0) | Pepperoni (-1) | Tofu (1) | Jalapeños (.7) | Calculation | Score |
|---|---|---|---|---|---|---|
| Monroe | 1 * 0 = 0 | 1 * (-1) = -1 | (-1) * 1 = -1 | 1 * .7 = .7 | 0 - 1 - 1 + .7 | -1.3 |
| Washington | 1 * 0 = 0 | (-1) * (-1) = 1 | 1 * 1 = 1 | 1 * .7 = .7 | 0 + 1 + 1 + .7 | 2.7 |
| Jefferson | 1 * 0 = 0 | 1 * (-1) = -1 | (-1) * 1 = -1 | (-1) * .7 = -.7 | 0 - 1 - 1 - .7 | -2.7 |
| Adams | 1 * 0 = 0 | (-1) * (-1) = 1 | 1 * 1 = 1 | 1 * .7 = .7 | 0 + 1 + 1 + .7 | 2.7 |
| Madison | 1 * 0 = 0 | 1 * (-1) = -1 | (-1) * 1 = -1 | (-1) * .7 = -.7 | 0 - 1 - 1 - .7 | -2.7 |
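If tables of arithmetic make your eyes glaze over, the same calculation fits in a few lines of Python. This is only a sketch of the multiply-and-add step, using the hand-picked weights from above; the names `WEIGHTS`, `VOTES`, and `partisan_score` are made up for this example.

```python
# Hand-picked weights from the list above; illustrative, not fitted to anything.
WEIGHTS = {"Cheese": 0.0, "Pepperoni": -1.0, "Tofu": 1.0, "Jalapeños": 0.7}

# Each legislator's votes, with Yea converted to +1 and Nay to -1.
VOTES = {
    "Monroe":     {"Cheese": 1, "Pepperoni": 1,  "Tofu": -1, "Jalapeños": 1},
    "Washington": {"Cheese": 1, "Pepperoni": -1, "Tofu": 1,  "Jalapeños": 1},
    "Jefferson":  {"Cheese": 1, "Pepperoni": 1,  "Tofu": -1, "Jalapeños": -1},
    "Adams":      {"Cheese": 1, "Pepperoni": -1, "Tofu": 1,  "Jalapeños": 1},
    "Madison":    {"Cheese": 1, "Pepperoni": 1,  "Tofu": -1, "Jalapeños": -1},
}

def partisan_score(votes, weights):
    """Multiply each vote by its weight and add the results together."""
    return sum(weights[bill] * vote for bill, vote in votes.items())

SCORES = {name: round(partisan_score(v, WEIGHTS), 1) for name, v in VOTES.items()}

# Print legislators from the most negative score to the most positive.
for name, score in sorted(SCORES.items(), key=lambda item: item[1]):
    print(f"{name:<11}{score:+.1f}")
```

Changing a value in `WEIGHTS` and rerunning is a low-tech way to see how sensitive the scores are to each weight.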
I’ve plotted these legislators on a graph below. Tweak the weights for each vote to see what happens! A few things to try:
- Give Cheese a weight other than zero.
- Change either Pepperoni or Tofu to be larger.
- What happens if you weight Jalapeños the same as Pepperoni or Tofu?
Want to play with the source code of this widget? Download or fork it on GitHub.
According to this model, Washington and Adams are our most “right-wing” legislators, with Jefferson and Madison at the far left. We have found the partisan divide in Pizzaville – between the right-wing Vegetarian Party and the left-wing Carnivore Party! Monroe is our lone moderate, cutting through the partisan crust for some spicy jalapeños. And who can blame him?
From Pizza to Principal Components
Now that we’re all very hungry, let’s get back to Washington state. We had 402 bills that had at least one roll call last year – how the heck are we supposed to assign weights to all of those? Not to mention our weights in Pizzaville were awfully arbitrary – how do we make sure the weights in our model are appropriately balanced? The real world sure is a lot messier than Pizzaville!
In practice, when you are trying to reduce the dimensionality of a data set like this, you don’t use intuition to assign weights like we did above. You use a more formal process like “Principal Component Analysis” (PCA). This is where things get a tiny bit “math-y,” but stay with me!
PCA uses some nifty properties of matrices (our legislators and votes table was a matrix) to find what are called eigenvalues and eigenvectors, which can be used to re-express the data in a new coordinate system, usually one with far fewer dimensions. If that sounds complicated, never fear! In this case, eigenvectors are a lot like those weights we established above – each eigenvector is like a list of 402 weights used to combine all the votes into one new value! The matching eigenvalue tells us how much of the variation among senators that combination captures, so we keep the eigenvectors with the largest eigenvalues.
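To make the eigenvector-as-weights idea concrete, here is a small sketch that runs this process on the Pizzaville votes instead of the real Senate data. It assumes NumPy is installed, and the function name `first_component` and its `positive_col` argument are my own inventions for the example. It finds the eigenvector with the largest eigenvalue of the votes' covariance matrix and uses it as the list of weights.

```python
import numpy as np

BILLS = ["Cheese", "Pepperoni", "Tofu", "Jalapeños"]
LEGISLATORS = ["Monroe", "Washington", "Jefferson", "Adams", "Madison"]

# Pizzaville votes (+1 = Yea, -1 = Nay): one row per legislator, one column per bill.
X = np.array([
    [1,  1, -1,  1],
    [1, -1,  1,  1],
    [1,  1, -1, -1],
    [1, -1,  1,  1],
    [1,  1, -1, -1],
], dtype=float)

def first_component(votes, positive_col):
    """Return (weights, scores) for the first principal component."""
    centered = votes - votes.mean(axis=0)        # subtract each bill's average vote
    cov = np.cov(centered, rowvar=False)         # bill-by-bill covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigenvalues come back ascending
    weights = eigenvectors[:, -1]                # eigenvector of the largest eigenvalue
    # The sign of an eigenvector is arbitrary, so flip it to make one bill positive.
    if weights[positive_col] < 0:
        weights = -weights
    return weights, centered @ weights

weights, scores = first_component(X, positive_col=BILLS.index("Tofu"))
for bill, w in zip(BILLS, weights):
    print(f"{bill:<10}{w:+.2f}")
for name, s in zip(LEGISLATORS, scores):
    print(f"{name:<11}{s:+.2f}")
```

On this toy data the result has the same shape as the hand-picked weights: Cheese gets a weight of essentially zero, Pepperoni and Tofu get equal and opposite weights, and Jalapeños sides with Tofu at a smaller weight. Which end of the scale counts as "left" is a labeling choice, since the math does not care about the sign.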
The cool thing about using PCA is that it allows us to assign all these weights and create these scores from the bottom-up. We don’t have to go line-by-line and try to understand every piece of legislation like we did in Pizzaville. The downside is that we don’t always know exactly how our model works – analyzing 402 weights to make sense of what “matters” can be really complicated. PCA is also not specific to this application – you can use it to reduce the dimensionality of your data in any field, from political science all the way to basketball statistics.
Olympia’s Pepperoni and Tofu
Alright, so some of the State Senate’s votes tell us more about partisanship than others. In fact, 243 of their 402 bills were unanimous at final passage last year – that’s a lot of cheese!
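Spotting the "cheese" in a real dataset is easy to automate. Here is a rough sketch that assumes roll calls are stored as a dictionary from bill name to a list of +1/-1 votes; that layout, the toy data, and the function names are all assumptions for illustration, and real roll calls would also need a rule for absent or excused members.

```python
def is_unanimous(votes):
    """True when every recorded vote on a bill is the same."""
    return len(set(votes)) <= 1

def split_unanimous(roll_calls):
    """Split {bill: [votes]} into (unanimous, contested) lists of bill names."""
    unanimous = [bill for bill, votes in roll_calls.items() if is_unanimous(votes)]
    contested = [bill for bill, votes in roll_calls.items() if not is_unanimous(votes)]
    return unanimous, contested

# Made-up toy data, not real Senate roll calls.
toy_roll_calls = {
    "Bill A": [1, 1, 1, 1],
    "Bill B": [1, -1, 1, -1],
    "Bill C": [-1, -1, -1, -1],
}

unanimous, contested = split_unanimous(toy_roll_calls)
print(len(unanimous), "unanimous;", len(contested), "contested")
```

Unanimous votes have zero variance, so PCA ends up ignoring them whether or not you filter them out first.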
Which votes are most useful for understanding legislative differences? What’s Olympia’s Pepperoni and Tofu? It turns out that there were a lot of highly partisan votes squeaking by in the senate last session – the Republican caucus had a narrow majority which they exercised often – and all those are interesting for partisanship scores. Rather than going through all those bits of legislation, I thought I’d call out one of the items that was weighted the other way: what vote drags your position farthest to the left?
It turns out that it’s not a piece of legislation at all, but a gubernatorial appointment! The honor goes to SGA 9137, the appointment of Lynn Peterson to head the Washington State Department of Transportation. Governor Inslee appointed her in 2013, and she served on what was technically an interim basis for three years (!), until her confirmation was brought to a vote in February of 2016. Her confirmation was voted down, with 53% of senators voting nay. I found this one particularly interesting because I learned that this was the first time the State Senate had rejected a gubernatorial nominee in 18 years!
Digging into the news on this, there was a lot of speculation about whether this was politically motivated, or whether it was backlash due to Seattle’s ongoing Bertha issues and I-405 tolling concerns. My model gave a vote for this a highly liberal rating – this was a strict partisan vote, with only the Senate’s liberal wing voting yea.
Not so Bad, Right?
So there you have it! I hope this was a helpful overview of how you might take a big, intractable-seeming data problem and reduce its dimensions to something our puny human brains can visualize and explore. As I mentioned above, this is a bottom-up way of modeling and understanding votes. This has its strengths and weaknesses. On the upside, it’s a really fast way to start understanding underlying differences in a new dataset – you can very quickly find some of the signal in a large matrix of data. Unfortunately, this can sometimes mean that your results are opaque or difficult to verify with conventional wisdom. So view it as one tool in your tool belt. I’m excited to dig into the legislative data to understand the substance of legislation a bit more.
Source Code and Examples
Want to explore for yourself? If you’d like to try using this method to map out partisanship, I have a couple of things you can play with in a GitHub repository. It includes:
- Principal Component Analysis Example in Python – I wrote very little original code here; this is just to show you how easy it is to apply! This simple script opens up a matrix of legislators and their votes, and then performs PCA to scale them down to partisanship scores.
- D3.js visualization of weights and scores – This is the widget embedded up above that allows you to play with weights and watch the points change.
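For a flavor of what a script like the Python example involves, here is a minimal sketch. To be clear, this is not the script from the repository: it assumes NumPy, uses a tiny made-up vote matrix, and the function name `project_to_2d` is hypothetical. It centers the votes and uses a singular value decomposition, which is one common way to compute PCA, to place each legislator in two dimensions.

```python
import numpy as np

def project_to_2d(votes):
    """Project a legislators-by-bills vote matrix onto its top two components."""
    centered = votes - votes.mean(axis=0)        # subtract each bill's average vote
    # Rows of vt are the principal directions, ordered from most to least variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:2]                          # two lists of per-bill weights
    return centered @ components.T, components

# Made-up votes (+1 = Yea, -1 = Nay): rows 0-2 are one bloc, rows 3-5 the other.
# Column 0 is unanimous, columns 1-3 are party-line, columns 4-5 cut across blocs.
toy_votes = np.array([
    [1,  1,  1, -1,  1,  1],
    [1,  1,  1, -1, -1,  1],
    [1,  1,  1, -1,  1, -1],
    [1, -1, -1,  1,  1,  1],
    [1, -1, -1,  1, -1,  1],
    [1, -1, -1,  1,  1, -1],
], dtype=float)

coords, components = project_to_2d(toy_votes)
for row in coords:
    print(f"{row[0]:+.2f} {row[1]:+.2f}")
```

In this toy matrix the first coordinate splits the two blocs cleanly, and the second picks up the smaller disagreements inside each bloc. Each row of `coords` is a point you could drop straight onto a scatterplot.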