These are completed volunteer tasks. Thanks, volunteers -- you're awesome!

speech information [COMPLETED]

done by didier deshommes

This [1.3M] file tracks how many speeches and how many words per speech each representative has given this year. Writing a simple parser that maps GovTrack IDs to these facts would be great.

earmarks [COMPLETED]

done by MichaelM and AlexG

Taxpayers for Common Sense has a 4MB earmark database up on their website. Write a parser for it. A good place to start would be getting the number of earmarks and the sum of earmark sizes for each representative.
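As a rough sketch of that first step, the following assumes the database has already been exported to CSV. The column names "govtrack_id" and "amount" are hypothetical and would need to match whatever the real export uses.

```python
import csv
import io
from collections import defaultdict


def earmark_totals(csv_file):
    """Return {rep_id: (earmark_count, total_dollars)} from a CSV export.

    The column names "govtrack_id" and "amount" are hypothetical; adjust
    them to whatever the real export uses.
    """
    counts = defaultdict(int)
    sums = defaultdict(int)
    for row in csv.DictReader(csv_file):
        rep_id = row["govtrack_id"].strip()
        # Strip "$" and thousands separators before converting.
        cleaned = row["amount"].replace("$", "").replace(",", "").strip()
        counts[rep_id] += 1
        sums[rep_id] += int(cleaned or 0)
    return {rep_id: (counts[rep_id], sums[rep_id]) for rep_id in counts}


# Made-up sample rows, only to show the shape of the result.
sample = io.StringIO(
    'govtrack_id,amount\n'
    '400001,"$1,000,000"\n'
    '400001,"$250,000"\n'
    '400002,"$500,000"\n'
)
totals = earmark_totals(sample)
```

Here totals maps each ID to a (count, sum) pair, so the same function can feed both numbers.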

voting history [COMPLETED]

done by Dan Steingart

The source contains a series of XML files that identify various interesting facts about legislators. A good first step would be writing a parser that outputs a mapping from GovTrack IDs to numbers like a) the percentage of votes a person has missed, b) the number of bills they've introduced, c) the number of bills they've gotten enacted.

Usage goes something like this:

  • Import RepStats

  • Instantiate the class you're looking for

    • NoVote
    • Cosponsored
    • Enacted
    • Verbosity
    • People
    • Speeches
    • Spectrum
    • Introduced
  • To look up by name, use *.findRepByName(name,metric)

    • example: repstats.NoVote().findRepByName("Obama")
  • To look up by ID, use watchdogVoting.findRepByName(name,metric)

    • example: repstats.NoVote().findRepByName("Obama","verbosity")
  • Returns a dictionary

  • If a partial ID or partial name is used, it returns multiple dictionaries for that metric.

  • Right now it fetches the XML file by URL on every instantiation, downloading the given data set each time. Reading a local copy would be much faster.
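One way to avoid the repeated download would be a small on-disk cache. This is only a sketch and not part of the existing module: cached_fetch, the cache file naming scheme, and the example URL are all made up for illustration.

```python
import os
import tempfile
import urllib.request


def cached_fetch(url, cache_dir, fetch=None):
    """Return the bytes at `url`, downloading only if no local copy exists."""
    if fetch is None:
        # Default downloader; swapped out in the demo below to avoid the network.
        def fetch(u):
            with urllib.request.urlopen(u) as response:
                return response.read()

    os.makedirs(cache_dir, exist_ok=True)
    # Use the last path component as the cache file name (hypothetical scheme).
    filename = url.rstrip("/").rsplit("/", 1)[-1] or "index"
    path = os.path.join(cache_dir, filename)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    data = fetch(url)
    with open(path, "wb") as f:
        f.write(data)
    return data


# Demo with a fake downloader that records how often it is called.
calls = []


def fake_fetch(u):
    calls.append(u)
    return b"<people/>"


demo_dir = tempfile.mkdtemp()
first = cached_fetch("http://example.invalid/stats/novote.xml", demo_dir, fake_fetch)
second = cached_fetch("http://example.invalid/stats/novote.xml", demo_dir, fake_fetch)
```

The second call reads the file written by the first, so the downloader runs only once. A real version would also need some way to expire stale copies.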


done by AaronSw

These NOMINATE scores, developed by political scientists Keith Poole and Howard Rosenthal, are probably the most respected measurement of partisanship in Congress. We can use them to give people a sense of where their congressperson sits on the political spectrum.

IRS income statistics

AlexG is happy to help program if someone is better with stats...

done by AaronSw

The IRS provides basic statistics about people's income tax returns by zip code. We need someone to a) parse the Excel files (check out what we already have), b) convert them into numbers that make sense (ideas: median income, Gini inequality, tax burden, charitable contributions, % using paid preparers), and c) combine the zip codes so we can estimate the values by congressional district.
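For the Gini idea, here is a minimal sketch that assumes the data arrives as income brackets, each with a number of returns and a total income. That input shape is an assumption, not something confirmed from the IRS files, and gini_from_brackets is a made-up name. It treats everyone in a bracket as earning the bracket mean, so it understates inequality within brackets, and it does not handle negative incomes.

```python
def gini_from_brackets(brackets):
    """Approximate the Gini coefficient from grouped income data.

    `brackets` is a list of (number_of_returns, total_income) pairs.
    Everyone inside a bracket is treated as earning the bracket mean.
    """
    groups = [(n, income) for n, income in brackets if n > 0]
    groups.sort(key=lambda g: g[1] / g[0])  # poorest bracket first
    total_n = sum(n for n, _ in groups)
    total_income = sum(income for _, income in groups)
    if total_n == 0 or total_income == 0:
        return 0.0

    area = 0.0  # twice the area under the Lorenz curve (trapezoid rule)
    cum_income = 0
    for n, income in groups:
        prev_share = cum_income / total_income
        cum_income += income
        share = cum_income / total_income
        area += (n / total_n) * (prev_share + share)
    return 1.0 - area
```

Two equal brackets give 0, and two equally sized brackets where one holds all the income give 0.5.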

AlexG: Does anyone know if congressional districts usually bisect zip codes? This could present a challenge to data integrity.

AaronSw: Sadly, they often do. (zipdict.txt shows just how frequent this is.) Unfortunately, the IRS doesn't seem to provide any data on Congressional boundaries, so I figure we'll just have to do the best we can. Congressional districts are big enough compared to zip codes that I don't think it'll make too much of a difference. We can note this and also provide a way to query by zip code and by state.
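A crude version of "the best we can" might split each zip code's value evenly among the districts it touches. This is only a sketch: the mapping format is hypothetical (the real layout of zipdict.txt is not shown here), the zip codes and district names are made up, and it only makes sense for additive values such as return counts or total income, not for medians.

```python
from collections import defaultdict


def estimate_by_district(zip_values, zip_to_districts):
    """Sum per-zip totals into per-district estimates.

    A zip code that falls in several districts has its value split
    evenly among them, which is a crude assumption.
    """
    district_totals = defaultdict(float)
    for zip_code, value in zip_values.items():
        districts = zip_to_districts.get(zip_code, [])
        if not districts:
            continue  # no known district for this zip code
        share = value / len(districts)
        for district in districts:
            district_totals[district] += share
    return dict(district_totals)


# Made-up zip codes and districts, only to show the splitting.
zip_values = {"00001": 100.0, "00002": 60.0, "00003": 5.0}
zip_to_districts = {"00001": ["XX-01"], "00002": ["XX-01", "XX-02"]}
estimates = estimate_by_district(zip_values, zip_to_districts)
```

Weighting the split by population in each overlap would be better if that data were available.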

FEC parser

being developed by Jeremy Schwartz

The FEC publishes data files for all the campaign contribution information they collect.

A good place to start would be figuring out on what schedule these files update and parsing out total_receipts, total_expenditures, and pac_contributions for politicians and lining those up with our identifiers.
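The lining-up step could look roughly like this, assuming the FEC files have already been parsed into one dict per candidate. The "fec_id" key, the ID mapping, and the sample values are hypothetical; only the three field names come from the task description above.

```python
FIELDS = ("total_receipts", "total_expenditures", "pac_contributions")


def line_up_fec_totals(fec_records, fec_to_our_id):
    """Key FEC totals by our identifiers and report records we cannot match.

    `fec_records` is a list of dicts with a hypothetical "fec_id" key
    plus the three FIELDS; `fec_to_our_id` maps FEC IDs to our IDs.
    """
    matched = {}
    unmatched = []
    for record in fec_records:
        our_id = fec_to_our_id.get(record["fec_id"])
        if our_id is None:
            unmatched.append(record["fec_id"])
            continue
        matched[our_id] = {field: record[field] for field in FIELDS}
    return matched, unmatched


# Made-up records and IDs, only to show the shape of the result.
records = [
    {"fec_id": "F1", "total_receipts": 1000, "total_expenditures": 800,
     "pac_contributions": 200},
    {"fec_id": "F2", "total_receipts": 50, "total_expenditures": 40,
     "pac_contributions": 0},
]
matched, unmatched = line_up_fec_totals(records, {"F1": "400001"})
```

Keeping the unmatched IDs makes it easy to see which politicians still need an identifier mapping.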

