Tasks for Volunteers

We've gotten a number of people offering to volunteer and asking if there's something they can work on. I've set up this wiki page to keep track of projects you can volunteer on and who's claimed them at the moment.

If none of these appeal to you, you can:

  1. email Aaron (me@aaronsw.com) to tell him what sort of thing you'd be interested in, and he'll try to think of something more appropriate

  2. join our volunteers mailing list and wait for us to email with a request

If you are interested, take a look at our coding standards and add your name next to the task you want to work on. If you're working on it seriously, let us know and we'll give you an account on the issue tracker.

get an earmark

Many representatives have online forms you can use to request earmarks from them:


A good project would be: a) trying to get us an earmark, and b) documenting the process.

lexis-nexis queries

If anyone out there has a Lexis-Nexis account (a real one, not the Lexis-Nexis Academic subscription that most college students can use), please contact me: me@aaronsw.com.

why'd they vote that way?

Ebonya Washington found [PDF] that politicians with daughters tend to be better on women's issues. David Wheeler argues that state income, state dependence on fossil fuels, and political ideology explained the voting on the Warner-Lieberman global warming bill.

Your task is to write some automated code, ideally using the amazing TETRAD, that analyzes votes on bills along with the other data in watchdog and tries to explain why politicians voted that way.
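TETRAD does full causal-structure discovery, but the simplest version of this analysis is just checking how strongly a state-level variable correlates with a vote. Here's a minimal sketch with invented numbers (the records, the income figures, and the votes below are all hypothetical placeholders, not real data from watchdog):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-senator records: state per-capita income (in $1000s)
# and the senator's vote on Warner-Lieberman (1 = yea, 0 = nay).
records = [(55, 1), (48, 1), (41, 0), (60, 1), (38, 0), (45, 0)]
income = [r[0] for r in records]
vote = [r[1] for r in records]
r = pearson(income, vote)  # positive r suggests richer states voted yea more
```

The real task would pull these columns out of the watchdog database and test many candidate explanations at once, which is where TETRAD earns its keep.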

contribution clustering

Most campaign contribution data, like that on Open Secrets, is grouped into a bunch of basic industry categories: Investment, Real Estate, Entertainment, Lobbyists. These categories are preselected, and donations are placed into them by researching each donor's employer -- a time-consuming manual process.

We could automate a lot of it, perhaps, but what if we tried something different: what if we ran a clustering algorithm (see this poorly-named book for explanation and Python examples) on the data and let the data determine which clusters are most relevant? Obviously, we'll need humans to interpret the clusters at the end, but that's a lot less work and a lot more interesting.
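To make the idea concrete, here's a tiny pure-Python k-means sketch run on invented donor feature vectors (total given, share to one party). The features and the choice of k are placeholders -- real contribution records would need real feature engineering first:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means on 2-D points -- a sketch, not a library."""
    rnd = random.Random(seed)
    centers = rnd.sample(points, k)
    for _ in range(iters):
        # assign each point to its nearest center
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda i: (p[0] - centers[i][0]) ** 2
                                + (p[1] - centers[i][1]) ** 2)
            groups[i].append(p)
        # move each center to the mean of its group
        centers = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
            if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers, groups

# Hypothetical donor features: (total given in $1000s, share to one party).
donors = [(1, 0.9), (2, 0.95), (1.5, 0.85), (50, 0.1), (60, 0.2), (55, 0.15)]
centers, groups = kmeans(donors, 2)
# the small loyal donors and the large split donors fall into separate clusters
```

In practice you'd use a real implementation (the book linked above walks through several), but the shape of the pipeline -- vectorize donors, cluster, hand the clusters to a human to name -- stays the same.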

voter/contributor heatmaps

Soon we'll have voter registration data and individual contribution data for much of the country. But this is a lot of data -- we'll want nicer ways of visualizing it.

One obvious way is through maps: show where the most registered voters are clustered; show where political contributions come from. Fundrace does a version of this that I find faintly hideous. The New York Times, as you might imagine, was a bit more tasteful. (More from those guys: 1, 2.) Surely we can do better. Or at least something.

historical voting data



Drew is writing code now

Lobbyists -- love them or hate them, they're obviously an important part of our government. Which is why we want watchdog to track them. The Senate provides an XML database of lobbyist information and OpenSecrets provides advice on making sense of it. As usual, we want a Python parser that reads the files and emits dictionaries of the important information.

The House provides similar feeds.

See also lobbyist contributions and this code to deal with them.
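A sketch of the kind of parser we want, using the standard library. The `Filing`/`Registrant`/`Client` tag and attribute names below are my guess at the shape of the Senate's XML -- verify them against a real dump before building on this:

```python
import xml.etree.ElementTree as ET

def parse_filings(xml_text):
    """Yield one dict per lobbying filing. Tag/attribute names here are
    assumed, not confirmed -- check a real file from the Senate feed."""
    root = ET.fromstring(xml_text)
    for filing in root.iter('Filing'):
        registrant = filing.find('Registrant')
        client = filing.find('Client')
        yield {
            'id': filing.get('ID'),
            'year': filing.get('Year'),
            'amount': filing.get('Amount'),
            'registrant': registrant.get('Name') if registrant is not None else None,
            'client': client.get('Name') if client is not None else None,
            'lobbyists': [l.get('Name') for l in filing.iter('Lobbyist')],
        }

# made-up sample in the assumed format
sample = """<PublicFilings>
  <Filing ID="F1" Year="2008" Amount="20000">
    <Registrant Name="Acme Lobbying LLC"/>
    <Client Name="Widgets Inc"/>
    <Lobbyists><Lobbyist Name="Jane Doe"/></Lobbyists>
  </Filing>
</PublicFilings>"""
filings = list(parse_filings(sample))
```

Once the dictionaries come out clean, loading them into the SQL DB is the easy part.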

on the issues

being worked on by Shahin

Status: Have a parser, need to crawl.

The website On The Issues collects a vast number of positions and quotes from various political figures. Unfortunately, it doesn't seem to be stored in a database but just as plain HTML pages, so it's kind of hard to parse. Still, it's a wealth of useful information -- it would be great if someone could work on a parser for it.
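Since the pages are plain HTML, the parser will have to scrape markup directly. Here's a minimal stdlib sketch that collects the text of each `<li>` -- a stand-in for whatever markup On The Issues actually wraps its quotes in, which you'd need to check against a real page:

```python
from html.parser import HTMLParser

class QuoteExtractor(HTMLParser):
    """Collect the text of each <li> element. The assumption that quotes
    live in list items is illustrative; inspect a real page first."""
    def __init__(self):
        super().__init__()
        self.in_item = False
        self.quotes = []

    def handle_starttag(self, tag, attrs):
        if tag == 'li':
            self.in_item = True
            self.quotes.append('')

    def handle_endtag(self, tag):
        if tag == 'li':
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.quotes[-1] += data

p = QuoteExtractor()
p.feed('<ul><li>Voted YES on the bill.</li><li>Supports the program.</li></ul>')
```

For the crawl side, politeness (rate limiting, caching pages locally) matters more than cleverness.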


The SEC keeps track of all publicly-traded companies and their major executives, owners, and investors. Unfortunately it's all in XML wrapped in SGML and not very easy for people to get at. We want to parse the key data out and load it up so that people can explore corporate structures better. The database is called EDGAR and you can get it over FTP. Some C# code parsing it has been developed by GovTrack and the output is on archive.org. As usual, we want Python dictionaries with the key data so we can import them into a SQL DB.
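As a starting point for the "XML wrapped in SGML" problem: EDGAR full-text filings wrap each document in `<DOCUMENT>` blocks with a `<TYPE>` header, so a first pass can just split on those. This is a sketch of the wrapper format as I understand it -- verify against a real filing from the FTP site:

```python
import re

DOC_RE = re.compile(r'<DOCUMENT>(.*?)</DOCUMENT>', re.S)
TYPE_RE = re.compile(r'<TYPE>([^\s<]+)')

def split_edgar(raw):
    """Split an EDGAR filing into its <DOCUMENT> sections and pull out
    each section's <TYPE> header."""
    out = []
    for m in DOC_RE.finditer(raw):
        body = m.group(1)
        t = TYPE_RE.search(body)
        out.append({'type': t.group(1) if t else None, 'body': body})
    return out

# made-up miniature filing in the assumed wrapper format
sample = ("<SEC-DOCUMENT>\n<DOCUMENT>\n<TYPE>10-K\n"
          "<TEXT>annual report</TEXT>\n</DOCUMENT>\n</SEC-DOCUMENT>")
docs = split_edgar(sample)
```

The second pass -- pulling executives and owners out of each section -- is where the GovTrack C# code and its archive.org output are worth studying before reinventing anything.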


being looked at by evan

some questions and issues

The US Government provides documentation of their trademark data XML format and some sample data on the USPTO website. Unfortunately, it's massively complicated. Your task, should you choose to accept it, is to figure out how to make sense out of all this and write a Python script that goes thru the XML files and returns dictionaries containing all of the important information. Remember, we'll want to integrate this with the SEC database above.
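Because the trademark files are huge, the parser should stream rather than load whole documents. Here's an `iterparse` sketch; the `case-file`/`serial-number` element names are a guess at the USPTO schema, so check their documentation before relying on them:

```python
import io
import xml.etree.ElementTree as ET

def iter_cases(stream, record_tag='case-file'):
    """Stream a huge XML file one record at a time, clearing each element
    after yielding it so memory use stays flat."""
    for event, elem in ET.iterparse(stream, events=('end',)):
        if elem.tag == record_tag:
            yield {child.tag: (child.text or '').strip() for child in elem}
            elem.clear()

# made-up sample in the assumed schema
sample = b"""<trademarks>
  <case-file><serial-number>75000001</serial-number><mark>ACME</mark></case-file>
  <case-file><serial-number>75000002</serial-number><mark>WIDGET</mark></case-file>
</trademarks>"""
cases = list(iter_cases(io.BytesIO(sample)))
```

The hard part isn't the streaming -- it's deciding which of the schema's many fields count as "important information", especially the ones needed to join against the SEC data.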

extract more almanac data

Modify the code in almanac.py (see http://watchdog.net/code/?p=dev.git;a=tree;f=import/parse) to extract more data, like presidential election voting history.

public schools

being looked at by zack

The National Center on Education Statistics (NCES) has a wide variety of information on the country's public schools. In particular, their Common Core of Data will let you calculate things like dollars per pupil, student-to-teacher ratio, dropout and graduation rates, etc. It'd be great to get this data into the database so we can see how different districts stack up.

Meanwhile SchoolDataDirect.org also has a lot of data [click-thru license] "including student performance data, No Child Left Behind data, school environment data, financial data, community demographic data and analytical ratios". It might be easier to just get data from here, especially the NCLB and other student performance data that NCES doesn't seem to publish.
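The derived statistics themselves are simple ratios once the raw columns are in hand. A sketch with invented column names (the real Common Core of Data files use their own codes -- see the NCES layout docs):

```python
import csv
import io

def district_stats(csv_text):
    """Compute dollars-per-pupil and student/teacher ratio per district.
    Column names here are illustrative, not the real CCD codes."""
    stats = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        pupils = int(row['enrollment'])
        stats[row['district']] = {
            'dollars_per_pupil': int(row['expenditures']) / pupils,
            'student_teacher_ratio': pupils / float(row['teachers']),
        }
    return stats

sample = "district,enrollment,teachers,expenditures\nSpringfield,1000,50,9000000\n"
stats = district_stats(sample)
# Springfield: $9000 per pupil, 20 students per teacher
```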

environmental toxins

Scorecard aggregates a whole series of data about toxins in your neighborhood, but their interface is terrible. It'd be great to get this data and integrate it into watchdog so you could, say, see where your district was on the pollution scale.

I'm no expert on this (perhaps we should call someone at Environmental Defense), but it looks like the NEI database is the place to start. Also, NICAR has info and examples on how to use this data.

Databases, Access DB parser.

More: acid rain CD

mortality data

being worked on by Garry Jao

The National Center for Health Statistics seems to have a lot of data. In particular, they have a database of every death record -- it'd be great to have the top causes of death for each district.


ICPSR has data sets about congresspeople and congressional districts with some interesting kinds of data:


textual analysis

We've got speech data for every rep and while I'm sure some people might want to read thru those speeches, a lot of people would rather let a computer do that for them. If we could extract some fun things from the text, like Amazon does from book scans, that would be great. I'm thinking of:

  • favorite/least favorite phrases: the top 5 phrases the rep uses more than the average rep
  • most/least unique phrases: phrases used by a rep with the fewest uses by other people
  • defining words: phrases that best pick out a rep in some kind of statistical analysis
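The first two items boil down to comparing a rep's phrase frequencies against everyone else's. Here's a crude relative-frequency sketch (add-one smoothing, no real statistical test -- "defining words" would want something like chi-squared or log-likelihood instead):

```python
from collections import Counter

def ngrams(text, n=2):
    """Count the n-word phrases in a text (lowercased, whitespace-split)."""
    words = text.lower().split()
    return Counter(' '.join(words[i:i + n]) for i in range(len(words) - n + 1))

def favorite_phrases(rep_text, corpus_text, top=5):
    """Phrases the rep uses disproportionately often relative to the
    overall corpus -- a crude ratio score, not a significance test."""
    rep, corpus = ngrams(rep_text), ngrams(corpus_text)
    rep_total, corpus_total = sum(rep.values()), sum(corpus.values())
    def score(p):
        return (rep[p] / rep_total) / ((corpus[p] + 1) / corpus_total)
    return sorted(rep, key=score, reverse=True)[:top]

# made-up speech fragments for illustration
rep_speech = ("fiscal responsibility matters and fiscal responsibility "
              "wins so fiscal responsibility")
all_speeches = "the budget passed the budget"
top = favorite_phrases(rep_speech, all_speeches, top=1)
```

With the real speech corpus you'd also want stopword filtering and longer n-grams, since the fun Amazon-style phrases tend to be 3-5 words.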

the middle class



SurveyUSA.com publishes approval rating numbers for every Senator. It'd be nice to have a crawler and a parser that extract these numbers.

Here's some Perl code from GovTrack that does this.

Jonathan Holst: I have begun looking into this. I am not sure, though, what numbers I should look at. The best I have been able to come up with is http://www.surveyusa.com/50StateTracking.html, with some sort of re.match('Senate Approval'). Can anyone confirm, or am I completely off track?

AaronSw: Nope, you've got it.
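Building on Jonathan's `re.match('Senate Approval')` idea, a sketch of the parsing step. The HTML snippet below is invented -- the real 50StateTracking page's markup will differ, so treat the pattern as a starting point, not the answer:

```python
import re

# Hypothetical snippet of the 50-state tracking page.
html = '''
<tr><td>Senate Approval</td><td>Kennedy (D-MA)</td><td>52%</td></tr>
<tr><td>Senate Approval</td><td>Kerry (D-MA)</td><td>48%</td></tr>
'''

ROW_RE = re.compile(r'<td>Senate Approval</td><td>([^<]+)</td><td>(\d+)%</td>')

ratings = {name.strip(): int(pct) for name, pct in ROW_RE.findall(html)}
```

The crawler half is just fetching each state's page on a schedule and keeping dated snapshots, so the approval numbers become a time series.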


being looked at by Groby

Play around with Mapnik (http://mapnik.org/) or PostGIS to see if you can get it generating overlay maps for congressional districts. You can get the boundaries from the census at http://www.census.gov/geo/www/cob/cd110.html

The maps code we're currently using calls out to this Perl code on someone else's server (http://razor.occams.info/code/repo/?/viz/wms).

I was thinking it'd be nice to have it in Python on our server so that we could do more with it than just show one district at a time.

A second step would be loading in actual geographic data so we could replace Google Maps entirely -- one problem with Google Maps is that there's no way to generate static images, so pages with a lot of district maps take a very long time to load.

See: http://congress.mcommons.com/

load PVS data

We've crawled a large amount of data from Project Vote Smart and stored it in JSON dumps. We also have Project Vote Smart IDs for most politicians thanks to GovTrack. Now all we need is a simple script that uses that information to line up and import the data from Project Vote Smart.
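The lining-up step is a straightforward dictionary join on the shared PVS ID. The field names and shapes below are invented for illustration -- the real JSON dumps and the GovTrack ID table will have their own structure:

```python
import json

# Hypothetical shapes: our politician table keyed by our own IDs (each
# record carrying a pvs_id via GovTrack), and the crawled PVS dump keyed
# by PVS ID.
politicians = json.loads('{"p1": {"name": "Jane Roe", "pvs_id": "101"}}')
pvs_dump = json.loads('{"101": {"ratings": [{"group": "NRA", "score": 12}]}}')

def merge_pvs(politicians, pvs_dump):
    """Attach each politician's PVS record via the shared PVS ID;
    politicians with no match get pvs = None."""
    merged = {}
    for pid, pol in politicians.items():
        record = dict(pol)
        record['pvs'] = pvs_dump.get(pol.get('pvs_id'))
        merged[pid] = record
    return merged

merged = merge_pvs(politicians, pvs_dump)
```

The only real wrinkle is the politicians whose PVS IDs are missing or wrong -- the import script should log those rather than silently drop them.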


being worked on by jdthomas

The US Census has an enormous amount of information about each Congressional district. Unfortunately, it's all in a very confusing format. If you can decode their complicated CSV files and output standard Python dictionaries full of interesting facts about each Congressional district, we'd be much obliged.

Here's some Perl code from GovTrack that parses some of it.
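Most of the decoding work is a mapping from the Census's cryptic column codes to readable names. A sketch -- the codes below are invented for illustration, and the real ones come from the file layout documentation:

```python
import csv
import io

# Invented code-to-name mapping; build the real one from the Census
# file layout docs.
FIELDS = {'GEOID': 'district', 'P037001': 'population', 'P037011': 'veterans'}

def parse_census(csv_text):
    """Turn each row of a coded Census CSV into a readable dictionary,
    keeping only the columns we have names for."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield {nice: row[code] for code, nice in FIELDS.items() if code in row}

sample = "GEOID,P037001,P037011\nMA-08,635000,41000\n"
rows = list(parse_census(sample))
```

Growing `FIELDS` table-by-table (percent divorced, percent veterans, and so on, as listed below) is tedious but mechanical -- a good task to split among several volunteers.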


SF3-DP2: % never married, % divorced, % >= some college, % >= college degree, % professional degree, % foreign born, % speak foreign language, % veterans

Unknown: armed forces personnel, crimes, inmates, voting age population, voting age population by race, registered voters, political party identification, mortality rates



Archive: volunteer/completed

changed May 29, 2015