Notes on the Senate LDA databases.

This page is still undergoing major construction.

Definitions, explanations and limitations

Lobbying disclosure is largely regulated by two pieces of legislation: the Lobbying Disclosure Act of 1995, or LDA; and the Honest Leadership and Open Government Act of 2007, or HLOGA. HLOGA amended the LDA in 8 areas. Most of the changes aren't material to the Watchdog project, except as noted below.

The definitive summary of the lobbying disclosure laws is given by the House of Representatives in their Lobbying Disclosure Act Guidance. That document provides definitions of terminology used in the LDA database; clarifications on what needs to be reported by lobbying agencies, and when; and more. This wiki page shouldn't be used as a replacement for that document; this wiki page merely attempts to summarize the parts that are relevant to interpreting the LDA database and realizing its limitations. Where this wiki page contradicts the House guidance, the latter should be used and the former corrected.

Lobbying agencies (including organizations which employ in-house lobbyists who act on behalf of the organization) are required by the lobbying regulations to file quarterly reports detailing their lobbying income and/or expenses, subject to certain thresholds (see below). The quarterly filing requirement is relatively new, introduced by HLOGA in 2007, so databases from years prior to 2008 may only provide semi-annual records, since that was the previous filing period. (Source.)

Lobbying forms

There are three forms, each for a different purpose.

  • LD-1 registration forms must be filed by a lobbying organization when the organization begins lobbying on behalf of a new client, or when an organization that employs in-house lobbyists begins lobbying for itself. The specific conditions that trigger this LD-1 filing requirement are somewhat complicated; see here for details. At any rate, an LD-1 filing indicates the beginning of a relationship between a lobbyist and a client. Note that pro bono clients are excepted from registration. The LD-1 form is here.

  • LD-2 reporting forms must be filed quarterly once a relationship is established, even if there's no activity or the income to or expenses of the lobbying organization are below reporting thresholds. LD-2 forms must also be filed when a relationship is terminated, or when in-house lobbying is halted. (Source.) The LD-2 form is here. Note that prior to January 1, 2008, LD-2 forms were only required semi-annually. The old semi-annual LD-2 form is here.

  • LD-203 contribution forms were introduced with HLOGA and must be filed semi-annually, beginning July 30, 2008. Registered lobbying agencies must report certain contributions to public officials on a semi-annual basis: specifically, contributions to federal election campaigns, event costs, presidential libraries and "honorary" contributions. (Source.)

Databases

The Senate LD-1/LD-2 lobbying database is a series of XML documents. The documents are partitioned into quarterly reports based on filing date. The records in this database are effectively a summary of all the LD-1 and LD-2 forms received by Congress.

As of July 2008, there is a new contributions database that provides the details of all the LD-203 forms received by Congress.

Limitations of the data

Rounding of amounts

Since the passing of HLOGA in 2007, lobbying amounts in excess of $5,000 are rounded to the nearest $10,000. Amounts less than $5,000 may or may not be rounded. (Source.) Prior to HLOGA, the LDA only required that amounts greater than $10,000 be rounded to the nearest $20,000, so records filed prior to the passing of HLOGA potentially have less precision than post-HLOGA records. (Source, see "Estimates of income or expenses.")
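To make the loss of precision concrete, here is a minimal sketch that computes the range of true amounts consistent with a reported figure. It assumes the filer actually rounded to the nearest multiple as described above; the guidance cited here does not say how ties are handled, so both endpoints are included. The function name and the post_hloga flag are illustrative, not part of any existing Watchdog code.

```python
def true_amount_bounds(reported, post_hloga=True):
    """Return (low, high) bounds on the true amount behind a rounded figure.

    Post-HLOGA: amounts over $5,000 are rounded to the nearest $10,000.
    Pre-HLOGA:  amounts over $10,000 were rounded to the nearest $20,000.
    Returns None when the reported figure is at or below the rounding
    threshold, since such amounts may or may not have been rounded.
    """
    threshold, step = (5_000, 10_000) if post_hloga else (10_000, 20_000)
    if reported <= threshold:
        return None
    half = step // 2
    return (reported - half, reported + half)
```

For example, a post-HLOGA record reporting $20,000 could stand for anything from $15,000 to $25,000, while a pre-HLOGA record reporting $40,000 could stand for anything from $30,000 to $50,000.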

Registration thresholds

As of January 1, 2008, lobbying firms whose quarterly income from a particular client is less than $2,500 are not required to register with respect to that client. Organizations that employ in-house lobbyists are not required to register if total quarterly expenses in connection with lobbying activity are less than $10,000. (Source.) Prior to 2008, the LDA only required filing in the event that income or expenses exceeded $10,000, and did not distinguish between organizations with in-house lobbyists and lobbying firms. (Source, see "Estimates of income or expenses.")

The LD-1/LD-2 lobbying database

XML schema details

The LDA database is delivered as multiple XML files. The Senate does not provide an XML schema for the database. However, Trang can infer a schema from XML documents. Note that both the Ubuntu and Debian Trang packages are currently broken, but you can use Sun Java to execute the trang.jar file on the command line; for example, this command generates an XSD from the 3 XML files included in the 2nd quarter 2008 disclosure:

java -jar trang.jar -I xml -O xsd 2008_2_4.xml 2008_2_5.xml 2008_2_6.xml lobbyists.xsd

Structure

Each record in the LD-1/LD-2 database is effectively just an XML representation of either an LD-1 or an LD-2 form that was filed by a registrant. The only indication of which form was used is encoded in the Type attribute of the record's Filing element. If the type is registration or registration amendment, it's an LD-1 form, otherwise it's an LD-2 form... unless it's not: there are a small number of records whose type attribute isn't covered by either form. These unusual record types are covered in a later section.
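The rule above can be sketched as a small classifier. This is only an illustration: the function name is hypothetical, the unusual record types are not enumerated here and so must be supplied by the caller, and the comparison is case-insensitive on the assumption that the Type attribute is capitalized inconsistently like other attribute values.

```python
# Filing types that correspond to the LD-1 form, per the rule above.
LD1_TYPES = {"registration", "registration amendment"}

def form_for_filing_type(filing_type, unusual_types=frozenset()):
    """Map a Filing Type value to "LD-1", "LD-2", or None if unusual.

    unusual_types is a caller-supplied set of lowercase type strings
    that belong to neither form.
    """
    # Collapse whitespace and ignore case before comparing.
    normalized = " ".join(filing_type.lower().split())
    if normalized in unusual_types:
        return None
    if normalized in LD1_TYPES:
        return "LD-1"
    return "LD-2"
```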

Since the database is almost entirely composed of information provided on the LD-1 and LD-2 forms, the forms themselves are very useful for interpreting the meaning of the records. The following sections describe the types of records found in the XML documents, based on the fields in the LD-1 and LD-2 forms. See Section 3 of the Lobbying Disclosure Act Guidance document for terminology definitions.

Registration records

Registration records are filed by a registrant (i.e., lobbying organization) whenever lobbying activity above certain thresholds is initiated on behalf of a client. A registration record marks the beginning of an active relationship between a client and a lobbying organization. A registration record always includes the following information:

  • Effective date (given on the form as MM/DD/YYYY but only reported as YYYY in the XML documents).
  • Registrant name, address, principal place of business, and an individual contact's name. (Oddly, the contact is affiliated with the registrant, but given as an attribute of the Client element in the XML document.)
  • Client name, address and principal place of business.
  • The name of individual lobbyists who are acting, or are expected to act, as a lobbyist for the client. Additionally, if the lobbyist meets the requirements of having served as a "covered executive branch official" or "covered legislative branch official," the name of the position served is also provided.
  • The general issue areas expected to be lobbied on behalf of the client. The areas are chosen from a drop-down list provided in the LD-1 form.

A registration record may include the following additional information:

  • Description of the registrant's business.
  • Description of the client's business.
  • More specific lobbying issues expected to be lobbied on behalf of the client. These descriptions are provided by the registrant.
  • The names, addresses and primary places of business of any organizations affiliated with the client that contribute to the lobbying activities.
  • The names, addresses, primary places of business, contributions and percentage ownership of the client, of any foreign entities that have part ownership of the client, participate in or fund the lobbying activities of the client or any affiliated organizations, or otherwise contribute to the lobbying activities.

Note that registration records never include any of the following fields:

  • Expenses/income.
  • Termination of client relationships.
  • Termination of individual lobbyists.
  • Termination of affiliated organizations.
  • Termination of foreign entity involvement.

Report records

Coming soon.

Amendment records

Coming soon.

Unknown records

Coming soon.

Interpreting the data

Parsing and making sense of the lobbyist database poses several problems, some structural and some systematic. Some of these problems can be easily fixed (e.g., by simple conversion rules), but others do not have obvious solutions.

Subsidiaries

Companies with separate business units or subsidiaries often hire or employ lobbyists who act on behalf of the smaller units. For example, General Electric and several of its subsidiaries (e.g., General Electric Energy) are classified as lobbying clients in the database. Note that the Center for Responsive Politics accounts for the lobbying activities of smaller business units in those of the parent company (see here for details). However, the Senate LDA database doesn't include any information that can be used to determine corporate relationships, so that data must be obtained from another source.

Classifications

An organization may be classified in the database as one or more of the following:

  • Registrant (i.e., lobbying agency)
  • Client
  • Member of an association, coalition, foundation, etc.

For example, AT&T and its subsidiaries appear in the 2008 2nd-quarter database under all three of these classifications.

When a client organization is an association or coalition of multiple independent organizations, its members are listed as affiliated organizations. For example, the American Beverage Association is a client of several lobbying organizations, in addition to employing several lobbyists itself. Its members (i.e., affiliated organizations) include subsidiaries of Pepsi and Coca-Cola, among others. It's not clear how to account for the contributions made in these filings; in the absence of additional information, apportioning an equal amount to each member organization is probably the best approach.
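A minimal sketch of the equal-apportionment idea follows. The function name is hypothetical, and exact fractions are used only so that the shares always sum back to the filed amount; a real importer might prefer to round to whole dollars.

```python
from fractions import Fraction

def apportion_equally(amount, members):
    """Split a filing's integer dollar amount evenly across members."""
    if not members:
        raise ValueError("no member organizations to apportion to")
    share = Fraction(amount, len(members))
    return {member: share for member in members}
```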

Inconsistent names

Numerous organizations appear multiple times but with slightly different names. For example, the General Electric parent company appears in the 2008 2nd-quarter database as:

  • General Electric Company
  • General Electric Corporation
  • General Electric Co
  • General Electric
  • General Electric Company (including subsidiaries)

Some of its subsidiaries are even harder to pin down; e.g., in the same document, NBC Universal is identified as:

  • NBC Universal
  • NBC/Universal
  • GE / NBC Universal
  • NBC-Universal
  • NBC Universal, Inc.

Note that government entities (e.g., Senate, House of Representatives) appear to be named consistently.
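One partial remedy is to compare organizations by a normalized key rather than by the raw name. The sketch below is an assumption about what such a key might look like, not an established Watchdog rule: it drops parenthetical remarks, punctuation, case, and a short, hand-picked list of trailing corporate suffixes.

```python
import re

# Hand-picked suffixes; extend as new variants turn up in the data.
_SUFFIXES = {"company", "corporation", "corp", "co", "inc", "incorporated", "llc", "ltd"}

def name_key(name):
    """Return a rough comparison key for an organization name."""
    name = re.sub(r"\([^)]*\)", " ", name)            # drop parentheticals
    name = re.sub(r"[^a-z0-9]+", " ", name.lower())   # punctuation to spaces
    words = name.split()
    while words and words[-1] in _SUFFIXES:           # strip trailing suffixes
        words.pop()
    return " ".join(words)
```

With this key, all five General Electric variants listed above collapse to "general electric", and four of the five NBC Universal variants collapse to "nbc universal". "GE / NBC Universal" still comes out differently ("ge nbc universal"), which illustrates that simple conversion rules only go so far.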

Inconsistent formatting of individuals' names

The names of individuals are formatted differently depending on the attribute in which they appear. Client contact names are always First Middle Last (sometimes with a period after an initial), but lobbyist names are always Last, First Middle and never have periods after initials.
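A simple conversion rule can bring both formats into the same shape. The sketch below (function name hypothetical) rewrites "Last, First Middle" as "First Middle Last" and drops periods after single-letter initials. It assumes at most one comma per name, so names with a comma-separated suffix would need extra handling.

```python
import re

def normalize_person_name(name):
    """Return a name as 'First Middle Last' without periods after initials."""
    name = name.strip()
    if "," in name:
        # "Last, First Middle" -> "First Middle Last"
        last, _, rest = name.partition(",")
        name = f"{rest.strip()} {last.strip()}"
    # Drop a period that follows a single-letter initial, e.g. "Q." -> "Q".
    name = re.sub(r"\b([A-Za-z])\.", r"\1", name)
    return " ".join(name.split())
```

Case is left untouched here; comparisons between the two sources should also ignore case.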

Inconsistent capitalization and case

The case and capitalization of attribute values vary wildly throughout the database.

Contact names

Every client element in the XML document has a ContactFullname attribute. However, the contact is affiliated with the registrant, not the client. (See the LD-1 form for confirmation.)
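A parser can correct for this by reading the contact from the client element and attaching it to the registrant. The sketch below uses a hand-written sample rather than real Senate data; the Filing and Client element names and the ContactFullname attribute are taken from the description above, while the empty Registrant element and the exact nesting are assumptions.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; not copied from a Senate document.
SAMPLE = (
    '<Filing Type="REGISTRATION">'
    "<Registrant/>"
    '<Client ContactFullname="Jane Q. Public"/>'
    "</Filing>"
)

def registrant_contact(filing):
    """Return the registrant's contact name, which is stored on Client."""
    client = filing.find("Client")
    if client is None:
        return None
    return client.get("ContactFullname")

filing = ET.fromstring(SAMPLE)
print(registrant_contact(filing))
```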

Client ID namespace

Each client element in the database includes a client ID attribute. The client IDs appear to be valid only in the context of a particular client name, for the purpose of identifying different contacts at the client organization. (It should probably be called a contact ID.) For example, client ID 3390 appears in 12 client records in the 2008 2nd-quarter database, but there are 10 different clients identified in those 12 records.

The consequence is that client IDs can't be used to solve the inconsistent client name problem described above.
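A check along these lines can confirm the observation for any given quarter. It is only a sketch: it takes (client ID, client name) pairs that the caller has already extracted from the XML, and the function name is hypothetical.

```python
from collections import defaultdict

def names_by_client_id(clients):
    """Return {client_id: set of names} for IDs used with several names.

    clients is an iterable of (client_id, client_name) pairs.
    """
    names = defaultdict(set)
    for client_id, client_name in clients:
        names[client_id].add(client_name)
    return {cid: found for cid, found in names.items() if len(found) > 1}
```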

Registration dates

The LD-1 registration form requires registrants to give the effective date of registration as MM/DD/YYYY, but registration records in the database only provide the effective year. The period is always recorded as undetermined.

Typos

The database contains some significant typos. For example, at least 6 records in the 2008 2nd-quarter database refer to H-1B visas for foreign workers in specialty occupations as either "HB1" or "HB-1" visas. A researcher using the database to find organizations that have lobbied on this issue might miss important filings.
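Searches can be made more tolerant of this kind of typo. As a sketch, the pattern below matches the correct spelling along with the misspellings mentioned above; the pattern and function name are illustrative, and other issues would need their own patterns.

```python
import re

# Matches H-1B, H1B, H-1-B, HB1 and HB-1, ignoring case.
H1B_PATTERN = re.compile(r"\b(?:H-?1-?B|HB-?1)\b", re.IGNORECASE)

def mentions_h1b(text):
    """Return True if the text appears to refer to H-1B visas."""
    return H1B_PATTERN.search(text) is not None
```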

Incomplete (or "less complete") records

Filing records always include the registrant (lobbying agency). A filing record may also include one or more of the following:

  • The client on whose behalf the lobbyist is acting.
  • One or more individual lobbyists (persons).
  • One or more governmental bodies or offices that were lobbied.
  • One or more foreign entities with which the client is associated (controlled/owned by, representing the interests of, etc.).
  • A list of member organizations, if the client is an association, coalition, etc.
  • One or more issues targeted by the lobbying efforts. Sometimes these issues are quite specific (e.g., H.R. 3161, Agriculture Appropriations Bill), otherwise they are more general (e.g., Indian/Native American affairs).

When present, all of these fields contain at least some potentially interesting information. (Also see Uninteresting attributes.)

Most, but not all, filing records include client information. Presumably the records that do not are registration records, but this hasn't been verified yet. In any case, such records aren't particularly useful, and can probably be ignored, in which case we can assume that all filing records in the Watchdog database include both a registrant and a client. The other fields should be considered optional and/or required only for certain clients (e.g., foreign entities in the case of foreign ownership or control).

Amendment records

Lobbying agencies can and often do amend previous filings with new filings. Unfortunately, the filing ID number of an amendment record is not the same as the record it's meant to amend; nor does an amendment record include the filing ID of, or otherwise refer unambiguously to, the original filing. As far as I can tell, the only way to determine which record is to be amended is to match the year, period (3- or 6-month granularity), registrant ID, client name and client ID of the amendment record to an existing database record. The existing record should then be replaced, in its entirety, by the amendment record; but also see Open questions below.

Unfortunately, in the 2008 2nd-quarter database I have encountered at least one amendment record with no apparent matching original. In such cases, the amendment record should probably be treated as an original record and simply inserted into the database.

Amendment records themselves can be (and frequently are) amended.
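The matching scheme described above can be sketched as follows. Records are represented as plain dictionaries, and the field names ("year", "period", "registrant_id", "client_name", "client_id") are hypothetical stand-ins for whatever the parser produces.

```python
# Fields used to pair an amendment with the record it amends.
MATCH_FIELDS = ("year", "period", "registrant_id", "client_name", "client_id")

def match_key(record):
    """Build the tuple used to match amendments to existing records."""
    return tuple(record.get(field) for field in MATCH_FIELDS)

def find_originals(amendment, existing_records):
    """Return every existing record that the amendment could replace."""
    key = match_key(amendment)
    return [record for record in existing_records if match_key(record) == key]
```

An empty result means the amendment has no apparent original and can be inserted as if it were one; a single result is the record to replace; more than one result is ambiguous and is probably best flagged for manual review.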

A note on parser design: amendment records sometimes occur in the XML document before the record they amend. (Filings do not necessarily appear in the XML document in the order in which they were submitted to the Senate's record-keepers.) This fact, combined with the additional fact that amendment records can refer to records that occur in other XML documents (documents which may not even be available to the parser when the amendment record is encountered), means that we'll need a two-pass import process for a given set of XML documents: the first pass will import the non-amendment records, and the second pass will sort the amendment records by filing date prior to importing them. Since Watchdog's parsing standards recommend that a parser yield its results, I believe we'll also need a two-pass parser. (A parser that eagerly parses all records and returns a list of dictionaries could perform one pass and return two lists: one for amendment records and one for non-amendment records.)

A follow-up note: Aaron says that parsing eagerly and then sorting is fine, as long as it fits in memory.
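Following that suggestion, an eager two-pass import might look like the sketch below. It rests on several assumptions: records are dictionaries with hypothetical "type" and "received" fields, every amendment has the word "amendment" in its filing type, received dates are ISO-formatted strings (so they sort correctly as text), and the caller supplies the key function that pairs an amendment with its original.

```python
def import_filings(records, key):
    """Import originals first, then amendments in order of receipt.

    Returns a dict mapping key(record) to the surviving record.
    """
    records = list(records)  # parse eagerly; assumes everything fits in memory
    originals = [r for r in records if "amendment" not in r["type"].lower()]
    amendments = [r for r in records if "amendment" in r["type"].lower()]

    database = {}
    # First pass: non-amendment records.
    for record in originals:
        database[key(record)] = record
    # Second pass: amendments, oldest first, each replacing its predecessor.
    # An amendment with no matching original is simply inserted.
    for record in sorted(amendments, key=lambda r: r["received"]):
        database[key(record)] = record
    return database
```

Because the amendments are sorted before they are applied, it doesn't matter where they occur in the XML documents relative to the records they amend, as long as all of the documents are parsed before the import starts.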

Uninteresting attributes

The following attributes in the Senate LD-1/LD-2 XML documents can probably be ignored for the purposes of the Watchdog Project.

  • Filing ID - Appears to be a hash of some sort, probably used for Senate/House internal bookkeeping. This attribute would be useful if amendment records made reference to the original record's ID, but they don't.

  • Filing received date - This field is important to the parser and importer for the purpose of sorting amendment records, but it doesn't need to be stored in the Watchdog database.

  • Filing period - This field is redundant. The effective period can either be determined by the filing type field (e.g., "year-end report"), or the period is indeterminate, because for all record types where the filing type field does not specify the period (e.g., "registration"), the filing period field has the value undetermined.

  • Filing type - See the discussion above about record types. In a nutshell, this field is important to the parser and importer, but doesn't need to be stored explicitly in the Watchdog database. The filing type includes the quarter/half-year to which the record applies, so this part of the attribute will be stored in the database in some form.

  • Registrant address - We probably don't need to track geographic location at this granularity. The registrant elements include the registrant's country and the country where its primary place of business is located, and that's probably good enough.

  • Client status - According to the status key on the Senate LDA reports download page, the client's status can be active, terminated, administratively terminated or undetermined. However, the value of this attribute is frequently at odds with the record type (e.g., set to terminated in a quarterly report that is not a termination record, when the registrant and client are still doing business together). Client/lobbyist relationships are demarcated by registration and termination records, so I think we can safely ignore the client status attribute, in any case.

Open questions

Non-compliance

The Senate page mentions "[t]he Secretary of the Senate has referred an aggregate of 3,883 cases of potential non-compliance to the U.S. Attorney for the District of Columbia."

It'd be good to know what those cases are and whether anything has come of them.

Tracking or flagging these cases is outside the scope of the Senate database. There doesn't appear to be a separate online database of potential non-compliance cases, either. However, if the Watchdog project is content to wait until such cases are resolved, presumably any filing changes required by the resolution will appear in the current quarterly XML document as an amendment record; or as a new, but back-dated, record. The importer could be designed to flag or report records that exceed an arbitrary threshold: amendment records whose amount exceeds the original by greater than $100,000, or back-dated records older than 2 years, for instance.
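The two example thresholds could be expressed as simple checks in the importer. This is a sketch only: the limits are the arbitrary values suggested above, the function names are hypothetical, and "2 years" is approximated as 730 days.

```python
from datetime import date

AMOUNT_DELTA_LIMIT = 100_000      # dollars; arbitrary, per the text above
BACKDATE_LIMIT_DAYS = 2 * 365     # roughly 2 years

def exceeds_amount_delta(original_amount, amended_amount):
    """True if the amendment raises the amount by more than the limit."""
    return amended_amount - original_amount > AMOUNT_DELTA_LIMIT

def is_stale_backdated(period_end, received):
    """True if a record was received more than ~2 years after its period."""
    return (received - period_end).days > BACKDATE_LIMIT_DAYS
```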

Updates to past quarterly databases

The download page provides a "Last updated" field for each downloadable quarterly database. The databases date back to 1999, yet none of the downloads has a "Last updated" field earlier than Feb. 2008. It's not yet clear why or in what manner past quarters are updated. Filing mistakes made by lobbying agencies should be reported according to the filing date of the amendment record, not by modifying previously-filed records. Some of the updates may be due to schema changes, but there are a few anomalous quarters for which this theory doesn't provide a reasonable explanation (e.g., Q3 1999, updated 2008-06-09; all other quarters in 1999 were updated 2008-02-05). Another possibility is that clerical errors, i.e., errors due to the Senate's records department and not a lobbyist, are corrected by modifying a quarterly database.

No previous quarters' databases have been modified since I noticed this problem, so I don't have any files to A/B compare yet, but I'll keep an eye out.

Ambiguous amendment records

Because amendment records in the LD-1/LD-2 database don't refer directly to the record they amend, nor do they indicate which fields are corrected, amendment records can be ambiguous. For example, when looking for amended records using the ad-hoc scheme described above in Amendment records, attempting to match both the client name and the client ID might be overconstraining the problem. It's conceivable that an agency could file an amendment record that corrects a mistake in either of these fields, but the amendment record in the database doesn't indicate which.

Some fields on the LD-1 and LD-2 forms have limited space for enumerating items, e.g., the affiliates section. When a filing exceeds the space allotted for these fields, the filing guidelines recommend filing amendment(s) that enumerate the remaining items. (See here, look for "LD-1 Changes" and "LD-2 Changes".) It's not clear how to distinguish amendment records which append information from those which replace information.

Perhaps the safest approach is to flag amendment records that are ambiguous, e.g., those that match more than one existing record, or those that enumerate affiliates. Those records can be handled manually, or can be used to tune the import process.

Resolved questions

Q: Can we distinguish between expenses/income below threshold and absence of expenses/income?

Lobbying organizations are required to file quarterly reports for their clients, even if there was no lobbying activity during that quarter, and even if income from the client (or expenses, if the organization employs in-house lobbyists) was below the reporting threshold. It might be useful to tell the difference between no payment and payments below the threshold.

A: No, it doesn't appear to be possible to distinguish these cases.

It's a systematic issue. The LD-2 quarterly reporting form requires the submitter to state income as either "below threshold," in which case no amount is given, or "above threshold," in which case the income or expenses are reported, according to the rounding rules. There is no third option for "no income or expenses," hence there's no way to distinguish this scenario from the sub-threshold case.

Some records in the Senate XML database report an amount of $0, and others elide the amount field entirely. Both types of records should be considered to report the same amount, namely, "below threshold."
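In a parser, this rule amounts to mapping both cases to a single value. The sketch below uses None to stand for "below threshold"; the function name is hypothetical, and it assumes the raw amount is either missing or a plain numeric string.

```python
def parse_amount(raw):
    """Return the reported amount as an int, or None for 'below threshold'.

    A missing amount and an amount of 0 are treated identically.
    """
    if raw is None or str(raw).strip() == "":
        return None
    amount = int(float(raw))
    return amount if amount > 0 else None
```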

Note that the LD-2 form also has a "No Lobbying Issue Activity" checkbox, and if this box is selected, it appears in the Filing Type attribute as "no activity." However, lack of activity and income/expense are orthogonal: a client may pay a lobbying firm a retainer fee, for example, even if there is no active lobbying during that quarter; so the presence of this field in a record is no help for distinguishing zero expense/income from sub-threshold, either.

Q: Is the LD-1/LD-2 database comprehensive?

The Center for Responsive Politics notes that there are three different filing methods for lobbying organizations:

There are three different filing methods. Two options are largely identical (one for for-profit groups, the other for non-profits) and use a definition of lobbying provided by the Internal Revenue Code (IRC). The third follows the definition of lobbying contained in the Lobbying Disclosure Act of 1995 (LDA).

The download page provided by the Senate specifically mentions only the LDA reports, not the IRC reports. It's not clear whether the database includes IRC filings.

A: Yes.

The House LDA Guidelines state that organizations that use the IRC reporting method must still file quarterly LD-2 forms, and must use the amount reported to the IRS as the amount on the LD-2 form. Note that the accounting methods for lobbying expenses in the IRC method are somewhat different than those for the LDA method, but this confirms that the LD-1/LD-2 database does include all legally reported lobbying expenses and income. (Source, see "Organizations Reporting Expenses under Section 15 (Optional IRC Reporting Methods).")

The LD-203 database

James Dondero

No analysis yet.

changed May 29, 2015