Nancy Potok is the chief statistician of the United States. I interviewed her last month about her role, and the challenges faced by the U.S. national statistics system. The interview has been lightly edited for style and flow.
Q: How does the U.S. statistics system work, and what do you do as chief statistician?
A: The U.S. has a decentralized statistical system. There is not one national statistical office; rather, we have 13 designated statistical agencies that are embedded in different departments. These include the Census Bureau and the Bureau of Economic Analysis in the Commerce Department, and the Bureau of Labor Statistics in the Labor Department. That means a lot of coordination is required. In 1995, a reauthorization of the Paperwork Reduction Act of 1980 was passed. This act requires the U.S. Office of Management and Budget (OMB) to coordinate federal information policy, and it created the position of chief statistician within the OMB. The law lays out duties and responsibilities, including ensuring the integrity, objectivity, impartiality, utility and confidentiality of information collected for statistical purposes. It also created the Interagency Council on Statistical Policy, which is made up of the heads of the designated statistical agencies.
So the role of the chief statistician is really derived from the duties laid out in the act. I've mentioned the coordination of the federal statistical system. The chief statistician is also responsible for generating governmentwide data collection standards. In addition, the chief statistician issues methodological guidance and promotes innovation. There are several guidance memos, and a detailed statistical directive that tells federal agencies that if they're going to conduct a random sample survey, for example, there are specific quality standards and methods they need to use to ensure that the results are rigorous and high quality. Finally, the chief statistician also represents the U.S. internationally: I head the U.S. delegations to the United Nations Statistical Commission and to the Committee on Statistics and Statistical Policy of the Organization for Economic Cooperation and Development (OECD).
Q: So the U.S. has a decentralized system, where you coordinate across a variety of different departments and agencies.
A: Yes, and there are other mechanisms, too. As I mentioned, we have the Interagency Council on Statistical Policy. There is also the Federal Committee on Statistical Methodology. The chief statistician sets up this committee, which brings together the most qualified experts from across the federal government on various aspects of statistical methodology. There is, for example, a subcommittee on disclosure avoidance. Other members research the latest advances in survey methodology. As times change, the research changes, and so do the backgrounds and qualifications of some of the people on the committee. They are now looking, for example, at combining data from administrative records, surveys and commercial data to improve the quality of statistical data products and releases. We're always bringing new people onto the committee who are experts in developing technical areas. The committee is one of the primary ways we coordinate the research happening throughout the federal statistical community.
Q: There are a lot of people thinking about how to use less organized data than the traditional statistical series that the U.S. government and other actors have collected for decades. How does that affect your role and that of the federal statistical system, and does that provide new opportunities?
A: Opportunities is an excellent way of characterizing this. I'm really excited about having the job of chief statistician because, as the person who is coordinating the federal statistical agencies, I have an opportunity not only to improve collaboration among agencies, but also to bring in international expertise. Other countries have been using these administrative data sources for a longer period of time. In particular, some European countries have been combining their administrative records with survey data, and have done a lot of work on how to assess the data and what the appropriate uses are. We also have very important private sector and academic partners. There are a lot of people in academia in data science, statistics, economics and other disciplines who have been doing exciting research alongside the federal statistical agencies on how to address the questions of quality that you have to think about if you move beyond a random sample survey or federal statistical data. Many of these advances have become possible thanks to greatly increased scientific computing power, which allows high-quality data linkages.
The Federal Committee on Statistical Methodology and the Interagency Council on Statistical Policy have set a priority goal of energizing the federal statistical community to figure out how to reduce the burden on the public of responding to surveys, which people are increasingly unwilling to do, while still keeping the gold standard of quality for the data we release. We are putting in place many of the things recommended in the recent National Academy of Sciences report. Some of this involves researching whether commercial data sources could be used in lieu of collecting data from the public. Or there may be data that the government already has in administrative records that can be reused, instead of spending money and putting a burden on people to collect that information again in a survey.
The key, though, is quality, if we are going to use a lot more data from new combined sources. There are really sophisticated ways of talking about the quality of statistical data from a survey. We can talk about variance, we can talk about standard deviation, and we can say a lot about measurement error. If you start combining survey data with programmatic data, it becomes challenging to talk in a standard way about the quality of the records you are using. Some countries have been doing this for a while, so we're trying to learn from what they've done and adapt it to what is appropriate for the United States.
Q: That National Academy report talks about how data have to look trustworthy to a wide variety of users. How do you do this with these new data sources, which may require more sophisticated measures to ensure that the inferences you draw from them are good?
A: I don't think it's any more challenging than explaining statistical methodology to the general public already is. When you take a deep technical dive into how statistical data are produced, I don't think most people who are not statisticians understand it now. Even if the techniques and methodologies become more sophisticated, most people trust the brand of the federal statistical agencies. I see it more as a continuation of the current challenge. Nevertheless, we need to provide the transparency that will earn the public's trust.
Q: Are there other benefits that accrue from the new kinds of data that are becoming available?
A: I think so. The Census Bureau is doing research now to address a big problem: a lot of businesses do not like responding to its survey about their retail sales. Small businesses complain about the burden from the federal government and ask why they have to report their retail sales to the Census Bureau every month. But it is very important to know what those retail sales are if you are going to put out an accurate economic indicator for retail sales. The same is true for other indicators. As a research project, the Census Bureau has been buying aggregated credit card data from a commercial source. The data don't identify individuals, but they show how much was spent on a particular kind of commodity at a particular store. You can get this daily from the aggregator. So what are the issues with using these data, as opposed to having businesses break down the commodities they sell every month, which is a painful but necessary burden for them?
The benefit is that you can get the data a lot faster. There are users like the Federal Reserve, who want the data quickly to make monetary policy. You also get it at much lower levels of geography: with traditional data you are getting it from a survey at the national level, which means you can't say with much accuracy what retail sales look like in New York or Atlanta or Ames, Iowa. The question is whether the aggregated credit card data are accurate. These data cover card sales but not cash sales, so you have to know what you might be missing and figure out how to fill that in from another source. Also, can you depend on the data always being available from the commercial source? You know that the Census Bureau is going to be around five years from now, but if you are buying the data from a company, and the company is acquired by another company, your source could disappear or the price could go up significantly. You need some kind of risk mitigation plan: if the source dries up, are there other sources of the same high quality?
That said, the research that the Census Bureau is doing compares the monthly retail sales survey data to the aggregated credit card data to see if they track. The numbers are not identical, but they track closely. There are a few commodities that are seldom purchased with credit cards, like automobiles. You then have to use a methodology to figure out how to get automobiles into retail sales, because they're a very important commodity, especially in months when people buy a lot of cars. Those are the kinds of gaps you have to account for in data that you buy. There are factors you have to consider before you move to a new source of data, so the research is going to take a little while: you want to make sure the new source is stable and high quality before you switch, especially because switching will create a break in the longitudinal data series.
Finally, the key thing about federal statistical data is that they describe groups. They don't describe individuals, and they are never used for purposes such as law enforcement or regulation. The confidentiality and privacy provisions for federal statistical data are extremely strict. When we combine them with data from other sources, people need to understand that their confidentiality and privacy will be protected. That is, when people have given their information to the government for a particular purpose, such as applying for a benefit program, and it is reused for statistical purposes, it will be de-identified and aggregated. Their personal information will get the same protection as a response to a Census Bureau survey. This is an important point to keep in mind when we think about these other sources of data.
This article is one in a series supported by the MacArthur Foundation Research Network on Opening Governance that seeks to work collaboratively to increase our understanding of how to design more effective and legitimate democratic institutions using new technologies and methods. Neither the MacArthur Foundation nor the Network is responsible for the article's specific content.