23 Do you need to design an API?
An important question when opening up databases is whether to design an API (application programming interface) to serve the information on the Web. For the purposes of this guide, an API can be summarized as a layer of interaction between a database and an application that consumes its data. The API gives interested developers and entrepreneurs a standard set of Web calls for extracting data from a given database. Designing an API requires considerable technical expertise and, if the database is going to be public, standards must be defined somewhat arbitrarily, by trying to predict the cases in which developers and entrepreneurs will need the data. An API offers a number of advantages, such as easier and faster access to databases. Instead of downloading the entire database, programmers only need to make a simple Web call to extract the section that interests them at that moment. An API also facilitates real-time access to specific parts of the database, which allows the development of applications that rely on rapidly updated data.
An API can be private, when a developer who controls the database creates the API to facilitate access to the data, or public, when the custodian of a database designs an API to serve a community of developers and entrepreneurs, trying to foresee which kinds of database calls will be useful and generic enough to support the greatest possible number of applications. Services such as Facebook and Twitter have public APIs that allow programmers from around the world to interact, to a limited extent, with the immense amount of data these services hold.
Despite these advantages, designing an API within the government can lead to difficult situations, depending on the case. It is worth thinking carefully about whether an API is the best way to go, since there are alternatives that may better suit both the developers interested in government data and the teams of public employees or contractors hired by the state who would have to keep the APIs working stably and reliably.
A hypothetical case
Imagine that the Department of Logistics and Transport of the State of São Paulo designed a public API so that any developer could access information about the maintenance conditions of São Paulo roads. One day the API server was flooded with requests and the database server crashed. State services that depended on this database stopped working. The logs showed a sharp increase in traffic between eight and nine o'clock in the morning, with a large number of API calls coming from many different places. After nine o'clock, the server load decreased and everything returned to normal.
What happened?
Continuing with the fictional scenario: the year before, the Department of Logistics and Transport had started to make its data available as part of the state's transparency policy. The department was in a rush and, with reduced staff, decided to create an API for the road data by setting up an Internet-facing API server. The API design took into consideration the potential use cases that application developers might have, but it was hard to know what people wanted. The staff of the department settled on three generic API calls.
A year later, when the servers failed, the department learned that an entrepreneur had developed a very successful mobile app used by several hundred thousand people. Every morning before its users left for work, the app showed the maintenance status of the roads in São Paulo. To download this data, each copy of the app installed on a mobile device had to make two API calls. With so many devices calling at roughly the same time, the department's servers crashed, because the infrastructure was not designed to cope with the load.
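To get a feel for the load in this fictional scenario, a back-of-the-envelope estimate helps. The sketch below assumes 300,000 users (the text only says "several hundred thousand", so this figure is purely illustrative), two API calls per user, and all calls spread evenly over the one-hour window seen in the logs. The function name is hypothetical.

```python
def average_requests_per_second(users: int, calls_per_user: int, window_seconds: int) -> float:
    """Average request rate if every user makes all their calls within the window."""
    return users * calls_per_user / window_seconds


# Assumed figures: 300,000 users is an illustrative stand-in for
# "several hundred thousand"; two calls per user and a one-hour
# window (8:00 to 9:00) come from the scenario.
rate = average_requests_per_second(users=300_000, calls_per_user=2, window_seconds=3600)
print(f"{rate:.0f} requests per second on average")
```

Under these assumptions the average is roughly 167 requests per second. Real traffic is rarely spread evenly, so short peaks within the hour would likely be higher than this average.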
Alternative
An alternative to the model presented above is to publish data dumps as files. In this model, data from the database is exported and converted into an open file format, such as CSV. The files are then named consistently and stored on a web server. This means that any developer can download all the data, load it into their own system and design their own API (in this case a private one) according to their planned use of the data. Any high load then hits the developer's own servers, without affecting the operation of other government services. Another advantage is that publishing data dumps on a web server is very simple. If files and URLs are named consistently, it is easy for developers to fetch new data over time (e.g., http://exemplo.com/estradas/2015-01-30.csv).
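The export-and-name step described above could look something like the following minimal sketch. It assumes one dump per day named by ISO date, following the example URL in the text; the column names and the sample row are hypothetical, and the function names are chosen only for illustration.

```python
import csv
import io
from datetime import date

# Base URL taken from the example in the text.
BASE_URL = "http://exemplo.com/estradas"


def dump_filename(day: date) -> str:
    """Name a dump by its ISO date so names sort chronologically."""
    return f"{day.isoformat()}.csv"


def dump_url(day: date) -> str:
    """Build the public URL of the dump for a given day."""
    return f"{BASE_URL}/{dump_filename(day)}"


def export_csv(rows, fieldnames) -> str:
    """Serialize database rows (as dicts) to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data; a real export would read from the database.
rows = [{"road": "SP-280", "condition": "good"}]
csv_text = export_csv(rows, fieldnames=["road", "condition"])
print(dump_url(date(2015, 1, 30)))
```

Because the file name depends only on the date, a developer can work out the URL of any dump without asking the publisher for an index.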
Considerations
● Do you really need an API? Designing an API can become an expensive project that competes with other, higher-priority IT projects. In addition, this type of project involves deciding in advance which calls will be offered. Do you know how your consumers will use your data? Will your API help consumers use the data in the best way possible? What is your plan for coping with increased load?
● Make it easy for developers to keep a local copy of your data up-to-date. Providing consistently named data dumps makes this simple.
● Isolate internal systems from the effects of external data publishing. Make sure that load coming from the Web does not interfere with internal databases and thereby affect other government services.
● Make sure you can change your systems without breaking URLs. Developers will build apps that depend on your URLs. Do not force them to rewrite their software just because you are switching to a new platform. Signs that things can be better designed include platform-specific fragments like "aspx" or "jsp" in your URLs. Get rid of those.
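The consideration about keeping a local copy up to date can be illustrated from the developer's side. The sketch below assumes one dump per day under the dated naming scheme of the example URL; the function name is hypothetical, and actually downloading the files is left out.

```python
from datetime import date, timedelta
from typing import Iterable, List

# Base URL taken from the example in the text.
BASE_URL = "http://exemplo.com/estradas"


def missing_dump_urls(local_dates: Iterable[date], first_day: date, last_day: date) -> List[str]:
    """List URLs of daily dumps in [first_day, last_day] not yet held locally.

    Assumes exactly one dump per day, named <ISO date>.csv.
    """
    have = set(local_dates)
    urls = []
    day = first_day
    while day <= last_day:
        if day not in have:
            urls.append(f"{BASE_URL}/{day.isoformat()}.csv")
        day += timedelta(days=1)
    return urls


# Hypothetical local state: two dumps already downloaded.
already_downloaded = [date(2015, 1, 28), date(2015, 1, 30)]
to_fetch = missing_dump_urls(already_downloaded, date(2015, 1, 28), date(2015, 1, 31))
print(to_fetch)
```

With predictable names, the consumer only fetches what is missing, and the publisher's web server serves static files rather than running database queries on each request.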