32 Database formats

In the scope of this guide, a database is nothing more than a computer file built in a structured way to store information for later consultation and analysis. You can build a database manually, provided you define a structure for organizing the data and apply it consistently. This consistency ensures that queries performed on the database find what they are looking for. A database can be as simple as a text file containing, for example, a list of all the cities in the state of São Paulo, or a list of the hospitals in the city of São Paulo showing the district where each is located:

Hospital Municipal Infantil Menino Jesus, Bela Vista
Hospital do Servidor Público Municipal, Aclimação
Pronto-Socorro Municipal Barra Funda, Barra Funda
Hospital Municipal Cidade Tiradentes, Cidade Tiradentes

In this case, the structure is defined by putting two names (hospital and district) in each row of the file, separated by a delimiter, here a comma. Two hospitals never appear in the same row, for example. Roughly speaking, the integrity of a database is defined by the rules that make queries on it predictable: in the above example, every row has the name of the hospital first and the district where it is located second. If any row deviates from the model "Hospital name, District," the integrity of the database is compromised and it loses much of its usefulness, as in the second row below:

Hospital Municipal Infantil Menino Jesus, Bela Vista
Aclimação, Hospital do Servidor Público Municipal
Pronto-Socorro Municipal Barra Funda, Barra Funda
Hospital Municipal Cidade Tiradentes, Cidade Tiradentes
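The kind of integrity check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the rule that a facility name begins with "Hospital" or "Pronto-Socorro" is a heuristic chosen for this particular list, not a general test. A real database would need rules suited to its own data.

```python
EXPECTED_FIELDS = 2

# Assumption for illustration only: every facility name in this list
# starts with one of these words.
FACILITY_PREFIXES = ("Hospital", "Pronto-Socorro")


def parse_row(row):
    """Split one row into (hospital, district), stripping surrounding spaces."""
    fields = [field.strip() for field in row.split(",")]
    if len(fields) != EXPECTED_FIELDS:
        raise ValueError(
            f"expected {EXPECTED_FIELDS} fields, got {len(fields)}: {row!r}"
        )
    return fields[0], fields[1]


def find_inconsistent_rows(rows):
    """Return the 1-based numbers of rows that break 'Hospital name, District'."""
    bad = []
    for number, row in enumerate(rows, start=1):
        try:
            hospital, _district = parse_row(row)
        except ValueError:
            # Wrong number of fields: the row does not fit the model at all.
            bad.append(number)
            continue
        if not hospital.startswith(FACILITY_PREFIXES):
            # The first field does not look like a facility name.
            bad.append(number)
    return bad


rows = [
    "Hospital Municipal Infantil Menino Jesus, Bela Vista",
    "Aclimação, Hospital do Servidor Público Municipal",
    "Pronto-Socorro Municipal Barra Funda, Barra Funda",
    "Hospital Municipal Cidade Tiradentes, Cidade Tiradentes",
]

print(find_inconsistent_rows(rows))  # [2]
```

The check flags the second row because its first field is a district rather than a facility name. A query that reads the first field as the hospital name would return "Aclimação" for that row.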

In most cases, however, the right software tools can automatically create or convert structured files that serve as databases. One of the most common examples is the Excel spreadsheet, stored in computer files ending in ".xls" or ".xlsx." These documents organize data in rows and columns and enable subsequent analysis and comparison. However, native Excel files are tied to proprietary technology; such technologies often cost money and are not widely and freely available to everyone.
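As a small illustration of letting a tool build the structured file, the sketch below uses Python's standard csv module to write the hospital list from the earlier example as comma-separated text and read it back. The column names "hospital" and "district" and the function names are assumptions made for this example; the guide does not prescribe them.

```python
import csv
import io

# The hospital list from the example above, as (hospital, district) pairs.
records = [
    ("Hospital Municipal Infantil Menino Jesus", "Bela Vista"),
    ("Hospital do Servidor Público Municipal", "Aclimação"),
    ("Pronto-Socorro Municipal Barra Funda", "Barra Funda"),
    ("Hospital Municipal Cidade Tiradentes", "Cidade Tiradentes"),
]


def to_csv_text(records):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["hospital", "district"])  # hypothetical column names
    writer.writerows(records)
    return buffer.getvalue()


def from_csv_text(text):
    """Parse CSV text back into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = to_csv_text(records)
loaded = from_csv_text(csv_text)
print(loaded[1]["district"])  # Aclimação
```

Because the module writes every row in the same way, the "Hospital name, District" order is kept automatically, and the header row tells anyone opening the file what each column holds.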

The list below suggests a number of open, non-proprietary formats that best fit the open data principles presented in this guide and provides a brief introduction to each. No one format is recommended over the others. Each team should consider the formats its databases currently use (Excel files, for example) and whether they can be converted into any of the formats suggested below, depending on the application.