--- title: "Using GetDFPData to obtain annual financial reports from B3" author: "Marcelo Perlin" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Using GetDFPData to obtain financial reports from B3} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- Financial statements of companies traded at B3 (formerly Bovespa), the Brazilian stock exchange, are available in its [website](http://www.b3.com.br/). Accessing the data for a single company is straightforward. In the website one can find a simple interface for accessing this dataset. However, gathering and organizing the data for a large scale research, with many companies and many dates, is painful. Financial reports must be downloaded or copied individually and later aggregated. Changes in the accounting format thoughout time can make this process slow, unreliable and irreproducible. Package `GetDFPData` provides a R interface to all annual financial statements available in the website and more. It not only downloads the data but also organizes it in a tabular format and allows the use of inflation indexes. Users can select companies and a time period to download all available data. Several information about current companies, such as sector and available quarters are also at reach. The main purpose of the package is to make it easy to access financial statements in large scale research, facilitating the reproducibility of corporate finance studies with B3 data. The positive aspects of `GetDFDData` are: - Easy and simple R and web interface - Changes in accounting format are internally handled by the software - Access to corporate events in the FRE system such as dividend payments, changes in stock holder composition, changes in governance listings, board composition and compensation, debt composition, and a lot more! - The output data is automatically organized using tidy data principles (long format) - A cache system is employed for fast data acquisition - Completely free and open source! A white paper about the package is available at [SSRN](https://www.ssrn.com/abstract=3128252). # Installation The package is (not yet) available in CRAN (release version) and in Github (development version). You can install any of those with the following code: ```{r, eval=FALSE} # Release version in CRAN install.packages('GetDFPData') # Development version in Github devtools::install_github('msperlin/GetDFPData') ``` # Shinny interface The web interface of `GetDFPData` is available at [https://www.msperlin.com/shiny/GetDFPData/](https://www.msperlin.com/shiny/GetDFPData/). # How to use `GetDFPData` The starting point of `GetDFPData` is to find the official names of companies in B3. Function `gdfpd.search.company` serves this purpose. Given a string (text), it will search for a partial matches in companies names. As an example, let's find the _official_ name of Petrobras, one of the largest companies in Brazil: ```{r, eval=FALSE} library(GetDFPData) library(tibble) gdfpd.search.company('petrobras', cache.folder = tempdir()) ``` Its official name in Bovespa records is `PETRĂ“LEO BRASILEIRO S.A. - PETROBRAS`. Data for quarterly and annual statements are available from 1998 to 2017. The situation of the company, active or canceled, is also given. This helps verifying the availability of data. The content of all available financial statements can be accessed with function `gdfpd.get.info.companies`. It will read and parse a .csv file from my [github repository](https://github.com/msperlin/GetITRData_auxiliary). This will be periodically updated for new information. Let's try it out: ```{r, eval=FALSE} df.info <- gdfpd.get.info.companies(type.data = 'companies', cache.folder = tempdir()) glimpse(df.info) ``` This file includes several information that are gathered from Bovespa: names of companies, official numeric ids, listing segment, sectors, traded tickers and, most importantly, the available dates. The resulting dataframe can be used to filter and gather information for large scale research such as downloading financial data for a specific sector. ## Downloading financial information for ONE company All you need to download financial data with `GetDFPData` are the official names of companies, which can be found with `gdfpd.search.company`, the desired starting and ending dates and the type of financial information (individual or consolidated). Let's try it for PETROBRAS: ```{r, eval=FALSE} name.companies <- 'PETRĂ“LEO BRASILEIRO S.A. - PETROBRAS' first.date <- '2004-01-01' last.date <- '2006-01-01' df.reports <- gdfpd.GetDFPData(name.companies = name.companies, first.date = first.date, last.date = last.date, cache.folder = tempdir()) ``` The resulting object is a `tibble`, a data.frame type of object that allows for list columns. Let's have a look in its content: ```{r, eval=FALSE} glimpse(df.reports) ``` Object `df.reports` only has one row since we only asked for data of one company. The number of rows increases with the number of companies, as we will soon learn with the next example. All financial statements for the different years are available within `df.reports`. For example, the assets statements for all desired years of PETROBRAS are: ```{r, eval=FALSE} df.income.long <- df.reports$fr.income[[1]] glimpse(df.income.long) ``` The resulting dataframe is in the long format, ready for processing. In the long format, financial statements of different years are stacked. In the wide format, we have the year as columns of the table. If you want the wide format, which is the most common way that financial reports are presented, you can use function `gdfpd.convert.to.wide`. See an example next: ```{r, eval=FALSE} df.income.wide <- gdfpd.convert.to.wide(df.income.long) knitr::kable(df.income.wide ) ``` ## Exporting financial data The package includes function `gdfpd.export.DFP.data` for exporting the financial data to an Excel or zipped csv files. See next: ```{r, eval=FALSE} my.basename <- 'MyExcelData' my.format <- 'csv' # only supported so far gdfpd.export.DFP.data(df.reports = df.reports, base.file.name = my.basename, type.export = my.format) ``` The resulting Excel file contains all data available in `df.reports`.