Public CNPJ Data
The Brazilian Federal Revenue Service provides databases with public data from the national registry of legal entities (CNPJ).
In general, they contain the same information that we can see on the CNPJ card, when we make an individual consultation, plus other data from Simples Nacional, partners, etc. Very rich analyzes can come out of this data, from economic, market to investigations.
This repository contains an ETL process to i) download the files; ii) unzip; iii) read, treat and iv) insert into a PostgreSQL relational database.
Infrastructure required:
-
Python 3.8 - libraries:
- wget
- pandas
- ftplib
- datetime
- gzip
- urllib
- bs4
-re - you
- zipfile
- sqlalchemy
- psycopg2
- team
-
Database:
How to use:
-
With Postgre installed, start the server instance (may be local) and create the database as the
database.sql
file. -
Depending on your environment, replace the variables below in the
ETL_collect_data_and_write_BD.py
file:output_files
: destination directory for downloading filesuser
: database user created by thedatabase.sql
filepassw
: DB user passwordhost
: DB connection hostport
: DB connection portdatabase
: name of the database in the instance (Data_RFB
- according to thedatabase.sql
file)
-
Execute the file
ETL_collect_data_and_write_BD.py
and wait for the process to finish.- Files are large: depending on the infrastructure this might take many hours to complete.
- Files from 05/08/2021:
4.68 GB
compressed and17.1 GB
uncompressed.
Generated tables:
-
For more information, see the layout.
company
: company registration data at the matrix levelestablishment
: analytical data of the company by unit / establishment (phone numbers, address, branch, etc.)partners
: registration data of company partnerssimple
: MEI and Simple National datacnae
: code and description of CNAEsquals
: qualification table of individuals - partners, responsible and legal representative.natju
: table of legal natures - code and description.moti
: table of reasons for the registration situation - code and description.country
: country table - code and description.munic
: table of municipalities - code and description.
-
Due to the volume of data, the
company
,establishment
,partners
andsimple
tables have indexes for thecnpj_basico
column, which is the main linking key between them.
GitHub
https://github.com/aphonsoar/Receita_Federal_do_Brasil_-_Dados_Publicos_CNPJ