Created a COVID data pipeline using PySpark and MySQL that collects a data stream from an API

Nov 16, 2021

Covid-datapipeline-using-pyspark-and-mysql

Created a COVID data pipeline using PySpark and MySQL that collects a data stream from an API, performs some processing, and stores the results in a MySQL database.

Tools used: PySpark, MySQL

Procedure

Fetch the latest data from the API using Python's requests and pandas modules.
Apply data processing and filtering to generate summarized information.
Store the summarized information in a MySQL database.

To build the above pipeline, I used PySpark.
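The three procedure steps can be sketched in plain Python as follows. This is a dependency-free illustration under stated assumptions, not the repo's code: it swaps in the standard library's urllib and json for requests and pandas, does the filtering and aggregation in pure Python rather than PySpark, and assumes a hypothetical endpoint that returns a JSON list of records with country, confirmed and deaths fields. The API_URL value and the covid_summary table name are likewise hypothetical.

```python
import json
import urllib.request
from collections import defaultdict

# Hypothetical endpoint: the post does not name the API it reads from.
API_URL = "https://example.com/covid/latest.json"


def fetch_records(url=API_URL, timeout=10):
    """Step 1: download the latest data and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(records):
    """Step 2: drop incomplete rows and total cases and deaths per country."""
    totals = defaultdict(lambda: {"confirmed": 0, "deaths": 0})
    for rec in records:
        # Filtering: skip records that lack a country or a case count.
        if not rec.get("country") or rec.get("confirmed") is None:
            continue
        totals[rec["country"]]["confirmed"] += int(rec["confirmed"])
        # A missing or null death count is treated as zero.
        totals[rec["country"]]["deaths"] += int(rec.get("deaths") or 0)
    # Rows are sorted by country so the output order is deterministic.
    return [(c, t["confirmed"], t["deaths"]) for c, t in sorted(totals.items())]


# Step 3: a parameterized INSERT for the hypothetical covid_summary table.
INSERT_SQL = (
    "INSERT INTO covid_summary (country, confirmed, deaths) VALUES (%s, %s, %s)"
)
```

With a DB-API driver such as mysql-connector-python, the rows would be stored with `cursor.executemany(INSERT_SQL, rows)` followed by `connection.commit()`. The `%s` placeholders let the driver escape the values instead of formatting them into the SQL string by hand.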

{IMPORTANT}

Before moving on to the execution part, please read the notes below:

Use the correct connector and driver name when connecting to the MySQL database. If you are going to use a different database, the procedure may differ.
Change the login credentials (username and password) in covid-config.json.
Make sure the specified database and table have already been created.
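The connection notes above can be sketched as follows, assuming covid-config.json holds keys named host, port, database, table, username and password. The post does not show the file's layout, so these key names, the helper function names, and the covid_summary table definition are all hypothetical. The sketch builds a MySQL JDBC URL plus the connection properties that PySpark's `DataFrameWriter.jdbc` accepts, using `com.mysql.cj.jdbc.Driver`, the driver class of MySQL Connector/J 8.x (the older 5.x connector uses `com.mysql.jdbc.Driver`).

```python
import json
from pathlib import Path

# Driver class for MySQL Connector/J 8.x; Connector/J 5.x uses "com.mysql.jdbc.Driver".
MYSQL_DRIVER = "com.mysql.cj.jdbc.Driver"

# Hypothetical schema for the summary table; create it before running the pipeline.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS covid_summary (
    country   VARCHAR(100) NOT NULL,
    confirmed BIGINT       NOT NULL,
    deaths    BIGINT       NOT NULL
)
"""


def load_config(path="covid-config.json"):
    """Read the credentials file (the key names used below are assumptions)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def jdbc_settings(cfg):
    """Return (url, properties) in the form DataFrameWriter.jdbc accepts."""
    url = "jdbc:mysql://{host}:{port}/{database}".format(
        host=cfg.get("host", "localhost"),
        port=cfg.get("port", 3306),  # 3306 is MySQL's default port
        database=cfg["database"],
    )
    properties = {
        "user": cfg["username"],
        "password": cfg["password"],
        "driver": MYSQL_DRIVER,
    }
    return url, properties


def write_summary(df, cfg, mode="append"):
    """Append a Spark DataFrame to the configured MySQL table over JDBC."""
    url, properties = jdbc_settings(cfg)
    df.write.jdbc(url=url, table=cfg["table"], mode=mode, properties=properties)
```

For the write to succeed, the Connector/J JAR must be on Spark's classpath, for example through `spark-submit --jars` or the `spark.jars` configuration set on the SparkSession builder. Keeping the driver name in one constant makes it a single-line change if a different connector version is used.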

How to use

Clone the Covid-datapipeline-using-pyspark-and-mysql repo.
Start the MySQL server.
Execute the following command:

  python main.py
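Before running main.py, one quick way to confirm that the MySQL server from the earlier step is up is to attempt a TCP connection to its port. This is a minimal sketch, assuming the server runs locally on MySQL's default port 3306; the helper name is hypothetical and not part of the repo.

```python
import socket


def mysql_is_listening(host="127.0.0.1", port=3306, timeout=2.0):
    """Return True if something accepts TCP connections on host:port.

    This only shows that a listener exists; it does not check the
    credentials or whether the configured database and table exist.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError or a
        # timeout) when nothing is listening on the given address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `mysql_is_listening()` and getting `False` usually means the server has not been started yet, or that it listens on a different host or port than the one passed in.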

Results:

Command-line output:

Database status after execution:

GitHub

View Github

John was the first writer to have joined pythonawesome.com. He has since inculcated a very effective writing and reviewing culture at pythonawesome, which rivals have found impossible to imitate.
