Prep for the Databricks Spark 3 certification

Jan 14, 2022

Databricks Spark 3.0 certification

Prep for the Databricks Spark 3 certification, using the book Learning Spark (2nd edition) as the main resource.
Sample datasets are taken from the Databricks Community Edition storage.

The repo covers the practical aspects of the DataFrame API, including the following:

Subsetting DataFrames (select, filter, etc.)
Column manipulation (casting, creating columns, manipulating existing columns, complex column types)
String manipulation (splitting strings, regex)
Reading/writing DataFrames (schemas, formats: Parquet, Avro, JSON, etc.)
Rows, Columns and Expressions
Common DataFrame operations (filter, select, where, distinct, sort, limit)
Wide and Narrow Transformations
Working with dates (extraction, formatting, etc.)
Aggregations (groupBy, orderBy, count)
Statistical methods (avg, sum, max, min, describe, correlation, sampleBy)
UDFs
Combining datasets (joins, unions, broadcasting)
Optimising and tuning (caching and persistence, repartitioning, shuffle, Catalyst optimiser: logical and optimised plans)

In addition, there is an example of an MLflow pipeline, although it is not part of the certification.



