DSCI_553

USC Fall 2021 DSCI 553 (Foundations and Applications of Data Mining). Score: 95/100

Course Description

Algorithms and techniques of Data Mining and Machine Learning for analyzing massive datasets. Emphasis on Map Reduce and others. Case studies and applications.

Data mining is a fundamental skill for massive data analysis. At a high level, it allows the analyst to discover patterns in data, and transform it into a usable product. The course will teach data mining algorithms for analyzing very large data sets. It will have an applied focus, in that it is meant for preparing students to utilize topics in data mining to solve real world problems.

Homeworks

The following is my homework source code.

No. | Main Application | Programming | Tags | Score
1 | Data Exploration | Python | MapReduce, Spark, PySpark | 6.5 (python) + 0.0 (scala) / 7.0 + 0.7
2 | Find Frequent Itemsets | Python | PCY, Apriori, SON | 7.0 (python) + 0.0 (scala) / 7.0 + 0.7
3 | Recommendation Systems | Python | Collaborative Filtering, MinHash, LSH | 7.0 (python) + 0.0 (scala) / 7.0 + 0.7
4 | Graph Network Algorithm | Python | Betweenness, Community Detection, Girvan-Newman Algorithm | 7.0 (python) + 0.0 (scala) / 7.0 + 0.7
5 | Clustering Algorithm | Python | K-Means, Bradley-Fayyad-Reina (BFR) Algorithm, NMI | 7.0 (python) + 0.0 (scala) / 7.0 + 0.7
6 | Streaming Mining | Python | Bloom Filter, Flajolet-Martin Algorithm, Twitter Streaming, Reservoir Sampling | 7.0 (python) + 0.0 (scala) / 7.0 + 0.7
Competition | Recommendation System | Code | User-Based Collaborative Filtering, XGBoost | RMSE: 0.97, score: 8.0/8.0
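
The competition entry is scored by RMSE (root mean squared error) between predicted and actual ratings, where lower is better. As a rough illustration of what that number measures, here is a minimal sketch. The function name `rmse` and the toy ratings are assumptions made up for this example; this is not the homework or competition code.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists.

    Hypothetical helper for illustration only; not taken from the repository.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    # Sum the squared differences, average them, then take the square root.
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))


# Toy ratings on a 1-5 star scale (made-up values).
toy_predicted = [3.5, 4.0, 2.0]
toy_actual = [4.0, 4.0, 3.0]
toy_score = rmse(toy_predicted, toy_actual)
print(round(toy_score, 4))
```

An RMSE of 0.97 therefore means the predicted ratings are off by roughly one star in the root-mean-square sense.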
