In the era of Big Data, the web is an endless source of information. For this reason, there are plenty of good tools and frameworks for scraping web pages.
So, in an ideal world, there should be no need for a new web scraping framework. Nevertheless, there are always subtle differences between theory and practice, and web scraping is no exception.
I know that some nice solutions to this problem already exist, but in my view CHeSF is simpler: you just create a class that inherits from it, define the parse method, and launch it with a start URL.
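To make the "inherit, define parse, launch with a start URL" workflow concrete, here is a minimal sketch of that class-based pattern. It does not use CHeSF's actual API, which is undocumented: the base class ScraperBase, its run and fetch methods, and the (items, next_urls) return convention of parse are all hypothetical names chosen for illustration. It uses only the Python standard library (urllib.request for downloading, re for a deliberately naive extraction).

```python
import re
import urllib.request
from collections import deque


class ScraperBase:
    """Hypothetical stand-in for a class-based scraper base (NOT CHeSF's real API)."""

    def fetch(self, url):
        # Download a page and decode it as text; override to add headers, retries, etc.
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")

    def parse(self, url, html):
        # Subclasses return (items, next_urls) for each downloaded page.
        raise NotImplementedError

    def run(self, start_url, max_pages=10):
        # Breadth-first crawl from start_url, collecting whatever parse() extracts.
        queue = deque([start_url])
        seen = {start_url}
        items = []
        pages = 0
        while queue and pages < max_pages:
            url = queue.popleft()
            html = self.fetch(url)
            pages += 1
            new_items, next_urls = self.parse(url, html)
            items.extend(new_items)
            for next_url in next_urls:
                if next_url not in seen:  # avoid re-visiting pages
                    seen.add(next_url)
                    queue.append(next_url)
        return items


class TitleScraper(ScraperBase):
    """Hypothetical example subclass: grab each page's <title> and follow absolute links."""

    def parse(self, url, html):
        titles = re.findall(r"<title>(.*?)</title>", html, flags=re.IGNORECASE | re.DOTALL)
        links = re.findall(r'href="(https?://[^"]+)"', html)
        return [t.strip() for t in titles], links
```

Launching it would look like TitleScraper().run("https://example.com", max_pages=5). The user-facing part is only the subclass and its parse method; crawling, de-duplication, and the page budget live in the base class, which is the kind of division of labour the paragraph above describes. Regular expressions are used here only to keep the sketch dependency-free; a real scraper would normally use an HTML parser.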
The framework is still in an early alpha stage, so expect things to change rapidly. Currently, there is no documentation and no packaging; there is just an example showing how to use the framework to scrape TripAdvisor reviews. I used it myself to collect this dataset, a collection of more than 220k TripAdvisor reviews.
CHeSF borrows part of its working philosophy from Scrapy: building a scraping tool means writing (at least) one Python class.