A reference implementation for processing the content.log files found at opendata.dwd.de/weather.
wget https://opendata.dwd.de/weather/nwp/content.log.bz2 bunzip2 content.log.bz2 echo "./icon-d2/grib/*/t_2m/*" > wildcards.txt ./get_updated_files.py -w wildcards.txt 2022-01-26T03:00 > updated_files.txt
The produced file
updated_files.txt will hold all pathnames as given by content.log that are updated since
2022-01-26 03:00:00 UTC according to the file's modification date. Remember that those paths are relative to the directory the
content.log.bz2 was in, in this case https://opendata.dwd.de/weather/nwp/.
Also mind that there are multiple servers behind https://opendata.dwd.de which might not be exactly in sync with each other regarding file modification timestamps. Look into the code of
get_updated_files.py for a suggestion on how to deal with that.
While this program relies on the file modification timestamp dumped into
content.log.bz2, it might be more feasible to process the data reference time that is contained in the filenames instead.
$ ./get_updated_files.py --help usage: get_updated_files.py [-h] [--content-log CONTENT_LOG_FILE] [--wildcards WILDCARDS_FILE] [--min-delta MIN_DELTA] [--version] UPDATED_SINCE Filters paths of a DWD Open Data content.log file for entries that have been updated. positional arguments: UPDATED_SINCE Last time files were checked for updates optional arguments: -h, --help show this help message and exit --content-log CONTENT_LOG_FILE, -f CONTENT_LOG_FILE The decompressed content.log file (default: content.log) --wildcards WILDCARDS_FILE, -w WILDCARDS_FILE Filter results by a set of wildcards --min-delta MIN_DELTA, -d MIN_DELTA Minimum number of seconds a file needs to be younger than UPDATED_SINCE (default: 60) --version show program's version number and exit