Scraper
- class sunpy.net.scraper.Scraper(format, **kwargs)
Bases: object
A scraper to scrape web data archives based on dates.
- Parameters:
format (str) – A string containing the URL, with the date and other information to be extracted encoded as parse formats, and any other kwargs parameters encoded as string-format fields. The parse formats are written with double curly brackets to distinguish them from the kwargs fields. The accepted parse representations for datetime values are listed in PARSE_TIME_CONVERSIONS. This can also be a URI to a local file pattern. Default is None.
kwargs (dict) – A dictionary containing the values to be replaced in the pattern. Will be ignored if regex is True.
- now
The pattern with the actual date. This does not check whether such a file exists; it only shows how the pattern looks with the current time.
- Type:
Examples
>>> from sunpy.net import Scraper
>>>
>>> pattern = ('https://proba2.sidc.be/{instrument}/data/bsd/{{year:4d}}/{{month:2d}}/{{day:2d}}/'
...            '{instrument}_lv1_{{year:4d}}{{month:2d}}{{day:2d}}_{{hour:2d}}{{minute:2d}}{{second:2d}}.fits')
>>> swap = Scraper(format=pattern, instrument='swap')
>>>
>>> print(swap.pattern)
https://proba2.sidc.be/swap/data/bsd/{year:4d}/{month:2d}/{day:2d}/swap_lv1_{year:4d}{month:2d}{day:2d}_{hour:2d}{minute:2d}{second:2d}.fits
>>>
>>> print(swap.datetime_pattern)
https://proba2.sidc.be/swap/data/bsd/%Y/%m/%d/swap_lv1_%Y%m%d_%H%M%S.fits
>>>
>>> print(swap.now)
https://proba2.sidc.be/swap/data/bsd/2022/12/21/swap_lv1_20221221_112433.fits
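The relationship between the three printed strings can be sketched without the library. The following is a minimal illustration, not the actual `Scraper` implementation: `PARSE_TO_STRFTIME`, `fill_kwargs`, and `to_datetime_pattern` are hypothetical names, the mapping is only an assumed subset of `PARSE_TIME_CONVERSIONS`, and the fixed datetime is chosen to reproduce the `now` output shown above.

```python
from datetime import datetime

# Assumed subset of the parse-to-strftime table; the library's full mapping
# lives in PARSE_TIME_CONVERSIONS.
PARSE_TO_STRFTIME = {
    "{year:4d}": "%Y",
    "{month:2d}": "%m",
    "{day:2d}": "%d",
    "{hour:2d}": "%H",
    "{minute:2d}": "%M",
    "{second:2d}": "%S",
}


def fill_kwargs(fmt, **kwargs):
    """Substitute single-brace fields; str.format turns '{{' into '{'."""
    return fmt.format(**kwargs)


def to_datetime_pattern(pattern):
    """Replace each parse field with its strftime directive."""
    for parse_field, directive in PARSE_TO_STRFTIME.items():
        pattern = pattern.replace(parse_field, directive)
    return pattern


fmt = ("https://proba2.sidc.be/{instrument}/data/bsd/{{year:4d}}/{{month:2d}}/{{day:2d}}/"
       "{instrument}_lv1_{{year:4d}}{{month:2d}}{{day:2d}}_{{hour:2d}}{{minute:2d}}{{second:2d}}.fits")

pattern = fill_kwargs(fmt, instrument="swap")
datetime_pattern = to_datetime_pattern(pattern)
# A fixed datetime stands in for "the current time".
rendered = datetime(2022, 12, 21, 11, 24, 33).strftime(datetime_pattern)

print(pattern)
print(datetime_pattern)
print(rendered)
```

The double curly brackets survive the first step as single brackets, which is why the kwargs fields and the parse fields can share one string.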
Methods Summary
filelist(timerange) – Returns the list of existing files in the archive for the given time range.
matches(filepath, date) – Checks whether the given filepath is what the pattern produces for the given date.
range(timerange) – Gets the directories for a certain range of time.
Methods Documentation
- filelist(timerange)
Returns the list of existing files in the archive for the given time range.
- Parameters:
timerange (TimeRange) – Time interval in which to find the directories for a given pattern.
- Returns:
filesurls (list of str) – List of all the files found within the given time range.
Examples
>>> from sunpy.net import Scraper
>>> pattern = ('https://proba2.sidc.be/{instrument}/data/bsd/{{year:4d}}/{{month:2d}}/{{day:2d}}/'
...            '{instrument}_lv1_{{year:4d}}{{month:2d}}{{day:2d}}_{{hour:2d}}{{minute:2d}}{{second:2d}}.fits')
>>> swap = Scraper(pattern, instrument='swap')
>>> from sunpy.time import TimeRange
>>> timerange = TimeRange('2015-01-01T00:08:00','2015-01-01T00:12:00')
>>> print(swap.filelist(timerange))
['https://proba2.sidc.be/swap/data/bsd/2015/01/01/swap_lv1_20150101_000857.fits',
 'https://proba2.sidc.be/swap/data/bsd/2015/01/01/swap_lv1_20150101_001027.fits',
 'https://proba2.sidc.be/swap/data/bsd/2015/01/01/swap_lv1_20150101_001157.fits']
While writing the pattern, we can also leverage parse capabilities by using the {{}} notation to match parts of the filename that cannot be known beforehand:
>>> from sunpy.net import Scraper
>>> from sunpy.time import TimeRange
>>> pattern = 'https://proba2.sidc.be/lyra/data/bsd/{{year:4d}}/{{month:2d}}/{{day:2d}}/{{}}_lev{{Level:1d}}_std.fits'
>>> lyra = Scraper(pattern)
>>> timerange = TimeRange('2023-03-06T00:00:00','2023-03-06T00:10:00')
>>> print(swap.filelist(timerange))
['https://proba2.sidc.be/swap/data/bsd/2023/03/06/swap_lv1_20230306_000128.fits',
 'https://proba2.sidc.be/swap/data/bsd/2023/03/06/swap_lv1_20230306_000318.fits',
 'https://proba2.sidc.be/swap/data/bsd/2023/03/06/swap_lv1_20230306_000508.fits',
 'https://proba2.sidc.be/swap/data/bsd/2023/03/06/swap_lv1_20230306_000658.fits',
 'https://proba2.sidc.be/swap/data/bsd/2023/03/06/swap_lv1_20230306_000848.fits']
Notes
The search is strict with respect to the time range: if the scraped archive contains daily files but the range does not start at the beginning of a day, the file for that first day will not be selected. The end of the time range is normally not a problem, because a file stamped exactly at the end time is included.
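The effect described in this note can be modelled with plain datetimes. This is a simplified sketch under the assumption that each file is selected when its timestamp falls in the closed interval from start to end; `select_strict` is a hypothetical helper, not part of the library.

```python
from datetime import datetime


def select_strict(file_times, start, end):
    """Keep timestamps inside the closed interval [start, end]."""
    return [t for t in file_times if start <= t <= end]


# One file per day, each stamped at midnight (a hypothetical daily archive).
daily = [datetime(2015, 1, 1), datetime(2015, 1, 2), datetime(2015, 1, 3)]

# Starting at 06:00 drops the file for 1 January; the end time is inclusive.
late_start = select_strict(daily, datetime(2015, 1, 1, 6), datetime(2015, 1, 3))

# Starting at midnight keeps all three daily files.
full_days = select_strict(daily, datetime(2015, 1, 1), datetime(2015, 1, 3))

print(len(late_start), len(full_days))
```

In practice this suggests starting the time range at midnight when the archive holds one file per day.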
- matches(filepath, date)
Checks whether the given filepath is what the pattern produces for the given date.
- Parameters:
filepath (str) – File path to check.
date (datetime.datetime or astropy.time.Time) – The date for which to check.
- Returns:
bool – True if the given filepath matches the one calculated for the given date, else False.
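The idea behind this check can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not the library's implementation: `pattern_to_regex` and `matches_sketch` are hypothetical helpers, only integer fields of the form `{name:Nd}` and the anonymous `{}` field are handled, and the date is a plain `datetime.datetime`.

```python
import re
from datetime import datetime

# Matches "{}" or "{name:Nd}" in a pattern whose kwargs are already filled in.
FIELD = re.compile(r"\{(?:(?P<name>\w+):(?P<width>\d+)d)?\}")


def pattern_to_regex(pattern):
    """Turn a parse-style pattern into a compiled regular expression."""
    seen = set()
    parts = []
    pos = 0
    for m in FIELD.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        name = m.group("name")
        if name is None:
            parts.append(r".*?")          # anonymous {} matches anything
        elif name in seen:
            parts.append(f"(?P={name})")  # a repeated field must repeat its value
        else:
            seen.add(name)
            parts.append(rf"(?P<{name}>\d{{{m.group('width')}}})")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


def matches_sketch(pattern, filepath, date):
    """Return True if filepath fits the pattern and its date fields equal date's."""
    m = pattern_to_regex(pattern).fullmatch(filepath)
    if m is None:
        return False
    found = {key: int(value) for key, value in m.groupdict().items()}
    date_fields = ("year", "month", "day", "hour", "minute", "second")
    return all(found[f] == getattr(date, f) for f in date_fields if f in found)


pattern = ("https://proba2.sidc.be/swap/data/bsd/{year:4d}/{month:2d}/{day:2d}/"
           "swap_lv1_{year:4d}{month:2d}{day:2d}_{hour:2d}{minute:2d}{second:2d}.fits")
path = "https://proba2.sidc.be/swap/data/bsd/2015/01/01/swap_lv1_20150101_000857.fits"

same_time = matches_sketch(pattern, path, datetime(2015, 1, 1, 0, 8, 57))
other_day = matches_sketch(pattern, path, datetime(2015, 1, 2, 0, 8, 57))

# A looser pattern that uses {} for parts not known beforehand.
loose = "https://proba2.sidc.be/swap/data/bsd/{year:4d}/{month:2d}/{day:2d}/{}_lv{Level:1d}_{}.fits"
loose_ok = matches_sketch(loose, path, datetime(2015, 1, 1))

print(same_time, other_day, loose_ok)
```

Fields such as `Level` that are not date components are matched but ignored in the date comparison, which mirrors how the `{}` notation lets unknown filename parts through.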