Scraper
- class sunpy.net.Scraper(pattern, regex=False, **kwargs)
Bases: object
A scraper for web data archives organized by date.
- Parameters:
pattern (str) – A string containing the URL with the date encoded as datetime formats, and any other parameter as kwargs in string format. This can also be a URI to a local file pattern.
regex (bool) – Set to True if parts of the pattern use regexp symbols. This only works for the filename part of the pattern rather than the full URL. Be careful: a period (.) matches any character, so it is better to escape periods. If regexp is used, other kwargs are ignored and string replacement is not possible. Default is False.
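The warning about periods can be illustrated with Python's standard `re` module alone. This is a minimal sketch that does not call `Scraper`; the filename pattern and the two sample names are made up for illustration.

```python
import re

# Hypothetical filename pattern with an unescaped period before the extension.
loose = re.compile(r"swap_lv1_\d{8}_\d{6}.fits")
# The same pattern with the period escaped, so it only matches a literal ".".
strict = re.compile(r"swap_lv1_\d{8}_\d{6}\.fits")

good_name = "swap_lv1_20150101_000857.fits"
bad_name = "swap_lv1_20150101_000857xfits"  # no period at all

# The unescaped period accepts the "x", so the malformed name slips through.
loose_matches_bad = loose.fullmatch(bad_name) is not None
strict_matches_bad = strict.fullmatch(bad_name) is not None
strict_matches_good = strict.fullmatch(good_name) is not None

print(loose_matches_bad, strict_matches_bad, strict_matches_good)
```

Only the escaped version rejects the malformed name while still accepting the real one.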
- now
The pattern with the actual date.
- Type:
Examples
>>> from sunpy.net import Scraper
>>> pattern = ('http://proba2.oma.be/{instrument}/data/bsd/%Y/%m/%d/'
...            '{instrument}_lv1_%Y%m%d_%H%M%S.fits')
>>> swap = Scraper(pattern, instrument='swap')
>>> print(swap.pattern)
http://proba2.oma.be/swap/data/bsd/%Y/%m/%d/swap_lv1_%Y%m%d_%H%M%S.fits
>>> print(swap.now)
http://proba2.oma.be/swap/data/bsd/2022/12/21/swap_lv1_20221221_112433.fits
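The two substitution steps that such a pattern implies (keyword placeholders first, then date codes) can be approximated with the standard library alone. This is only a sketch of the idea, not a description of how `Scraper` is implemented internally; the fixed `moment` is an arbitrary date chosen for illustration.

```python
from datetime import datetime

pattern = ('http://proba2.oma.be/{instrument}/data/bsd/%Y/%m/%d/'
           '{instrument}_lv1_%Y%m%d_%H%M%S.fits')

# Step 1: fill the keyword placeholders, leaving the date codes untouched.
with_kwargs = pattern.format(instrument='swap')

# Step 2: fill the date codes for one fixed, arbitrary moment (an assumption
# made here so the result is reproducible, unlike the current time).
moment = datetime(2015, 1, 1, 0, 8, 57)
url = moment.strftime(with_kwargs)

print(with_kwargs)
print(url)
```

The resulting URL is just the pattern rendered for that moment; nothing here checks that such a file exists on the server.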
Notes
The now attribute does not return an existing file; it only shows how the pattern looks when filled in with the current time.
Methods Summary
filelist(timerange) – Returns the list of existent files in the archive for the given time range.
matches(filepath, date)
range(timerange) – Gets the directories for a certain range of time.
Methods Documentation
- filelist(timerange)
Returns the list of existent files in the archive for the given time range.
- Parameters:
timerange (TimeRange) – Time interval in which to find the directories for a given pattern.
- Returns:
filesurls (list of str) – List of all the files found within the given time range.
Examples
>>> from sunpy.net import Scraper
>>> pattern = ('http://proba2.oma.be/{instrument}/data/bsd/%Y/%m/%d/'
...            '{instrument}_lv1_%Y%m%d_%H%M%S.fits')
>>> swap = Scraper(pattern, instrument='swap')
>>> from sunpy.time import TimeRange
>>> timerange = TimeRange('2015-01-01T00:08:00','2015-01-01T00:12:00')
>>> print(swap.filelist(timerange))
['http://proba2.oma.be/swap/data/bsd/2015/01/01/swap_lv1_20150101_000857.fits',
 'http://proba2.oma.be/swap/data/bsd/2015/01/01/swap_lv1_20150101_001027.fits',
 'http://proba2.oma.be/swap/data/bsd/2015/01/01/swap_lv1_20150101_001157.fits']
Notes
The search is strict about the time range: if the scraped archive contains daily files but the range does not start at the beginning of a day, the file for that day will not be selected. The end of the time range is normally fine, because a file at that end time is included.
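The effect of a strict start and an inclusive end can be sketched with plain `datetime` comparisons. The file names, the timestamps, and the `select` helper below are hypothetical and only mimic the behaviour described in this note; they are not part of sunpy.

```python
from datetime import datetime

# Hypothetical daily files, each stamped at the start of its day.
file_times = {
    'daily_20150101.fits': datetime(2015, 1, 1),
    'daily_20150102.fits': datetime(2015, 1, 2),
    'daily_20150103.fits': datetime(2015, 1, 3),
}

def select(start, end):
    """Keep files whose timestamp lies in [start, end], both ends inclusive."""
    return [name for name, stamp in file_times.items() if start <= stamp <= end]

# Range starting at midday on 1 January: that day's file is stamped at
# midnight, which is before the start, so it is skipped.
late_start = select(datetime(2015, 1, 1, 12), datetime(2015, 1, 3))

# Range starting at midnight on 1 January picks up all three files.
full_days = select(datetime(2015, 1, 1), datetime(2015, 1, 3))

print(late_start)
print(full_days)
```

In practice this suggests starting the time range at the beginning of the day when the archive holds one file per day.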