Guide
This is a usage guide for the procyclingstats package. For more detailed information about the individual scraping classes and methods, see the API page.
Creating scraping objects
All scraping classes share the
Scraper constructor. The
easiest way to create a scraping object that is ready for parsing is to pass
the URL (either relative or absolute). For the scraping class to work
correctly, the passed URL has to be in a valid format (e.g.
"rider/tadej-pogacar" for the Rider
class). The documentation of each scraping class includes an example of a
valid URL. URL validation is no longer supported, so it is up to you to
make sure that valid HTML can be obtained from the passed URL. If the HTML from
the passed URL isn’t valid, the parsing methods won’t work correctly.
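The following is a minimal sketch of the two URL forms mentioned above. The helper `to_absolute` and the `BASE_URL` constant are hypothetical names introduced only for illustration; they are not part of the package. The `make_rider` function shows how the relative URL from the text might be passed to the Rider class, assuming the package is installed and the site is reachable.

```python
BASE_URL = "https://www.procyclingstats.com/"  # assumed site root, for illustration


def to_absolute(url: str) -> str:
    """Return the absolute form of a relative or absolute URL (hypothetical helper)."""
    if url.startswith(("http://", "https://")):
        return url
    return BASE_URL + url.lstrip("/")


def make_rider(url: str = "rider/tadej-pogacar"):
    """Create a Rider scraping object; the page is requested on construction."""
    # Imported lazily so the helper above works without the package installed.
    from procyclingstats import Rider

    return Rider(url)


# Both forms point at the same page.
print(to_absolute("rider/tadej-pogacar"))
print(to_absolute("https://www.procyclingstats.com/rider/tadej-pogacar"))
```

Since the URL is no longer validated, a quick look at the absolute form can help spot a malformed path before any request is made.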
To create an object ready for parsing without making a request, pass the
HTML of the page as the html parameter. The URL has to be passed in that
case too. The passed HTML should be a string containing the HTML of a
procyclingstats page. You should also set the update_html parameter to False
so that no request to the given URL is made. HTML that was passed or obtained
from the page might also be invalid in some cases. Invalid HTML usually looks
like this one. When that is the case, ValueError is raised.
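As a rough sketch of the offline workflow described above, the snippet below reads a previously saved page from disk and passes it to the constructor. The function names `read_cached_html` and `rider_from_cache` and the idea of keeping pages in local files are assumptions made for this example; only the `html` and `update_html` parameters and the `ValueError` come from the guide.

```python
from pathlib import Path


def read_cached_html(path):
    """Read a previously saved page from disk as a string (hypothetical helper)."""
    return Path(path).read_text(encoding="utf-8")


def rider_from_cache(url, path):
    """Build a Rider from saved HTML without requesting the page."""
    # Imported lazily so read_cached_html works without the package installed.
    from procyclingstats import Rider

    html = read_cached_html(path)
    try:
        # The URL is still required; update_html=False prevents the request.
        return Rider(url, html=html, update_html=False)
    except ValueError:
        # Raised when the supplied HTML is invalid.
        return None
```

Returning None on invalid HTML is only one possible design; letting the ValueError propagate may be preferable when a bad cache file should stop the program.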
Parsing methods
When a scraping object is ready for parsing, use its parsing methods to get data from the HTML. Parsing methods differ among scraping classes, and all of them are documented on the API page. There are three types of parsing methods:
- Basic parsing methods
Parse only one piece of information from the HTML. See the
Rider.birthdate method for an example.
- Table parsing methods
Parse a table or an unordered list from the HTML and return it as a list of dicts whose keys are the wanted fields, which are passed as arguments. See the
Rider.teams_history method for an example.
- Select menu parsing methods
Parse a select menu from the HTML and return it as a list of dicts whose keys are always
"text" and "value". See the Race.prev_editions_select method for an example.
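To illustrate the return shape of a select menu parsing method, here is a small standard-library sketch. It is not the package's implementation; `SelectMenuParser`, `parse_select_menu` and the sample markup are made up for this example and only mimic the documented "text" and "value" keys.

```python
from html.parser import HTMLParser


class SelectMenuParser(HTMLParser):
    """Collect <option> elements as dicts with "text" and "value" keys."""

    def __init__(self):
        super().__init__()
        self.options = []
        self._current = None  # option currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._current = {"text": "", "value": dict(attrs).get("value", "")}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "option" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.options.append(self._current)
            self._current = None


def parse_select_menu(html):
    """Return every option of the given HTML as a list of dicts."""
    parser = SelectMenuParser()
    parser.feed(html)
    parser.close()
    return parser.options


# Made-up sample markup, not taken from a real page.
SAMPLE_HTML = """
<select>
  <option value="race/example-race/2022">2022</option>
  <option value="race/example-race/2021">2021</option>
</select>
"""

editions = parse_select_menu(SAMPLE_HTML)
print(editions)
```

Table parsing methods return the same kind of structure, a list of dicts, except that the keys are the fields requested by the caller.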
Some parsing methods might be unavailable for some HTML pages. In that case the
method raises ExpectedParsingError when called. For example, when
a Ranking scraping object is
created from a URL that points to a page with a team ranking, the
Ranking.individual_ranking
method raises ExpectedParsingError, because the ranking on the page isn’t
an individual ranking. Use the
Ranking.team_ranking
method instead to get the ranking.
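The fallback described above can be sketched as follows. To keep the example runnable without the package or a network connection, it uses stand-ins: the local `ExpectedParsingError` class and `FakeTeamRankingPage` are hypothetical substitutes for the library's exception and for a Ranking object, and `ranking_rows` is a helper name invented here. With the real package, the exception would be imported from the library instead (assumed to live in `procyclingstats.errors`).

```python
class ExpectedParsingError(Exception):
    """Stand-in for the library's exception so the sketch runs on its own."""


class FakeTeamRankingPage:
    """Hypothetical stand-in for a Ranking object built from a team ranking page."""

    def individual_ranking(self):
        raise ExpectedParsingError("page does not contain an individual ranking")

    def team_ranking(self):
        return [{"rank": 1, "team_name": "Example Team"}]


def ranking_rows(ranking, expected_error=ExpectedParsingError):
    """Try the individual ranking first and fall back to the team ranking."""
    try:
        return "individual", ranking.individual_ranking()
    except expected_error:
        # The page holds a team ranking, so use the matching method instead.
        return "team", ranking.team_ranking()


kind, rows = ranking_rows(FakeTeamRankingPage())
print(kind, rows)
```

Catching only the expected error type keeps genuinely unexpected failures visible instead of silently switching methods.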
Parsing all available data
To get all parsable data from the page, use the
parse method. It calls all
the parsing methods of the scraping class and returns a dictionary whose keys
are the names of the called methods and whose values are the parsed values
they returned. See the
parse method for more
information.
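The idea behind such a method can be illustrated with a short sketch. This is not the library's implementation: `DemoScraper`, `parse_all` and the local `ExpectedParsingError` are hypothetical stand-ins, and storing None for an unavailable method is an assumption of this example. With the real package, calling `parse()` on a scraping object is all that is needed.

```python
class ExpectedParsingError(Exception):
    """Stand-in for the library's exception so the sketch runs on its own."""


class DemoScraper:
    """Hypothetical scraping class with two available methods and one unavailable."""

    def name(self):
        return "Example Rider"

    def nationality(self):
        return "XX"

    def team_ranking(self):
        raise ExpectedParsingError("not available for this page")


def parse_all(scraper):
    """Call every public method of the object and collect the results in a dict."""
    result = {}
    for attr in dir(scraper):
        if attr.startswith("_"):
            continue  # skip private and special attributes
        method = getattr(scraper, attr)
        if not callable(method):
            continue
        try:
            result[attr] = method()
        except ExpectedParsingError:
            # Assumption of this sketch: unavailable data is stored as None.
            result[attr] = None
    return result


parsed = parse_all(DemoScraper())
print(parsed)
```

The resulting dictionary maps each method name to its parsed value, which makes it straightforward to serialize a whole page in one step.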