Adding a crawler¶
conrad's event database is updated every Monday and Thursday at 00:00 UTC using a GitHub workflow. The workflow runs all crawlers, updates the event database, and raises a PR for a maintainer to review and then merge. Once the PR is merged, the new events become available for consumption using the command-line interface!
Currently, conrad has crawlers for:
Adding a crawler to conrad involves two steps: writing the crawler and then scheduling it.
Note
Please use the pull request workflow that is described in the Contributor’s Guide.
Writing a crawler¶
All crawlers are present in the crawlers package at the root of the GitHub repository.
You can use the generate command to generate the base code for your crawler:
$ conrad generate crawler Creepy
create crawlers/creepy/creepy_crawler.py
Then add your crawling code to the generated file, which will be used to populate the events list:
class CreepyCrawler(BaseCrawler):
    def get_events(self):
        # Populate this list of events using your code
        events = []

        # YOUR CODE HERE

        # Extend the self.events list with the new list
        self.events.extend(events)
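As a rough sketch, a filled-in crawler might look like the following. The feed URL and the field names on both sides of the mapping are illustrative assumptions; the actual keys must follow the schema described in Adding new events:

import requests  # assuming the requests library is available


class CreepyCrawler(BaseCrawler):  # BaseCrawler is already imported in the generated file
    def get_events(self):
        events = []

        # Hypothetical source; replace with the site or API you are crawling
        response = requests.get("https://example.com/creepy-events.json")
        response.raise_for_status()

        for item in response.json():
            # Illustrative field names; use the exact keys from "Adding new events"
            events.append(
                {
                    "name": item["title"],
                    "url": item["link"],
                    "start_date": item["start"],
                    "end_date": item["end"],
                }
            )

        self.events.extend(events)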
You can use the run command to see if your data is getting saved in the format specified in Adding new events:
$ conrad run crawler Creepy
save data/creepy.json
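If the run succeeds, data/creepy.json will contain the crawled events as JSON. A purely illustrative example follows; the field names and values here are assumptions, and the authoritative format is the one in Adding new events:

[
    {
        "name": "CreepyCon",
        "url": "https://example.com/creepycon",
        "start_date": "2020-10-31",
        "end_date": "2020-11-02"
    }
]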
After you’re finished writing your crawling code, you just need to schedule it.
Scheduling the crawler¶
To schedule the newly added CreepyCrawler, you need to update the workflow definition by adding the following step (before the “Create pull request” step) to the get_events job:
- id: source_name
  name: Get Creepy events action step
  uses: ./.github/actions/get-events-action
  with:
    crawler-name: 'CreepyCrawler'
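For orientation, the step would sit in the workflow roughly as sketched below. Only the get_events job name, the step itself, and the “Create pull request” step come from this guide; the runner and the surrounding steps are assumptions:

jobs:
  get_events:
    runs-on: ubuntu-latest  # assumed runner
    steps:
      # ... existing crawler steps ...
      - id: source_name
        name: Get Creepy events action step
        uses: ./.github/actions/get-events-action
        with:
          crawler-name: 'CreepyCrawler'
      - name: Create pull request
        # ... existing "Create pull request" step ...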
Finally, raise a PR; once it is merged, your crawler will start populating the events list every Monday and Thursday.