Your First Crawl
Let's crawl your first webpage! Start by opening a webpage you'd like to crawl, and note its URL for later.
Accessing your dashboard
To start a crawl, you'll need to log in using a Browsertrix account that has crawler permissions.
You likely have crawler permissions already if:
- You registered for an org on hosted Browsertrix
- You joined an existing org and were given crawler permissions
- You are the admin of a self-hosted instance
Check whether you have crawler permissions by logging in. If you see a + Create New... button near the org name, you can start a crawl. If you don't see this button and think you should, contact your org admin to update your permissions.
Starting the crawl
When you log in, the first page you see is the org dashboard. If you've navigated to another page, return to the Dashboard.
- Tap the Create New... shortcut and select Crawl Workflow.
- In the Page URL field, enter the URL of the webpage you noted earlier.
- Tap Run Crawl.
- You should now see your new crawl workflow running. Give the crawler a few moments to warm up, and then watch as it archives the webpage!
Next steps
After running your first crawl, check out the following to learn more about Browsertrix's features:
- A detailed list of crawl workflow setup options.
- Adding exclusions to limit your crawl's scope, and evading crawler traps by editing exclusion rules while a crawl is running.
- Best practices for crawling with browser profiles to capture content only available when logged in to a website.
- Managing archived items, including uploading previously archived content.
- Organizing and combining archived items with collections for sharing and export.
- Inviting collaborators to your org.