25 commits
5e2b1e3 refactor: put old course to legacy folder (honzajavorek, Oct 15, 2025)
192b1ca refactor: update links to target the legacy folder (honzajavorek, Oct 15, 2025)
cc21bf0 refactor: put new course to JS folder (honzajavorek, Oct 15, 2025)
5141db3 chore: prepare redirects (honzajavorek, Sep 8, 2025)
87aaf33 feat: put the new course to the /scraping-basics-javascript/ URL (honzajavorek, Sep 8, 2025)
606e2a0 fix: edit links to the JS course (honzajavorek, Sep 8, 2025)
74825e7 feat: change URLs of the legacy JS course (honzajavorek, Sep 8, 2025)
c0c39bc fix: leftover, correct URL for the course root (honzajavorek, Sep 8, 2025)
a21d603 feat: redirects from original URLs to the new JS course's lessons, wi… (honzajavorek, Sep 8, 2025)
212f8d0 feat: set new sidebar position (honzajavorek, Sep 8, 2025)
c0f6b20 feat: unlist the legacy JS course (honzajavorek, Sep 8, 2025)
30689dc feat: implement and use the LegacyJsCourseAdmonition component (honzajavorek, Oct 14, 2025)
b666fee fix: pretend that this is an updated example output (honzajavorek, Oct 14, 2025)
f801c3d fix: update various links leading to the old course (honzajavorek, Oct 15, 2025)
6fbb6f3 feat: denote that the old course is old (honzajavorek, Oct 15, 2025)
e03b1a3 fix: do not set the old JS course as unlisted, set as noindex instead (honzajavorek, Oct 15, 2025)
c4e7cc7 feat: publish the new JS course (honzajavorek, Oct 15, 2025)
6659bdb style: make linters happier (honzajavorek, Oct 15, 2025)
2b29fb6 feat: add admonition to all old course pages (honzajavorek, Oct 15, 2025)
aef6d08 style: make Vale happy about Gzip (honzajavorek, Oct 16, 2025)
e55d80f style: make Vale happy about H1s (honzajavorek, Oct 16, 2025)
c610a36 style: make Vale happy about H1s (honzajavorek, Oct 16, 2025)
2260e8c style: make Vale happy about H1s (honzajavorek, Oct 16, 2025)
62b6f41 style: make Markdown lint happy (honzajavorek, Oct 16, 2025)
92d189a style: make the link more comprehensive (honzajavorek, Oct 16, 2025)
34 changes: 31 additions & 3 deletions nginx.conf
@@ -346,11 +346,39 @@ server {

# Rename output schema to dataset schema
rewrite ^/platform/actors/development/actor-definition/output-schema$ /platform/actors/development/actor-definition/dataset-schema permanent;
-rewrite ^academy/deploying-your-code/output-schema$ /academy/deploying-your-code/dataset-schema permanent;
+rewrite ^/academy/deploying-your-code/output-schema$ /academy/deploying-your-code/dataset-schema permanent;

# Academy restructuring
-rewrite ^academy/advanced-web-scraping/scraping-paginated-sites$ /academy/advanced-web-scraping/crawling/crawling-with-search permanent;
-rewrite ^academy/php$ /academy/php/use-apify-from-php redirect; # not permanent in case we want to reuse /php in the future
+rewrite ^/academy/advanced-web-scraping/scraping-paginated-sites$ /academy/advanced-web-scraping/crawling/crawling-with-search permanent;
+rewrite ^/academy/php$ /academy/php/use-apify-from-php redirect; # not permanent in case we want to reuse /php in the future

+# Academy: replacing the 'Web Scraping for Beginners' course
+rewrite ^/academy/web-scraping-for-beginners/best-practices$ /academy/scraping-basics-javascript?legacy-js-course=/best-practices permanent;
+rewrite ^/academy/web-scraping-for-beginners/introduction$ /academy/scraping-basics-javascript?legacy-js-course=/introduction permanent;
+rewrite ^/academy/web-scraping-for-beginners/challenge/initializing-and-setting-up$ /academy/scraping-basics-javascript?legacy-js-course=/challenge/initializing-and-setting-up permanent;
+rewrite ^/academy/web-scraping-for-beginners/challenge/modularity$ /academy/scraping-basics-javascript?legacy-js-course=/challenge/modularity permanent;
+rewrite ^/academy/web-scraping-for-beginners/challenge/scraping-amazon$ /academy/scraping-basics-javascript?legacy-js-course=/challenge/scraping-amazon permanent;
+rewrite ^/academy/web-scraping-for-beginners/challenge$ /academy/scraping-basics-javascript?legacy-js-course=/challenge permanent;
+rewrite ^/academy/web-scraping-for-beginners/crawling/exporting-data$ /academy/scraping-basics-javascript/framework?legacy-js-course=/crawling/exporting-data permanent;
+rewrite ^/academy/web-scraping-for-beginners/crawling/filtering-links$ /academy/scraping-basics-javascript/getting-links?legacy-js-course=/crawling/filtering-links permanent;
+rewrite ^/academy/web-scraping-for-beginners/crawling/finding-links$ /academy/scraping-basics-javascript/getting-links?legacy-js-course=/crawling/finding-links permanent;
+rewrite ^/academy/web-scraping-for-beginners/crawling/first-crawl$ /academy/scraping-basics-javascript/crawling?legacy-js-course=/crawling/first-crawl permanent;
+rewrite ^/academy/web-scraping-for-beginners/crawling/headless-browser$ /academy/scraping-basics-javascript?legacy-js-course=/crawling/headless-browser permanent;
+rewrite ^/academy/web-scraping-for-beginners/crawling/pro-scraping$ /academy/scraping-basics-javascript/framework?legacy-js-course=/crawling/pro-scraping permanent;
+rewrite ^/academy/web-scraping-for-beginners/crawling/recap-extraction-basics$ /academy/scraping-basics-javascript/extracting-data?legacy-js-course=/crawling/recap-extraction-basics permanent;
+rewrite ^/academy/web-scraping-for-beginners/crawling/relative-urls$ /academy/scraping-basics-javascript/getting-links?legacy-js-course=/crawling/relative-urls permanent;
+rewrite ^/academy/web-scraping-for-beginners/crawling/scraping-the-data$ /academy/scraping-basics-javascript/scraping-variants?legacy-js-course=/crawling/scraping-the-data permanent;
+rewrite ^/academy/web-scraping-for-beginners/crawling$ /academy/scraping-basics-javascript/crawling?legacy-js-course=/crawling permanent;
+rewrite ^/academy/web-scraping-for-beginners/data-extraction/browser-devtools$ /academy/scraping-basics-javascript/devtools-inspecting?legacy-js-course=/data-extraction/browser-devtools permanent;
+rewrite ^/academy/web-scraping-for-beginners/data-extraction/computer-preparation$ /academy/scraping-basics-javascript/downloading-html?legacy-js-course=/data-extraction/computer-preparation permanent;
+rewrite ^/academy/web-scraping-for-beginners/data-extraction/devtools-continued$ /academy/scraping-basics-javascript/devtools-extracting-data?legacy-js-course=/data-extraction/devtools-continued permanent;
+rewrite ^/academy/web-scraping-for-beginners/data-extraction/node-continued$ /academy/scraping-basics-javascript/extracting-data?legacy-js-course=/data-extraction/node-continued permanent;
+rewrite ^/academy/web-scraping-for-beginners/data-extraction/node-js-scraper$ /academy/scraping-basics-javascript/downloading-html?legacy-js-course=/data-extraction/node-js-scraper permanent;
+rewrite ^/academy/web-scraping-for-beginners/data-extraction/project-setup$ /academy/scraping-basics-javascript/downloading-html?legacy-js-course=/data-extraction/project-setup permanent;
+rewrite ^/academy/web-scraping-for-beginners/data-extraction/save-to-csv$ /academy/scraping-basics-javascript/saving-data?legacy-js-course=/data-extraction/save-to-csv permanent;
+rewrite ^/academy/web-scraping-for-beginners/data-extraction/using-devtools$ /academy/scraping-basics-javascript/devtools-locating-elements?legacy-js-course=/data-extraction/using-devtools permanent;
+rewrite ^/academy/web-scraping-for-beginners/data-extraction$ /academy/scraping-basics-javascript/devtools-inspecting?legacy-js-course=/data-extraction permanent;
+rewrite ^/academy/web-scraping-for-beginners$ /academy/scraping-basics-javascript?legacy-js-course=/ permanent;

# Removed pages
# GPT plugins were discontinued April 9th, 2024 - https://help.openai.com/en/articles/8988022-winding-down-the-chatgpt-plugins-beta
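All of these rules follow one pattern: each old lesson URL is looked up to find its closest counterpart in the new course, and the old lesson path travels along as a `legacy-js-course` query value. A minimal JavaScript sketch of that mapping, useful for spot-checking a rule outside nginx (the `mapLegacyUrl` helper and its two-entry table are illustrative only, not part of this PR):

```javascript
// Two sample entries copied from the nginx rules above:
// old lesson path -> new course page (without the query string).
const LEGACY_REDIRECTS = new Map([
  ['/academy/web-scraping-for-beginners/crawling/first-crawl',
   '/academy/scraping-basics-javascript/crawling'],
  ['/academy/web-scraping-for-beginners/data-extraction/save-to-csv',
   '/academy/scraping-basics-javascript/saving-data'],
]);

function mapLegacyUrl(path) {
  const target = LEGACY_REDIRECTS.get(path);
  if (!target) return null; // not a legacy course URL
  // The old lesson path, minus the course prefix, is preserved as a query value.
  const legacyPath = path.replace('/academy/web-scraping-for-beginners', '');
  return `${target}?legacy-js-course=${legacyPath}`;
}
```

Spot-checking a handful of mappings this way is cheaper than exercising the full nginx config in a container.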
@@ -1,12 +1,10 @@
---
-title: Robotic process automation
+title: What is robotic process automation (RPA)
description: Learn the basics of robotic process automation. Make your processes on the web and other software more efficient by automating repetitive tasks.
sidebar_position: 8.7
slug: /concepts/robotic-process-automation
---

-# What is robotic process automation (RPA)? {#what-is-robotic-process-automation-rpa}
-
**Learn the basics of robotic process automation. Make your processes on the web and other software more efficient by automating repetitive tasks.**

---
@@ -31,7 +29,7 @@ With the advance of [machine learning](https://en.wikipedia.org/wiki/Machine_lea

## Is RPA the same as web scraping? {#is-rpa-the-same-as-web-scraping}

-While [web scraping](../../webscraping/scraping_basics_javascript/index.md) is a kind of RPA, it focuses on extracting structured data. RPA focuses on the other tasks in browsers - everything except for extracting information.
+While web scraping is a kind of RPA, it focuses on extracting structured data. RPA focuses on the other tasks in browsers - everything except for extracting information.

## Additional resources {#additional-resources}

2 changes: 1 addition & 1 deletion sources/academy/glossary/tools/apify_cli.md
@@ -15,7 +15,7 @@ The [Apify CLI](/cli) helps you create, develop, build and run Apify Actors, and

## Installing {#installing}

-To install the Apify CLI, you'll first need npm, which comes preinstalled with Node.js. If you haven't yet installed Node, learn how to do that [here](../../webscraping/scraping_basics_javascript/data_extraction/computer_preparation.md). Additionally, make sure you've got an Apify account, as you will need to log in to the CLI to gain access to its full potential.
+To install the Apify CLI, you'll first need npm, which comes preinstalled with Node.js. Additionally, make sure you've got an Apify account, as you will need to log in to the CLI to gain access to its full potential.

Open up a terminal instance and run the following command:

2 changes: 1 addition & 1 deletion sources/academy/homepage_content.json
@@ -2,7 +2,7 @@
"Beginner courses": [
{
"title": "Web scraping basics with JS",
-    "link": "/academy/web-scraping-for-beginners",
+    "link": "/academy/scraping-basics-javascript",
"description": "Learn how to use JavaScript to extract information from websites in this practical course, starting from the absolute basics.",
"imageUrl": "/img/academy/intro.svg"
},
@@ -1,21 +1,24 @@
@@ -1,21 +1,24 @@
---
-title: I - Webhooks & advanced Actor overview
+title: Webhooks & advanced Actor overview
description: Learn more advanced details about Actors, how they work, and the default configurations they can take. Also, learn how to integrate your Actor with webhooks.
sidebar_position: 6.1
sidebar_label: I - Webhooks & advanced Actor overview
slug: /expert-scraping-with-apify/actors-webhooks
---

# Webhooks & advanced Actor overview {#webhooks-and-advanced-actors}

**Learn more advanced details about Actors, how they work, and the default configurations they can take. Also, learn how to integrate your Actor with webhooks.**

+:::caution Updates coming
+This lesson is subject to change because it currently relies on code from our archived **Web scraping basics for JavaScript devs** course. For now, you can still access the archived course, but we plan to completely retire it in a few months. This lesson will be updated to remove the dependency.
+:::

---

Thus far, you've run Actors on the platform and written an Actor of your own, which you published to the platform yourself using the Apify CLI; therefore, it's fair to say that you are becoming more familiar and comfortable with the concept of **Actors**. Within this lesson, we'll take a more in-depth look at Actors and what they can do.

## Advanced Actor overview {#advanced-actors}

-In this course, we'll be working out of the Amazon scraper project from the **Web scraping basics for JavaScript devs** course. If you haven't already built that project, you can do it in three short lessons [here](../../webscraping/scraping_basics_javascript/challenge/index.md). We've made a few small modifications to the project with the Apify SDK, but 99% of the code is still the same.
+In this course, we'll be working out of the Amazon scraper project from the **Web scraping basics for JavaScript devs** course. If you haven't already built that project, you can do it in [three short lessons](../../webscraping/scraping_basics_legacy/challenge/index.md). We've made a few small modifications to the project with the Apify SDK, but 99% of the code is still the same.

Take another look at the files within your Amazon scraper project. You'll notice that there is a **Dockerfile**. Every single Actor has a Dockerfile (the Actor's **Image**) which tells Docker how to spin up a container on the Apify platform which can successfully run the Actor's code. "Apify Actors" is a serverless platform that runs multiple Docker containers. For a deeper understanding of Actor Dockerfiles, refer to the [Apify Actor Dockerfile docs](/sdk/js/docs/guides/docker-images#example-dockerfile).

@@ -41,7 +44,7 @@ Prior to moving forward, please read over these resources:

## Our task {#our-task}

-In this task, we'll be building on top of what we already created in the [Web scraping basics for JavaScript devs](/academy/web-scraping-for-beginners/challenge) course's final challenge, so keep those files safe!
+In this task, we'll be building on top of what we already created in the [Web scraping basics for JavaScript devs](../../webscraping/scraping_basics_legacy/challenge/index.md) course's final challenge, so keep those files safe!

Once our Amazon Actor has completed its run, we will, rather than sending an email to ourselves, call an Actor through a webhook. The Actor called will be a new Actor that we will create together, which will take the dataset ID as input, then subsequently filter through all of the results and return only the cheapest one for each product. All of the results of the Actor will be pushed to its default dataset.

@@ -1,12 +1,11 @@
---
-title: IV - Apify API & client
+title: Apify API & client
description: Gain an in-depth understanding of the two main ways of programmatically interacting with the Apify platform - through the API, and through a client.
sidebar_position: 6.4
sidebar_label: IV - Apify API & client
slug: /expert-scraping-with-apify/apify-api-and-client
---

-# Apify API & client {#api-and-client}

**Gain an in-depth understanding of the two main ways of programmatically interacting with the Apify platform - through the API, and through a client.**

---
@@ -1,12 +1,11 @@
---
-title: VI - Bypassing anti-scraping methods
+title: Bypassing anti-scraping methods
description: Learn about bypassing anti-scraping methods using proxies and proxy/session rotation together with Crawlee and the Apify SDK.
sidebar_position: 6.6
sidebar_label: VI - Bypassing anti-scraping methods
slug: /expert-scraping-with-apify/bypassing-anti-scraping
---

-# Bypassing anti-scraping methods {#bypassing-anti-scraping-methods}

**Learn about bypassing anti-scraping methods using proxies and proxy/session rotation together with Crawlee and the Apify SDK.**

---
@@ -20,13 +20,9 @@ Before developing a pro-level Apify scraper, there are some important things you

> If you've already gone through the [Web scraping basics for JavaScript devs](../../webscraping/scraping_basics_javascript/index.md) and the first courses of the [Apify platform category](../apify_platform.md), you will be more than well equipped to continue on with the lessons in this course.

-<!-- ### Puppeteer/Playwright {#puppeteer-playwright}
-
-[Puppeteer](https://pptr.dev/) is a library for running and controlling a [headless browser](../../webscraping/scraping_basics_javascript/crawling/headless_browser.md) in Node.js, and was developed at Google. The team working on it was hired by Microsoft to work on the [Playwright](https://playwright.dev/) project; therefore, many parallels can be seen between both the `puppeteer` and `playwright` packages. Proficiency in at least one of these will be good enough. -->
-
### Crawlee, Apify SDK, and the Apify CLI {#crawlee-apify-sdk-and-cli}

-If you're feeling ambitious, you don't need to have any prior experience with Crawlee to get started with this course; however, at least 5–10 minutes of exposure is recommended. If you haven't yet tried out Crawlee, you can refer to [this lesson](../../webscraping/scraping_basics_javascript/crawling/pro_scraping.md) in the **Web scraping basics for JavaScript devs** course (and ideally follow along). To familiarize yourself with the Apify SDK, you can refer to the [Apify Platform](../apify_platform.md) category.
+If you're feeling ambitious, you don't need to have any prior experience with Crawlee to get started with this course; however, at least 5–10 minutes of exposure is recommended. If you haven't yet tried out Crawlee, you can refer to the [Using a scraping framework with Node.js](../../webscraping/scraping_basics_javascript/12_framework.md) lesson of the **Web scraping basics for JavaScript devs** course. To familiarize yourself with the Apify SDK, you can refer to the [Apify Platform](../apify_platform.md) category.

The Apify CLI will play a core role in the running and testing of the Actor you will build, so if you haven't gotten it installed already, please refer to [this short lesson](../../glossary/tools/apify_cli.md).

@@ -1,12 +1,11 @@
---
-title: II - Managing source code
+title: Managing source code
description: Learn how to manage your Actor's source code more efficiently by integrating it with a GitHub repository. This is standard on the Apify platform.
sidebar_position: 6.2
sidebar_label: II - Managing source code
slug: /expert-scraping-with-apify/managing-source-code
---

-# Managing source code {#managing-source-code}

**Learn how to manage your Actor's source code more efficiently by integrating it with a GitHub repository. This is standard on the Apify platform.**

---
@@ -1,12 +1,11 @@
---
-title: V - Migrations & maintaining state
+title: Migrations & maintaining state
description: Learn about what Actor migrations are and how to handle them properly so that the state is not lost and runs can safely be resurrected.
sidebar_position: 6.5
sidebar_label: V - Migrations & maintaining state
slug: /expert-scraping-with-apify/migrations-maintaining-state
---

-# Migrations & maintaining state {#migrations-maintaining-state}

**Learn about what Actor migrations are and how to handle them properly so that the state is not lost and runs can safely be resurrected.**

---
@@ -1,19 +1,18 @@
---
-title: VII - Saving useful run statistics
+title: Saving useful run statistics
description: Understand how to save statistics about an Actor's run, what types of statistics you can save, and why you might want to save them for a large-scale scraper.
sidebar_position: 6.7
sidebar_label: VII - Saving useful run statistics
slug: /expert-scraping-with-apify/saving-useful-stats
---

-# Saving useful run statistics {#savings-useful-run-statistics}

**Understand how to save statistics about an Actor's run, what types of statistics you can save, and why you might want to save them for a large-scale scraper.**

---

Using Crawlee and the Apify SDK, we are now able to collect and format data coming directly from websites and save it into a Key-Value store or Dataset. This is great, but sometimes, we want to store some extra data about the run itself, or about each request. We might want to store some extra general run information separately from our results or potentially include statistics about each request within its corresponding dataset item.

-The types of values that are saved are totally up to you, but the most common are error scores, number of total saved items, number of request retries, number of captchas hit, etc. Storing these values is not always necessary, but can be valuable when debugging and maintaining an Actor. As your projects scale, this will become more and more useful and important.
+The types of values that are saved are totally up to you, but the most common are error scores, number of total saved items, number of request retries, number of CAPTCHAs hit, etc. Storing these values is not always necessary, but can be valuable when debugging and maintaining an Actor. As your projects scale, this will become more and more useful and important.

## Learning 🧠 {#learning}

@@ -1,12 +1,11 @@
---
-title: V - Handling migrations
+title: Handling migrations
description: Get real-world experience of maintaining a stateful object stored in memory, which will be persisted through migrations and even graceful aborts.
sidebar_position: 5
sidebar_label: V - Handling migrations
slug: /expert-scraping-with-apify/solutions/handling-migrations
---

-# Handling migrations {#handling-migrations}

**Get real-world experience of maintaining a stateful object stored in memory, which will be persisted through migrations and even graceful aborts.**

---
@@ -5,8 +5,6 @@ sidebar_position: 6.7
slug: /expert-scraping-with-apify/solutions
---

-# Solutions
-
**View all of the solutions for all of the activities and tasks of this course. Please try to complete each task on your own before reading the solution!**

---