You can use this document to learn how to prevent a page from being indexed in Google search results.
To accomplish this, you can use EdgeRules to add a 'noindex' directive to the HTTP response headers.
For the noindex directive to be effective, the page must not be blocked by a robots.txt file. If the page is blocked by a robots.txt file, the crawler will never see the noindex directive, and the page can still appear in search results, for example, if other pages link to it.
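For context, a response that carries the directive looks like the following (the status line and content type are illustrative; only the X-Robots-Tag header comes from the EdgeRule):

```
HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex
```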
To learn more, review this article from Google.
Step 1: Create a rule
- In the StackPath Control Portal, in the left-side navigation, click Sites.
- Locate and select the desired site.
- In the left-side navigation, click Edge Rules.
- Scroll down to Delivery Rules, and then click Add Delivery Rule.
- In Rule Name, enter a descriptive name for the rule.
- Create and save the following EdgeRule:
IF URL Matches Regular Expression /\.(?:pdf|html?|jpe?g|gif)$/ THEN Add Response Header X-Robots-Tag noindex
This rule can cover any number of file types: list the extensions inside the group, separated by |. A ? makes the preceding character optional, so html? matches both .htm and .html. In this example, the regex pattern /\.(?:pdf|html?|jpe?g|gif)$/ applies the rule to .pdf, .htm, .html, .jpg, .jpeg, and .gif file types.
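To see which paths the pattern matches, you can test it with Python's re module (the delimiters around the EdgeRule pattern are dropped, and the sample paths are made up for illustration):

```python
import re

# The EdgeRule pattern, without its surrounding / delimiters.
pattern = re.compile(r"\.(?:pdf|html?|jpe?g|gif)$")

# Hypothetical asset paths; only the last one falls outside the rule.
paths = ["/docs/report.pdf", "/index.html", "/photo.jpeg", "/img/logo.gif", "/styles/main.css"]
matched = [p for p in paths if pattern.search(p)]
print(matched)  # ['/docs/report.pdf', '/index.html', '/photo.jpeg', '/img/logo.gif']
```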
Step 2: Verify the rule
The newly created EdgeRule adds the X-Robots-Tag header to the CDN response. To verify the rule, review the headers of CDN-served assets with your browser's inspect tools or with cURL requests.
- For browser tools, right-click anywhere on your website, select Inspect or Inspect Element, and then navigate to the Network tab.
- If you do not see any requests, then refresh the page with the tab open to view them as they come in.
- Select an asset served from the CDN in the Network tab to reveal the headers associated with the request. The x-robots-tag header should appear here.
- For cURL requests, use the -I option to show the response headers.
- Review the following example with the x-robots-tag header highlighted.
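The round trip can also be sketched end to end with Python's standard library: a local server that mimics the CDN's noindex response, and a client request that reads the header back. The host, port, and file name here are arbitrary stand-ins, not your CDN's values.

```python
import http.server
import threading
import urllib.request

class NoindexHandler(http.server.BaseHTTPRequestHandler):
    """Responds the way the CDN would after the EdgeRule is applied."""
    def do_HEAD(self):
        self.send_response(200)
        self.send_header("X-Robots-Tag", "noindex")  # header added by the EdgeRule
        self.end_headers()
    do_GET = do_HEAD  # same headers for GET requests
    def log_message(self, *args):
        pass  # silence per-request logging

# Start the stand-in "CDN" on a random free local port.
server = http.server.HTTPServer(("127.0.0.1", 0), NoindexHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Fetch a hypothetical asset and inspect the response headers,
# much like `curl -I` would.
url = f"http://127.0.0.1:{server.server_port}/sample.pdf"
with urllib.request.urlopen(url) as resp:
    tag = resp.headers.get("X-Robots-Tag")
print(tag)  # noindex
server.shutdown()
```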