This function uses the Scrapy library to extract all links from a given URL. It first validates that the URL is a string, then fetches the page with urllib and wraps the raw body in a Scrapy HtmlResponse so that a CSS selector can pull the href attribute of every anchor tag. Finally, it returns the extracted links as a list.
Technology Stack : Scrapy, HtmlResponse, urllib.request
Code Type : Crawler function
Code Difficulty : Intermediate
def extract_links_from_url(url):
    """Fetch the page at the given URL and return every hyperlink found in it."""
    from urllib.request import Request, urlopen
    from scrapy.http import HtmlResponse

    # Validate the input before making any network call.
    if not isinstance(url, str):
        return "Error: URL must be a string."

    # Fetch the page; a browser-like User-Agent avoids some trivial bot blocking.
    request = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urlopen(request) as raw:
        body = raw.read()

    # Wrap the raw bytes in a Scrapy HtmlResponse so CSS selectors are available.
    response = HtmlResponse(url=url, body=body)

    # Extract the href attribute of every anchor tag.
    return response.css('a::attr(href)').getall()