Extracting Links from a URL with Scrapy

Code introduction


This function uses the Scrapy library to extract all links from a given URL. It defines a small spider whose parse callback pulls every anchor's href attribute with a CSS selector, runs that spider to completion with Scrapy's CrawlerProcess, and returns the collected links as a list. Non-string input is rejected with an error message.


Technology Stack : Scrapy, scrapy.Spider, scrapy.crawler.CrawlerProcess, CSS selectors

Code Type : Crawler function

Code Difficulty : Intermediate


def extract_links_from_url(url):
    import scrapy
    from scrapy.crawler import CrawlerProcess

    if not isinstance(url, str):
        return "Error: URL must be a string."

    links = []

    class LinksSpider(scrapy.Spider):
        name = 'links_spider'
        custom_settings = {'USER_AGENT': 'Mozilla/5.0'}
        start_urls = [url]

        def parse(self, response):
            # Collect the href attribute of every anchor tag.
            links.extend(response.css('a::attr(href)').getall())

    # CrawlerProcess manages the Twisted reactor; start() blocks
    # until the crawl finishes and may only be called once per process.
    process = CrawlerProcess(settings={'LOG_ENABLED': False})
    process.crawl(LinksSpider)
    process.start()
    return links