Extract Links from HTML with Scrapy Selector

Code introduction


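This function uses Scrapy's Selector with an XPath expression to extract the href value of every anchor (<a>) element in the given HTML content. The values are returned as a list of strings exactly as they appear in the markup, so relative URLs are not resolved.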


Technology Stack : Scrapy, Selector, XPath

Code Type : Scrapy Selector and XPath Extraction

Code Difficulty : Intermediate


                
                    
from scrapy import Selector


def extract_links_from_html(html_content):
    """Return the href value of every <a> element in html_content."""
    # Create a Selector object from the raw HTML string
    selector = Selector(text=html_content)

    # Select the href attribute of every anchor element;
    # getall() returns a list of strings (empty if nothing matches)
    links = selector.xpath('//a/@href').getall()

    return links