Extract Image URLs via CSS Selector from Web Page

Code introduction


This function downloads the HTML of a specified URL and returns the `src` attribute of every element that matches the given CSS selector (for example, `img`).


Technology Stack : Scrapy, requests, HtmlResponse, Selector, CSS selector

Code Type : Scrapy third-party library custom functions

Code Difficulty : Intermediate


                
                    
def extract_images_from_url(url, selector):
    """
    Fetches the page at the given URL and extracts image URLs using a CSS selector.
    
    Args:
        url (str): The URL of the web page to scrape.
        selector (str): A CSS selector to match image tags.
    
    Returns:
        list: A list of image URLs extracted from the web page.
    """
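    from scrapy.http import HtmlResponse
    import requests

    # Download the page; a timeout avoids hanging on an unresponsive server.
    response = requests.get(url, timeout=10)
    response.raise_for_status()

    # Wrap the raw bytes so Scrapy's selector API works outside a spider.
    html_response = HtmlResponse(url, body=response.content, encoding='utf-8')

    # Query the response directly so the `selector` argument is not overwritten.
    image_urls = html_response.css(selector).xpath('@src').getall()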
    
    return image_urls
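

The `src` values returned by the function are taken verbatim from the page, so they may be relative paths (`/img/a.png`) or protocol-relative (`//cdn.example.com/a.png`). A minimal sketch of resolving them against the page URL follows. The helper name `to_absolute_urls` and the choice to skip inline `data:` URIs are assumptions for illustration, not part of the original snippet.

```python
from urllib.parse import urljoin


def to_absolute_urls(base_url, srcs):
    """Resolve each src value against the page URL (hypothetical helper)."""
    absolute = []
    for src in srcs:
        src = src.strip()
        # Skip empty values and inline data: URIs, which are not fetchable URLs.
        if not src or src.startswith("data:"):
            continue
        # urljoin leaves absolute URLs unchanged and resolves relative ones.
        absolute.append(urljoin(base_url, src))
    return absolute
```

It could be applied to the result of the function above, for instance `to_absolute_urls(url, extract_images_from_url(url, 'img'))`.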