Extract Headings from URL with BeautifulSoup

2024-12-07 15:47:49

Code introduction

This function takes a URL, a tag name (default 'h1'), and a parser name (default 'html.parser'). It sends an HTTP GET request to the URL with the requests library, parses the returned HTML with beautifulsoup4, finds every element with the specified tag, and returns a list containing the text of each one.

Technology Stack : beautifulsoup4, requests, HTML, HTTP

Code Type : Function

Code Difficulty : Intermediate

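import requests
from bs4 import BeautifulSoup


def extract_headings(url, tag='h1', parser='html.parser'):
    """Return the text of every <tag> element found on the page at url."""
    # Send a request to the URL; the timeout keeps the call from hanging forever
    response = requests.get(url, timeout=10)
    # Raise requests.HTTPError on a 4xx/5xx status instead of parsing an error page
    response.raise_for_status()
    # Parse the HTML content
    soup = BeautifulSoup(response.text, parser)
    # Find all elements with the specified tag name
    headings = soup.find_all(tag)
    # Return a list of headings text
    return [heading.get_text() for heading in headings]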
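
The parsing step can be tried without a network call. The sketch below uses a hypothetical helper, headings_from_html (not part of the original function), that applies the same find_all / get_text logic to an HTML string. The sample markup is made up for illustration.

```python
from bs4 import BeautifulSoup


def headings_from_html(html, tag='h1', parser='html.parser'):
    """Hypothetical offline helper: same parsing logic, no HTTP request."""
    soup = BeautifulSoup(html, parser)
    return [heading.get_text() for heading in soup.find_all(tag)]


# Made-up sample markup, used only for this illustration
SAMPLE_HTML = """
<html><body>
  <h1>Main Title</h1>
  <h2>First Section</h2>
  <p>Some text.</p>
  <h2>Second <em>Section</em></h2>
</body></html>
"""

h1_texts = headings_from_html(SAMPLE_HTML)
h2_texts = headings_from_html(SAMPLE_HTML, tag='h2')
# find_all also accepts a list of tag names; matches come back in document order
all_texts = headings_from_html(SAMPLE_HTML, tag=['h1', 'h2'])

print(h1_texts)   # ['Main Title']
print(h2_texts)   # ['First Section', 'Second Section']
print(all_texts)  # ['Main Title', 'First Section', 'Second Section']
```

Because find_all accepts a list, the same tag argument can probably be passed to extract_headings to collect several heading levels in one call, though that usage is an assumption about how the function is meant to be called rather than something the original states.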