Extract HTML Headings with BeautifulSoup

Code introduction


This function uses the Beautiful Soup library to find every element with a given tag name (default 'h1') in a piece of HTML and returns the text of those headings as a list of strings, in the order they appear in the document.


Technology Stack: Beautiful Soup

Code Type: Function

Code Difficulty: Intermediate


```python
from bs4 import BeautifulSoup


def extract_headings(html_content, tag_name='h1'):
    """
    Extract headings from HTML content using BeautifulSoup.

    Args:
        html_content (str): The HTML content to parse.
        tag_name (str): The tag name of the headings to extract. Default is 'h1'.

    Returns:
        list: A list of strings containing the text of the extracted headings.
    """
    # Parse with Python's built-in parser, so no extra parser package is required.
    soup = BeautifulSoup(html_content, 'html.parser')
    # Collect every element with the requested tag name, in document order.
    headings = soup.find_all(tag_name)
    # Keep only the text inside each heading; nested tags are flattened.
    return [heading.get_text() for heading in headings]
```
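A minimal usage sketch follows. The sample HTML is made up for illustration, and the function is repeated so the snippet runs on its own. The last call passes a list as tag_name. This relies on Beautiful Soup's find_all() accepting a list of tag names as well as a single string, even though the docstring only mentions str.

```python
from bs4 import BeautifulSoup


def extract_headings(html_content, tag_name='h1'):
    """Return the text of every element matching tag_name, in document order."""
    soup = BeautifulSoup(html_content, 'html.parser')
    return [heading.get_text() for heading in soup.find_all(tag_name)]


# Hypothetical sample page, used only for this demonstration.
SAMPLE_HTML = """
<html>
  <body>
    <h1>Getting Started</h1>
    <p>Intro text.</p>
    <h2>Installation</h2>
    <h2>Usage</h2>
    <h1>Reference</h1>
  </body>
</html>
"""

if __name__ == '__main__':
    # Default tag_name: only <h1> elements are returned.
    print(extract_headings(SAMPLE_HTML))
    # A different heading level, selected through tag_name.
    print(extract_headings(SAMPLE_HTML, 'h2'))
    # find_all() also accepts a list of names; matches keep document order.
    print(extract_headings(SAMPLE_HTML, ['h1', 'h2']))
```

The three calls print ['Getting Started', 'Reference'], ['Installation', 'Usage'] and ['Getting Started', 'Installation', 'Usage', 'Reference']. On real pages, heading text often carries surrounding whitespace or newlines. Calling heading.get_text().strip() inside the list comprehension is one simple way to trim it.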