Extract HTML Headings with BeautifulSoup


Code introduction


This function takes HTML content as input, parses it with the BeautifulSoup library, extracts every heading tag (<h1> through <h6>), and returns a dictionary keyed by heading tag name that holds the corresponding heading text.


Technology Stack : Beautiful Soup

Code Type : Function

Code Difficulty : Intermediate


                
                    
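def extract_headings(html_content):
    from bs4 import BeautifulSoup, SoupStrainer

    heading_tags = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']

    # Use SoupStrainer to parse only the <h1> to <h6> tags.
    # The tag names must be passed as one list, not as separate positional arguments.
    heading_strainer = SoupStrainer(heading_tags)
    soup = BeautifulSoup(html_content, 'html.parser', parse_only=heading_strainer)

    # Group heading text by tag name so repeated tags (e.g. several <h2>) are all kept.
    # Filtering find_all by heading_tags skips tags nested inside a heading, such as <em>.
    headings = {}
    for tag in soup.find_all(heading_tags):
        headings.setdefault(tag.name, []).append(tag.get_text())

    return headings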
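
A dictionary keyed by tag name cannot show how headings of different levels interleave on the page. As a complementary sketch, the variant below returns (tag name, text) pairs in document order. The name extract_heading_outline, the HEADING_TAGS constant, and the sample HTML are illustrative assumptions rather than part of the original snippet, and the code assumes the bs4 package is installed.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical constant: the six HTML heading tag names
HEADING_TAGS = ["h1", "h2", "h3", "h4", "h5", "h6"]


def extract_heading_outline(html_content):
    """Return (tag_name, text) pairs for all headings, in document order."""
    # parse_only limits tree building to heading elements and their children
    strainer = SoupStrainer(HEADING_TAGS)
    soup = BeautifulSoup(html_content, "html.parser", parse_only=strainer)
    # Filter by HEADING_TAGS so tags nested inside a heading are not listed separately;
    # get_text(" ", strip=True) joins the stripped text fragments with single spaces
    return [
        (tag.name, tag.get_text(" ", strip=True))
        for tag in soup.find_all(HEADING_TAGS)
    ]


# Illustrative input, not taken from the original page
sample_html = """
<html><body>
  <h1>Guide</h1>
  <p>Intro text.</p>
  <h2>Install</h2>
  <h2>Usage <em>notes</em></h2>
</body></html>
"""

outline = extract_heading_outline(sample_html)
for name, text in outline:
    print(name, text)
```

Returning a list of pairs keeps repeated tags and their order, which is usually what a table of contents needs. A dictionary remains convenient when only the text per heading level matters.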
              