Extract HTML Headings with BeautifulSoup

2024-12-16 12:11:56 4 Views

Code introduction

This function takes HTML content as input, parses it with the BeautifulSoup library, extracts every heading tag (<h1> through <h6>), and returns a dictionary keyed by heading tag name that holds the heading text.

Technology Stack : Beautiful Soup

Code Type : Function

Code Difficulty : Intermediate

                
                    
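def extract_headings(html_content):
    from bs4 import BeautifulSoup, SoupStrainer

    heading_tags = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']

    # Use SoupStrainer to parse only the <h1> to <h6> tags.
    # The tag names must be passed as a single list: extra positional
    # arguments are not treated as additional tag names.
    heading_strainer = SoupStrainer(heading_tags)
    soup = BeautifulSoup(html_content, 'html.parser', parse_only=heading_strainer)

    # Group heading text by tag name so repeated tags (e.g. several <h2>)
    # are all kept instead of overwriting one another. Filtering by name
    # also skips tags nested inside a heading, such as <a> or <em>.
    headings = {}
    for tag in soup.find_all(heading_tags):
        headings.setdefault(tag.name, []).append(tag.get_text())

    return headings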
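
A dictionary keyed by tag name cannot show the order in which headings appear on the page. When that order matters (for example, to build an outline), a list of pairs may be a better fit. The sketch below is one possible variant, not part of the original snippet: the function name extract_headings_in_order, the HEADING_TAGS constant, and the sample HTML are all hypothetical and chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical constant: the six HTML heading tag names.
HEADING_TAGS = ["h1", "h2", "h3", "h4", "h5", "h6"]


def extract_headings_in_order(html_content):
    """Return (tag_name, text) pairs for every heading, in document order."""
    soup = BeautifulSoup(html_content, "html.parser")
    return [
        # A single-space separator keeps words apart when a heading
        # contains nested tags; strip=True trims surrounding whitespace.
        (tag.name, tag.get_text(" ", strip=True))
        for tag in soup.find_all(HEADING_TAGS)
    ]


# Illustrative input only.
sample_html = """
<html><body>
  <h1>Guide</h1>
  <h2>Install</h2>
  <p>Not a heading.</p>
  <h2>Usage <em>notes</em></h2>
</body></html>
"""

pairs = extract_headings_in_order(sample_html)
for name, text in pairs:
    print(name, text)
```

Both <h2> headings are kept here, and the <p> element is ignored because find_all is given only the heading tag names.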

Tags: Beautiful Soup
