Extracting Links from HTML Content using BeautifulSoup

Code introduction


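This function takes HTML content and a tag name (default 'a') as arguments. It uses the beautifulsoup4 library to parse the HTML, restricting parsing to the specified tag with SoupStrainer, and finds all instances of that tag. It then returns a list of the href attribute values of those elements, skipping any element that has no href. The values are returned exactly as written in the markup, so relative URLs stay relative.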


Technology Stack : beautifulsoup4

Code Type : Function

Code Difficulty : Intermediate


                
                    
from bs4 import BeautifulSoup, SoupStrainer


def find_all_links(html_content, tag='a'):
    """Return the href values of every `tag` element found in html_content."""
    # Parse only the specified tag; SoupStrainer makes the parser skip everything else
    tag_filter = SoupStrainer(tag)
    soup = BeautifulSoup(html_content, 'html.parser', parse_only=tag_filter)

    # Find every occurrence of the specified tag in the filtered soup
    links = soup.find_all(tag)

    # Keep the href of each element, skipping elements that have no href attribute
    return [link['href'] for link in links if link.has_attr('href')]
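
The function above can be exercised with a short usage sketch. The sample markup and the base URL below are hypothetical and exist only for illustration. The function is repeated so the example runs on its own. The last step, resolving relative links with urllib.parse.urljoin, is an optional addition and not part of the original function.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup, SoupStrainer


def find_all_links(html_content, tag='a'):
    # Same logic as the function above, repeated so this sketch is self-contained
    soup = BeautifulSoup(html_content, 'html.parser', parse_only=SoupStrainer(tag))
    return [el.get('href') for el in soup.find_all(tag) if el.get('href') is not None]


# Hypothetical sample markup (assumption, for illustration only)
sample_html = """
<html>
  <head><link rel="stylesheet" href="/static/site.css"></head>
  <body>
    <a href="https://example.com/docs">Docs</a>
    <a name="top">Anchor without href</a>
    <a href="/about">About</a>
  </body>
</html>
"""

# Default tag 'a': the anchor without an href is skipped
anchor_links = find_all_links(sample_html)

# Any other tag that carries an href, such as <link>, works the same way
stylesheet_links = find_all_links(sample_html, tag='link')

# Optional: resolve relative hrefs against a hypothetical base URL
base_url = 'https://example.com/'
absolute_links = [urljoin(base_url, href) for href in anchor_links]

print(anchor_links)
print(stylesheet_links)
print(absolute_links)
```

Because parsing is limited to one tag per call, collecting hrefs from several tag types means calling the function once per tag.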