Extract Text from HTML Tags


Code introduction


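This function extracts the text of every element with a given tag name from an HTML string. It returns a list of strings, one per matching element in document order, with leading and trailing whitespace stripped. Text inside nested elements such as `<b>` or `<a>` is included. By default, it extracts text from `<p>` tags.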
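"Text" here means all of the text inside an element, not just the part before its first child. The quick lxml check below (the sample HTML is made up for illustration) shows why `itertext()` is used instead of the element's `.text` attribute:

```python
from lxml import etree

# A paragraph whose text is split across a nested <b> element.
html = "<p>Hello <b>bold</b> world</p>"
p = etree.HTML(html).xpath("//p")[0]

# .text holds only the text that appears before the first child element.
print(repr(p.text))                 # 'Hello '

# itertext() walks the whole subtree: the element's own text, the child's
# text, and the child's tail (" world").
print(repr(''.join(p.itertext())))  # 'Hello bold world'
```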


Technology Stack: lxml

Code Type: Function

Code Difficulty: Intermediate


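```python
from lxml import etree


def extract_text_from_html(html, tag='p'):
    """Return the stripped text of every <tag> element in an HTML string."""

    def extract_text(element):
        # itertext() yields the element's own text plus the text (and tails)
        # of all descendants, so nested markup such as <b> is flattened.
        return ''.join(element.itertext()).strip()

    # etree.HTML parses leniently and wraps fragments in <html><body>.
    tree = etree.HTML(html)
    if tree is None:  # etree.HTML can return None for empty input
        return []
    # Select every matching element anywhere in the document.
    elements = tree.xpath(f"//{tag}")
    return [extract_text(el) for el in elements]
```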
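The function above handles one tag name per call. If you need several tags at once (say, headings and paragraphs together), a minimal sketch could use lxml's `Element.iter()`, which accepts multiple tag names and yields matches in document order. The name `extract_text_from_tags` and its `tags` parameter are hypothetical, not part of the original snippet:

```python
from lxml import etree


def extract_text_from_tags(html, tags=('p',)):
    """Hypothetical variant: stripped text of every element whose tag is
    in `tags`, returned in document order. Pass at least one tag name;
    with none, iter() would match every element."""
    tree = etree.HTML(html)
    if tree is None:  # etree.HTML can return None for empty input
        return []
    # iter(*tags) yields elements matching any of the given names, in
    # document order, so headings and paragraphs stay interleaved.
    return [''.join(el.itertext()).strip() for el in tree.iter(*tags)]


doc = "<h1>Title</h1><p>First <a href='#'>link</a>.</p><h2>Sub</h2><p>Second</p>"
print(extract_text_from_tags(doc, tags=('h1', 'h2', 'p')))
# ['Title', 'First link.', 'Sub', 'Second']
```

Using `iter()` rather than building an XPath union from the tag names also avoids interpolating caller-supplied strings into an XPath expression.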