Using XPath to get the text of a tag and its child tags

Date: 12-10

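This article shows how to use Python's lxml library to parse HTML and extract the full text of an `<li>` element, including the text inside its child `<code>` tags, with the XPath `string()` function. It also shows how to strip the leftover whitespace with `normalize-space()`.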

from lxml import etree

html = """
    <ol>
        <li class="">直接选取<code>ul</code>的第二个<code>children</code></li>
    </ol>
"""

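# Parse the HTML fragment into an element tree
data = etree.HTML(html)
# string() concatenates the text of <li> and of all its descendants (here the <code> tags)
result = data.xpath('string(//ol/li)')
print(result)  # 直接选取ul的第二个children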
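For comparison, the sketch below contrasts `string()` with the `text()` node test on the same sample. `text()` returns only the text nodes that sit directly under `<li>`, so the text inside the `<code>` children is missing. The `itertext()` variant is an alternative through the lxml element API, not something the original snippet uses; the variable names are illustrative.

```python
from lxml import etree

html = """
    <ol>
        <li class="">直接选取<code>ul</code>的第二个<code>children</code></li>
    </ol>
"""

data = etree.HTML(html)

# text() selects only the text nodes that are direct children of <li>,
# so the contents of the <code> tags are skipped
direct_text = data.xpath('//ol/li/text()')

# string() flattens <li> and every descendant into a single string
full_text = data.xpath('string(//ol/li)')

# Same result through the element API: join all text found under <li>
li = data.xpath('//ol/li')[0]
joined_text = ''.join(li.itertext())

print(direct_text)
print(full_text)
print(joined_text)
```

If only the direct text is wanted, `text()` is the right choice; when the text of child tags should be included, `string()` or `itertext()` is the better fit.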

from lxml import etree

# If the closing </li> is on its own line, result ends with \n plus the indentation spaces
html = """
    <ol>
        <li class="">直接选取<code>ul</code>的第二个<code>children</code>
        </li>
    </ol>
"""

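data = etree.HTML(html)
result = data.xpath('string(//ol/li)')
print(result)  # 直接选取ul的第二个children\n (followed by the indentation spaces)
# normalize-space() strips leading and trailing whitespace and collapses inner runs into one space
result = data.xpath('normalize-space(string(//ol/li))')
print(result)  # 直接选取ul的第二个children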
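One thing to watch for when the list has more than one `<li>`: in XPath 1.0, `string()` and `normalize-space()` applied to a node-set use only the first node in document order. A minimal sketch of one way to handle this, assuming a hypothetical two-item list (the sample HTML and variable names below are not from the original article), is to select the elements first and evaluate `normalize-space(.)` relative to each one:

```python
from lxml import etree

# Hypothetical sample with two list items and irregular whitespace
html = """
    <ol>
        <li>first <code>item</code>
        </li>
        <li>
            second   <code>item</code>
        </li>
    </ol>
"""

data = etree.HTML(html)

# A node-set argument is converted using its first node only,
# so this returns the text of the first <li> and ignores the second
first_only = data.xpath('normalize-space(//ol/li)')

# Select every <li>, then evaluate normalize-space(.) relative to each element
all_items = [li.xpath('normalize-space(.)') for li in data.xpath('//ol/li')]

print(first_only)
print(all_items)
```

The explicit `string()` wrapper is omitted here because `normalize-space()` already converts its argument to a string.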
