Internet Explorer 对象

Set createInternetExplorerObject = CreateObject("InternetExplorer.Application")

工具>参考> Microsoft Internet Controls
相关 DLL:ieframe.dll
源:Internet Explorer 浏览器

MSDN-InternetExplorer 对象

通过自动化控制 Windows Internet Explorer 的实例。

Internet Explorer Objec 基本成员

下面的代码应该介绍 IE 对象的工作原理以及如何通过 VBA 对其进行操作。我建议单步执行它,否则在多次导航期间可能会出错。

Sub IEGetToKnow()
    Dim IE As InternetExplorer 'Reference to Microsoft Internet Controls
    Set IE = New InternetExplorer
    
    With IE
        .Visible = True 'Sets or gets a value that indicates whether the object is visible or hidden.
        
        'Navigation
        .Navigate2 "http://www.example.com" 'Navigates the browser to a location that might not be expressed as a URL, such as a PIDL for an entity in the Windows Shell namespace.
        Debug.Print .Busy 'Gets a value that indicates whether the object is engaged in a navigation or downloading operation.
        Debug.Print .ReadyState 'Gets the ready state of the object.
        .Navigate2 "http://www.example.com/2"
        .GoBack 'Navigates backward one item in the history list
        .GoForward 'Navigates forward one item in the history list.
        .GoHome 'Navigates to the current home or start page.
        .Stop 'Cancels a pending navigation or download, and stops dynamic page elements, such as background sounds and animations.
        .Refresh 'Reloads the file that is currently displayed in the object.
        
        Debug.Print .Silent 'Sets or gets a value that indicates whether the object can display dialog boxes.
        Debug.Print .Type 'Gets the user type name of the contained document object.
        
        Debug.Print .Top 'Sets or gets the coordinate of the top edge of the object.
        Debug.Print .Left 'Sets or gets the coordinate of the left edge of the object.
        Debug.Print .Height 'Sets or gets the height of the object.
        Debug.Print .Width 'Sets or gets the width of the object.
    End With
    
    IE.Quit 'close the application window
End Sub

网页搜罗

与 IE 最常见的事情是抓取网站的一些信息,或填写网站表格并提交信息。我们将看看如何做到这一点。

让我们考虑一下 example.com 的 源代码:

<!doctype html>
<html>
    <head>
        <title>Example Domain</title>
        <meta charset="utf-8" />
        <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
        <meta name="viewport" content="width=device-width, initial-scale=1" />
        <style ... </style> 
    </head>

    <body>
        <div>
            <h1>Example Domain</h1>
            <p>This domain is established to be used for illustrative examples in documents. You may use this
            domain in examples without prior coordination or asking for permission.</p>
            <p><a href="http://www.iana.org/domains/example">More information...</a></p>
        </div>
    </body>
</html>

我们可以使用下面的代码来获取和设置信息:

Sub IEWebScrape1()
    Dim IE As InternetExplorer 'Reference to Microsoft Internet Controls
    Set IE = New InternetExplorer
    
    With IE
        .Visible = True
        .Navigate2 "http://www.example.com"
        
        'we add a loop to be sure the website is loaded and ready.
        'Does not work consistently. Cannot be relied upon.
        Do While .Busy = True Or .ReadyState <> READYSTATE_COMPLETE 'Equivalent = .ReadyState <> 4
            ' DoEvents - worth considering. Know implications before you use it.
            Application.Wait (Now + TimeValue("00:00:01")) 'Wait 1 second, then check again.
        Loop
        
        'Print info in immediate window
        With .Document 'the source code HTML "below" the displayed page.
            Stop 'VBE Stop. Continue line by line to see what happens.
            Debug.Print .GetElementsByTagName("title")(0).innerHtml 'prints "Example Domain"
            Debug.Print .GetElementsByTagName("h1")(0).innerHtml 'prints "Example Domain"
            Debug.Print .GetElementsByTagName("p")(0).innerHtml 'prints "This domain is established..."
            Debug.Print .GetElementsByTagName("p")(1).innerHtml 'prints "<a href="http://www.iana.org/domains/example">More information...</a>"
            Debug.Print .GetElementsByTagName("p")(1).innerText 'prints "More information..."
            Debug.Print .GetElementsByTagName("a")(0).innerText 'prints "More information..."
            
            'We can change the localy displayed website. Don't worry about breaking the site.
            .GetElementsByTagName("title")(0).innerHtml = "Psst, scraping..."
            .GetElementsByTagName("h1")(0).innerHtml = "Let me try something fishy." 'You have just changed the local HTML of the site.
            .GetElementsByTagName("p")(0).innerHtml = "Lorem ipsum........... The End"
            .GetElementsByTagName("a")(0).innerText = "iana.org"
        End With '.document
        
        .Quit 'close the application window
    End With 'ie
    
End Sub

到底是怎么回事?这里的关键角色是 .Document ,即 HTML 源代码。我们可以应用一些查询来获取我们想要的集合或对象。
例如 IE.Document.GetElementsByTagName("title")(0).innerHtmlGetElementsByTagName 返回一个 HTML 元素集合,其中包含“ title ”标记。源代码中只有一个这样的标记。该集合从 0 开始。因此,要获得第一个元素,我们添加 (0)。现在,在我们的例子中,我们只想要 innerHtml(一个 String),而不是 Element Object 本身。所以我们指定我们想要的属性。

点击

要关注网站上的链接,我们可以使用多种方法:

Sub IEGoToPlaces()
    Dim IE As InternetExplorer 'Reference to Microsoft Internet Controls
    Set IE = New InternetExplorer
    
    With IE
        .Visible = True
        .Navigate2 "http://www.example.com"
        Stop 'VBE Stop. Continue line by line to see what happens.
        
        'Click
        .Document.GetElementsByTagName("a")(0).Click
        Stop 'VBE Stop.
        
        'Return Back
        .GoBack
        Stop 'VBE Stop.
        
        'Navigate using the href attribute in the <a> tag, or "link"
        .Navigate2 .Document.GetElementsByTagName("a")(0).href
        Stop 'VBE Stop.
        
        .Quit 'close the application window
    End With
End Sub

Microsoft HTML 对象库或 IE 最好的朋友

为了充分利用加载到 IE 中的 HTML,你可以(或应该)使用另一个库,即 Microsoft HTML Object Library 。在另一个例子中更多关于这个。

IE 主要问题

IE 的主要问题是验证页面是否已完成加载并准备好与之交互。Do While... Loop 有帮助,但不可靠。

此外,使用 IE 只是为了刮取 HTML 内容是 OVERKILL。为什么?因为浏览器是用于浏览的,即显示包含所有 CSS,JavaScripts,图片,弹出窗口等的网页。如果你只需要原始数据,请考虑不同的方法。例如,使用 XML HTTPRequest 。在另一个例子中更多关于这个。