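I. The selenium library
The package is split into two main directories: webdriver, which creates browser drivers and operates on them, and common, which holds the exceptions selenium raises. The commonly used imports are listed below: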
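from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains  # chains low-level mouse/keyboard actions (hover, drag and drop, ...)
from selenium.webdriver.common.keys import Keys  # constants for special keys (Enter, Ctrl, arrow keys, ...)
from selenium.webdriver.common.by import By  # locator strategies (By.ID, By.XPATH, By.CSS_SELECTOR, ...)
from selenium.webdriver.support.ui import WebDriverWait  # explicit waits for elements
from selenium.webdriver.support import expected_conditions  # ready-made conditions to pass to WebDriverWait
import selenium.common.exceptions as sce  # all selenium exception classes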
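WebDriverWait is normally used through its until() method together with an expected_conditions predicate. The sketch below is a simplified, selenium-free version of that polling loop so it can run without a browser. The names wait_until and TimeoutException here are local stand-ins (assumptions, not selenium's API); the real until() passes the driver to the condition, polls every 0.5 s by default and by default ignores NoSuchElementException.

```python
import time


class TimeoutException(Exception):
    """Local stand-in for selenium.common.exceptions.TimeoutException."""


def wait_until(condition, timeout=10.0, poll_frequency=0.5, ignored_exceptions=()):
    """Simplified sketch of WebDriverWait(driver, timeout).until(condition).

    `condition` is called with no arguments here (the real API passes the driver).
    It is called repeatedly until it returns a truthy value, which is returned;
    otherwise TimeoutException is raised once `timeout` seconds have passed.
    """
    end_time = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored_exceptions:
            pass  # an ignored exception counts as "not yet" and polling continues
        if time.monotonic() > end_time:
            break
        time.sleep(poll_frequency)
    raise TimeoutException("condition not met within %.1f s" % timeout)


# Usage: a hypothetical condition that becomes true on the third poll.
attempts = {"n": 0}


def appears_on_third_try():
    attempts["n"] += 1
    return "element" if attempts["n"] >= 3 else None


print(wait_until(appears_on_third_try, timeout=2, poll_frequency=0.01))
```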
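II. The common directory
This directory contains only two files: exceptions.py and __init__.py.
The docstring of exceptions.py is a single sentence stating the module's purpose: it defines the exceptions that can be raised anywhere in the webdriver code.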
"""
Exceptions that may happen in all the webdriver code.
"""
The base error is WebDriverException, which inherits from Python's built-in Exception:
class WebDriverException(Exception):
    """
    Base webdriver exception.
    """

    def __init__(self, msg=None, screen=None, stacktrace=None):
        self.msg = msg
        self.screen = screen
        self.stacktrace = stacktrace

    def __str__(self):
        exception_msg = "Message: %s\n" % self.msg
        if self.screen is not None:
            exception_msg += "Screenshot: available via screen\n"
        if self.stacktrace is not None:
            stacktrace = "\n".join(self.stacktrace)
            exception_msg += "Stacktrace:\n%s" % stacktrace
        return exception_msg
Every other error class is a subclass of WebDriverException, either directly or through an intermediate subclass.
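Because every selenium error derives from WebDriverException, a single except clause on the base class catches all of them, and str() of any of them goes through the __str__ shown above. The sketch re-declares the base class locally so it runs without selenium installed; NoSuchElementException and find_or_none here are hypothetical stand-ins used only for illustration.

```python
class WebDriverException(Exception):
    """Local copy of the base class shown above."""

    def __init__(self, msg=None, screen=None, stacktrace=None):
        self.msg = msg
        self.screen = screen
        self.stacktrace = stacktrace

    def __str__(self):
        exception_msg = "Message: %s\n" % self.msg
        if self.screen is not None:
            exception_msg += "Screenshot: available via screen\n"
        if self.stacktrace is not None:
            exception_msg += "Stacktrace:\n%s" % "\n".join(self.stacktrace)
        return exception_msg


class NoSuchElementException(WebDriverException):
    """Stand-in for one of selenium's concrete subclasses."""


def find_or_none(finder):
    """Call `finder`; return None if it raises any WebDriverException subclass."""
    try:
        return finder()
    except WebDriverException:
        return None


def missing():
    raise NoSuchElementException("no such element", stacktrace=["frame 1", "frame 2"])


print(find_or_none(missing))                 # None: caught via the base class
print(str(WebDriverException("boom")), end="")
```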
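III. The webdriver directory
__init__.py re-exports the classes that can be imported directly from selenium.webdriver. The browser packages are all organized in much the same way, so these notes cover chrome plus the operation modules under common: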
from .firefox.webdriver import WebDriver as Firefox # noqa
from .firefox.firefox_profile import FirefoxProfile # noqa
from .firefox.options import Options as FirefoxOptions # noqa
from .chrome.webdriver import WebDriver as Chrome # noqa
from .chrome.options import Options as ChromeOptions # noqa
from .ie.webdriver import WebDriver as Ie # noqa
from .ie.options import Options as IeOptions # noqa
from .edge.webdriver import WebDriver as Edge # noqa
from .opera.webdriver import WebDriver as Opera # noqa
from .safari.webdriver import WebDriver as Safari # noqa
from .blackberry.webdriver import WebDriver as BlackBerry # noqa
from .phantomjs.webdriver import WebDriver as PhantomJS # noqa
from .android.webdriver import WebDriver as Android # noqa
from .webkitgtk.webdriver import WebDriver as WebKitGTK # noqa
from .webkitgtk.options import Options as WebKitGTKOptions # noqa
from .remote.webdriver import WebDriver as Remote # noqa
from .common.desired_capabilities import DesiredCapabilities # noqa
from .common.action_chains import ActionChains # noqa
from .common.touch_actions import TouchActions # noqa
from .common.proxy import Proxy # noqa
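The chrome package contains five files; apart from __init__.py they are described below.
1. webdriver
Controls the browser and performs operations on it. Its WebDriver class inherits from RemoteWebDriver (covered later). The constructor accepts the following parameters: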
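- executable_path - path to the chromedriver executable. If the default is used, the executable is assumed to be on $PATH.
- port - port the service should run on. If set to 0, a free port is chosen.
- options - a ChromeOptions instance holding the Chrome-specific settings
- service_args - list of arguments passed to the driver service
- desired_capabilities - a dictionary containing only non-browser-specific capabilities, such as "proxy" or "loggingPref".
- service_log_path - path of the driver's log file
- chrome_options - deprecated; use options instead:
if chrome_options:
    warnings.warn('use options instead of chrome_options',
                  DeprecationWarning, stacklevel=2)
    options = chrome_options
- keep_alive - whether ChromeRemoteConnection should use HTTP keep-alive.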
The options argument is not used directly. It is converted into a capabilities dictionary and merged into desired_capabilities, which is what is actually sent to the driver:
if options is None:
    # desired_capabilities stays as passed in
    if desired_capabilities is None:
        desired_capabilities = self.create_options().to_capabilities()
else:
    if desired_capabilities is None:
        desired_capabilities = options.to_capabilities()
    else:
        desired_capabilities.update(options.to_capabilities())
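The three branches above can be reproduced with plain dictionaries. This is a sketch: merge_capabilities is a hypothetical helper, options_caps stands for the result of options.to_capabilities() (None when no options were passed), and default_caps stands for what create_options().to_capabilities() would return. The capability values used are illustrative only.

```python
def merge_capabilities(options_caps, desired_capabilities=None, default_caps=None):
    """Mirror of the options/desired_capabilities branching, using dicts."""
    if options_caps is None:
        # No options: keep the caller's dict, or fall back to the defaults.
        if desired_capabilities is None:
            desired_capabilities = dict(default_caps or {})
    else:
        if desired_capabilities is None:
            desired_capabilities = options_caps
        else:
            # update() modifies the caller's dict in place, as the original does.
            desired_capabilities.update(options_caps)
    return desired_capabilities


user_caps = {"loggingPrefs": {"browser": "ALL"}}
opts_caps = {"browserName": "chrome", "goog:chromeOptions": {"args": ["--headless"]}}
merged = merge_capabilities(opts_caps, user_caps)
print(merged is user_caps)  # True: the caller's dict itself was updated
```

One consequence worth noticing: when both arguments are given, the dictionary the caller passed in is mutated, so reusing it for a second driver carries the first driver's options along.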
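Service starts and stops the driver executable (covered later). The instance is stored on self.service, because the rest of the class refers to it that way:
self.service = Service(
    executable_path,
    port=port,
    service_args=service_args,
    log_path=service_log_path)
self.service.start()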
ChromeRemoteConnection sends commands to the driver over HTTP (covered later). If initialization fails, the driver calls quit() to clean up and re-raises the exception:
try:
    RemoteWebDriver.__init__(
        self,
        command_executor=ChromeRemoteConnection(
            remote_server_addr=self.service.service_url,
            keep_alive=keep_alive),
        desired_capabilities=desired_capabilities)
except Exception:
    self.quit()
    raise
(1) The most commonly used method is quit(), which closes the browser. It calls RemoteWebDriver.quit(), ignores any error raised there, and stops the ChromeDriver service in a finally block:
def quit(self):
    """
    Closes the browser and shuts down the ChromeDriver executable
    that is started when starting the ChromeDriver
    """
    try:
        RemoteWebDriver.quit(self)
    except Exception:
        # We don't care about the message because something probably has gone wrong
        pass
    finally:
        self.service.stop()
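Together, the constructor's try/except and quit()'s try/except/finally guarantee that the chromedriver process is stopped both when the session cannot be created and when closing the session itself fails. The sketch models this with hypothetical stub classes (FakeService, RemoteWebDriverStub, ChromeLikeDriver) instead of real processes, so it only illustrates the control flow.

```python
class FakeService:
    """Hypothetical stand-in for chrome's Service; records whether it runs."""
    instances = []

    def __init__(self):
        self.running = False
        FakeService.instances.append(self)

    def start(self):
        self.running = True

    def stop(self):
        self.running = False


class RemoteWebDriverStub:
    """Hypothetical stand-in for RemoteWebDriver."""

    def __init__(self, fail_on_init=False):
        if fail_on_init:
            raise RuntimeError("could not create session")

    def quit(self):
        # Simulate a session that has already died: quitting it fails.
        raise RuntimeError("session already gone")


class ChromeLikeDriver(RemoteWebDriverStub):
    def __init__(self, fail_on_init=False):
        self.service = FakeService()
        self.service.start()
        try:
            RemoteWebDriverStub.__init__(self, fail_on_init=fail_on_init)
        except Exception:
            self.quit()  # do not leave the driver process running
            raise

    def quit(self):
        try:
            RemoteWebDriverStub.quit(self)
        except Exception:
            pass  # ignore remote errors, as the original quit() does
        finally:
            self.service.stop()


driver = ChromeLikeDriver()
driver.quit()                  # the remote error is swallowed
print(driver.service.running)  # False
```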
(2) create_options() returns a new Options object (Options is described next):
def create_options(self):
    return Options()
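2. options module
Defines the Options class, which holds the browser options. Its constructor initializes the following attributes:
self._binary_location = ''  # path to the Chrome binary
self._arguments = []  # command-line arguments for the browser, as a list
self._extension_files = []  # paths of extension files
self._extensions = []  # encoded extensions
self._experimental_options = {}  # other browser options that are not command-line args
self._debugger_address = None  # address of the browser's remote debugger
self._caps = DesiredCapabilities.CHROME.copy()  # capabilities dictionary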
DesiredCapabilities.CHROME defaults to:
CHROME = {
    "browserName": "chrome",
    "version": "",
    "platform": "ANY",
}
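The methods in the class fall into two main kinds: getters decorated with @property, and methods that modify the attributes.
(1) Getters decorated with @property return the values of the attributes above. Those attributes are all named self._xxx, which by Python convention marks them as private (although they can still be accessed from outside). Each property exposes the attribute under the name without the leading underscore, so callers can write instancename.xxx instead of instancename._xxx:
@property
def binary_location(self):
    """
    Returns the location of the binary otherwise an empty string
    """
    return self._binary_location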
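A property declared with @property alone can be read but not reassigned. The sketch below uses a hypothetical cut-down class, MiniOptions, with an arguments property to show both sides: rebinding raises AttributeError, but the getter returns the internal list object itself, so code that receives it can still mutate it.

```python
class MiniOptions:
    """Hypothetical cut-down Options class showing the @property pattern."""

    def __init__(self):
        self._arguments = []

    @property
    def arguments(self):
        """Read-only view of _arguments (no setter is defined)."""
        return self._arguments


opts = MiniOptions()
try:
    opts.arguments = ['--incognito']  # rebinding is rejected
except AttributeError:
    print("arguments has no setter")

opts.arguments.append('--incognito')  # but the returned list is the internal one
print(opts._arguments)
```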
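(2) Methods that modify the attributes
They differ according to whether the underlying attribute is a dict or a list: for a dict they assign a key (the same as updating the dict from outside), and for a list they append (the same as calling append from outside).
def set_capability(self, name, value):
    """Sets a capability."""
    self._caps[name] = value

def add_argument(self, argument):
    """
    Adds an argument to the list

    :Args:
     - Sets the arguments
    """
    if argument:
        self._arguments.append(argument)
    else:
        raise ValueError("argument can not be null")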
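A quick check of the two methods, copied into a hypothetical stripped-down class so it runs without selenium. The capability name acceptInsecureCerts is just an illustrative value; note that an empty string counts as "null" and is rejected.

```python
class MiniOptions:
    """Hypothetical stripped-down Options with the two mutators above."""

    def __init__(self):
        self._arguments = []
        self._caps = {"browserName": "chrome", "version": "", "platform": "ANY"}

    def set_capability(self, name, value):
        self._caps[name] = value  # dict: add or overwrite one key

    def add_argument(self, argument):
        if argument:
            self._arguments.append(argument)  # list: append
        else:
            raise ValueError("argument can not be null")


opts = MiniOptions()
opts.add_argument("--window-size=1280,800")
opts.set_capability("acceptInsecureCerts", True)
print(opts._arguments, opts._caps["acceptInsecureCerts"])
```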
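One of these is headless. It is not an attribute initialized in __init__: the headless property simply reports whether '--headless' is in _arguments, so it becomes True either when '--headless' is passed to add_argument or when the headless property is set to True (the deprecated set_headless() does the same).
headless is also writable, because it defines an @headless.setter (a property declared with @property alone is read-only). On Windows, setting it to True also adds '--disable-gpu'; setting it to False removes those flags again.
@property
def headless(self):
    """
    Returns whether or not the headless argument is set
    """
    return '--headless' in self._arguments

@headless.setter
def headless(self, value):
    """
    Sets the headless argument

    Args:
      value: boolean value indicating to set the headless option
    """
    args = {'--headless'}
    if platform.system().lower() == 'windows':
        args.add('--disable-gpu')
    if value is True:
        self._arguments.extend(args)
    else:
        self._arguments = list(set(self._arguments) - args)

def set_headless(self, headless=True):
    """ Deprecated, options.headless = True """
    warnings.warn('use setter for headless property instead of set_headless',
                  DeprecationWarning, stacklevel=2)
    self.headless = headless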
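The sketch below copies the getter and setter into a hypothetical class, HeadlessDemo, to show the two ways of turning headless on and one side effect of the setter: turning it off rebuilds _arguments through a set, so duplicates are dropped and the original order is not guaranteed.

```python
import platform


class HeadlessDemo:
    """Hypothetical cut-down Options keeping only _arguments and headless."""

    def __init__(self):
        self._arguments = []

    def add_argument(self, argument):
        if not argument:
            raise ValueError("argument can not be null")
        self._arguments.append(argument)

    @property
    def headless(self):
        # Derived from _arguments; there is no separate _headless attribute.
        return '--headless' in self._arguments

    @headless.setter
    def headless(self, value):
        args = {'--headless'}
        if platform.system().lower() == 'windows':
            args.add('--disable-gpu')
        if value is True:
            self._arguments.extend(args)
        else:
            # Rebuilding through a set drops duplicates and does not keep order.
            self._arguments = list(set(self._arguments) - args)


a = HeadlessDemo()
a.add_argument('--headless')     # enabled through the argument list
print(a.headless)                # True

b = HeadlessDemo()
b.add_argument('--lang=en')
b.headless = True                # enabled through the setter
b.headless = False               # and removed again
print(b.headless, b._arguments)  # False ['--lang=en']
```

Because the setter tests `value is True`, any other value, even a truthy one such as 1, takes the removal branch.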
to_capabilities() is the method webdriver uses to convert the options into capabilities; it returns a dictionary:
def to_capabilities(self):
    """
    Creates a capabilities with all the options that have been set and
    returns a dictionary with everything
    """
    caps = self._caps
    chrome_options = self.experimental_options.copy()
    chrome_options["extensions"] = self.extensions
    if self.binary_location:
        chrome_options["binary"] = self.binary_location
    chrome_options["args"] = self.arguments
    if self.debugger_address:
        chrome_options["debuggerAddress"] = self.debugger_address
    caps[self.KEY] = chrome_options
    return caps
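To see the shape of the resulting dictionary, here is a selenium-free sketch with plain attributes in place of the properties. The KEY value "goog:chromeOptions" is an assumption about what recent selenium 3.x releases use; the debugger address is illustrative. The sketch also makes visible that the method returns self._caps itself rather than a copy, so the returned dict and the options object share state.

```python
CHROME = {"browserName": "chrome", "version": "", "platform": "ANY"}


class MiniOptions:
    """Hypothetical stand-in for chrome Options, using plain attributes."""
    KEY = "goog:chromeOptions"  # assumed key name

    def __init__(self):
        self.binary_location = ''
        self.arguments = []
        self.extensions = []
        self.experimental_options = {}
        self.debugger_address = None
        self._caps = CHROME.copy()

    def to_capabilities(self):
        caps = self._caps  # the same dict object, not a copy
        chrome_options = self.experimental_options.copy()
        chrome_options["extensions"] = self.extensions
        if self.binary_location:
            chrome_options["binary"] = self.binary_location
        chrome_options["args"] = self.arguments
        if self.debugger_address:
            chrome_options["debuggerAddress"] = self.debugger_address
        caps[self.KEY] = chrome_options
        return caps


opts = MiniOptions()
opts.arguments.append("--headless")
opts.debugger_address = "127.0.0.1:9222"
caps = opts.to_capabilities()
print(caps)
```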
3. remote_connection module
Defines the ChromeRemoteConnection class, which inherits from RemoteConnection (covered later). Its constructor adds Chromium-specific entries to the command table, each mapping a command name to an HTTP method and a URL template:
def __init__(self, remote_server_addr, keep_alive=True):
    RemoteConnection.__init__(self, remote_server_addr, keep_alive)
    self._commands["launchApp"] = ('POST', '/session/$sessionId/chromium/launch_app')
    self._commands["setNetworkConditions"] = ('POST', '/session/$sessionId/chromium/network_conditions')
    self._commands["getNetworkConditions"] = ('GET', '/session/$sessionId/chromium/network_conditions')
    self._commands['executeCdpCommand'] = ('POST', '/session/$sessionId/goog/cdp/execute')
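When a command is executed, the $sessionId placeholder in the URL template has to be filled in before the HTTP request is sent. The sketch assumes this is done with string.Template, which matches the $name syntax; build_request, the COMMANDS excerpt and the base URL (chromedriver's usual port 9515) are illustrative assumptions, and no request is actually sent.

```python
import string

# Hypothetical excerpt of a command table, using entries from the listing above.
COMMANDS = {
    "launchApp": ("POST", "/session/$sessionId/chromium/launch_app"),
    "getNetworkConditions": ("GET", "/session/$sessionId/chromium/network_conditions"),
    "executeCdpCommand": ("POST", "/session/$sessionId/goog/cdp/execute"),
}


def build_request(command, params, base_url="http://127.0.0.1:9515"):
    """Resolve a command name into (HTTP method, full URL)."""
    method, path = COMMANDS[command]
    # Replace $sessionId (and any other $placeholders) from params.
    path = string.Template(path).substitute(params)
    return method, base_url + path


print(build_request("executeCdpCommand", {"sessionId": "abc123"}))
```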
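4. service module
As mentioned under webdriver, Service starts and stops the driver executable. It inherits from selenium's common service.Service base class (covered later). Its constructor accepts:
- executable_path : path to the browser driver executable
- port : service port; the default matches the one in the webdriver module
- service_args : arguments passed to the browser driver
- log_path : path of the log file written while the driver runs
__init__ does nothing beyond initializing the service: it appends a --log-path argument when log_path is given and then calls the base class constructor: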
self.service_args = service_args or []
if log_path:
    self.service_args.append('--log-path=%s' % log_path)

service.Service.__init__(self, executable_path, port=port, env=env,
                         start_error_message="Please see https://sites.google.com/a/chromium.org/chromedriver/home")
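The arguments collected here end up on the chromedriver command line when the service starts. The sketch assumes the command is built as the executable path, then '--port=N', then service_args, which is how the selenium 3 sources appear to do it; build_command is a hypothetical helper and no process is launched. Unlike the original, it copies service_args, because `service_args or []` followed by append() modifies the caller's list whenever a non-empty list is passed.

```python
def build_command(executable_path, port, service_args=None, log_path=None):
    """Assemble the argv a chromedriver service would be launched with (sketch)."""
    args = list(service_args or [])  # copy, so the caller's list is not modified
    if log_path:
        args.append('--log-path=%s' % log_path)
    return [executable_path, '--port=%d' % port] + args


user_args = ['--verbose']
cmd = build_command('chromedriver', 9515, user_args, '/tmp/chromedriver.log')
print(cmd)
print(user_args)  # still ['--verbose']
```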