Date: 2020-05-24 | Source: 電腦系統城 | Author: 電腦系統城
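Selenium is an umbrella project for a range of tools and libraries that enable web browser automation, and the actions it performs during automation behave just like those of a real user.
Selenium has three major versions: Selenium 1.0, Selenium 2.0 and Selenium 3.0.
Selenium 1.0 worked mainly by injecting JavaScript into the browser. Selenium's original author, Jason Huggins, first wrote JavaScriptTestRunner as a testing tool and demoed it to several colleagues (he is quite an interesting person, by the way). As the name suggests, it ran its tests in JavaScript. That tool is Selenium's "predecessor".
Selenium 2.0 manipulates page elements through the API provided by WebDriver. WebDriver can be seen as a testing framework, or as an integrated library of API interfaces.
Selenium 3.0 builds on Selenium 2.0 and differs little from it in the basics; this article uses Selenium 3.0 throughout.
The official introduction describes browser support as follows: "Through WebDriver, Selenium supports all major browsers on the market such as Chrom(ium), Firefox, Internet Explorer, Opera and Safari."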
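Once the environment is installed, let's start simply by using Selenium to make the browser open the CSDN website.
One thing to watch when setting up the environment: the browser driver (for example chromedriver) must be on the system PATH, or be placed in your Python installation's root directory.
First, import webdriver: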
from selenium.webdriver import Chrome
Alternatively:
from selenium import webdriver
Which import style you use is a matter of preference; each one then creates the instance slightly differently:
from selenium.webdriver import Chrome
driver = Chrome()
or
from selenium import webdriver
driver = webdriver.Chrome()
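General Python syntax is not explained further below.
As mentioned earlier, the driver needs to be on the system PATH. If for some reason its location cannot be added to the PATH, here is a workaround: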
from selenium import webdriver
driver = webdriver.Chrome(executable_path=r'F:\python\dr\chromedriver_win32\chromedriver.exe')
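Here executable_path gives the location of the driver; this is where my copy of the driver lives. You can put the driver wherever you like and pass its path more flexibly (a relative path, for example); this article uses an absolute path for clarity.
Firefox: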
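If you take the executable_path route, it can help to look on PATH first and only fall back to an explicit location. The sketch below is a hypothetical helper (resolve_driver is not part of Selenium) built on the standard library's shutil.which, which approximates the PATH lookup the operating system performs when the driver is launched; the fallback path is whatever you choose.

```python
import os
import shutil
from typing import Optional


def resolve_driver(name: str = "chromedriver", fallback: Optional[str] = None) -> str:
    """Hypothetical helper: find a driver on PATH, else use an explicit file path."""
    found = shutil.which(name)  # full path if `name` is an executable on PATH, else None
    if found:
        return found
    if fallback and os.path.isfile(fallback):
        return fallback
    raise FileNotFoundError(
        "%r is not on PATH and no valid fallback path was given" % name)
```

With Selenium 3 the result could be passed straight through, e.g. webdriver.Chrome(executable_path=resolve_driver(fallback=r'F:\python\dr\chromedriver_win32\chromedriver.exe')).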
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://www.csdn.net")
Chrome:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("http://www.csdn.net")
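Firefox and Chrome differ only in how the driver is instantiated; every other operation is the same.
Import webdriver at the top of the script, instantiate a browser object, and then call its get method to request the URL and open the page.
Now let's look at how this is implemented, starting with the Chrome driver's webdriver.py (selenium/webdriver/chrome/webdriver.py, the class that webdriver.Chrome refers to):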
import warnings
from selenium.webdriver.remote.webdriver import WebDriver as RemoteWebDriver
from .remote_connection import ChromeRemoteConnection
from .service import Service
from .options import Options
class WebDriver(RemoteWebDriver):
    """
    Controls the ChromeDriver and allows you to drive the browser.
    You will need to download the ChromeDriver executable from
    http://chromedriver.storage.googleapis.com/index.html
    """
    def __init__(self, executable_path="chromedriver", port=0,
                 options=None, service_args=None,
                 desired_capabilities=None, service_log_path=None,
                 chrome_options=None, keep_alive=True):
        """
        Creates a new instance of the chrome driver.
        Starts the service and then creates new instance of chrome driver.
        :Args:
         - executable_path - path to the executable. If the default is used it assumes the executable is in the $PATH
         - port - port you would like the service to run, if left as 0, a free port will be found.
         - options - this takes an instance of ChromeOptions
         - service_args - List of args to pass to the driver service
         - desired_capabilities - Dictionary object with non-browser specific
           capabilities only, such as "proxy" or "loggingPref".
         - service_log_path - Where to log information from the driver.
         - chrome_options - Deprecated argument for options
         - keep_alive - Whether to configure ChromeRemoteConnection to use HTTP keep-alive.
        """
        if chrome_options:
            warnings.warn('use options instead of chrome_options',
                          DeprecationWarning, stacklevel=2)
            options = chrome_options
        if options is None:
            # desired_capabilities stays as passed in
            if desired_capabilities is None:
                desired_capabilities = self.create_options().to_capabilities()
        else:
            if desired_capabilities is None:
                desired_capabilities = options.to_capabilities()
            else:
                desired_capabilities.update(options.to_capabilities())
        self.service = Service(
            executable_path,
            port=port,
            service_args=service_args,
            log_path=service_log_path)
        self.service.start()
        try:
            RemoteWebDriver.__init__(
                self,
                command_executor=ChromeRemoteConnection(
                    remote_server_addr=self.service.service_url,
                    keep_alive=keep_alive),
                desired_capabilities=desired_capabilities)
        except Exception:
            self.quit()
            raise
        self._is_remote = False
    def launch_app(self, id):
        """Launches Chrome app specified by id."""
        return self.execute("launchApp", {'id': id})
    def get_network_conditions(self):
        return self.execute("getNetworkConditions")['value']
    def set_network_conditions(self, **network_conditions):
        self.execute("setNetworkConditions", {
            'network_conditions': network_conditions
        })
    def execute_cdp_cmd(self, cmd, cmd_args):
        return self.execute("executeCdpCommand", {'cmd': cmd, 'params': cmd_args})['value']
    def quit(self):
        try:
            RemoteWebDriver.quit(self)
        except Exception:
            # We don't care about the message because something probably has gone wrong
            pass
        finally:
            self.service.stop()
    def create_options(self):
        return Options()
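The docstring explains that __init__ "starts the service and then creates new instance of chrome driver". Of the parameters it lists, the only one this article uses is executable_path: the path to the driver executable, which is assumed to be on $PATH if the default is kept.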
Starting the driver service is an essential step in Selenium automation. In the __init__ method we find the following code:
self.service = Service(
    executable_path,
    port=port,
    service_args=service_args,
    log_path=service_log_path)
self.service.start()
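This code instantiates the Service class with the relevant arguments and then starts the service. The most important argument here is executable_path, i.e. the driver to launch. Here is the Service class (selenium.webdriver.chrome.service):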
from selenium.webdriver.common import service
class Service(service.Service):
    """
    Object that manages the starting and stopping of the ChromeDriver
    """
    def __init__(self, executable_path, port=0, service_args=None,
                 log_path=None, env=None):
        """
        Creates a new instance of the Service
        :Args:
         - executable_path : Path to the ChromeDriver
         - port : Port the service is running on
         - service_args : List of args to pass to the chromedriver service
         - log_path : Path for the chromedriver service to log to"""
        self.service_args = service_args or []
        if log_path:
            self.service_args.append('--log-path=%s' % log_path)
        service.Service.__init__(self, executable_path, port=port, env=env,
                                 start_error_message="Please see https://sites.google.com/a/chromium.org/chromedriver/home")
    def command_line_args(self):
        return ["--port=%d" % self.port] + self.service_args
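command_line_args() shows that the driver is launched with --port=&lt;port&gt; followed by any service_args, and the WebDriver docstring says a free port is found when port is left as 0. The stdlib-only sketch below rebuilds that command list; build_command and free_port are hypothetical names, and binding to port 0 is the usual way to ask the OS for an unused port (an assumption about the technique, not a quote of Selenium's helper).

```python
import socket
from typing import List, Optional


def free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        # The port is released when the socket closes, so another process could
        # in principle grab it before the driver starts; acceptable for a sketch.
        return s.getsockname()[1]


def build_command(executable_path: str, port: int = 0,
                  service_args: Optional[List[str]] = None,
                  log_path: Optional[str] = None) -> List[str]:
    """Rebuild the argv that Service.start() hands to subprocess.Popen."""
    args = list(service_args or [])  # copy, so the caller's list is not mutated
    if log_path:
        args.append("--log-path=%s" % log_path)
    if port == 0:
        port = free_port()
    return [executable_path, "--port=%d" % port] + args
```

Unlike the original, which appends --log-path to the very list it was given, this version copies service_args first, so calling it twice with the same list does not accumulate duplicate flags.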
And here is the start method of its base class (the base class is too long to show in full; it lives in selenium.webdriver.common.service):
def start(self):
    """
    Starts the Service.
    :Exceptions:
     - WebDriverException : Raised either when it can't start the service
       or when it can't connect to the service
    """
    try:
        cmd = [self.path]
        cmd.extend(self.command_line_args())
        self.process = subprocess.Popen(cmd, env=self.env,
                                        close_fds=platform.system() != 'Windows',
                                        stdout=self.log_file,
                                        stderr=self.log_file,
                                        stdin=PIPE)
    except TypeError:
        raise
    except OSError as err:
        if err.errno == errno.ENOENT:
            raise WebDriverException(
                "'%s' executable needs to be in PATH. %s" % (
                    os.path.basename(self.path), self.start_error_message)
            )
        elif err.errno == errno.EACCES:
            raise WebDriverException(
                "'%s' executable may have wrong permissions. %s" % (
                    os.path.basename(self.path), self.start_error_message)
            )
        else:
            raise
    except Exception as e:
        raise WebDriverException(
            "The executable %s needs to be available in the path. %s\n%s" %
            (os.path.basename(self.path), self.start_error_message, str(e)))
    count = 0
    while True:
        self.assert_process_still_running()
        if self.is_connectable():
            break
        count += 1
        time.sleep(1)
        if count == 30:
            raise WebDriverException("Can not connect to the Service %s" % self.path)
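start() launches the driver executable as a child process (subprocess.Popen), passing it the arguments built by command_line_args(). If the launch fails, the OSError is turned into a WebDriverException with a hint; a missing executable (ENOENT), for instance, produces the familiar "executable needs to be in PATH" error. It then checks once per second whether the driver's port accepts connections and gives up after 30 attempts. Note that this only starts the driver; the browser itself is opened afterwards, when RemoteWebDriver.__init__ creates a new session through the driver.
With the exception handling covered, we now know how Selenium starts the service. Next, let's follow what happens when get requests a URL.
In the WebDriver base class (selenium.webdriver.remote.webdriver) we find the get method:
def get(self, url):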
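The launch-then-poll pattern in start() can be reproduced with the standard library alone. In this sketch ServiceStartError is a hypothetical stand-in for WebDriverException, is_connectable is a plain TCP connect check standing in for the Service method of the same name, and the 30 attempts with a 1-second delay mirror the loop above.

```python
import errno
import os
import socket
import subprocess
import time
from typing import List


class ServiceStartError(Exception):
    """Hypothetical stand-in for selenium's WebDriverException."""


def is_connectable(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=1):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


def start_service(cmd: List[str], port: int,
                  attempts: int = 30, delay: float = 1.0) -> subprocess.Popen:
    """Launch a driver process, then poll its port until it accepts connections."""
    try:
        process = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                                   stderr=subprocess.DEVNULL, stdin=subprocess.PIPE)
    except OSError as err:
        name = os.path.basename(cmd[0])
        if err.errno == errno.ENOENT:
            raise ServiceStartError("'%s' executable needs to be in PATH." % name) from err
        if err.errno == errno.EACCES:
            raise ServiceStartError("'%s' executable may have wrong permissions." % name) from err
        raise
    for _ in range(attempts):
        if process.poll() is not None:  # plays the role of assert_process_still_running()
            raise ServiceStartError("Service %s unexpectedly exited" % cmd[0])
        if is_connectable(port):
            return process
        time.sleep(delay)
    process.kill()
    process.wait()
    raise ServiceStartError("Can not connect to the Service %s" % cmd[0])
```

With a real chromedriver this might be called as start_service(["chromedriver", "--port=9515"], 9515); the port polled must be the one passed on the command line.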
    """
    Loads a web page in the current browser session.
    """
    self.execute(Command.GET, {'url': url})
def execute(self, driver_command, params=None):
    """
    Sends a command to be executed by a command.CommandExecutor.
    :Args:
     - driver_command: The name of the command to execute as a string.
     - params: A dictionary of named parameters to send with the command.
    :Returns:
      The command's JSON response loaded into a dictionary object.
    """
    if self.session_id is not None:
        if not params:
            params = {'sessionId': self.session_id}
        elif 'sessionId' not in params:
            params['sessionId'] = self.session_id
    params = self._wrap_value(params)
    response = self.command_executor.execute(driver_command, params)
    if response:
        self.error_handler.check_response(response)
        response['value'] = self._unwrap_value(
            response.get('value', None))
        return response
    # If the server doesn't send a response, assume the command was
    # a success
    return {'success': 0, 'value': None, 'sessionId': self.session_id}
So get simply calls execute, passing Command.GET and the URL.
Command.GET is defined in the Command class (selenium.webdriver.remote.command), which holds the constants for the standard WebDriver commands. The GET constant is:
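The session-ID handling in execute is easy to overlook. Here is a stripped-down, hypothetical model of it: MiniDriver and RecordingExecutor are illustrative names, and the executor only records the calls it receives instead of sending HTTP requests. It shows that get(url) ends up as the command "get" with both the URL and the session ID in params.

```python
from typing import Any, Dict, List, Optional, Tuple


class RecordingExecutor:
    """Hypothetical fake command executor: records calls instead of sending HTTP."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def execute(self, command: str, params: Optional[Dict[str, Any]]) -> Dict[str, Any]:
        self.calls.append((command, dict(params or {})))
        return {"status": 0, "value": None}


class MiniDriver:
    """Hypothetical model of RemoteWebDriver.execute() and get()."""

    def __init__(self, executor: RecordingExecutor, session_id: Optional[str] = None) -> None:
        self.command_executor = executor
        self.session_id = session_id

    def execute(self, driver_command: str,
                params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        # Same rule as the real method: attach the session ID unless it is already there.
        if self.session_id is not None:
            if not params:
                params = {"sessionId": self.session_id}
            elif "sessionId" not in params:
                params["sessionId"] = self.session_id
        response = self.command_executor.execute(driver_command, params)
        if response:
            return response
        # No response from the server is treated as success.
        return {"success": 0, "value": None, "sessionId": self.session_id}

    def get(self, url: str) -> None:
        self.execute("get", {"url": url})  # "get" is the value of Command.GET
```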
GET = "get"
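Judging from this file, it only defines the names of the commands to be executed; the commands themselves are sent elsewhere.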
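To sum up the flow so far: instantiating webdriver.Chrome() starts the driver through Service.start() and then calls RemoteWebDriver.__init__ with a ChromeRemoteConnection as its command_executor; calling driver.get(url) in turn calls self.execute(Command.GET, {'url': url}).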
The core of execute is:
params = self._wrap_value(params)
response = self.command_executor.execute(driver_command, params)
if response:
    self.error_handler.check_response(response)
    response['value'] = self._unwrap_value(
        response.get('value', None))
    return response
The line to focus on is:
self.command_executor.execute(driver_command, params)
command_executor is the object set up during initialization. In the derived class (selenium.webdriver.chrome.webdriver), it is instantiated as:
RemoteWebDriver.__init__(
    self,
    command_executor=ChromeRemoteConnection(
        remote_server_addr=self.service.service_url,
        keep_alive=keep_alive),
    desired_capabilities=desired_capabilities)
Here is the ChromeRemoteConnection class (selenium.webdriver.chrome.remote_connection):
from selenium.webdriver.remote.remote_connection import RemoteConnection
class ChromeRemoteConnection(RemoteConnection):
    def __init__(self, remote_server_addr, keep_alive=True):
        RemoteConnection.__init__(self, remote_server_addr, keep_alive)
        self._commands["launchApp"] = ('POST', '/session/$sessionId/chromium/launch_app')
        self._commands["setNetworkConditions"] = ('POST', '/session/$sessionId/chromium/network_conditions')
        self._commands["getNetworkConditions"] = ('GET', '/session/$sessionId/chromium/network_conditions')
        self._commands['executeCdpCommand'] = ('POST', '/session/$sessionId/goog/cdp/execute')
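ChromeRemoteConnection only adds entries to self._commands, a table that maps each command name to an HTTP method and a URL path template containing $sessionId. The sketch below models that table and fills in the path with the standard library's string.Template, the same class the base-class execute uses. The "get" entry reflects, to my knowledge, how Selenium 3 maps Command.GET (POST /session/$sessionId/url, the W3C "Navigate To" endpoint); the dictionaries and build_request are hypothetical.

```python
import json
import string
from typing import Any, Dict, Tuple

# Hypothetical miniature of RemoteConnection._commands: name -> (HTTP method, path template).
BASE_COMMANDS: Dict[str, Tuple[str, str]] = {
    "get": ("POST", "/session/$sessionId/url"),
}

# A Chrome-specific table extends the base one, as ChromeRemoteConnection does.
CHROME_COMMANDS = dict(BASE_COMMANDS)
CHROME_COMMANDS["launchApp"] = ("POST", "/session/$sessionId/chromium/launch_app")


def build_request(base_url: str, command: str, params: Dict[str, Any],
                  commands: Dict[str, Tuple[str, str]] = CHROME_COMMANDS,
                  w3c: bool = True) -> Tuple[str, str, str]:
    """Return (method, url, json_body), prepared the way RemoteConnection.execute() does."""
    method, template = commands[command]
    path = string.Template(template).substitute(params)  # fills in $sessionId
    body_params = dict(params)
    if w3c:
        body_params.pop("sessionId", None)  # the session ID already lives in the URL
    return method, base_url + path, json.dumps(body_params)
```

In the real class the base URL is the driver's service_url; http://127.0.0.1:9515 in the checks below is only an illustrative address.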
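Its __init__ calls the base class's initializer and then registers a few Chrome-specific commands. The execute method comes from the base class RemoteConnection (selenium.webdriver.remote.remote_connection) and is implemented as: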
def execute(self, command, params):
    """
    Send a command to the remote server.
    Any path subtitutions required for the URL mapped to the command should be
    included in the command parameters.
    :Args:
     - command - A string specifying the command to execute.
     - params - A dictionary of named parameters to send with the command as
       its JSON payload.
    """
    command_info = self._commands[command]
    assert command_info is not None, 'Unrecognised command %s' % command
    path = string.Template(command_info[1]).substitute(params)
    if hasattr(self, 'w3c') and self.w3c and isinstance(params, dict) and 'sessionId' in params:
        del params['sessionId']
    data = utils.dump_json(params)
    url = '%s%s' % (self._url, path)
    return self._request(command_info[0], url, body=data)
def _request(self, method, url, body=None):
    """
    Send an HTTP request to the remote server.
    :Args:
     - method - A string for the HTTP method to send the request with.
     - url - A string for the URL to send the request to.
     - body - A string for request body. Ignored unless method is POST or PUT.
    :Returns:
      A dictionary with the server's parsed JSON response.
    """
    LOGGER.debug('%s %s %s' % (method, url, body))
    parsed_url = parse.urlparse(url)
    headers = self.get_remote_connection_headers(parsed_url, self.keep_alive)
    resp = None
    if body and method != 'POST' and method != 'PUT':
        body = None
    if self.keep_alive:
        resp = self._conn.request(method, url, body=body, headers=headers)
        statuscode = resp.status
    else:
        http = urllib3.PoolManager(timeout=self._timeout)
        resp = http.request(method, url, body=body, headers=headers)
        statuscode = resp.status
        if not hasattr(resp, 'getheader'):
            if hasattr(resp.headers, 'getheader'):
                resp.getheader = lambda x: resp.headers.getheader(x)
            elif hasattr(resp.headers, 'get'):
                resp.getheader = lambda x: resp.headers.get(x)
    data = resp.data.decode('UTF-8')
    try:
        if 300 <= statuscode < 304:
            return self._request('GET', resp.getheader('location'))
        if 399 < statuscode <= 500:
            return {'status': statuscode, 'value': data}
        content_type = []
        if resp.getheader('Content-Type') is not None:
            content_type = resp.getheader('Content-Type').split(';')
        if not any([x.startswith('image/png') for x in content_type]):
            try:
                data = utils.load_json(data.strip())
            except ValueError:
                if 199 < statuscode < 300:
                    status = ErrorCode.SUCCESS
                else:
                    status = ErrorCode.UNKNOWN_ERROR
                return {'status': status, 'value': data.strip()}
            # Some of the drivers incorrectly return a response
            # with no 'value' field when they should return null.
            if 'value' not in data:
                data['value'] = None
            return data
        else:
            data = {'status': 0, 'value': data}
            return data
    finally:
        LOGGER.debug("Finished Request")
        resp.close()
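From this implementation we can see that execute looks up the command in self._commands to get its HTTP method and URL path, substitutes the session ID into the path, serializes the parameters as JSON and hands everything to _request. _request sends the HTTP request to the driver server (the chromedriver process started earlier, reachable at service_url) and turns the response into a dictionary; the driver in turn carries out the command in the browser.
The official documentation describes how this communication works: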
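The status-code handling at the end of _request can be modeled in isolation. This hypothetical parse_response keeps the same branches (400 to 500 returned raw, PNG passed through, JSON decoded otherwise, a missing 'value' filled with None, non-JSON bodies wrapped with a status) but leaves out the redirect-following branch and uses placeholder strings where Selenium uses ErrorCode.SUCCESS and ErrorCode.UNKNOWN_ERROR.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical placeholders for selenium's ErrorCode.SUCCESS / ErrorCode.UNKNOWN_ERROR.
SUCCESS = "success"
UNKNOWN_ERROR = "unknown error"


def parse_response(statuscode: int, text: str,
                   content_type: Optional[str] = None) -> Dict[str, Any]:
    """Model of the response branches in RemoteConnection._request (redirects omitted)."""
    if 399 < statuscode <= 500:
        # Error statuses are handed back raw; the error handler inspects them later.
        return {"status": statuscode, "value": text}
    parts = content_type.split(";") if content_type else []
    if any(p.startswith("image/png") for p in parts):
        return {"status": 0, "value": text}  # e.g. screenshots are not JSON-decoded
    try:
        data = json.loads(text.strip())
    except ValueError:
        status = SUCCESS if 199 < statuscode < 300 else UNKNOWN_ERROR
        return {"status": status, "value": text.strip()}
    if "value" not in data:
        data["value"] = None  # some drivers omit 'value' when it should be null
    return data
```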
At its minimum, WebDriver talks to a browser through a driver.
Communication is two way: WebDriver passes commands to the browser through the driver, and receives information back via the same route.
The driver is specific to the browser, such as ChromeDriver for Google’s Chrome/Chromium, GeckoDriver for Mozilla’s Firefox, etc. The driver runs on the same system as the browser. This may, or may not be, the same system where the tests themselves are executing.
This simple example above is direct communication. Communication to the browser may also be remote communication through Selenium Server or RemoteWebDriver. RemoteWebDriver runs on the same system as the driver and the browser.
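In short, we talk to the browser through WebDriver and its driver, and the browser responds.
From the examples above we know that execute sends requests to the driver server, which talks to the browser, and that sending one of the predefined command constants returns information about the browser.
Since our code creates a WebDriver instance, let's check the WebDriver base class (selenium.webdriver.remote.webdriver) for functions that return such information. We find the following: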
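To make "WebDriver passes commands to the browser through the driver" concrete, here is a self-contained round trip over plain HTTP. FakeDriverHandler is a hypothetical stand-in for chromedriver that answers only two endpoints with canned JSON, and send() plays the role of RemoteConnection._request. The paths POST /session and GET /session/{id}/title follow the W3C WebDriver protocol; everything else, including the session ID and the title, is made up for the demo.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Optional, Tuple


class FakeDriverHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a driver server; replies with canned W3C-style JSON."""

    def _reply(self, payload: Any) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self) -> None:
        self.rfile.read(int(self.headers.get("Content-Length", 0)))  # consume the JSON body
        if self.path == "/session":
            self._reply({"value": {"sessionId": "fake-session", "capabilities": {}}})
        else:
            self._reply({"value": None})

    def do_GET(self) -> None:
        if self.path.endswith("/title"):
            self._reply({"value": "Fake Title"})
        else:
            self._reply({"value": None})

    def log_message(self, *args: Any) -> None:
        pass  # silence the default per-request logging


def send(base_url: str, method: str, path: str, payload: Optional[dict] = None) -> Any:
    """Send one command as an HTTP request and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(base_url + path, data=data, method=method,
                                 headers={"Content-Type": "application/json;charset=UTF-8"})
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # never proxy localhost
    with opener.open(req, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def demo() -> Tuple[str, str]:
    server = HTTPServer(("127.0.0.1", 0), FakeDriverHandler)  # port 0: let the OS pick
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    try:
        session_id = send(base, "POST", "/session", {"capabilities": {}})["value"]["sessionId"]
        title = send(base, "GET", "/session/%s/title" % session_id)["value"]
        return session_id, title
    finally:
        server.shutdown()
        server.server_close()
```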
@property
def title(self):
    """Returns the title of the current page.
    :Usage:
        title = driver.title
    """
    resp = self.execute(Command.GET_TITLE)
    return resp['value'] if resp['value'] is not None else ""
@property
def current_url(self):
    """
    Gets the URL of the current page.
    :Usage:
        driver.current_url
    """
    return self.execute(Command.GET_CURRENT_URL)['value']
@property
def page_source(self):
    """
    Gets the source of the current page.
    :Usage:
        driver.page_source
    """
    return self.execute(Command.GET_PAGE_SOURCE)['value']
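All three are thin wrappers: each sends one command through execute and returns the 'value' field of the response, and @property is what lets you write driver.title without parentheses. Here is a hypothetical model with a canned executor; PropertyDemoDriver and CannedExecutor are illustrative names, and the command strings are placeholders rather than a claim about Selenium's constant values.

```python
from typing import Any, Dict, Optional


class CannedExecutor:
    """Hypothetical executor that answers each command from a fixed dictionary."""

    def __init__(self, answers: Dict[str, Any]) -> None:
        self.answers = answers

    def execute(self, command: str, params: Optional[dict] = None) -> Dict[str, Any]:
        return {"value": self.answers.get(command)}


class PropertyDemoDriver:
    """Hypothetical model of the title / current_url / page_source properties."""

    def __init__(self, executor: CannedExecutor) -> None:
        self.command_executor = executor

    def execute(self, command: str, params: Optional[dict] = None) -> Dict[str, Any]:
        return self.command_executor.execute(command, params)

    @property
    def title(self) -> str:
        resp = self.execute("getTitle")
        return resp["value"] if resp["value"] is not None else ""  # None becomes ""

    @property
    def current_url(self) -> Optional[str]:
        return self.execute("getCurrentUrl")["value"]

    @property
    def page_source(self) -> Optional[str]:
        return self.execute("getPageSource")["value"]
```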
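These are not all of them. Let's try a few; their usage is given in the docstrings. We'll fetch title (the page title), current_url (the current URL) and page_source (the page's source code):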
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("http://www.csdn.net")
print(driver.title)
print(driver.current_url)
print('Author blog: https://blog.csdn.net/A757291228')
# Support original work; please include a link to the original when reposting
# print(driver.page_source)
Both the page title and the current URL are retrieved successfully.
Now try page_source:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("http://www.csdn.net")
print(driver.title)
print(driver.current_url)
print('Author blog: https://blog.csdn.net/A757291228')
# Support original work; please include a link when reposting
print(driver.page_source)
The page source is retrieved successfully as well.