AI

💻 Browser Use

browser-use

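browser-use lets an AI model drive a browser: it visits websites, summarizes what it finds, and reports a conclusion.
It supports multiple models, such as gpt-4o and deepseek-r1.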

Installation

$ pip install browser-use

# install playwright
$ playwright install

The pip install failed with the following error:

(.venv)   browser-use-demo pip install browser-use -i https://pypi.org/simple
ERROR: Could not find a version that satisfies the requirement browser-use (from versions: none)
ERROR: No matching distribution found for browser-use

# PyPI shows that browser-use requires Python>=3.11; upgrade Python and retry.
# The install itself takes a while..
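
Since the "No matching distribution found" error above came from an interpreter that was too old, the version can be checked before running pip. The following is a minimal sketch, assuming the Python>=3.11 requirement noted above still holds; the helper name python_is_supported is made up for illustration.

```python
import sys

# Minimum version taken from the PyPI note above; adjust if the requirement changes.
MIN_PYTHON = (3, 11)


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    current = tuple((version_info or sys.version_info)[:2])
    return current >= minimum


if __name__ == "__main__":
    found = sys.version.split()[0]
    if python_is_supported():
        print("Python version OK:", found)
    else:
        print("Python %d.%d+ is required, found %s" % (*MIN_PYTHON, found))
```

Running this inside the virtual environment shows whether the .venv was created with an older interpreter than expected.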

Demo

API key: create a .env file and add OPENAI_API_KEY. For the key itself, see: OpenAI API Key

The demo given in the README is:

from langchain_openai import ChatOpenAI
from browser_use import Agent
import asyncio
from dotenv import load_dotenv
load_dotenv()

async def main():
    agent = Agent(
        task="Compare the price of gpt-4o and DeepSeek-V3",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    await agent.run()

asyncio.run(main())

This uses the gpt-4o model, but I don't have an OpenAI API key.. so I switched to the deepseek-r1 model.
For the list of supported models, see: supported-models

# agent-deepseek-r1.py
from langchain_openai import ChatOpenAI
from browser_use import Agent
from pydantic import SecretStr
import asyncio


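# Initialize the model (replace the placeholder with your own DeepSeek API key)
# Note: when DeepSeek-R1 was released, the DeepSeek API served R1 as
# 'deepseek-reasoner', while 'deepseek-chat' pointed to DeepSeek-V3.
api_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxx"
llm = ChatOpenAI(
    base_url='https://api.deepseek.com/v1',
    model='deepseek-chat',
    api_key=SecretStr(api_key),
)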

# Create agent with the model
async def main():
    agent = Agent(
        task="Can you compare the icicibank and hdfcbank fundamental from screener.in?",
        llm=llm,
        use_vision=False
    )
    await agent.run()
asyncio.run(main())
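
The script above keeps the API key as a string literal, which is easy to commit by accident. A possible alternative is to read it from an environment variable. This is only a sketch: the variable name DEEPSEEK_API_KEY and the helper read_api_key are assumptions for illustration, not something browser-use or DeepSeek requires.

```python
import os


def read_api_key(env_var="DEEPSEEK_API_KEY"):
    """Return the key stored in env_var, or raise if it is missing or empty."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before running the agent")
    return key


# Usage inside agent-deepseek-r1.py, replacing the hard-coded string:
# api_key = read_api_key()
```

With this in place the key can be exported in the shell, or put in the same .env file used by the README demo and loaded with load_dotenv() before calling the helper.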

Run it with python3 agent-deepseek-r1.py. Playwright opens a browser and starts executing the task. Here is a screenshot: browser-use-deepseek-r1

After running for a while, it prints the result: browser-use-deepseek-r1-demo-result

That completes one full task. (The browser steps are a bit slow.. 😂)

I then asked a second question: Compare the differences of playwright and selenium?
The browser went through these steps: browser-use-demo2-1

browser-use-demo2-2

The result, copied from the console, is below; judge it for yourself:

...
...
INFO     [agent] 📄 Result: Here are the key differences between Playwright and Selenium:
1. **Speed and Performance**: Playwright is often chosen for speed and offers significantly better performance than Selenium.
2. **Learning Curve**: Playwright has an easier learning curve and provides value faster compared to Selenium.
3. **Browser Support**: Playwright supports Chromium, Firefox, and WebKit browsers, while Selenium offers a wide variety of browser support including Chrome, Firefox, IE, Edge, Opera, and more.
4. **Debugging Capabilities**: Playwright has more advanced debugging capabilities compared to Selenium.
5. **Community and Ecosystem**: Selenium has a broader support and established ecosystem, while Playwright has limited community support due to its recent entry into the market.
6. **Setup and Parallel Execution**: Playwright has easier setup with built-in parallelism, whereas Selenium requires more setup, especially for parallel execution.
7. **Modern Features**: Playwright is a newer, open-source tool developed by Microsoft, while Selenium is an open-source tool that has been in the industry for a long time.
INFO     [agent] ✅ Task completed
INFO     [agent] ✅ Successfully

Reference

GitHub: browser-use