Comments (37)

  • simon_luv_pho
    This is highly experimental right now, but here are some quick links for anyone wanting to dig deeper:
    - GitHub: https://github.com/alibaba/page-agent
    - Live Demo (No sign-up): https://alibaba.github.io/page-agent/ (you can drag the bookmarklet from here to try it on other sites)
    - Browser Extension: https://chromewebstore.google.com/detail/page-agent-ext/akld...
    I'd be really interested in feedback on the security model of client-side agents giving extension-bridge access, and taking questions on the implementation!
  • dworks
    Very interesting. Is this related to CoPaw and AgentScope? I think the AG-UI integration for dynamic UI would be useful here. Are you using that?
    I'm building a web UI workspace right now where I have been planning to integrate the agent as an app or component instead of having it be the entire UI. I may fork PageAgent for that, let's see.
  • jasonjmcghee
  • mentalgear
    > Data processed via servers in Mainland China
    Appreciate the transparency, but maybe you could add some European (preferably) alternatives?
  • pscanf
    Very cool! I'm particularly impressed by the bookmark "trick" to install it on a page. Despite having spent 15 years developing for the browser, I had somehow missed that feature of the bookmarks bar. But awesome UX for people to try out the tool. Congrats!
  • general_reveal
    I’ve been thinking about something like this. If it’s just a one-line script import, how the heck are you trusting natural language to translate to commands for an arbitrary UI?
    The only thing I can think of is you had the AI rewrite and embed selectors on the entire build file and work with that?
  • dzink
    Is this affiliated with the Chinese company Alibaba? Any chance data goes there too?
  • arjunchint
    Oh whoa, we are working in parallel on a similar angle!
    We just launched Rover (https://rover.rtrvr.ai/) as the first Embeddable Web Agent.
    Similar principles, just embed a script tag and you get an agent that can type/click/select to onboard/demo/checkout users.
    I tried on your website and it was reeaaaally slow. Quick question:
    - you are injecting numbering on to the UI. Are you taking screenshots? But I don't see any screenshots in the request being sent, what is the point of the numbering?
    I don't think building on browser-use is the way to go, it was the worst performing harness of all we tested [https://www.rtrvr.ai/blog/web-bench-results]. We built out our own logic to build custom Action Trees that don't require any ARIA or accessibility setup from websites.
    Would love to meet and trade notes, if possible (rtrvr.ai/request-demo)!
  • Mnexium
    Curious - how does it perform with captchas and other "are you human" stuff on the web?
  • moehj
    "Interesting architecture — embedding the agent inside the app context rather than outside it makes sense for session-aware tasks. One question: how do you handle output validation before the agent acts on the DOM? Client-side agents acting on live state without a certification layer seems like a reliability risk in production. We've been building ARU (aru-runtime.com) as a runtime certification layer for exactly this — curious if you've thought about that boundary."
  • coreylane
    Looks cool! Are you open to adding AWS Bedrock or LiteLLM support?
  • MeteorMarc
    Confusing name because of the existence of pageant, the putty agent.
  • jadbox
    Firefox support?
  • popalchemist
    Does it support long-click / click-and-drag?
  • jauntywundrkind
    Not exactly the same but I'd also point to Paul Kinlan's FolioLM as a very interesting project in this space. A very nice browser extension,
    > Collect and query content from tabs, bookmarks, and history - your AI research companion. FolioLM helps you collect sources from tabs, bookmarks, and history, then query and transform that content using AI.
    https://github.com/PaulKinlan/NotebookLM-Chrome
    https://chromewebstore.google.com/detail/foliolm/eeejhgacmlh...