Show HN: PageAgent, A GUI agent that lives inside your web app

simon_luv_pho

Comments (37)

simon_luv_pho
This is highly experimental right now, but here are some quick links for anyone wanting to dig deeper:
- GitHub: https://github.com/alibaba/page-agent
- Live Demo (No sign-up): https://alibaba.github.io/page-agent/ (you can drag the bookmarklet from here to try it on other sites)
- Browser Extension: https://chromewebstore.google.com/detail/page-agent-ext/akld...
I'd be really interested in feedback on the security model of client-side agents giving extension-bridge access, and taking questions on the implementation!
dworks
Very interesting. Is this related to CoPaw and AgentScope? I think the AG-UI integration for dynamic UI would be useful here; are you using that?
I'm building a web UI workspace right now where I have been planning to integrate the agent as an app or component instead of having it be the entire UI. I may fork PageAgent for that, let's see.
jasonjmcghee
Any plans to support WebMCP? https://developer.chrome.com/blog/webmcp-epp
mentalgear
> Data processed via servers in Mainland China
Appreciate the transparency, but maybe you could add some (preferably European) alternatives?
pscanf
Very cool!
I'm particularly impressed by the bookmark "trick" to install it on a page. Despite having spent 15 years developing for the browser, I had somehow missed that feature of the bookmarks bar. But awesome UX for people to try out the tool. Congrats!
general_reveal
I’ve been thinking about something like this. If it’s just a one-line script import, how the heck are you trusting natural language to translate to commands for an arbitrary UI?
The only thing I can think of is that you had the AI rewrite and embed selectors on the entire build file and work with that?
dzink
Is this affiliated with the Chinese company Alibaba? Any chance data goes there too?
arjunchint
Oh whoa, we are working in parallel on a similar angle!
We just launched Rover (https://rover.rtrvr.ai/) as the first Embeddable Web Agent.
Similar principles: just embed a script tag and you get an agent that can type/click/select to onboard/demo/checkout users.
I tried it on your website and it was reeaaaally slow. Quick question:
- You are injecting numbering onto the UI. Are you taking screenshots? I don't see any screenshots in the request being sent, so what is the point of the numbering?
I don't think building on browser-use is the way to go; it was the worst-performing harness of all we tested [https://www.rtrvr.ai/blog/web-bench-results]. We built out our own logic to build custom Action Trees that don't require any ARIA or accessibility setup from websites.
Would love to meet and trade notes, if possible (rtrvr.ai/request-demo)!
Mnexium
Curious - how does it perform with captchas and other "are you human" stuff on the web?
moehj
"Interesting architecture — embedding the agent inside the app context rather than outside it makes sense for session-aware tasks. One question: how do you handle output validation before the agent acts on the DOM? Client-side agents acting on live state without a certification layer seems like a reliability risk in production. We've been building ARU (aru-runtime.com) as a runtime certification layer for exactly this — curious if you've thought about that boundary."
coreylane
Looks cool! Are you open to adding AWS Bedrock or LiteLLM support?
MeteorMarc
Confusing name because of the existence of Pageant, the PuTTY agent.
jadbox
Firefox support?
popalchemist
Does it support long-click / click-and-drag?
jauntywundrkind
Not exactly the same, but I'd also point to Paul Kinlan's FolioLM as a very interesting project in this space. A very nice browser extension:
> Collect and query content from tabs, bookmarks, and history - your AI research companion. FolioLM helps you collect sources from tabs, bookmarks, and history, then query and transform that content using AI.
https://github.com/PaulKinlan/NotebookLM-Chrome
https://chromewebstore.google.com/detail/foliolm/eeejhgacmlh...