Hacker News

I was looking for something similar recently and found CogAgent[0], which looks quite interesting. Has anyone tried anything similar?

0. https://github.com/THUDM/CogVLM?tab=readme-ov-file#gui-agent...



I haven't read through it yet, but there's Ferret-UI from Apple (mobile-specific, though I think a lot of the learnings are generic): https://arxiv.org/abs/2404.05719



