
julius · 2026-05-05T19:33:45 1778009625

Click coordinates. Agentic GUI is really annoying when the multi-modal agent cannot click on x,y coordinates.

I tested Qwen3.6, Gemma4, Nemotron3-nano-omni. They fully hallucinate x,y coords. (did not try GLM-5V yet)

GPT-5.5 can easily do it. But also Vocaela, a tiny 500M model, is quite good at it. Hope they improve the training for x,y clicking soon on the smallish multi-modals.

Recently slopped an HTTP service together just so my local models can click, instead of relying on all the wild ways agents currently hack into the browser (browser-use, browser-harness, agent-browser, dev-browser, etc.): https://github.com/julius/vocaela-click-coords-http

cyanydeez · 2026-05-05T19:40:54 1778010054

This sounds a lot like another Hacker News post from the last few days. It's the same problem image generators have with a prompt like "produce numbers 1-50 in a spiral pattern": they can't count properly. But if you break it into a raster/vector two-step, where the model first produces the visual content and then an SVG overlay, it's completely capable.

Have you tried doing a two step: review the image, then render a vector?

julius · 2026-05-05T19:50:58 1778010658

Maybe there is a smart trick to get them to do the right thing, but the things I tried did not work.

At one point I had some smaller model draw bounding boxes around everything that looked interactable, with labels like "e3", then asked the model to tell me "click on e3". That did not work in my tests; it was pretty much as bad as x,y.
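For reference, the label-based approach described above can be sketched roughly like this. It is only an illustration: the `boxes` mapping, the `e<number>` label pattern, and the function name `resolve_click` are assumptions, not code from any of the tools mentioned.

```python
import re

def resolve_click(reply, boxes):
    """Turn a model reply like 'click on e3' into a pixel click point.

    `boxes` maps a label to a bounding box (x0, y0, x1, y1) in pixels.
    Both the label format and the box format are hypothetical.
    """
    match = re.search(r"\be\d+\b", reply)
    if match is None:
        raise ValueError("no element label found in reply")
    label = match.group(0)
    if label not in boxes:
        raise KeyError(f"unknown label: {label}")
    x0, y0, x1, y1 = boxes[label]
    # Click the center of the labeled box.
    return ((x0 + x1) // 2, (y0 + y1) // 2)

boxes = {"e3": (100, 200, 180, 240)}
print(resolve_click("click on e3", boxes))
```

The model only has to pick a label here, so any remaining error comes from choosing the wrong element rather than from producing wrong numbers.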

cyanydeez · 2026-05-05T20:48:23 1778014103

Yeah, I've held off on doing any kind of RAG till there are models that properly handle layout detection and partitioning, because it's so easy to generate shitty data if you're not properly attending to visual cues before you slice up a document.

lopuhin · 2026-05-05T22:00:43 1778018443

Qwen3.5 is able to output click coordinates and bounding boxes just fine, as values normalized to 0..1000. I’d hope Qwen3.6 didn’t lose this capability.
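If the model answers on a 0..1000 grid, the values have to be mapped back to screen pixels before clicking. A minimal sketch, assuming the grid is relative to the full screenshot and that the function name `to_pixels` and the example resolution are just placeholders:

```python
def to_pixels(norm_x, norm_y, width, height, scale=1000):
    """Map a point on a 0..scale grid to pixel coordinates.

    Assumes (0, 0) is the top-left corner of the screenshot and
    (scale, scale) is the bottom-right corner.
    """
    if not (0 <= norm_x <= scale and 0 <= norm_y <= scale):
        raise ValueError("coordinates outside the normalized range")
    # Clamp so that norm == scale still lands on the last valid pixel.
    px = min(int(norm_x * width / scale), width - 1)
    py = min(int(norm_y * height / scale), height - 1)
    return px, py

print(to_pixels(500, 250, 1920, 1080))
```

Treating such output as raw pixels would put clicks in the wrong place, so it is worth checking which convention a given model uses.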

withinrafael · 2026-05-05T21:39:10 1778017150

I've had lots of success with generating coordinates and answering questions using the UI-TARS model https://github.com/bytedance/UI-TARS.

theturtletalks · 2026-05-06T00:09:04 1778026144

I’d also check out midscene: you can set the model, and UI-TARS works, but Qwen vision models work too.

julius · 2026-02-06T19:34:45 1770406485

Super cool. Brave support by any chance? Using Linux, it found my Chrome, but that's not my primary browser.

toborrm9 · 2026-02-07T10:00:59 1770458459

Yes, I'm working on it.

julius · 2025-08-03T12:23:01 1754223781

Less information loss -> Less params? Please correct me if I got this wrong. The Intro claims:

"The dot product itself is a geometrically impoverished measure, primarily capturing alignment while conflating magnitude with direction and often obscuring more complex structural and spatial relationships [10, 11, 4, 61, 17]. Furthermore, the way current activation functions achieve non-linearity can exacerbate this issue. For instance, ReLU (f(x) = max(0, x)) maps all negative pre-activations, which can signify a spectrum of relationships from weak dissimilarity to strong anti-alignment, to a single zero output. This thresholding, while promoting sparsity, means the network treats diverse inputs as uniformly orthogonal or linearly independent for onward signal propagation. Such a coarse-graining of geometric relationships leads to a tangible loss of information regarding the degree and nature of anti-alignment or other negative linear dependencies. This information loss, coupled with the inherent limitations of the dot product, highlights a fundamental challenge."
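A tiny illustration of the ReLU point in the quote, as I read it. The weight vector and the two inputs are made-up numbers, not taken from the paper:

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def relu(x):
    """ReLU as defined in the quote: f(x) = max(0, x)."""
    return max(0.0, x)

w = [1.0, 0.0]  # hypothetical neuron weights
inputs = {
    "weakly dissimilar": [-0.1, 1.0],
    "strongly anti-aligned": [-5.0, 0.0],
}

for name, v in inputs.items():
    pre = dot(w, v)
    # Both negative pre-activations end up as the same zero output.
    print(name, "pre-activation:", pre, "after ReLU:", relu(pre))
```

The two pre-activations differ by a factor of 50, but after ReLU they are indistinguishable, which seems to be the information loss the intro is describing.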

mlnomadpy · 2025-08-10T19:45:33 1754855133

Yes, since you can learn to represent the same problem with fewer params. However, most architectures are optimized for the linear product, so we gotta figure out a new architecture for it.

julius · 2025-07-30T14:10:13 1753884613

Lots of people who have relatively stable currencies (EUR, USD..) do not want to use bitcoin. What if bitcoin price goes down? How many extra steps is it to convert my USD to bitcoin and then back to USD? Do I only convert the 19.99 USD for my current purchase into bitcoin or do I put in more?

Do you solve these issues for customers? Or are you only targeting people who already are happy bitcoin wallet users? Are stablecoins part of your strategy?

Given how Visa, Mastercard, and PayPal are seen as bad actors, do you think you can capitalize on that, possibly by partnering with Valve or something of that sort?

benjamaan · 2025-07-30T17:59:53 1753898393

We as MoneyBadger create an invoice for the customer in their local currency e.g. USD. If they pay with Bitcoin Lightning, they have 3 minutes to complete the transaction at our offered exchange rate. We take on the risk of the price moving.

If they’re paying with one of the exchange wallets we support like Luno.com, VALR.com or Binance.com we do the same, and they can choose to pay with any currency supported by those wallets.

Refunds are for the original amount in the currency of the invoice (e.g. USD), converted at the exchange rate at the time of the refund.

It really all just works the same as paying with a credit card overseas would if you’re paying a EUR bill with USD funds.
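The flow above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class and function names, the example rates, and the use of `Decimal` are hypothetical and not our actual implementation; only the 3-minute window and the refund rule come from the description.

```python
from dataclasses import dataclass
from decimal import Decimal

QUOTE_TTL_SECONDS = 180  # the 3-minute payment window

@dataclass(frozen=True)
class Quote:
    fiat_amount: Decimal   # invoice amount in local currency, e.g. USD
    fiat_per_btc: Decimal  # exchange rate offered when the quote was made
    created_at: float      # seconds, e.g. from time.time()

    def btc_due(self):
        """Bitcoin amount owed at the locked rate."""
        return self.fiat_amount / self.fiat_per_btc

    def is_valid(self, now):
        """The locked rate only holds inside the payment window."""
        return now - self.created_at <= QUOTE_TTL_SECONDS

def refund_btc(fiat_amount, fiat_per_btc_now):
    """Refund the original fiat amount at the rate at refund time."""
    return fiat_amount / fiat_per_btc_now

quote = Quote(Decimal("19.99"), Decimal("100000"), created_at=0.0)
print(quote.btc_due(), quote.is_valid(now=120.0))
print(refund_btc(Decimal("19.99"), Decimal("80000")))
```

The customer always sees the same fiat amount; only the bitcoin amount changes with the rate.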

julius · 2025-07-19T11:09:32 1752923372

Chrome/Brave: https://chromewebstore.google.com/detail/youtube-no-translat...

julius · 2025-06-27T08:27:34 1751012854

Anyone with recent real-world experience?

From talking to AI, it seems the main issues would be:

- SEO (googlebot)

- Social Media Sharing

- CSP heavy envs could be trouble

Is this right?

julius · 2025-05-26T07:55:59 1748246159

Video of the ship and visualization of the technology: https://www.youtube.com/watch?v=wOK4TGd_l_Q

julius · on Nov 5, 2024

Thanks for making this playable. I have seen videos of it, but thought I had to wait for years until I can experience it.

The future will be wild. "Hey ChatGPT, let's play Counter-Strike on the Enterprise-D. Counter-Terrorists against Spongebobs"

julius · on Oct 28, 2024

At first I thought you made a website that gives me an empty Markdown file. But I am glad I downloaded it; it's actually a pretty nice template.

What are you personally doing with the yearly goals in that file? Are you copying and pasting them from last week, or are you typing them out every time to reiterate them (and possibly even modify them)?

ejs · on Oct 28, 2024

Thanks for checking it out!

Yeah, currently I am just copy/pasting the Yearly Goals section over. I want to eventually add a feature to allow someone signed up for the email to edit that part. Then someone could modify that goal section and have it correctly emailed each week.

julius · on Aug 24, 2024

Mercury sounds interesting. Requires a certain scale though (gravity is a bitch).

Considering just the initial mining and construction, bodies with low gravity and proximity to the earth feel like an efficient starting point, right? I always thought the moon would be a good place to bootstrap the first few thousand space habitats.

Your point about energy will probably be the biggest deal. Wondering how complicated it would be to ship a bunch of nuclear reactors to the moon. There seem to be quite a few companies currently working on small, "mass produced" reactors.