Comments (101)
- vintermann: I appreciate having an OCR interface rather than having to chat with a bot, but unfortunately chatting with Gemini 3 gives far better results than this. I gave it the document Gemini 3 got a surprisingly good result on: https://urn.digitalarkivet.no/URN:NBN:no-a1450-rk10101508282... and the output wasn't even recognizably Danish.

  Just out of pity I gave it a birthday card from my sister written in very readable modern handwriting, and while it managed to make the contents of that readable, the errors it made reveal that it has very little contextual intelligence. Even if ! and ? can be hard to tell apart sometimes, they weren't here, and you do not usually start a birthday letter with "Happy Birthday brother?"
- temp0826: My current holy grail is converting a Shipibo (an indigenous Peruvian language)-to-Spanish dictionary into a Shipibo-to-English dictionary. The PDF I have (freely available on archive.org) isn't a great scan, though I think it'd be a heck of a lot easier than some of the handwritten examples they show. The layout (two columns) along with headers/footers can cause some headaches, but it is all Latin script. This seems to fall on its face pretty badly (not even a couple of pages in), so my search continues.

  (The other major problem I'm having is separating the Shipibo definitions/examples from the Spanish ones and translating only the Spanish to English... so pretty complex, I guess. I've been taking fresh stabs at this project every few months when I see OCR/LLM news pop up, and I continue to be disappointed.)
- amelius: Can we have an open-source tool that uses the same API, and that you can just instruct to use Mistral or any other service if you think the open-source tool has quality issues for a particular text?

  This makes more sense to me, as I find that FOSS OCR is quite okay for most use cases.
- GZGavinZhao: Does it handle math expressions (those rendered from LaTeX) well? I've been looking for a good OCR model to transcribe my math textbooks into Markdown (obviously ignoring the images and figures) with LaTeX for math expressions, and none of the current OCR models work reliably enough.

  EDIT: you can try it yourself for free at https://console.mistral.ai/build/document-ai/ocr-playground once you create a developer account! Fingers crossed to see how well it works for my use case.
- petcat: It seems like Mistral is just chasing around sort of "the fringes" of what could be useful AI features. Are they just getting outclassed by OAI, Google, and Anthropic?

  It seems like the EU in general should be heavily invested in Mistral's development, but it doesn't seem like they are.
- hereme888: From what I'm reading, it performs worse than many OSS offerings like Paddle, MinerU, MonkeyOCR, etc.: https://www.codesota.com/ocr
- Tiberium: From a tweet: https://x.com/i/status/2001821298109120856

  > can someone help folks at Mistral find more weak baselines to add here? since they can't stomach comparing with SoTA....

  > (in case y'all wanna fix it: Chandra, dots.ocr, olmOCR, MinerU, Monkey OCR, and PaddleOCR are a good start)
- tecoholic:

  > Mistral OCR 3 is ideal for both high-volume enterprise pipelines and interactive document workflows.

  I don’t know how they can make this statement with a 79% accuracy rate. For any serious use case, this is an unacceptable number.

  I work with scientific journals, and issues like 2.9+0.5 vs. 29+0.5 are something we regularly run into. It means we can never fully trust automated processes and need human verification at every step.
- pzo: There have been so many open-source OCR models in the last 3 months that it would be good to compare against them, especially since some are under 1B parameters and can run on edge devices:
  - PaddleOCR-VL
  - olmOCR-2
  - Chandra
  - dots.ocr

  I kind of miss having leaderboards or arenas for OCR and CV, and providers hosting those models. They're neglected on both Artificial Analysis and OpenRouter.
- jesuslop: I am testing it as a replacement for MathPix, and the first few tests look rather decent. In Python for Windows: https://pastebin.com/uyiFHKdJ (alpha-version prototype). It launches the Windows snip tool, waits for a clipboard image, calls Mistral, retrieves the Markdown and puts it in the clipboard as text, ready to be pasted into Typora, Obsidian, or another Markdown editor.
- speff: This might be a good place to ask about options for in-place OCR translation. I took a look at OCR 3, but it doesn't seem to support my use case; it looks more tailored toward data extraction for further processing.

  I've got some foreign artbooks that I would like to get translated. The translations would need to be in place, since the placement of the text relative to the pictures around it is fairly important. I took a look at some paid options online, but they seemed to choke, mostly because of the non-standard text placement.

  The best solution I could come up with is using Google Lens to overlay a translation while I go through the books, but holding a camera/tablet up to my screen isn't very comfortable. Chrome has Lens built in, but (IIRC) I still need to manually select sections for it to translate, so it's not as easy to use as just holding my phone up.

  Anyone know of any progress toward in-place OCR/translation?
- singularity2001: No one's mentioning possibly the most beautiful CSS effect on the Internet??
- film42: Is OpenRouter still sending all OCR jobs to Mistral? I wonder if they're trying to keep that spot. It seems like Mistral and Google are the best at OCR right now, with Google leading Mistral by a fair bit.
- 7thpower: My main beef with Mistral is that they don’t bother to respond to customer inquiries for products they hide behind “reach out for pricing” terms, so even if they were better than SoTA it wouldn’t really matter.
- singularity2001: Not OS / free weights, right?
- vasco: I gave it a birth registry from a Portuguese locality from 1755, the kind my dad and I often decipher to trace our genealogy, and it did a terrible job.

  Regular Gemini Thinking can actually get 70-80% of the documents correct, apart from lots of mistakes on given names. ChatGPT maybe understands 50-60%.

  This Mistral model butchered the whole text; literally not a word was usable, to the point that I think I'm doing something wrong.

  The test document: https://files.fm/u/3hduyg65a5
- constantinum: Where data accuracy is of paramount importance, I think a hybrid route of non-LLM OCR for data parsing and LLMs for structured data extraction is the safe path. I've seen better results with LLMWhisperer (OCR) [1] and the latest Gemini.

  [1] - https://pg.llmwhisperer.unstract.com/
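tecoholic's 2.9+0.5 vs. 29+0.5 example shows how a single headline accuracy number can hide the errors that matter most. The thread doesn't say which metric produced the 79% figure; one common way to score OCR output against a ground-truth transcription is character error rate (CER), the Levenshtein edit distance divided by the reference length. The following is a minimal Python sketch of CER, not Mistral's or any leaderboard's actual scoring code; the empty-reference convention is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when equal
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance normalised by reference length.
    Convention assumed here: an empty reference scores 0.0 only if the
    hypothesis is also empty, otherwise 1.0."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


# One deleted character out of seven reference characters.
print(round(cer("2.9+0.5", "29+0.5"), 3))  # prints 0.143
```

Under this metric, dropping the decimal point costs about 14% CER on a seven-character string (and far less when averaged over a whole page), even though the first term changes tenfold. That is why per-value checks are still needed for numbers.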
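Part of temp0826's two-column dictionary problem, with its running headers and footers, is reading order. If the OCR engine returns word-level bounding boxes (an assumption, since output formats vary), a simple heuristic is to drop words that sit entirely in a top or bottom band, split the rest at the page midpoint, and rebuild lines within each column. The `Word` type, the 6% margin and the 3-unit line tolerance below are hypothetical, illustrative choices rather than values from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word box from an OCR engine; y grows downward."""
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def _join(line: list[Word]) -> str:
    # Within a line, order words left to right.
    return " ".join(w.text for w in sorted(line, key=lambda w: w.x0))


def two_column_lines(words: list[Word], page_width: float, page_height: float,
                     margin: float = 0.06, line_tol: float = 3.0) -> list[str]:
    """Return text lines in reading order: left column top to bottom, then right.

    Words lying entirely inside the top or bottom `margin` fraction of the page
    are treated as running headers/footers and dropped (tune per scan).
    """
    top, bottom = page_height * margin, page_height * (1 - margin)
    columns: tuple[list[Word], list[Word]] = ([], [])
    for w in words:
        if w.y1 < top or w.y0 > bottom:
            continue  # header/footer band
        centre = (w.x0 + w.x1) / 2
        columns[0 if centre < page_width / 2 else 1].append(w)

    lines: list[str] = []
    for col in columns:
        col.sort(key=lambda w: (w.y0, w.x0))
        current: list[Word] = []
        line_y = 0.0
        for w in col:
            # A word whose top edge is far from the current line's top starts a new line.
            if current and abs(w.y0 - line_y) > line_tol:
                lines.append(_join(current))
                current = []
            if not current:
                line_y = w.y0
            current.append(w)
        if current:
            lines.append(_join(current))
    return lines
```

A real dictionary page would also need hyphenation repair and per-entry splitting, and the separation of Shipibo from Spanish text that temp0826 mentions is a separate, harder problem this heuristic does not touch.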
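amelius asks for an open-source tool with a common API that can be pointed at Mistral or any other service when the local engine struggles with a particular text. A minimal Python sketch of that idea, assuming a hypothetical `OcrBackend` interface (no real library defines these names): try backends in order and return the first result that passes a caller-supplied quality check. Real wrappers around engines such as PaddleOCR or a hosted API are left out because their client APIs differ; `StaticBackend` is a stand-in for testing.

```python
from typing import Callable, Protocol, Sequence


class OcrBackend(Protocol):
    """Hypothetical interface: anything with a name and recognize(image) -> text."""
    name: str

    def recognize(self, image: bytes) -> str: ...


def ocr_with_fallback(image: bytes,
                      backends: Sequence[OcrBackend],
                      accept: Callable[[str], bool]) -> tuple[str, str]:
    """Try backends in order (e.g. a local FOSS engine first, a hosted API last)
    and return (backend name, text) from the first result that passes `accept`."""
    failures: list[str] = []
    for backend in backends:
        try:
            text = backend.recognize(image)
        except Exception as exc:  # network errors, quotas, unsupported input...
            failures.append(f"{backend.name}: {exc}")
            continue
        if accept(text):
            return backend.name, text
        failures.append(f"{backend.name}: rejected by quality check")
    raise RuntimeError("all OCR backends failed: " + "; ".join(failures))


class StaticBackend:
    """Stand-in backend for testing; a real one would call an OCR engine or API."""

    def __init__(self, name: str, result: str) -> None:
        self.name = name
        self.result = result

    def recognize(self, image: bytes) -> str:
        return self.result
```

The `accept` callback is where a user would encode "quality issues for a particular text", for example a minimum length or a dictionary-word ratio. Keeping it outside the backends means the same check applies no matter which service is tried.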
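constantinum suggests pairing non-LLM OCR for parsing with LLMs for structured extraction where accuracy matters. One simple guard in that setup, sketched here as an assumption rather than a description of how LLMWhisperer or Gemini work, is to check that each value the LLM extracts actually occurs in the conventional OCR text. That way, a hallucinated or "corrected" value (say 29+0.5 for 2.9+0.5) gets flagged instead of being silently accepted. The function names are hypothetical.

```python
import re


def _normalise(text: str) -> str:
    """Collapse whitespace and case so layout differences don't cause false alarms."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def unsupported_fields(ocr_text: str, extracted: dict[str, str]) -> list[str]:
    """Return names of LLM-extracted fields whose value does not appear verbatim
    (not embedded inside a longer word or number) in the conventional OCR text."""
    haystack = _normalise(ocr_text)
    flagged = []
    for field, value in extracted.items():
        needle = _normalise(value)
        # (?<!\w) and (?!\w) stop "29" from matching inside "129" or "290".
        pattern = rf"(?<!\w){re.escape(needle)}(?!\w)"
        if not needle or not re.search(pattern, haystack):
            flagged.append(field)  # empty or unsupported: route to human review
    return flagged


ocr = "Table 2:  mean 2.9+0.5\n(n = 14)"
print(unsupported_fields(ocr, {"mean": "29+0.5", "n": "14"}))  # prints ['mean']
```

Passing this check does not make a value correct, because the conventional OCR can misread too. Still, a flagged field is a cheap signal for deciding which values need the human verification tecoholic describes.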