Why an LLM won't replace a good parser. A case with auto parts

A client came to me with a task that at first glance looks almost naively simple — so much so that it’s a little suspicious.

There is a catalog of more than 50,000 items, and it is not static: it keeps growing as suppliers add their goods, each at their own pace and with little concern for uniform naming. On the other side is an incoming stream of orders with no structure at all: someone writes a part number, someone a car model, someone tries to describe the problem in natural language, and someone limits themselves to something like “need front brake pads for Audi A4 B8 urgently”, as if the system were obliged to work out the rest from intonation.

Managers who work with this every day usually don’t explain exactly how they do it. They just look at the text and fairly quickly extract the meaning, but if you look more closely, it becomes clear there’s no magic understanding. It’s more of a stable pattern-recognition skill: a set of words instantly maps to familiar entities that have been stored in the head for years — part type, model, generation, permissible replacements, critical parameters. It’s not reading text, it’s working with an internal reference that long ago stopped being explicit.

And against this background the first and most obvious suggestion inevitably arises: let’s plug in an LLM, let it “understand” all of this.

Where this idea breaks

The problem is that such tasks are very easily mistaken for language-understanding problems, while in essence they are closer to engineering filtering with strict constraints than to semantic analysis.

Take two simple examples. “Oil filter 2.0L diesel Audi A4” and “Oil filter 2.0L petrol Audi A4” are almost the same description from the standpoint of linguistic similarity. The difference looks cosmetic, a clarification the model might “average out” as insignificant. But in reality these are two different physical entities, and an error here is not a small mismatch, it is incompatibility.

The same goes for brake discs: “Audi A4 B7 brake disc 280x22” and “Audi A4 B8 brake disc 280x25” look almost identical, especially if you look at them as text. The three-millimeter difference in thickness almost disappears in semantic space, dissolving into overall similarity, but in engineering reality that difference is the key one that separates “fits” from “does not fit at all”.

And here it’s important to note: the model isn’t making a mistake in the usual sense. It’s simply doing what it does best — finding things that are close in meaning. The problem is that meaning isn’t the primary measure here.

Vector search and the neat illusion of closeness

The next step that usually comes to mind almost automatically is embeddings and vector search. At first glance it even looks like the right direction: texts turn into vectors, then we just find the nearest matches, and it seems the problem is solved more elegantly.

In practice everything comes down to the same fundamental issue: vector space works well where “similarity” truly means useful closeness, but starts to fail where strict differences in specific parameters matter. It nicely brings together “front brake pads” and its Russian equivalent “передние колодки”, because there the meaning really matches, but it equally confidently brings together 280x22 and 280x25, because to it those are just two very similar sets of symbols, not two mutually exclusive engineering parameters.
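The effect is easy to reproduce even without a real embedding model. The sketch below is only a toy stand-in: it uses character trigram counts and cosine similarity instead of learned embeddings, so the exact number says nothing about any particular model. The function names are made up for illustration. It does show the mechanism, though: two strings that differ only in generation and thickness end up very close as vectors.

```python
import math
from collections import Counter


def trigram_vector(text: str) -> Counter:
    """Count overlapping character trigrams of the lowercased text."""
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


b7_disc = "Audi A4 B7 brake disc 280x22"
b8_disc = "Audi A4 B8 brake disc 280x25"

similarity = cosine(trigram_vector(b7_disc), trigram_vector(b8_disc))
# The score is high even though the two discs are not interchangeable.
print(f"similarity: {similarity:.2f}")
```

Nothing in the score tells you that the remaining difference is exactly the part that matters.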

At some point it becomes obvious that all this semantic closeness starts to work against the task: the system starts finding “almost right”, and in this domain almost right usually means wrong.

The attempt to “just train your own model”

Another understandable idea usually follows: if off-the-shelf models don’t account for the specifics, we should train our own on our data, for our nomenclature.

Sounds rational, but only until you look at the volume required. For a model to reliably distinguish critical parameters — engine type, platform, dimensions, part compatibility — you need not just data, but an enormous number of carefully labeled examples where each match is confirmed by a person who knows what they’re doing.

And here reality usually begins. The data either doesn’t exist, or it’s insufficient, or it’s noisy and incomplete, and trying to bring it to the required quality becomes a separate long project that starts to rival the task in complexity. The irony is that at this point the “simple ML solution” suddenly becomes more expensive than carefully written rules.

What actually needs to be done

If you remove all the habitual pull toward “smart solutions”, the task turns out to be pretty down-to-earth. You don’t need to understand the text in a human sense, you don’t need to guess intent, and you don’t need to look for semantic analogs. You need something else: parse the input string into a structure and then work not with free text but with a set of strictly defined fields.

From a string like “Brake disc ventilated front R 280x25 Audi A4 B8” you need to extract not the “meaning” but the structure: part type, variant, position, dimensions, model and generation. After that, search becomes a simple comparison of parameters: either they match, or they don’t, or a strictly defined tolerance is allowed if the business logic permits.
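A minimal sketch of that idea, assuming tiny hand-written dictionaries and two regular expressions. All names here (PartSpec, parse, matches, the dictionary contents) are hypothetical and only illustrate the shape of the approach; a real parser would be driven by the full domain reference.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical mini-dictionaries, for illustration only.
PART_TYPES = {"brake disc": "brake_disc", "brake pads": "brake_pads", "oil filter": "oil_filter"}
VARIANTS = {"ventilated": "ventilated", "solid": "solid"}
POSITIONS = {"front": "front", "rear": "rear"}
MODELS = {"audi a4": ("Audi", "A4")}

DIMENSIONS_RE = re.compile(r"\b(\d{2,3})\s*x\s*(\d{1,3})\b")  # e.g. 280x25
GENERATION_RE = re.compile(r"\bb(\d)\b")                       # e.g. b8 (input is lowercased)


@dataclass(frozen=True)
class PartSpec:
    part_type: Optional[str] = None
    variant: Optional[str] = None
    position: Optional[str] = None
    diameter_mm: Optional[int] = None
    thickness_mm: Optional[int] = None
    make: Optional[str] = None
    model: Optional[str] = None
    generation: Optional[str] = None


def first_match(text: str, dictionary: dict):
    """Return the value of the first key found in text as whole words, else None."""
    for key, value in dictionary.items():
        if re.search(rf"\b{re.escape(key)}\b", text):
            return value
    return None


def parse(raw: str) -> PartSpec:
    """Turn a free-form part description into strictly typed fields."""
    text = raw.lower()
    dims = DIMENSIONS_RE.search(text)
    gen = GENERATION_RE.search(text)
    make, model = first_match(text, MODELS) or (None, None)
    return PartSpec(
        part_type=first_match(text, PART_TYPES),
        variant=first_match(text, VARIANTS),
        position=first_match(text, POSITIONS),
        diameter_mm=int(dims.group(1)) if dims else None,
        thickness_mm=int(dims.group(2)) if dims else None,
        make=make,
        model=model,
        generation=f"B{gen.group(1)}" if gen else None,
    )


def matches(request: PartSpec, item: PartSpec, thickness_tolerance_mm: int = 0) -> bool:
    """Every field the request specifies must equal the item's; only thickness may use a tolerance."""
    for name in ("part_type", "variant", "position", "diameter_mm", "make", "model", "generation"):
        wanted = getattr(request, name)
        if wanted is not None and wanted != getattr(item, name):
            return False
    if request.thickness_mm is not None:
        if item.thickness_mm is None:
            return False
        if abs(request.thickness_mm - item.thickness_mm) > thickness_tolerance_mm:
            return False
    return True


request = parse("Brake disc ventilated front R 280x25 Audi A4 B8")
b8_item = parse("Brake disc ventilated front 280x25 Audi A4 B8")
b7_item = parse("Brake disc ventilated front 280x22 Audi A4 B7")
print(matches(request, b8_item), matches(request, b7_item))
```

The tolerance defaults to zero on purpose: any slack has to be a deliberate business decision, not something the matching code grants by itself.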

And at that moment it becomes clear that the hardest part of the whole system is not about algorithms. It’s about the domain. About how parts are organized, which parameters are critical, how they relate to each other, what can be considered equivalent and what must not be touched under any circumstances. It’s a dictionary that usually doesn’t look like technology, which is why it’s underestimated.
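In code, the simplest form of such a dictionary is a table of aliases that all collapse into one canonical value. The sketch below is an assumption about how it could look, not the project's actual data: the alias list and the function name are invented for illustration.

```python
import re
from typing import Optional

# Hypothetical alias table: every spelling a customer might use maps to one canonical part type.
PART_TYPE_ALIASES = {
    "brake pads": "brake_pads",
    "brake pad": "brake_pads",
    "pads": "brake_pads",
    "колодки": "brake_pads",
    "brake disc": "brake_disc",
    "brake disk": "brake_disc",
    "brake rotor": "brake_disc",
}

# Try longer aliases first so that "brake pads" wins over the bare "pads".
_ALIASES_BY_LENGTH = sorted(PART_TYPE_ALIASES, key=len, reverse=True)


def normalize_part_type(text: str) -> Optional[str]:
    """Return the canonical part type mentioned in text, or None if nothing is recognized."""
    lowered = text.lower()
    for alias in _ALIASES_BY_LENGTH:
        if re.search(rf"\b{re.escape(alias)}\b", lowered):
            return PART_TYPE_ALIASES[alias]
    return None


print(normalize_part_type("need front brake pads for Audi A4 B8 urgently"))
print(normalize_part_type("передние колодки Audi A4"))
print(normalize_part_type("wiper blade"))
```

An unrecognized input returns None instead of a best guess, so gaps in the dictionary show up as explicit misses that a person can review.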

Where an LLM is actually appropriate

That said, the LLM is not thrown out of the system, as one might want to declare in a fit of engineering honesty. Its role is simply much narrower and, to be honest, quite mundane.

It works well where the input text is unstructured, noisy, written haphazardly, and contains everything at once. In such cases the model actually does a decent job extracting structure: it recognizes the car model, part type, generation, position, even when it’s all smeared into one sentence with no hint of order.

But after that it’s no longer needed. Once extraction is done, you enter a zone where you can’t work with probabilities and similarities, because the cost of an error is too direct. What remains are normalization, a dictionary, exact matches and compatibility rules.
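One way to draw that boundary in code is to treat the model's reply as untrusted input. The sketch below makes no real LLM call: it assumes the model was asked to return a JSON object, and the reply is hard-coded. The field names, the allowed vocabularies, the catalog rows and the SKUs are all hypothetical.

```python
import json

# Hypothetical closed vocabularies taken from the domain dictionary.
ALLOWED_PART_TYPES = {"brake_disc", "brake_pads", "oil_filter"}
ALLOWED_POSITIONS = {"front", "rear"}
REQUIRED_FIELDS = ("part_type", "make", "model", "generation")


def validate_extraction(llm_output: str) -> dict:
    """Parse the model's JSON reply and reject anything outside the known vocabulary."""
    data = json.loads(llm_output)  # invalid JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [field for field in REQUIRED_FIELDS if not data.get(field)]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if data["part_type"] not in ALLOWED_PART_TYPES:
        raise ValueError(f"unknown part type: {data['part_type']!r}")
    position = data.get("position")
    if position is not None and position not in ALLOWED_POSITIONS:
        raise ValueError(f"unknown position: {position!r}")
    return data


def find_exact(spec: dict, catalog: list) -> list:
    """Return catalog rows that equal the spec on every field the spec contains."""
    return [row for row in catalog if all(row.get(k) == v for k, v in spec.items())]


# Stand-in for what a model might return for
# "need front brake pads for Audi A4 B8 urgently".
llm_reply = (
    '{"part_type": "brake_pads", "position": "front", '
    '"make": "Audi", "model": "A4", "generation": "B8"}'
)

catalog = [
    {"sku": "P-001", "part_type": "brake_pads", "position": "front",
     "make": "Audi", "model": "A4", "generation": "B8"},
    {"sku": "P-002", "part_type": "brake_pads", "position": "front",
     "make": "Audi", "model": "A4", "generation": "B7"},
]

spec = validate_extraction(llm_reply)
found = find_exact(spec, catalog)
print([row["sku"] for row in found])
```

The model only fills in fields. Whether a value is acceptable and whether a part matches is decided by plain comparisons that can be read, tested and audited.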

And that’s probably the main point in this whole story.

Instead of a conclusion

In such tasks there is almost always a temptation to find a “smarter model” that will somehow resolve the complexity itself. That’s an understandable desire — it saves time on the unpleasant part of the job, where you have to deal not with technology but with the subject area.

But reality is simpler and a bit more boring: if a task requires accuracy, it won’t be saved by semantics, vector closeness, or an attempt to train a universal intelligence for local data chaos. What saves it is structure, rules, and an understanding of which parameters in the system are critical.

And sometimes the least exciting tool turns out to be the only one that really works.
