

Hey, #LazyWeb!

Does anyone know of a #Python package that takes JSON-LD as input, and validates whether it conforms to schema.org schemas?

Bonus points if you can also easily detect fields that don't conform, not just a yes/no answer.

I looked at a bunch of things, but didn't have too much luck with it. I just figured someone must have done this before.

Update: kinda solved it, more below.

in reply to Jens Finkhäuser

My best guess is that you can probably use rdflib to parse the JSON-LD, and also one of the definitions from https://schema.org/docs/developers.html, but I'm missing how to validate one against the other 🤔
in reply to Jens Finkhäuser

So that was a dead end, sadly. I'm still sure there is a solution out there somewhere, probably an obvious one. But I didn't find it.

What I did find is an outdated project that reads in schema.org definitions, and generates pydantic models from them.

It works well enough, but uses pydantic 1, while we're now at 2.9 or thereabouts.

I've been in the process of updating it. Currently in a private repo, but I'll make it public. If you're interested, I'll update this thread when it's done.

in reply to Jens Finkhäuser

It's unfortunately not *quite* what I was looking for. But it's a basis for figuring out the rest.
in reply to Jens Finkhäuser

Part of the problem is that once "JSON" and "schema" are part of your search terms, search engines return stuff about JSON Schema, and ignore JSON-LD and schema.org results.
in reply to Jens Finkhäuser

If you want search results from a certain site, you can add site:schema.org to your search term. 🙂
in reply to Aurin Azadî

@atarifrosch But schema.org does not contain that information 🙃

I want them to return results in which schema.org schemata are mentioned.

It's basically a tokenization problem that they're messing up: they treat "JSON-LD" as the tokens "JSON" and "LD", and "schema.org" as "schema" and "org", and then return results matching any two of the four tokens. 🤷‍♂️
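
A minimal sketch of that tokenization effect, using only the standard library. The two tokenizers and the sample page title are made up for illustration; real search engines are far more involved, so treat this as a toy model of the complaint rather than a description of how any engine works.

```python
import re


def naive_tokens(text: str) -> list[str]:
    """Split on every non-alphanumeric character, so compound names fall apart."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]


def compound_tokens(text: str) -> list[str]:
    """Keep hyphens and dots inside a term, so compound names stay whole."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+(?:[-.][A-Za-z0-9]+)*", text)]


query = "JSON-LD schema.org"
page = "Validate JSON documents with JSON Schema"  # hypothetical page title

print(naive_tokens(query))     # ['json', 'ld', 'schema', 'org']
print(compound_tokens(query))  # ['json-ld', 'schema.org']

# With naive tokens, a JSON Schema page matches two of the four query tokens.
naive_overlap = sorted(set(naive_tokens(query)) & set(naive_tokens(page)))
# With compound tokens, the same page matches nothing.
compound_overlap = sorted(set(compound_tokens(query)) & set(compound_tokens(page)))
print(naive_overlap)     # ['json', 'schema']
print(compound_overlap)  # []
```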

Unknown parent

Jens Finkhäuser
@andre I am not trying to validate schema.org...
Unknown parent

Jens Finkhäuser
@andre I have JSON-LD data. I want to know whether it conforms to one of the schemas published at schema.org.
Unknown parent

Jens Finkhäuser
@andre eh, maybe I misunderstood. It's not different 🤷‍♂️
Unknown parent

Jens Finkhäuser
@andre Yeah, but that's not the problem I'm having... I just want to validate some JSON 🤷‍♂️
in reply to Jens Finkhäuser

"validates whether it conforms to schema.org schemas?"

What do you mean by this?

My memory from when I last looked at schema.org is that EVERYTHING IS VALID because of how they use @vocab. I'm unsure how one would claim that the nonsensical

{
  "@context": "http://schema.org",
  "moo": "mooo"
}

is invalid. There are reasons why I claim that JSON-LD is not ready. This is one of them.
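
A rough sketch of why the @vocab mechanism accepts any key. It assumes the schema.org context sets @vocab to http://schema.org/, and the `expand_terms` helper is a hypothetical, heavily simplified stand-in for real JSON-LD expansion (a proper processor handles much more). Under that assumption, every plain key simply becomes an IRI in the schema.org namespace, whether or not the vocabulary defines it.

```python
# Assumption: the @vocab value that the schema.org context declares.
VOCAB = "http://schema.org/"


def expand_terms(doc: dict) -> dict:
    """Prefix every plain key with VOCAB; keep JSON-LD keywords, drop @context."""
    return {
        (key if key.startswith("@") else VOCAB + key): value
        for key, value in doc.items()
        if key != "@context"
    }


doc = {"@context": "http://schema.org", "moo": "mooo"}
expanded = expand_terms(doc)
print(expanded)  # {'http://schema.org/moo': 'mooo'}
```

Nothing in this step consults the list of properties schema.org actually defines, which is why "moo" passes through without complaint.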
in reply to Helge

@helge well, it's less about whether that is valid. But if you have, e.g.:

{
"@context": "https://schema.org/",
"@type": "Thing",
"name": 3.14
}

It should probably tell me it's not valid, because the name property should be Text (this is a little tricky because in JSON's textual representation everything is text, but from a typing point of view, 3.14 is a number, not Text).
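
A minimal sketch of the kind of check meant here, reporting which fields don't conform rather than a plain yes/no. The `EXPECTED_TYPES` table is a hypothetical, hand-written excerpt (schema.org lists Text for name and URL for url; mapping both onto Python's `str` is an assumption of this sketch), and `nonconforming_fields` is not part of any existing package.

```python
# Hypothetical excerpt: schema.org property -> accepted Python types.
EXPECTED_TYPES = {
    "name": (str,),  # schema.org expects Text
    "url": (str,),   # schema.org expects URL, treated as a plain string here
}


def nonconforming_fields(doc: dict) -> dict[str, str]:
    """Return a mapping of property name to a description of what is wrong with it."""
    problems = {}
    for key, value in doc.items():
        if key.startswith("@"):
            continue  # JSON-LD keywords such as @context and @type are not checked
        expected = EXPECTED_TYPES.get(key)
        if expected is None:
            problems[key] = "unknown property"
        elif not isinstance(value, expected):
            wanted = "/".join(t.__name__ for t in expected)
            problems[key] = f"expected {wanted}, got {type(value).__name__}"
    return problems


doc = {"@context": "https://schema.org/", "@type": "Thing", "name": 3.14}
print(nonconforming_fields(doc))  # {'name': 'expected str, got float'}
```

A full solution would derive the table from the published schema.org definitions instead of writing it by hand.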

in reply to Jens Finkhäuser

AFAIK: There is nothing in json-ld that tells you this is invalid. The schema.org validator agrees.
in reply to Jens Finkhäuser

@helge *points at her sign* “RDF (and by extension JSON-LD) is highly structured, schemaless, garbage that you may find useful data in”

So yeah, there is really no validation of schema or expected types. schema.org is a misnomer in that JSON-LD and RDF just don't care whether something is a string, URI, boolean, float, whatever.

Only ShEx/SHACL really start to touch on schemas.

in reply to Emelia 👸🏻

@helge you might like this really old post of mine on socialhub: https://socialhub.activitypub.rocks/t/linked-data-undersold-overpromised/2268/29?u=thisismissem
in reply to Emelia 👸🏻

@thisismissem @helge I understand that.

And yet, the descriptions on schema.org are enough to perform validation with.

This isn't a JSON-LD or RDF question, really. It just happens to be the case that's what my data is expressed in.

in reply to Jens Finkhäuser

As @thisismissem points out, schema.org does not really specify the types to be used but only suggests them; the same goes for sdo-types as domain/range. If you want constraints, you must define them yourself with ShEx/SHACL or JSON Schema. For AMB, a schema.org-based metadata profile for educational resources on the web, we chose JSON-LD plus a normative JSON Schema so that people unfamiliar with RDF can also use it easily: https://w3id.org/kim/amb/20231019 (German). JSON Schema: https://w3id.org/kim/amb/20231019/schemas/schema.json

@helge

in reply to Adrian

@acka47 @thisismissem @helge I know I sometimes do not express myself well, but I feel these replies miss the point of my question by a wide margin.

Again, my question is *not* related to how RDF works. I understand that there are no constraints expressed on schema.org in a way that would make sense in the RDF world.

If you go up and follow my own reply from today, you'll see that it is perfectly possible to use the information from schema.org to generate pydantic models, which you can...

in reply to Jens Finkhäuser

@acka47 @thisismissem @helge ... then use to validate data, also data expressed in JSON-LD.

If you'd like to see this rephrased: I am surprised that, despite the tooling that's available, it seems difficult to find information on how to turn this information into SHACL shapes or whatever, and validate via that. Even more, I would have expected someone to have done that.

But it doesn't matter much, from a practical point of view, since I have a solution.

To continue this train of thought just..

in reply to Jens Finkhäuser

@acka47 @thisismissem @helge ... because, the site even publishes an (experimental) OWL2 definition, which should enable one to generate the SHACL shapes, and continue from there.
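
A rough sketch of what the last step of that idea could look like: turning "values expected" information into a SHACL node shape, written out as Turtle text. Everything here is an assumption for illustration. The property table is hand-written rather than read from the published definitions, the mapping from schema.org data types to XSD datatypes in `XSD_FOR` is a guess, and the `ex:` prefix and `shape_for` helper are hypothetical.

```python
PREFIXES = """\
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix schema: <http://schema.org/> .
@prefix ex: <http://example.org/shapes#> .
"""

# Assumed mapping from schema.org data types to XSD datatypes.
XSD_FOR = {"Text": "xsd:string", "Boolean": "xsd:boolean"}


def shape_for(class_name: str, properties: dict[str, str]) -> str:
    """Build a SHACL NodeShape in Turtle for one class and its expected value types."""
    lines = [
        f"ex:{class_name}Shape a sh:NodeShape ;",
        f"    sh:targetClass schema:{class_name} ;",
    ]
    for prop, expected in properties.items():
        lines.append(
            f"    sh:property [ sh:path schema:{prop} ; sh:datatype {XSD_FOR[expected]} ] ;"
        )
    lines[-1] = lines[-1][:-1] + "."  # the final statement ends with '.' instead of ';'
    return PREFIXES + "\n" + "\n".join(lines) + "\n"


shape = shape_for("Thing", {"name": "Text"})
print(shape)
```

The resulting text could then be handed to a SHACL validator together with the parsed JSON-LD data; that part is not shown here.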

🤷‍♂️

In any case, problem solved, and thank you all for helping!

in reply to Jens Finkhäuser

I actually tried to make a point about schema.org and not RDF: the "values expected" for an sdo property are not normative. Thus, if you build a schema (be it with ShEx, SHACL or JSON Schema) based on this information, it is not a schema.org schema but something stricter, which you can call a "schema.org-based metadata profile". @thisismissem @helge
in reply to Adrian

@acka47 I get your point.

The reason I treat this as an RDF interpretation of the question is that the flexibility of schema.org is nice in principle. But in practice, everyone who wants to interpret data will impose some kind of constraints in order to make that task possible.

Might as well start with treating what's documented as a starting point.