Hey, #LazyWeb!
Does anyone know of a #Python package that takes JSON-LD as input, and validates whether it conforms to schema.org schemas?
Bonus points if you can also easily detect fields that don't conform, not just a yes/no answer.
I looked at a bunch of things, but didn't have too much luck with it. I just figured someone must have done this before.
Update: kinda solved it, more below.
Jens Finkhäuser
in reply to Jens Finkhäuser • • •Developers - Schema.org
schema.org
Jens Finkhäuser
in reply to Jens Finkhäuser • • •So that was a dead end, sadly. I'm still sure there is a solution out there somewhere, probably an obvious one. But I didn't find it.
What I did find is an outdated project that reads in schema.org definitions, and generates pydantic models from them.
It works well enough, but uses pydantic 1, while we're now at 2.9 or thereabouts.
I've been in the process of updating it. Currently in a private repo, but I'll make it public. If you're interested, I'll update this thread when it's done.
Jens Finkhäuser
in reply to Jens Finkhäuser • • •Jens Finkhäuser
in reply to Jens Finkhäuser • • •Aurin Azadî
in reply to Jens Finkhäuser • • •Jens Finkhäuser
in reply to Aurin Azadî • • •@atarifrosch But schema.org does not contain that information 🙃
I want them to return results in which schema.org schemata are mentioned.
It's basically a tokenization problem that they're messing up: they treat "JSON-LD" as the tokens "JSON" and "LD", and "schema.org" as "schema" and "org", and then return results matching any two of the four tokens. 🤷‍♂️
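To illustrate the tokenization issue described above, here is a minimal sketch. It does not reproduce any particular search engine's behaviour; the two functions are hypothetical stand-ins that only contrast splitting on every non-word character with splitting on whitespace.

```python
import re


def naive_tokens(query: str) -> list[str]:
    """Split on anything that is not a letter, digit, or underscore."""
    return re.findall(r"\w+", query)


def whitespace_tokens(query: str) -> list[str]:
    """Split on whitespace only, keeping '-' and '.' inside tokens."""
    return query.split()


query = "JSON-LD schema.org"
print(naive_tokens(query))       # ['JSON', 'LD', 'schema', 'org']
print(whitespace_tokens(query))  # ['JSON-LD', 'schema.org']
```

With the first approach, a page that mentions only "JSON" and "schema" already matches two of the four tokens, which is the kind of result complained about here.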
Aurin Azadî
in reply to Jens Finkhäuser • • •Jens Finkhäuser
in reply to Aurin Azadî • • •Helge
in reply to Jens Finkhäuser • • •What do you mean by this?
My memory from when I last looked at schema.org was that EVERYTHING IS VALID due to how they use @vocab. I'm unsure how one would claim that the nonsensical is invalid. There are reasons why I claim that JSON-LD is not ready. This is one of them.
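A rough sketch of why a default vocabulary makes every property name "valid": any key that has no explicit term definition is simply prefixed with the vocabulary IRI. The function below is a hypothetical, heavily simplified stand-in for JSON-LD expansion (it ignores term definitions, nesting, and type coercion), and `flurbleWidth` is a made-up property used only for illustration.

```python
def expand_keys(doc: dict, vocab: str) -> dict:
    """Prefix every non-keyword key with the vocabulary IRI (simplified)."""
    return {
        (key if key.startswith("@") else vocab + key): value
        for key, value in doc.items()
    }


doc = {"@type": "Thing", "name": "example", "flurbleWidth": 42}
expanded = expand_keys(doc, "http://schema.org/")
for key in expanded:
    print(key)
```

The made-up property ends up as an IRI just like `name` does; nothing in this step checks whether schema.org actually defines it.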
Jens Finkhäuser
in reply to Helge • • •@helge well, it's less whether that is valid. But if you e.g. have:
{
"@context": "https://schema.org/",
"@type": "Thing",
"name": 3.14
}
It should probably tell me it's not valid, because the name property should be Text. (It's a little tricky here, because in a textual representation such as JSON everything is text, but from a typing point of view it isn't.)
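A minimal sketch of the kind of check meant here, using only the standard library. The `EXPECTED_TYPES` table is a hand-written assumption covering two Text properties of Thing; it is not generated from the schema.org definitions, and `find_nonconforming` is a hypothetical helper, not part of any existing package.

```python
import json

# Hand-written, deliberately tiny table: schema.org lists Text as the
# expected type for both of these Thing properties.
EXPECTED_TYPES = {
    "Thing": {"name": str, "alternateName": str},
}


def find_nonconforming(doc: dict) -> list[str]:
    """Return one message per property that does not match the table."""
    type_name = doc.get("@type")
    expected = EXPECTED_TYPES.get(type_name, {})
    problems = []
    for key, value in doc.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @context and @type
        if key not in expected:
            problems.append(f"{key}: not a known property of {type_name}")
        elif not isinstance(value, expected[key]):
            problems.append(
                f"{key}: expected {expected[key].__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


doc = json.loads(
    '{"@context": "https://schema.org/", "@type": "Thing", "name": 3.14}'
)
print(find_nonconforming(doc))  # ['name: expected str, got float']
```

This reports the offending field rather than a plain yes/no, which is the "bonus points" behaviour asked for at the top of the thread.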
Helge
in reply to Jens Finkhäuser • • •Jens Finkhäuser
in reply to Helge • • •Emelia 👸🏻
in reply to Jens Finkhäuser • • •@helge *points at her sign* “RDF (and by extension JSON-LD) is highly structured, schemaless, garbage that you may find useful data in”
So yeah, there is really no validation of schema or expected types. schema.org is a misnomer in that JSON-LD and RDF just don't care whether something is a string, URI, boolean, float, whatever.
Only ShEx/SHACL really start to touch on schemas.
Emelia 👸🏻
in reply to Emelia 👸🏻 • • •Linked Data: Undersold, Overpromised?
SocialHub
Jens Finkhäuser
in reply to Emelia 👸🏻 • • •@thisismissem @helge I understand that.
And yet, the descriptions on schema.org are enough to perform validation with.
This isn't a JSON-LD or RDF question, really. It just happens to be the case that's what my data is expressed in.
Adrian
in reply to Jens Finkhäuser • • •As @thisismissem points out, schema.org is not really specifying but suggesting types to be used; the same goes for sdo-types as domain/range. If you want constraints, you must define them yourself with ShEx/SHACL or JSON Schema. For AMB, a schema.org-based metadata profile for educational resources on the web, we chose JSON-LD plus a normative JSON Schema so that people unfamiliar with RDF can also use it easily: https://w3id.org/kim/amb/20231019 (German) JSON Schema: https://w3id.org/kim/amb/20231019/schemas/schema.json
@helge
Allgemeines Metadatenprofil für Bildungsressourcen (AMB)
w3id.org
Jens Finkhäuser
in reply to Adrian • • •@acka47 @thisismissem @helge I know I sometimes do not express myself well, but I feel these replies miss the point of my question by a wide margin.
Again, my question is *not* related to how RDF works. I understand that there are no constraints expressed on schema.org in a way that would make sense in the RDF world.
If you go up and follow my own reply from today, you'll see that it is perfectly possible to use the information from schema.org to generate pydantic models, which you can...
Jens Finkhäuser
in reply to Jens Finkhäuser • • •@acka47 @thisismissem @helge ... then use to validate data, also data expressed in JSON-LD.
If you'd like to see this rephrased: I am surprised that, despite the tooling that's available, it seems difficult to find information on how to turn this information into SHACL shapes or whatever, and validate via that. Even more, I would have expected someone to have done that already.
But it doesn't matter much, from a practical point of view, since I have a solution.
To continue this train of thought just...
Jens Finkhäuser
in reply to Jens Finkhäuser • • •@acka47 @thisismissem @helge ... because the site even publishes an (experimental) OWL2 definition, which should enable one to generate the SHACL shapes and continue from there.
🤷‍♂️
In any case, problem solved, and thank you all for helping!
Adrian
in reply to Jens Finkhäuser • • •Jens Finkhäuser
in reply to Adrian • • •@acka47 I get your point.
The reason I treat this as an RDF interpretation of the question is that the flexibility of schema.org is nice in principle. But in practice, everyone who wants to interpret data will impose some kind of constraints in order to make that task possible.
Might as well start with treating what's documented as a starting point.