Thoughts on CIDOC-CRM Classes: "Oops, who put all this Time Stuff in my Box of Things!?"

I am using the CIDOC-CRM – the Conceptual Reference Model developed by the International Council of Museums – as the primary domain reference model guiding design and development of the FactMiners social-game platform. In a recent post I looked at the Conceptual Reference Model from a "pure graph" perspective, re-imagining the CRM's Property Declarations as "just another" labeled subset of model elements, that is, as just another important subset of CRM Classes. In this post, I explore the "entity-ness" of the CIDOC-CRM Class Declarations.

Oops, We've Got a Lot of Time 'Stuff' in Our Box of Things!?

The approach I am taking on the design of the FactMiners platform leverages the expressive and flexible power of today's graph database technologies. This solution technology encourages me to look at the CIDOC-CRM, as much as possible, as a "full graph" expression. In addition to this graph-based interest, I believe my personal experience contributes a unique perspective in working with the CIDOC-CRM.

Twenty-odd years ago I was a thought leader and designer/developer in a "skunkworks" doing "executable business model" software frameworks in Smalltalk as part of the Object Technology Practice of IBM Global Services. There we did groundbreaking work "objectifying" Process, using an agent-based perspective and driven by strict adherence to a rather simple but powerful role-actor metamodel. A "smart desktop" framework dynamically generated "application" views on the executing business model. Change the business model and the "applications" dynamically changed. We had an opportunity to "troll" the IBM Global Services consulting practices looking at their IRMs – their Industry Reference Models. We were looking for other practice collaborators who had a decent IRM and customers who might be interested in, or better yet, need exploratory business modeling services.

So when I looked into the "pile of stuff" that is the CIDOC-CRM, I had a whole lot of context and tactics for sorting stuff out and "seeing" what's there. Recently, as I pored over the CRM Definition with an eye toward importing the Property Declarations into a Neo4j graph database, I had some "A-ha!" moments where I think I saw what might be the biggest source of frustration folks have when trying to "dip into" (that is, to explore and understand) the CIDOC-CRM.

One of the distinguishing characteristics of the CIDOC-CRM is its "object-oriented" foundation. The OOP influence in the expression of the CIDOC-CRM is significant and useful. However, I believe that a less-technical reader will not see – or worse, justifiably be confused by – the important OOP-modeling distinction between "Thing objects" and "Activity objects" – to use the most general of Class names found in the E2 Temporal Entities and E77 Persistent Item branches of the class hierarchy.

Objectifying process as a first class object is a rather subtle OOP technique with significant implications. The current CIDOC-CRM Definition text document, the on-line class hierarchy diagram, and the various subsystem diagrams seriously underplay this important design distinction. I believe that if the official CRM Definition text and associated graphical diagrams more fully "set the stage" for understanding the importance and utility of this underlying "Big Picture" nature of the CRM, the motivated exploratory user might have a more successful experience.

Let's Take A Quick Look...

As you can see in the following screenshot collage, the official 'Definition of the CIDOC Conceptual Reference Model' presents the CRM Class Declarations as a prefix-sorted master list. The Modelling Principles section jumps immediately into some very deep and subtle distinctions of monotonicity, disjointness, extension, etc. while overlooking this considerable distinction between the E2 Temporal Entities and E77 Persistent Item branches of the hierarchy. Only two summary lists visually show the deeply indented hierarchies of both the Class and Property Declarations, and the significance of these hierarchies is barely discussed.

What jumps out at you is that the current CRM Definition document does a great job of being the 'Volume 2. Reference' of a two-volume set where the first volume provides the Big Picture perspective and example-based Getting Started material. This "missing volume" is a necessary complement to the strict "just the facts" Definition reference. Unofficial community contributions help to fill this void – notably, Dominic Oldman and the CRM Labs' "CIDOC-CRM: Primer V. 1.1" (PDF). However, the official reference documents provided at www.CIDOC-CRM.org should include such a complementary introductory volume as part of the Definition set.

This "missing" introductory volume of the CRM reference documents would surely incorporate graphical diagrams similar to those currently available on the official CIDOC-CRM.org website. But a quick look at the available diagrams – as hard as I know it was to create and produce them – confirms that the current resources dramatically underplay this important distinction between Thing Objects and Process (Temporal) Objects.

The current text reference and class hierarchy diagram do little to highlight the significant partition of the CRM model elements into 'Thing Objects' and 'Process Objects'.

The situation doesn't become any clearer when we look at the numerous subcomponent model diagrams on the CIDOC-CRM.org website. Again, I know how hard it was for someone to draw and validate these images. These diagrams are super helpful as far as they go. But their format seems more constrained by the diagram drawing tool used rather than there being an explicit model diagramming standard driving the current diagram presentation format.

Here, for example, is the Object Association Information component diagram. I picked this as it is relatively sparse and includes a good mix of 'Thing Objects' and 'Process Objects.' This part of the model explains how Things and People can be found at certain Times in specific Places participating in discrete Events, etc.

The physical arrangement and IS_A superclass relationship double-line arrows reflect this Thing/Activity distinction among the CRM model elements, but this is a very subtle visual distinction. And as with all the current component diagrams on the official website, this diagram is a static image. There are no interactive links between the graphical Class nodes and Property edges and their respective entries in the official Definition text.

My own frustration at diving into and around the www.CIDOC-CRM.org on-line reference material has encouraged my desire to contribute an updated and more interactive reference resource for the model to the CIDOC-CRM community. Such a more exploratory resource could be a step-wise generalized contribution of our project beyond the specific focus of our work on the FactMiners platform. Here, for example, is what I did to explore some potential along these lines...

I started with a fresh copy of a Neo4j graph database with the Class Declarations as nodes and Property Declarations as relationships (between nodes). (The GitHub repository for this 'seed' of CIDOC-CRM model elements in a Neo4j database is here.)
I did a Cypher query to add a 'Persistent' and 'Temporal' label to each class based on its membership in the E77 or E2 branch of the class hierarchy.
I then did a query in the stock Neo4j browser to return and visualize the model elements (CRM Entities) associated with the Object Association Information component.
I next dragged the nodes of the "bouncy ball" browser graph visualization to more closely resemble the layout of the target static diagram.
Finally, I clicked on the E7 Activity node to pop up its node properties (not to be confused with CRM Properties) to show how the full CRM Definition entries are in this Neo4j database... and then I did this screenshot.
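
The labeling step in the list above can also be sketched outside the database. What follows is only a minimal illustration in plain Python, not the Cypher query I actually ran: the dictionary is a small hand-picked subset of the class hierarchy, and the names `SUPERCLASSES`, `BRANCH_LABELS`, `with_ancestors`, and `branch_labels` are made up for this example.

```python
# Hand-picked subset of the CRM class hierarchy: class -> direct superclasses.
# A list is used because some CRM classes have more than one superclass.
# (Illustrative data only; the real model has many more classes.)
SUPERCLASSES = {
    "E1 CRM Entity": [],
    "E2 Temporal Entity": ["E1 CRM Entity"],
    "E4 Period": ["E2 Temporal Entity"],
    "E5 Event": ["E4 Period"],
    "E7 Activity": ["E5 Event"],
    "E77 Persistent Item": ["E1 CRM Entity"],
    "E39 Actor": ["E77 Persistent Item"],
    "E70 Thing": ["E77 Persistent Item"],
    "E52 Time-Span": ["E1 CRM Entity"],
    "E53 Place": ["E1 CRM Entity"],
}

# Branch root -> label to apply to every class in that branch.
BRANCH_LABELS = {
    "E2 Temporal Entity": "Temporal",
    "E77 Persistent Item": "Persistent",
}


def with_ancestors(name, superclasses):
    """Return the class itself plus every superclass reachable from it."""
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(superclasses.get(current, []))
    return seen


def branch_labels(name, superclasses=SUPERCLASSES):
    """Return the sorted branch labels that apply to the named class."""
    lineage = with_ancestors(name, superclasses)
    return sorted(label for root, label in BRANCH_LABELS.items() if root in lineage)


if __name__ == "__main__":
    for cls in SUPERCLASSES:
        print(cls, branch_labels(cls))
```

Classes such as E53 Place and E52 Time-Span fall in neither branch, so they come back with no label at all. In Cypher the same idea might look something like `MATCH (c)-[:IS_A*0..]->(root {code: 'E77'}) SET c:Persistent`, where the `IS_A` relationship type and the `code` property are assumed names that may not match the seed database.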

While this stock browser visualization is too "one-off" to be truly useful, it does show the impact of something as simple as subset coloration (or another containment-revealing visual cue) to help reveal the underlying design of the CRM model elements. And although the static nature of the screenshot does not reveal it, the ability to examine full Definition entries within the exploratory diagram is another step in a good direction to consider for the official CIDOC-CRM documentation and its companion website. If this much functionality can be gained from one-off use of available generic tools, imagine what we could accomplish by building a CIDOC-CRM reference resource with a full RESTful and public web interface built on a platform based on Neo4j + Structr + KeyLines (or Alchemy.js, Linkurious, or similar client-side visualization layer).

Exploratory Decomposition of CIDOC-CRM Graphical Diagrams

Continuing our thought experiment imagining an expanded set of CIDOC-CRM reference documents, I would encourage an effort to encapsulate the important structural design distinctions through a logical/functional "exploratory decomposition" graphical style. I simply don't have all the answers as to what this means fully, but I can provide an example based on my own thought experiment along these lines.

I mentioned that my prior "executable business model" experience was helpful to my understanding of the CRM. I also mentioned how the Smalltalk framework we built was agent-based, objectified Process, and was based on a "ruthlessly simple" top-level metamodel. I took a UML (Unified Modeling Language) Class model that is very similar to what we did in that Smalltalk skunkworks, and I overlaid the high-order CRM Entities that have an obvious model element alignment with this process-oriented UML Class model.

At one level, this diagram looks too simple to say all that much about a non-trivial software architecture. But appearances are deceiving when it comes to such software design diagrams. This UML diagram says a LOT about an agent-based role/actor software architecture. And having written software frameworks implementing such architectures, I know that the beauty and the "devil" is in the details suggested by such high-level diagrams. I know, for example, how extensive the OOP-programming decomposition of the Activity and Task objects can be to implement the required interfaces and functions of such a high-level diagram. Yet as complex and diffuse as the implementation decomposition of these high-level model elements may be, well-linked logical decomposition of such inter-linked diagrams of the full model can keep the reader in context to avoid confusion or misunderstanding.

This "Just Enough, Just In Time" exploratory presentation style is how we need prospective CIDOC-CRM users to be exposed to the full CRM model. We need a rich set of interactive diagrams that keep the reference-reader "in the moment" and in context. A "double-click dive" into the E7 Activity node, for example, would reveal the decomposition of Activities into the various modeled elements including E11 Modification, E10 Transfer of Custody, E85 Joining, E86 Leaving, E79 Part Addition, etc. In each case, a context-retaining diagram specific to the "leaf" class being investigated will not only be helpful for general model learners, but will be invaluable for software architects intending to use the CRM as a metamodel constraining a full system instance design.
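
To make the "double-click dive" idea a bit more concrete, here is a rough sketch of the data a context-retaining view might work from. It is plain Python over a tiny, hand-picked slice of the hierarchy below E7 Activity; the `SUBCLASSES` table and the `dive` function are hypothetical names for this illustration, not part of any existing CRM tool.

```python
# Tiny slice of the hierarchy below E7 Activity: class -> direct subclasses.
# (Illustrative data only; many Activity subclasses are left out.)
SUBCLASSES = {
    "E7 Activity": ["E10 Transfer of Custody", "E11 Modification"],
    "E11 Modification": ["E79 Part Addition", "E80 Part Removal"],
}


def dive(name, subclasses, path=()):
    """Yield (breadcrumb, class_name) pairs for every class below `name`.

    The breadcrumb records how the reader got here, so a diagram built
    from it can keep the surrounding context visible.
    """
    trail = path + (name,)
    for child in subclasses.get(name, []):
        yield " > ".join(trail), child
        # Recurse so deeper "leaf" classes carry the full trail.
        yield from dive(child, subclasses, trail)


if __name__ == "__main__":
    for breadcrumb, cls in dive("E7 Activity", SUBCLASSES):
        print(f"{breadcrumb} > {cls}")
```

The point of carrying the breadcrumb along is that each "leaf" view knows where it sits in the larger model, which is exactly the context a static image loses.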

I believe that there are far more developers today who are "comfortable users" of object-oriented programming technologies than there are those who are comfortable with designing and building systems based on advanced OOP architectures, especially ones that objectify process. For this reason, I believe that many mainstream programmers would find the current model reference resources insufficient to design and build full CIDOC-CRM based systems. Not that it can't be done... just that there would necessarily be a lot of "private mental concept transformations" required to get the job done.

Given this belief that there are more casual users of OOP technology than there are advanced OOP designer/developers, the proposed "missing" first volume of the CIDOC-CRM Reference documents should be envisioned as a Developer's Guide to CRM-compatible system design and programming. The non-technical/non-developer would find the overview and Big Picture content of such a guide useful. An exploratory learner can simply stop his or her reading of this introductory guide once the information goes deeper than needed for the learner to collaborate as a subject matter expert on a team developing such systems, or to assess the model when making a recommendation about possible adoption of the CRM by the prospective user's institution. Such a proposed Developer's Guide, however, would provide an invaluable "easy on ramp" for software architects and developers tasked with building CRM-compatible systems.

At this point, I can't go further without some non-trivial time and effort to provide additional specific examples of where we could go with CIDOC-CRM reference documentation. I do believe, however, that this post already sufficiently suggests why we might want to do this and how this effort might be approached.

As always, I appreciate comments and questions,
-: Jim :-