The historization feature enables you to automatically keep track of all graph changes that have happened over time on given avatar domains. It also provides a dedicated language which allows you to query the change history, for example to get a particular graph state at a given time, or to get the changes which have occurred between two provided timestamps.
The historization feature is only available for new domains created with the historized flag set to true. All changes made to historized domains are tracked and saved by ThingIn with all required information in order :
So within a historized domain, nothing is deleted until the domain itself is deleted. Apart from the domain itself, a deletion means updating the validity timerange of the entity or value: the open-ended range [from, +oo[ becomes [from, to), with the right bound set to the current date and time. It is very important to keep this in mind in order to keep the historical database size under control. It also means that the historization feature may not be compatible with all use cases.
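The deletion rule above can be pictured with a small in-memory model. This is only a sketch of the idea and not ThingIn's storage code: the Versioned class and the soft_delete helper are hypothetical names introduced for illustration, and None is used to stand for +oo.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Versioned:
    """One stored version of an entity or value with its validity timerange."""
    value: object
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None stands for +oo (still valid)

    def is_valid_at(self, ts: datetime) -> bool:
        # Half-open interval [from, to): the right bound is excluded.
        return self.valid_from <= ts and (self.valid_to is None or ts < self.valid_to)


def soft_delete(version: Versioned, now: datetime) -> None:
    """A 'deletion' only closes the interval; the record itself is kept."""
    version.valid_to = now


def utc(hour: int, minute: int) -> datetime:
    return datetime(2021, 9, 26, hour, minute, tzinfo=timezone.utc)


history = [Versioned("avatar A", valid_from=utc(10, 0))]
soft_delete(history[0], now=utc(11, 0))

# The record is still stored and still answers questions about the past.
print(len(history))                         # 1
print(history[0].is_valid_at(utc(10, 30)))  # True
print(history[0].is_valid_at(utc(11, 0)))   # False
```

Because closed intervals are never removed, the stored history can only grow, which is why the database size needs attention.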
Historized domains can be queried using a dedicated query language named tcypher which extends the cypher language with time related directives (several limitations are described in this wiki).
To start using and learning the historization feature, you first need data which has evolved over time. So in this quick start section, we will first create a historized domain, then initialize a simple graph and apply some successive changes to it, and finally execute some queries to travel in time.
In the remaining sub-parts of this section we will use the $THINGIN_TUTORIAL_DOMAIN environment variable, which is to be replaced by an unused domain name such as export THINGIN_TUTORIAL_DOMAIN=http://www.example.com/tutorial-docs-$(date +%s)/ (it will generate a new domain name using the current epoch time).
In this part, we also provide some simple curl-based shell scripts which automate some ThingIn API calls. Those scripts require the following environment variables to be set in order to get access to the ThingIn platform :
- THINGIN_API : typically set to https://coreapi.thinginthefuture.com
- THINGIN_TOKEN : set to a value that the ThingIn user interface can provide (develop menu > Get my Thing in token)

Creating the historized domain is straightforward, just execute a POST /domains/ with a JSON payload such as :
{
"iri": "$DOMAIN",
"historized": true,
"force_update": false
}
You can use the script historization-quick-start-domain-create.sh once you've initialized the THINGIN_TUTORIAL_DOMAIN, THINGIN_API and THINGIN_TOKEN environment variables. When started, it will print a T0 ISO-8601 date corresponding to the time when your domain was empty.
The force_update parameter is optional (false by default); true means that an overwrite with the same value as before will generate a distinct event.
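For readers who prefer Python to curl, the domain creation call could be prepared as in the sketch below. Several points are assumptions and should be checked against the API documentation: that /domains/ is resolved directly under the THINGIN_API base URL, and that the token is passed as a bearer-style Authorization header. The build_create_domain_request helper is a hypothetical name, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_create_domain_request(api: str, token: str, domain: str,
                                force_update: bool = False) -> urllib.request.Request:
    """Prepare (but do not send) the POST /domains/ call for a historized domain."""
    payload = {"iri": domain, "historized": True, "force_update": force_update}
    return urllib.request.Request(
        url=api.rstrip("/") + "/domains/",  # assumption: path relative to THINGIN_API
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-style header; the real scheme may differ.
            "Authorization": "Bearer " + token,
        },
    )


req = build_create_domain_request(
    api="https://coreapi.thinginthefuture.com",
    token="my-token",  # placeholder value
    domain="http://www.example.com/tutorial-docs-42/",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending the prepared request would then be a matter of calling urllib.request.urlopen(req) with real values taken from the environment variables.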
WARNING: currently, historized domains can be queried without any verification of user rights. Visibility and ACL are bypassed!
Consider the script historization-quick-start-domain-scenario.sh, which executes the following graph operation scenario :
1. create avatar $DOMAIN/A with hasLumens=0
2. create avatar $DOMAIN/B with description="somewhere"
3. create relation $DOMAIN/A -> $DOMAIN/B of type isIn
4. update $DOMAIN/A, set hasLumens=42
5. delete relation $DOMAIN/A -> $DOMAIN/B
6. delete avatar $DOMAIN/B
7. delete avatar $DOMAIN/A

Each step is executed with a 2-second delay, so that 7 new timestamps, T1 to T7, become available for queries.
Typical script execution will look like this :
Now you can query what has happened on your graph; just replace $T0 ... $T7 with the timestamps printed by the previous script execution.
The following queries can be executed directly with the ThingIn user interface (Explore > Explore time with Thing in), or you can use the script historization-quick-start-domain-queries.sh, which automatically executes all the following queries and prints the results on the standard output. Don't forget to provide the T0 to T7 environment variables to the shell you are using.
SNAPSHOT $T0
MATCH (n)
WHERE static(n.domain) = '$THINGIN_TUTORIAL_DOMAIN'
RETURN n
No avatar or edge is returned.
SNAPSHOT $T1
MATCH (n)
WHERE static(n.domain) = '$THINGIN_TUTORIAL_DOMAIN'
RETURN n
Only 1 avatar is returned.
SNAPSHOT $T4
MATCH (a)-[e]->(b)
WHERE static(a.domain) = '$THINGIN_TUTORIAL_DOMAIN'
RETURN a,e,b
Returns A and B, since at $T4 A has a relationship with B. If you try $T2, nothing is returned because the relationship does not exist yet at this timestamp.
The results as shown by ThingIn user interface should look like this :
The same execution but at $T3 will show a different value for the hasLumens data property.
SNAPSHOT $T4
MATCH (n)
WHERE static(n.domain) = '$THINGIN_TUTORIAL_DOMAIN'
RETURN count(n)
Returns 2. If you replace $T4 by $T7 (or the current date), you'll get 0 as everything has been deleted. Of course, if you use $T0 you'll also get 0, as the avatars were not yet created.
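If you would rather prepare the quick start queries from Python than from the shell, the placeholder substitution can be sketched with string.Template. The timestamp and domain values below are placeholders, not output from a real run, and the render_count_query helper is a hypothetical name.

```python
from string import Template

# Same text as the counting query above, with $T as the timestamp placeholder.
COUNT_QUERY = Template(
    "SNAPSHOT $T\n"
    "MATCH (n)\n"
    "WHERE static(n.domain) = '$THINGIN_TUTORIAL_DOMAIN'\n"
    "RETURN count(n)"
)


def render_count_query(timestamp: str, domain: str) -> str:
    """Fill in one timestamp and the tutorial domain."""
    return COUNT_QUERY.substitute(T=timestamp, THINGIN_TUTORIAL_DOMAIN=domain)


query = render_count_query(
    timestamp="2021-09-26T11:25:52+02:00",              # placeholder for $T4
    domain="http://www.example.com/tutorial-docs-42/",  # placeholder domain
)
print(query)
```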
The query language used is named T-Cypher; it is a temporal graph query language which extends the Cypher query language with temporal features.
Four kinds of queries are available :
- SNAPSHOT : Query the graph state at a precise timestamp
- RANGE_SLICE : Query for graph changes between two timestamps
- LEFT_SLICE : Query for graph changes before a given timestamp
- RIGHT_SLICE : Query for graph changes after a given timestamp

T-Cypher queries require one or two timestamps expressed as a date-time with an offset from UTC/Greenwich in the ISO-8601 calendar system :
2011-12-03T10:15:30Z
2011-12-03T10:15:30.000Z
2011-12-03T10:15:30+01:00
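As an illustration, timestamps in these three shapes can be parsed and produced with the Python standard library. The two helper names are hypothetical. The sketch emits a numeric offset (+00:00) rather than Z, which matches the third form above.

```python
from datetime import datetime, timezone


def parse_timestamp(text: str) -> datetime:
    """Parse an ISO-8601 date-time that carries a UTC offset or a trailing Z."""
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on,
    # so it is rewritten as an explicit +00:00 offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        raise ValueError("an offset from UTC is required")
    return parsed


def format_timestamp(moment: datetime) -> str:
    """Render an offset-aware datetime with millisecond precision."""
    if moment.tzinfo is None:
        raise ValueError("an offset from UTC is required")
    return moment.isoformat(timespec="milliseconds")


a = parse_timestamp("2011-12-03T10:15:30Z")
b = parse_timestamp("2011-12-03T10:15:30.000Z")
c = parse_timestamp("2011-12-03T10:15:30+01:00")
print(a == b)               # True: same instant
print(c < a)                # True: 10:15:30+01:00 is 09:15:30 UTC
print(format_timestamp(a))  # 2011-12-03T10:15:30.000+00:00
```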
Let's start with a simple example of a query to look back in the past for all connected node pairs, a and b, with the source node a belonging to a particular domain and having a hasLumens attribute lower than 17.
SNAPSHOT 2021-09-26T11:25:52+02:00
MATCH (a)-[e]->(b)
WHERE
static(a.domain) = 'http://www.example.com/tutorial-docs-42/' AND
a.`http://orange-labs.fr/fog/ont/iot.owl#hasLumens` < 17
RETURN a,e,b
RANGE_SLICE [2021-09-26T00:00:00+02:00; 2021-09-26T23:59:59+02:00) BY 30 MINUTES
MATCH (a)-[e]->(b)
WHERE
static(a.domain) = 'http://www.example.com/tutorial-docs-42/'
RETURN a,e,b
A typical query is made of four main parts (more details are available on the Cypher and T-Cypher web sites) :

- SNAPSHOT timestamp : graph state at timestamp
- RANGE_SLICE [timestampA; timestampB) : graph states between timestampA (included) and timestampB (excluded)
- LEFT_SLICE timestamp : graph states before timestamp
- RIGHT_SLICE timestamp : graph states after timestamp
- BY positiveIntegerValue MILLIS|SECONDS|MINUTES|HOURS
  - BY 15 MINUTES means that you want results every 15 minutes: if you make a query between 02:00 and 03:00, you'll get graph states at most for 02:00, 02:15, 02:30 and 02:45.
  - If nothing changes between 02:15 - 02:45, then you'll have results only for 02:00 and 02:45.
  - If several changes occur within 02:00 - 02:15, for example at 02:02, 02:10 and 02:12, you will only get a state for 02:00 consolidating (final state) everything which has occurred in the 02:00 - 02:15 interval.
- 1970-01-01T00:00:00Z

- (n) means you're just looking for any node (you don't care about relationships); matching nodes are bound to the parameter named n.
- (a)-[]->(b) means you're looking for two nodes which have any kind of relation from a to b.
- (a)-[e]->(b) means the same, but you also want to be able to apply some filtering predicates on e, and/or return the relationship in the response.
- (n)-[ed:`http://elite.polito.it/ontologies/dogont.owl#isIn`]->() means match nodes n with an object property of type isIn with any other node.

- AND or OR operators
- n.dataPropertyName, examples :
  - n.myDataPropertyLabel
  - n.`http://orange-labs.fr/fog/ont/iot.owl#hasLumens`
  - e.`http://elite.polito.it/ontologies/dogont.owl#isIn`
- static(a.domain)
- =, <, >, >=, <=, !=, ...
- "somestring" IN n.classes

RANGE_SLICE [2023-06-19T14:00:00Z; 2023-06-19T16:00:00Z) BY 60 seconds
MATCH (n)
WHERE static(n.domain) = 'http://www.example.com/analog-clock/'
OPTIONAL MATCH (n)-[e]->(m)
RETURN n, e, m
- n only, if the current node designated by n has no outgoing relations
- n, e and m : n with one of its relations
  - If a node b has 5 relations, you will get 5 records with the n value but 5 different pairs of values for e and m.

- RETURN n will return one node of matching patterns such as (n), (n)-->(x), (n)-[e]->(), ...
- RETURN a,e,b will return the 2 matching nodes and 1 relation; of course, each named parameter (a, e, b) refers to a matching node or edge of the same pattern.
- RETURN count(n) counts the number of matching nodes
- RETURN 1
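The consolidation performed by the BY directive, as described earlier in this list, can be pictured with the toy model below. It is a sketch of the described behaviour under simplifying assumptions, not the engine's algorithm: each change carries the full state that follows it, the consolidate helper is a hypothetical name, and the hasLumens values are made up.

```python
from datetime import datetime, timedelta, timezone


def consolidate(changes, start, step):
    """Keep, for each BY interval containing changes, the final state only.

    changes: iterable of (timestamp, state) pairs.
    Returns a dict mapping the interval start to the last state seen in it.
    """
    states = {}
    for ts, state in sorted(changes, key=lambda change: change[0]):
        bucket_start = start + ((ts - start) // step) * step
        states[bucket_start] = state  # later changes overwrite earlier ones
    return states


def at(hour, minute):
    return datetime(2021, 9, 26, hour, minute, tzinfo=timezone.utc)


changes = [
    (at(2, 2), {"hasLumens": 1}),
    (at(2, 10), {"hasLumens": 2}),
    (at(2, 12), {"hasLumens": 3}),
    (at(2, 50), {"hasLumens": 4}),
]
states = consolidate(changes, start=at(2, 0), step=timedelta(minutes=15))
for bucket_start, state in states.items():
    print(bucket_start.strftime("%H:%M"), state)
# 02:00 {'hasLumens': 3}
# 02:45 {'hasLumens': 4}
```

No entry appears for 02:15 or 02:30 because nothing changed in those intervals, which mirrors the 02:00 and 02:45 example given for BY 15 MINUTES.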
The historization API is about data querying. It provides two HTTP endpoints, avatars/tfind and avatars/tcount, which take a JSON payload as input and return a JSON response as output.
The avatars/tcount endpoint is just syntactic sugar that allows you to directly get the count value returned by the provided query, independently of how the RETURN clause is written.
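A call to either endpoint could be prepared from Python roughly as follows. This is a sketch under several assumptions that are not confirmed here: the endpoints are reached with POST under the THINGIN_API base URL, the token travels in a bearer-style Authorization header, and the payload only needs the tcypher field. The build_query_request helper is a hypothetical name, and nothing is sent over the network.

```python
import json
import urllib.request

ENDPOINTS = ("avatars/tfind", "avatars/tcount")


def build_query_request(api: str, token: str, endpoint: str,
                        tcypher: str) -> urllib.request.Request:
    """Prepare (but do not send) a historization query request."""
    if endpoint not in ENDPOINTS:
        raise ValueError("unknown endpoint: " + endpoint)
    return urllib.request.Request(
        url=api.rstrip("/") + "/" + endpoint,  # assumption: relative to THINGIN_API
        data=json.dumps({"tcypher": tcypher}).encode("utf-8"),
        method="POST",                         # assumption: POST with a JSON body
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,  # assumption: bearer-style header
        },
    )


query = "SNAPSHOT 2022-09-26T08:36:21.221Z MATCH (n) RETURN n"
find_req = build_query_request("https://coreapi.thinginthefuture.com", "my-token",
                               "avatars/tfind", query)
count_req = build_query_request("https://coreapi.thinginthefuture.com", "my-token",
                                "avatars/tcount", query)
print(find_req.full_url)
print(count_req.full_url)
```

The same query text is used for both requests, since avatars/tcount returns the count regardless of how the RETURN clause is written.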
The historization API is state timestamp oriented, when the historical database is queried :
So, a tcypher query when executed :
stateTimestamp
)
matching-mode to keep the amount of data returned under control
For data property values, the historization natively supports the following data types, both at storage and event broadcast levels :
The input JSON specifies the tcypher query you want to execute. The format is the following :
{
"tcypher":"SNAPSHOT 2022-09-26T08:36:21.221Z MATCH (n) RETURN n",
"matching_mode":"is-in",
"with_start_intervals_timestamps":true,
"with_results_validity_timeranges":true
}
- tcypher : the tcypher query string
- matching_mode : which information should be returned for each state change timestamp
  - is-in : all nodes, edges and data properties whose validity timerange contains the state timestamp are returned.
  - at-start : a match is returned if and only if at least one of its nodes, edges or data properties has a validity timerange start equal to the state timestamp.
  - at-end : a match is returned if and only if at least one of its nodes, edges or data properties has a validity timerange end equal to the state timestamp.
- with_start_intervals_timestamps : force the insertion of a "false" state timestamp in order to capture the initial state
  - RANGE_SLICE and RIGHT_SLICE kinds of queries
  - matching_mode must be is-in in order to be able to capture the initial state.
- with_results_validity_timeranges : include the validity timeranges in the returned responses
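The three matching modes can be read as predicates over validity timeranges. The sketch below is one interpretation of the descriptions above and not the actual implementation: timestamps are plain integers for brevity, the TimeRange class and both helpers are hypothetical names, and at-start / at-end are implemented exactly as stated, without any additional validity check.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimeRange:
    start: int                 # validity start (included)
    end: Optional[int] = None  # validity end (excluded), None if still valid


def element_matches(r: TimeRange, state_ts: int, mode: str) -> bool:
    if mode == "is-in":
        return r.start <= state_ts and (r.end is None or state_ts < r.end)
    if mode == "at-start":
        return r.start == state_ts
    if mode == "at-end":
        return r.end == state_ts
    raise ValueError("unknown matching mode: " + mode)


def match_is_returned(ranges: List[TimeRange], state_ts: int, mode: str) -> bool:
    """ranges holds the timeranges of every node, edge and data property of a match."""
    if mode == "is-in":
        # Interpretation: every element of the match must be valid at the state timestamp.
        return all(element_matches(r, state_ts, mode) for r in ranges)
    # at-start / at-end: at least one element starts (or ends) at the state timestamp.
    return any(element_matches(r, state_ts, mode) for r in ranges)


match = [TimeRange(10), TimeRange(20, 30)]  # e.g. a node and one of its edges
print(match_is_returned(match, 25, "is-in"))     # True
print(match_is_returned(match, 30, "is-in"))     # False: 30 is excluded
print(match_is_returned(match, 20, "at-start"))  # True
print(match_is_returned(match, 30, "at-end"))    # True
```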
All query responses are JSON based and follow the schema described in this section; exhaustive information about this schema is available in the SWAGGER API documentation.
The avatars/tfind API call response is always paginated. API users can easily use automatic JSON deserialization, as we provide a @class field which gives the type or sub-type of the JSON object.
The first response level follows the ThingIn standard pagination and provides you with the following information :
- page_size : Number of items per page
- index : Retrieved page index
- next : Next page URL link
- size : Total number of results or -1 if unknown
- items : TCypher query results in a data structure named Envelope
The Envelope data structure is a variant data type designed to support multiple data types (@class can be used to distinguish them), although currently only one is provided :
- header : Meta data about the executed query and the response format used
- response : Response content in match states format

All envelopes always contain the same header data structure, but a different response JSON object, which is either (once again @class is given) :
- timestamp : when the query was executed
- size : number of returned results in the data part
- data : queried results as a list of ReturnedState objects

A ReturnedState object has the following fields :
- stateTimestamp : Timestamp of the current state
- avatars : All requested avatars as a map of variable names associated with ReturnedAvatar objects.
- edges : All requested edges as a map of variable names associated with ReturnedEdge objects.
- values : literal values (not yet supported)

We won't go further into the description of the response format, as it is fully described in the SWAGGER/OPENAPI documentation; take a look, for example, at the data structure named TFindPaginatedResponse. The goal was just to give you the main principles.
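To make the nesting concrete, here is a sketch that walks one response page. The sample page is hand-written for illustration and is not real API output: the timestamps are placeholders, the @class fields are left out because their values are not given here, and placing data under response is an assumption drawn from the field lists above. The summarize helper is a hypothetical name.

```python
def summarize(page: dict) -> list:
    """List (stateTimestamp, avatar variable names, edge variable names) per state."""
    summary = []
    for envelope in page.get("items", []):
        response = envelope.get("response", {})
        for state in response.get("data", []):
            summary.append((
                state["stateTimestamp"],
                sorted(state.get("avatars", {})),
                sorted(state.get("edges", {})),
            ))
    return summary


# Hand-written, hypothetical page following the field names described above.
sample_page = {
    "page_size": 10,
    "index": 0,
    "next": None,
    "size": 1,
    "items": [
        {
            "header": {},
            "response": {
                "timestamp": "2021-09-26T12:00:00Z",
                "size": 1,
                "data": [
                    {
                        "stateTimestamp": "2021-09-26T09:25:52Z",
                        "avatars": {"a": {}, "b": {}},
                        "edges": {"e": {}},
                    }
                ],
            },
        }
    ],
}

for state_timestamp, avatar_names, edge_names in summarize(sample_page):
    print(state_timestamp, avatar_names, edge_names)
# 2021-09-26T09:25:52Z ['a', 'b'] ['e']
```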
Current historization release comes with several limitations :
SNAPSHOT 2021-09-26T11:25:52+02:00
MATCH (n)-[ed]->()
WHERE ed.`http://elite.polito.it/ontologies/dogont.owl#isIn` = 42
AND static(n.domain) = 'https://www.example.com/truc/'
RETURN COUNT(n)
RETURN a,e,b)
With RETURN 1, for example, you'll get just the stateTimestamp in the results.