Data Modeling: Part 2 — Method for Time Series Databases

Read Part 1 — Goals and Methodology Time is notorious in modeling tasks. Indeed, the temporal aspect exacerbates the complexity of the modeling task, making simple diagrams look pretty complex. The temporal dimension becomes particularly nasty when it takes part in identifying entities.

Figure 1
What kind of entities are we trying to represent? | Time-varying entities, entities whose attributes vary over time.The answer comes directly from the literature on temporal databases: Time-varying entities are entities characterized by time-varying attributes — the variability of the value associated with an attribute is an integral part of the attribute definition. Formally speaking, a time-varying attribute is a function from the temporal domain to the domain associated with the attribute type. Time-varying entities differ from traditional entities, which model an instantaneous view. In practice, a time-varying entity describes the variations of all its time-varying attributes over time. The figure above also shows the time-varying interpretation of the employee entity, where we can observe the evolution of his age and salary over time.
Measurement ⇐⇒ Entity as valid. As time-varying entities may contain multiple time-varying and static attributes, we need to understand how to map such concepts into TSBDs and, in particular, InfluxDB. Moreover, the entity’s primary key may be composite; in that case, the TSDB counterpart should preserve the identifying nature of the attribute.
The table below shows our proposed naive mapping.
| Conceptual | Logical | Physical |
| Time-Varying Entities | Measurement | Measurement |
| Time-Varying Attribute name | Field key | Field key |
| Time-Varying Attribute value | Field value | Field value |
| Attribute name (part of the primary key) | Tag key | Tag key with K convention |
| Attribute value (part of the primary key) | Tag value | Tag value with K convention |
| Attribute name (not part of the primary key) | referenced | field key with constant convention or external table |
| Attribute value (not part of the primary key) | referenced | field key with constant convention or external table |
Measurement ⇐⇒ Entity association appears practical, such an association does not provide a complete perspective on InfluxDB abstractions, as we will see in the following.
What about relationships? | The discriminatory aspect is whether time-varying attributes characterize the relationship.Like traditional database design, TSDB relationships are slightly more sophisticated in their modeling than entities. Intuitively, we want to observe the evolution of a relationship and its attributes over time. However, a relationship involves multiple parties (typically two, but possibly many). Therefore, we must distinguish the cases.
- A relationship between static entities
- A relationship between a time-varying entity and a static one
- A relationship between two time-varying entities

Image 1

Figure 2
- If the relationship has time-varying attributes, then construct a whole new series.
- Instead, if the relationship has no time-varying attribute itself, we extend the tagset of the series modeling the time-varying side with additional tags representing the static side key and an additional tag representing the static entity name.
As before, we can use InfluxDB’s tags to represent these relationships within the series, as in the figure on the right. At a conceptual level, the series doesn’t represent the two static entities entirely. For example, it omits attributes like ball color. This issue is similar to what we discussed about static attributes of time-varying entities.
Let’s continue our NBA scenario by representing our last case, the relationship between two time-varying entities. In this case, we build on the extended schema we created. Now, we want to represent collisions between balls. To this extent, we consider the self-relationship between the POSITION entity.
While at the conceptual level, it is easy to understand the representation of collision as a relationship between objects in space and time, the representation of such a relationship is less intuitive. Indeed, the difference between COLLIDE and POSITION is the role of time, which in the former helps identify individual collisions. Continuing on the billiard examples, but in a much-simplified version that does not consider the balls’ volumes, we can define a collision as the occurrence of two balls in the same position at a given time instant “t,” — at t, ball1(x1,y1) and ball2(x2,y2) are such that x1==x2 or y1==y2. Such comparison implies a time-based join, comparing the position of the balls for each of the individual time instants. Such operation is not just algorithmically expensive but also unfeasible due to the infinite nature of the stream. Indeed, even considering the order nature of temporal data, the comparison scope remains unbounded.
As we discussed in the “Introduction to Streaming Data,” the common way to approach unboundedness is by using window operators. Therefore, by using a continuous query language like Flux, it is possible to represent time-varying relationships across time-varying entities.
As shown in the query below, the Flux query for the simplified billiard example is still quite sophisticated. It requires a self-join for each dimension to compare (x,y).
However, the most important aspect is the use of the window operator at line 7, which reduces the cardinality of the comparisons later. Modifying the value of the window (60 seconds is the average duration of a turn), we will identify a different number of collisions because the comparison scope will change.
An alternative implementation could use `aggregateWindow` to downsample the stream elements, but at the expense of the precision of the detection. In such a case, it is essential to choose the aggregation function accurately. For instance, “mean” will approximate the result differently than “last.”
In the query, we use pivoting to align the join condition by comparing the dimensions. This approach allows us to avoid pairwise comparisons for each combination. On the other hand, it requires filtering out those entries representing “self collisions,” including the collisions from the perspective of both the balls.
Finally, we unpivot the series resulting from the comparison of the dimension and using the recent experimental union, and we merge the two series back together.
In conclusion, modeling time-varying relationships between time-varying entities is an advanced task that depends on the window size parameter. A continuous query language like Flux makes this possible, but the task remains quite sophisticated. Finally, the table below summarizes our analysis for data modeling.
Summary Table
| Entity | Entity | Relationship (‘s attributes) | |
| Static | Static | Static | Out of scope |
| Static | Static | Time-Varying | See Time Varying Entity table 1, plus tags for the measurement name for the entities |
| Static | Time-Varying | Time-Varying | See Time Varying Entity table 1, plus tags for the measurement name for the entities |
| Static | Time-Varying | Static | Add tags to series for the right entity for keys attributes and the name of the left entity |
| Time Varying | Time-Varying | Time-Varying | Requires windowing |
import "experimental"
import "influxdata/influxdb"
all = from(bucket: "training")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "position")
|> group(columns: ["ball", "_field"], mode: "by")
|> window(every: 60s)
|> pivot(rowKey: ["_start", "_time", "_stop"],
columnKey: ["_field"], valueColumn: "_value")
//ball cannot be key otherwise i have only self collisions
all1 = all |> group(columns: ["_measurement"], mode: "by")
all2 = all |> group(columns: ["_measurement"], mode: "by")
//ball cannot be key otherwise i have only self collisions
collisionx = join(tables: {b1: all1, b2: all2}, on: ["_time", "x"])
|> filter(fn: (r) => r.ball_b1 != r.ball_b2)
|> map(fn: (r) => ({r with _measurement: "collision"}))
|> keep(columns: ["_measurement","_field","_value","_time","ball_b1", "ball_b2","x"])
|> group(columns: ["_measurement", "ball_b1", "ball_b2"])
|> experimental.unpivot()
collisiony = join(tables: {b1: all1, b2: all2}, on: ["_time", "y"])
|> filter(fn: (r) => r.ball_b1 != r.ball_b2)
|> map(fn: (r) => ({r with _measurement: "collision"}))
|> keep(columns: ["_measurement", "_field", "_value", "_time", "ball_b1", "ball_b2", "y"])
|> group(columns: ["_measurement", "ball_b1", "ball_b2"])
|> experimental.unpivot()
//remerge the two series and write
union(tables: [collisionx, collisiony]) |> influxdb.wideTo()

