mysql - design database relating to time attribute

Question

Welcome To Ask or Share your Answers For Others

mysql - design database relating to time attribute

1 Answer

深蓝 · Answer 1 · 2021-10-23T17:46:28+0000

Here is a model to achieve your stated requirement.

Link to Time Series Data Model

Link to IDEF1X Notation for those who are unfamiliar with the Relational Modelling Standard.

Normalised to 5NF; no duplicate columns; no Update Anomalies, no Nulls.
When the Status of a Product changes, simply insert a row into ProductStatus, with the current DateTime. No need to touch previous rows (which were true, and remain true). No dummy values which report tools (other than your app) have to interpret.
The DateTime is the actual DateTime that the Product was placed in that Status; the "From", if you will. The "To" is easily derived: it is the DateTime of the next (DateTime > "From") row for the Product; where it does not exist, the value is the current DateTime (use ISNULL).

The first model is complete; (ProductId, DateTime) is enough to provide uniqueness, for the Primary Key. However, since you request speed for certain query conditions, we can enhance the model at the physical level, and provide:

An Index (we already have the PK Index, so we will enhance that first, before adding a second index) to support covered queries (those based on any arrangement of { ProductId | DateTime | Status } can be supplied by the Index, without having to go to the data rows). Which changes the Status::ProductStatus relation from Non-Identifying (broken line) to Identifying type (solid line).
The PK arrangement is chosen on the basis that most queries will be Time Series, based on Product?DateTime?Status.
The second index is supplied to enhance the speed of queries based on Status.
In the Alternate Arrangement, that is reversed; ie, we mostly want the current status of all Products.
In all renditions of ProductStatus, the DateTime column in the secondary Index (not the PK) is DESCending; the most recent is first up.

I have provided the discussion you requested. Of course, you need to experiment with a data set of reasonable size, and make your own decisions. If there is anything here that you do not understand, please ask, and I will expand.

Responses to Comments

Report all Products with Current State of 2

SELECT  ProductId,
        Description
    FROM  Product       p,
          ProductStatus ps
    WHERE p.ProductId = ps.ProductId  -- Join
    AND   StatusCode  = 2             -- Request
    AND   DateTime    = (             -- Current Status on the left ...
        SELECT MAX(DateTime)          -- Current Status row for outer Product
            FROM  ProductStatus ps_inner
            WHERE p.ProductId = ps_inner.ProductId
            )

ProductId is Indexed, leading col, both sides
DateTime in Indexed, 2nd col in Covered Query Option
StatusCode is Indexed, 3rd col in Covered Query Option
Since StatusCode in the Index is DESCending, only one fetch is required to satisfy the inner query
the rows are required at the same time, for the one query; they are close together (due to Clstered Index); almost always on the same page due to the short row size.

This is ordinary SQL, a subquery, using the power of the SQL engine, Relational set processing. It is the one correct method, there is nothing faster, and any other method would be slower. Any report tool will produce this code with a few clicks, no typing.

Two Dates in ProductStatus

Columns such as DateTimeFrom and DateTimeTo are gross errors. Let's take it in order of importance.

It is a gross Normalisation error. "DateTimeTo" is easily derived from the single DateTime of the next row; it is therefore redundant, a duplicate column.
- The precision does not come into it: that is easily resolved by virtue of the DataType (DATE, DATETIME, SMALLDATETIME). Whether you display one less second, microsecond, or nanosecnd, is a business decision; it has nothing to do with the data that is stored.
Implementing a DateTo column is a 100% duplicate (of DateTime of the next row). This takes twice the disk space. For a large table, that would be significant unnecessary waste.
Given that it is a short row, you will need twice as many logical and physical I/Os to read the table, on every access.
And twice as much cache space (or put another way, only half as many rows would fit into any given cache space).
By introducing a duplicate column, you have introduced the possibility of error (the value can now be derived two ways: from the duplicate DateTimeTo column or the DateTimeFrom of the next row).
This is also an Update Anomaly. When you update any DateTimeFrom is Updated, the DateTimeTo of the previous row has to be fetched (no big deal as it is close) and Updated (big deal as it is an additional verb that can be avoided).
"Shorter" and "coding shortcuts" are irrelevant, SQL is a cumbersome data manipulation language, but SQL is all we have (Just Deal With It). Anyone who cannot code a subquery really should not be coding. Anyone who duplicates a column to ease minor coding "difficulty" really should not be modelling databases.

Note well, that if the highest order rule (Normalisation) was maintained, the entire set of lower order problems are eliminated.

Think in Terms of Sets

Anyone having "difficulty" or experiencing "pain" when writing simple SQL is crippled in performing their job function. Typically the developer is not thinking in terms of sets and the Relational Database is set-oriented model.
For the query above, we need the Current DateTime; since ProductStatus is a set of Product States in chronological order, we simply need the latest, or MAX(DateTime) of the set belonging to the Product.
Now let's look at something allegedly "difficult", in terms of sets. For a report of the duration that each Product has been in a particular State: the DateTimeFrom is an available column, and defines the horizontal cut-off, a sub set (we can exclude earlier rows); the DateTimeTo is the earliest of the sub set of Product States.

SELECT               ProductId,
                     Description,
        [DateFrom] = DateTime,
        [DateTo]   = (
        SELECT MIN(DateTime)                        -- earliest in subset
            FROM  ProductStatus ps_inner
            WHERE p.ProductId = ps_inner.ProductId  -- our Product
            AND   ps_inner.DateTime > ps.DateTime   -- defines subset, cutoff
            )
    FROM  Product       p,
          ProductStatus ps
    WHERE p.ProductId = ps.ProductId 
    AND   StatusCode  = 2             -- Request

Thinking in terms of getting the next row is row-oriented, not set-oriented processing. Crippling, when working with a set-oriented database. Let the Optimiser do all that thinking for you. Check your SHOWPLAN, this optimises beautifully.
Inability to think in sets, thus being limited to writing only single-level queries, is not a reasonable justification for: implementing massive duplication and Update Anomalies in the database; wasting online resources and disk space; guaranteeing half the performance. Much cheaper to learn how to write simple SQL subqueries to obtain easily derived data.

Categories

mysql - design database relating to time attribute