Overview
DOCUMENTATION UNDER CONSTRUCTION
Matrix decomposition techniques are well established and have been widely used for decades. Common examples for decomposing a single matrix are the eigendecomposition, LU decomposition, Cholesky decomposition, singular value decomposition (SVD), and non-negative matrix factorization (NMF).
Integrated matrix decomposition
Decomposing a single matrix can be very valuable for learning the underlying structure of a data source. Often, however, multiple data sources are available that describe different properties of the same set of entities. Integrated matrix decomposition makes use of these overlapping sets of entities and aims to reveal more interesting aspects of the data sources than decomposing each matrix on its own could.
Layout description
Views
Views are abstractions for observational units or other views on the data, such as data types, layer indices, or time steps. Typically, they are represented by integers or strings; however, any hashable type is allowed.
Each input data matrix is associated with two primary entities, a row view and a column view. It is possible for a data matrix to be associated with additional entities, such as a layer view in a tensor-like layout.
Note
Additional entities are used to organize the input data and allow, e.g., repeated observations of the same row/column view combination. Data integration, however, is only performed for row and column entities.
A single view is associated with type Entity, which is either a str
or int. solrCMF then uses the type alias ViewDesc, short for
view description, to describe view relationships. A ViewDesc is simply
a tuple of two or more entries of type Entity.
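The description above can be written down as Python type aliases. This is a minimal sketch based only on the prose here; the aliases as actually declared in solrCMF may differ, and `is_view_desc` is a hypothetical helper introduced purely for illustration.

```python
from typing import Tuple, Union

# Assumed definitions mirroring the description above; the library's
# actual aliases may be declared differently.
Entity = Union[str, int]
ViewDesc = Tuple[Entity, ...]


def is_view_desc(candidate: object) -> bool:
    """Hypothetical check: a tuple of two or more str/int entries."""
    return (
        isinstance(candidate, tuple)
        and len(candidate) >= 2
        and all(isinstance(entry, (str, int)) for entry in candidate)
    )
```

Under these assumptions, `is_view_desc((0, 1))` holds, while a one-entry tuple such as `(0,)` or a list such as `[0, 1]` is rejected.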
Examples of view relationships
The following examples are valid view relationship descriptions:
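A few tuples that satisfy the definition of a ViewDesc are sketched below. The concrete view names are chosen for illustration only, and the reading of the first entry as the row view and the second as the column view is an assumption based on the surrounding text.

```python
# Illustrative view descriptions; the names are made up for this sketch.
int_views = (0, 1)                    # row view 0, column view 1
str_views = ("patients", "genes")     # strings work just as well
mixed_views = ("patients", 2)         # str and int entries can be mixed
with_layer = (0, 1, "layer_a")        # third entry: an additional (layer) view

examples = [int_views, str_views, mixed_views, with_layer]
```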
Important
Strings and integers can be used to represent views. It is important
that every appearance of view 0, say, represents the same view, no matter at
which position in the ViewDesc tuple it appears. For example, in (0, 1) and
(5, 0) the 0 represents the same view within a data layout.
This allows, e.g., for a view to appear in the rows of one data source,
but in the columns of another.
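The shared-view rule can be made concrete by recording where each view occurs. The sketch below uses the two view descriptions from the text and assumes that the first entry is the row view and the second the column view; it is not part of the solrCMF API.

```python
from collections import defaultdict

# The two view descriptions mentioned in the text.
layout = [(0, 1), (5, 0)]
ROLES = ("row", "column")  # assumed meaning of the first two positions

# Map each view to the (matrix index, role) pairs in which it appears.
occurrences = defaultdict(list)
for matrix_index, view_desc in enumerate(layout):
    for position, view in enumerate(view_desc[:2]):
        occurrences[view].append((matrix_index, ROLES[position]))

print(occurrences[0])  # [(0, 'row'), (1, 'column')]
```

View 0 indexes the rows of the first data matrix and the columns of the second, so both matrices describe the same set of entities along those axes.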
Layouts
A layout is a collection of view descriptions and can be seen as a
Python list containing entries of type ViewDesc.
Example layout
A simple multi-view layout can be described as
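As an illustration, a layout over three views in which every pair of views is observed by one data matrix could look as follows. The specific view descriptions are an assumption made for this sketch.

```python
# Hypothetical layout: three data matrices over the views 0, 1, and 2.
layout = [
    (0, 1),  # matrix with row view 0 and column view 1
    (0, 2),  # matrix with row view 0 and column view 2
    (1, 2),  # matrix with row view 1 and column view 2
]
```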
Defining a layout establishes relationships between views and indirectly also defines which views are present in a collection of data sources.
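How a layout implicitly defines the set of views can be sketched in plain Python. The example layout is hypothetical, and the code only illustrates the idea rather than how solrCMF processes layouts internally.

```python
# Hypothetical layout with three data matrices.
layout = [(0, 1), (0, 2), (1, 2)]

# Collect every view mentioned in the layout, in order of first appearance.
views = list(dict.fromkeys(view for view_desc in layout for view in view_desc))

# Count in how many view descriptions each view takes part.
usage = {view: sum(view in view_desc for view_desc in layout) for view in views}

print(views)  # [0, 1, 2]
print(usage)  # {0: 2, 1: 2, 2: 2}
```

Here no view is declared explicitly: the three views exist only because they appear in the view descriptions, and each of them links two data matrices.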