Overview

DOCUMENTATION UNDER CONSTRUCTION

Matrix decomposition techniques are well established and have been widely used for decades. Common examples for decomposing a single matrix are the eigendecomposition, LU decomposition, Cholesky decomposition, Singular Value Decomposition (SVD), and Non-negative Matrix Factorization (NMF).

Integrated matrix decomposition

Decomposing a single matrix can be very valuable for learning the underlying structure of a data source. Often, however, multiple data sources are available that describe different properties of the same set of entities. Integrated matrix decomposition makes use of these overlapping sets of entities and aims to reveal structure across data sources that decomposing each matrix separately could not.

Layout description

Views

Views are abstractions for observational units or other views on the data, such as data types, layer indices, time steps, and so on. Typically, they are represented by integers or strings; however, any hashable type is allowed.

Each input data matrix is associated with two primary entities, a row view and a column view. It is possible for a data matrix to be associated with additional entities, such as a layer view in a tensor-like layout.

Note

Additional entities are used to organize the input data and allow, e.g., repeated observations of the same row/column view combination. However, data integration is performed only for row and column entities.

A single view has type Entity, which is either a str or an int. solrCMF uses the type alias ViewDesc, short for view description, to describe view relationships. A ViewDesc is simply a tuple of two or more entries of type Entity.
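The two types described above can be sketched in plain Python. This is a minimal illustration only: the alias definitions below are assumptions reconstructed from the description, and the helper `is_view_desc` is hypothetical and not part of solrCMF.

```python
from typing import Tuple, Union

# Assumed definitions for illustration; the library's own aliases may differ.
Entity = Union[str, int]
ViewDesc = Tuple[Entity, ...]


def is_view_desc(candidate: object) -> bool:
    """Hypothetical check: a tuple of two or more str/int entries."""
    return (
        isinstance(candidate, tuple)
        and len(candidate) >= 2
        and all(isinstance(entry, (str, int)) for entry in candidate)
    )


row_and_column = is_view_desc(("genes", "samples"))  # two views: row and column
single_view = is_view_desc((0,))  # only one view, so not a view relationship
```

Here `row_and_column` is `True`, while `single_view` is `False` because a view relationship needs at least a row view and a column view.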

Examples of view relationships

The following examples are valid view relationship descriptions:

(0, 1), (10, 2), (1, 2)
Integers can be used as convenient shorthands for views.

("A", "B"), ("genes", "samples")
Strings can give views more descriptive names.

("x", "y", "channel"), (0, 1, "a", "01:12"), (0, 1, "a", "10:50")
More than two views can be specified. The additional views provide context for a data source, e.g., to integrate repeated measurements of a view relationship.

Important

Strings and integers can be used to represent views. It is important that every appearance of a given view, say 0, represents the same view, no matter at which position in the ViewDesc tuple it appears. For example, in (0, 1) and (5, 0) the 0 represents the same view within a data layout. This allows, e.g., a view to appear in the rows of one data source but in the columns of another.
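To make the position independence concrete, the following sketch records the tuple positions at which each view occurs. The function `view_positions` is a hypothetical helper written for this illustration and is not part of solrCMF.

```python
from collections import defaultdict


def view_positions(view_descs):
    """Map each view to the set of tuple positions at which it appears."""
    positions = defaultdict(set)
    for desc in view_descs:
        for index, view in enumerate(desc):
            positions[view].add(index)
    return dict(positions)


# View 0 is the row view (position 0) of (0, 1)
# and the column view (position 1) of (5, 0).
positions = view_positions([(0, 1), (5, 0)])
```

For this input, `positions[0]` is `{0, 1}`: the same view 0 occupies the rows of one data source and the columns of the other.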

Layouts

A layout is a collection of view descriptions and can be seen as a Python list containing entries of type ViewDesc.

Example layout

A simple multi-view layout can be described as

layout: list[ViewDesc] = [
    ("user", "datatype1"),
    ("user", "datatype2"),
    ("user", "datatype3", "layer1"),
    ("user", "datatype3", "layer2"),
]

Defining a layout establishes relationships between views and indirectly also defines which views are present in a collection of data sources.
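As a rough sketch of how a layout implies the set of views, the snippet below collects every view mentioned in the example layout and lists the views that occur as a row or column view in more than one data source. Both helper functions are hypothetical and only illustrate the idea; they are not part of solrCMF.

```python
from collections import Counter


def views_in_layout(layout):
    """Return all views named in the layout, in order of first appearance."""
    seen = []
    for desc in layout:
        for view in desc:
            if view not in seen:
                seen.append(view)
    return seen


def shared_row_column_views(layout):
    """Return views that are a row or column view of more than one entry."""
    # Only the first two entries of each description are row/column views.
    counts = Counter(view for desc in layout for view in desc[:2])
    return [view for view, count in counts.items() if count > 1]


layout = [
    ("user", "datatype1"),
    ("user", "datatype2"),
    ("user", "datatype3", "layer1"),
    ("user", "datatype3", "layer2"),
]

views = views_in_layout(layout)
shared = shared_row_column_views(layout)
```

With the example layout, `views` contains `"user"`, the three data types, and the two layers, while `shared` is `["user", "datatype3"]`, the row and column views that link several data sources.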