Catalog
The catalog component supports the search for available data contracts. Participants can exchange information about data contracts without a catalog by sending the offer directly via a separate channel (e-mail, notification), but a catalog is the common component for implementing data discoverability. It can be implemented as a managed service by one or more selected participants, hosted by the data space governance authority (DSGA), or operated in a fully decentralized fashion by every participant that offers data contracts (see the visual representation of various implementation designs of the DSGA in Creating a Data Space). The type of catalog architecture used depends on the design of the data space as well as the needs and capabilities of the participants.
Hybrid catalog models combining central and distributed catalogs with individual decentralized catalogs are possible, but must be carefully designed to avoid unnecessarily increasing the complexity of participating in the data space.
Catalog(s)
Sharing data among participants requires the provision of metadata -- regardless of the design of the data space (centralized, federated, or decentralized) and whether the data is open or protected. Information about the data needs to be published with an agreed-upon vocabulary for querying and with controls that regulate access to the catalog items.
Two participants can share data by communicating directly, offline or online, without the need for a catalog. With more participants, however, a catalog function greatly increases the discoverability of data assets and services. If there is more than one catalog due to a federated or decentralized design, the catalogs must allow federated searches of data assets across multiple sites.
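A federated search over several catalogs can be sketched as follows. This is a minimal illustration, not a reference implementation: the `ContractOffer` and `Catalog` classes, the keyword-based `search` method, and the `federated_search` function are hypothetical names introduced here, and the catalogs are held in memory rather than queried over a network.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ContractOffer:
    """Metadata describing an offer; the data asset itself is not included."""
    offer_id: str
    provider: str
    keywords: frozenset[str]


class Catalog:
    """Hypothetical in-memory catalog hosted at one site."""

    def __init__(self, site: str, offers: list[ContractOffer]):
        self.site = site
        self._offers = offers

    def search(self, keyword: str) -> list[ContractOffer]:
        # Return every offer at this site tagged with the keyword.
        return [o for o in self._offers if keyword in o.keywords]


def federated_search(catalogs: list[Catalog], keyword: str) -> list[ContractOffer]:
    """Query every catalog and merge the results.

    Offers replicated across sites are returned once, keyed by offer_id
    (an assumption: offer identifiers are unique across the data space).
    """
    seen: set[str] = set()
    merged: list[ContractOffer] = []
    for catalog in catalogs:
        for offer in catalog.search(keyword):
            if offer.offer_id not in seen:
                seen.add(offer.offer_id)
                merged.append(offer)
    return merged
```

In a real deployment each `search` call would be a remote request, so the merge step would also need to handle timeouts and unreachable sites.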
Catalogs don't provide the data asset itself, but they provide data contract offers (more on this in the section on data sharing below).
When choosing a target architecture for a data space, the design of the catalog function can fall somewhere along the spectrum between a central catalog, multiple federated catalogs, and many decentralized catalogs. Each has its own advantages and disadvantages. Compare the three main types of catalogs, depending on the implementation design of the DSGA, to evaluate their capabilities:
| Catalog architecture | Advantages | Disadvantages |
|---|---|---|
| Centralized catalog | No deployment by individual participants | A central gatekeeper can arbitrarily exclude participants and their data from the catalog |
| | Central control: a gatekeeper can regulate which entries are permissible and which are not | Single point of failure |
| | Easy discovery as only one catalog needs to be queried | Potential performance bottleneck |
| | | Security issues will affect all members at once |
| Federated catalog | Deployment by a limited number of participants, while most participants don't need to deploy any catalog components | Additional replication mechanisms are needed |
| | Federated control: voting mechanisms for content control can be implemented | A small group of operators of federated catalog nodes can control participation in the data space |
| Decentralized catalog | Every participant can autonomously decide which catalog items they share with whom | Every participant needs to run a catalog component |
| | No interference by a third party in the interaction between two participants | A list of available catalogs needs to be either provided through the DSGA or discoverable through a peer-to-peer protocol |
| | The data space as a whole is more resilient towards cyberattacks, even though individual members can experience outages | Participants need to crawl each other's catalogs to see which items are available |
| | Easier to scale | |

The DSGA should specify the chosen catalog architecture and justify any centralized or federated choices, documenting the associated trade-offs and the mitigations used to preserve participant autonomy and neutrality.
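For the decentralized case, the table notes that participants need a list of available catalogs and must crawl each other's catalogs. The sketch below shows one way such a crawl could work, assuming each catalog returns its offers together with the addresses of peer catalogs it knows. The `crawl_catalogs` function and the `fetch_catalog` callback are hypothetical; no specific discovery protocol is implied.

```python
from collections import deque
from typing import Callable, Iterable


def crawl_catalogs(
    seeds: Iterable[str],
    fetch_catalog: Callable[[str], tuple[list[str], list[str]]],
) -> dict[str, list[str]]:
    """Breadth-first crawl of participant catalogs.

    seeds: catalog addresses known up front (for example, a list
        published by the DSGA).
    fetch_catalog: hypothetical callback that returns
        (offer_ids, peer_addresses) for one catalog address.
    Returns a mapping from catalog address to the offers found there.
    """
    visited: set[str] = set()
    queue = deque(seeds)
    offers_by_catalog: dict[str, list[str]] = {}
    while queue:
        address = queue.popleft()
        if address in visited:
            continue  # already crawled; avoids loops between peers
        visited.add(address)
        offers, peers = fetch_catalog(address)
        offers_by_catalog[address] = list(offers)
        queue.extend(p for p in peers if p not in visited)
    return offers_by_catalog
```

The `visited` set keeps the crawl finite even when catalogs reference each other in cycles.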
Access policies
A best practice of access security is for an IT system to show users only what they need to know, which minimises the potential attack surface. The same is true for data contract offers (DCOs) in a data space: participants should only see the DCOs for which they are authorised to request a contract negotiation. This does not imply that the participant is already authorised to access the data, only that the participant is allowed to see that the data exists. The permission to access the data is part of the data contract negotiation. Any catalog must implement attribute-based access control (ABAC) through access policies.
The most common access filter requires a participant to prove membership in order to see which assets are in a data space. Filters can also make data assets visible only to specific participant groups. For example, a participant who holds a verifiable credential (VC) attesting data space membership and an additional VC attesting that the participant is an auditor could be given access to audit log files or streams. These are shared as DCOs but should not be visible to participants without the auditor credential.
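The credential-based filtering described above can be sketched as a simple subset check. This is an illustration under strong simplifying assumptions: credentials are represented as plain strings (`"member"`, `"auditor"`), whereas a real catalog would verify signed VCs before trusting any claim. The `Offer` class and `visible_offers` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Offer:
    """A data contract offer with the credentials needed to see it."""
    offer_id: str
    # Assumption: each entry names a credential type the requester must hold.
    required_credentials: frozenset = field(default_factory=frozenset)


def visible_offers(offers, presented_credentials):
    """Return only the offers whose required credentials are all presented."""
    presented = frozenset(presented_credentials)
    return [o for o in offers if o.required_credentials <= presented]


offers = [
    Offer("sensor-data", frozenset({"member"})),
    Offer("audit-log", frozenset({"member", "auditor"})),
]

# A regular member sees only the general offer.
member_view = visible_offers(offers, {"member"})

# A member who is also an auditor additionally sees the audit log offer.
auditor_view = visible_offers(offers, {"member", "auditor"})
```

Seeing an offer in this list only means the participant may request a contract negotiation for it; access to the data itself is still decided during that negotiation.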
If a participant wants to make a DCO visible to entities that are not participating in the data space, and that are merely using the technical mechanisms of the data space or have been directly informed about the existence of the DCO, the access policy can simply be a no-op (allow-all) policy.
Access policies can also be used as filters to control visibility of and access to DCOs. For example, time-based policies can control when DCOs can be negotiated, and location-based policies can limit the audience to participants from a specific geographic region.
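One way to model such policies is as composable predicates over a request context. The sketch below is illustrative only: `RequestContext`, `allow_all`, `within_time_window`, `from_regions`, and `all_of` are hypothetical names, and the region is assumed to be a plain string taken from a verified participant attribute.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class RequestContext:
    """Attributes of a catalog request that policies can inspect."""
    region: str
    now: datetime


AccessPolicy = Callable[[RequestContext], bool]


def allow_all(ctx: RequestContext) -> bool:
    """No-op policy: the offer is visible to everyone."""
    return True


def within_time_window(start: datetime, end: datetime) -> AccessPolicy:
    """Time-based policy: visible from start (inclusive) to end (exclusive)."""
    def policy(ctx: RequestContext) -> bool:
        return start <= ctx.now < end
    return policy


def from_regions(*regions: str) -> AccessPolicy:
    """Location-based policy: visible only to the listed regions."""
    allowed = frozenset(regions)
    def policy(ctx: RequestContext) -> bool:
        return ctx.region in allowed
    return policy


def all_of(*policies: AccessPolicy) -> AccessPolicy:
    """Combine policies; every one of them must grant access."""
    def policy(ctx: RequestContext) -> bool:
        return all(p(ctx) for p in policies)
    return policy


# Example: an offer negotiable during January 2024 by EU participants only.
january_eu = all_of(
    within_time_window(
        datetime(2024, 1, 1, tzinfo=timezone.utc),
        datetime(2024, 2, 1, tzinfo=timezone.utc),
    ),
    from_regions("EU"),
)
```

Keeping each policy a small function makes it straightforward to attach a different combination to each DCO.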