Research Article

Classification of processes involved in sharing individual participant data from clinical trials

[version 1; peer review: 1 approved, 2 approved with reservations]
PUBLISHED 01 Feb 2018
This article is included in the Research on Research, Policy & Culture gateway.


Background: In recent years, a cultural change in the handling of data from research has resulted in the strong promotion of a culture of openness and increased sharing of data. In the area of clinical trials, sharing of individual participant data involves a complex set of processes and the interaction of many actors and actions. Individual services/tools to support data sharing are available, but what is missing is a detailed, structured and comprehensive list of processes/subprocesses involved and tools/services needed.
Methods: Principles and recommendations from a published data sharing consensus document are analysed in detail by a small expert group. Processes/subprocesses involved in data sharing are identified and linked to actors and possible services/tools. Definitions are adapted from the business process model and notation (BPMN) and applied in the analysis.
Results: A detailed and comprehensive list of individual processes/subprocesses involved in data sharing, structured according to 9 main processes, is provided. Possible tools/services to support these processes/subprocesses are identified and grouped according to major type of support.
Conclusions: The list of individual processes/subprocesses and tools/services identified is a first step towards development of a generic framework or architecture for sharing of data from clinical trials. Such a framework is strongly needed to give an overview of how various actors, research processes and services could form an interoperable system for data sharing.


clinical trial, data sharing, individual participant data (IPD), process, business process model, generic framework


AAI, Authentication and Authorisation Infrastructure; API, Application Programming Interface; ATT, The Open Science and Research Initiative; BPMN, Business Process Model and Notation; BRIDG, Biomedical Research Integrated Domain Group; CDISC CDASH, Clinical Data Interchange Standards Consortium - Clinical Data Acquisition Standards Harmonization; CDISC ODM, Clinical Data Interchange Standards Consortium - Operational Data Model; CDISC SDM, Clinical Data Interchange Standards Consortium - Study Design Model; COMET, Core Outcome Measures in Effectiveness Trials; CORBEL, Coordinated Research Infrastructures Building Enduring Life-science Services; CRUK, Cancer Research UK; DOI, Digital Object Identifier; ECRIN, European Clinical Research Infrastructure Network; ID, Identity; IPD, Individual Participant Data; IT, Information Technology; MRC, Medical Research Council (UK); PCROM, Primary Care Research Object Model; QA, Quality Assurance; UK, United Kingdom; UKCRC, UK Clinical Research Consortium; US, United States; WHO, World Health Organization


In recent years, a cultural change in the handling of research data has resulted in the strong promotion of a culture of openness and increased sharing of data. Many organisations, initiatives and projects have expressed their commitment to support open scientific research. This move has also been extended to clinical trials. Today, the results of clinical trials are increasingly considered a public good, and access to the individual participant data (IPD) generated by those trials is seen as part of a fundamental right to health data (see Research Councils UK principles on data policy).

To support data sharing in clinical trials, several organisations have developed generic principles, guidance and practical recommendations for implementation in recent years (e.g. the Institute of Medicine report in the US1, the Nordic Trial Alliance Working Group on Transparency and Registration for the Nordic countries2, the good practice principles for sharing IPD from publicly funded trials by MRC, UKCRC, CRUK and Wellcome, in the UK3,4, or the guide to publishing and sharing sensitive data for Australia5). Within the EU Horizon 2020 funded project CORBEL (Coordinated Research Infrastructures Building Enduring Life-science Services) and coordinated by the European Clinical Research Infrastructure Network (ECRIN), an interdisciplinary and international stakeholder taskforce reached a detailed consensus on principles and recommendations for data sharing of clinical trial data6. That document was taken as the starting point for the current paper.

Data sharing of IPD from clinical trials involves a complex set of processes and the interaction of many actors and actions. Some documentary support is available (e.g. templates for data sharing plans, data transfer and data use agreements), but this is scattered and thus not always easy to find. In addition, although some IT tools and services are available to support individual tasks in the process of data sharing (e.g. a de-identification service for datasets [see Electronic Health Information Laboratory page on de-identification software] or an ID-generation service for study objects), these are again difficult to discover and their quality is not easy to assess. An additional aspect of complexity stems from the very heterogeneous set of repositories that are available for storage of IPD (see Registry of Research Data Repositories). There are general scientific repositories, repositories dedicated specifically to clinical research, repositories specialised in storing data related to a specific disease area and institution-specific repositories. In summary, although fragments of infrastructure are available to support sharing of IPD from clinical trials, the various services and tools are scattered and a global vision of how all these components should interact and interoperate does not currently exist.

What is still missing is a generic framework or architecture for data sharing that could be used for modelling, describing, and designing operations, data requirements, IT systems and technological solutions (see Open Group TOGAF® framework). Such a framework would link structural concepts (e.g. actors) with behavioural concepts (e.g. processes linked to services), giving an overview of how actors, processes and services interact to form a system for data sharing of IPD. Because data sharing is complex and involves many different processes and actors, such a framework is not yet available. As a first step in creating such a framework, in this paper we provide a systematic, structured and comprehensive list of processes/subprocesses linked to data sharing, derived from our CORBEL consensus document.


Recommendations and principles from the data sharing consensus document were analysed in detail and individual processes/subprocesses identified and linked to actors and possible services/tools by a small group of experts (CO, SC, RB, WK, SB). The consensus document covers all stages of the data sharing life cycle and is highly structured, with 7 main topics, 10 principles assigned to these topics and 50 specific recommendations, making the analysis process relatively straightforward6. The specification of processes/subprocesses, actors and services/tools was agreed between the experts in telephone conferences and by written communication, and summarized in a table with listings.

In the next step, possible services/tools associated with single processes/subprocesses were analysed and grouped according to different types of support, preserving reference to the processes/subprocesses specified in the first step.

The following definitions were adapted from the business process model and notation (BPMN) and applied to our analysis (see Object Management Group page, 7):

Process: A sequence or flow of activities in an organization with the objective of carrying out work (see Object Management Group page).

In this study, processes may relate to different organisations and business goals, e.g. the various activities of the data generators, data storage managers and secondary users all represent different business processes, operating at different times by different actors.

Subprocess: A process that is included within another process (see Object Management Group page).

Actor: Some person or organization taking part in day-to-day business activity (see Object Management Group page).

Actors belong to or have a relationship with the clinical trial arena. Actors include: investigators, trial unit heads, QA staff, senior data management and IT staff, trial unit operational managers, statisticians, sponsors, trial management team, specialist agencies, repository managers, analysis environment providers, secondary users of data, data use advisory panel, research infrastructures, journal publishers, patient representatives, and funders. Definitions of actors have been taken from the glossary in the consensus document6 and some from the CDISC glossary.

Service: A service is a functional business entity that fulfils a particular requirement (see Open Science and Research framework).

Services/tools may be relatively non-technical (e.g. providing information, example materials, template policies and procedures, assessment criteria, metadata, and infrastructure specifications) or technical, i.e. information technology based. The technology required may be conventional (e.g. webpages, web-based information systems) and already available (though it would normally need specific organisation and application). Other services/tools may require specialist software development (e.g. development of an analysis environment, developing systems to support metadata repositories).

Subservice: A subservice is a special case of a service (see Open Science and Research framework).

To keep things as simple as possible, processes were structured according to the main activities within data sharing of IPD and then further differentiated with respect to subprocesses. For every process the involved actors and possible tools/services are linked.
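The linkage between processes, subprocesses, actors and services/tools described above can be illustrated with a small data model. The following is only a sketch under stated assumptions: the class and field names (`Process`, `Subprocess`, `ref`, `actors`, `services`) are hypothetical and are not part of the consensus document or of BPMN. The example entries are taken from rows 2.2.1 and 2.2.2 of Table 1.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Subprocess:
    """A process included within another process (hypothetical model)."""
    ref: str                      # reference number, e.g. "2.2.1"
    description: str
    actors: List[str] = field(default_factory=list)
    services: List[str] = field(default_factory=list)


@dataclass
class Process:
    """A flow of activities, broken down into subprocesses."""
    ref: str
    name: str
    subprocesses: List[Subprocess] = field(default_factory=list)

    def actors(self) -> Set[str]:
        # All actors involved in any subprocess of this process.
        return {a for sp in self.subprocesses for a in sp.actors}

    def services(self) -> Set[str]:
        # All services/tools linked to any subprocess of this process.
        return {s for sp in self.subprocesses for s in sp.services}


# Example entries copied from Table 1 (process 2.2).
document_strategy = Process(
    ref="2.2",
    name="Document the strategy for data sharing for this trial",
    subprocesses=[
        Subprocess(
            "2.2.1",
            "Incorporate data sharing details within the data management plan",
            actors=["Trial management team"],
            services=["Example DMP sections, with supporting material"],
        ),
        Subprocess(
            "2.2.2",
            "Incorporate data sharing summary in section of the protocol",
            actors=["Trial management team"],
            services=["Example protocol sections"],
        ),
    ],
)

print(sorted(document_strategy.actors()))
```

Keeping actors and services at the subprocess level, and deriving them for the parent process, mirrors the way the listing links each subprocess to its own actors and tools.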

For graphical illustration, the BPMN approach was used. In BPMN, a process is depicted as a graph of flow elements, which are a set of activities, events, gateways, and sequence flows that adhere to finite execution semantics. The usual BPMN notation and symbols were used (event, activity, gateway, connections, swim lane) (see Object Management Group page). In this publication, BPMN is used only to give a high-level overview of the relationship between the main processes.


From the analysis of the consensus document, 9 main processes involved in data sharing of IPD were identified:

1. Preparation for data sharing, in general

2. Plan for data sharing, in the context of a specific trial

3. Preparation of data for sharing, after data collected

4. Transferring data objects to an external repository

5. Repository data and access management

6. Access to individual participant data and associated data objects

7. Discovering the data objects available

8. Publishing results of re-use

9. Monitoring data sharing

Processes 1 to 5 can be summarised under the heading “Data preparation and storage”, and processes 6 to 9 under the heading “Data request and secondary analysis”. The relationship between the main processes is presented in Figure 1.


Figure 1. Overview of the main processes in sharing of IPD.

The main processes were structured further into more detailed processes/subprocesses and linked to the actors involved and to possible services/tools. As a result, a detailed and comprehensive list of individual processes/subprocesses involved in data sharing is given in Table 1.

Table 1. Listing of processes, actors and possible services/tools in sharing of IPD from clinical trials.

Process | Subprocess | Actors | Possible Services/Tools | Subservices/tools
1. Preparation for data sharing, in general
1.1 Learn about individual
participant data (IPD) and data
object sharing1, 3.
1.1.1 Learn about policies, requirements,
implications, options, resources, etc.
Investigators, Trials unit2 heads,
operational managers
Education service (web pages,
videos, courses, texts etc.)
1.1.2 Become aware of repositories
available for data sharing, features, pros
and cons, costs, etc.
Investigators, Trials unit heads,
operational managers
Web based information sources
on repositories, published surveys,
repository quality assessments
1.2 Clarify own institution’s
requirements for data sharing
Trials unit heads, operational
1.3 Develop local SOPs and
related quality documents
supporting aspects of IPD and
data object sharing
1.3.1 Develop procedures governing
data sharing planning and procedures
within a trial.
Trials unit heads, QA staff, operational
managers, senior data management
and IT staff
Example SOPs and proformas
1.3.2 Develop procedures and libraries
to promote the use of data standards in
database and metadata design.
Trials unit operational managers,
statisticians, senior data management
and IT staff
Links to standards and associated
resources. Example local
procedures Libraries of re-usable
2. Plan for data sharing, in the context of a specific trial
2.1 Decide the strategy for data
sharing for this trial.
2.1.1 Explore options for data sharing
(considering datasets, timeframe, funder,
planned journal, costs, etc.)
Sponsors, with Trial management
team and network of investigators
Checklist of issues that need to
be considered, with supporting
material, option descriptions
2.1.2 Check funder requirements for data
Trial management team
2.1.3 Decide the strategy and specific
actions required for data sharing
Sponsors, with Trial management
Checklist of issues that need to
be considered, with supporting
material, option descriptions
2.2 Document the strategy for
data sharing for this trial
in trial documents
2.2.1 Incorporate data sharing details
within the data management plan
Trial management team
Example DMP sections, with supporting material
2.2.2 Incorporate data sharing summary
in section of the protocol
Trial management team
Example protocol sections
2.2.3 Incorporate data sharing summary
within trial registration data
Trial management team
Example registry data sections
2.3 Incorporate information
on data sharing plan into
participant documents of
clinical trials
2.3.1 Summarise and explain data
sharing plan in patient information
Trial management team
Guidance on legislation framework
– Demonstration material,
templates, examples
2.3.2 Include request for broad consent
for data sharing in informed consent
Trial management team
Demonstration material, templates,
2.4 Check and align data
sharing plans of collaborators
who are also generating data.
2.4.1 Ensure any plans to publish
collaborators' data (e.g. lab data) are
compatible with plans for clinical IPD
Trial management team
Examples of possible issues (e.g.
with expectations of publishing lab
data, increased re-identification
2.4.2 Ensure all collaborators have
contributed to and have agreed to data
sharing plans.
Trial management team
Examples of possible processes,
policies, to agree and document
data sharing across collaborators.
2.5 Ensure that data and
metadata standards have been
used as far as possible in
database design
Trial management team
Links to standards and associated
resources. Libraries of re-usable
3. Preparation of data for sharing, after data collected
3.1 Decide upon strategy for
data preparation for sharing
3.1.1 Decide if (further)
pseudonymisation or anonymisation
Trial data management and IT staff
Guidance on interpretation of
legal requirements in different
3.1.2 Assess risk of re-identification
with existing datasets. Decide on de-identification required.
Trial data management and IT staff, specialist de-identification agencies
De-identification/anonymisation
service for datasets
3.2 Carry out strategy for data
3.2.1 De-identify, and pseudonymise
or anonymise dataset for data sharing
Trial data management and IT staff,
specialist de-identification agencies
service for datasets
3.2.2 Select file formats for data and
metadata and transform if necessary
Trial data management and IT staff
File formats recognised as standard
3.2.3 Update (or generate/transform)
metadata and reference to datasets
Trial data management and IT staff,
specialist metadata agencies
Specialist services for generating
3.3 Document data preparation
3.3.1 Assess and document risk of re-identification
with revised datasets
Trial data management and IT staff,
specialist de-identification agencies
service for datasets
3.3.2 Incorporate record of data
preparation and risk assessments within
Trial data management and IT staff
Metadata scheme for describing
de-identification and data
preparation processes
4. Transferring data objects to an external repository
4.1 Select repository (within
institutional constraints)
4.1.1 Explore repository features,
management, access options, costs,
Sponsors with trial management team
Data repository identification
service including assessment
against quality criteria, standards,
certification process for repositories
4.2 Transfer the datasets
under a formal data transfer
4.2.1 Agree on access regime, data
sharing decision processes, assignment
of responsibilities including data
controller role
Sponsors with trial management team
Checklists to support data transfer
4.2.2 Agree on responsibilities for
generating discoverability metadata
Sponsors with trial management team
Checklists to support data transfer
4.2.3 Draw up and agree data transfer
agreement, including provision if
repository disappears
Sponsors with trial management team
Tools for generating data transfer
4.2.4 Apply discoverability metadata to
datasets and transfer data
Trial data management and IT staff
and/or repository staff
Metadata schemas for data object
discoverability; tools for their
4.3 Monitor repository and
status of datasets transferred
to it.
4.3.1 Periodic checking of repository
systems and financial health, certification;
receiving reports of status, access
requests and related processes
Sponsors with trial management team
Reporting services provided by
5. Repository data and access management
5.1 Maintain highly granular
access control to IPD, that can
be changed rapidly
5.1.1 Maintain access control that allows
individual files to be designated as either
a) publicly available, without user
identification, to download or simply view.
b) available only to self-identified named
individuals, to download or simply view
(may be managed on a group basis).
c) available only to named individuals,
as identified by data controllers, to
download or simply view (may be
managed on a group basis).
Repository managers
Authentication and authorisation
(for definition see footnote 2 at the end of
the table)
Logging services
5.1.2 Maintain access control that can
include 2-factor authentication with either
(b) or (c) in 5.1.1.
Repository managers
Authentication and authorisation
2 factor authentication
systems, logging
5.2 Maintain mechanisms to set
up and apply authentication
and authorisation
5.2.1 Provide web based forms that
allow users to provide details about
themselves, with some degree of
validation (e.g. email confirmation, cross
reference to other AAI architectures).
Repository managers
Authentication and authorisation
tools, validation mechanisms
5.2.2 Provide appropriate log-in pages,
with password management
Repository managers
Authentication and authorisation
5.3 Provide a protected
temporary analysis
5.3.1 Provide requested datasets
(possibly imported from other
repositories) in a designated analysis
environment, along with analysis and
recording tools.
Repository managers, analysis
environment providers
The analysis environment itself
Data import and logging tools
Analysis tools and services
Workflow recording
5.4 Supply discovery data
for IPD and data objects on
a regular basis to metadata
(for definition see footnote 1 at the end
of the table)
5.4.1 Allow regular (e.g. nightly)
harvesting of metadata that conforms or
can be mapped to a general schema for
discovery metadata, through an API.
Repository managers
Schema for discovery metadata,
API for making it available from
each repository
5.5 Provide an expert advisory
5.5.1 Where the data transfer
agreements stipulate it allow the advisory
panel to process and filter requests, and
recommend or take decision on data
Repository managers
5.6 Provide data request forms
5.6.1 For data that requires them, post
data request forms for users to complete.
Repository managers
Template and example data use
5.7 Provide data use
agreement templates
5.7.1 The templates allow potential users
to see the information they will need to
provide, (for data that requires them),
and the conditions to which they will
need to conform.
Repository managers
Template and example data use
5.8 Provide usage reports to
data depositors
5.8.1 If not already provided by the
request process, regular (e.g. quarterly)
reports on access made, by whom and
reasons given.
Repository managers
Report services maintained by
repository managers
6. Access to individual participant data and associated data objects
6.1 Manage direct responses
to the sponsors or coordinating
investigators, in case no legal
sponsor is available (data not
yet in a repository)
6.1.1 Decide upon the possibility, in legal
terms, of making the data available to
others at all.
Sponsors and trial management team
Guidance on interpretation of
legal requirements in different
jurisdictions, for different levels of
6.1.2 Assess the reasonableness of the
request and the ability of the requesters
to draw sensible conclusions
Sponsors and trial management team
6.1.3 Assess the costs of de-identifying
the data, preparing metadata, etc.
Sponsors and trial management team
Data on costs in data preparation
Research papers
investigating this area
6.1.4 Make a final decision as to whether
to share the data with the requester.
Sponsors and trial management team
6.1.5 Draw up a data use agreement and
transfer the data under its terms
Sponsors and trial management team
Example data use agreements
6.2 Manage access to data in
a repository (if access requests
individually reviewed)
6.2.1 Repository makes appropriate
request forms available on-line
Repository managers
Available forms on-line (see 5.6)
6.2.2 Request forms completed and
submitted (on or off-line)
Secondary users
6.2.3 (If stipulated in data transfer
Request passed to advisory panel for
assessment and recommendation,
otherwise to data controllers
Sponsors or Advisory panel
6.2.4 (If stipulated in data transfer
Decision to allow request made, by
advisory panel/repository if stipulated in
data transfer agreement, otherwise by
data controllers
Sponsors or Advisory panel, or
repository managers
6.2.5 If positive decision, data use
agreement drawn up and agreed
Sponsors or Advisory panel,
Repository managers, Secondary
6.2.6 Data access arranged after liaison
with repository managers
Sponsors or Advisory panel,
Repository managers
Pipeline for quick processing of
access change requests
6.2.7 Access request and decision
Sponsors or Advisory panel,
Repository managers
Recording systems for request and
7. Discovering the data
7.1 Agree a common metadata
7.1.1 Publish and agree a standard that
can be used for discovery metadata, or
to which existing standards can map.
Repository managers, metadata
repository managers
The metadata scheme itself
7.2 Agree an ID generation
scheme for data objects
7.2.1 Develop, cost and implement a
mechanism for generating persistent IDs
(e.g. DOIs) for data objects.
Repository managers, metadata
repository managers
The ID generation mechanism itself
Existing mechanisms
integrated where
7.3 Agree an ID generation
scheme for clinical studies
7.3.1 Implement an ID generation and/or
assignment process for clinical studies
Repository managers, metadata
repository managers, registry
managers, WHO, etc.
The ID generation mechanism itself
Existing IDs, e.g.
registry IDs, integrated
where possible
7.4 Collect metadata together
into a public metadata
repository, under a single portal
7.4.1 Collect existing metadata samples
and sources into a prototype metadata
Metadata repository managers
The ID schemes above, existing
public repositories, WHO registry
data, cross-ref, etc.
7.4.2 Maintain the metadata by arranging
regular harvesting (e.g. nightly, using API
and metadata scheme)
Metadata repository managers
The metadata scheme from 7.1
7.4.3 Develop a single portal for
searching through the metadata
Metadata repository managers
7.4.4 Federate additional metadata
sources under the same portal
Metadata repository managers
7.5 Search for the data objects
concerned with a trial or
clinical study
7.5.1 Search for particular study data
objects using study identifiers, name,
and/or object identifiers. Receive data on
location and access details.
Secondary users
The metadata portal described in
8. Publishing results of re-use
8.1 Carry out secondary use
and publish results
8.1.1 Publish re-analysis, preferably
open (e.g. peer reviewed journal)
8.1.2 If successful, ensure proper citation
of data and credit to data generators.
Agreed schemes for citation and
credit for data
8.1.3 Whether or not published in a
journal, publish summary results and
relevant datasets – usually in source
8.1.4 Apply metadata to new data
objects, ensure harvesting into metadata
Metadata scheme for
9. Monitoring data sharing
9.1 Gather and disseminate
data on data requests (where
explicit requests are required).
Repository managers, research
infrastructures, publishers, patient
representatives, funders, etc.
Web site on which to display
collected data
Data collection tools based on
agreed API
9.2 Gather and disseminate
data on reasons for request
refusal (where explicit requests
are required).
As 9.1
Web site on which to display
collected data
9.3 Gather and disseminate
data on data accesses,
downloads etc.
As 9.1
Web site on which to display
collected data; Data collection tools
based on agreed API
9.4 Attempt to monitor products
of secondary use (papers,
datasets etc.).
As 9.1
Web site on which to display
collected data

1 Data objects: any discrete packages of data in an electronic form, whether that data is textual, numerical, a structured dataset, an image, a film clip, etc. They are each a file, as that term is used within computer systems, and are named, at least within their source file system. In the context of clinical research and data sharing, data objects can include electronic forms of protocols, journal papers, patient consent forms, analysis plans, and any other documents associated with the study, as well as datasets representing different portions and types of the data generated, and the metadata describing that data.

2 Authentication: The process of ensuring that a person or system that is trying to access a system is who they say (it says) they are. With a person, authentication is by provision of one or more of something only they should know (e.g. a password), or should have (e.g. a card or fob), or can show (e.g. fingerprint, iris pattern). With a system it is more often by provision of a secret token (in effect a machine password), often derived from public key cryptography.

Authorisation: The process of giving an authenticated entity the rights to access particular subsets of data and/or to carry out particular functions within a system. It is usually carried out by assigning user entities to roles and to groups that together define the access allowed.
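The file-level access regimes listed under 5.1.1 and 5.1.2 in Table 1, combined with the group-based authorisation defined above, can be sketched as a simple check. This is an illustrative sketch only, not a description of any existing repository system: the names `AccessLevel`, `User`, `DataFile` and `may_access` are hypothetical, and authentication itself (passwords, tokens, second factor) is assumed to have happened elsewhere.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class AccessLevel(Enum):
    PUBLIC = "a"            # publicly available, no user identification
    SELF_IDENTIFIED = "b"   # any self-identified, authenticated individual
    CONTROLLER_NAMED = "c"  # only individuals/groups named by data controllers


@dataclass
class User:
    name: str
    groups: Set[str] = field(default_factory=set)
    second_factor_verified: bool = False


@dataclass
class DataFile:
    file_id: str
    level: AccessLevel
    allowed_users: Set[str] = field(default_factory=set)
    allowed_groups: Set[str] = field(default_factory=set)
    require_two_factor: bool = False  # optional with levels (b) and (c)


def may_access(user: Optional[User], data_file: DataFile) -> bool:
    """Return True if the user may view or download the file.

    `user=None` stands for an anonymous, unauthenticated visitor.
    """
    if data_file.level is AccessLevel.PUBLIC:
        return True
    if user is None:
        return False
    if data_file.require_two_factor and not user.second_factor_verified:
        return False
    if data_file.level is AccessLevel.SELF_IDENTIFIED:
        return True
    # Level (c): access is managed per named user or on a group basis.
    return (user.name in data_file.allowed_users
            or bool(user.groups & data_file.allowed_groups))


protocol = DataFile("protocol.pdf", AccessLevel.PUBLIC)
ipd = DataFile("ipd.csv", AccessLevel.CONTROLLER_NAMED,
               allowed_groups={"meta-analysis-team"}, require_two_factor=True)
alice = User("alice", {"meta-analysis-team"}, second_factor_verified=True)
bob = User("bob", {"meta-analysis-team"})

print(may_access(None, protocol), may_access(alice, ipd), may_access(bob, ipd))
```

Because the level is stored per file, a data controller could change the access regime of an individual file without touching the others, which is the kind of granularity 5.1 asks for.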

In Table 2, possible services/tools associated with processes are grouped according to major types of support, preserving reference to the processes/subprocesses. As the table illustrates, these tools and services fall into 6 (overlapping) categories:

1. Providing general background material

2. Locator services (for resources for data sharing, and / or to support data standards)

3. Example documents and templates

4. Services (e.g. to de-identify data, assign IDs, provide metadata, evaluate repositories)

5. Frameworks and guidance (e.g. metadata schemas, citation systems, checklists)

6. Tools (IT based, e.g. APIs to harvest repository contents, tools to assign metadata)
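The regrouping step behind Table 2 (collecting services/tools by type of support while keeping the reference back to each subprocess) can be expressed as a small grouping routine. The sketch below is illustrative only: the function name `group_by_type` and the tuple layout are assumptions, the example links are taken from Tables 1 and 2, and the category assigned to the metadata schema is our reading of category 5 in the list above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (process reference, service/tool, category of support)
LINKS = [
    ("1.3.1", "Example SOPs and proformas", "Example documents and templates"),
    ("3.2.1", "De-identification/anonymisation service for datasets", "Services"),
    ("3.1.2", "De-identification/anonymisation service for datasets", "Services"),
    ("5.4.1", "Schema for discovery metadata", "Frameworks and guidance"),
]


def ref_key(ref: str) -> Tuple[int, ...]:
    # Sort "3.1.2" before "3.2.1" numerically rather than as plain text.
    return tuple(int(part) for part in ref.split("."))


def group_by_type(
    links: Iterable[Tuple[str, str, str]]
) -> Dict[str, Dict[str, List[str]]]:
    """Group services by category, preserving their process references."""
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for ref, service, category in links:
        grouped[category][service].append(ref)
    return {
        category: {svc: sorted(refs, key=ref_key) for svc, refs in services.items()}
        for category, services in grouped.items()
    }


table = group_by_type(LINKS)
for category, services in table.items():
    for service, refs in services.items():
        print(f"{category}: {service} -> {', '.join(refs)}")
```

A service that supports several subprocesses appears once, with all of its references listed, which is how the same de-identification service is cited against several rows of Table 1.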

Table 2. Classification and description of possible tools/services to support processes in sharing IPD from clinical trials.

Type of service/tool | Description/comments | Reference to process (Table 1)
1. Providing general background material
Providing general
background material
Collection of relevant resources about data sharing in general – e.g.
   •   Links to papers and relevant policy documents from an annotated bibliography,
   •   Summary documents (e.g. built around recent consensus paper) and web pages
   •   Glossary of terms
   •   Links to general educational and training resources provided elsewhere
   •   Courses, webinars, books using materials above
   •   Meetings, conference sessions looking at aspects of data sharing and related topics
   •   Advice to citizens, ethics committees
2. Locator services
Locator service for data
sharing resources
Resource identification – Especially of
   •   Repositories for storage of datasets and other data objects, and their facilities, terms of service etc.
   •   Services to aid in de-identification
   •   Information on the applicable legal framework
   •   Model agreement templates that can be adapted to meet the particular circumstances of data sharing
Locator service for data
standards resources
Annotated links to
   •   Repositories of standard data items, e.g. within CDISC’s CDASH, CFAST.
   •   Repositories of standard data instruments, e.g. CDISC QRS (questionnaires, ratings and scales)
   •   Metadata schemes
   •   Core outcome sets (e.g. COMET)
1.3.2, 2.5
3. Example documents
Example documents
supporting data sharing
   •   Example SOPs,
   •   Supporting relevant checklists, forms
Covering all aspects of data sharing, e.g.
   •   during study preparation, or as part of long term data management, in the context of pre-defined collaborations,
or when handling requests for access.
   •   Use of data standards in study design
   •   Use of metadata for data description, data object discovery
   •   Examples of data sharing policies (universities, research institute)
   •   Examples of data sharing requests from funders or journals
1.3.1, 2.1,
2.4.2, 2.5
Example data sharing
documents (trial set up)
Examples of possible
   •   Sections of a protocol
   •   Sections of a Data Management Plan
   •   Trial Registry sections
   •   Participant information sheets
   •   Consent forms
   •   Proformas, for agreements with collaborators
   •   Proformas, for using lab and genetic data
All dealing with aspects of planning for data sharing and publication plans, available as a central resource. These
could then be used / adapted in the context of individual trials.
2.2.1, 2.2.2,
2.2.3, 2.3.2,
Example data sharing
documents (data transfer)
Examples / templates of possible
   •   Data transfer agreements
   •   Relevant sections of a Data Management Plan
   •   Checklists for the data transfer process
4.2.1, 4.2.2
Example data sharing
documents (data re-use)
Examples of
   •   Data request forms
   •   Data use agreements
   •   Checklists to support the development of a data use agreement
Any central resource holding such material should also provide a rationale for their structure and contents.
5.6, 5.7, 6.1.5
4. Services
De-identification /
anonymisation service for
There are four possible services here –
   •   Resources that allow trials units to develop their own de-identification/anonymisation processes (if compliant with
   legal considerations).
   •   Consultancy input to advise on de-identification in the context of a particular trial
   •   Services that carry out and document a de-identification process on behalf of the sponsor / trials unit
   •   Service for assessment of risk of re-identification
3.1.2, 3.2.1, 3.3.1, 3.3.2
Descriptive metadata services for datasets
To be useful (easily searchable, comparable etc.) the descriptive metadata of the data needs to be in a standard format, or one of a few recognised standard formats (e.g. CDISC ODM). Mechanisms and / or services to convert proprietary metadata descriptions into such a format could therefore be useful when required.
3.2.3, 3.3.2
Assessment / certification service for data repositories
Provision of a set of standards that can be used to assess the suitability of any repository as a location for data object storage would act as a useful guide to the potential users of those repositories.
The further application of such standards within a certification scheme
An ID assignment mechanism for data objects
An ID (e.g. DOI) generation service is required for all stored data objects.
7.2
A common pipeline for processing access requests
With many different data repositories storing clinical datasets likely to emerge, there is a potential advantage in making the application, review and decision-making processes for each very similar (e.g. using common application proformas), or even in managing those processes together, e.g. with a common expert advisory board.
This could ultimately create a common ‘request pipeline’.
Recording and reporting systems for data access requests and episodes
Reports that could be provided by repositories include
   •   Level and type of data object deposition
   •   The types of data access arrangements in place
   •   Numbers and types of access requests
   •   The decisions reached and reasons for rejections
   •   Data objects generated as a result of data re-use
5.8, 6.2.7, 9.1
Provision of a prototype metadata repository
A metadata repository (or a portal linked to multiple such repositories) with discovery metadata for clinical trial data objects is seen as a fundamental requirement if data sharing is going to work efficiently.
7.4, 7.5
Service for provision of a secure analysis environment
Based on tools to provide an analysis environment for in-situ work (see below).
5.3
5. Frameworks and guidance
The development of a discovery metadata schema
Agreement is needed on a common discovery metadata standard that can link data objects to studies and that can describe the access mechanisms associated with each.
Proposals have been made, based on an existing scheme (DataCite), but need further development.
4.2.4, 5.4, 7.1
The development of an agreed scheme for citation of re-use
There needs to be a universally recognised scheme that will allow fair credit for the re-use of data, in terms of academic citation and recognition.
Legal and regulatory
As the legal and regulatory environment continues to evolve, the data sharing resources (e.g. templates, legal database, procedures) need to be kept up to date. Similarly, researchers and data managers have to be informed of any relevant changes in laws, regulations and data sharing policies/resources.
Such a service could usefully be a central resource. It could not be a legal service as such (i.e. answering specific questions) but it could provide a general framework for guidance.
2.3.1, 3.1.1,
Checklist to decide the strategy for data sharing
Checklist of issues to be considered for data sharing, with supporting material and option descriptions
2.1
Checklist to support specification of agreements
Checklist to support development of a data transfer agreement / data use agreement
4.2
Manual to establish boards overseeing data sharing
Manual for advisory panel/board
6.2.4
6. Tools
Tools to support the application of discovery metadata scheme
A tool is required to allow the easy application of the metadata schema used to characterize data objects, ideally by the object generators and, if not, by repository managers.
This would likely take the shape of a set of web-based forms, linked to a central repository.
4.2.4, 8.1.4
Tools for de-identification / anonymisation service for
See de-identification / anonymisation service for datasets above.
3.1.2, 3.2.1, 3.3.1, 3.3.2
Authentication and authorisation systems for repository access
Highly granular access is needed (at the level of individual users / individual data objects) to support the variety of controlled access mechanisms likely to be required in repositories.
5.1, 5.2
Provide an analysis environment for in-situ work
Interest has been expressed in a mechanism that allows data to be examined, re-analysed, aggregated etc. without being downloaded first, but instead kept within a secured, tailored ‘analysis environment’, which also contains the analysis tools required. In fact, several different types of tools would be required, for:
   •   Environment creation (e.g. as a container)
   •   Data import and logging
   •   Authentication and authorisation
   •   Analysis
   •   Workflow recording
   •   Environment destruction
APIs to access repository catalogue data (for metadata aggregation)
When discovery data is not (or has not been) directly transferred to a central repository using the tools described above, it will be necessary to try and ‘harvest’ metadata from data repositories on a regular basis.
Using APIs that give access to the repository catalogues is a key part of that, and is much cheaper than trying to use ‘data mining’ techniques, e.g. natural language parsing on data object titles, to link data objects to studies.
Tools for generation of data transfer agreements/data use agreements
Software tools supporting the development of data transfer/data use agreements (if compliant with legal


Within the framework of the EU H2020-funded project CORBEL, major issues associated with the sharing of IPD were investigated and a consensus document on providing access to IPD from clinical trials was developed, using a broad interdisciplinary approach [6]. The taskforce reached consensus on 10 principles and 50 recommendations, representing the fundamental requirements of any framework used for the sharing of clinical trials data. To support the adoption of the recommendations, adequate tools and services are needed to promote and support data sharing and re-use amongst researchers, adequately inform trial participants and protect their rights, and provide effective and efficient systems for preparing, storing, and accessing data. As a first step towards an inventory of existing tools/services and an assessment of their quality and applicability for data sharing, a systematic analysis of the processes and actors involved in data sharing was performed. This work resulted in a systematic, structured and comprehensive list of processes/subprocesses that need to be supported to make data sharing a reality in the future. It provides a baseline against which existing tools/services can be mapped and gaps, where new tools/services are needed, can be identified.

In the context of this work, we explored the possibility of generating a generic framework for the sharing of IPD from clinical trials. As an example, we considered the Framework for Open Science and Research by ATT (see Open Science and Research framework). This framework provides a general description of the desired architecture in the domain of open science. It configures and defines the key structural elements of the overall solution, and gives an overview of how various actors, research processes and services (including data, data structures, actors, roles and IT systems) could form an interoperable system in the ‘target’ open state. The Enterprise Architecture (EA) approach is used to model, describe and design operations, data requirements, IT systems and technical solutions in accordance with a common model. The work done in developing a framework for open science and research could be of major relevance for a similar model in the area of data sharing. At this stage, where the aim was to give a basic structure to the processes/subprocesses involved in data sharing, it was considered too early to develop a generic framework. This approach may, however, be taken up again when the basic work has been done and the components for such a framework have been identified.

Nevertheless, we thought it useful to use a standardised terminology and notation for describing the basic processes in data sharing. This will simplify the extension to a more generic and comprehensive framework at a later stage. As one approach, business modelling has been applied successfully in the health and health research area. It has been used, for example, to perform a requirements analysis of the barriers to conducting research linking primary care, genetic and cancer data [7], to model the complexity of health and associated data flow in asthma [8] and to provide a generic architecture for a type 2 diabetes mellitus care system [9]. We decided not to apply the full spectrum of the Business Process Model and Notation (BPMN), but to use only basic elements to give a notational and terminological basis for further work. This does not imply, however, that applying the full spectrum of BPMN techniques is a necessary step in developing an overall framework. More work is needed to explore the suitability and benefit of BPMN for a generic framework for data sharing.

Different models for clinical trials and clinical trial workflows already exist, such as the domain analysis model BRIDG [10], the study design model CDISC SDM [11] and the primary care information model PCROM [12]. Any framework or model for data sharing needs to map to or reference these clinical trial models, though none currently includes the secondary use of data after the trial has been completed. Although clinical trial processes and data sharing processes are distinct, they are clearly linked, and any model needs to incorporate those linkages. As a consequence, developing a generic framework or architecture for data sharing needs much more work and is not covered in this paper.

Many of the services/tools identified in this paper are non-technical but may nevertheless be of major importance, especially for data generators and data requestors. These include templates/examples, checklists and guidance. For some of the processes specified in this paper, IT tools and services already exist and can be applied (e.g. de-identification tools and services, see Electronic Health Information Laboratory page on de-identification software); others are under development or need improvement (e.g. a metadata repository for identifying clinical trial objects [13]). The next step is to perform a scan of the availability and suitability of services/tools for data sharing based on this work, with the involvement of stakeholders. We will summarize this information in a separate report.
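
As an illustration only, the idea of a metadata repository whose discovery metadata links data objects to studies and records the access mechanism for each could be sketched as follows. This is a minimal sketch under stated assumptions: all class names, field names, access categories and identifiers are hypothetical and are not taken from the DataCite-based proposal or from any existing repository.

```python
from dataclasses import dataclass, field
from enum import Enum


class AccessType(Enum):
    # Hypothetical access categories, not taken from any published schema.
    OPEN = "open"
    CONTROLLED = "controlled"


@dataclass(frozen=True)
class DataObject:
    object_id: str            # persistent identifier, e.g. a DOI (placeholder values below)
    title: str
    object_type: str          # e.g. "protocol" or "IPD dataset"
    access_type: AccessType   # how the object can be accessed
    repository: str           # where the object is stored


@dataclass
class Study:
    study_id: str             # e.g. a trial registry identifier (placeholder)
    title: str
    data_objects: list[DataObject] = field(default_factory=list)


def find_objects(studies, object_type=None, access_type=None):
    """Return (study_id, data object) pairs that match the given filters."""
    hits = []
    for study in studies:
        for obj in study.data_objects:
            if object_type is not None and obj.object_type != object_type:
                continue
            if access_type is not None and obj.access_type != access_type:
                continue
            hits.append((study.study_id, obj))
    return hits


# One hypothetical study with two data objects held in different repositories.
study = Study(
    study_id="EXAMPLE-0001",
    title="Hypothetical trial",
    data_objects=[
        DataObject("10.0000/example.protocol", "Trial protocol", "protocol",
                   AccessType.OPEN, "Example Repository A"),
        DataObject("10.0000/example.ipd", "De-identified IPD", "IPD dataset",
                   AccessType.CONTROLLED, "Example Repository B"),
    ],
)

controlled = find_objects([study], access_type=AccessType.CONTROLLED)
print([(study_id, obj.object_id) for study_id, obj in controlled])
```

The point of the sketch is only that each data object carries its own identifier, type and access mechanism while remaining linked to its source study, so that a portal could search across repositories without holding the data itself.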

Data availability

All data underlying the results are available as part of the article and no additional source data are required.

How to cite this article
Ohmann C, Canham S, Banzi R et al. Classification of processes involved in sharing individual participant data from clinical trials [version 1; peer review: 1 approved, 2 approved with reservations]. F1000Research 2018, 7:138
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.
Open Peer Review

Reviewer Report 20 Mar 2018
Matthias Löbe, Institute for Medical Informatics, Statistics and Epidemiology (IMISE), Leipzig University, Leipzig, Germany 
Approved with Reservations
In this paper, Ohmann et al. perform a detailed analysis of steps required to share patient microdata from clinical trials with the research community. They provide a process diagram describing the workflow of preparing, transferring and maintaining the data and ...
Löbe M. Reviewer Report For: Classification of processes involved in sharing individual participant data from clinical trials [version 1; peer review: 1 approved, 2 approved with reservations]. F1000Research 2018, 7:138
Reviewer Report 19 Mar 2018
Matthew R. Sydes, MRC Clinical Trials Unit at UCL, Institute of Clinical Trials and Methodology, University College London, London, UK 
Approved with Reservations
This process-orientated manuscript covers a lot of ground in some detail. I have some specific comments:
  1. Section: General
    Comment: The process of reaching these recommendations is unclear to me. Perhaps ...
Sydes M. Reviewer Report For: Classification of processes involved in sharing individual participant data from clinical trials [version 1; peer review: 1 approved, 2 approved with reservations]. F1000Research 2018, 7:138
Reviewer Report 01 Mar 2018
Florian Naudet, CHU Rennes, Inserm, CIC 1414 (Centre d'Investigation Clinique de Rennes), University of Rennes 1, Rennes, France 
The manuscript Classification of processes involved in sharing individual participant data from clinical trials by Ohmann C, Canham S, Banzi R, Kuchinke W and Battaglia S [1] is more than useful for all stakeholders interested in data sharing. It must be accepted with, ...
Naudet F. Reviewer Report For: Classification of processes involved in sharing individual participant data from clinical trials [version 1; peer review: 1 approved, 2 approved with reservations]. F1000Research 2018, 7:138
Alongside their report, reviewers assign a status to the article:
Approved - the paper is scientifically sound in its current form and only minor, if any, improvements are suggested
Approved with reservations - a number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved - fundamental flaws in the paper seriously undermine the findings and conclusions
