Frequently Asked Questions (FAQ) - Data Management
Please see the Guidelines on FAIR Data Management in Horizon 2020 (http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/h...), section 4, ‘Research Data Management plans during the project life cycle’:
A funded project must submit a first version of its DMP (as a deliverable) within the first 6 months of the project. The DMP needs to be updated over the course of the project whenever significant changes arise, such as (but not limited to):
- new data
- changes in consortium policies (e.g. new innovation potential, decisions to file for a patent)
- changes in consortium composition and external factors (e.g. new consortium members joining or old members leaving)
As a minimum the DMP should be updated and detailed with each periodic report. Furthermore, the consortium can define a timetable for review in the DMP itself.
The DMP is intended to be a living document in which information can be made available in more detail through updates as the implementation of the project progresses and when significant changes occur.
If the goal is only the safekeeping of the actual data, a storage service from any of the big IT companies will likely suffice. In general, there are also (country-specific) centers that can help or serve as partners.
One approach is to also archive the software needed to access the data. That said, raw data should be stored in a well-described and documented format, so that in the worst case it can still be accessed by reprogramming a tool. It is therefore essential to have good metadata and to follow documented standards that describe the formats. With these in place, even new software versions can still read and work with legacy formats (for example, microarray data from the late 1990s were still accessible in 2017).
A DMP should be integrated into the work package (WP), scientific or non-scientific, where it fits best. Data management can be part of an existing task or form a task of its own, depending on how relevant data management is to your envisaged project.
This depends on (i) where the data comes from and (ii) the ‘data intensity’.
If the data is already open data and comes from a reliable provider (say, data.gov.uk), it is probably fine to just cite this resource. In case of doubt, and if copyright and licensing allow it, one might want to at least capture the stream or the historical data and deposit it as a dataset. This is definitely feasible when the data is not too large. It has the added advantage that any analysis based on the data can be reproduced with exactly the same ‘primary data’ set. In such cases it can therefore be useful to archive the underlying data in either a catch-all or a discipline-specific database.
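As an illustration of capturing a copy of externally provided data, the following is a minimal sketch in Python. It assumes the data has already been downloaded as bytes; the function names (`snapshot_dataset`, `verify_snapshot`), the sidecar file name and the metadata fields are hypothetical choices for this example, not part of any Horizon 2020 requirement or repository API. The idea is to store the copy together with its source, retrieval time and a checksum, so that the exact ‘primary data’ can be identified later.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def snapshot_dataset(data: bytes, source_url: str, out_dir: str, name: str) -> dict:
    """Store a copy of already-downloaded data plus a small metadata record.

    The metadata layout is an assumption made for this sketch.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Write the captured data unchanged.
    (target / name).write_bytes(data)

    # Describe where the copy came from and how to check its integrity.
    record = {
        "source_url": source_url,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "file": name,
    }
    (target / (name + ".meta.json")).write_text(json.dumps(record, indent=2))
    return record


def verify_snapshot(out_dir: str, name: str) -> bool:
    """Recompute the checksum of the stored file and compare it to the record."""
    target = Path(out_dir)
    record = json.loads((target / (name + ".meta.json")).read_text())
    actual = hashlib.sha256((target / name).read_bytes()).hexdigest()
    return actual == record["sha256"]
```

The resulting folder (data file plus metadata record) could then be deposited as a dataset, provided the licence of the original source permits redistribution.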
Another case is web services that run an analysis on your primary data or provide computational access to a large internal dataset. Because such a service or database might disappear, one might want to capture the exact input and output. Some providers are now experimenting with virtual machines and Docker images, which may become an option for long-term preservation.
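Capturing the exact input and output of a service call can be sketched as follows. This is only an illustration under stated assumptions: `call_and_record`, the log format (one JSON object per line) and the stand-in service function are hypothetical, and a real web service would be called through an HTTP client instead of a local function.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Callable


def call_and_record(
    service: Callable[[dict], str],
    request: dict,
    log_path: str,
    service_name: str,
) -> str:
    """Call a service and append the exact request and response to a log file."""
    response = service(request)
    entry = {
        "service": service_name,
        "called_at": datetime.now(timezone.utc).isoformat(),
        "request": request,
        "response": response,
        # Checksum of the response, to detect later changes to the record.
        "response_sha256": hashlib.sha256(response.encode("utf-8")).hexdigest(),
    }
    # One JSON object per line keeps the log easy to append to and to parse.
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")
    return response


def load_calls(log_path: str) -> list:
    """Read back all recorded calls."""
    with open(log_path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

A log of this kind can be archived alongside the primary data, so that results obtained from the service remain documented even if the service itself is no longer available.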