In today’s data-driven environment, the way we collect and manage data has fundamentally changed. Within the field of data curation, data collection is no longer just about gathering information for immediate use. Instead, it is understood as a structured process of creating data that is well-documented, reusable, and suitable for long-term preservation (Johnston, 2017). This shift is important because it positions data as a lasting resource rather than a temporary research output. Similarly, data repositories are not just storage spaces; they are managed digital platforms designed to preserve, organise, and provide access to data over time (Borgman, 2015). Together, these concepts form the backbone of modern research and information management.
What becomes clear is that the quality of any repository is directly shaped by how data is collected. Traditional methods such as surveys, interviews, and observations are still widely used, but in a curation context, the focus goes further. Data must be accurate, consistent, and accompanied by sufficient documentation to support reuse. As Borgman (2015) argues, data that lacks proper documentation often becomes unusable, regardless of where it is stored. This suggests that effective data collection requires researchers to think ahead, anticipating the needs of future users as well as their own immediate ones.
At the same time, repositories play a critical role in ensuring that collected data remains accessible and meaningful. They provide the infrastructure for long-term preservation and enable data sharing, which is essential for transparency and collaboration. A central feature of repositories is metadata—information that explains the context, structure, and content of datasets. Without metadata, data can quickly lose its value because it becomes difficult to interpret or locate (Tenopir et al., 2020). This reinforces an important point: data management does not begin when data is stored, but at the moment it is created.
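The role of metadata described above can be illustrated with a small sketch. The record and field names below are hypothetical, loosely modelled on common descriptive schemas such as Dublin Core rather than any formal standard; the helper function simply flags records missing the fields a repository would typically need for discovery and reuse.

```python
# Hypothetical descriptive metadata for a dataset. Field names are
# illustrative, loosely modelled on Dublin Core-style elements.
dataset_metadata = {
    "title": "Household Survey 2023",
    "creator": "Example Research Group",
    "description": "Anonymised responses from a household survey on energy use.",
    "date_created": "2023-06-01",
    "format": "CSV",
    "license": "CC-BY-4.0",
    "keywords": ["survey", "households", "energy"],
}

def missing_fields(record, required=("title", "creator", "description", "license")):
    """Return the required descriptive fields absent or empty in a record."""
    return [field for field in required if not record.get(field)]

# A complete record passes; a bare one reveals what future users would lack.
print(missing_fields(dataset_metadata))   # []
print(missing_fields({"title": "Untitled dataset"}))
```

Even this toy check makes the essay's point concrete: if documentation is only considered at deposit time, the gaps it reveals may no longer be fillable, because the context existed only at the moment of creation.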
The rise of open science has further strengthened the relationship between data collection and repositories. The FAIR principles (Findable, Accessible, Interoperable, and Reusable) highlight the need for data to be both well-managed and widely accessible (Wilkinson et al., 2016). Achieving these principles depends on both high-quality data collection and effective repository systems; if either is weak, the entire system is compromised.
Despite these advances, challenges remain. Many institutions struggle with inconsistent standards, limited technical capacity, and concerns around data ownership and privacy (Borgman, 2015). These issues highlight a key insight: improving repositories alone is not enough. Equal attention must be given to how data is collected and prepared for long-term use.
In conclusion, data collection and repositories are deeply interconnected. Data collection determines the quality and usability of data, while repositories ensure its preservation and accessibility. Viewing them as separate processes limits their effectiveness. Instead, they should be understood as parts of a continuous system that supports sustainable knowledge creation and sharing in the digital age.
Borgman, C. L. (2015). Big data, little data, no data: Scholarship in the networked world. MIT Press.
Johnston, L. R. (2017). Curating research data: Practical strategies for your digital repository. Association of College and Research Libraries.
Tenopir, C., Rice, N. M., Allard, S., Baird, L., Borycz, J., Christian, L., Grant, B., Olendorf, R., & Sandusky, R. J. (2020). Data sharing, management, use, and reuse: Practices and perceptions of scientists worldwide. PLOS ONE, 15(3), e0229003.
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., et al. (2016). The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 160018.