Social science research data curation: issues of reuse

Guangyuan Sun, Christopher Soo Guan Khoo



Data curation is attracting growing interest in the library and information science community. Its main purpose is to support data reuse. This paper discusses the issues involved in reusing quantitative social science data from three perspectives: searching and browsing for datasets; evaluating the reusability of datasets (including topical relevance, utility and data quality); and integrating datasets. It examines these issues by comparing dataset searching with online database searching. The paper also discusses how knowledge representation techniques (metadata and ontologies) and a graphical visualization interface can support users in browsing, assessing and integrating datasets.


Keywords: data curation; data reuse; search; browse; reusability assessment; data integration; knowledge representation; graphical visualization







Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Libellarium (Online). ISSN 1846-9213 © 2008

