A couple of weeks back I wrote a post about repositories for all the data that users generate these days and what to do with it. (See our post on Data banks, deposits, … data economy – part 1).
This past week I read an article (see ScienceDaily Genetics of brain structure … article) which partially exemplifies what that post talked about. The research used publicly available genetic information to tease out hereditary characteristics of brain structure. The ScienceDaily article was a summary of research done at the University of Oxford using information provided by the UK Biobank.
Biobank as a data bank
Between 2006 and 2010, the Biobank recruited 500K participants from the UK, aged 40-69, to share their anonymized health data with researchers and scientists around the world. The Biobank is set up as a Scottish charity, funded by various health organizations in the UK, both gov’t and private.
In addition to information collected during the baseline assessment:
- 100K participants have worn a 24-hour health monitoring device for a week and 20K have signed up to repeat this activity.
- 500K participants have been genotyped (DNA sequencing to determine hereditary genes)
- 100K participants will be medically scanned (brain, heart, abdomen, bones, carotid artery) with images stored in the Biobank
- 100K participants have signed up to receive questionnaires asking about diet, exercise, work history, digestive health and other medical indicators.
There’s more. Biobank is linking to electronic health records (EHR) of participants to track their health over time. The Biobank is also starting to provide blood analysis and other detailed medical measures of subjects in the study.
UK Biobank (data bank) information uses
“UK Biobank is an open access resource. The Resource is open to bona fide scientists, undertaking health-related research that is in the public good. Approved scientists from the UK and overseas and from academia, government, charity and commercial companies can use the Resource. ….” (from UK Biobank scientists page).
Somewhat like open source code, the Biobank resource is made available to anyone (academia as well as industry) who can make valid use of its data, BUT any research derived from its data must be published and made freely available to the Biobank and the world.
Biobank’s papers page documents some of the research that has already been published using their data. It lists the genetics of brain structure paper mentioned above and dozens more.
Differences from Data Banks
In the original data bank post:
- We thought data was only needed by AI/deep learning. That seems naive now. The Biobank shows that AI/deep learning is not the only application/research that needs data.
- We thought data would be collected only by hyperscalers and other big web firms during normal user web activity. But their data is not the only data that matters.
- We thought data would be gathered for free. Good data can take many forms, and some may cost money.
- We thought profits from selling data would be split between the bank and users and could fund data bank operations. But in the Biobank, funding came from charitable contributions and data is available for free (to valid researchers).
Data banks can be an invaluable resource and may take many forms. Data that’s difficult to find can be gathered by charities and others that use their funding to collect and maintain the specific information needed for targeted research.
Photo Credit(s): Bank on it by Alan Levine