Linking barcode sequence data from Barcode of Life to GBIF

GBIF.no has started to link barcode sequence data from the Barcode of Life Data System (BOLD) to datasets from the Norwegian university museums.

iBOL

DNA barcoding is a powerful tool for species identification from any type of biological tissue. DNA barcodes are linked to voucher specimens in scientific collections (museums) and deposited in a public, open-access database (BOLD).

NorBOL is a network of Norwegian biodiversity institutions and individual scientists engaged in DNA barcoding of the fauna and flora of Norway. The Norwegian University Museums provide biological tissue samples from museum specimens. The Norwegian GBIF-node has now linked the first museum specimens in GBIF to their BOLD sequence data.

Linking barcode data from BOLD to specimens in GBIF has high priority in the current GBIF work plan. In December 2016 the GBIF Science Committee (represented by chair Rod Page) published a snapshot of the iBOL dataset (doi:10.15468/inygc6) with a total of 2,789,906 occurrences, of which 39,956 are from Norway. However, the link to the museum specimens themselves was not maintained.

Examples from the GBIF iBOL 2016 dataset:

 

Linking Norwegian university museum specimens to BOLD

Together with the NorBOL coordinator at the UiO Natural History Museum in Oslo, Gunnhild Marthinsen and Lars Erik Johannessen at the NHMO DNA Bank, we have now started to link the specimens at NHMO to the corresponding BOLD DNA barcode records. The most reliable specimen identifier in GBIF is the dwc:occurrenceID. There is also the traditional and more human-readable dwc:catalogNumber identifying a museum specimen. On the BOLD side, the Process ID is the most important identifier for the material samples corresponding to the museum specimens. BOLD also provides a "Museum ID" and a "Sample ID"; however, neither exactly matches the occurrenceID or the catalogNumber in GBIF.

GBIF                                     BOLD
occurrenceKey = 1426521030               Process ID = NOBAS010-14
occurrenceID = urn:catalog:O:F:75130     Museum ID = O-F-75130
catalogNumber = 75130                    Sample ID = O-F-75130
eventID/fieldNumber = [blank]            Field ID = MY1-0568
GBIF API: 1426521030                     BOLD API: NOBAS010-14
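
In the example above, the BOLD Museum ID differs from the GBIF occurrenceID only in its prefix and separators. The following Python sketch shows that conversion. The function names are hypothetical, and the pattern is an assumption drawn from this one record, not a documented rule for all collections.

```python
def museum_id_to_occurrence_id(museum_id: str) -> str:
    """Turn a BOLD Museum ID such as 'O-F-75130' into 'urn:catalog:O:F:75130'.

    Assumption: the ID has the shape <institution>-<collection>-<number>,
    as in the single example record compared above.
    """
    parts = museum_id.strip().split("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected Museum ID format: {museum_id!r}")
    return "urn:catalog:" + ":".join(parts)


def museum_id_to_catalog_number(museum_id: str) -> str:
    """Return the trailing part, which equals dwc:catalogNumber in the example."""
    return museum_id.strip().split("-", 2)[-1]
```

Because neither BOLD identifier matches the GBIF identifiers exactly, a conversion like this would have to be checked per collection before being used for matching.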

 

BOLD
http://bins.boldsystems.org/index.php/Public_RecordView?processid=NOBAS010-14
API: http://www.boldsystems.org/index.php/API_Public/sequence?ids=NOBAS010-14
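
As a rough illustration of how the API link above could be used from a script, here is a minimal Python sketch. The endpoint is the one shown above; the function names are hypothetical, and the sketch assumes that the endpoint answers with FASTA-formatted text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the API link above.
BOLD_SEQUENCE_ENDPOINT = "http://www.boldsystems.org/index.php/API_Public/sequence"


def bold_sequence_url(process_id: str) -> str:
    """Build the sequence request URL for a single BOLD Process ID."""
    return BOLD_SEQUENCE_ENDPOINT + "?" + urlencode({"ids": process_id})


def parse_fasta(text: str) -> list:
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fetch_sequences(process_id: str, timeout: float = 30.0) -> list:
    """Download and parse the sequences for one Process ID (needs network access)."""
    with urlopen(bold_sequence_url(process_id), timeout=timeout) as response:
        # Assumption: the response body is FASTA text encoded as UTF-8.
        return parse_fasta(response.read().decode("utf-8"))
```

Calling fetch_sequences("NOBAS010-14") would request the example record used in this article.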

 

To establish the link, Gunnhild provided a two-column table with the BOLD Process ID and the catalogNumber. Based on this list, we used the BOLD API to extract data from BOLD. Information from BOLD was mapped to the Darwin Core MeasurementOrFact extension and added to the museum specimen records published in GBIF. The BOLD Process IDs will be registered in the national Norwegian museum collection management system (MUSIT) to allow automatic mapping of all the Norwegian museum collections to BOLD in a similar manner. With this pilot we linked 390 fungi specimens to BOLD, and we plan to implement automatic routines for mapping all Norwegian museum collections in GBIF to BOLD.
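
The step from the two-column table to MeasurementOrFact rows could look roughly like the Python sketch below. It is not the mapping we actually used: the CSV column names, the "BOLD Process ID" measurementType label, and the occurrenceID prefix for the NHMO fungi collection are assumptions made for illustration. Only the measurementType, measurementValue and measurementRemarks terms come from the Darwin Core extension itself.

```python
import csv
import io

# Public record page, as linked earlier in this article.
BOLD_RECORD_URL = "http://bins.boldsystems.org/index.php/Public_RecordView?processid="


def read_link_table(text: str) -> list:
    """Read rows of (Process ID, catalogNumber) from CSV text.

    Assumption: the file has a header line 'processid,catalogNumber'.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [(row["processid"].strip(), row["catalogNumber"].strip())
            for row in reader]


def to_measurement_rows(links: list, occurrence_id_prefix: str = "urn:catalog:O:F:") -> list:
    """Build one MeasurementOrFact row per linked specimen."""
    rows = []
    for process_id, catalog_number in links:
        rows.append({
            # Core record that the extension row attaches to.
            "id": occurrence_id_prefix + catalog_number,
            # Hypothetical label, not an agreed vocabulary term.
            "measurementType": "BOLD Process ID",
            "measurementValue": process_id,
            "measurementRemarks": BOLD_RECORD_URL + process_id,
        })
    return rows
```

Further rows per specimen, for example for values returned by the BOLD API, could be added in the same way once the term mapping is settled.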

 

GBIF Portal
URL: https://www.gbif.org/occurrence/1426521030
API: http://api.gbif.org/v1/occurrence/1426521030/verbatim
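
To check that the BOLD information arrived with the published record, the verbatim API response above can be inspected from a script. The Python sketch below is a minimal illustration with hypothetical function names. It assumes that the verbatim JSON carries extension rows under an "extensions" key, keyed by the row type URI of the extension.

```python
import json
from urllib.request import urlopen

# Base URL taken from the API link above.
GBIF_OCCURRENCE_API = "http://api.gbif.org/v1/occurrence"
# Row type URI of the Darwin Core MeasurementOrFact extension.
MOF_ROW_TYPE = "http://rs.tdwg.org/dwc/terms/MeasurementOrFact"


def verbatim_url(occurrence_key: int) -> str:
    """Build the verbatim-record URL for a GBIF occurrenceKey."""
    return f"{GBIF_OCCURRENCE_API}/{int(occurrence_key)}/verbatim"


def measurement_rows(verbatim: dict) -> list:
    """Return the MeasurementOrFact rows of a verbatim record, or [] if none.

    Assumption: extension rows sit under verbatim['extensions'][<row type URI>].
    """
    return verbatim.get("extensions", {}).get(MOF_ROW_TYPE, [])


def fetch_verbatim(occurrence_key: int, timeout: float = 30.0) -> dict:
    """Download the verbatim record as a dict (needs network access)."""
    with urlopen(verbatim_url(occurrence_key), timeout=timeout) as response:
        return json.load(response)
```

For the example specimen, measurement_rows(fetch_verbatim(1426521030)) would list whatever MeasurementOrFact rows are currently published for it.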

 

Some of the other examples:


Join us on GitHub to discuss the mapping of terms from the BOLD API!

https://github.com/GBIF-Europe/bold_sequence


Barcode data in GBIF:


Further reading:

 

Tags: GBIF, BOLD, iBOL, DNA sequences
Published Nov. 3, 2017 1:39 PM - Last modified Nov. 4, 2017 2:46 PM