IGC - Instituto Gulbenkian de Ciência

Eduardo Dalcin, CNIP, Universidade Federal de Pernambuco

Richard White, Biological Sciences, Southampton University

Building and Managing Biological Collections Databases
30 September - 4 October 2002

Summary - Building and managing biological collections databases


Course tutors:

Richard White (www.soton.ac.uk/~rjwhite/), School of Biological Sciences (www.sobs.soton.ac.uk), University of Southampton, UK

Eduardo Dalcin (www.edalcin.com), Coordinator of Centro Nordestino de Informações sobre Plantas - CNIP (www.cnip.org.br), Associação Plantas do Nordeste / Universidade Federal de Pernambuco - UFPE, Recife, Brazil


Course structure

The course will consist of a series of alternating lectures, practical exercises and discussion sessions. The discussion sessions will be used to draw conclusions from the practical work, and the lectures will prepare students for the exercises and reinforce and supplement the information obtained from them. Attendees will work in small groups on the practical exercises.

Summary of course content

·          Introduction to databases and their uses in biodiversity research

·          Practical design considerations, types of data, uses and users

·          Web-based databases and commercial database systems

·          Database design, choice of software, and implementation

·          Managing institutional data centres, data standards, data quality

·          Integration, web publishing, interoperability and networked database projects


Course plan



MONDAY morning

Introduction

10:00hs - Lecture 1: What is a database? (RW)

-         What is a database?

-         What are they used for?

-         database system components

-         database architectures

11:00hs - Coffee break
11:20hs - Practical Exercise 1a: Web search for database characteristics

-         typical uses of databases in biodiversity research and for biological collections

-         essential characteristics of a database system

12:00hs - Discussion 1a: Participants’ own application areas, projects, objectives and needs
12:30hs - Practical Exercise 1b: Locate biodiversity information systems on the Web

-         tabulate for each one: organisation, name, URL, objectives (what is it for?) and intended users

13:00hs - Lunch

MONDAY afternoon

14:30hs - Discussion 1b: Presenting the findings on organisations, objectives and users
15:30hs - Lecture 2: Biodiversity database systems (RW & ED)

-         biodiversity informatics

-         levels of data (nomenclators, checklists, species databases, specimen databases)

-         demonstrations of some systems on the Web

16:00hs - Coffee break
16:20hs - Practical Exercise 2: Searching for species information

-         attempt to discover some information about a small number of named plant (or animal) species

17:20hs - Discussion 2: Problems with database systems

-         user interfaces

-         unreliability arising from inadequate handling of synonyms, and other deficiencies

18:00hs - Close

TUESDAY morning

Biodiversity data systems:  information content, uses and users

10:00hs - Lecture 3: Biodiversity data types (RW & ED)

-         nomenclatural data

-         curatorial data

-         geographical data, maps

-         descriptive data

-         images

-         bibliographic data

11:00hs - Coffee break
11:20hs - Practical Exercise 3: Investigate selected biodiversity information systems

-         select databases from Exercise 1b that match your interests

-         add the following columns to your previous table: types of data contained, how the data are presented, and whether complex searches are supported

-         evaluate the data content and user interface, noting good and bad points

12:20hs - Discussion 3: Usability of biodiversity information systems

-         data types found

-         user interface features (conclusions might include the importance of good internal design, which is not immediately obvious from the user interface)

-         does the database present the right information in the right way for the intended uses and users?

13:00hs - Lunch

TUESDAY afternoon

Database design

14:30hs - Lecture 4: Data modelling and the relational model (ED)

-         entities

-         ER diagrams

-         relational model

15:15hs - Practical Exercise 4: Plan a database for a particular application

-         in groups of 4-6

-         decide on the entities and their attributes

-         attempt normalisation into an appropriate set of tables

-         use Access to produce a structure diagram (2 or 3 tables)

-         produce an ER diagram using PowerPoint

16:00hs - Coffee break
16:20hs - Discussion 4: Database designs

-         present the designs

-         discuss their pros and cons

17:00hs - Lecture 5: Models of species diversity information systems (RW)

-         the taxonomic core (synonymic indexing etc.)

-         data models (more detail on data standards later)


          

WEDNESDAY morning

Implementing a data management system

10:00hs - Lecture 6: Defining needs (RW)

-         communicating with users (needs) and potential suppliers (solutions)

-         why is a system needed? Defining needs:

-         for personal or institutional use

-         for managing a biological collection

-         for running a web-based biodiversity information system

10:30hs - Discussion 6a: Users and uses

-         from your own experience and objectives, suggest users and uses

11:00hs - Coffee break
11:20hs - Practical Exercise 6: Evaluate existing biological database management systems

-         make a list of systems to evaluate

-         make a list of possible system features (needs, standards, etc.)

-         fill in a table of system characteristics

12:20hs - Discussion 6b: DBMSs available for biodiversity databases
13:00hs - Lunch

WEDNESDAY afternoon

14:30hs - Lecture 7: Setting up a database management system (RW)

-         alternatives: choose an existing package or build a new one (based on an underlying DBMS) using development tools

-         different types of systems (stand-alone, client-server, web-based)

-         differences between generic commercial packages (e.g. MS Access) and specialised biological database packages (BG-Base, BG Recorder, Lucid, Alice, etc.)

-         getting more information: pointers to some existing systems, web sites with reviews, discussion lists, etc.

15:15hs - Practical Exercise 7: Choice of database management software

-         draw up one or more specifications (sets of requirements for a project)

-         evaluate several possible DBMSs

-         for each specification, choose the best DBMS to meet the requirements

-         consider how it can be deployed or implemented

16:00hs - Coffee break
16:20hs - Practical Exercise 7 (continued)
17:00hs - Discussion 7: Choice of DBMS
18:00hs - Close

THURSDAY morning

Data Management

10:00hs - Lecture 8: Data centre management (ED)

-         a case study of record systems for living collections

11:00hs - Coffee break
11:20hs - Discussion 8a: Your own institution

-         aims

-         facilities

-         deficiencies

-         improvements

12:00hs - Research talk: Linking Biodiversity Databases (Dr. Richard White)
13:00hs - Lunch

THURSDAY afternoon

14:30hs - Practical Exercise 8: Plan the implementation of an information system or data centre

-         including resources, staff requirements, timetable, network infrastructure, etc.

15:30hs - Discussion 8b: Implementation plans
16:00hs - Coffee break
16:30hs - Lecture 9: Data standards (ED)

-         living collections (the International Transfer Format (ITF) for botanic gardens; the standard used by zoos)

-         herbaria (HISPID)

-         other TDWG standards

17:15hs - Practical Exercise 9: Recognise and classify published biodiversity data standards
17:45hs - Discussion 9: Using data standards: adopt, adapt or develop?
18:00hs - Close

FRIDAY morning

Publishing and networking biodiversity information

10:00hs - Lecture 10: Data quality in biodiversity databases (ED)

-         data integrity (levels of scrutiny, methods for “data cleansing”)

11:00hs - Coffee break
11:20hs - Lecture 11: Assembling and publishing species diversity databases on the Web (RW & ED)

-         Linking and merging databases

-         Online databases, interfaces and gateways

-         Generating static or dynamic HTML pages (LegumeWeb versus AliceWeb approaches)

-         Introduce and demonstrate the output pages that AliceWeb can generate (ED)

12:00hs - Practical Exercise 11: Evaluate web-based information delivery systems

-         use the ILDIS web site to evaluate both AliceWeb and LegumeWeb

-         other systems

12:40hs - Discussion 11: Pros and cons of static and dynamic web page generation
13:00hs - Lunch

FRIDAY afternoon

14:30hs - Lecture 12: Interoperability, networking and cooperative projects (RW & ED)

-         botanic garden (BG) networks

-         Australian Virtual Herbarium

-         GSDs (Global Species Databases)

-         GBIF

-         e-Science and the GRID

15:15hs - Practical Exercise 12: Investigate current research projects and test their prototypes

-         ILDIS ("intelligent linking")

-         Species 2000

-         GRID projects

16:00hs - Coffee break
16:20hs - Discussion 12: Conclusions and future steps
17:00hs - Close


Copyright © 2002 by Richard White, Eduardo Dalcin and the Instituto Gulbenkian de Ciência. All rights reserved.