
Biodiversity Data Standards
Overview
Understand the international standards that enable biodiversity data sharing and interoperability. This course introduces Darwin Core, other TDWG standards, and the data structures used by GBIF, FBIS, and other biodiversity information systems. Learners will gain practical insight into how standardised fields, vocabularies, and metadata improve data quality, make datasets discoverable, and support reliable re-use across platforms and projects.
What you will learn
- Why standards matter – the role of standardisation in biodiversity informatics and data sharing
- Darwin Core – core terms, extensions, and controlled vocabularies
- TDWG – the Biodiversity Information Standards community, its guidance, and resources
- Publishing formats – Darwin Core Archive, CSV structures, and APIs
- Taxonomy services – taxonomic backbones and name matching workflows
- Georeferencing – standards, coordinate uncertainty, and documentation practices
- Metadata – documenting datasets for discovery, interpretation, and reuse
- FBIS application – practical application of data standards to FBIS submissions
Course details
- Duration: 6–8 hours
- Prerequisites: Basic understanding of biodiversity data
- Language: English (Portuguese translation available)
- Teacher: Hugo Retief

Data Integration & Quality Control
Overview
Learn practical techniques for integrating data from multiple sources and ensuring data quality. This course focuses on the real-world challenges of combining biodiversity and hydrological datasets, including format conversion, duplicate detection, validation rules, and error correction. You’ll develop an “end-to-end” mindset for data pipelines: how to standardise inputs, apply repeatable checks, and produce trustworthy outputs suitable for analysis and decision support.
What you will learn
- Integration workflows – ETL (Extract, Transform, Load) processes and practical pipeline design
- Multiple formats – working with spreadsheets, databases, text files, and APIs
- Duplicates & matching – record matching approaches and duplicate detection methods
- Validation & QC – validation rules, automated checks, and common error patterns
- Taxonomic verification – name standardisation and reference matching workflows
- Spatial validation – coordinate checking, plausibility tests, and georeferencing correction
- Documentation – audit trails and transparent processing records for reproducibility
- Software tools – using OpenRefine, Excel, Python, and R for data integration tasks
Course details
- Duration: 8–10 hours
- Prerequisites: Biodiversity Data Standards
- Language: English (Portuguese translation available)
- Teacher: Hugo Retief

Environmental Data Analysis
Overview
Develop analytical skills for exploring and interpreting environmental datasets. This course covers statistical methods, visualisation techniques, and software tools commonly used in freshwater ecology and water resource assessment, with practical exercises using INWARDS and FBIS data. Learners will build confidence in working with messy real-world data, selecting appropriate analytical approaches, and communicating results clearly through charts, maps, and dashboards.
What you will learn
- Exploratory data analysis – summary statistics, distributions, and outlier detection
- Time series analysis – approaches for hydrological and water quality datasets
- Multivariate methods – ordination (PCA, NMDS) and clustering for ecological interpretation
- Species–environment relationships – correlation and regression concepts and applications
- Trend & change detection – trend tests and change point analysis
- Visualisation best practice – effective charts, maps, and dashboard communication
- Tools – R, Python, Excel, and QGIS workflows for environmental analysis
- Reporting – communicating analytical results effectively to diverse audiences
Course details
- Duration: 10–12 hours
- Prerequisites: Basic statistics knowledge; Data Integration & Quality Control (recommended)
- Language: English (Portuguese translation available)
- Teacher: Hugo Retief

Open Data & Licensing (CC0, Creative Commons)
Overview
Explore the principles and practices of open data in biodiversity and environmental science. This course covers licensing frameworks, data sharing policies, and the practical steps for publishing open datasets that can be accessed and reused by the global community. You’ll learn how to choose appropriate Creative Commons licences, meet open-data policy requirements, prepare robust metadata, and publish responsibly, especially when working with sensitive species information or community-held knowledge.
What you will learn
- Open data foundations – what open data is and why it matters for science and conservation
- IP and copyright – understanding intellectual property considerations for datasets
- Creative Commons licences – CC0, CC BY, CC BY-SA, and when to use each
- Policy requirements – JRS Biodiversity Foundation open data expectations
- Publication readiness – documentation, metadata, and dataset packaging
- Where to publish – GBIF, Zenodo, and GitHub as common platforms
- Citation & attribution – giving and receiving credit for data contributions
- Ethical practice – handling sensitive species data and indigenous knowledge responsibly
Course details
- Duration: 4–5 hours
- Prerequisites: None
- Language: English (Portuguese translation available)
- Teacher: Hugo Retief