Data Structures
Time 9.3 hrs

Difficulty Advanced
Prerequisites Data Processing
Departments Human Technologies
Authors Ross Parker
Groupings Individual
Minimum Year Group None


Data is great, but to be useful it really needs to be structured. In this unit you will learn how to use database technology to organise and query data.


This work is shared under the following license: Creative Commons BY-SA-NC


The Pitch
Why should I bother learning this?
  • Got data? Want to use it? Then you need to organise its storage in order to be able to query it.
  • This is essential if you want to build useful data-based systems, such as web and mobile apps.
What is needed to run this unit?
  • Laptop
  • Internet access
  • Database software (examples use MySQL under MAMP)
Interdisciplinary Links
Do not try and force this. What areas of other subjects might this reflect and/discuss language. For IB, links with ToK.
  • ...
Teacher Reflection
What was successful? What needs changing? Alternative Assessments and Lesson Ideas? What other Differentiation Ideas/Plans could be used?
  • ...
Any CC attribution, thanks, credit, etc.

This page requires you to be logged in to access it. Please login and try again.
5 mins
Got Data?
The Pitch

  • Got data?
  • Want to use it?
  • Then you need to organise its storage in order to be able to query it.
  • This is essential if you want to build useful data-based systems, such as web and mobile apps.

5 mins
Data vs Information
  • If you remember back to the Data Processing unit:
    • Data is raw facts and figures
    • Information is data organised so that it has some meaning and structure.
  • This unit is all about structuring data so that when the time comes to use it, turning it into information is easier and more efficient.
20 mins
What's A Database
  • In the Data Processing unit you used a spreadsheet to store and structure data. Spreadsheets are great for calculations and sorting, but they are slow and limited in their functionality.
  • Databases (strictly we are talking about relational databases, but we'll say database for short) are much more powerful and scalable than spreadsheets, and so are used to power massive, complex systems, such as Google's search database, Amazon's shop data, etc.
  • A database system consists of:
    • Server
      • This is the "database": it stores all the data, and accepts connections to access it.
      • In this unit we will use MySQL, as it is free, open source, powerful and very popular.
      • Other popular databases include Oracle DB and Microsoft SQL Server, but these are proprietary and expensive.
    • Client
      • Can connect to the database and access the data within it.
      • A username and password are generally needed to connect to a database.
      • Some clients are dedicated to managing the database, such as phpMyAdmin, which we will use in this unit.
      • Some clients are applications, that let users access the database, such as Gibbon's web based front end.
    • SQL
      • SQL stands for Structured Query Language
      • It is a common, standard way to give instructions to a database, whether the database happens to be MySQL, Oracle, MS or something else entirely.
      • An example SQL statement from Gibbon is:
        • SELECT surname, preferredName FROM gibbonPerson WHERE status='Full';
      • Once you understand SQL it reads like a simplified sentence.
      • SQL statements can become very complex!
        • SELECT
     AS term,
     AS activity,
              (SELECT GROUP_CONCAT(DISTINCT ORDER BY gibbonDaysOfWeek.sequenceNumber)
                  FROM gibbonActivitySlot
                  JOIN gibbonDaysOfWeek ON (gibbonActivitySlot.gibbonDaysOfWeekID=gibbonDaysOfWeek.gibbonDaysOfWeekID)
                  WHERE gibbonActivitySlot.gibbonActivityID=gibbonActivity.gibbonActivityID) as days,
     AS formGroup,
          FROM gibbonActivity
          LEFT JOIN gibbonSchoolYearTerm ON (gibbonActivity.gibbonSchoolYearTermIDList LIKE concat('%', gibbonSchoolYearTerm.gibbonSchoolYearTermID, '%'))
          LEFT JOIN gibbonActivityStudent ON (gibbonActivityStudent.gibbonActivityID=gibbonActivity.gibbonActivityID)
          JOIN gibbonPerson ON (gibbonActivityStudent.gibbonPersonID=gibbonPerson.gibbonPersonID)
          JOIN gibbonStudentEnrolment ON (gibbonStudentEnrolment.gibbonPersonID=gibbonPerson.gibbonPersonID AND gibbonActivity.gibbonSchoolYearID=gibbonStudentEnrolment.gibbonSchoolYearID)
          JOIN gibbonRollGroup ON (gibbonStudentEnrolment.gibbonRollGroupID=gibbonRollGroup.gibbonRollGroupID)
          WHERE gibbonActivity.gibbonSchoolYearID=(SELECT gibbonSchoolYearID FROM gibbonSchoolYear WHERE status='Current') AND active='Y' AND gibbonPerson.status='Full' AND gibbonActivityStudent.status='Accepted'
          ORDER BY term, activity, surname, preferredName
40 mins
Database Environment
Getting Ready
  • Use our Web Server Setup Guide to get a web server running on your Mac or Chromebook
  • Work out, using online search and trial and error, the following:
    • How to use phpMyAdmin (which is included in MAMP) on a Mac, or JackDB on a Chromebook, to access your MySQL database server.
    • The default username and password for MySQL under MAMP/LAMP.
  • Once you are in, you should see something like this:

140 mins
Example Database
Getting Started
  • Start by creating a database in phpmyadmin or JackDB
    • Call the database "sample"
    • Set the collation to "utf8_general_ci"
  • Import sample data by using your client's import feature and the file.
    • If this is successful, looking at the Structure tab of your database should show 2 tables with rows of data in them:

  • Now use the SQL tab to run the following SQL query (in light below), which displays all animals in the menagerie that are still alive, ordered by date of birth, with youngest first:
    • SELECT * FROM pet WHERE death IS NULL ORDER BY birth DESC;
    • Spend some time studying this query to make sure you understand what it does.
    • The words in CAPITALS are the commands and options, the words in lower case are specific to the sample database.
  • As you work with databases, certain terms are very important. Spend some time researching the following vocabulary list, to make sure you understand each term.
    • Table
    • Row
    • Column
    • Cell
    • Data type
    • Query
    • Result set
    • Relationship
    • Normalisation (often spelled normalization)
  • When a database is normalised, we can use JOIN queries to select data from multiple tables, as a single result set, such as in the example below, which you should run on your database
    • SELECT * FROM pet JOIN event ON ( WHERE death IS NULL ORDER BY birth DESC;
    • Notice that the result is a combination of both the pet and event tables.
    • Notice that some pets appear more than once, if they have made more than one visit.
  • Hopefully you are now wondering what happens if two pets have the same name.
    • This is a design flaw in the example database.
    • To overcome this, each pet should have a unique ID in the pet table (known as a primary key)
    • The event table should have a primary key as well, and should use the pet key to link back to the pet table (this is known as a foreign key).
    • Delete your sample database tables, and import the file.
    • Look at the design of your new tables.
    • Run the following SQL (notice the ON statement is different to the one above):
      • SELECT * FROM pet JOIN event ON (pet.petID=event.petID) WHERE death IS NULL ORDER BY birth DESC;
    • We can make the data here more useable by selecting only certain fields. Try to run this one:
      • SELECT pet.petID, name, owner, birth, eventID, date, type, remark FROM pet JOIN event ON (pet.petID=event.petID) WHERE death IS NULL ORDER BY birth DESC;
    • To improve this database we would want to have an "owner" table, which links to the event table in a one-to-many relationship.
    • One of the most important parts of normalisation is that we don't have the same information duplicated anywhere.
      • In the first sample database, the pet name was listed in the table pet and the table event.
      • To update the name, we need to update multiple rows in multiple tables, and risk errors.
      • In the better sample database, the pet name was only listed in the table pet.
      • We can can update the name much more effectively and reliably.
      • This is the power of normalisation.
280 mins
Your Own Database
Hands On
  • You now need to do the following:
    • Create a new database to represent something in your life (e.g. items in your wardrobe and a log of when you use them).
    • Create the tables you need in your database, making sure to:
      • Use the correct data types to store certain kinds of information (e.g. date for purchase data, decimal for price, etc).
      • Use primary and foreign keys to establish relationships.
      • Normalise your design to the greatest extent possible, avoiding duplicate data.
    • Populate your tables with some realistic data
    • Write some SQL queries to turn your data into information.
  • THIS IS DIFFICULT! Ask your teacher for help, and show them you progress as you go.
  • In SQL JOIN is different from LEFT JOIN.
    • Can you work out what the difference is?
    • Can you work out cases where we want LEFT JOIN, and cases where we want JOIN?
  • Use online documentation to learn more about SQL. It can do a lot of different things.
70 mins
Finishing Up
  • Make a screencast and add a voice over, describing your database structure and the decisions you have made.
  • Include a section where you run some SQL you have written to query your data, and explain what the SQL does.
  • Add a title and CC license, before exporting and submitting the work.
There are no records to display.
Powered by Gibbon v26.0.00dev | © Ross Parker 2010-2023
Created under the GNU GPL at ICHK | Credits | Translators