Entity Relationship Diagrams

  • Data Modeling

    Data modeling is the process used to define data requirements needed to support the business processes within organizations.

    During data modeling, data requirements are initially recorded as a conceptual data model, which is a represents the structure of the database and ignores the specific implementation of it (i.e. we don't care about the type of software or operating system we will be using to actually run the database). The conceptual data model is constructed through collaboration with the business stakeholders to reflect the initial business requirements and structure that the data must support.

    After a conceptual data model is created, it may be mapped to a physical data model that the organization will use. This section is focused on the creation of conceptual data models, but it does include some aspects of a physical data model.

  • ERD Basics

    Learning how to use ERD's is useful for two reasons:

    • Large organizations with complex data needs have professional data modelers that must work closely with a business user (you) in order to create a system that truly meets your needs. Research has demonstrated repeatedly that customer (i.e. business user) involvement in the conceptual creative process leads to better systems. IT systems are one of the largest expenses in an organization, so success is critical. Conceptual data models are one example of a communication channel between the "IT" side of the organization and the "business" side. A diagram is something universal - a language in itself that serves as a translation of requirements between the two sides.
    • If you can understand how data is stored conceptually in a database, then you will also understand how to extract the data from the physical database. If you understand how to extract the data yourself, then you won't have to rely on IT support to get it.

    The formal diagramming standard used to create conceptual data models is called an entity-relationship diagram (ERD). It includes specific standardized notations to indicate each piece of data that needs to be included in the system as well as how the data should be organized and separated to maintain the highest quality (e.g. maximizing the speed of data access and reducing data redundancy). You need to learn how to translate a set of information system requirements into an ERD. ERDs include only three main components: attributes, entities, and relationships. Instances are also mentioned here, but they are not part of an ERD.

  • Attributes

    Attributes (a.k.a. "Properties") are the properties of entities that an organization cares about. Examples of attributes include Color, Employment Date, Name, and Social Security Number. You'll notice that the videos above referred to these as "properties." Multiple names referring to the same thing is not uncommon.

    Domain of an Attribute - A domain is the set of valid values an attribute can have. Domains are important for keeping the quality of data high in a database. Examples of domains include integers between 200 and 399, real numbers with two decimal places, state abbreviations, and so on. Typically, database controls are applied to limit data entry to acceptable domain values. For example, "July 11, 1993" is an acceptable value for Employment Date but "Blueberry Pancakes" is not. There are three main forms of domain specification:

    • Type - examples include integer, real number, date, Currency, text, and boolean
    • Range - GPA has a domain of real numbers between 0 and 4.0. Hourly Wage may be limited to values between the minimum wage and some practical upper limit.
    • Specifically Identified Values - Specifically listed, finite set of values such as state names (e.g., Alabama, Alaska, Arkansas, etc.) or days of the week (Sunday, Monday, etc.)
  • Instances

    An instance is a single row of information containing data for each attribute in a table. Look at the table of employees below:

    Functional Areas

    Each individual employee is an actual instance that we can touch, photograph, and pay. When a conceptual database is mapped to a physical database, a separate table is typically created to represent each entity. Each record within the table represents one instance.

  • Entities

    Entities are persons, places, or things about which an organization wishes to save information. Employee, Order, and Time Sheet are all examples of entities. The table in the "instances" section represents the entity "employee". The singular form of a name is used for entity names. In database design, an entity is a logical concept in the mind of the analyst that represents a category of objects that have common characteristics or attributes that we want to store. For example, when we think about police detectives we have a concept--a logical abstraction--in our mind that represents a type of person who investigates crimes. As a mere concept, an entity, like “Detective”, is not something that we can physically touch, photograph, or buy.

    An analyst typically starts by asking what types of things (entities) the client needs to store information about. The analyst and client agree on a name, in noun form, for each entity of interest such as student, building, customer, vehicle, invoice, product, etc. When we label entities, we use the singular tense because each entity represents a category. For example, we would label an entity “Employee” instead of “Employees” because we know that Employee is a category of people who work for an organization.

  • Relationships

    A relationship in ERDs is simply how two entities are related. For example, how is a customer related to an order? Relationships can be described using cardinality. Cardinality refers to the number of instances in one table that can correspond to one or more instances of a related entity. For example, how many orders can a customer have, and how many customers can be on a single order? Maximum Cardinality is expressed as simply "one" or "many." For example, a customer can have many orders, while an order (at the grocery store for example) can only belong to one customer. Two entities have only three possible maximum cardinality combinations in their relationship:

    1. One-to-One (1:1) - An instance of entity 'A' can relate to one and only one instance of entity 'B,' and an instance of 'B' can relate to only one instance of 'A.' For example, a husband can have only one wife, and a wife can have only one husband.
    2. One-to-Many (1:M) - An instance of entity 'A' can relate to one or many instances of entity 'B,' but an instance of 'B' can relate to only one instance of 'A.' For example, a mother can have many biological children, but a child can have only one biological mother.
    3. Many-to-Many (M:M) - An instance of entity 'A' can relate to one or many instances of 'B,' while an instance of 'B' can relate to one or many instances of 'A.' For example, an uncle can have many nephews, and a nephew can have many uncles.

    The figure below shows the most popular way to represent cardinality. A simple line represents the cardinality of one and a “crow's foot" represents the cardinality of many. This is the diagramming convention for cardinality that we will use.

    Table

    Determining Maximum Cardinality

    When determining the cardinality of a relationship, it is important to think of one instance of the entity (i.e. one customer) under consideration in relation to the other entity (i.e. Order). Use this phrase to talk through each cardinality:

    At maximum, a [FIRST ENTITY NAME] can have [ONE or MANY] [SECOND ENTITY NAME]

    Let us show you how to do this. For example, at maximum a employee can have one position. Once you do it in one way, reverse the entities and say it again. At maximum, a position can have many employees.

    ERD Cardinality

    Determining Minimum Cardinality

    Minimum Cardinality is expressed as simply "one" or "zero." It is also necessary to establish minimum cardinalities. Will we ever store a product record if the product has never (or at least not yet) been ordered? Yes, because we have to store products before they can be ordered. Determining minimum cardinalities works the same way as maximum cardinalities, but the words change slightly:

    At minimum, a [FIRST ENTITY NAME] can have [ONE or ZERO] [SECOND ENTITY NAME]

    Let us show you how to do this. At minimum, a employee can have one position. Once you do it in one way, reverse the entities and say it again. At minimum, a position can have zero employees. We may store positions in our database that aren't currently filled by an employee, but we are assuming that if we have hired an employee that they must have a position. Determining minimum (and maximum for that matter) uses intuition in this class. Wheh you design a database, the needs of the client will likely address these problems.

  • Keys

    All entities must include a primary key. A primary key (PK) is an attribute (or attributes) used to uniquely identify an instance of an entity. The value of the key must be unique for every instance; that value must belong to one and only one row. By enforcing the rule that each instance has a unique value for its identifier, confusion is avoided that would occur if two or more instances had the same primary key value (i.e. two employees with the same ID number). Were this to occur, the user would not know which record to update if information changed about an instance. Primary keys have several characteristics:

    1. Simple vs. Composite - A simple key contains a single attribute. For example, your ID number is 34059323.

      A composite key is a key that contains more than one attribute. For example, Product 13423 was purchased by Customer 1242 (this is mainly used in M:M relationships)

    2. Semantic vs. Artificial - A semantic or "natural" key is a key for which the possible values have an obvious meaning to the user of the data. For example, we may store country as a three letter code, like "USA." The value 'USA' has meaning to the user. Another way to determine whether a key is a semantic key is that it has meaning outside of its use as being a primary key.

      An artificial key has no obvious meaning to the user of the data. It only serves the purpose of uniquely defining each row. Typical examples would be ORDER_ID or INVOICE_ID. The value '12345' has no meaning to the user (it is not a semantic key) as it conveys no inherent information (such as a commonly used name or type) about the entity to which it relates.

    3. Serialized vs. Virtual - Serialized keys are used when it is necessary to track each instance of an entity individually such that each individual instance is assigned a unique identifier. This is best seen in comparison to virtual. For example, car manufacturers and car retailers track each individual car by a serialized vehicle identification number (VIN). The model, make, and year of a car are insufficient to track a unique instance. The VIN differentiates one instance from another instance. For example, there can be multiple 2012 white Toyota Corollas, but each will have a different VIN.

      When a virtual key is used, there is no need to track an individual instance by unique identifier. Instead, the same identifier is assigned to all instances of that entity. Consider boxes of breakfast cereal like Cheerios and Frosted Flakes. For all practical intents and purposes, one instance of a box of Cheerios need not be distinguished from another instance. the same model number is assigned to all instances because there is no need for the seller to serialize and track individual instances. This is important when you determine relationships. For example, if you consider all relationships between a single instance and another entity, you may run in to trouble.

      Quickly listen to this example that assumes all instances are serialized: At maximum, an apple can be found at one store. At maximum, a store has many apples. In reality, an apple is part of a group. This changes how we talk through the relationship; instead of saying an apple, say apples.

      Returning to the cereal example, there is no need to track a specific individual physical box that is sold by a buyer or purchased by a customer. Since a box of cereal is a virtual instance, every box is assigned the same UPC and each instance of a box is not assigned a unique number. A specific customer can buy many types of cereal (Cheerios, Frosted Flakes, etc.) on one order and a specific type of cereal, e.g., Cheerios, can be purchased by many different customers. So instead of a 1:M relationship between customer and cereal, the cardinality is M:M (see Figure below). In addition, a cereal will be recorded in the database before it is ever sold. Therefore, the minimum cardinality is zero. However, an order will not be placed unless there is at least one product on it. Therefore, the minimum cardinality (assuming that cereal is the only thing sold at the store) would be one. Watch the video below for a further explanation.

    Foreign Keys

    A foreign key is an attribute that relates, links, or ties, an instance in one entity to an instance in another entity. In other words, a foreign key is a copy of the primary key in another entity. In the employee table above, we could add another column that lists off each employee's position. It would look like this:

    ERD Cardinality

    That single number represents each employee's position, which is related to the primary key on the position table. The position table includes all of the attributes of that position and has the information for each position stored in each instance. This helps with data integrity, because we don't have to list out all of the position attributes for each person. If we need to update the data, we only need to go to the position table and it will be changed for every person.

    1:M

    Foreign keys behave differently depending on the type of relationship between two entities. In a 1:M relationship (the most simple example), the foreign key will always go the table on the "M" or many side. With employees and positions, at maximum a position can have many employees, so the employee gets the foreign key.

    1:1

    In 1:1 relationships, it is considered ok to put the key on either table. There is a best practice that is ususally followed in industry, however. The minimum cardinalities can be used to determine the best placement. For example, consider a person entity and a passport entity. Unless you have dual citizenship, at maximum a person has one passport and at maximum a passport belongs to one person. Let us ignore dual citizenship and consider this cardinality to be true. Now, at minimum a passport has one person and at minimum a person has zero passport.

    lets turn this into a table, and your name is the first instance on the person table. Go ahead and draw it out with attributes and a primary key. Add one more person to your table and give them attributes. Now, draw a passport table. Give the table attributes and a primary key (color, nation, etc.). Only make one passport on your passport table. Now lets try to add the foreign key in the relationship.

    Add a column to the person table for the PassportID to be added as a foreign key. Put your one passportID under this new column in the instande that represents you. Now look at the table. There is an empty box Where your friend's passport number should be. They don't have a passport, so this results in wasted space in your database. A database must store that there is nothing in that box, which is inefficient.

    Erase the column you added, and add a new column on the passport table for the PersonID to be added as a foreign key. Add your PersonID to the first instance in this column. Look, there isn't any wasted space! This is the best way to do 1:1 relationships: in a 1:1 relationship, add the foreign key to the one that has the zero in the cardinality. If your minimum cardinality is 1:1, it truly doesn't matter, but remember to only add a foreign key to one of the two tables.

    M:M

    In many to many relationships, we use a composite key. If you added the foreign key to just one or the other table or even both, you would be unable to correctly identify how two instances are related. Instead, we create a linking table that contains both keys and any other attributes that would be useful to know. One example of this is order and item.

    An order can contain many items, and an item can be on many orders (think apples and cereal). The ERD looks like this:

    ERD Cardinality

    When you place an order on amazon, you can order many items. We need another table to act as the linking factor to show what order had which items on it. In this case, the linking table contains the line items (or each item bought) shown on the receipt. See the updated diagram below:

    ERD Cardinality

    This is a special relationship, because the table in the middle doesn't actually contain a primary key. The combination of the foreign keys creates a single primary key to accomplish the purpose of a primary key: to uniquely identify each instance. The table could also be used to record the quantity of each item purchased. This is the standard procedure for every M:M relationship. Be careful though, because some M:M relationships actually have another real table in between them. For example, trying to relate a customer and the items with a many to many makes sense, because a customer can buy many items and an item can be purchased by many customers. The middle table could represent this, but the order table is missing that records the date and other information from the sale, like which employee made the sale. A customer is really realated to order, which has the relationship explained above with item.

  • Pulling It All Together

    Understanding ERDs

    the best way to learn ER diagramming is to do it. First, watch the video below. It may help you to pull out a sheet of paper and a pencil (do NOT use a pen; you will need to erase) and copy what is done in the video.