Database Fundamentals (Chapter 1)
Welcome to my “Database for Data Science” blog series! In this journey, we’ll explore the essential aspects of databases that power data science. From structured formats to querying capabilities, we’ll look at how databases store and manage vast amounts of data efficiently, enabling data-driven insights. Whether you’re a budding data scientist or an enthusiast eager to harness the power of data, join me as we uncover the core principles behind databases and their indispensable role in data science. Let’s embark on this adventure together!
Let's understand the Data!
In computer science, Data refers to any collection of facts, statistics, information, or values. It can be processed, analyzed, and interpreted to derive meaningful insights or support decision-making. Data is fundamental to the functioning of computers and plays a central role in various aspects of computer science. It can take many forms, such as text, numbers, images, audio, video, and more.
A database is an organized collection of data, designed to store, manage, and retrieve information efficiently. Databases are a fundamental component of modern information systems, serving as repositories for a wide range of data types, including text, numbers, images, audio, and video. They play a crucial role in various applications, from small-scale projects to large enterprise-level systems.
Key characteristics of databases
- Structured Format: Databases organize data in a structured format. Relational databases, for example, use tables, with rows representing individual records and columns representing attributes or fields.
- Data Integrity: Databases ensure data integrity by enforcing constraints and rules to maintain the accuracy and consistency of data.
- Data Retrieval: They provide powerful querying capabilities, allowing users to retrieve specific data based on various criteria.
- Concurrency Control: Databases manage simultaneous access to data by multiple users or applications, ensuring data consistency and preventing conflicts.
- Transaction Management: Databases support transactions, which are sets of operations that must be executed as a single unit, ensuring atomicity, consistency, isolation, and durability (ACID properties).
- Data Security: They offer various security mechanisms to protect data from unauthorized access and ensure confidentiality.
- Scalability: Databases can be scaled vertically (adding more resources to a single server) or horizontally (distributing data across multiple servers) to accommodate growing data requirements.
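To make the transaction idea from the list above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names in it, and the transfer helper are all made up for illustration. The point it tries to show is atomicity: when the second step of a transfer fails, the first step is undone as well.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing unit (hypothetical helper)."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            # Step 1: credit the receiver.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Step 2: debit the sender. The CHECK constraint rejects overdrafts.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both steps succeed
failed = transfer(conn, "alice", "bob", 500)  # step 2 fails, step 1 is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

After the failed transfer, bob's balance does not include the 500 that was credited in step 1, because the whole transaction was rolled back.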
Types of Databases
- Relational Databases → Also known as SQL databases, these databases use a relational model to organize data into tables with rows and columns. (MySQL, PostgreSQL)
- NoSQL Databases → These databases are designed to handle large amounts of unstructured or semi-structured data, such as documents, images, or videos. (MongoDB)
- Column Databases → These databases store data in columns rather than rows, making them well-suited for data warehousing and analytical applications. (Amazon Redshift, Google BigQuery)
- Graph Databases → These databases are used to store and query graph-structured data, such as social network connections or the data behind recommendation systems. (Neo4j, Amazon Neptune)
- Key-Value Databases → These databases store data as a collection of keys and values, making them well-suited for caching and simple data storage needs. (Redis, Amazon DynamoDB)
Let’s look at relational databases in more depth, since they are the main focus of this series.
Relational Databases
These databases use a relational model to organize data into tables with rows and columns. Data is stored in rows, and each row represents a specific record or piece of information. For example, in a customer database, each row could represent a different customer, with details like name, address, and contact information. The columns, on the other hand, represent different attributes or characteristics of the data. Using the customer database example, the columns might include attributes like customer ID, name, address, email, and phone number.
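The customer example above can be sketched with Python's built-in sqlite3 module. The table name, column types, and sample customers below are assumptions made up for illustration. Each tuple in rows becomes one row of the table, and each column holds one attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One column per attribute of a customer.
conn.execute(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        address     TEXT,
        email       TEXT,
        phone       TEXT
    )
    """
)

# One row per customer (sample data, invented for this sketch).
rows = [
    (1, "Asha Rao", "12 Lake Road", "asha@example.com", "555-0101"),
    (2, "Ben Ortiz", "48 Hill Street", "ben@example.com", "555-0102"),
]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

# The column names come back in the order they were declared.
columns = [d[0] for d in conn.execute("SELECT * FROM customers").description]

# Retrieve specific attributes of one specific record.
match = conn.execute(
    "SELECT name, email FROM customers WHERE customer_id = ?", (2,)
).fetchone()
print(columns)
print(match)
```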
DBMS (Database Management System)
A database management system (DBMS) is software that allows users to create, manage, and manipulate databases. A DBMS serves as an intermediary between the end-users or applications and the actual physical database, providing an interface to interact with the data.
Functions of DBMS
- Data Definition: It allows users to define the structure of the database, including creating tables, specifying data types for columns, and defining relationships between tables.
- Data Manipulation: A DBMS enables users to insert, update, delete, and retrieve data from the database. This is typically done using a query language like SQL (Structured Query Language).
- Data Security: A DBMS provides mechanisms to control access to the database, ensuring that only authorized users can view, modify, or delete data.
- Data Integrity: It enforces rules and constraints to maintain data integrity, preventing invalid or inconsistent data from being entered into the database.
- Concurrency Control: A DBMS manages simultaneous access to the database by multiple users or applications, ensuring that transactions are executed without interfering with each other.
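Three of the functions above (data definition, data manipulation, and data integrity) can be seen together in a short sketch. It uses SQLite through Python's sqlite3 module as the DBMS. The products table and its rules are hypothetical, chosen only to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: describe the table, its column types, and its rules.
conn.execute(
    """
    CREATE TABLE products (
        sku   TEXT PRIMARY KEY,
        name  TEXT NOT NULL,
        price REAL NOT NULL CHECK (price > 0)
    )
    """
)

# Data manipulation: insert, update, delete, then retrieve.
conn.execute("INSERT INTO products VALUES ('A1', 'Notebook', 3.50)")
conn.execute("INSERT INTO products VALUES ('B2', 'Pen', 1.20)")
conn.execute("UPDATE products SET price = 4.00 WHERE sku = 'A1'")
conn.execute("DELETE FROM products WHERE sku = 'B2'")
remaining = conn.execute("SELECT sku, name, price FROM products").fetchall()

# Data integrity: rows that break a declared rule are rejected.
bad_rows = [
    ("A1", "Duplicate SKU", 2.0),  # breaks the primary key's uniqueness
    ("C3", "Free item", 0.0),      # breaks CHECK (price > 0)
    ("D4", None, 5.0),             # breaks NOT NULL on name
]
rejected = []
for bad_row in bad_rows:
    try:
        conn.execute("INSERT INTO products VALUES (?, ?, ?)", bad_row)
    except sqlite3.IntegrityError:
        rejected.append(bad_row[0])

print(remaining)
print(rejected)
```

Security and concurrency control are left out here because an in-memory, single-user SQLite database has little to show for either.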
DBMSs come in various types, such as relational database management systems (RDBMS), object-oriented database management systems (OODBMS), and NoSQL systems, each catering to specific data management needs.
Keys in Database
A key in a database is an attribute or a set of attributes that uniquely identifies a tuple (row) in a table. Keys play a crucial role in ensuring the integrity and reliability of a database by enforcing unique constraints on the data and establishing relationships between tables.
- Super Key: A super key is a set of one or more columns that uniquely identifies each row in a table. It may contain extra columns that are not needed for uniqueness.
- Candidate key: A candidate key is a minimal Super key, meaning it has no redundant attributes. In other words, it’s the smallest set of attributes that can be used to uniquely identify a tuple (row) in the table.
- Primary Key: A primary key is the candidate key chosen as the unique identifier for each tuple in a table. A table can have only one primary key, and its values can be neither null nor repeated.
- Alternate Key: An alternate key is a candidate key that is not used as the primary key. (Alternate keys = candidate keys minus the primary key.)
- Composite Key: A composite key is a key that is made up of two or more attributes. Composite keys are used when a single attribute is not sufficient to uniquely identify a tuple in a table.
- Surrogate Key: A surrogate key is a unique, system-generated identifier assigned to each record in a database table. It is independent of the actual data and provides stability and simplicity for managing and linking records.
- Foreign Key: A foreign key is a column (or set of columns) in one table that refers to the primary key of another table, establishing a relationship between the two tables.
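The difference between a super key and a candidate key is easier to see on a small sample. The sketch below is plain Python with made-up student records. The is_super_key and candidate_keys helpers are hypothetical names, not part of any library. One caveat: real keys are a design decision about all the data a table may ever hold, so uniqueness in a sample is only a hint.

```python
from itertools import combinations


def is_super_key(rows, attrs):
    """True if no two rows share the same values for `attrs`."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def candidate_keys(rows, columns):
    """Return the minimal attribute sets that are unique across `rows`."""
    found = []
    # Try small sets first so that minimal keys are discovered first.
    for size in range(1, len(columns) + 1):
        for attrs in combinations(columns, size):
            # A set containing an already-found key is a super key,
            # but it is not minimal, so skip it.
            if any(set(key) <= set(attrs) for key in found):
                continue
            if is_super_key(rows, attrs):
                found.append(attrs)
    return found


students = [
    {"roll_no": 1, "email": "a@example.com", "name": "Asha", "branch": "CS"},
    {"roll_no": 2, "email": "b@example.com", "name": "Ben", "branch": "CS"},
    {"roll_no": 3, "email": "c@example.com", "name": "Asha", "branch": "EE"},
    {"roll_no": 4, "email": "d@example.com", "name": "Ben", "branch": "CS"},
]
columns = ["roll_no", "email", "name", "branch"]

keys = candidate_keys(students, columns)
primary_key = keys[0]                                 # pick one candidate key
alternate_keys = [k for k in keys if k != primary_key]
print(keys, primary_key, alternate_keys)
```

Here ("roll_no", "name") is a super key but not a candidate key, because roll_no alone is already unique. Choosing roll_no as the primary key leaves email as the alternate key.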
Cardinality of Relationships
Cardinality in database relationships refers to the number of occurrences of an entity in a relationship with another entity. Cardinality defines the number of instances of one entity that can be associated with a single instance of the related entity.
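- For a One to One relation, 1 table is usually enough.
- For a One to Many relation, we create 2 tables, with a foreign key on the "many" side.
- For a Many to Many relation, we create 3 tables: one for each entity plus a linking table.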
Examples
1. Person -> driving license number (One to One)
2. College branch -> students (One to Many)
3. Restaurants -> orders (One to Many)
4. Restaurants -> menu items (Many to Many)
5. Students -> courses (Many to Many)
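The students and courses example can be sketched as three tables in SQLite, again through Python's sqlite3 module. The table and column names are assumptions for illustration. The linking table, called enrollments here, holds two foreign keys and uses them together as a composite primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE courses  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- Linking table: one row per (student, course) pair.
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL REFERENCES students(student_id),
        course_id  INTEGER NOT NULL REFERENCES courses(course_id),
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO students VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO courses  VALUES (10, 'Databases'), (20, 'Statistics');
    INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 10);
    """
)

# One student can take many courses...
courses_for_asha = [
    row[0]
    for row in conn.execute(
        """
        SELECT c.title
        FROM courses AS c
        JOIN enrollments AS e ON e.course_id = c.course_id
        WHERE e.student_id = 1
        ORDER BY c.title
        """
    )
]

# ...and one course can have many students.
students_in_databases = [
    row[0]
    for row in conn.execute(
        """
        SELECT s.name
        FROM students AS s
        JOIN enrollments AS e ON e.student_id = s.student_id
        WHERE e.course_id = 10
        ORDER BY s.name
        """
    )
]

# The foreign key blocks an enrollment for a student who does not exist.
try:
    conn.execute("INSERT INTO enrollments VALUES (3, 10)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(courses_for_asha, students_in_databases, orphan_rejected)
```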
Drawbacks of Databases
- Complexity: Setting up and maintaining a database can be complex and time-consuming, especially for large and complex systems.
- Cost: The cost of setting up and maintaining a database, including hardware, software, and personnel, can be high.
- Scalability: As the amount of data stored in a database grows, it can become more difficult to manage, leading to performance and scalability issues.
- Data Integrity: Ensuring the accuracy and consistency of data stored in a database can be a challenge, especially when multiple users are updating the data simultaneously.
- Security: Securing a database from unauthorized access and protecting sensitive information can be difficult, especially with the increasing threat of cyber attacks.
- Data Migration: Moving data from one database to another or upgrading to a new database can be a complex and time-consuming process.
- Flexibility: The structure of a database is often rigid and inflexible, making it difficult to adapt to changing requirements or to accommodate new types of data.