Use app×
Join Bloom Tuition
One on One Online Tuition
JEE MAIN 2025 Foundation Course
NEET 2025 Foundation Course
CLASS 12 FOUNDATION COURSE
CLASS 10 FOUNDATION COURSE
CLASS 9 FOUNDATION COURSE
CLASS 8 FOUNDATION COURSE
0 votes
91 views
in Computer by (44.8k points)
closed by

Overview of Data Merging.

1 Answer

+1 vote
by (44.9k points)
selected by
 
Best answer

In Data Science, data merging is the process of combining two or more data sets into a single data frame. This process is necessary when we have raw data stored in multiple files or data tables, that we want to analyze all in one go.

However, while merging the data from different sources there are many issues that occur that require corrections for successful data merging. Different data sources will always have different naming conventions than the main data source. They may have different ways of grouping the data and so on. Many times, it happens that the additional data source happens to be created at a very different time by different people with a different objective and use-cases. Owing to all these factors, it should not sound strange if there is a lot of difference between multiple data sources.

In this topic, we will explore various ways of simplifying the process of data merging. There are many places where these data merging techniques will help you. For example, if you have two different systems that operate in parallel with each other. Suppose that you have to perform some analysis of the relationship where you are having a legacy system with a very poorly formatted data that you are willing to integrate with your new system. This is where data merging comes into the picture. Let us now dive deep into data merging techniques.

We can perform data merging by implementing data joins on the databases in frame. 

There are three categories of data joins:

1. One to One Joins 

2. One to Many Joins 

3. Many to Many Joins

One to One Joins

One to one join is probably one of the simplest join techniques. In this type of join, each row in one table is linked to a single row in another table using a “key” column. 

For example, in a company database, each employee has only one Employee ID, and each Employee ID is assigned to only one employee. 

In the database, a one to one relationship looks like this:

Employee Table

Employee Table

In this example, the “key” field in each table is “Employee ID”. This “key” field is designed to contain unique values. In the Employee table, the Employee ID field is the primary key, in the Contact Info table, the Employee ID field is a foreign key.

The one to one relationship returns the related records when the value in the Employee ID field in the Contact Info table is the same as the Employee ID field in the Employees table.

This is how one to one join works, by merging the data tables using this primary key.

One to Many Joins

In a one to many join, one record in a table can be related to one or many records in another table. For example, each student can have multiple books by school library.

In the database, a one to many relationships looks like this:

Library Database

Library Database

In this example, the primary key field in the Students table, Student ID, is designed to contain the unique values. The foreign key field in the Library table, Student ID, is designed to allow multiple instances of the same value.

The one to many relationships returns the related records when the value in the Student ID field in the Library Table is the same as the value in the Student ID field in the Students table.

This is one to many join works, by merging databases using primary key which demonstrates one to many relationships.

One to many Relationship

One to many Relationship

Many to Many Joins

A many to many relationships are said to occur when multiple records in one table are related to multiple records of other table. For example, a many to many relationships exists between students and courses. A student can register for multiple courses. A course can have multiple students.

It is not easy to perform join on tables which have many to many relationships. As a workaround, to perform a join, you can break a many to many relationships into two one to many relationships by using a third table which is called as a join table. Every record in a join table contains a match field that contains the value of the primary keys of two tables that it joins. In a join table, usually these match fields are called as foreign keys. These foreign keys are populated with the data as records in the join table are created from either table that it joins.

The below table demonstrates the Student table, which contains a record for every student. It also contains a Courses table, which contains a record for each course. A join table called Enrollments creates two one to many relationships, the one between each of the two tables.

Students and Course Database

Students and Course Database

The primary key Student ID is a unique identifier for every student in Students table. The primary key Course ID is a unique identifier for every course in the Courses table. The Enrollments table carries the foreign keys Student ID and Course ID.

To set up a join table for many to many relationships,

1. Using the above example, you can create a table called Enrollments. This will act as a join table.

2. In the Enrollments table, make a Student ID field and a Course ID field. 

3. Make a relationship between the two Student ID fields in the tables. Later, make a relationship between two Course ID field in the tables.

We can use this design, if a student registers for four courses, we can ensure that the student has only one record in the Students table and four records in the Enrollments table, one for each course student is enrolled in.

Related questions

0 votes
1 answer
0 votes
1 answer
0 votes
1 answer
asked Nov 11, 2023 in Computer by AshaUsapkar (44.8k points)

Welcome to Sarthaks eConnect: A unique platform where students can interact with teachers/experts/students to get solutions to their queries. Students (upto class 10+2) preparing for All Government Exams, CBSE Board Exam, ICSE Board Exam, State Board Exam, JEE (Mains+Advance) and NEET can ask questions from any subject and get quick answers by subject teachers/ experts/mentors/students.

Categories

...