Tuesday, September 10, 2024

Report Deforestation

I made a Report for Deforestation accross regions, countries and the world.

I analyzed a database using SQL.  

This is the Report:

Report Deforestation

And this is the SQL Code if you want to take a look:

SQL Code

SQL Subqueries

I analyzed a Database for answering questions of Deforestation.

There are 3 tables, forest_area, land_area and regions.

forest_area gives info about forest area for each country.

land_area is the land_area for each country.

and regions show which region each country belongs to.

forest_area and land_area have info for 1990 and 2016 years for each country.


I made a view of all the tables joined to figure out its structure:


This is the result:


That view is used by many queries so it is good to have it as another table.


I made a Subquery,

I first calculated the percentage of forest area relative to land area for all countries in 2016:

Then I grouped them by percent quartiles:

Counting the ocurrences on each group.


This is the result:



Saturday, August 10, 2024

CRUD Operations

I made CRUD Operations on the database to proof that it works correctly:

The most important Query this database was made for is getting employee history:

Q6: Write a Query that returns current and past jobs (include employee name, job title, department, manager name, start and end date for position) for employee Toni Lembeck.




Q5: How many employees are in each department?




Q1: Return a list of employees with Job Titles and Department Names:





Q2: Insert Web Programmer as a new job title:




ETL with PostgreSQL

I made data ingestion with ETL. I passed the data from staging table into its corresponding tables normalized to 3NF.

This is the Excel unnormalized table:


And this is the Physical ERD normalized to 3NF:




I started with salary table, I inserted unique values with the select distinct code:


Education table is the same.

Location table I needed to group by Address, Location Name, City and State, since these are company locations:


With Employee I grouped by EMP_ID, Employee Name and Email, this removes data redundancy for employees in the original table and keeps data integrity:


For Employee History I needed to join all the other tables with staging table so I could add the foreign keys, 

With a join all the asignations are ordered based on the on statement so I put the staging table first so that all the data could be visible.

I made a Left Join with managers because not all employees have managers, so I need to insert empties in that space:


And that's it.

Thursday, August 8, 2024

DDL in PostgreSQL

I made the DDL for the HR Database using the Physical ERD normalized to 3NF.

I used PostgreSQL.


I created the employee_history table with the CREATE TABLE command, I added primary key and all the atrributes:


Then I added constraints to the table for adding the foreing keys with ALTER TABLE:



Database Normalization

I normalized a Database to the 3rth Normal Form. 

I was given a dataset in an Excel spreadsheet, that was unnormalized:

For First Normal Form I made sure there were no repetitive groupings in the data.

and no repetitive data within each cell.


For Second Normal Form I made sure that All columns depend on the Unique ID.


For Third Normal Form I made that there were no transitive dependencies in the data:


The conceptual ERD Looks like this:



For normalization I needed to remove repetitions in the data, so the most important part is that the employees have data history in the information I received, so one important Query that I needed to be made is:

  • Write a Query that returns current and past jobs (including employee name, job title, department, manager name, start and end date for position) for employee any employee,

So i needed an employee_history table that could keep the information I needed for this query:




The logical ERD keeps a table for employee, normalizing its data, there is another for company locations, Education and Salaries.

This design achieves the query that I want and also keeps data integrity in the employee data.


The manager can also be grabbed from the employee table as a second connection to employee_history.

The data types for this Physical ERD are for PostgreSQL.