Week 12-14 (Analysis 101 - #3 Data Analysis & Extraction) : Newbie swotting as a Business Analyst
Hello friends,
Welcome back to my #SwottingAsABusinessAnalyst series. Read my previous posts, Week 12-14 Part 2 and Week 11, here.
If you're new here, explore the initial post of my journey 'Swotting' as a budding Business Analyst.
During Weeks 12 to 14, I had intermittent online engagements and returned to wrap up our new learning topic. The highlight of these weeks was our introduction to Analysis 101!
Training Objectives:
We aimed to achieve the following by the end of the training:
Perform simple data analysis using Excel.
Learn basic query extraction/manipulation using SQL, Excel formulas, etc.
Develop an understanding of pseudocode.
This topic is divided into 3 sections, which I am splitting into 3 different blog posts (it’s too long!) :
Making sense of data (Read here)
Pseudocode (Read here)
Data Analysis & Data Extraction
#3 Data Analysis & Data Extraction
3.1 Data Analysis :
Reference (Further reads) : Mastering the Art of Data Analysis: A Six-Step Guide
Data analysis involves cleaning, analyzing, and visualizing data to derive insights for informed decision-making. We use tools like Excel and SQL to extract, manipulate, and analyze data stored in platforms like AWS. SQL constructs help us retrieve specific data from databases, facilitating effective analysis.
Types of Statistics
Descriptive Statistics : Describe what is going on in a data set. Results cannot be generalised to any other group. e.g. "I have a 3.45 GPA" describes my own record only.
Inferential Statistics : Infer / predict trends about a larger population based on a sample. e.g. A pollster samples a random selection of a few thousand voters to predict an election. e.g. Using past sales to forecast future demand.
Another way to look at how data is collected and interpreted is through population and sample :
Population - Represents ALL possible data points and measurements. Surveying an entire population is usually impractical due to time and cost. For example, polling every single voter in Malaysia to see who they will vote for in the next election.
Sample - A portion of the population that represents its characteristics. We often work with samples because they are faster and cheaper, even though they are less accurate. For example, randomly polling 3,000 voters on their voting preference to predict the next election. Note: there is always a margin of error (~2% in this example), as inferential statistics are estimates.
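To make the population-vs-sample idea concrete, here is a small sketch of my own (the voters and numbers are invented, not from the training): it compares the exact average measured over a whole toy population with the estimate you get from a random sample.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Toy "population": support scores for 10,000 hypothetical voters (0 = no, 1 = yes)
population = [random.choice([0, 1]) for _ in range(10_000)]

# Descriptive statistics: measure the WHOLE population (exact, but costly in real life)
population_support = sum(population) / len(population)

# Inferential statistics: estimate from a random sample of 300 (cheap, but approximate)
sample = random.sample(population, 300)
sample_support = sum(sample) / len(sample)

print(f"Population support: {population_support:.3f}")
print(f"Sample estimate:    {sample_support:.3f}")
print(f"Sampling error:     {abs(population_support - sample_support):.3f}")
```

The sample estimate lands close to the true population figure, but not exactly on it - that gap is the margin of error the pollster example refers to.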
3.2 Data Extraction
Data extraction is the process of retrieving specific information from a database or dataset for analysis or reporting purposes.
Tools used for data extraction :
Excel
SQL
3.2.1 Excel is ideal for basic data analysis and visualization, while SQL excels in managing and querying large databases with precision and speed.
Common Excel functions a BA needs to know are pivot tables, sorting & filtering, and VLOOKUP - all used to turn raw data into meaningful insights in Excel.
3.2.2 SQL, which stands for Structured Query Language, is a tool used in data analysis to interact with relational databases. It provides a standardized way to query, manipulate, and manage data stored in databases.
In data analysis, SQL is commonly used to:
Retrieve Data: SQL queries can fetch specific data from one or multiple tables in a database. Analysts can specify conditions, sorting, and grouping criteria to tailor the results to their needs.
Filter and Transform Data: SQL allows analysts to filter data based on certain conditions, perform calculations, and transform data using functions and operations like aggregations, joins, and subqueries.
Aggregate Data: Analysts can use SQL to aggregate data by summarizing information, calculating totals, averages, counts, and other statistical measures across groups of data.
Join Data: SQL supports various types of joins to combine data from multiple tables based on related columns, enabling analysts to merge datasets and perform analysis across different sources.
Create and Modify Tables: Analysts can use SQL to create, modify, and manage database tables, including defining their structure (schema), adding or updating records, and altering table properties.
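As a hedged sketch of what these operations look like in practice, here is a self-contained example of mine (the table names and data are invented for illustration) using SQLite, a lightweight SQL database built into Python, to join two toy tables and aggregate the result:

```python
import sqlite3

# In-memory SQLite database with two invented tables: customers and orders
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Aina"), (2, "Ben"), (3, "Mei")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0)])

# One query combining Retrieve, Join, Aggregate, and Sort:
# total spend per customer, highest first
cur.execute("""
    SELECT c.name, COUNT(o.id) AS order_count, SUM(o.amount) AS total_spend
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total_spend DESC
""")
for row in cur.fetchall():
    print(row)
```

The LEFT JOIN keeps Mei even though she has no orders (her total comes back as NULL), which is exactly the kind of detail a BA has to watch for when merging datasets.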
One platform that hosts SQL-queryable databases is Amazon Web Services (AWS).
AWS (Amazon Web Services) is a cloud computing platform that offers a wide range of services, including computing power, storage, and databases, over the internet. It provides businesses with the flexibility to scale resources up or down based on demand, pay only for what they use, and access a variety of tools and applications to build and manage their IT infrastructure. In essence, AWS enables organizations to offload their IT operations to the cloud, reducing costs, improving scalability, and increasing agility.
As BAs, we use AWS to extract data that is stored in the platform. To do that, we use SQL to retrieve, manipulate and analyse the data for the meaningful insights we need.
A simple analogy (from a newbie like me): AWS is a storage room where thousands of Excel sheets are kept, and SQL is the tool a BA uses to tell the storage which sheets and rows to hand over (according to our requirements and specified needs).
To do this, we use SQL constructs to extract that data for us.
Here's a breakdown of the SQL constructs:
SELECT: Retrieves data from a database table. Example:
SELECT column1, column2 FROM table_name;
SELECT TOP: Retrieves the specified number of rows from the beginning of the result set. (Note: TOP is SQL Server syntax; many other databases, such as MySQL and SQLite, use LIMIT instead.) Example:
SELECT TOP 5 * FROM table_name;
NULL: Represents a missing or unknown value in a database. Example:
SELECT column1 FROM table_name WHERE column2 IS NULL;
COUNT: Returns the number of rows that match a specified condition. Example:
SELECT COUNT(*) FROM table_name WHERE column1 > 10;
ORDER BY: Sorts the result set in ascending or descending order based on specified columns. Example:
SELECT * FROM table_name ORDER BY column1 ASC;
MIN: Returns the minimum value from a set of values. Example:
SELECT MIN(column1) FROM table_name;
MAX: Returns the maximum value from a set of values. Example:
SELECT MAX(column1) FROM table_name;
AVG: Calculates the average value of a set of values. Example:
SELECT AVG(column1) FROM table_name;
SUM: Calculates the sum of a set of values. Example:
SELECT SUM(column1) FROM table_name;
CASE: Provides conditional logic within a SQL statement. Example:
SELECT CASE WHEN column1 > 10 THEN 'High' WHEN column1 > 5 THEN 'Medium' ELSE 'Low' END AS Priority FROM table_name;
WHERE: Filters rows based on specified conditions. Example:
SELECT * FROM table_name WHERE column1 = 'value';
LIKE: Searches for a specified pattern in a column. Example:
SELECT * FROM table_name WHERE column1 LIKE 'ABC%';
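To tie the constructs above together, here is a minimal sketch (the tickets table and its data are invented) that runs several of them - WHERE, CASE, ORDER BY, and COUNT - in one place, again using SQLite from Python. SQLite uses LIMIT where SQL Server would use TOP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, title TEXT, severity INTEGER)")
cur.executemany("INSERT INTO tickets VALUES (?, ?, ?)",
                [(1, "Login fails", 12), (2, "Typo on page", 2),
                 (3, "Slow report", 7), (4, "Crash on save", 15)])

# WHERE filters rows, CASE labels them, ORDER BY sorts, LIMIT caps the result
cur.execute("""
    SELECT title,
           CASE WHEN severity > 10 THEN 'High'
                WHEN severity > 5  THEN 'Medium'
                ELSE 'Low' END AS priority
    FROM tickets
    WHERE severity IS NOT NULL
    ORDER BY severity DESC
    LIMIT 3
""")
print(cur.fetchall())

# COUNT summarises how many rows match a condition
cur.execute("SELECT COUNT(*) FROM tickets WHERE severity > 10")
print(cur.fetchone()[0])
```

Reading one combined query like this, clause by clause, is how the individual constructs start to feel like a single language rather than a list of keywords.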
Learn how to write SQL constructs : SELECT queries 101
3.3 Data Visualisation
Once data is extracted and cleaned, we visualize it using charts like bar charts, line graphs, and pie charts to convey meaningful insights.
5 common ways to display data are as follows :
Bar chart
Line graph
Pie chart
Histogram
Scatter plot
As the training ended, we went away to do further self-learning, such as the following :
Decode the Pseudocode
Real-life example analysis - turning data into meaningful insights in an Excel sheet
Practice how to use SQL to extract data on AWS (Classic)
In conclusion, Weeks 12-14 provided an intense yet exciting learning experience. We delved into decoding pseudocode, analyzed real-life examples, and practiced data extraction using SQL on AWS. Remember, "Everything is workable."
Found my content helpful? Consider tipping me a coffee via Ko-fi!
You can also explore my other blog posts on Agile Delivery, Project Solutions, Agile Mindset and Career Bits on WorkWizard. Leave me a comment; I would love to hear if we are on the same wavelength!
Read this article on Medium : https://medium.com/@wawahalim