Data mining is a BILLION dollar industry.

Data science and programming have made it possible.

Here’s how it works:
We each create thousands of data points every day.

Expand this on a global scale:

That's petabytes or even exabytes of data!
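
To get a feel for the scale, here is a rough back-of-the-envelope sketch. Every input (number of users, data points per person, bytes per data point) is an assumption picked for illustration, not a measured figure:

```python
# All three inputs are illustrative assumptions, not measured figures.
USERS = 5_000_000_000            # assumed number of people online
POINTS_PER_USER_PER_DAY = 5_000  # "thousands of data points every day"
BYTES_PER_POINT = 1_000          # assumed average size of one data point

PETABYTE = 10**15
EXABYTE = 10**18


def daily_volume_bytes(users, points_per_user, bytes_per_point):
    """Total bytes generated per day under the assumptions above."""
    return users * points_per_user * bytes_per_point


daily_bytes = daily_volume_bytes(USERS, POINTS_PER_USER_PER_DAY, BYTES_PER_POINT)
daily_petabytes = daily_bytes / PETABYTE
yearly_exabytes = daily_bytes * 365 / EXABYTE

print(f"{daily_petabytes:.0f} PB per day, {yearly_exabytes:.3f} EB per year")
```

Under these made-up inputs the total lands in the petabytes-per-day range and crosses into exabytes within a year. Change any input and the result scales linearly.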

Complex systems have been built to store and manage this data efficiently.
First, data is collected from the following sources:

🔸 User interaction: Clicks, swipes, words typed, and even time spent on an area or page.

🔸 Server logs: Details about each request, such as the URL and request body.

🔸 Database records: Platform-based data such as transactions made on an e-commerce store.

🔸 Tracking pixels: Embedded pixels that are invisible to the user and communicate with an external server. Typically used to track the effectiveness of ad campaigns.
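
A minimal sketch of how two of these sources could feed one common event shape. The `Event` fields, the `uid` and `event` query parameters, and the `ads.example.com` URL are all hypothetical and not taken from any real ad platform:

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlparse


@dataclass
class Event:
    """Hypothetical common shape for collected data points."""
    source: str
    user_id: str
    action: str


def event_from_pixel_url(url: str) -> Event:
    """Read the query string of a tracking-pixel request into an Event."""
    query = parse_qs(urlparse(url).query)
    return Event(
        source="tracking_pixel",
        user_id=query.get("uid", ["unknown"])[0],   # assumed parameter name
        action=query.get("event", ["view"])[0],     # assumed parameter name
    )


def event_from_click(user_id: str, element: str) -> Event:
    """Wrap a user interaction (a click) in the same Event shape."""
    return Event(source="user_interaction", user_id=user_id, action=f"click:{element}")


pixel_event = event_from_pixel_url(
    "https://ads.example.com/pixel.gif?uid=u123&event=ad_view&campaign=spring"
)
click_event = event_from_click("u123", "buy_button")
print(pixel_event)
print(click_event)
```

The pixel itself is just an image request. The useful part is the query string that rides along with it.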
Once collected, data is processed using real-time processing systems or batch processing systems.

Here, data is cleaned, transformed into the required format, and validated to ensure consistency and correctness.
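
The clean, transform, validate steps could look roughly like this for a small batch. The field names (`user_id`, `event`, `ts`) and the `ALLOWED_EVENTS` set are assumptions made for the example:

```python
from datetime import datetime, timezone

ALLOWED_EVENTS = {"click", "swipe", "purchase"}  # assumed set of valid event types


def clean(record):
    """Strip stray whitespace from keys and string values."""
    return {
        key.strip(): value.strip() if isinstance(value, str) else value
        for key, value in record.items()
    }


def transform(record):
    """Normalize into the required format: lowercase event, ISO 8601 UTC timestamp."""
    return {
        "user_id": record["user_id"],
        "event": record["event"].lower(),
        "timestamp": datetime.fromtimestamp(int(record["ts"]), tz=timezone.utc).isoformat(),
    }


def validate(record):
    """Keep only records with a user id and a known event type."""
    return bool(record["user_id"]) and record["event"] in ALLOWED_EVENTS


def process_batch(raw_records):
    processed = []
    for raw in raw_records:
        try:
            record = transform(clean(raw))
        except (KeyError, ValueError):
            continue  # missing field or unparseable timestamp: drop the record
        if validate(record):
            processed.append(record)
    return processed


raw_batch = [
    {"user_id": " u1 ", "event": "CLICK", "ts": "1700000000"},
    {"user_id": "u2", "event": "click"},                   # missing timestamp
    {"user_id": "", "event": "click", "ts": "1700000001"},  # empty user id
    {"user_id": "u3", "event": "teleport", "ts": "1700000002"},  # unknown event
]
processed_batch = process_batch(raw_batch)
print(processed_batch)
```

A real-time system would run the same three steps per event as it arrives instead of looping over a batch.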

From there, data is stored using the following options:
🔸 Distributed file systems: storing large amounts of data across multiple nodes (e.g. Hadoop Distributed File System (HDFS)).

🔸 Columnar storage: stores data by column rather than by row. Ideal for read-heavy analytical workloads (e.g. the Apache Parquet file format).

🔸 Data warehouses: centralized storage for large data that is optimized for analytical querying.
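
To make the row vs. column idea concrete, here is a toy sketch in plain Python. It only illustrates the layout, not how Parquet or any real engine works internally, and the order data is invented:

```python
# Row-oriented layout: one dict per record (invented sample data).
rows = [
    {"order_id": 1, "country": "US", "amount": 30.0},
    {"order_id": 2, "country": "DE", "amount": 12.5},
    {"order_id": 3, "country": "US", "amount": 7.5},
]


def to_columns(rows):
    """Pivot rows into one list per column (assumes every row has the same keys)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every whole record is visited just to read one field.
total_from_rows = sum(row["amount"] for row in rows)

# Column layout: only the "amount" column is read.
total_from_columns = sum(columns["amount"])

print(columns["amount"], total_from_columns)
```

An analytical query like "total revenue" only needs one column, which is why the columnar layout suits read-heavy workloads.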

Once stored, the data can be used in the following ways:

🔸 Personalization & targeted ads: delivering ads & messaging based on what our data + ML have indicated is most effective.
🔸 Business intelligence: companies use BI tools to analyze & generate reports on key insights such as market trends or customer behavior.

🔸 Predictive analytics: machine learning is used to make predictions about the future behavior of a single consumer or cohort.
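
A tiny sketch of the first two uses. Note the swap: instead of a trained ML model, it uses a simple frequency count as a stand-in, and the event data is invented:

```python
from collections import Counter, defaultdict

# Invented (user_id, product_category) view events.
events = [
    ("u1", "shoes"),
    ("u1", "shoes"),
    ("u1", "books"),
    ("u2", "books"),
]


def category_counts_by_user(events):
    """Count how often each user viewed each category."""
    counts = defaultdict(Counter)
    for user_id, category in events:
        counts[user_id][category] += 1
    return counts


def top_category(counts, user_id):
    """Personalization stand-in: the category a user viewed most, or None if unseen."""
    user_counts = counts.get(user_id)
    if not user_counts:
        return None
    return user_counts.most_common(1)[0][0]


def category_report(events):
    """BI-style report: total views per category across all users."""
    return Counter(category for _, category in events)


counts = category_counts_by_user(events)
report = category_report(events)
print(top_category(counts, "u1"))
print(report.most_common())
```

A production system would replace `top_category` with a model trained on far richer signals, but the input (per-user behavior) and the output (what to show next) have the same shape.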
Companies understand the power of knowing their audience.

The data points we produce, combined with ML and data science, have made it easy for companies to determine the products and messaging we gravitate to the most.
Want more engineering insights like this?

Subscribe to our free newsletter for a weekly roundup of all our best content:

levelupcoding.co/
If this thread was helpful to you:

1. Follow @NikkiSiapno for more valuable programming and tech content

2. Like & Retweet the tweet below to share it with others👇
