Big Data is ubiquitous. It relates to all aspects of our lives. While Big Data is giving us intelligence and offering new capabilities, it also causes grief and concern for many of us. Reflecting these concerns, I have come across memes calling Big Data "Big Brother".
Big Data is a broad topic. Thus, I plan to share my experience by posting several articles in order of importance.
As a technologist, dealing with Big Data is my hobby, passion, and part of my profession. Data analytics and analysis serve as fuel for my metalhead. My passion helps me earn my living in technology, so I am grateful to share my expertise with aspiring data professionals and technology enthusiasts.
To introduce Big Data, I want to start with platforms and cover other aspects in my upcoming posts.
In this article, I introduce Big Data platforms to beginners. Data platforms are critical because every Big Data business solution requires a specific platform. A Big Data platform consists of several layers. These layers perform different functions, but they are interrelated.
Let me briefly introduce these layers with some practical examples.
The first layer of the Big Data platform is the shared operational information zone.
The information zone consists of data types such as:
- data in motion,
- data at rest, and
- data in several other forms.
The information zone also includes:
- legacy data sources,
- new data sources,
- master data hubs,
- reference data hubs, and
- content repositories.
The second layer of the data platform is called processing. This substantial layer includes:
- data ingestion,
- operational information,
- landing area,
- analytics zone,
- real-time analytics,
- integrated warehouse,
- data lakes, and
- data mart zones.
This layer needs a governance model for the metadata catalogue. The model should cover data security, disaster recovery of systems, storage, hosting, and other infrastructure components such as local processing and storage.
The critical infrastructure for Big Data platforms is Cloud Computing and Edge Computing for processing and storage. The IoT (Internet of Things) backbone also relates to this layer.
The third layer of the data platform is the analytics platform.
The analytics platform consists of:
- functions,
- processes, and
- tools.
These functions, processes, and tools can include:
- real-time analytics,
- information planning,
- decision making,
- predictive analytics,
- descriptive analytics,
- prognostic analytics,
- data discovery,
- data visualisations,
- executive dashboard, and
- other analytics features as required in a particular Big Data solution.
This layer is also comprehensive and involves many practitioners such as data architects, data scientists, data specialists, implementers, and administrators.
In addition, substantial input may be required from business stakeholders such as executive decision-makers, the CDO (Chief Data Officer), the CMO (Chief Marketing Officer), and even the CFO (Chief Financial Officer).
The fourth layer of the data platform consists of outputs such as:
- business processes,
- decision-making schemes, and
- points of interaction.
This layer of the data platform must be well-governed. Access needs to be provided with established controls for the data platform professionals such as data scientists, data architects, analytics experts, and business users.
After introducing these essential layers, I want to highlight a critical point: level of schema.
Level of schema
The level of schema for the data platform is a crucial architectural and design consideration. We can classify the schema level under three categories:
- no schema,
- partially structured schema, and
- fully structured schema.
Schema reflects the structure of data and databases. We can think of a schema as a blueprint for data management.
Some examples of no schema are:
- video files,
- audio files,
- picture files, and
- social media feeds.
Some examples of the partial schema are:
- instant messaging logs,
- system logs, and
- call centre logs.
Some examples of the fully structured schema are:
- structured sensor data, and
- relational transaction data.
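To make the three schema levels more concrete, here is a minimal Python sketch that maps the example sources above to a schema level. The names SchemaLevel, EXAMPLE_SOURCES, schema_level, and sources_with are my own illustrative assumptions, not part of any particular product or library.

```python
from enum import Enum


class SchemaLevel(Enum):
    """The three schema categories described in this article."""
    NONE = "no schema"
    PARTIAL = "partially structured schema"
    FULL = "fully structured schema"


# Hypothetical lookup table built only from the examples listed above.
EXAMPLE_SOURCES = {
    "video files": SchemaLevel.NONE,
    "audio files": SchemaLevel.NONE,
    "picture files": SchemaLevel.NONE,
    "social media feed": SchemaLevel.NONE,
    "instant messaging logs": SchemaLevel.PARTIAL,
    "system logs": SchemaLevel.PARTIAL,
    "call centre logs": SchemaLevel.PARTIAL,
    "structured sensor data": SchemaLevel.FULL,
    "relational transaction data": SchemaLevel.FULL,
}


def schema_level(source: str) -> SchemaLevel:
    """Return the schema level of a known example source (case-insensitive)."""
    return EXAMPLE_SOURCES[source.strip().lower()]


def sources_with(level: SchemaLevel) -> list:
    """Return the example sources at a given schema level, sorted by name."""
    return sorted(name for name, lvl in EXAMPLE_SOURCES.items() if lvl is level)
```

In a real platform, the schema level would usually be recorded in the metadata catalogue rather than hard-coded like this; the table here only illustrates the classification.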
Another critical point related to platforms is data processing levels.
Data processing levels
Data processing levels are another architectural consideration.
The processing levels could be:
- raw data,
- validated data,
- transformed data, and
- calculated data.
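As a simple illustration, the four processing levels can be sketched as a small pipeline. This is only a sketch under my own assumptions: the temperature readings, the Celsius-to-Fahrenheit conversion, and the function names are hypothetical examples, not a prescribed implementation.

```python
from statistics import mean


def validate(raw_readings):
    """Raw -> validated: keep only the entries that parse as numbers."""
    validated = []
    for item in raw_readings:
        try:
            validated.append(float(item))
        except (TypeError, ValueError):
            # Skip entries such as "n/a" or None.
            continue
    return validated


def transform(celsius_values):
    """Validated -> transformed: convert Celsius to Fahrenheit."""
    return [c * 9 / 5 + 32 for c in celsius_values]


def calculate(values):
    """Transformed -> calculated: derive a summary figure (the average)."""
    return mean(values)


# Hypothetical raw sensor readings, including two unusable entries.
raw = ["21.5", "22.0", "n/a", None, "23.5"]

validated = validate(raw)
transformed = transform(validated)
calculated = calculate(transformed)
```

Each step consumes the output of the previous level, which is why many platforms store the levels separately: the raw data stays available if a validation or transformation rule changes later.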
Another structural classification of data in data platforms relates to business relevance.
We can categorise the business relevance of data as:
- external data,
- personal data,
- departmental data, and
- enterprise data.
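One way to picture this classification is to tag each dataset in a catalogue with its business relevance and then group the datasets by tag. The sketch below assumes a made-up catalogue; the dataset names and the group_by_relevance helper are hypothetical.

```python
from collections import defaultdict
from enum import Enum


class BusinessRelevance(Enum):
    """The four business relevance categories described in this article."""
    EXTERNAL = "external data"
    PERSONAL = "personal data"
    DEPARTMENTAL = "departmental data"
    ENTERPRISE = "enterprise data"


# Hypothetical catalogue entries, invented purely for illustration.
CATALOGUE = {
    "public_weather_feed": BusinessRelevance.EXTERNAL,
    "analyst_scratch_notes": BusinessRelevance.PERSONAL,
    "marketing_campaign_results": BusinessRelevance.DEPARTMENTAL,
    "sales_team_forecasts": BusinessRelevance.DEPARTMENTAL,
    "customer_master_records": BusinessRelevance.ENTERPRISE,
}


def group_by_relevance(catalogue):
    """Group dataset names by their business relevance tag."""
    groups = defaultdict(list)
    for name, relevance in catalogue.items():
        groups[relevance].append(name)
    return {relevance: sorted(names) for relevance, names in groups.items()}
```

Grouping datasets this way can help when deciding which governance and access rules apply to each category.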
Understanding the functions and components of a Big Data platform can be useful for all stakeholders of a Big Data solution in business organisations. While business executives such as the CIO, CISO, CDO, CMO, and CFO need to understand these layers at a high level, data architects, data scientists, data specialists, implementers, testers, and administrators need to understand them at a more detailed level.
I hope this brief introduction gives you a useful overview of Big Data platforms.
Thank you for reading my perspectives.
If you enjoyed this story, you may check my other technology articles on News Break.
Reference: Architecting Big Data & Analytics Solutions by Dr Mehmet Yildiz.