Databricks Fundamentals Flashcards by Dorbens Jean-Pierre (2024)

1

Q

Open Data Lake

A

Also known as a data lakehouse. The Databricks Data Intelligence Platform is built on it; it is the first (base) layer of the pyramid.

  • Data Ingestion and storage
  • Data processing and support for continuous data engineering
  • Data Access and Consumption
  • Data Governance – Discoverability, Security, and Compliance
  • Infrastructure and operations
  • All Raw Data
    (Logs, Texts, Audio, Video, and Images)

2

Q

Delta Lake

A

-Unified Data Storage for reliability and sharing

  • An open-source, file-based storage format

2nd layer of the pyramid (after Open Data Lake); listed first under the Data Intelligence Engine

  • Data layout is automatically optimized based on usage patterns
  • ACID transaction guarantees
  • Scalable data and metadata handling
  • Audit history and time travel
  • Unified streaming and batch processing
  • Schema enforcement and schema evolution

-Features: Predictive I/O, Predictive Optimization, Liquid Clustering
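Delta Lake records each commit as a new table version in a transaction log. Real Delta tables keep this log as files in a `_delta_log` directory. The log is what makes audit history, time travel, and schema enforcement possible. The toy model below illustrates the idea in plain Python. It is a conceptual sketch, not the Delta Lake API. The class `ToyDeltaTable` and its methods are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeltaTable:
    """Hypothetical in-memory versioned table; NOT the Delta Lake API."""

    schema: dict  # column name -> expected Python type
    log: list = field(default_factory=list)  # one entry per committed version

    def append(self, rows):
        # Schema enforcement: validate every row before anything is committed.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match {sorted(self.schema)}")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise TypeError(f"column {col!r} expects {typ.__name__}")
        # Commit: only a fully validated batch becomes a new version.
        self.log.append({"op": "append", "rows": list(rows)})
        return len(self.log) - 1  # version number of this commit

    def read(self, version=None):
        # Time travel: replay the log up to and including the requested version.
        last = len(self.log) - 1 if version is None else version
        rows = []
        for entry in self.log[: last + 1]:
            rows.extend(entry["rows"])
        return rows

    def history(self):
        # Audit history: (version, operation, row count) for every commit.
        return [(v, e["op"], len(e["rows"])) for v, e in enumerate(self.log)]


table = ToyDeltaTable(schema={"id": int, "event": str})
table.append([{"id": 1, "event": "login"}])   # version 0
table.append([{"id": 2, "event": "logout"}])  # version 1
try:
    table.append([{"id": "3", "event": "click"}])  # wrong type for "id"
except TypeError as err:
    print("rejected:", err)  # rejected: column 'id' expects int

print(table.read(version=0))  # [{'id': 1, 'event': 'login'}]
print(len(table.read()))      # 2
print(table.history())        # [(0, 'append', 1), (1, 'append', 1)]
```

The rejected batch leaves no new version behind. This is the toy analogue of a transaction that never commits.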

3

Q

Unity Catalog

A

Unified security, governance, and cataloging

  • Context-aware search, auto-describe tables and columns, automated lineage, end-to-end observability and monitoring, sharing AI models

3rd layer of the lakehouse pyramid (after Delta Lake)

  • Securely get insights in natural language
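Unity Catalog captures lineage automatically from the queries run against its tables. Tables are addressed by a three-level name, `catalog.schema.table`. The sketch below shows what lineage lets you answer, such as "which tables does this one depend on?". It models lineage as a plain Python graph and walks it breadth-first. The table names and the `upstream` helper are hypothetical, and this is not the Unity Catalog API.

```python
from collections import deque

# Hypothetical lineage edges: each table maps to the tables it was built from.
LINEAGE = {
    "main.silver.orders_clean": ["main.bronze.orders_raw"],
    "main.silver.customers_clean": ["main.bronze.customers_raw"],
    "main.gold.revenue_by_customer": [
        "main.silver.orders_clean",
        "main.silver.customers_clean",
    ],
}


def upstream(table):
    """Return every table that `table` depends on, directly or indirectly."""
    seen, queue = set(), deque(LINEAGE.get(table, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # the `seen` set also guards against cycles
            seen.add(parent)
            queue.extend(LINEAGE.get(parent, []))
    return seen


print(sorted(upstream("main.gold.revenue_by_customer")))
# ['main.bronze.customers_raw', 'main.bronze.orders_raw',
#  'main.silver.customers_clean', 'main.silver.orders_clean']
```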

4

Q

Data Intelligence Engine

A

Uses generative AI to understand the semantics of your data

  1. Delta Lake
  2. Unity Catalog

5

Q

ACID Transaction

A

  • Atomicity: A transaction is treated as a single atomic unit. All steps that make up the transaction must succeed or the entire transaction rolls back. If they all succeed, the changes made by the transaction are permanently committed to the database. Consider a transaction that transfers $200 from a savings account to a checking account. For the transaction to be committed, the $200 must be successfully deducted from the savings account and added to the checking account, and the balances of both accounts must be verified. If any of these steps fails, all changes roll back and none are committed.
  • Consistency: A transaction must preserve the consistency of the underlying data. The transaction should make no changes that violate the rules or constraints placed on the data. For instance, a database supporting banking transactions might include a rule stating that a customer’s account balance can never be negative. If a transaction attempts to withdraw more money from an account than is available, the transaction will fail, and any changes made to the data will roll back.
  • Isolation: A transaction is isolated from all other transactions. Concurrent transactions must not interfere with each other; the outcome must be the same as if they had run one after another. Returning to the transfer example, suppose another transaction tried to withdraw funds from the same savings account at the same time. Isolation would prevent it from acting on a balance that the first transaction is still changing. Without isolation, the second transaction might withdraw more funds than remain in the account after the first transaction completes.
  • Durability: A transaction that is committed is guaranteed to remain committed – that is, all changes are made permanent and will not be lost if an event such as a power failure should occur. This typically means persisting the changes to nonvolatile storage. If durability were not guaranteed, it would be possible for some or all changes to be lost, affecting the data’s reliability.
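The $200 transfer example can be reproduced with Python's built-in `sqlite3` module, which supports transactions and `CHECK` constraints. This is a generic illustration of atomicity and consistency, not Databricks or Delta Lake code. The `accounts` table, the `transfer` function, and the starting balances are hypothetical. `CHECK (balance >= 0)` plays the role of the "balance can never be negative" rule. Using the connection as a context manager commits on success and rolls back if any statement raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
with conn:  # commit the starting balances
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("savings", 500), ("checking", 100)]
    )


def transfer(amount):
    """Move `amount` from savings to checking as one all-or-nothing transaction."""
    try:
        with conn:  # commit if both updates succeed, roll back if either raises
            # Credit first, so a failed debit shows the credit being undone.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = 'checking'",
                (amount,),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = 'savings'",
                (amount,),
            )
        return True
    except sqlite3.IntegrityError:  # CHECK (balance >= 0) was violated
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer(200), balances())   # True {'checking': 300, 'savings': 300}
print(transfer(1000), balances())  # False {'checking': 300, 'savings': 300}
```

In the second call, the credit to checking succeeds but the debit would make savings negative. The whole transaction rolls back, and checking returns to 300.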

6

Q

Elements of Data Governance

A

  1. Data cataloging
  2. Data Classification
  3. Auditing data entitlements and access
  4. Data discovery
  5. Data sharing and collaboration
  6. Data Lineage
  7. Data Security
  8. Data quality

7

Q

Databricks Data Governance

A

Unity Catalog: Unified governance and security

Delta Sharing: Sharing between organizations. Share live data without copying it; open, cross-platform sharing; centralized administration and governance

Databricks Marketplace: Commercialization of data assets

Databricks Clean Rooms: Private, secure computing

8

Q

Databricks Security Architecture

A

  • Control plane
  • Data Plane

9

Q

Data Plane

A

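  • One of the two planes of the Databricks security architecture (Databricks now also calls it the compute plane)

-Where data is processed by clusters of compute resources; for classic compute, these clusters run in your own cloud account

-In general networking terms, a data plane handles the movement of data packets within and between cloud environments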

10

Q

Control plane

A

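  • One of the two planes of the Databricks security architecture: the backend services that Databricks manages in its own cloud account, such as the web application and the services that manage notebooks, jobs, and workspace configuration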

11

Q

Photon

A

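-Vectorized query engine that speeds up ETL and ingestion on the data lake; compatible with Apache Spark APIs, so existing Spark workloads can use it without code changes

  • Use cases: loading data into Delta and Parquet, IoT, and SQL-based workloads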

12

Q

Data Warehousing

A

  • Databricks SQL
    • Text to SQL
    • AI-driven queries
    • AI-driven serverless computing that scales for cost efficiency and peak performance
    • AI-driven debugging and remediation

13

Q

Delta Live Tables (DLT)

A

ETL & Real-Time Analytics

-Automated and scalable streaming ingestion and transformation
-Workload-specific autoscaling
-Intelligent orchestration, error handling, and optimization
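Streaming ingestion in Delta Live Tables is incremental. Each update processes only the data that arrived since the previous one and tracks its progress in a checkpoint. The sketch below models that idea in plain Python with a list as the source and a dict as the checkpoint. It is a conceptual toy, not the Delta Live Tables API. The `ingest_batch` function and the sample events are hypothetical.

```python
def ingest_batch(source, checkpoint, transform):
    """Process only records after checkpoint['offset'], then advance the checkpoint."""
    start = checkpoint.get("offset", 0)
    new_records = source[start:]  # assumes `source` is append-only
    output = [transform(r) for r in new_records]
    checkpoint["offset"] = start + len(new_records)  # remember progress for next run
    return output


events = ["click", "view"]  # hypothetical append-only input
checkpoint = {}
print(ingest_batch(events, checkpoint, str.upper))  # ['CLICK', 'VIEW']
events.append("purchase")                           # new data arrives
print(ingest_batch(events, checkpoint, str.upper))  # ['PURCHASE']
print(ingest_batch(events, checkpoint, str.upper))  # []
```

Because progress is stored outside the function, a rerun does not reprocess old records. This is the property that lets incremental pipelines scale with new data rather than total data.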

14

Q

Orchestration

A

  • Workflows

Intelligent ETL processing, AI-driven debugging and remediation, end-to-end observability and monitoring, broad ecosystem integration
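Databricks Workflows runs a job as a set of tasks with dependencies between them, so a task starts only after the tasks it depends on have finished. The sketch below shows that ordering using Python's standard `graphlib.TopologicalSorter` (Python 3.9+). It groups tasks into stages that could run in parallel. The task names and the `run_order` helper are hypothetical. The sketch models only dependency ordering, not the Workflows API.

```python
from graphlib import TopologicalSorter

# Hypothetical job: each task maps to the set of tasks it depends on.
job = {
    "ingest": set(),
    "clean": {"ingest"},
    "aggregate": {"clean"},
    "train_model": {"clean"},
    "publish_dashboard": {"aggregate"},
}


def run_order(job):
    """Group tasks into stages; each task's dependencies finish in earlier stages."""
    sorter = TopologicalSorter(job)
    sorter.prepare()  # raises graphlib.CycleError if dependencies form a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
        stages.append(ready)
        sorter.done(*ready)  # pretend the whole stage ran successfully
    return stages


for i, stage in enumerate(run_order(job)):
    print(f"stage {i}: {stage}")
# stage 0: ['ingest']
# stage 1: ['clean']
# stage 2: ['aggregate', 'train_model']
# stage 3: ['publish_dashboard']
```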

15

Q

A

Gen AI
- Custom models
- Model serving
- RAG (retrieval-augmented generation)

End-to-End AI
- MLOps (MLflow)
- AutoML
- Monitoring
- Governance
