Data Lake and Reporting Architecture for Product Development at Boeing



Boeing is an American multinational corporation that designs, manufactures, and sells airplanes, rotorcraft, rockets, and satellites. With over 150,000 employees and operations in more than 65 countries, it is one of the largest aerospace companies in the world.


Problem Statement


Boeing’s Computer Aided Design (CAD) and Product Lifecycle Management (PLM) data are distributed across different systems and teams. This fragmentation leads to inefficient data access, low data quality, and increased costs in maintenance and management. Boeing wants to streamline data access, improve data quality, and enable the use of data analytics to gain insights into the product development process.




To address the problem, we built a data lake and reporting architecture for Boeing’s CAD and PLM data. The data lake is a centralized repository that consolidates data from the various systems and teams. The data is stored in a scalable, fault-tolerant, and cost-effective manner on Amazon Web Services (AWS), using Amazon S3 and Amazon Redshift.

We used AWS Glue to automate the data ingestion and transformation process. The data is extracted from the source systems using AWS Glue connectors, cleaned, enriched, and transformed using Python and Spark scripts, and loaded into the data lake. The data is partitioned by date and other relevant attributes to enable faster query performance.
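The clean-then-partition step can be illustrated with a minimal sketch in plain Python. This is not the Glue job itself: the field names (`part_number`, `source_system`, `modified`), the dataset name, and the Hive-style `key=value` path layout are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Optional


def clean_record(raw: dict) -> Optional[dict]:
    """Normalize one raw record; return None if it fails a basic check."""
    part_number = (raw.get("part_number") or "").strip().upper()
    if not part_number:
        return None  # reject records without a usable part number
    return {
        "part_number": part_number,
        "source_system": raw["source_system"].strip().lower(),
        "modified": date.fromisoformat(raw["modified"]),
    }


def partition_prefix(dataset: str, source_system: str, modified: date) -> str:
    """Build a Hive-style partition prefix (hypothetical layout)."""
    return (
        f"{dataset}/source_system={source_system}/"
        f"year={modified:%Y}/month={modified:%m}/day={modified:%d}/"
    )


def group_by_partition(dataset: str, raw_records: List[dict]) -> Dict[str, List[dict]]:
    """Clean records and group them under the prefix they would be written to."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for raw in raw_records:
        record = clean_record(raw)
        if record is None:
            continue  # skip records that failed cleaning
        prefix = partition_prefix(dataset, record["source_system"], record["modified"])
        groups[prefix].append(record)
    return dict(groups)
```

Because the date and source system appear in the storage path, a query that filters on those attributes only has to read the matching prefixes rather than the whole dataset.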

We designed a reporting architecture that lets users query the data lake from different reporting tools, such as Tableau, Power BI, and Amazon QuickSight. The reporting architecture is built on top of Amazon Redshift, a data warehousing service that supports fast query performance, concurrency, and scalability. The data is modeled using a star schema to enable efficient aggregation and analysis.
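To make the star schema concrete, here is a small sketch that uses an in-memory SQLite database as a stand-in for Amazon Redshift. The table and column names (`fact_design_change`, `dim_part`, `dim_date`, `review_hours`) are hypothetical and are not taken from the actual model; the point is only the shape: one fact table keyed to small dimension tables, and an aggregate that joins across them.

```python
import sqlite3


def build_star_schema(conn: sqlite3.Connection) -> None:
    """Create one fact table and two dimension tables (hypothetical names)."""
    conn.executescript(
        """
        CREATE TABLE dim_part (
            part_key    INTEGER PRIMARY KEY,
            part_number TEXT,
            program     TEXT
        );
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            iso_date TEXT,
            year     INTEGER,
            month    INTEGER
        );
        CREATE TABLE fact_design_change (
            change_id    INTEGER PRIMARY KEY,
            part_key     INTEGER REFERENCES dim_part(part_key),
            date_key     INTEGER REFERENCES dim_date(date_key),
            review_hours REAL
        );
        """
    )


def changes_per_program(conn: sqlite3.Connection) -> list:
    """Aggregate the fact table by a dimension attribute."""
    return conn.execute(
        """
        SELECT p.program, COUNT(*) AS changes, SUM(f.review_hours) AS hours
        FROM fact_design_change AS f
        JOIN dim_part AS p ON p.part_key = f.part_key
        GROUP BY p.program
        ORDER BY p.program
        """
    ).fetchall()
```

A reporting tool would issue the same kind of join-and-aggregate query against the warehouse; keeping descriptive attributes in the dimension tables is what keeps the fact table narrow and the aggregation cheap.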




The data lake and reporting architecture have enabled Boeing to streamline data access, improve data quality, and gain insights into the product development process. The benefits include:


Improved data quality: The centralized data lake ensures consistent and accurate data across the enterprise. The data quality is improved by identifying and fixing data issues during the data transformation process.


Faster data access: The data lake enables faster access to data by reducing data latency and improving query performance. The data can be queried using different reporting tools, and the results are returned in seconds.


Cost savings: The data lake and reporting architecture have reduced the cost of maintaining and managing the CAD and PLM data. The cloud-based architecture provides a scalable and cost-effective solution that reduces the need for on-premises infrastructure.


Insights into the product development process: The data analytics capabilities have enabled Boeing to gain insights into the product development process. The analytics include trend analysis, root cause analysis, and predictive modeling. These insights help the company improve its design, manufacturing, and testing processes.




Boeing has successfully implemented a data lake and reporting architecture for its CAD and PLM data. The solution provides a centralized, scalable, and cost-effective platform that enables faster data access, improved data quality, and data analytics capabilities. The insights gained from the analytics have enabled the company to improve the product development process and reduce costs. The solution can be extended to other areas of the enterprise to improve their data management and analytics capabilities.
