ASML
Internship: Design a modular data solution for data aggregation, storage, analytics and presentation
Introduction
This assignment is for a master's student in Data Science or a related discipline who is interested in designing a modular data solution that addresses current and future needs for data aggregation, storage, analytics and presentation, and who is a strong communicator in English.
Job Mission
ASML’s TWINSCAN lithography machine is controlled by software consisting of more than 25 million lines of code. Hundreds of software developers frequently deliver new functionality, bug fixes and code/UX improvements.
These updates are tested and qualified by the Software Test department using thousands of automated tests, which generate 300 GB of data per day. This output data is archived and consists of “structured” data, such as output reports with pass/fail criteria, and “unstructured” data, such as trace files. A lot of information in this data currently cannot be extracted. A flexible data solution is needed that can answer questions not yet determined, such as “did a certain pattern or issue occur before?” or “are there correlations between issues and configurations?”.
The goal of the assignment is to design a modular data solution that addresses current and future needs for data aggregation, storage, analytics and presentation.
Key characteristics include decoupling the data itself from the tools used, separation of concerns with well-defined interfaces between modules to allow future improvements per module, and scalability to accommodate future demand.
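As an illustration of these characteristics only, the minimal Python sketch below shows an analytics module that depends on a storage interface rather than on a specific storage tool. All names in it (ResultRecord, Store, InMemoryStore, failures_per_configuration) are hypothetical and are not part of any existing ASML system; the actual design is the subject of the assignment.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class ResultRecord:
    """Tool-independent test result; the field names are hypothetical."""
    test_id: str
    configuration: str
    passed: bool


class Store(Protocol):
    """Interface between the storage module and the modules that use it."""

    def save(self, records: Iterable[ResultRecord]) -> None: ...

    def load(self) -> list[ResultRecord]: ...


class InMemoryStore:
    """Stand-in storage module; any backend offering the same methods could replace it."""

    def __init__(self) -> None:
        self._records: list[ResultRecord] = []

    def save(self, records: Iterable[ResultRecord]) -> None:
        self._records.extend(records)

    def load(self) -> list[ResultRecord]:
        return list(self._records)


def failures_per_configuration(store: Store) -> dict[str, int]:
    """Analytics module: counts failed tests per configuration via the Store interface only."""
    counts: dict[str, int] = {}
    for record in store.load():
        if not record.passed:
            counts[record.configuration] = counts.get(record.configuration, 0) + 1
    return counts
```

Because the analytics function only calls save and load through the interface, the storage module could in principle be replaced by a different tool without changing the analytics code.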
Your assignment will be to design the test data infrastructure and prepare for implementation. Specific steps:
- Collect requirements
- Define interfaces: external input and output
- Select ‘off-the-shelf’ tooling to implement requirements
- Define scalable hardware infrastructure needed to run selected tooling
- Define an implementation plan