Success in machine learning is 90% about data
Many companies struggle to take control of their data because the data landscape changes so quickly: data volumes keep growing and new tools keep emerging, while engineering best practices have not yet matured.
Building a data system that serves machine learning applications introduces further complexities and considerations, such as:
- Supporting feature time travel, so that models are trained only on data that was available at prediction time and no future data leaks into them
- Enabling data scientists to experiment with new features at scale and to version their work
- Building feature engineering pipelines and feature stores for structured and semi-structured data
- Building pre-processing and indexing pipelines for unstructured data such as images, video, or voice
- Monitoring and maintaining data quality, which is always a challenge but even more so for ML models
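To make "feature time travel" concrete, here is a minimal sketch of a point-in-time (as-of) lookup. It assumes feature values are stored as timestamped rows per entity. `FeatureRow`, `build_index`, `feature_as_of`, and `point_in_time_join` are hypothetical names used only for illustration, not the API of any particular feature store. For each training event, the lookup returns the latest feature value known at or before the event's timestamp, never a later one.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical storage layout: one row per (entity, time the value became known).
@dataclass
class FeatureRow:
    entity_id: str
    ts: int       # e.g. epoch seconds when this feature value was computed
    value: float


Index = Dict[str, Tuple[List[int], List[float]]]


def build_index(rows: List[FeatureRow]) -> Index:
    """Group feature rows by entity and sort each group by timestamp."""
    grouped: Dict[str, List[FeatureRow]] = {}
    for row in rows:
        grouped.setdefault(row.entity_id, []).append(row)
    index: Index = {}
    for entity, entity_rows in grouped.items():
        entity_rows.sort(key=lambda r: r.ts)
        index[entity] = ([r.ts for r in entity_rows], [r.value for r in entity_rows])
    return index


def feature_as_of(index: Index, entity_id: str, event_ts: int) -> Optional[float]:
    """Return the latest feature value known at or before event_ts, or None."""
    if entity_id not in index:
        return None
    timestamps, values = index[entity_id]
    pos = bisect_right(timestamps, event_ts)  # count of rows with ts <= event_ts
    return values[pos - 1] if pos > 0 else None


def point_in_time_join(
    events: List[Tuple[str, int, int]], index: Index
) -> List[Tuple[Optional[float], int]]:
    """Pair each (entity_id, event_ts, label) event with its as-of feature value."""
    return [(feature_as_of(index, e, ts), label) for e, ts, label in events]
```

With feature rows for `u1` at timestamps 100, 200, and 300, an event at timestamp 250 is paired with the value from timestamp 200. A naive "latest value" lookup would return the value from timestamp 300, which leaks future information into the training set. In pandas, `pandas.merge_asof` performs a similar backward as-of join on sorted DataFrames.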