NewTech Friday: betterdata – constructing a synthetic data ecosystem
14 January 2022
betterdata is on a mission to transform the way global enterprises use, share, and store data. We spoke with their co-founder and CEO Uzair Javaid to learn more about their ambitions.
What led to the creation of betterdata?
betterdata was founded by two Data and AI enthusiasts, Kevin Yee and me, Uzair Javaid, who left our jobs to do what excites us. I have a Ph.D. in Data Security and Privacy from the National University of Singapore, and Kevin was previously at IBM with a background in AI generative modeling. Both of us are also ethical hackers: I was able to hack 670 user wallets on the Ethereum blockchain, whereas Kevin generated fake fingerprints to bypass biometric lock screens on smartphones. We met in March 2021 during the Singapore Batch 9 cohort of Entrepreneur First, and that sparked the idea of combining data privacy and AI.
Given our ethical hacking background, we wanted to build a product with a privacy-beyond-security framework that is relevant for the global market. More than 120 countries have already passed national data protection regulations, over $1.5 billion in fines has been issued to businesses worldwide for non-compliance, and customer expectations are changing as data privacy standards start to affect purchasing behavior. Recognizing these trends, Kevin and I came up with the idea of synthetic data and a unified data platform that is use-case and industry agnostic and can address the data and privacy challenges enterprises face today.
Could you present betterdata's offering?
Data teams in enterprises face two major challenges in using their data for innovation: long data access times and low data quality. Due to increasing data regulations worldwide, compliance teams take 3-12 months to approve access to real data. Even after approval, the data needs to be anonymized, which destroys up to 80% of the information and is still not safe, e.g., 80% of credit card owners can be re-identified from only three transactions. In addition, real data is flawed: it is very expensive to collect, limited in scope, biased, and imbalanced, which leads to high failure rates for data and AI solutions. Data used to be a data science problem, but it is now becoming a compliance problem as well. With the release of AI ethics codes by China, the EU, and the UK, AI models, and consequently the data they are trained on, must be fair and de-biased. These challenges are huge barriers to enterprise innovation with data, and they result in delayed projects, missed opportunities, poor software testing routines, and long product development cycles.
betterdata's product is meant to solve exactly this. We have built a one-stop data platform that converts limited data into limitless synthetic data with 99+% utility and 0% privacy risks. Our AI models automatically learn the characteristics of the original data to generate synthetic data that is both statistically and structurally similar to it, without compromising on privacy, as it is compliant with all data regulations worldwide. Because the data is synthetic, there is no mapping to any of the original records, so it can be freely used, shared, and stored. In addition, we use our proprietary technology, a unique conditional generation process, to set the conditions for data generation and create different permutations from the same dataset. This also brings data enhancement capabilities to our product, such as data extrapolation, bias and imbalance correction, and data simulation, which address some of the biggest challenges that data professionals currently face. The result is better, more ethical AI models that align well with international AI ethics guidelines. With our product, we are able to provide a data platform that offers 100% controlled data generation with a wide array of data enhancement and accessible privacy-by-design features.
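As a rough illustration of what conditional generation and imbalance correction mean in general, here is a minimal Python sketch. It is not betterdata's proprietary method: it assumes a deliberately simple model (one independent Gaussian per feature per label), and the function names, the toy "normal"/"fraud" labels, and the data are all hypothetical.

```python
import random
import statistics
from collections import defaultdict


def fit_class_conditional(rows):
    """Learn a (mean, std) pair per feature for each label.

    Assumption: features are numeric and treated as independent, so
    correlations between features are NOT preserved by this toy model.
    """
    by_label = defaultdict(list)
    for features, label in rows:
        by_label[label].append(features)
    model = {}
    for label, samples in by_label.items():
        columns = list(zip(*samples))  # transpose rows into feature columns
        model[label] = [(statistics.mean(c), statistics.pstdev(c)) for c in columns]
    return model


def generate(model, label, n, rng):
    """Sample n synthetic rows conditioned on the requested label."""
    return [
        ([rng.gauss(mu, sigma) for mu, sigma in model[label]], label)
        for _ in range(n)
    ]


def balance(rows, rng):
    """Top up each minority label with synthetic rows until all labels match."""
    model = fit_class_conditional(rows)
    counts = defaultdict(int)
    for _, label in rows:
        counts[label] += 1
    target = max(counts.values())
    extra = []
    for label, count in counts.items():
        extra.extend(generate(model, label, target - count, rng))
    return rows + extra


# Made-up data: (amount, hour_of_day) with a rare "fraud" label.
rng = random.Random(0)
real = [([rng.gauss(50, 10), rng.gauss(14, 3)], "normal") for _ in range(95)]
real += [([rng.gauss(400, 50), rng.gauss(3, 1)], "fraud") for _ in range(5)]
balanced = balance(real, rng)
```

Here the condition is simply the label passed to `generate`, so the rare class can be oversampled on demand. A generator of this kind offers no privacy guarantee by itself; it only shows how conditioning the sampling step can rebalance a dataset.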
In particular, we have identified that sharing data with external vendors is especially risky for enterprises because they lose control over the data once it is shared. Each vendor can take 6-18 months to get approved, and even after approval, they receive anonymized data with poor utility. This is why 90% of vendor evaluations fail: the vendor's product works on the anonymized data, but not on the real data. Enterprises lose millions of dollars annually this way, money they could otherwise save.
With our product, we reduce the time to access data by 90% and project costs by 4x. We also completely remove the legal requirements whenever enterprises share data with vendors or external partners, because the data is synthetic and does not belong to any user in the world. This reduces data sharing from a three-stage process to a single stage that takes 1-3 hours instead of months. Note that our product is use-case agnostic, and this is only one of the many use cases we are working on. Our mission is to build a whole synthetic data ecosystem where anyone can buy or acquire data and sell or share data anywhere in the world.
What's coming next for betterdata?
betterdata's product is live, with customer engagements in the US, UK, EU, and Asia. It can be deployed in the cloud as well as on-premises, and we are actively looking for more pilot projects, technology partners, and resellers to keep building our vision of a synthetic data ecosystem with an open data sharing model. To support the wider data community globally, we will be launching a freemium version of our product on our website (www.betterdata.ai) by Q1 2022. We are gathering interest from investors worldwide, are currently raising our seed round, and will close it by Q2 2022.