
The proper use of data — data about team performance, data about customers, or data about the competition — can be a sort of force multiplier. It has the potential to dramatically help a business to scale. But sadly, many businesses have data but don't know how to properly leverage it. What exactly is useful data? How can you properly utilize data? How can data help a business grow? To address this, we are talking to business leaders who can share stories from their experience about "How To Effectively Leverage Data To Take Your Company To The Next Level". As part of this series, we had the pleasure of interviewing Tarang Vaish.

Tarang Vaish is the Co-founder and CTO of Granica, the world's first AI efficiency platform. Prior to founding Granica he served as a Founding Engineer at Armorblox and held senior engineering roles at Cohesity and Vivante Corporation.

Thank you so much for joining us in this interview series. Before we dive in, our readers would love to "get to know you" a bit better. Can you tell us a bit about your backstory and how you got started?

I was interested in computer science and engineering from an early age, passions which led me to dedicate my education and career to advancing technical innovation. After earning a Bachelor of Technology in computer science from IIT Guwahati and a Master of Science in computer science from Stanford University, I started my career as a software engineer at AMD and went on to hold leadership roles at Vivante Corporation, Cohesity and Armorblox.

While I was working at Armorblox I met Rahul Ponnala, with whom I would eventually found Granica. Our daughters went to the same preschool and became good friends and we found ourselves having long conversations about data storage, engineering and research at school events. We had both witnessed the cost and efficiency limitations of cloud storage in our respective roles, particularly for companies leveraging AI and ML tools. We spent many late nights researching and ideating on a solution that would solve these pain points and create a new era of AI efficiency.

Before we knew it, we left our jobs on the same day to fully dedicate our time to building Granica. Rahul and I have spent the past four years developing a breakthrough team and company and officially launched Granica in June 2023.

Can you share a story about the funniest mistake you made when you were first starting? Can you tell us what lessons or 'take aways' you learned from that?

This is somewhat non-technical, but I used to live less than a mile from my office and assumed my car would be able to get me there even if the fuel gauge level was low. Well, one time I ran out of gas on this commute, and funny enough I still assumed it was some engine problem and not gas. The lesson was to not ignore or procrastinate on alerts right in front of you!

Is there a particular book, podcast, or film that made a significant impact on you? Can you share a story or explain why it resonated with you so much?

Broadway shows and plays always inspire me, as I have participated in such shows in school and I admire how much focus and practice people put into delivering authentic viewing experiences.

I have to call out the TV show Silicon Valley as well; it shows the chaos, drama and fun that go into startups. The creators of the show took inspiration from Stanford and Dropbox and focused on data compression and how it can "change the world," which was fun to watch. Some of the content is based on real-world lessons and the show was partly educational for me.

Are you working on any new, exciting projects now? How do you think that might help people?

Our Granica engineering team is developing additional security features for Granica's AI efficiency platform, which will be generally available this year. Sensitive data, particularly that which is unstructured, can pose a risk for many companies that rely on cloud and on-premises storage solutions: 93% of company networks are at risk for security breaches. With the additional security features that are under development, Granica customers can be assured their data is protected from potential cyberattacks.

Thank you for all that. Let's now turn to the main focus of our discussion about empowering organizations to be more "data-driven." For the benefit of our readers, can you help explain what exactly it means to be data-driven? On a practical level, what does it look like to use data to make decisions?

Of course. Data-driven companies are those that strategically leverage key metrics to guide their initiatives and goals. In order to be a truly data-driven organization, a company needs to be able to extract meaningful, reliable information from its various business functions and correlate this information to specific business initiatives. Companies can then use these insights to make informed decisions, more easily meet KPIs and free up time and resources to innovate and outperform competitors.

Depending on the company, data-driven decision-making could involve using company data to develop a more intelligent AI model, synthesizing consumer insights data and using it to adjust marketing and sales content, leveraging data about ROI on specific departments' initiatives to inform stakeholders of company performance and more.

Granica helps customers observe and improve key metrics around their data. These metrics are typically blended in with their overall cloud cost and usage metrics. As part of the onboarding process, we work with customers to get an initial understanding of their data usage and provide them with key insights on how they are storing their data, accessing it and thinking about its lifecycle. This process also uncovers how Granica can help make their data more efficient.

Which companies can most benefit from tools that empower data collaboration?

Companies of all shapes and sizes, and in practically every industry, can benefit from enhanced data collaboration. After all, data collaboration is foundational in order to maximize the impact of data analytics. Almost every company has multiple data sources they need to bring together, and multiple teams across functions requiring access. In the context of AI, more data makes for better models, and thus AI teams heavily depend on data collaboration.

We'd love to hear about your experiences using data to drive decisions. In your experience, how has data analytics and data collaboration helped improve operations, processes, and customer experiences? We'd love to hear some stories if possible.

Within engineering and product development, across the companies for which I have worked, data analytics has helped us to build higher quality products at a faster velocity. In practice, we create many data "sensors" by instrumenting code across our run-times as well as our CI/CD environments to generate logs and metrics. This sensor data then feeds our engineering dashboards, which we use for real-time updates and troubleshooting as well as historical baselining and trends across many dimensions such as performance, uptime, build pass/fail rates, etc.

We share these analytical dashboards across our teams, so we can collectively and quickly see the impact of changes and whether they are improving (or regressing) our KPIs.
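As a minimal sketch of the kind of "sensor" instrumentation described above — the metric names and the use of stdout as a log sink are purely illustrative, not how any particular company's pipeline works — a decorator can wrap a build or runtime step and emit a structured metric record for each invocation:

```python
import json
import time
from functools import wraps

def instrument(metric_name):
    """Hypothetical 'sensor': wraps a function and emits a JSON metric
    record with its duration and success/failure status."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            status = "ok"
            try:
                return fn(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                record = {
                    "metric": metric_name,
                    "duration_ms": round((time.monotonic() - start) * 1000, 2),
                    "status": status,
                }
                # In practice this would ship to a log/metrics pipeline
                # feeding the dashboards, not stdout.
                print(json.dumps(record))
        return wrapper
    return decorator

@instrument("build.compile")
def compile_step():
    time.sleep(0.01)  # stand-in for real work

compile_step()
```

Records like these, aggregated over time, are what make baselining and pass/fail trend dashboards possible.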

Has the shift towards becoming more data-driven been challenging for some teams or organizations from your vantage point? What are the challenges? How can organizations solve these challenges?

As organizations become increasingly data-driven, many face the cost and efficiency limitations of cloud storage, particularly those focused on AI and ML. The more data companies store in the cloud, the more their cloud bills skyrocket, depleting funds that could otherwise be used to advance innovation. For many, the cost to store, retain and use data in the cloud becomes a barrier to how much data they can acquire and leverage, limiting the effectiveness of their analytical efforts. Without an effective solution in place to curtail rising data costs and inefficiencies, I have seen data-driven teams encounter limitations such as unreliable data quality, compromised data security and other data management challenges.

Companies can mitigate these data challenges by adding an AI efficiency layer to their tech stacks. An AI efficiency layer can dramatically reduce the cost to store and access data without requiring teams to resort to data archival and/or deletion. It can also break down data silos by preserving the privacy of sensitive information contained within the data, thus facilitating its use across teams. By reducing cost and efficiency limitations, companies can more effectively train AI and ML models, drive forward product innovation and maximize ROI.


Based on your experience and success, what are "Five Ways a Company Can Effectively Leverage Data to Take It To The Next Level"?

At Granica our focus is on data, especially data used for AI/ML, and thus we see opportunities for companies to make their AI environments more efficient — i.e., faster, lower cost and more secure. Sophisticated analytical technologies and approaches, all leveraging FinOps best practices, are typically required to achieve material results, and this is part of how Granica's AI efficiency platform helps our customers. Here are some examples:


1. Optimize and reduce petabyte-scale AI training data storage and transfer costs (and time) via data reduction. Data reduction/rehydration technologies such as compression and deduplication physically reduce the size of data (files and objects) stored and transferred within and between clouds. Companies can analyze their petabyte-scale data sets to identify ideal candidates to apply inline reduction technologies to reduce the cost to store data as well as the cost (and time) to transfer it across zones, regions and clouds.

For example, our customer Quantum Metric provides the leading Continuous Product Design platform for companies to understand their customers' digital journey, enabling organizations to recognize customer needs, quantify the financial impact and prioritize tasks. The Quantum Metric platform empowers a customer-centric culture, aligning business and technical teams to effectively prioritize customers' needs based on business impact. They are using Granica to realize savings of over 40% on over 100TB and 100 million objects per day of data in Google Cloud Storage.
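To make the reduce/rehydrate idea concrete — this is a generic sketch using stdlib zlib, not Granica's actual (far more sophisticated) reduction technology — data is losslessly compressed on the write path and transparently decompressed on read:

```python
import zlib

def reduce_object(data: bytes, level: int = 6) -> bytes:
    """Losslessly compress an object before writing it to object storage."""
    return zlib.compress(data, level)

def rehydrate_object(blob: bytes) -> bytes:
    """Decompress on read — transparent to the application."""
    return zlib.decompress(blob)

# Highly redundant data (common in logs and telemetry) compresses well.
original = b'{"event": "page_view", "user": 42}\n' * 10_000
stored = reduce_object(original)

savings = 1 - len(stored) / len(original)
print(f"stored {len(stored)} of {len(original)} bytes "
      f"({savings:.0%} reduction)")
assert rehydrate_object(stored) == original  # round-trip is lossless
```

Because object-store bills scale with bytes stored and transferred, every byte removed here reduces both at-rest and cross-region transfer costs.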

2. Ensure safe use of data in AI via data privacy. Ensuring that all sensitive data is protected from security risk is critical for companies to maintain customer trust. And yet using the data sets containing that information is often critical to the success of analytical initiatives. Given that the largest volume of corporate data is unstructured data typically held in cloud object stores, it is critical for companies to analyze that data to identify and classify any sensitive information, and then to remediate/protect it via various de-identification and access control technologies.

Our Early Access customers use Granica in this way to bolster their data security posture and minimize their risk of breach while also safely leveraging as much data as possible to maximize analytical outcomes.
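As a toy illustration of de-identification — real sensitive-data classification uses far more robust detectors (NER models, validators, context rules) than the two illustrative regex patterns below — detected values are replaced with typed placeholders so the rest of the record remains usable:

```python
import re

# Illustrative patterns only; production systems need much stronger detection.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def deidentify(text: str) -> str:
    """Replace detected sensitive values with typed placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

record = "Contact jane.doe@example.com, SSN 123-45-6789."
print(deidentify(record))
# → Contact [EMAIL], SSN [SSN].
```

The typed placeholders preserve the shape of the data, which is what lets teams keep analyzing the data set after the sensitive values are removed.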

3. Optimize and reduce AI model training cost and time via the elimination of redundant and/or low value data. When companies collect data for AI and ML solutions, they often capture large quantities of redundant information. This redundant information consumes valuable compute resources and time as it is processed throughout ML pipelines. The redundant data also reduces the predictive accuracy of AI models by increasing the amount of information "noise" relative to "signal," reducing model effectiveness. However, companies can analyze their training data to identify and remove redundant information, both at the file level and also within files.

Our customers use Granica in this way to make AI model training and development more efficient (i.e., faster and lower cost with more runs per dollar, as well as more accurate and effective).
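The file-level case can be sketched with content hashing — a simplified assumption of exact-duplicate removal only; near-duplicate detection within files requires fuzzier techniques such as MinHash or embedding similarity:

```python
import hashlib

def dedupe(samples):
    """Drop exact-duplicate training samples by content hash, keeping
    the first occurrence of each."""
    seen = set()
    unique = []
    for sample in samples:
        digest = hashlib.sha256(sample.encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(sample)
    return unique

corpus = ["the cat sat", "a dog ran", "the cat sat", "a dog ran", "birds fly"]
print(dedupe(corpus))  # → ['the cat sat', 'a dog ran', 'birds fly']
```

Every duplicate dropped here is a sample the training pipeline no longer has to store, transfer or process.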

4. Optimize and reduce petabyte-scale AI training data at-rest cloud object store costs via tiering or deletion. Companies can analyze their cloud workload usage and data access patterns to identify opportunities to tier large volumes of infrequently accessed training data to lower-cost object store classes. Truly cold data can be archived for long-term retention at a fraction of the cost of Amazon S3 Standard and Google Cloud Storage Standard. Similar to tiering, companies can analyze their data environment to identify candidate data for deletion — regardless of the original intended use and source of the data. Cross-team collaboration is crucial here in order to ensure identified data sets can be safely deleted with little to no business risk.

We find most organizations are hesitant to tier data given the performance and availability trade-offs without first gaining a deep understanding of their data and workload environment, so performing such analysis is a standard part of our new customer onboarding process.
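The access-pattern analysis behind tiering can be sketched as a simple age-based policy — the object keys and the 90-day threshold are hypothetical, and real policies also weigh retrieval costs and availability trade-offs:

```python
from datetime import datetime, timedelta, timezone

def tiering_candidates(objects, cold_after_days=90):
    """Flag objects whose last access is older than the threshold as
    candidates for a colder (cheaper) storage class."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=cold_after_days)
    return [o["key"] for o in objects if o["last_accessed"] < cutoff]

now = datetime.now(timezone.utc)
inventory = [
    {"key": "train/2021/shard-000.parquet",
     "last_accessed": now - timedelta(days=400)},
    {"key": "train/2024/shard-117.parquet",
     "last_accessed": now - timedelta(days=3)},
]
print(tiering_candidates(inventory))  # → ['train/2021/shard-000.parquet']
```

Running this kind of analysis before tiering is what gives teams the confidence that moving data to a colder class will not hurt active workloads.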

5. Optimize and reduce AI training data access costs via intelligent batching. Companies can analyze their cloud workload usage and data access patterns to identify opportunities to intelligently batch PUT operations when writing to cloud object stores. Data batching technologies can enable more cost-efficient operations by reducing the effective cost of PUT and GET operations when accessing cloud object stores.

For example, our customer Nylas is a communications platform that offers customers API solutions to quickly and securely build email, scheduling and work automation features into their applications. With Nylas, developers get unprecedented access to rich communications data from their end-users, pre-built workflows that automate everyday tasks, embeddable UI/UX components for fast front-end development and comprehensive security features — all delivered via a suite of powerful APIs. Nylas is itself a heavy cloud API user and they use Granica to reduce their Amazon S3 API costs by roughly 90% (10x).
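Since object stores charge per request, coalescing many small writes into fewer, larger PUTs directly cuts API costs. A minimal sketch of the batching idea — `put_object` here is a stand-in for a real object-store client call, and the buffering logic is greatly simplified:

```python
class BatchingWriter:
    """Coalesce many small records into fewer, larger PUT requests."""

    def __init__(self, put_object, batch_bytes=8 * 1024 * 1024):
        self.put_object = put_object    # real client call in production
        self.batch_bytes = batch_bytes  # flush threshold
        self.buffer = []
        self.buffered = 0
        self.puts = 0

    def write(self, record: bytes):
        self.buffer.append(record)
        self.buffered += len(record)
        if self.buffered >= self.batch_bytes:
            self.flush()

    def flush(self):
        if self.buffer:
            self.put_object(b"".join(self.buffer))
            self.puts += 1
            self.buffer, self.buffered = [], 0

writer = BatchingWriter(put_object=lambda blob: None, batch_bytes=1024)
for _ in range(1000):
    writer.write(b"x" * 10)  # 1,000 small records...
writer.flush()
print(writer.puts)           # ...become just 10 PUT requests
```

Since per-request pricing is independent of object size (up to the size limit), collapsing 1,000 writes into 10 reduces the request bill by two orders of magnitude for this workload.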

Changing company culture is hard. What would you suggest is needed to change a work culture to become more data-driven?

In my experience, the best way to become data-driven is to infuse key performance indicators (KPIs) into goal setting and planning processes, for example via a quarterly OKR process. Having explicit KPIs (i.e., metrics) with targets forces teams to measure what matters, and doing that requires becoming data-driven. Teams will begin instrumenting important technical and business processes and states, creating baselines and measuring impact.

Based on your experience, how do you think the needs for data will evolve and change over the next five years?

Organizations' data demands will continue to grow over the next five years, particularly as AI and ML solutions become increasingly central across all industries. In fact, IDC predicts that the world will exceed 175 zettabytes of data by 2025. With an increased focus on AI and ML, organizations will be forced to contend with larger training datasets and must choose to either deplete a sizable portion of their budgets to support cloud storage costs or adopt a tech solution that will cost-effectively reduce data volume without depleting data quality.

Does your organization have any exciting goals for the near future? What challenges will you need to tackle to reach them? How do you think data analytics can best help you to achieve these goals?

There are a number of exciting goals we are pursuing as a team, including building novel ML models and training them on GPUs, leveraging ML in our data science software development projects and keeping our cloud infrastructure costs efficient across multiple clouds as we scale up. I anticipate the challenges will center on exploring the right tools and understanding the benefits of using them. Data analytics is essential to move fast and course-correct as needed, so it's essential for us to question whether the KPIs for each project are meaningful, and whether they are being tracked and understood correctly by teams.

How can our readers further follow your work?

If you are interested in learning more about the power of an AI efficiency platform, you can follow Granica on Twitter and LinkedIn. You are also welcome to follow me on LinkedIn: https://www.linkedin.com/in/tarang-vaish-7a23368/.

Thank you so much for sharing these important insights. We wish you continued success and good health!