Subscribe for updates
Sardine needs the contact information you provide to us to contact you about our products and services.
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Optimizing Google Cloud Datastore (Firestore) Costs

Sardine uses Google Cloud Datastore to power part of our investigative UI. Datastore is a highly scalable managed NoSQL database with a flexible interface. Datastore is very important to our infrastructure, so we are always looking to optimize it.

As we scale, we are noticing Datastore costs increasing substantially as well. This led us to dive deeper into understanding the types of costs associated with Datastore and how we could optimize these costs with an eye towards our future scaling of our infrastructure.

Google also has a product called Firestore which is now positioned as the next generation of Datastore. When we got started, native Firestore was not GA. We are currently using Firestore with Datastore compatibility.

Understanding Datastore pricing

Our first step in this investigation was understanding the pricing for Datastore. Datastore charges for:

  • Read Request
  • Write Request
  • Delete Request
  • Storage

After the team’s analysis, we determined that most of Sardine’s cost is from data storage primarily because we have a lot of data (and indexes).

Understanding Datastore’s Indexing

Since storage (and indexing) is where we needed to focus on, we wanted to understand datastore’s indexing system. Since this would help us optimize datastore costs.

Built-in Indexes

Datastore is a flexible document store (similar to MongoDB), and by default, all columns are indexed. This means, if we are not careful, we’ll end up indexing lots of fields and drive up our costs.

Composite Indexes

To support queries with multiple filters, we must add composite indexes. Datastore also requires indexes for sorting, so to support the following query, we will need a composite index of (customer_id, timestamp)

SELECT * FROM transactions WHERE customer_id=1 ORDER BY timestamp ASC

Indexes are ordered, to support timestamp DESC sort, we will need another index.

Analyzing Index usage

Google cloud provides various metadata to understand index usage. We have to look at both built-in and composite index usage to get the correct usage data.

Built-in Indexes

To analyze built-in index usage, the team ran the following query:

SELECT * FROM __Stat_Kind_BuiltinIndex__

This query reveals statistics about each built-in index, including:

  • Index size
  • Last used time
  • Last update time

We found out that we accidentally indexed list fields for each session, which consumed a lot of spaces. And those indexes were not even used.

Composite Indexes

To analyze composite indexes, we ran the following query:

SELECT * FROM __Stat_Kind_CompositeIndex__

This query reveals statistics about each built-in index, including:

  • Index size
  • Last used time
  • Last update time

Dropping indexes

Now that we identified the expensive indexes, the team can start the process of dropping them. For composite indexes, since we manage them in terraform, it’s easy to drop. Built-in indexes are more complex, because they are per-object.

When our dataflow job writes data to datastore we have code like the following:

entity.putProperties("score", 
  DatastoreHelper.makeValue(element.getScore()).build());

The code doesn’t mention anything about index. However, in this example, score field is indexed, because by default all fields are indexed.

entity.putProperties("score", 
  // explicitly set excludeFromIndexes flag
  DatastoreHelper.makeValue(element.getScore()).setExcludeFromIndexes(true).build());

To avoid confusion we built a wrapper function buildIndexedValue and buildUnindexedValue so the code is more explicit like the following:

entity.putProperties("transaction_id", 
	buildIndexedValue(element.getTransactionId());
entity.putProperties("score", 
	buildUnindexedValue(element.getScore());

Additionally, we had to unindex fields from billions of existing records. There is no easy way to perform this as far as we know. So, we wrote a python script that reads data, and writes back with an updated excludeFromIndex flag.

Conclusion

Datastore costs often grow quietly through unnecessary indexes. By doing the following you can get noticeable cost savings:

  • Auditing index stats (__Stat_Kind_BuiltinIndex__, __Stat_Kind_CompositeIndex__),
  • Dropping what’s unused indexes.
  • Explicitly marking non-queryable fields with excludeFromIndexes=true.

Let us know what you think about this approach. Be sure to investigate your data storage solution to look for cost savings and optimization!

If you’re someone who is excited about optimizing cloud usage, apply here!

Share the article
About the author
Kazuki Nishiura
Chief Technology Officer

Share the article
Subscribe for updates
Sardine needs the contact information you provide to us to contact you about our products and services.
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Heading

Heading 1

Heading 2

Heading 3

Heading 4

Heading 5
Heading 6

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Block quote

Ordered list

  1. Item 1
  2. Item 2
  3. Item 3

Unordered list

  • Item A
  • Item B
  • Item C

Text link

Bold text

Emphasis

Superscript

Subscript

Share the article
About the author
This is some text inside of a div block.
This is some text inside of a div block.

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse varius enim in eros elementum tristique. Duis cursus, mi quis viverra ornare, eros dolor interdum nulla, ut commodo diam libero vitae erat. Aenean faucibus nibh et justo cursus id rutrum lorem imperdiet. Nunc ut sem vitae risus tristique posuere.