This feature is in early access and not yet available to all users. To request access, contact support.
Key concepts
When you create an index with dedicated read nodes, Pinecone allocates dedicated storage and compute resources based on your choice of node type, number of shards, and number of replicas.

- Dedicated storage ensures that index data is always cached in memory and on disk for warm, low-latency queries. In contrast, for on-demand indexes, caching is best-effort; new and infrequently-accessed data may need to be fetched from object storage, resulting in cold, higher-latency queries.
- Dedicated compute ensures that an index always has the capacity to handle high query rates. In contrast, on-demand indexes share compute resources and are subject to rate limits and throttling.
Dedicated read nodes affect only read performance. Write performance is the same as for on-demand indexes.
Node types
There are two node types: `b1` and `t1`. Both are suitable for large-scale and demanding workloads, but `t1` nodes provide increased processing power and memory. Additionally, `t1` nodes cache more data in memory, enabling lower query latency.
Shards
Shards determine the storage capacity of an index. Each shard provides 250 GB of storage, making it straightforward to calculate the number of shards necessary for your index size, including room for growth. For example:

Index size | Shards | Capacity |
---|---|---|
100 GB | 1 | 250 GB |
500 GB | 3 | 750 GB |
1 TB | 5 | 1.25 TB |
1.6 TB | 7 | 1.75 TB |

Adding shards:

- Relieves storage (disk) fullness. Data is spread across shards, so adding shards reduces the amount of data on each one.
- Relieves memory fullness. With less data stored on each shard, there’s also less data to cache in memory.

You are responsible for allocating enough shards for your index size. If your index exceeds its storage capacity, write operations (upsert, update, delete) are rejected.
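
As a rough illustration of the sizing arithmetic above, the following sketch computes a shard count from an index size. The 250 GB per-shard figure comes from the text; the function name and the `headroom` fraction are illustrative assumptions, and the result need not match the table above, which applies its own margin for growth.

```python
import math

SHARD_CAPACITY_GB = 250  # per-shard storage, as stated in the text


def shards_needed(index_size_gb: float, headroom: float = 0.2) -> int:
    """Return a shard count for an index plus a growth margin.

    `headroom` is an assumed fraction of extra space (0.2 means 20%).
    """
    if index_size_gb < 0 or headroom < 0:
        raise ValueError("size and headroom must be non-negative")
    target_gb = index_size_gb * (1 + headroom)
    # An index always needs at least one shard.
    return max(1, math.ceil(target_gb / SHARD_CAPACITY_GB))


if __name__ == "__main__":
    for size_gb in (100, 500, 1000):
        n = shards_needed(size_gb)
        print(f"{size_gb} GB -> {n} shard(s), {n * SHARD_CAPACITY_GB} GB capacity")
```

Choose the headroom from your own expected growth; because writes are rejected once storage capacity is exceeded, rounding up is the safer direction.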
Replicas
Replicas multiply the compute resources and data of an index, allowing higher query throughput and availability.

- Query throughput: Each replica duplicates the compute resources available to the index, allowing increased parallel processing and higher queries per second.
  - In general, throughput scales linearly with the number of replicas, but performance varies based on the shape of the workload and the complexity of metadata filters.
  - To determine the right number of replicas, test your query patterns or contact support.
- High availability: Replicas ensure your index remains available even if an availability zone experiences an outage.
  - When you add a replica, Pinecone places it in a different zone within the same region, up to a maximum of three zones. If you add more than three replicas, additional replicas are placed in zones that already have a replica. This multizone approach allows your index to continue serving queries even if one zone becomes unavailable.
  - To achieve high availability, allocate at least n+1 replicas, where n is the minimum number of replicas required to meet your throughput needs. This ensures that, even if a zone (and its replica) fails, your index still has enough capacity to handle your workload without interruption.

As your query throughput and availability requirements change, you can increase or decrease replicas. Adding or removing replicas can be done through the API and does not require downtime, but it can take up to 30 minutes.
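
The n+1 guidance above can be sketched as a small calculation. This is only an estimate under the linear-scaling assumption mentioned earlier; `qps_per_replica` is a hypothetical figure you would obtain from your own load tests, and the function name is illustrative.

```python
import math


def replicas_for(target_qps: float, qps_per_replica: float,
                 high_availability: bool = True) -> int:
    """Estimate a replica count, assuming throughput scales linearly.

    `qps_per_replica` is a value you measure yourself; it is not a
    published figure.
    """
    if target_qps <= 0 or qps_per_replica <= 0:
        raise ValueError("rates must be positive")
    # n: the minimum number of replicas that meets the throughput target.
    n = math.ceil(target_qps / qps_per_replica)
    # n + 1 keeps enough capacity if one replica's zone fails.
    return n + 1 if high_availability else n
```

For instance, if one replica sustains about 300 queries per second for your workload and you need 900, this gives 3 replicas for throughput and 4 for high availability. Real workloads with complex metadata filters may scale less cleanly, so treat the result as a starting point for testing.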
Index fullness
Dedicated read nodes store a search index in memory and record data on disk. There are three measures of index fullness:

- `memory_fullness`: How much of the index’s memory capacity is currently in use (0 to 1).
- `storage_fullness`: How much of the index’s storage capacity is currently in use (0 to 1).
- `indexFullness`: The greater of `memory_fullness` and `storage_fullness`.

`storage_fullness` is usually the limiting factor. However, memory can fill up first in the following scenarios:

- `b1` nodes, a large namespace (hundreds of millions of records), low-dimension vectors (128 or 256 dimensions), and minimal metadata.
- `t1` nodes, high-dimension vectors (1024 or 1536 dimensions), and lots of metadata.

Adding shards:

- Relieves storage (disk) fullness. Data is spread across shards, so adding shards reduces the amount of data on each one.
- Relieves memory fullness. With less data stored on each shard, there’s also less data to cache in memory.

You’re responsible for allocating enough shards to accommodate your index size. If your index exceeds its storage capacity, write operations (upsert, update, delete) are rejected.
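
To make the relationship between the three measures concrete, here is a minimal sketch that derives `indexFullness` from the other two values and reports which resource is the limiting one. The field names come from the text; the helper names and the 0.8 warning threshold are illustrative assumptions, not documented values.

```python
def index_fullness(memory_fullness: float, storage_fullness: float) -> float:
    """Return the greater of the two fullness measures (each 0 to 1)."""
    return max(memory_fullness, storage_fullness)


def limiting_resource(memory_fullness: float, storage_fullness: float) -> str:
    """Name the resource that is closer to full; ties go to storage."""
    return "memory" if memory_fullness > storage_fullness else "storage"


def needs_more_shards(memory_fullness: float, storage_fullness: float,
                      threshold: float = 0.8) -> bool:
    """Flag an index whose fullness passed an assumed warning threshold."""
    return index_fullness(memory_fullness, storage_fullness) >= threshold
```

A check like this can run on a schedule so that you request more shards before writes start being rejected.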
Using dedicated read nodes
The procedures below use version `2025-10` of the Pinecone API.
Calculate the size of your index
To decide how many shards to allocate for your index, calculate the total index size and then add some room for growth. Each shard provides 250 GB of storage.

To calculate the total size of an index, find the aggregate size of all its records. The size of an individual record is the sum of the following components:

- ID size (in bytes)
- Dense vector size (4 bytes * dense dimensions)
- Sparse vector size (9 bytes * number of non-zero sparse values). To estimate the sparse vector component of your index size, multiply 9 bytes by the average number of non-zero values per vector.
- Total metadata size (total size of all metadata fields, in bytes)
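
The components above can be combined as in the following sketch. The byte factors (4 per dense dimension, 9 per non-zero sparse value) are from the list; the function names, the sample numbers, and the choice of 1 GB = 10^9 bytes are assumptions made for illustration.

```python
def record_size_bytes(id_bytes: int, dense_dims: int = 0,
                      sparse_nonzero: int = 0, metadata_bytes: int = 0) -> int:
    """Sum the per-record components listed above."""
    dense = 4 * dense_dims          # 4 bytes per dense dimension
    sparse = 9 * sparse_nonzero     # 9 bytes per non-zero sparse value
    return id_bytes + dense + sparse + metadata_bytes


def index_size_gb(num_records: int, avg_record_bytes: float) -> float:
    """Aggregate size in GB, assuming 1 GB = 10**9 bytes."""
    return num_records * avg_record_bytes / 1e9


if __name__ == "__main__":
    # Hypothetical workload: 16-byte IDs, 1536-dim dense vectors,
    # about 500 bytes of metadata per record, 100 million records.
    per_record = record_size_bytes(16, dense_dims=1536, metadata_bytes=500)
    print(per_record, "bytes per record")
    print(index_size_gb(100_000_000, per_record), "GB total")
```

Using averages for ID length, sparse non-zero count, and metadata size is usually enough for shard planning, since the result is rounded up to whole shards anyway.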
Create an index
To create a dedicated index, call create an index. In the `spec.serverless.read_capacity` object:

- Set `mode` to `Dedicated`.
- Set `dedicated.node_type` to either `b1` or `t1`, depending on the node type you want to use.
- Set `dedicated.scaling` to `Manual` (currently, `Manual` is the only option, and it must be included in the request).
- Set `dedicated.manual.shards` to the number of shards required to accommodate at least the current size of your index, with a minimum of 1 shard. Each shard provides 250 GB of storage.
- Set `dedicated.manual.replicas` to the number of replicas for the index, with a minimum of 0 replicas (an index with 0 replicas is paused).

To determine the number of shards required by your index, see calculate the size of your index.
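
The field settings above can be assembled into a request body along these lines. This sketch only builds and prints JSON; it does not call the API. The `read_capacity` fields follow the list above, while the surrounding fields (`name`, `dimension`, `metric`, `cloud`, `region`) and all sample values are assumptions based on the general shape of a serverless create-index request, so check them against the API reference before use.

```python
import json


def dedicated_read_capacity(node_type: str, shards: int, replicas: int) -> dict:
    """Build the read_capacity object described in the list above."""
    if node_type not in ("b1", "t1"):
        raise ValueError("node_type must be 'b1' or 't1'")
    if shards < 1:
        raise ValueError("at least 1 shard is required")
    if replicas < 0:
        raise ValueError("replicas cannot be negative")
    return {
        "mode": "Dedicated",
        "dedicated": {
            "node_type": node_type,
            "scaling": "Manual",  # currently the only option; must be present
            "manual": {"shards": shards, "replicas": replicas},
        },
    }


def create_index_body(name: str, dimension: int, metric: str,
                      cloud: str, region: str, read_capacity: dict) -> dict:
    """Wrap read_capacity in an assumed create-index request body."""
    return {
        "name": name,
        "dimension": dimension,
        "metric": metric,
        "spec": {
            "serverless": {
                "cloud": cloud,
                "region": region,
                "read_capacity": read_capacity,
            }
        },
    }


if __name__ == "__main__":
    body = create_index_body(
        "example-index", 1536, "cosine", "aws", "us-east-1",
        dedicated_read_capacity("b1", shards=1, replicas=1),
    )
    print(json.dumps(body, indent=2))
```

Validating the shard and replica counts before sending the request surfaces the documented minimums (1 shard, 0 replicas) as local errors rather than API rejections.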
Add a hosted embedding model (optional)
If you’d like Pinecone to host the model that generates embeddings for your data, so that you use Pinecone’s API to insert and search by text (rather than vectors generated by an external model), configure your index to use a hosted embedding model. To do this, call configure an index, and specify the `embed` object in the request body.
Example request:
Remember:

- Replace `chunk_test` with the name of the field in your data that contains the text to be embedded.
- Be sure to use a model whose dimension requirements match the dimensions of your index.

It’s also possible to specify a hosted embedding model when creating a dedicated read nodes index. To do this, call create an index with integrated embedding. In the request body, use the `read_capacity` object to configure node type, shards, and replicas.

Check index fullness

To check index fullness, call get index stats. Example request:

`indexFullness` describes how full the index is, on a scale of 0 to 1. It’s set to the greater of `memory_fullness` and `storage_fullness`.
Add or remove shards
To add or remove shards, contact support. This cannot be done with the API.

Add or remove replicas
You can add or remove replicas no more than once per hour, starting one hour after index creation. Each change can take up to 30 minutes to complete.

To change the number of replicas, call configure an index and set `spec.serverless.read_capacity.dedicated.manual.replicas` to the desired number of replicas.

Example request:
Pause a dedicated index
To pause an index, set the number of replicas to 0. This change should take less than a minute to complete. While an index is paused, you cannot write to it or read from it.
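
Both changing the replica count and pausing come down to setting the same field, so a single helper can sketch the request body for either. The nesting follows the `spec.serverless.read_capacity.dedicated.manual.replicas` path named in this document; whether other fields must accompany it is not stated here, so treat the body below as an assumption and verify it against the API reference.

```python
import json


def replicas_body(replicas: int) -> dict:
    """Build an assumed configure-index body that sets only the replica count.

    Passing 0 corresponds to pausing the index.
    """
    if replicas < 0:
        raise ValueError("replicas cannot be negative")
    return {
        "spec": {
            "serverless": {
                "read_capacity": {
                    "dedicated": {"manual": {"replicas": replicas}}
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(replicas_body(3)))  # scale to three replicas
    print(json.dumps(replicas_body(0)))  # pause the index
```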
Change node types
To change the type of node used for a dedicated index, contact support. This cannot be done with the API.

Migrate from on-demand to dedicated
You can change the read capacity mode of your index no more than once every 24 hours. The change can take up to 30 minutes to complete.
- Determine the current size of your index.
- Call configure an index. In the request body, in the `spec.serverless.read_capacity` object, set the following fields:
  - Set `mode` to `Dedicated`.
  - Set `node_type` to the node type you want to use (`b1` or `t1`).
  - Set `shards` to the number of shards required for your index. Each shard provides 250 GB of storage.
  - Set `replicas` to the number of replicas required for your query throughput needs.

  For example, you could migrate an index named `index-to-migrate` to a dedicated index with `b1` nodes, 1 shard, and 1 replica.
- Monitor the status of the migration. When the migration is complete, the value of `spec.serverless.read_capacity.status.state` is `Ready`. An `Error` state means that you didn’t allocate enough shards for the size of your index. Migrate to dedicated again, using a sufficient number of shards.

Migrate from dedicated to on-demand
To change a dedicated index to on-demand, contact support. This can’t be done with the API.

Check the status of a change
After changing a dedicated index, check the status of the change by calling describe an index. Example request:

In the response, check the `spec.serverless.read_capacity.status.state` field. Possible values include:

- `Ready`: The dedicated index is ready to serve queries.
- `Scaling`: A change to the node type, number of shards, or number of replicas is in progress.
- `Migrating`: A change to the read capacity mode is in progress.
- `Error`: You did not allocate enough shards for the size of your index. Migrate to dedicated again, using a sufficient number of shards.

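
Waiting on these states can be sketched as a polling loop. The state names are the ones listed above; `get_state` is a hypothetical callable that you would implement to call describe an index and return the `state` string, and the 30-second interval is an arbitrary choice. The 30-minute default timeout mirrors the upper bound this document gives for changes.

```python
import time
from typing import Callable


def wait_until_ready(get_state: Callable[[], str],
                     timeout_s: float = 1800.0,
                     poll_s: float = 30.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the state is Ready; raise on Error or timeout.

    `get_state` is supplied by the caller and is not part of any SDK.
    """
    waited = 0.0
    while True:
        state = get_state()
        if state == "Ready":
            return state
        if state == "Error":
            # Per the text: not enough shards were allocated.
            raise RuntimeError("change failed: allocate more shards and retry")
        if waited >= timeout_s:
            raise TimeoutError(f"still {state!r} after {timeout_s} seconds")
        sleep(poll_s)  # Scaling or Migrating: keep waiting
        waited += poll_s
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without real delays.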
Limits
Read limits
On dedicated indexes, read operations (query, list, fetch) have no rate limits. However, if your query rate exceeds the compute capacity of your index, you may observe decreased query throughput. In such cases, consider adding replicas to increase the compute resources of the index.

Write limits
- On dedicated indexes, write operations (upsert, update, delete) have the same rate limits as on-demand indexes.
- Writes that would cause your index to exceed its storage capacity are rejected. In such cases, consider adding shards to increase available storage. To determine how close to the limit you are, check index fullness.
Operational limits
Metric | Limit |
---|---|
Min shards per index | 1 |
Max namespaces per index | 1 |
Node type or read capacity mode changes | 1 per 24 hours |
Max shard or replica changes | 1 per hour |
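
Because replica changes are limited to one per hour, starting one hour after index creation, a client may want to work out when the next change is permitted before calling the API. The sketch below does that with plain datetime arithmetic; the function name and the idea of tracking timestamps on the client side are assumptions, since the limit itself is enforced by the service.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

REPLICA_CHANGE_INTERVAL = timedelta(hours=1)  # 1 change per hour, per the table


def next_replica_change_allowed(created_at: datetime,
                                last_change_at: Optional[datetime] = None) -> datetime:
    """Return the earliest time a replica change should be accepted."""
    # No changes during the first hour after index creation.
    earliest = created_at + REPLICA_CHANGE_INTERVAL
    if last_change_at is not None:
        # At least one hour must also pass since the previous change.
        earliest = max(earliest, last_change_at + REPLICA_CHANGE_INTERVAL)
    return earliest


if __name__ == "__main__":
    created = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(next_replica_change_allowed(created))
```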
Other limits
- To increase or decrease shards, contact support.
- To change node types, contact support.
- Dedicated indexes do not support backups or bulk imports.
- `memory_fullness` is an approximation and doesn’t yet account for metadata.
Cost
The cost of an index that uses dedicated read nodes is calculated by this formula:

(Dedicated read nodes costs) + (storage costs) + (write costs)

- (Dedicated read nodes costs) depend on the node type rate and on the number of shards and replicas. Node type rates vary based on pricing plan and cloud region. For exact rates, contact Pinecone.
- (Storage costs) are the same as for on-demand indexes.
- (Write costs) are the same as for on-demand indexes.

Example cost calculations
b1 nodes, 2 shards, 2 replicas - Standard plan

If the Standard plan rate for `b1` nodes is $548.96/month, the cost of dedicated read nodes would be as follows:

t1 nodes, 2 shards, 2 replicas - Standard plan

If the Standard plan rate for `t1` nodes is $1,758.53/month, the cost of dedicated read nodes would be as follows:
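
The worked calculations for the two examples are not shown here. As a hedged sketch, assume that the dedicated read nodes cost is the monthly node rate multiplied by the number of shards and the number of replicas (one node per shard per replica). That formula is an assumption rather than something stated in this document, so confirm it with Pinecone before relying on the numbers.

```python
def dedicated_read_nodes_cost(node_rate_per_month: float,
                              shards: int, replicas: int) -> float:
    """Monthly cost under the ASSUMED formula: rate * shards * replicas."""
    if shards < 1 or replicas < 0:
        raise ValueError("need at least 1 shard and a non-negative replica count")
    return round(node_rate_per_month * shards * replicas, 2)


if __name__ == "__main__":
    # Rates are the example Standard plan figures quoted above.
    print("b1:", dedicated_read_nodes_cost(548.96, shards=2, replicas=2))
    print("t1:", dedicated_read_nodes_cost(1758.53, shards=2, replicas=2))
```

Under that assumption, the `b1` example comes to $2,195.84 per month and the `t1` example to $7,034.12 per month, before storage and write costs.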