Dedicated Read Nodes

Overview

Pinecone indexes built on dedicated read nodes use provisioned read hardware to provide predictable, consistent performance at sustained, high query volumes. They’re designed for large-scale vector workloads such as semantic search, recommendation engines, and mission-critical services. Dedicated read nodes differ from on-demand indexes in how they handle read operations. While on-demand indexes use shared, multi-tenant capacity for reads, dedicated read nodes provision exclusive hardware for reads—memory, local SSDs, and compute. Both index types use Pinecone’s serverless infrastructure for writes and storage. When you create a dedicated read nodes index, Pinecone provisions resources based on your choice of node type, number of shards, and number of replicas. These resources include local SSDs and memory that cache all your index data, and provide dedicated query executors to handle read operations (query, fetch, list). This architecture eliminates cold starts and ensures consistent low-latency performance, even under heavy load. Dedicated read nodes support dense, sparse, or hybrid indexes, giving you flexibility in your search and retrieval strategy. Because storage (shards) and compute (replicas) scale independently, you can optimize for your specific workload characteristics.

Read path for dedicated read nodes

On-demand vs dedicated

On-demand indexes and dedicated read nodes are both built on Pinecone’s serverless infrastructure. They use the same write path, storage layer, and data operations API. However, every dedicated read nodes index has isolated hardware for read operations (query, fetch, list), allowing these operations to run on dedicated query executors. This affects performance, cost, and how you scale:

Feature	On-demand	Dedicated read nodes
Read infrastructure	Multi-tenant compute resources shared across customers	Isolated, provisioned query executors dedicated to your index
Read costs	Pay per read unit (1 RU per 1 GB of namespace size per query, minimum 0.25 RU)	Fixed hourly rate for read capacity based on node type, shards, and replicas
Other costs	Storage and write costs based on usage	Storage and write costs based on usage (same as on-demand)
Caching	Best-effort; frequently accessed data is cached, but cold queries fetch from object storage	Guaranteed; all index data always warm in memory and on local SSDs
Read rate limits	2,000 RUs/second per index (adjustable)	No read rate limits (only bounded by CPU capacity)
Scaling	Automatic; Pinecone handles capacity	Manual; add shards for storage, add replicas for throughput
Best for	Variable workloads, multi-tenant applications with many namespaces, low to moderate query rates	Sustained high query rates, large single-namespace workloads, predictable performance and cost

When to use dedicated read nodes

Dedicated read nodes are ideal for workloads with millions to billions of records and predictable query rates. They provide performance and cost benefits compared to on-demand for high-throughput workloads, and may be required when your workload exceeds on-demand rate limits. There’s no universal formula for choosing between on-demand and dedicated read nodes—performance and cost vary by workload (vector dimensionality, metadata filtering, and query patterns). Consider the following factors when making your decision:

Predictable, consistent performance and cost

With dedicated read nodes, you allocate dedicated read hardware for your index, and your data is cached in memory and on local SSDs. This provides:

Consistent low latency under heavy load.
No cold starts (fetching data from object storage).
Performance isolation from other workloads.
Linear scaling by adding replicas.
Predictable costs based on fixed hourly rates for provisioned hardware.

If predictable performance and cost are critical for your application, dedicated read nodes may be a better fit than on-demand.

High throughput without rate limits or throttling

On-demand indexes are subject to read unit rate limits (default: 2,000 RUs/second per index).A high query volume on a large index can exceed these limits. For example, a 15 GB namespace at 150 QPS requires approximately 2,250 RUs/second (15 RUs per query × 150 QPS), which exceeds the default rate limit.Dedicated read nodes have no read rate limits and provide dedicated capacity for predictable QPS without throttling (bounded only by CPU capacity), making them better suited for high-throughput workloads.

Recommendation engines and real-time use cases

Recommendation engines for use cases such as e-commerce and media require very high throughput and low latency to maintain positive user experiences. Dedicated read nodes are purpose-built for these use cases, providing:

Consistent performance for thousands of queries per second
Low latency for real-time recommendations
Scalability to billion-vector datasets
No performance degradation during traffic spikes

Similar requirements apply to other real-time use cases like semantic search at scale, personalization engines, and mission-critical services with strict performance SLOs.

Single namespace workload (multi-namespace coming soon)

Dedicated read nodes indexes support only a single namespace. If your application requires multiple namespaces, on-demand is a better fit.

Multi-namespace support is coming soon. For early access, contact us.

When NOT to use dedicated read nodes

On-demand indexes are better suited for workloads with unpredictable or highly variable traffic patterns. For example:

RAG systems with variable query volumes
Agentic applications with sporadic usage
Prototypes and development environments with intermittent activity
Scheduled jobs with infrequent, batch-style queries

Additionally, on-demand is better for indexes with many namespaces, even if you have high query volumes. Dedicated read nodes currently only support single-namespace indexes, so multi-tenant applications requiring namespace-based isolation should use on-demand until multi-namespace support is available.For these scenarios, on-demand’s elasticity and usage-based pricing provide better cost efficiency than provisioning dedicated capacity.

Dedicated read nodes can handle predictable traffic spikes efficiently if you scale replicas proactively via the API. For example, you can provision extra replicas before a scheduled email campaign and scale back down afterward. Auto-scaling will be available in a future release.

Cost considerations

On-demand and dedicated read nodes have different cost structures. The key difference is read costs: on-demand uses usage-based pricing, while dedicated read nodes use a fixed hourly rate based on provisioned hardware. Write and storage costs are usage-based for both modes.Dedicated read nodes become cost-effective when you have predictable, sustained query volumes that make full use of your provisioned capacity. With unpredictable or low query volumes, you pay hourly rates even when your machines sit idle, making on-demand’s usage-based pricing more economical.For detailed cost information, comparison tables, and estimation tools, see the Cost section of this guide.

Test results for your workload

Performance depends on your specific workload — index size, vector dimensionality, metadata filtering, query patterns, throughput requirements, and latency requirements. Testing is the only way to know for sure whether dedicated read nodes are right for your scenario.For a step-by-step guide to testing, see Test your workload.

If you need guidance choosing a capacity mode (on-demand or dedicated read nodes) or sizing your index configuration, contact us.

Key concepts

Before creating a dedicated read nodes index, understand the configuration options that determine capacity and performance.

Node types

A node is the basic unit of compute and cache storage capacity for a dedicated read nodes index. Each shard runs on one node, so the node type you choose determines the performance characteristics and cost of your index. The total number of nodes in your index is calculated as shards × replicas. For example, an index with two shards and two replicas uses four nodes. There are two node types: b1 and t1. Both are suitable for large-scale and demanding workloads, but they differ in processing power and memory capacity, and they cache different data.

	b1 (Balanced)	t1 (Performance)
Memory caching	Vector index stored in memory	Vector index + vector projections cached in memory
Use case	Predictable performance for sustained query rates with balanced cost efficiency	Highest performance for the most demanding workloads with extreme query volumes and strict latency requirements
Storage	250 GB per shard	250 GB per shard
Compute & memory	Base-level compute and memory resources	~4x more compute and memory than `b1`
Cost	Lower-cost option	~3x the cost of `b1`

Consider using t1 nodes if your performance requirements are not met by b1 nodes, or if t1 nodes are more cost-effective than b1 nodes for your workload.

When choosing a node type, remember that:

Both types of nodes provide 250 GB of storage per shard. The difference is in compute and memory, which affects query performance.
Because t1 nodes cache more data in memory than b1 nodes, an index may require more shards on t1 than on b1 (for the same data).
You can change node types after creating your index.

Shards

Shards determine the storage capacity of an index. Each shard provides 250 GB of storage, and data is split across all the shards in an index. To respond to a query, the index gathers data from all shards as needed. To determine how many shards you need, calculate your index size and then calculate the number of shards.

It’s your responsibility to allocate enough shards to accommodate the size of your index. If your index exceeds the capacity of its shards, write operations (upsert, update, delete) are blocked, but read operations continue to work normally.

Replicas

Replicas multiply the compute resources and data of an index, allowing for higher query throughput and availability. Each replica is a complete copy of your index data and has its own dedicated compute resources.

Throughput scales approximately linearly with replicas. For example, if one replica handles 50 QPS at your target latency, two replicas should handle approximately 100 QPS.
You can scale replicas up or down with no downtime using the API. See Add or remove replicas.
Minimum: 0 replicas (pauses the index).
For high availability, use at least two replicas. The recommended approach is to allocate n+1 replicas where n is your minimum for throughput. Pinecone distributes replicas across availability zones (up to three per region), so if one zone fails, remaining replicas continue serving queries.

To determine how many replicas you need, test your workload and then calculate the number of replicas.

Actual performance varies based on workload characteristics (query complexity, vector dimensions, metadata characteristics), metadata filter selectivity, and node type (b1 vs t1). Always test with your specific workload.

Index fullness

Index fullness measures how much of your index’s allocated capacity is being used. To ensure predictable performance, dedicated read nodes cache your data in memory and on local SSD.

You can use Pinecone’s API to check index fullness. There are three metrics to monitor: memoryFullness, storageFullness, and indexFullness.
indexFullness is the maximum of memoryFullness and storageFullness.
Usually, storage fills up first. However, memory can be the limiting factor when you have b1 nodes with many low-dimension vectors, or when you have t1 nodes with high-dimension vectors and lots of metadata.
Monitor fullness regularly and add shards before your index reaches capacity. When indexFullness reaches 1.0 (100%), write operations (upsert, update, delete) are blocked, but read operations continue to work normally.

Add shards when index fullness reaches 70-80%, especially if you expect continued growth. Adding shards reduces storage fullness (index data is spread across shards, so each stores less) and memory fullness (with less data per shard, there’s less to cache in memory), helping you avoid write failures.

Test your workload

To choose between on-demand and dedicated read nodes, or to optimize your dedicated read nodes configuration, test with your actual workload. Performance varies based on factors such as the size of your index,vector dimensionality, metadata characteristics, and query patterns.

Calculate the size of your index

Calculate the size of your index to determine how many shards it requires.

Create and populate a test index

Create a dedicated read nodes index with representative data for your workload. You’ll use this index for testing.

If you don’t restore your test index from a backup, you can upsert or import your data.

Migrate your test index to dedicated read nodes (if necessary)

If your test index is on-demand, migrate it to dedicated read nodes. To start, use a single b1 replica.

Don’t migrate your production index yet. At this point, you’re just testing your workload.

Run a load test

Run realistic query patterns against your test index, gradually increasing QPS. For example, start at 10 QPS for about 30 minutes, then increase by 10 QPS increments while monitoring latency. Identify the QPS where latency exceeds your target threshold.

Calculate replicas

Throughput scales approximately linearly with replicas. For example, if one replica handles 50 QPS at your target latency, two replicas should handle approximately 100 QPS. However, performance can vary based on metadata filter selectivity.To calculate the number of replicas required for your target QPS, use this formula, rounding up:

Minimum replicas = (Required QPS) / (QPS per replica)

For more information, see Number of replicas.

Adjust and re-test

To meet your performance and cost goals, adjust your configuration as needed and re-test:

Add or remove shards for storage capacity
Add or remove replicas for throughput
Change node types for different performance characteristics

Continue iterating until you meet your requirements with room for growth.

Calculate the size of your index

To determine how many shards your index requires, calculate your index size and then use the formula in the section below.

Index size

A record can include a dense vector, a sparse vector, or both. Use the formula for your index type to calculate total size:

Dense index
Sparse index
Hybrid index

A dense index contains records with one dense vector each.

Dense index records can also contain sparse vectors (when the index metric is set to dotproduct), which can be useful for hybrid search. To learn how to calculate the size of a hybrid index, see Hybrid index.

Calculate dense index size (assuming no sparse vectors)

Index size = Number of records × (
               ID size + 
               Metadata size +
               Dense vector dimensions × 4 bytes
             )

Where:

ID size and Metadata size are measured in bytes, averaged across all records.
Each Dense vector dimension uses 4 bytes.

Example dense index calculationsThese examples assume 8-byte IDs:

Records	Dense vector dimensions	Avg metadata size	Index size
500,000	768	500 bytes	1.79 GB
1,000,000	1536	1,000 bytes	7.15 GB
5,000,000	1024	15,000 bytes	95.5 GB
10,000,000	1536	1,000 bytes	71.5 GB

Example: 500,000 records × (8-byte ID + (768 dense vector dimensions × 4 bytes) + 500 bytes of metadata) = 1.79 GB

A sparse index contains records with one sparse vector each.Calculate sparse index size

Index size = Number of records × (
               ID size + 
               Metadata size +
               Number of non-zero sparse values × 8 bytes
             )

Where:

ID size and Metadata size are measured in bytes, averaged across all records.
Number of non-zero sparse values: Average number across all records. To find the count for a single record, check the length of the sparse vector’s indices or values array. Each non-zero value uses 8 bytes.

Example sparse index calculationsThese examples assume 8-byte IDs:

Records	Avg number of non-zero sparse values	Avg metadata size	Index size
500,000	10	500 bytes	0.29 GB
1,000,000	50	1,000 bytes	1.41 GB
5,000,000	100	15,000 bytes	79.0 GB
10,000,000	50	1,000 bytes	14.1 GB

Example: 500,000 records × (8-byte ID + (10 non-zero sparse values × 8 bytes) + 500 bytes of metadata) = 0.29 GB

A hybrid index contains records that each have one dense vector and an optional sparse vector.Calculate hybrid index size

Index size = Number of records × (
               ID size + 
               Metadata size +
               Dense vector dimensions × 4 bytes + 
               Number of non-zero sparse values × 8 bytes
             )

Where:

ID size and Metadata size are measured in bytes, averaged across all records.
Each Dense vector dimension uses 4 bytes.
Number of non-zero sparse values: Average number across all records, including those without sparse vectors. To find the count for a single record, check the length of the sparse vector’s indices or values array. Each non-zero value uses 8 bytes.

Example hybrid index calculationsThese examples assume 8-byte IDs:

Records	Dense vector dimensions	Avg number of non-zero sparse values	Avg metadata size	Index size
500,000	768	10	500 bytes	1.83 GB
1,000,000	1536	50	1,000 bytes	7.54 GB
5,000,000	1024	100	15,000 bytes	99.5 GB
10,000,000	1536	50	1,000 bytes	75.4 GB

Example: 500,000 records × (8-byte ID + (768 dense vector dimensions × 4 bytes) + (10 non-zero sparse values × 8 bytes) + 500 bytes of metadata) = 1.83 GB

Number of shards

To calculate the number of shards your index requires, divide the size of your index by 250 GB and round up:

Minimum shards = (Index size) / (250 GB per shard)

To maintain optimal performance, provision additional shards to keep your index at 70-80% capacity. For example, a 500 GB index should have three shards (750 GB capacity = 67% full), not two shards (500 GB capacity = 100% full). Example shard calculations

Index size	Minimum shards	Recommended shards
~71 GB	1 (250 GB; 28% full)	1 (250 GB; 28% full)
~300 GB	2 (500 GB; 60% full)	2 (500 GB; 60% full)
~400 GB	2 (500 GB; 80% full)	3 (750 GB; 53% full)

Other considerations

Every index must have at least one shard. However, you can pause an index by reducing its replicas to 0.
After you’ve created your index, monitor its fullness. When your index approaches capacity, you can add shards.

Number of replicas

To calculate the number of replicas your index requires, first test your workload to find the QPS a single replica can handle at your target latency. Then, use this formula, and round up:

Minimum replicas = (Required QPS) / (QPS per replica)

For example, if one replica handles 50 QPS at your target latency and you need 150 QPS, you need three replicas. Other considerations

Throughput scales approximately linearly with replicas, but performance can vary based on metadata filter selectivity.
For high availability, allocate n+1 replicas where n is your minimum for throughput. Pinecone distributes replicas across availability zones.

Create a dedicated read nodes index

You can create a dedicated read nodes index from scratch or from a backup of an existing index.

From scratch

To create a new dedicated read nodes index from scratch, call Create an index. In the request body, in the spec.serverless.read_capacity object, set the following fields:

Field	Value	Notes
`mode`	`Dedicated`
`dedicated.node_type`	`b1` or `t1`	See node types
`dedicated.scaling`	`Manual`	Currently the only option
`dedicated.manual.shards`	Number of shards needed	Minimum 1 shard; each shard provides 250 GB of storage
`dedicated.manual.replicas`	Number of replicas needed	Minimum 0 (this pauses the index)

To learn how to determine the number of shards and replicas your index requires, see Calculate the size of your index.

Example

PINECONE_API_KEY="YOUR_API_KEY"

curl -X POST "https://api.pinecone.io/indexes" \
     -H "Accept: application/json" \
     -H "Content-Type: application/json" \
     -H "Api-Key: $PINECONE_API_KEY" \
     -H "X-Pinecone-Api-Version: 2025-10" \
     -d '{
           "name": "example-dedicated-index",
           "dimension": 1024,
           "metric": "cosine",
           "deletion_protection": "enabled",
           "tags": {
             "environment": "production"
           },
           "vector_type": "dense",
           "spec": {
             "serverless": {
               "cloud": "aws",
               "region": "us-east-1",
               "read_capacity": {
                 "mode": "Dedicated",
                 "dedicated": {
                   "node_type": "b1",
                   "scaling": "Manual",
                   "manual": {
                     "shards": 2,
                     "replicas": 1
                   }
                 }
               }
             }
           }
         }'

Example response:

{
  "name": "example-dedicated-index",
  "vector_type": "dense",
  "metric": "cosine",
  "dimension": 1024,
  "status": {
    "ready": false,
    "state": "Initializing"
  },
  "host": "example-dedicated-index-1c6ab6aa.svc.aped-4627-b74a.pinecone.io",
  "spec": {
    "serverless": {
      "region": "us-east-1",
      "cloud": "aws",
      "read_capacity": {
        "mode": "Dedicated",
        "dedicated": {
          "node_type": "b1",
          "scaling": "Manual",
          "manual": {
            "shards": 2, // <---- desired state
            "replicas": 1
          }
        },
        "status": {
          "state": "Migrating",
          "current_shards": null, // <---- current state
          "current_replicas": null
        }
      }
    }
  },
  "deletion_protection": "enabled",
  "tags": {
    "environment": "production"
  }
}

The response includes two status fields:

Field	Description
`status.state`	Overall index status (for example, `Initializing`, `Ready`, `Terminating`)
`spec.serverless.read_capacity.status.state`	Read capacity status (`Migrating`, `Scaling`, `Ready`, `Error`)

When creating a dedicated read nodes index, status.state transitions to Ready as soon as the index is ready for reads and writes.However, spec.serverless.read_capacity.status.state remains Migrating until the index scales to its full read capacity, at which point it transitions to Ready.

After creating the index, upsert or import your data.

To upsert and search with text instead of vectors, you can configure your index to use a hosted embedding model. Call Configure an index and specify the embed object in the request body.

From a backup

To create a dedicated read nodes index from a backup:

Restore the backup. This creates a new on-demand index with the same data as the original.
If the restored index has multiple namespaces, delete all of them except the one you want to keep. Dedicated read nodes currently only support one namespace.
Migrate the index to dedicated read nodes.

Migrate to dedicated read nodes

To migrate an existing on-demand index to dedicated read nodes, follow these steps:

Create a backup of your index

Create a backup of your index. If you later find that on-demand is preferable, you can restore the backup to a new on-demand index or contact support to migrate back.

Delete extra namespaces

If your index has multiple namespaces, delete all of them except the one you want to keep. Dedicated read nodes currently only support a single namespace.

If this is a production index, be sure to make a backup before deleting namespaces. Or, if you need multiple namespaces, contact support to discuss early access to multi-namespace support for dedicated read nodes.

Calculate your index size

Calculate your index size to determine how many shards you need.

Migrate the index

To migrate the index, call Configure an index. In the request body, in the spec.serverless.read_capacity object, set the following fields:

Field	Value	Notes
`mode`	`Dedicated`
`dedicated.node_type`	`b1` or `t1`	See node types
`dedicated.scaling`	`Manual`	Currently the only option
`dedicated.manual.shards`	Number of shards needed	Minimum 1 shard; each shard provides 250 GB of storage
`dedicated.manual.replicas`	Number of replicas needed	Minimum 0 (this pauses the index)

ExampleThis example migrates an index to dedicated read nodes using b1 nodes, one shard, and one replica:

PINECONE_API_KEY="YOUR_API_KEY"
INDEX_NAME="YOUR_INDEX_NAME"

curl -X PATCH "https://api.pinecone.io/indexes/$INDEX_NAME" \
     -H "Accept: application/json" \
     -H "Content-Type: application/json" \
     -H "Api-Key: $PINECONE_API_KEY" \
     -H "X-Pinecone-Api-Version: 2025-10" \
     -d '{
           "spec": {
             "serverless": {
               "read_capacity": {
                 "mode": "Dedicated",
                 "dedicated": {
                   "node_type": "b1",
                   "scaling": "Manual",
                   "manual": {
                     "shards": 1,
                     "replicas": 1
                   }
                 }
               }
             }
           }
         }'

Example response:

{
  "name": "example-index-to-migrate",
  "vector_type": "dense",
  "metric": "cosine",
  "dimension": 1024,
  "status": {
    "ready": true,
    "state": "Ready"
  },
  "host": "example-index-to-migrate-1c6ab6aa.svc.aped-4627-b74a.pinecone.io",
  "spec": {
    "serverless": {
      "region": "us-east-1",
      "cloud": "aws",
      "read_capacity": {
        "mode": "Dedicated",
        "dedicated": {
          "node_type": "b1",
          "scaling": "Manual",
          "manual": {
            "shards": 1, // <---- desired state
            "replicas": 1
          }
        },
        "status": {
          "state": "Migrating",
          "current_shards": null, //<---- current state
          "current_replicas": null
        }
      }
    }
  },
  "deletion_protection": "disabled",
  "tags": null,
  "embed": {
    "model": "llama-text-embed-v2",
    "field_map": {
      "text": "text"
    },
    "dimension": 1024,
    "metric": "cosine",
    "write_parameters": {
      "dimension": 1024,
      "input_type": "passage",
      "truncate": "END"
    },
    "read_parameters": {
      "dimension": 1024,
      "input_type": "query",
      "truncate": "END"
    },
    "vector_type": "dense"
  }
}

The response includes two status fields:

Field	Description
`status.state`	Overall index status (for example, `Initializing`, `Ready`, `Terminating`)
`spec.serverless.read_capacity.status.state`	Read capacity status (`Migrating`, `Scaling`, `Ready`, `Error`)

If status.state is set to Error, the allocated number of shards was insufficient for the size of the index. Try again, adding more shards as needed.

Monitor the migration

Monitor the status of the migration. When the migration is complete, spec.serverless.read_capacity.status.state is Ready.

Monitor index performance

After migrating, monitor your index performance to verify that it meets expectations.

Manage your index

The following sections describe how to manage a dedicated read nodes index using version 2025-10 of the Pinecone API.

Add a hosted embedding model

To upsert and search with text instead of vectors, you can configure your index to use a hosted embedding model. To do this, call Configure an index and provide an embed object in the request body. In this object:

For the text field, specify the name of the field in your data that contains the text to be embedded.
Specify a model whose dimension requirements match the dimensions of your index.

ExampleExample request:

PINECONE_API_KEY="YOUR_API_KEY"
INDEX_NAME="YOUR_INDEX_NAME"

curl -X PATCH "https://api.pinecone.io/indexes/$INDEX_NAME" \
     -H "Content-Type: application/json" \
     -H "Api-Key: $PINECONE_API_KEY" \
     -H "X-Pinecone-Api-Version: 2025-10" \
     -d '{
           "embed": {
             "field_map": {
               "text": "chunk_text"
             },
             "model": "llama-text-embed-v2",
             "read_parameters": {
               "input_type": "query",
               "truncate": "NONE"
             },
             "write_parameters": {
               "input_type": "passage"
             }
           }
         }'

Example response:

{
  "name": "example-dedicated-index",
  "vector_type": "dense",
  "metric": "cosine",
  "dimension": 1024,
  "status": {
    "ready": true,
    "state": "Ready"
  },
  "host": "example-dedicated-index-1c6ab6aa.svc.aped-4627-b74a.pinecone.io",
  "spec": {
    "serverless": {
      "region": "us-east-1",
      "cloud": "aws",
      "read_capacity": {
        "mode": "Dedicated",
        "dedicated": {
          "node_type": "b1",
          "scaling": "Manual",
          "manual": {
            "shards": 2,
            "replicas": 1
          }
        },
        "status": {
          "state": "Ready",
          "current_shards": 2,
          "current_replicas": 1
        }
      }
    }
  },
  "deletion_protection": "enabled",
  "tags": {
    "environment": "testing"
  },
  "embed": {
    "model": "llama-text-embed-v2",
    "field_map": {
      "text": "chunk_text"
    },
    "dimension": 1024,
    "metric": "cosine",
    "write_parameters": {
      "dimension": 1024,
      "input_type": "passage",
      "truncate": "END"
    },
    "read_parameters": {
      "dimension": 1024,
      "input_type": "query",
      "truncate": "NONE"
    },
    "vector_type": "dense"
  }
}

You can also create a dedicated read nodes index when calling Create an index with integrated embedding. In the request body, use the read_capacity object to configure node type, shards, and replicas for dedicated read nodes.

Monitor index fullness

To check index fullness, call Get index stats.ExampleExample request:

# To get the unique host for an index,
# see https://docs.pinecone.io/guides/manage-data/target-an-index
PINECONE_API_KEY="YOUR_API_KEY"
INDEX_HOST="YOUR_INDEX_HOST"

curl -X GET "https://$INDEX_HOST/describe_index_stats" \
     -H "Api-Key: $PINECONE_API_KEY" \
     -H "X-Pinecone-Api-Version: 2025-10"

Example response:

{
  "namespaces": {
    "__default__": {
      "vectorCount": 705000
    }
  },
  "indexFullness": 0.01,
  "totalVectorCount": 705000,
  "dimension": 1536,
  "metric": "cosine",
  "vectorType": "dense",
  "memoryFullness": 0.01,
  "storageFullness": 0.01
}

In the response, indexFullness describes how full the index is, on a scale of 0 to 1. It’s set to the greater of memoryFullness and storageFullness.

Add or remove shards

To add or remove shards, call Configure an index. This operation does not require downtime, but can take up to 30 minutes to complete. In the request body, set the following fields:

Field	Value	Notes
`spec.serverless.read_capacity.mode`	`Dedicated`
`spec.serverless.read_capacity.dedicated.scaling`	`Manual`
`spec.serverless.read_capacity.dedicated.manual.shards`	Desired number of shards	Each shard provides 250 GB of storage

ExampleExample request:

PINECONE_API_KEY="YOUR_API_KEY"
INDEX_NAME="YOUR_INDEX_NAME"

curl -X PATCH "https://api.pinecone.io/indexes/$INDEX_NAME" \
     -H "Accept: application/json" \
     -H "Content-Type: application/json" \
     -H "Api-Key: $PINECONE_API_KEY" \
     -H "X-Pinecone-Api-Version: 2025-10" \
     -d '{
           "spec": {
             "serverless": {
               "read_capacity": {
                 "mode": "Dedicated",
                 "dedicated": {
                   "scaling": "Manual",
                   "manual": {
                     "shards": 3
                   }
                 }
               }
             }
           }
         }'

Example response:

{
  "name": "example-dedicated-index",
  "vector_type": "dense",
  "metric": "cosine",
  "dimension": 1024,
  "status": {
    "ready": true,
    "state": "Ready"
  },
  "host": "example-dedicated-index-1c6ab6aa.svc.aped-4627-b74a.pinecone.io",
  "spec": {
    "serverless": {
      "region": "us-east-1",
      "cloud": "aws",
      "read_capacity": {
        "mode": "Dedicated",
        "dedicated": {
          "node_type": "b1",
          "scaling": "Manual",
          "manual": {
            "shards": 3, // <---- desired state
            "replicas": 1
          }
        },
        "status": {
          "state": "Scaling",
          "current_shards": 2, // <---- current state
          "current_replicas": 1
        }
      }
    }
  },
  "deletion_protection": "disabled",
  "tags": null,
  "embed": {
    "model": "llama-text-embed-v2",
    "field_map": {
      "text": "text"
    },
    "dimension": 1024,
    "metric": "cosine",
    "write_parameters": {
      "dimension": 1024,
      "input_type": "passage",
      "truncate": "END"
    },
    "read_parameters": {
      "dimension": 1024,
      "input_type": "query",
      "truncate": "END"
    },
    "vector_type": "dense"
  }
}

Configuration change limits:

You can make one configuration change every ten minutes, but you can batch multiple changes (node type, shards, and replicas) in a single request.
A new configuration change can only be initiated after the previous configuration change has completed.
Each configuration change can take up to 30 minutes to complete.
Read and write operations continue normally during configuration changes.

Add or remove replicas

To add or remove replicas, call Configure an index. This operation does not require downtime, but can take up to 30 minutes to complete. In the request body, set the following fields:

Field	Value	Notes
`spec.serverless.read_capacity.mode`	`Dedicated`
`spec.serverless.read_capacity.dedicated.scaling`	`Manual`
`spec.serverless.read_capacity.dedicated.manual.replicas`	Desired number of replicas	Add replicas to increase query throughput

ExampleExample request:

PINECONE_API_KEY="YOUR_API_KEY"
INDEX_NAME="YOUR_INDEX_NAME"

curl -X PATCH "https://api.pinecone.io/indexes/$INDEX_NAME" \
     -H "Accept: application/json" \
     -H "Content-Type: application/json" \
     -H "Api-Key: $PINECONE_API_KEY" \
     -H "X-Pinecone-Api-Version: 2025-10" \
     -d '{
           "spec": {
             "serverless": {
               "read_capacity": {
                 "mode": "Dedicated",
                 "dedicated": {
                   "scaling": "Manual",
                   "manual": {
                     "replicas": 2
                   }
                 }
               }
             }
           }
         }'

Example response:

{
  "name": "example-dedicated-index",
  "vector_type": "dense",
  "metric": "cosine",
  "dimension": 1024,
  "status": {
    "ready": true,
    "state": "Ready"
  },
  "host": "example-dedicated-index-1c6ab6aa.svc.aped-4627-b74a.pinecone.io",
  "spec": {
    "serverless": {
      "region": "us-east-1",
      "cloud": "aws",
      "read_capacity": {
        "mode": "Dedicated",
        "dedicated": {
          "node_type": "b1",
          "scaling": "Manual",
          "manual": {
            "shards": 1,
            "replicas": 2 // <---- desired state
          }
        },
        "status": {
          "state": "Scaling",
          "current_shards": 1,
          "current_replicas": 1 // <---- current state
        }
      }
    }
  },
  "deletion_protection": "disabled",
  "tags": null,
  "embed": {
    "model": "llama-text-embed-v2",
    "field_map": {
      "text": "text"
    },
    "dimension": 1024,
    "metric": "cosine",
    "write_parameters": {
      "dimension": 1024,
      "input_type": "passage",
      "truncate": "END"
    },
    "read_parameters": {
      "dimension": 1024,
      "input_type": "query",
      "truncate": "END"
    },
    "vector_type": "dense"
  }
}

Configuration change limits:

You can make one configuration change every ten minutes, but you can batch multiple changes (node type, shards, and replicas) in a single request.
A new configuration change can only be initiated after the previous configuration change has completed.
Each configuration change can take up to 30 minutes to complete.
Read and write operations continue normally during configuration changes.

Change node types

You can change node types in either direction (b1 → t1 or t1 → b1). This operation does not require downtime, but can take up to 30 minutes to complete.

The most predictable way to increase throughput is by increasing replicas.

t1 nodes cache more data in memory than b1 nodes. Because of this, switching from b1 to t1 may require more shards.If your new configuration doesn’t have enough shards, the configuration change will fail with an error telling you how many shards are required. Update the request and retry.In the meantime, your index will continue to function normally in its original configuration.

To change node types, call Configure an index. In the request body, set the following fields:

Field	Value	Notes
`spec.serverless.read_capacity.mode`	`Dedicated`
`spec.serverless.read_capacity.dedicated.node_type`	`b1` or `t1`	See node types

ExampleExample request to change from b1 to t1:

PINECONE_API_KEY="YOUR_API_KEY"
INDEX_NAME="YOUR_INDEX_NAME"

curl -X PATCH "https://api.pinecone.io/indexes/$INDEX_NAME" \
     -H "Accept: application/json" \
     -H "Content-Type: application/json" \
     -H "Api-Key: $PINECONE_API_KEY" \
     -H "X-Pinecone-Api-Version: 2025-10" \
     -d '{
           "spec": {
             "serverless": {
               "read_capacity": {
                 "mode": "Dedicated",
                 "dedicated": {
                   "node_type": "t1"
                 }
               }
             }
           }
         }'

Example response:

{
  "name": "example-dedicated-index",
  "vector_type": "dense",
  "metric": "cosine",
  "dimension": 1024,
  "status": {
    "ready": true,
    "state": "Ready"
  },
  "host": "example-dedicated-index-1c6ab6aa.svc.aped-4627-b74a.pinecone.io",
  "spec": {
    "serverless": {
      "region": "us-east-1",
      "cloud": "aws",
      "read_capacity": {
        "mode": "Dedicated",
        "dedicated": {
          "node_type": "t1",
          "scaling": "Manual",
          "manual": {
            "shards": 1,
            "replicas": 1
          }
        },
        "status": {
          "state": "Scaling",
          "current_shards": 1,
          "current_replicas": 1
        }
      }
    }
  },
  "deletion_protection": "disabled",
  "tags": null,
  "embed": {
    "model": "llama-text-embed-v2",
    "field_map": {
      "text": "text"
    },
    "dimension": 1024,
    "metric": "cosine",
    "write_parameters": {
      "dimension": 1024,
      "input_type": "passage",
      "truncate": "END"
    },
    "read_parameters": {
      "dimension": 1024,
      "input_type": "query",
      "truncate": "END"
    },
    "vector_type": "dense"
  }
}

Configuration change limits:

You can make one configuration change every ten minutes, but you can batch multiple changes (node type, shards, and replicas) in a single request.
A new configuration change can only be initiated after the previous configuration change has completed.
Each configuration change can take up to 30 minutes to complete.
Read and write operations continue normally during configuration changes.

Pause an index

To pause an index, set the number of replicas to 0. This operation can take up to 30 minutes to complete.

While an index is paused, you cannot write to it or read from it. For a paused index, you’re billed for storage, but not for node costs, reads, or writes.

Check the status of a configuration change

After making a configuration change to a dedicated read nodes index (changing shards, replicas, or node type), check the status of the change by calling Describe an index.ExampleExample request:

PINECONE_API_KEY="YOUR_API_KEY"
INDEX_NAME="YOUR_INDEX_NAME"

curl -X GET "https://api.pinecone.io/indexes/$INDEX_NAME" \
     -H "Api-Key: $PINECONE_API_KEY" \
     -H "X-Pinecone-Api-Version: 2025-10" 

Example response (index scaling from one to two replicas):

{
  "name": "example-dedicated-index",
  "vector_type": "dense",
  "metric": "cosine",
  "dimension": 1536,
  "status": {
    "ready": true,
    "state": "Ready"
  },
  "host": "example-dedicated-index-1c6ab6aa.svc.aped-4627-b74a.pinecone.io",
  "spec": {
    "serverless": {
      "region": "us-east-1",
      "cloud": "aws",
      "read_capacity": {
        "mode": "Dedicated",
        "dedicated": {
          "node_type": "b1",
          "scaling": "Manual",
          "manual": {
            "shards": 1,
            "replicas": 2 // <---- desired state
          }
        },
        "status": {
          "state": "Scaling", 
          "current_shards": 1,
          "current_replicas": 1 // <---- current state
        }
      }
    }
  },
  "deletion_protection": "enabled",
  "tags": {
    "tag0": "value0"
  }
}

The response includes two status fields:

Field	Description
`status.state`	Overall index status (for example, `Initializing`, `Ready`, `Terminating`)
`spec.serverless.read_capacity.status.state`	Read capacity status (`Migrating`, `Scaling`, `Ready`, `Error`)

When changing node types, shards, or replicas, monitor the read capacity status (spec.serverless.read_capacity.status.state). Possible values:

State	Description
`Ready`	The change is complete and the index is ready to serve queries at full capacity.
`Scaling`	A change to the number of shards or replicas is in progress.
`Migrating`	A change to the node type or read capacity is in progress.
`Error`	The operation failed. For migrations to dedicated, this typically means you didn’t allocate enough shards for your index size. Check `error_message` for details, and retry with more shards.

During changes to shards, replicas, and node type, the index-level status (status.state) remains Ready. This is because the index can handle reads and writes while its dedicated read capacity scales.

Configuration change limits:

You can make one configuration change every ten minutes, but you can batch multiple changes (node type, shards, and replicas) in a single request.
A new configuration change can only be initiated after the previous configuration change has completed.
Each configuration change can take up to 30 minutes to complete.
Read and write operations continue normally during configuration changes.

Migrate from dedicated to on-demand

To change a dedicated read nodes index to on-demand, contact support. This can’t be done with the API.

Limits

The following limits apply to dedicated read nodes:

Read limits

Dedicated read nodes indexes are not subject to read-operation rate limits, like on-demand indexes are. However, if your query rate exceeds the compute capacity of your index, you may observe decreased query throughput. In such cases, consider adding replicas to increase the compute resources of the index.

Write limits

On dedicated read nodes indexes, write operations (upsert, update, delete) have the same rate limits as on-demand indexes.Writes that would cause your index to exceed its storage capacity are blocked. In such cases, consider adding shards to increase available storage. To determine how close to the write limit you are, check index fullness.

Namespace limits

Currently, dedicated read nodes indexes only support a single namespace. However, multi-namespace support is coming soon. For early access, contact support.

Shard, replica, and node limits

ShardsThe minimum number of shards per index is 1.ReplicasThe minimum number of replicas per index is 0, which pauses the index.NodesThe maximum number of nodes per project is 20. This is a project limit, not an index limit.To calculate your total node count, multiply shards × replicas for each of your project’s indexes, and then sum the results. This total must not exceed 20. For example, if you have two indexes that each have two shards and three replicas, your total node count is (2 × 3) + (2 × 3) = 12 nodes.To increase your project’s node limit, contact support.

Configuration change limits:

You can make one configuration change every ten minutes, but you can batch multiple changes (node type, shards, and replicas) in a single request.
A new configuration change can only be initiated after the previous configuration change has completed.
Each configuration change can take up to 30 minutes to complete.
Read and write operations continue normally during configuration changes.

memoryFullness

memoryFullness is an approximation and doesn’t yet account for metadata. For more information, see Index fullness.

Migrating from dedicated to on-demand

To migrate an index from dedicated to on-demand, contact support. This cannot be done with the API.

Cost

The cost of an index has three components: read costs, write costs, and storage costs. On-demand and dedicated read nodes share infrastructure for writes and storage, so these costs are the same. However, dedicated read nodes provision dedicated hardware for read operations (query, fetch, list), which changes how read costs are calculated.

Cost component	On-demand	Dedicated read nodes
Read costs	Usage-based: 1 RU per 1 GB namespace size per query	Fixed hourly rate: Based on node type, shards, and replicas
Write costs	Usage-based	Usage-based (same as on-demand)
Storage costs	Usage-based	Usage-based (same as on-demand)

If you use a hosted model for search or reranking, there are additional inference costs.

Calculate dedicated read nodes costs

To calculate the total cost of a dedicated read nodes index, use this formula:

(Node rate × shards × replicas) + storage costs + write costs

Term	Description
Node rate	Monthly rate for the node type (`b1` or `t1`), which varies by cloud region. See Pinecone pricing.
Shards	Number of shards allocated
Replicas	Number of replicas allocated
Storage costs	Usage-based, same as on-demand
Write costs	Usage-based, same as on-demand

For help estimating costs, use the Pinecone pricing calculator or contact us.

Example: If the rate for b1 nodes on aws-us-east-1 is $168.21/month ($0.23/hour), an index with two shards and two replicas would cost:

168.21 × 2 × 2 = $672.84/month, plus storage and write costs

Get started

Index data

Search

Optimize

Manage data

Manage cost

Move to production

Admin

Operations

Using pods

Overview

On-demand vs dedicated

When to use dedicated read nodes

Key concepts

Node types

Shards

Replicas

Index fullness

Test your workload

Calculate the size of your index

Index size

Number of shards

Number of replicas

Create a dedicated read nodes index

From scratch

From a backup

Migrate to dedicated read nodes

Manage your index

Limits

Cost

Get started

Index data

Search

Optimize

Manage data

Manage cost

Move to production

Admin

Operations

Using pods

​Overview

​On-demand vs dedicated

​When to use dedicated read nodes

​Key concepts

​Node types

​Shards

​Replicas

​Index fullness

​Test your workload

​Calculate the size of your index

​Index size

​Number of shards

​Number of replicas

​Create a dedicated read nodes index

​From scratch

​From a backup

​Migrate to dedicated read nodes

​Manage your index

​Limits

​Cost

Overview

On-demand vs dedicated

When to use dedicated read nodes

Key concepts

Node types

Shards

Replicas

Index fullness

Test your workload

Calculate the size of your index

Index size

Number of shards

Number of replicas

Create a dedicated read nodes index

From scratch

From a backup

Migrate to dedicated read nodes

Manage your index

Limits

Cost