Vector Databases
Vector databases transform complex unstructured data into high-dimensional coordinate values (vectors), so that almost any concept can be represented numerically. Once data enters this mathematical domain, it becomes calculable and searchable. These databases serve as external knowledge repositories for large language models, providing up-to-date, comprehensive information for real-time queries. They also give LLMs long-term memory, preventing conversation discontinuities. Vector databases have become essential companions for language models.
For documentation and resources: https://img-bss.csdnimg.cn/1113tusoutuanli.pdf
Note: The official documentation contains some errors. The sample code repository at the end of the documentation may be challenging for beginners due to its complexity. Learning a new database system can feel overwhelming without proper guidance.
Connecting to the Database
First, create a free instance and enable external network access. When configuring the firewall, set the whitelist to 0.0.0.0/0 to accept connections from any address (convenient for testing, though too permissive for production). Retrieve the database credentials (username and password) from the dashboard for authentication.
Creating a Database
import tcvectordb
from tcvectordb.model.enum import FieldType, IndexType, MetricType, EmbeddingModel, ReadConsistency
from tcvectordb.model.index import Index, VectorIndex, FilterIndex, HNSWParams
from tcvectordb.model.collection import Embedding
from tcvectordb.model.document import Document, Filter, SearchParams
# Initialize the client
client = tcvectordb.VectorDBClient(
url='http://example-host',
username='admin',
key='your-secret-key',
read_consistency=ReadConsistency.EVENTUAL_CONSISTENCY,
timeout=30
)
# Create a database instance
db = client.create_database(database_name='literature_db')
print(db.database_name)
# Enumerate all available databases
databases = client.list_databases()
for database in databases:
print(database.database_name)
After creating the database, you need to create a collection instead of a traditional table, as vector databases use different terminology.
Creating a Collection
Collections are analogous to tables, but each document in a collection carries a vector alongside its scalar fields. A collection can also integrate embedding functionality directly, eliminating the need for an external embedding model to convert text into vectors before insertion.
Index Configuration
Do not create a filter index on the raw text field that the vector is generated from; it consumes memory without providing any benefit. If you need to filter by a field with WHERE-style conditions at query time, you must define a filter index on that field. Vector databases support dynamic schemas, so arbitrary fields can be supplied during data ingestion without being predefined, much like MongoDB.
The primary key id and vector field vector are mandatory and fixed. Ensure consistency: when using an integrated embedding model, the vector field dimension must match the model's output dimension (768 for BGE_BASE_ZH).
db = client.database('literature_db')
# Configure indexes for filtering and vector search
index = Index(
FilterIndex(name='id', field_type=FieldType.String, index_type=IndexType.PRIMARY_KEY),
VectorIndex(
name='vector',
dimension=768,
index_type=IndexType.HNSW,
metric_type=MetricType.COSINE,
params=HNSWParams(m=16, efconstruction=200)
),
FilterIndex(name='author', field_type=FieldType.String, index_type=IndexType.FILTER),
FilterIndex(name='title', field_type=FieldType.String, index_type=IndexType.FILTER)
)
# Configure built-in embedding model
embeddings = Embedding(vector_field='vector', field='text', model=EmbeddingModel.BGE_BASE_ZH)
# Create the collection
collection = db.create_collection(
name='classic_novels',
shard=1,
replicas=0,
description='Collection for classic Chinese literature',
embedding=embeddings,
index=index
)
print(vars(collection))
The database and collection are now established. The underlying principles resemble traditional databases like MySQL, though the terminology differs.
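To confirm what was created, you can enumerate the collections under the database. A minimal sketch, assuming list_collections() behaves as in the official SDK examples:
# Enumerate the collections in the database (list_collections() assumed from the SDK examples)
collections = db.list_collections()
for coll in collections:
    print(vars(coll))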
Inserting and Replacing Data
During insertion, if a document with the same ID already exists, the original document gets deleted and replaced with the new data. Many fields remain optional—similar to MongoDB's flexible schema—but the text field must match the embedding configuration field name to enable vector conversion.
collection = db.collection('classic_novels')
# Insert data with automatic index rebuilding
result = collection.upsert(
documents=[
Document(
id='001',
text="The rise and fall of empires follows cyclical patterns.",
author='Luo Guanzhong',
title='Romance of the Three Kingdoms',
chapter=21
),
Document(
id='002',
text="Before heaven and earth took shape, chaos reigned supreme.",
author='Wu Cheng\'en',
title='Journey to the West',
chapter=22
),
Document(
id='003',
text="Zhen Shiyin glimpsed eternity in dreams, while Jia Yucun harbored aspirations.",
author='Cao Xueqin',
title='Dream of the Red Chamber',
chapter=23
)
],
build_index=True
)
Setting build_index=True ensures immediate queryability. With build_index=False, inserted data remains unsearchable until the index is rebuilt separately.
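When build_index=False is used (for example, during a bulk load), the index must be rebuilt before the new documents become searchable. A minimal sketch, assuming rebuild_index() is available on the collection as in the official SDK examples:
# Rebuild the index after bulk insertion (rebuild_index() assumed from the SDK examples)
collection.rebuild_index(drop_before_rebuild=False, throttle=1)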
Querying Data
Vector database queries support two modes: exact matching and similarity search. Exact queries filter non-vector fields with precise conditions. Since indexes were created on author and title, you can use these fields for filtering. Similarity search operates on vector embeddings of text, returning results ranked by relevance.
results = collection.searchByText(
embeddingItems=['Empires rise and fall in cycles.'],
filter=Filter(Filter.In("title", ["Romance of the Three Kingdoms", "Journey to the West"])),
params=SearchParams(ef=200),
limit=3,
retrieve_vector=False,
output_fields=['title', 'author']
)
for group in results.get("documents"):
    for doc in group:
        print(doc)
Text-based searches can return similar results, while filter conditions narrow down the scope. The web console also provides an interface for constructing exact queries by specifying WHERE conditions.
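The same kind of exact query can also be issued from the SDK. A minimal sketch, assuming query() accepts document_ids and filter as in the official SDK examples:
# Exact query by primary key, optionally narrowed by a filter condition
docs = collection.query(
    document_ids=['001', '002'],
    filter=Filter(Filter.In("author", ["Luo Guanzhong"])),
    retrieve_vector=False,
    limit=10,
    output_fields=['title', 'author', 'chapter']
)
for doc in docs:
    print(doc)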
Deletion operations follow similar patterns and can be explored once the basic workflow is familiar (a minimal sketch appears below). The web-console query interface resembles Kibana for Elasticsearch, making it intuitive for developers with search engine experience. Subsequent articles will cover image and video vector operations. The Python SDK demonstrates excellent usability, offering a smoother development experience than traditional database connectors.
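A minimal sketch of deletion, assuming delete() accepts document_ids and filter as in the official SDK examples:
# Delete a document by primary key; the filter further restricts which documents match
result = collection.delete(
    document_ids=['003'],
    filter=Filter(Filter.In("title", ["Dream of the Red Chamber"]))
)
print(result)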