Project Structure and Modules
Primary Directories
- APIs (`apis/`): Defines the gRPC and RESTful service contracts, including request and response payloads for inference operations.
- Core (`core/`): The foundational engine handling resource allocation, request routing, model lifecycle oversight, and version control.
- Model Servers (`model_servers/`): Houses the primary server executable, runtime configurations, and initialization logic.
- Servables (`servables/`): Concrete implementations for serving various model formats (e.g., TensorFlow SavedModels), managing the actual loading and execution.
- Batching (`batching/`): Implements request scheduling and queuing to combine multiple inference requests, optimizing hardware utilization.
- Configuration (`config/`): Parses and manages server settings and model deployment configurations.
- Sources (`sources/`): Monitors and retrieves model artifacts from diverse storage backends, triggering updates.
- Utilities (`util/`): Shared helper functions, observability hooks (logging/metrics), and common abstractions.
Auxiliary Directories
- `example/`: Usage demonstrations and sample implementations.
- `tools/`: Development and build utilities.
- `third_party/`: External dependency management.
Design Principles
- Decoupled Modularity: Distinct separation of concerns across components.
- Extensibility: Pluggable architecture for new model formats and storage sources.
- Performance-Oriented: Latency reduction via dynamic batching.
- Production Readiness: Built-in versioning, state monitoring, and logging.
Service Contracts and Protocol Buffers
Primary RPCs
- Prediction Service (`prediction_service.proto`):
  - Predict: Standard inference endpoint.
  - Classify: Dedicated categorical output endpoint.
  - Regress: Dedicated numerical output endpoint.
- Model Management (`model_service.proto`):
  - GetModelStatus: Queries the current state of a deployed model.
  - HandleReloadConfigRequest: Dynamically reloads model configurations.
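TensorFlow Serving also exposes these RPCs over its RESTful HTTP API. The sketch below only composes the URL and JSON body for a Predict call (the host/port and model name are assumptions; nothing is actually sent):

```python
import json

host = "localhost:8501"   # assumed REST port of a running model server
model = "my_classifier"   # assumed model name

# Predict over REST: POST /v1/models/{model}:predict
url = f"http://{host}/v1/models/{model}:predict"
body = json.dumps({"instances": [[1.0, 2.0, 3.0]]})

# Classify and Regress use the same URL scheme with :classify / :regress
# and an {"examples": [...]} request body instead of "instances".
print(url)
print(body)
```

The `"instances"` key carries a row-oriented batch; each element is one input example.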
Key Data Structures
- Model Definitions: `ModelSpec` (identifier and version), `SignatureDef` (input/output graph signatures), `MetaGraphDef` (graph metadata).
- Inputs: `Input`, `ExampleList`, `ExampleListWithContext`.
- Inference Payloads:
  - Prediction: `PredictRequest` / `PredictResponse`
  - Classification: `ClassificationRequest` / `ClassificationResult`
  - Regression: `RegressionRequest` / `RegressionResult`
- Metadata & Status: `GetModelMetadataRequest` / `GetModelMetadataResponse`, `ModelStatus`, `ServiceStatus`.
- Observability: `PredictionLog`, `SessionLog`, `LogMetadata`.
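For illustration, the core message shapes can be modeled as plain Python dataclasses. The names mirror the protos above, but the fields shown are simplified assumptions, not the full proto definitions:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class ModelSpec:
    """Identifies a model and, optionally, a specific version."""
    name: str
    version: Optional[int] = None
    signature_name: str = "serving_default"

@dataclass
class PredictRequest:
    """A named map of input tensors, addressed to one model."""
    model_spec: ModelSpec
    inputs: Dict[str, List[float]] = field(default_factory=dict)

@dataclass
class PredictResponse:
    """A named map of output tensors produced by the model."""
    model_spec: ModelSpec
    outputs: Dict[str, List[float]] = field(default_factory=dict)

# Example: a request targeting version 3 of "my_classifier".
spec = ModelSpec(name="my_classifier", version=3)
req = PredictRequest(model_spec=spec, inputs={"features": [1.0, 2.0]})
```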
Interaction Patterns
Inference Request
// Endpoint: PredictionService.Predict
// Request
{
model_spec: { identifier: "my_classifier", version_label: "stable" },
tensor_inputs: { "features": input_array }
}
// Response
{
tensor_outputs: { "probabilities": output_array }
}
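The `version_label: "stable"` field above selects a concrete version indirectly. A toy resolution step, roughly as the core engine might perform it (the label-to-version table here is entirely hypothetical):

```python
# Hypothetical label table; in practice this mapping comes from the
# model's deployment configuration.
version_labels = {"stable": 3, "canary": 4}

def resolve(model_spec: dict) -> int:
    """Pick a concrete version from an explicit version or a version_label."""
    if "version" in model_spec:
        return model_spec["version"]
    return version_labels[model_spec.get("version_label", "stable")]

print(resolve({"identifier": "my_classifier", "version_label": "stable"}))  # 3
```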
Status Query
// Endpoint: ModelService.GetModelStatus
// Request
{
model_spec: { identifier: "my_classifier" }
}
// Response
{
model_version_status: [
{ version: 3, state: SERVING }
]
}
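A client can consume the status response above with plain JSON handling; this sketch filters for versions in the SERVING state (the response body shown is a made-up example mirroring the pseudo-JSON above):

```python
import json

# Example response body, mirroring the status query above.
raw = """
{
  "model_version_status": [
    {"version": 3, "state": "SERVING"},
    {"version": 2, "state": "END"}
  ]
}
"""

def serving_versions(body: str) -> list:
    """Return the version numbers currently in the SERVING state."""
    status = json.loads(body)
    return [v["version"]
            for v in status.get("model_version_status", [])
            if v.get("state") == "SERVING"]

print(serving_versions(raw))  # [3]
```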
Classification Request
// Endpoint: PredictionService.Classify
// Request
{
model_spec: { identifier: "sentiment_analyzer" },
input: { samples: [...] }
}
// Response
{
result: { categories: [...] }
}
End-to-End Request Pipeline
Client Request
│
├─► REST Protocol ──► HTTPServer
│ │
│ └─► RESTHandler (Translates to internal format)
│
└─► gRPC Protocol ──► PredictionServiceImpl
│
▼
[Core Processing Engine - ServerCore]
│
├─► 1. Request Validation & Preprocessing
│
├─► 2. AspiredVersionsManager
│ ├─► Version Targeting
│ └─► Lifecycle State Machine
│
├─► 3. Model Loader
│ └─► Fetches and instantiates Servable
│
├─► 4. Batching Scheduler
│ └─► Aggregates concurrent requests
│
└─► 5. Execution Engine
└─► Evaluates TensorFlow Graph
│
▼
[Response Handler]
│
▼
Client Response
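Step 4 of the pipeline, the batching scheduler, can be sketched in miniature: collect concurrent requests until the batch is full or a timeout expires, then run them through one model call. This is a toy illustration in Python; the real `batching/` scheduler is a C++ component with many more knobs:

```python
import queue
import threading
import time

class MicroBatcher:
    """Toy batching scheduler: groups requests into one model call."""

    def __init__(self, model_fn, max_batch=4, timeout_s=0.01):
        self.model_fn = model_fn      # callable: list of inputs -> list of outputs
        self.max_batch = max_batch
        self.timeout_s = timeout_s
        self.q = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def predict(self, x):
        """Enqueue one request and block until its output is ready."""
        slot = {"input": x, "event": threading.Event()}
        self.q.put(slot)
        slot["event"].wait()
        return slot["output"]

    def _loop(self):
        while True:
            batch = [self.q.get()]  # block until the first request arrives
            deadline = time.monotonic() + self.timeout_s
            # Keep accepting requests until the batch fills or time runs out.
            while len(batch) < self.max_batch:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self.q.get(timeout=remaining))
                except queue.Empty:
                    break
            # One aggregated model call, then fan results back out.
            outputs = self.model_fn([s["input"] for s in batch])
            for slot, out in zip(batch, outputs):
                slot["output"] = out
                slot["event"].set()

# Stand-in model: doubles each input.
batcher = MicroBatcher(lambda xs: [x * 2 for x in xs])
results = [batcher.predict(i) for i in range(3)]
print(results)  # [0, 2, 4]
```

Batching trades a small amount of per-request latency (the timeout window) for much better accelerator utilization, which is the rationale stated in the Design Principles above.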