DeepSeek API Integration
DeepSeek provides a convenient API service for integrating large language models into applications. This section covers the process of registering an account, testing the API with Postman, and implementing the API call in Python.
1.1 Account Registration and Funding
- Visit the DeepSeek Open Platform and click the "Register" button in the top right corner. Complete the registration using your phone number or email address. Note that real-name verification is required to enable the funding feature.
- After successful registration, navigate to the console homepage and click "Recharge Center" on the left sidebar. Choose a suitable recharge amount (small amounts like 10 yuan are supported for testing) and complete the payment. You can check your available balance in the "Account Balance" section.
- Once funded, create your API keys. Remember to save them securely!
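One common way to keep a key out of source files is to read it from an environment variable. The sketch below assumes a variable named DEEPSEEK_API_KEY, which is a hypothetical name chosen for illustration and not something the platform mandates. It fails early if the variable is missing.

```python
import os


def load_api_key(env_var: str = "DEEPSEEK_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The default variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key


if __name__ == "__main__":
    try:
        api_key = load_api_key()
        # Show only the last four characters so the full key never lands in logs.
        print("Loaded key ending in", api_key[-4:])
    except RuntimeError as err:
        print(err)
```

The returned string can then be passed wherever the examples in this section use a literal key.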
1.2 Testing DeepSeek API with Postman
Postman is a popular tool for testing APIs. It allows you to visually verify the request format and response results.
Step 1: Download and Install Postman
Visit the Postman official download page and download the appropriate version for your operating system. Install it following the on-screen instructions.
Step 2: Configure the API Request
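Refer to the "First API Call" page in the official DeepSeek documentation.
Open Postman, create a new request, and configure it according to the documentation. Set the HTTP method (POST for the chat endpoint), the request URL, the headers (an Authorization header carrying your API key as a Bearer token, plus a Content-Type of application/json), and a JSON request body. Then click "Send" and check that the response contains the model's reply.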
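For readers who want to see the same request outside Postman, here is a minimal sketch that builds it with the Python standard library. The endpoint URL and the "stream" field are assumptions based on DeepSeek's public documentation, so verify them against the current docs. The helper name build_chat_request is hypothetical.

```python
import json
import urllib.request

# Assumption: chat endpoint as listed in DeepSeek's public documentation.
API_URL = "https://api.deepseek.com/chat/completions"


def build_chat_request(api_key: str, user_text: str,
                       model: str = "deepseek-chat") -> urllib.request.Request:
    """Build (but do not send) the POST request that Postman would issue."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_text},
        ],
        "stream": False,
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


request = build_chat_request("YOUR_DEEPSEEK_API_KEY", "Hello!")
print(request.get_method(), request.full_url)
print(json.loads(request.data)["messages"][-1])
# To actually send it, pass the object to urllib.request.urlopen(request).
```

Building the request separately from sending it makes it easy to compare the method, headers, and body field by field with what you entered in Postman.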
1.3 Implementing DeepSeek API in Python
After verifying the API with Postman, you can integrate the API call logic into your Python code.
Step 1: Install Dependencies
The DeepSeek API is compatible with the OpenAI Python SDK, which is the recommended approach. Open your terminal and run the following command:
pip3 install openai
Step 2: Write Python Code and Modify Parameters
Create a new Python file (e.g., deepseek_api_integration.py) and use the following code as a reference. Modify the deepseek_api_key, selected_model, and messages parameters as needed:
from openai import OpenAI

# Replace with your actual DeepSeek API key
deepseek_api_key = "YOUR_DEEPSEEK_API_KEY"
# Choose the model you want to use
selected_model = "deepseek-chat"
# Define the conversation messages
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
# Create a client that points the OpenAI SDK at the DeepSeek endpoint
client = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com")
# Make the API call
response = client.chat.completions.create(
    model=selected_model,
    messages=messages,
)
# Print the response
print(response.choices[0].message.content)
Run the script to see the API response.
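The messages parameter can also carry earlier turns of a conversation. Assuming the chat endpoint keeps no state between calls (so the client resends the history each time), a minimal sketch of maintaining that history could look like this. The helper name append_turn is hypothetical.

```python
from typing import Dict, List

Message = Dict[str, str]


def append_turn(messages: List[Message], user_text: str,
                assistant_text: str) -> List[Message]:
    """Return a new list with one completed user/assistant exchange added."""
    return messages + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]


history = [{"role": "system", "content": "You are a helpful assistant."}]
# After the first call returns, record both sides of the exchange.
history = append_turn(history, "What is the capital of France?", "Paris.")
# The next request sends the full history plus the new question.
next_messages = history + [{"role": "user", "content": "And of Germany?"}]
for message in next_messages:
    print(f"{message['role']}: {message['content']}")
```

The resulting list can be passed as the messages argument of the next API call.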
Qwen3-0.6B Local Deployment
For scenarios requiring offline operation or private deployment, you can download and run the Qwen3-0.6B model locally. This section covers downloading the model from ModelScope and implementing local inference.
2.1 Preparing to Download Qwen3-0.6B
ModelScope is a popular open-source model library in China, providing official download channels for Qwen series models.
- Visit the ModelScope homepage and search for "Qwen3". Find the "Qwen/Qwen3-0.6B" model. Ensure you select the officially released model to avoid third-party mirrors.
- Check the model description to confirm the hardware requirements. Qwen3-0.6B has low memory requirements and can run on an 8GB GPU. It can also run on a CPU, but the speed will be slower.
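As a rough sanity check on the hardware requirement, the memory needed just to hold the weights can be estimated from the parameter count and the numeric precision. The sketch below assumes a nominal 0.6 billion parameters (read off the model name, not an exact count) and ignores activations and the generation cache, so treat the result as a lower bound.

```python
def estimate_weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Estimate the memory needed to store model weights, in GiB."""
    return num_params * bytes_per_param / (1024 ** 3)


# Assumption: nominal size inferred from the "0.6B" in the model name.
NOMINAL_PARAMS = 0.6e9

for label, width in [("float32", 4), ("float16/bfloat16", 2)]:
    size = estimate_weight_memory_gib(NOMINAL_PARAMS, width)
    print(f"{label}: about {size:.2f} GiB for weights alone")
```

Both estimates sit well below 8GB, which is consistent with the guidance above.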
2.2 Installing Dependencies and Downloading the Model
Step 1: Install ModelScope SDK
Open your terminal and run the following command to install the ModelScope Python SDK for downloading models:
pip install modelscope
Step 2: Download the Model via Terminal Command
Run the following command in your terminal to download the model. By default, the files are saved under your user directory; to choose a custom location, add the --local_dir option followed by the target path. It's recommended to cd into your working directory before downloading.
modelscope download --model Qwen/Qwen3-0.6B
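After the download finishes, it helps to confirm where the files ended up. The sketch below rests on two assumptions: that the default cache root is ~/.cache/modelscope/hub, and that dots in the model name are stored as "___" on disk (which matches the "Qwen3-0___6B" folder name used in the inference script). Both helper names are hypothetical.

```python
from pathlib import Path


def modelscope_cache_dir(model_id: str, cache_root: Path) -> Path:
    """Build the expected cache folder for an id such as 'Qwen/Qwen3-0.6B'.

    Assumption: dots in the model name are written as '___' on disk.
    """
    owner, name = model_id.split("/", 1)
    return cache_root / "models" / owner / name.replace(".", "___")


def looks_downloaded(model_dir: Path) -> bool:
    """Treat the presence of config.json as a sign the download completed."""
    return (model_dir / "config.json").is_file()


# Assumption: default cache root; adjust if you used --local_dir.
cache_root = Path.home() / ".cache" / "modelscope" / "hub"
model_dir = modelscope_cache_dir("Qwen/Qwen3-0.6B", cache_root)
status = "found" if looks_downloaded(model_dir) else "not found"
print(f"{model_dir} -> {status}")
```

If the folder is reported as not found, check the path printed by the download command and use that location instead.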
2.3 Implementing Local Inference with Python
Create a new Python file (e.g., qwen_local_inference.py) and use the transformers library to load the model and perform dialogue inference. The following code includes key comments:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Define the path to the downloaded model (adjust it to match where the files were saved)
model_directory = "models/Qwen/Qwen3-0___6B"
# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(model_directory, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_directory)
# Ensure the pad token is set
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
# Determine the device (GPU if available, otherwise CPU)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
print("Performing inference, please wait...")
# User input
user_query = "Tell me a story"
# Define the conversation messages
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": user_query}
]
# Apply the chat template; return_dict=True also returns the attention mask
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(device)
# Define inference parameters
gen_kwargs = dict(
    do_sample=True,
    top_k=10,
    top_p=0.8,
    temperature=0.6,
    max_length=512,  # Upper bound on total length (prompt plus generated tokens)
)
# Generate the response; inputs supplies both input_ids and attention_mask
with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
# Decode only the newly generated tokens, skipping the prompt
input_length = inputs["input_ids"].shape[1]
new_tokens = outputs[0][input_length:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(response)
Run the script to see the model's response.
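Depending on the model and template settings, the decoded text may begin with a reasoning segment wrapped in <think> and </think> tags before the actual answer. Assuming the output follows that shape, the following sketch separates the two parts. The helper name split_thinking is hypothetical, and the sample string is made up for illustration.

```python
from typing import Tuple


def split_thinking(text: str, open_tag: str = "<think>",
                   close_tag: str = "</think>") -> Tuple[str, str]:
    """Split decoded text into (reasoning, answer).

    If no closing tag is present, the whole text is treated as the answer.
    """
    if close_tag not in text:
        return "", text.strip()
    reasoning, answer = text.split(close_tag, 1)
    reasoning = reasoning.replace(open_tag, "", 1)
    return reasoning.strip(), answer.strip()


# Made-up sample standing in for the decoded model output.
sample = "<think>\nThe user wants a short story.\n</think>\n\nOnce upon a time..."
reasoning, answer = split_thinking(sample)
print(answer)
```

In the inference script, the same function could be applied to the response variable before printing it.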