File Hashing Strategy
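To identify a file for deduplication and resumable upload, compute an MD5 hash over a sample of its contents rather than over the whole file. The first and last chunks are hashed in full; each intermediate chunk contributes only 2 bytes from its start, middle, and end. Sampling keeps hashing fast for large files, but two files that differ only in unsampled bytes will produce the same hash: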
async generateFileHash(chunks) {
  return new Promise((resolve, reject) => {
    const hasher = new SparkMD5.ArrayBuffer();
    const samples = [];
    const chunkSize = CHUNK_SIZE;
    const mid = Math.floor(chunkSize / 2);
    chunks.forEach((chunk, idx) => {
      if (idx === 0 || idx === chunks.length - 1) {
        // Hash the first and last chunks in full.
        samples.push(chunk.data);
      } else {
        // Intermediate chunks: 2 bytes from the start, middle, and end.
        samples.push(chunk.data.slice(0, 2));
        samples.push(chunk.data.slice(mid, mid + 2));
        samples.push(chunk.data.slice(chunkSize - 2, chunkSize));
      }
    });
    const reader = new FileReader();
    // Attach handlers before starting the read.
    reader.onload = e => {
      hasher.append(e.target.result);
      resolve(hasher.end());
    };
    reader.onerror = () => reject(reader.error);
    reader.readAsArrayBuffer(new Blob(samples));
  });
}
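The hashing function above expects an array of chunk objects whose `data` field is a Blob slice. The source does not show how that array is built, so the following is only a minimal sketch: `createChunks` is a hypothetical helper, and the 5 MiB value for `CHUNK_SIZE` is an illustrative assumption, not a value from the original.

```javascript
// Assumed chunk size; the original does not state the value of CHUNK_SIZE.
const CHUNK_SIZE = 5 * 1024 * 1024;

// Hypothetical helper: splits a File or Blob into fixed-size chunk descriptors.
function createChunks(file, chunkSize = CHUNK_SIZE) {
  const chunks = [];
  for (let start = 0, index = 0; start < file.size; start += chunkSize, index++) {
    // Blob.slice clamps the end offset, so the last chunk may be shorter.
    chunks.push({ index, data: file.slice(start, start + chunkSize) });
  }
  return chunks;
}
```

Because the last chunk may be shorter than `CHUNK_SIZE`, it is hashed in full rather than sampled at fixed offsets.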
Pre-Upload Validation
Before uploading, verify file existence and identify existing chunks:
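async validateUpload(fileHash) {
  // Send the hash as a query parameter named fileHash, matching the backend
  // method argument; axios URL-encodes values passed through params.
  const response = await axios.get("/api/files/validate", { params: { fileHash } });
  const { exists, partialChunks } = response.data;
  this.existingChunks = partialChunks || [];
  return exists;
}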
Backend validation implementation:
@GetMapping("/validate")
public UploadValidation validateUpload(String fileHash, HttpServletRequest request) {
    String userId = JwtUtil.extractUserId(request.getHeader("Authorization"));
    // The file is already stored for this user: the client can skip the upload.
    if (fileService.isFileExists(fileHash, userId)) {
        return new UploadValidation(true, null);
    }
    // Otherwise report the chunk indices already received so the client uploads only the rest.
    Integer[] existingChunks = fileService.getExistingChunks(fileHash);
    return new UploadValidation(false, existingChunks);
}
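On the client, the list of already-stored chunks returned by validation determines what still has to be sent. As a small sketch, assuming zero-based chunk indices (as the upload loop uses) and a hypothetical helper name `pendingChunkIndices`:

```javascript
// Hypothetical helper: lists the chunk indices the server has not stored yet.
function pendingChunkIndices(totalChunks, existingChunks = []) {
  const done = new Set(existingChunks);
  const pending = [];
  for (let i = 0; i < totalChunks; i++) {
    if (!done.has(i)) pending.push(i);
  }
  return pending;
}

// Example: 5 chunks in total, the server already holds chunks 0 and 2.
pendingChunkIndices(5, [0, 2]); // [1, 3, 4]
```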
Chunk Upload Process
Upload chunks with concurrency control and progress tracking:
async uploadChunks(chunkData) {
  const uploadQueue = chunkData.map(({ data, index, ...meta }) => {
    const form = new FormData();
    form.append("chunk", data);
    form.append("hash", meta.fileHash);
    form.append("index", index);
    form.append("name", meta.fileName);
    return { form, index };
  });
  const MAX_CONCURRENT = 4;
  const activeTasks = [];
  // Chunks the server already holds count toward progress.
  let processed = this.existingChunks.length;
  for (const item of uploadQueue) {
    if (this.existingChunks.includes(item.index)) continue;
    const task = axios.post("/api/files/upload", item.form)
      .then(() => {
        processed++;
        this.progress = Math.floor((processed / chunkData.length) * 100);
      })
      .finally(() => {
        // Free the slot whether the upload succeeded or failed.
        activeTasks.splice(activeTasks.indexOf(task), 1);
      });
    activeTasks.push(task);
    if (activeTasks.length >= MAX_CONCURRENT) {
      // Wait until at least one in-flight upload settles before starting another.
      await Promise.race(activeTasks);
    }
  }
  await Promise.all(activeTasks);
}
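The concurrency limit can also be expressed as a reusable worker pool instead of the `Promise.race` loop. The sketch below is an alternative technique, not the original method; `runWithLimit` is a hypothetical name, and each task is assumed to be a function that returns a promise (for example, one wrapping a single chunk's `axios.post`).

```javascript
// Runs async task factories with at most `limit` of them in flight at once.
// Results are returned in the same order as the input factories.
async function runWithLimit(taskFactories, limit) {
  const results = new Array(taskFactories.length);
  let next = 0;

  // Each worker repeatedly claims the next unstarted task until none remain.
  async function worker() {
    while (next < taskFactories.length) {
      const i = next++;
      results[i] = await taskFactories[i]();
    }
  }

  const workerCount = Math.min(limit, taskFactories.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With this shape, a failed task rejects the returned promise, so the caller decides whether to retry or abort.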
Backend chunk handling:
@PostMapping("/upload")
public Integer handleChunk(@RequestParam("chunk") MultipartFile chunk,
                           @RequestParam String hash,
                           @RequestParam Integer index) {
    // Idempotent: a chunk already recorded in Redis is acknowledged without being rewritten.
    if (redisService.existsInSet(hash, index)) {
        return index;
    }
    String tempPath = TEMP_DIR + hash + "_" + index + ".part";
    try (FileOutputStream stream = new FileOutputStream(tempPath)) {
        stream.write(chunk.getBytes());
        // Record the chunk only after its bytes have been written.
        redisService.addToSet(hash, index);
        return index;
    } catch (IOException e) {
        throw new ServiceException("Chunk upload failed");
    }
}
File Assembly Process
Initiate file assembly after chunk upload completion:
assembleFile() {
  // Return the promise so callers can wait for assembly or handle a failure.
  return axios.post("/api/files/assemble", {
    fileHash: this.fileHash,
    fileName: this.fileName,
    totalChunks: this.chunks.length
  }).then(response => {
    if (response.data.success) {
      this.progress = 100;
      // Handle success
    }
  });
}
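The ordering of the client-side steps (hash, validate, upload, assemble) can be sketched as a single function. This is an illustrative assumption about how the pieces fit together: `uploadFile` and the injected `steps` object are hypothetical, and their signatures are simplified relative to the methods shown in this document so the flow can be exercised without a server.

```javascript
// Hypothetical orchestration of the four client-side steps.
// `steps` supplies the operations, so real or stubbed versions can be passed in.
async function uploadFile(chunks, steps) {
  const fileHash = await steps.generateFileHash(chunks);

  // If the server already has the complete file, nothing needs to be sent.
  const alreadyStored = await steps.validateUpload(fileHash);
  if (alreadyStored) {
    return { fileHash, skipped: true };
  }

  // Assembly is requested only after every chunk upload has finished.
  await steps.uploadChunks(chunks, fileHash);
  await steps.assembleFile(fileHash);
  return { fileHash, skipped: false };
}
```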
Backend assembly implementation:
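@PostMapping("/assemble")
public FileAssemblyResult assembleFile(@RequestBody AssemblyRequest request,
                                       HttpServletRequest httpRequest) throws IOException {
    // userId was undefined in this method; extract it the same way as in validateUpload.
    String userId = JwtUtil.extractUserId(httpRequest.getHeader("Authorization"));
    String progressKey = request.fileHash + "-progress";
    // Resume after the last merged chunk (assumes getProgress returns -1 when nothing is recorded).
    int startIndex = redisService.getProgress(progressKey) + 1;
    // Append when resuming so chunks merged in an earlier run are not truncated.
    try (FileOutputStream output = new FileOutputStream(FINAL_DIR + request.fileHash, startIndex > 0)) {
        for (int i = startIndex; i < request.totalChunks; i++) {
            String chunkPath = TEMP_DIR + request.fileHash + "_" + i + ".part";
            Files.copy(Paths.get(chunkPath), output);
            new File(chunkPath).delete();
            redisService.updateProgress(progressKey, i);
        }
    }
    fileService.saveFileMetadata(request, userId);
    redisService.cleanup(request.fileHash, progressKey);
    return new FileAssemblyResult(true, "File assembled successfully");
}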
Fault Tolerance Mechanisms
The implementation provides resilience through:
- Hash-based file identification for deduplication
- Redis-tracked chunk upload progress
- Resumable assembly operations
- Partial chunk re-upload capability
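
Re-uploading a failed chunk can be automated on the client with a retry wrapper. The following is a minimal sketch under stated assumptions: `withRetry`, the attempt count, and the exponential backoff delays are all hypothetical choices, not part of the original implementation.

```javascript
// Hypothetical helper: retries an async operation with exponential backoff.
// Rethrows the last error once maxAttempts have been used.
async function withRetry(operation, maxAttempts = 3, baseDelayMs = 200) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Wait baseDelayMs, then 2x, then 4x, and so on, before the next attempt.
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise(resolve => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

Because the backend acknowledges a chunk it has already recorded without rewriting it, retrying an upload whose response was lost is safe.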