Project Information
| Course | Repository Link |
|---|---|
| Software Engineering | Project Repo |
| Objective | Build a basic engineering project |
Personal Software Process (PSP) Summary
| Stage | Task Description | Estimated (Min) | Actual (Min) |
|---|---|---|---|
| Planning | Estimation | 30 | 30 |
| Development | Core Implementation | 720 | 805 |
| | Analysis & Learning | 120 | 150 |
| | Documentation | 60 | 60 |
| | Review | 45 | 45 |
| | Standards | 45 | 50 |
| | Design | 60 | 70 |
| | Coding | 180 | 200 |
| | Code Review | 30 | 45 |
| | Testing | 180 | 185 |
| Reporting | Documentation & Retrospective | 80 | 90 |
| | Test Report | 60 | 70 |
| | Size Measurement | 10 | 15 |
| | Process Improvement | 10 | 5 |
| Total | | 830 | 925 |
Interface Design and Implementation
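The core logic computes the cosine similarity between two texts represented as term-frequency vectors. The implementation tokenizes each input string, merges the two token lists into a shared vocabulary, counts how often each vocabulary term occurs in each text, and then computes the dot product and magnitudes of the resulting vectors to obtain the similarity score.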
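As a rough illustration of the final step, the similarity score is the dot product of the two frequency vectors divided by the product of their magnitudes. The sketch below is not the project's actual code: the class name CosineSketch and the method cosine are hypothetical, and returning 0.0 for a zero-magnitude vector is an assumption chosen to agree with the zero-vector unit test in this report.

```java
// Hypothetical sketch; the names here are illustrative, not the project's API.
class CosineSketch {

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    static double cosine(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        // Accumulate in long so large counts do not overflow int.
        long dot = 0;
        long sumSqA = 0;
        long sumSqB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (long) a[i] * b[i];
            sumSqA += (long) a[i] * a[i];
            sumSqB += (long) b[i] * b[i];
        }
        // Assumption: a zero vector has no direction, so return 0.0 instead of NaN.
        if (sumSqA == 0 || sumSqB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(sumSqA) * Math.sqrt(sumSqB));
    }
}
```

Guarding the zero-magnitude case explicitly avoids a division by zero, which in floating-point arithmetic would otherwise yield NaN and propagate into the reported score.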
Performance Analysis
Profiling shows that the most frequent operations are byte[] array manipulation and String processing, concentrated in the file-reading and tokenization stages.
Unit Testing Suite
1. Tokenization Module
import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import java.util.List;

import org.junit.Test;

// Imports assume JUnit 4; with JUnit 5, use the org.junit.jupiter.api equivalents.
public class TokenizerTest {

    // Expects stop words, punctuation, and the possessive "'s" to be dropped.
    @Test
    public void testComplexSentence() {
        String input = "In the dark attic, on the moonbed, inside the turtle's dream.";
        List<String> predicted = Arrays.asList("dark", "attic", "moonbed", "inside", "turtle", "dream");
        List<String> result = TextTokenizer.tokenize(input);
        assertEquals(predicted, result);
    }

    // Expects punctuation to be stripped while letter case is preserved.
    @Test
    public void testPunctuationRemoval() {
        String input = "Ah! Ah! Ah!";
        List<String> predicted = Arrays.asList("Ah", "Ah", "Ah");
        List<String> result = TextTokenizer.tokenize(input);
        assertEquals(predicted, result);
    }

    // Expects the merged vocabulary to keep the order in which terms first appear.
    @Test
    public void testVocabularyMerge() {
        List<String> first = Arrays.asList("apple", "banana");
        List<String> second = Arrays.asList("cherry", "date");
        List<String> predicted = Arrays.asList("apple", "banana", "cherry", "date");
        List<String> result = VocabularyUtil.merge(first, second);
        assertEquals(predicted, result);
    }
}
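A tokenizer and vocabulary merge that would satisfy the three tests above might look like the following. This is only a sketch under assumptions: the report does not show TextTokenizer or VocabularyUtil, so the class TokenizerSketch, the three-word stop list, and the possessive handling are all hypothetical and inferred solely from the expected values in the tests. Removing duplicates during the merge is likewise an assumption, since the tests use disjoint lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; not the project's TextTokenizer or VocabularyUtil.
class TokenizerSketch {

    // Assumed stop-word list; the project's real list is not shown in the report.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("in", "the", "on"));

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // Drop a possessive "'s", then split on any run of non-letter, non-digit characters.
        String cleaned = text.replaceAll("'s\\b", "");
        for (String word : cleaned.split("[^\\p{L}\\p{N}]+")) {
            // Stop words are matched case-insensitively; kept tokens retain their case.
            if (!word.isEmpty() && !STOP_WORDS.contains(word.toLowerCase(Locale.ROOT))) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    static List<String> merge(List<String> first, List<String> second) {
        // LinkedHashSet removes duplicates while keeping first-seen order.
        Set<String> vocabulary = new LinkedHashSet<>(first);
        vocabulary.addAll(second);
        return new ArrayList<>(vocabulary);
    }
}
```

Keeping the vocabulary in first-seen order matters because the frequency vectors are indexed by vocabulary position, so both texts must be counted against the same ordering.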
2. Frequency Calculation Module
import static org.junit.Assert.assertArrayEquals;

import java.util.Arrays;
import java.util.List;

import org.junit.Test;

// Imports assume JUnit 4; with JUnit 5, use the org.junit.jupiter.api equivalents.
public class FrequencyCalculatorTest {

    @Test
    public void testWeightedFrequency() {
        // Input where item count equals its value: 5 appears 5 times, 4 appears 4 times, etc.
        List<String> source = Arrays.asList(
            "5", "5", "5", "5", "5",
            "4", "4", "4", "4",
            "3", "3", "3",
            "2", "2",
            "1"
        );
        List<String> vocab = Arrays.asList("0", "1", "2", "3", "4", "5");
        int[] predicted = {0, 1, 2, 3, 4, 5};
        int[] result = FrequencyAnalyzer.compute(source, vocab);
        assertArrayEquals(predicted, result);
    }

    // An empty source yields a zero count for every vocabulary term.
    @Test
    public void testEmptySource() {
        List<String> source = Arrays.asList();
        List<String> vocab = Arrays.asList("0", "1", "2");
        int[] predicted = {0, 0, 0};
        int[] result = FrequencyAnalyzer.compute(source, vocab);
        assertArrayEquals(predicted, result);
    }

    // An empty vocabulary yields an empty vector, whatever the source contains.
    @Test
    public void testEmptyVocabulary() {
        List<String> source = Arrays.asList("1", "2");
        List<String> vocab = Arrays.asList();
        int[] predicted = {};
        int[] result = FrequencyAnalyzer.compute(source, vocab);
        assertArrayEquals(predicted, result);
    }
}
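The behavior these tests describe can be sketched as a single counting pass followed by a lookup per vocabulary term. The class FrequencySketch below is hypothetical and stands in for FrequencyAnalyzer, whose source is not shown in this report; using a hash map for the counts is one possible design, not necessarily the one the project uses.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the project's FrequencyAnalyzer.
class FrequencySketch {

    // Returns one count per vocabulary term, in vocabulary order.
    static int[] compute(List<String> source, List<String> vocab) {
        // Count every token once so the lookup below is constant time per term.
        Map<String, Integer> counts = new HashMap<>();
        for (String token : source) {
            counts.merge(token, 1, Integer::sum);
        }
        int[] vector = new int[vocab.size()];
        for (int i = 0; i < vocab.size(); i++) {
            // Terms that never occur in the source get a count of 0.
            vector[i] = counts.getOrDefault(vocab.get(i), 0);
        }
        return vector;
    }
}
```

Counting first and looking up afterwards keeps the work proportional to the source length plus the vocabulary size, rather than rescanning the source for every term.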
3. Cosine Similarity Module
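import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Imports assume JUnit 4; with JUnit 5, use the org.junit.jupiter.api equivalents.
public class SimilarityMetricTest {

    // Identical vectors point in the same direction, so the score is 1.
    @Test
    public void testIdenticalVectors() {
        int[] vA = {5, 10, 15};
        int[] vB = {5, 10, 15};
        double score = SimilarityMetric.calculate(vA, vB);
        assertEquals(1.0, score, 0.0001);
    }

    // Orthogonal vectors have a zero dot product, so the score is 0.
    @Test
    public void testOrthogonalVectors() {
        int[] vA = {1, 0, 0};
        int[] vB = {0, 2, 0};
        double score = SimilarityMetric.calculate(vA, vB);
        assertEquals(0.0, score, 0.0001);
    }

    // Vectors pointing in opposite directions score -1.
    @Test
    public void testOppositeDirection() {
        int[] vA = {2, 4, 6};
        int[] vB = {-2, -4, -6};
        double score = SimilarityMetric.calculate(vA, vB);
        assertEquals(-1.0, score, 0.0001);
    }

    // Scaling a vector does not change its direction, so the score stays 1.
    @Test
    public void testProportionalVectors() {
        int[] vA = {3, 3, 3};
        int[] vB = {9, 9, 9};
        double score = SimilarityMetric.calculate(vA, vB);
        assertEquals(1.0, score, 0.0001);
    }

    // A zero vector has no direction; the score is expected to be 0 rather than NaN.
    @Test
    public void testZeroVector() {
        int[] vA = {0, 0, 0};
        int[] vB = {1, 5, 9};
        double score = SimilarityMetric.calculate(vA, vB);
        assertEquals(0.0, score, 0.0001);
    }
}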
4. Test Coverage
The unit test suite achieves a high percentage of code coverage, ensuring critical paths in tokenization, frequency analysis, and vector mathematics are verified.
Exception Handling Strategy
- Argument Parsing: Handles cases where command-line arguments are missing or malformed.
- File I/O: Manages exceptions related to missing input files or permissions issues.
- Output Operations: Catches errors occurring during the writing of results to the file system.
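
The three cases above could be handled along the following lines. This is a sketch under stated assumptions, not the project's entry point: the class CliSketch, the three-argument layout (two input files and one output file), the exit codes, and the placeholder result are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AccessDeniedException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch; argument layout and exit codes are assumptions.
class CliSketch {

    // Returns 0 on success, or a non-zero code identifying the failed stage.
    static int run(String[] args) {
        // Argument parsing: reject a missing or malformed argument list early.
        if (args == null || args.length != 3) {
            System.err.println("Usage: <first input> <second input> <output>");
            return 1;
        }
        Path firstPath;
        Path secondPath;
        Path outputPath;
        try {
            firstPath = Paths.get(args[0]);
            secondPath = Paths.get(args[1]);
            outputPath = Paths.get(args[2]);
        } catch (InvalidPathException e) {
            System.err.println("Malformed path: " + e.getInput());
            return 1;
        }

        // File I/O: report missing files and permission problems separately.
        String first;
        String second;
        try {
            first = read(firstPath);
            second = read(secondPath);
        } catch (NoSuchFileException e) {
            System.err.println("Input file not found: " + e.getFile());
            return 2;
        } catch (AccessDeniedException e) {
            System.err.println("No permission to read: " + e.getFile());
            return 2;
        } catch (IOException e) {
            System.err.println("Failed to read input: " + e.getMessage());
            return 2;
        }

        // Placeholder: the real program would write the similarity score here.
        String report = "characters read: " + (first.length() + second.length())
                + System.lineSeparator();

        // Output operations: catch failures while writing the result.
        try {
            Files.write(outputPath, report.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            System.err.println("Failed to write result: " + e.getMessage());
            return 3;
        }
        return 0;
    }

    private static String read(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }
}
```

Returning a status code from run instead of calling System.exit directly keeps each failure path testable; a main method would simply pass the returned value to System.exit.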