Server-side document processing frequently requires extracting raw text from Word files or packaging unformatted strings into .docx containers. Traditional approaches using Microsoft.Office.Interop.Word demand local Office installations, suffer from version mismatches, and often cause memory leaks in unattended environments. Leveraging a dedicated .NET document processing library eliminates these constraints, enabling reliable, headless format translation.
Dependency Configuration
The conversion workflow relies on a third-party .NET package capable of parsing and generating both formats. Install the library via the NuGet Package Manager Console:
Install-Package Spire.Doc
Extracting Plain Text from Word Documents
Loading a .doc or .docx file and exporting it as a .txt file strips rich formatting while preserving structural line breaks. The underlying engine automatically maps Word paragraphs to newline characters, ensuring readable output without manual string manipulation.
    using Spire.Doc;

    namespace DocumentFormatConverter
    {
        public class WordToTextProcessor
        {
            // Loads a .doc/.docx file and writes its text content to destinationPath.
            public static void ExtractText(string sourcePath, string destinationPath)
            {
                // Dispose the document as soon as the export has finished.
                using (var wordFile = new Document())
                {
                    wordFile.LoadFromFile(sourcePath);
                    wordFile.SaveToFile(destinationPath, FileFormat.Txt);
                }
            }
        }
    }
Processing Behavior:
- Content Filtering: Tables, embedded images, headers, and font attributes are discarded during export. Only raw character data and paragraph delimiters remain.
- Memory Safety: Wrapping the Document instance in a using block guarantees deterministic disposal, preventing unmanaged resource accumulation in long-running applications.
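In a long-running service, the extraction routine is usually applied to many files rather than one. The following is a minimal sketch of such a batch loop, not part of the library: BatchTextExtractor and ConvertFolder are hypothetical names, and the per-file conversion is passed in as a delegate so that the loop itself does not depend on any particular document library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

namespace DocumentFormatConverter
{
    // Hypothetical helper: walks one folder and converts each Word file in it.
    public static class BatchTextExtractor
    {
        // Extensions treated as Word input, compared case-insensitively.
        private static readonly HashSet<string> WordExtensions =
            new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".doc", ".docx" };

        // Converts every Word file directly inside inputDir and returns the
        // paths of the text files written. 'convert' receives
        // (sourcePath, destinationPath) for each file.
        public static IReadOnlyList<string> ConvertFolder(
            string inputDir, string outputDir, Action<string, string> convert)
        {
            Directory.CreateDirectory(outputDir); // no-op if it already exists

            var sources = Directory.EnumerateFiles(inputDir)
                .Where(f => WordExtensions.Contains(Path.GetExtension(f)))
                // Skip the "~$" owner files Word creates while a document is open.
                .Where(f => !Path.GetFileName(f).StartsWith("~$", StringComparison.Ordinal))
                .OrderBy(f => f, StringComparer.Ordinal);

            var written = new List<string>();
            foreach (string source in sources)
            {
                // Keep the original extension in the name ("report.docx.txt")
                // so that report.doc and report.docx cannot overwrite each other.
                string target = Path.Combine(outputDir, Path.GetFileName(source) + ".txt");
                convert(source, target);
                written.Add(target);
            }
            return written;
        }
    }
}
```

With the class from this section, a call might look like BatchTextExtractor.ConvertFolder("incoming", "extracted", WordToTextProcessor.ExtractText); the folder names are placeholders. Because each ExtractText call disposes its own Document, memory use stays bounded by the largest single file rather than by the size of the batch.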
Generating Word Documents from Raw Text
Converting a .txt file into a Word document involves parsing line feeds and reconstructing them as native paragraph objects. The resulting file adopts default document styling, which can be programmatically overridden before serialization.
    using Spire.Doc;
    using Spire.Doc.Documents; // Paragraph lives in this namespace

    namespace DocumentFormatConverter
    {
        public class TextToWordProcessor
        {
            // Reads the text file at textSource and saves it as a .docx at outputPath.
            public static void BuildDocument(string textSource, string outputPath)
            {
                using (var targetDoc = new Document())
                {
                    // LoadText takes the path of the text file, not its contents.
                    targetDoc.LoadText(textSource);

                    // Optional: apply base formatting to every paragraph in every section
                    foreach (Section section in targetDoc.Sections)
                    {
                        foreach (Paragraph para in section.Paragraphs)
                        {
                            para.Format.LineSpacing = 15f;
                        }
                    }

                    targetDoc.SaveToFile(outputPath, FileFormat.Docx2016);
                }
            }
        }
    }
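The optional formatting loop in BuildDocument could be extended to cover margins and fonts as well. The sketch below shows one way this might look; UniformStyler is a hypothetical helper, the numeric values are purely illustrative, and the property names (PageSetup.Margins, ChildObjects, TextRange.CharacterFormat) reflect the Spire.Doc API as commonly documented, so they should be checked against the installed package version.

```csharp
using Spire.Doc;
using Spire.Doc.Documents;
using Spire.Doc.Fields;

namespace DocumentFormatConverter
{
    // Hypothetical helper: applies one set of layout rules to a whole document.
    public static class UniformStyler
    {
        public static void Apply(Document doc)
        {
            foreach (Section section in doc.Sections)
            {
                // Page margins in points (72 points = 1 inch); values are examples.
                section.PageSetup.Margins.Top = 72f;
                section.PageSetup.Margins.Bottom = 72f;
                section.PageSetup.Margins.Left = 90f;
                section.PageSetup.Margins.Right = 90f;

                foreach (Paragraph para in section.Paragraphs)
                {
                    // Space after each paragraph, in points.
                    para.Format.AfterSpacing = 6f;

                    // Font settings are stored on the text runs inside the
                    // paragraph, so each TextRange child is updated in turn.
                    foreach (DocumentObject child in para.ChildObjects)
                    {
                        if (child is TextRange run)
                        {
                            run.CharacterFormat.FontName = "Calibri";
                            run.CharacterFormat.FontSize = 11f;
                        }
                    }
                }
            }
        }
    }
}
```

Under these assumptions, UniformStyler.Apply(targetDoc) would be called after LoadText and before SaveToFile, so the styling is serialized together with the content.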
Conversion Characteristics:
- Paragraph Mapping: Each line break in the source file translates directly to a distinct Paragraph object in the Word structure.
- Styling Flexibility: Developers can iterate through generated sections to inject uniform margins, font families, or spacing rules prior to saving.
- Structural Limitations: Since plain text lacks metadata, the reconstruction process cannot infer or restore complex layouts such as multi-column arrangements, footnotes, or embedded objects from the original source.
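To make the paragraph mapping concrete, the sketch below splits raw text into one string per paragraph using only the .NET base library. It illustrates the general idea rather than Spire.Doc's internal implementation; PlainTextParagraphs is a hypothetical name, and treating CRLF, LF, and a lone CR as equivalent line breaks is an assumption of this example.

```csharp
using System;
using System.Collections.Generic;

namespace DocumentFormatConverter
{
    // Hypothetical helper: turns raw text into one string per paragraph.
    public static class PlainTextParagraphs
    {
        // "\r\n" is listed first so that a Windows line ending is consumed
        // as a single break instead of being counted twice.
        private static readonly string[] LineBreaks = { "\r\n", "\n", "\r" };

        // Returns the text between line breaks, in order. Empty lines are kept
        // as empty strings so that blank paragraphs survive the round trip.
        public static IReadOnlyList<string> Split(string text)
        {
            if (text == null) throw new ArgumentNullException(nameof(text));
            return text.Split(LineBreaks, StringSplitOptions.None);
        }
    }
}
```

For example, PlainTextParagraphs.Split("a\r\nb\n\nc") yields four entries: "a", "b", an empty string, and "c". A file that ends with a line break produces a trailing empty entry, which a caller may want to drop before creating paragraphs. Each entry carries only characters, which is why the layouts listed under Structural Limitations cannot be recovered from it.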