Anthropic skill

docx

Use this skill whenever the user wants to create, read, edit, or manipulate Word documents (.docx files). Triggers include: any mention of 'Word doc', 'word document', '.docx', or requests to produce professional documents with formatting like tables of contents, headings, page numbers, or letterheads. Also use when extracting or reorganizing content from .docx files, inserting or replacing images in documents, performing find-and-replace in Word files, working with tracked changes or comments, or converting content into a polished Word document. If the user asks for a 'report', 'memo', 'letter', 'template', or similar deliverable as a Word or .docx file, use this skill. Do NOT use for PDFs, spreadsheets, Google Docs, or general coding tasks unrelated to document generation.

claude, google
#skill
#agent
#workflow
#pdf
#documents
Last updated

about 12 hours ago

Repository path

skills/docx

Source files

0

Use cases

| Task | Approach | |------|----------| | Read/analyze content | `pandoc` or unpack for raw XML | | Create new document | Use `docx-js` - see Creating New Documents below | | Edit existing document | Unpack → edit XML → repack - see Editing Existing Documents below |
**Landscape orientation:** docx-js swaps width/height internally, so pass portrait dimensions and let it handle the swap: ```javascript size: { width: 12240, // Pass SHORT edge as width height: 15840, // Pass LONG edge as height orientation: PageOrientation.LANDSCAPE // docx-js swaps them in the XML }, // Content width = 15840 - left margin - right margin (uses the long edge) ```
Use Arial as the default font (universally supported). Keep titles black for readability.
```javascript const doc = new Document({ styles: { default: { document: { run: { font: "Arial", size: 24 } } }, // 12pt default paragraphStyles: [ // IMPORTANT: Use exact IDs to override built-in styles { id: "Heading1", name: "Heading 1", basedOn: "Normal", next: "Normal", quickFormat: true, run: { size: 32, bold: true, font: "Arial" }, paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 } }, // outlineLevel required for TOC { id: "Heading2", name: "Heading 2", basedOn: "Normal", next: "Normal", quickFormat: true, run: { size: 28, bold: true, font: "Arial" }, paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 } }, ] }, sections: [{ children: [ new Paragraph({ heading: HeadingLevel.HEADING_1, children: [new TextRun("Title")] }), ] }] }); ```
### Lists (NEVER use unicode bullets)

Install / setup

# DOCX creation, editing, and analysis ## Overview A .docx file is a ZIP archive containing XML files. ## Quick Reference | Task | Approach | |------|----------| | Read/analyze content | `pandoc` or unpack for raw XML | | Create new document | Use `docx-js` - see Creating New Documents below | | Edit existing document | Unpack → edit XML → repack - see Editing Existing Documents below | ### Converting .doc to .docx Legacy `.doc` files must be converted before editing: ```bash python scripts/office/soffice.py --headless --convert-to docx document.doc ``` ### Reading Content ```bash # Text extraction with tracked changes pandoc --track-changes=all document.docx -o output.md # Raw XML access python scripts/office/unpack.py document.docx unpacked/ ``` ### Converting to Images ```bash python scripts/office/soffice.py --headless --convert-to pdf document.docx pdftoppm -jpeg -r 150 document.pdf page ``` ### Accepting Tracked Changes To produce a clean document with all tracked changes accepted (requires LibreOffice): ```bash python scripts/accept_changes.py input.docx output.docx ``` --- ## Creating New Documents Generate .docx files with JavaScript, then validate. Install: `npm install -g docx` ### Setup ```javascript const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun, Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink, InternalHyperlink, Bookmark, FootnoteReferenceRun, PositionalTab, PositionalTabAlignment, PositionalTabRelativeTo, PositionalTabLeader, TabStopType, TabStopPosition, Column, SectionType, TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType, VerticalAlign, PageNumber, PageBreak } = require('docx'); const doc = new Document({ sections: [{ children: [/* content */] }] }); Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer)); ``` ### Validation After creating the file, validate it. If validation fails, unpack, fix the XML, and repack. ```bash python scripts/office/validate.py doc.docx ``` ### Page Size ```javascript // CRITICAL: docx-js defaults to A4, not US Letter // Always set page size explicitly for consistent results sections: [{ properties: { page: { size: { width: 12240, // 8.5 inches in DXA height: 15840 // 11 inches in DXA }, margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } // 1 inch margins } }, children: [/* content */] }] ``` **Common page sizes (DXA units, 1440 DXA = 1 inch):** | Paper | Width | Height | Content Width (1" margins) | |-------|-------|--------|---------------------------| | US Letter | 12,240 | 15,840 | 9,360 | | A4 (default) | 11,906 | 16,838 | 9,026 | **Landscape orientation:** docx-js swaps width/height internally, so pass portrait dimensions and let it handle the swap: ```javascript size: { width: 12240, // Pass SHORT edge as width height: 15840, // Pass LONG edge as height orientation: PageOrientation.LANDSCAPE // docx-js swaps them in the XML }, // Content width = 15840 - left margin - right margin (uses the long edge) ``` ### Styles (Override Built-in Headings) Use Arial as the default font (universally supported). Keep titles black for readability. ```javascript const doc = new Document({ styles: { default: { document: { run: { font: "Arial", size: 24 } } }, // 12pt default paragraphStyles: [ // IMPORTANT: Use exact IDs to override built-in styles { id: "Heading1", name: "Heading 1", basedOn: "Normal", next: "Normal", quickFormat: true, run: { size: 32, bold: true, font: "Arial" }, paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 } }, // outlineLevel required for TOC { id: "Heading2", name: "Heading 2", basedOn: "Normal", next: "Normal", quickFormat: true, run: { size: 28, bold: true, font: "Arial" }, paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 } }, ] }, sections: [{ children: [ new Paragraph({ heading: HeadingLevel.HEADING_1, children: [new TextRun("Title")] }), ] }] }); ``` ### Lists (NEVER use unicode bullets) ```javascript // ❌ WRONG - never manually insert bullet characters new Paragraph({ children: [new TextRun("• Item")] }) // BAD new Paragraph({ children: [new TextRun("\u2022 Item")] }) // BAD // ✅ CORRECT - use numbering config with LevelFormat.BULLET const doc = new Document({ numbering: { config: [ { reference: "bullets", levels: [{ level: 0, format: LevelFormat.BULLET, text: "•", alignment: AlignmentType.LEFT, style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] }, { reference: "numbers", levels: [{ level: 0, format: LevelFormat.DECIMAL, text: "%1.", alignment: AlignmentType.LEFT, style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] }, ] }, sections: [{ children: [ new Paragraph({ numbering: { reference: "bullets", level: 0 }, children: [new TextRun("Bullet item")] }), new Paragraph({ numbering: { reference: "numbers", level: 0 }, children: [new TextRun("Numbered item")] }), ] }] }); // ⚠️ Each reference creates INDEPENDENT numbering // Same reference = continues (1,2,3 then 4,5,6) // Different reference = restarts (1,2,3 then 1,2,3) ``` ### Tables **CRITICAL: Tables need dual widths** - set both `columnWidths` on the table AND `width` on each cell. Without both, tables render incorrectly on some platforms. ```javascript // CRITICAL: Always set table width for consistent rendering // CRITICAL: Use ShadingType.CLEAR (not SOLID) to prevent black backgrounds const border = { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" }; const borders = { top: border, bottom: border, left: border, right: border }; new Table({ width: { size: 9360, type: WidthType.DXA }, // Always use DXA (percentages break in Google Docs) columnWidths: [4680, 4680], // Must sum to table width (DXA: 1440 = 1 inch) rows: [ new TableRow({ children: [ new TableCell({ borders, width: { size: 4680, type: WidthType.DXA }, // Also set on each cell shading: { fill: "D5E8F0", type: ShadingType.CLEAR }, // CLEAR not SOLID margins: { top: 80, bottom: 80, left: 120, right: 120 }, // Cell padding (internal, not added to width) children: [new Paragraph({ children: [new TextRun("Cell")] })] }) ] }) ] }) ``` **Table width calculation:** Always use `WidthType.DXA` — `WidthType.PERCENTAGE` breaks in Google Docs. ```javascript // Table width = sum of columnWidths = content width // US Letter with 1" margins: 12240 - 2880 = 9360 DXA width: { size: 9360, type: WidthType.DXA }, columnWidths: [7000, 2360] // Must sum to table width ``` **Width rules:** - **Always use `WidthType.DXA`** — never `WidthType.PERCENTAGE` (incompatible with Google Docs) - Table width must equal the sum of `columnWidths` - Cell `width` must match corresponding `columnWidth` - Cell `margins` are internal padding - they reduce content area, not add to cell width - For full-width tables: use content width (page width minus left and right margins) ### Images ```javascript // CRITICAL: type parameter is REQUIRED new Paragraph({ children: [new ImageRun({ type: "png", // Required: png, jpg, jpeg, gif, bmp, svg data: fs.readFileSync("image.png"), transformation: { width: 200, height: 150 }, altText: { title: "Title", description: "Desc", name: "Name" } // All three required })] }) ``` ### Page Breaks ```javascript // CRITICAL: PageBreak must be inside a Paragraph new Paragraph({ children: [new PageBreak()] }) // Or use pageBreakBefore new Paragraph({ pageBreakBefore: true, children: [new TextRun("New page")] }) ``` ### Hyperlinks ```javascript // External link new Paragraph({ children: [new ExternalHyperlink({ children: [new TextRun({ text: "Click here", style: "Hyperlink" })], link: "https://example.com", })] }) // Internal link (bookmark + reference) // 1. Create bookmark at destination new Paragraph({ heading: HeadingLevel.HEADING_1, children: [ new Bookmark({ id: "chapter1", children: [new TextRun("Chapter 1")] }), ]}) // 2. Link to it new Paragraph({ children: [new InternalHyperlink({ children: [new TextRun({ text: "See Chapter 1", style: "Hyperlink" })], anchor: "chapter1", })]}) ``` ### Footnotes ```javascript const doc = new Document({ footnotes: { 1: { children: [new Paragraph("Source: Annual Report 2024")] }, 2: { children: [new Paragraph("See appendix for methodology")] }, }, sections: [{ children: [new Paragraph({ children: [ new TextRun("Revenue grew 15%"), new FootnoteReferenceRun(1), new TextRun(" using adjusted metrics"), new FootnoteReferenceRun(2), ], })] }] }); ``` ### Tab Stops ```javascript // Right-align text on same line (e.g., date opposite a title) new Paragraph({ children: [ new TextRun("Company Name"), new TextRun("\tJanuary 2025"), ], tabStops: [{ type: TabStopType.RIGHT, position: TabStopPosition.MAX }], }) // Dot leader (e.g., TOC-style) new Paragraph({ children: [ new TextRun("Introduction"), new TextRun({ children: [ new PositionalTab({ alignment: PositionalTabAlignment.RIGHT, relativeTo: PositionalTabRelativeTo.MARGIN, leader: PositionalTabLeader.DOT, }), "3", ]}), ], }) ``` ### Multi-Column Layouts ```javascript // Equal-width columns sections: [{ properties: { column: { count: 2, // number of columns space: 720, // gap between columns in DXA (720 = 0.5 inch) equalWidth: true, separate: true, // vertical line between columns }, }, children: [/* content flows naturally across columns */] }] // Custom-width columns (equalWidth must be false) sections: [{ properties: {

Overview

A .docx file is a ZIP archive containing XML files.

Quick Reference

| Task | Approach | |------|----------| | Read/analyze content | `pandoc` or unpack for raw XML | | Create new document | Use `docx-js` - see Creating New Documents below | | Edit existing document | Unpack → edit XML → repack - see Editing Existing Documents below |

Converting .doc to .docx

Legacy `.doc` files must be converted before editing: ```bash python scripts/office/soffice.py --headless --convert-to docx document.doc ```

Reading Content

```bash

Text extraction with tracked changes

pandoc --track-changes=all document.docx -o output.md

Repository files