Introduction

The following example demonstrates how to load an existing PDF document, extract the text located in a specific area of its first page, and write that text to a newly created PDF document.

Get text located in a specific area of a PDF document (C#)

public static void GetTextFromArea()
{
  Console.WriteLine( "=== GET TEXT FROM AN AREA ===" );
  var outputFileName = "GetTextFromArea.pdf";
  var outputPath = TextsSample.TextsSampleOutputDirectory + outputFileName;

  // Load a pdf document.
  using( var pdfInput = PdfDocument.Load( TextsSampleResourcesDirectory + @"Two Page Text Only - from libre office.pdf" ) )
  {
    // Get first page of input document.
    var page = pdfInput.Pages[ 0 ];

    // Get Text from a specific area of the first page.
    var areaText = page.GetTextFromArea( new Rectangle( 297, 77, 75, 12 ) );

    // Create an output Pdf to display areaText.
    using( var pdfoutput = PdfDocument.Create( outputPath ) )
    {
      // Get first page of output pdf.
      var outputPage = pdfoutput.Pages[ 0 ];

      // Set the title.
      var titleFont = pdfoutput.Fonts.GetStandardFont( StandardFontType.Helvetica );
      outputPage.AddParagraph( "Get Text From Area", TextStyle.WithFont( titleFont, 15 ), new ParagraphStyle( ParagraphHorizontalAlignment.Center ) );

      // Display the areaText.
      var textStyle = TextStyle.WithFont( titleFont, 12 );
      outputPage.AddText( $"The text found in the area (297, 77, 75, 12) is: \"{areaText}\".", new Point( 110, 145 ), textStyle );

      // Save the output document.
      pdfoutput.Save();
      Console.WriteLine( $"Created: {outputFileName}" );
    }
  }
}
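
The four numbers passed to the Rectangle constructor above are easy to misread as two corner points. The sketch below is plain arithmetic, not part of the PDF library: AreaMath and ToCorners are hypothetical names introduced here for illustration, and the sketch assumes the four values mean (x, y, width, height), which should be confirmed against the library's Rectangle documentation.

```csharp
using System;

// Hypothetical helper for illustration only; it does not call the PDF library.
public static class AreaMath
{
  // Assumption: the area is described as (x, y, width, height).
  // Returns the left, top, right and bottom edges of that area.
  public static (double Left, double Top, double Right, double Bottom) ToCorners(
    double x, double y, double width, double height )
  {
    if( width < 0 )
      throw new ArgumentOutOfRangeException( nameof( width ), "Width must be non-negative." );
    if( height < 0 )
      throw new ArgumentOutOfRangeException( nameof( height ), "Height must be non-negative." );

    // The far edges are the origin offset by the size.
    return ( x, y, x + width, y + height );
  }
}
```

Under that assumption, AreaMath.ToCorners( 297, 77, 75, 12 ) yields a right edge of 372 and a bottom edge of 89.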

Learn More

To learn how to extract only the hyperlinks from a PDF document, see the "Get Hyperlinks" example.
