Researchers Develop a Secure Way to Hide Information in Text

Amy J. Born | May 21, 2018

Computer scientists at Columbia Engineering have developed a way to alter text in order to protect its validity or provide additional information, without changing its appearance.

Someone using FontCode would supply a secret message and a carrier text document. FontCode converts the secret message to a bit string (ASCII or Unicode) and then into a sequence of integers. Each integer is assigned to a five-letter block in the regular text where the numbered locations of each letter sum to the integer. Source: Changxi Zheng/Columbia EngineeringFontCode, developed by associate professor of computer science Changxi Zheng and two students, uses imperceptible changes in the shape of fonts to embed metadata, a URL or digital signature without noticeably changing the look of the document. These changes, known as perturbations, work with most fonts and document types. The information remains encoded even when the document is printed or converted to a different file type.

"Changing any letter, punctuation mark, or symbol into a slightly different form allows you to change the meaning of the document," said Chang Xiao, the lead author of the paper on FontCode. "This hidden information, though not visible to humans, is machine-readable just as barcodes and QR codes are instantly readable by computers. However, unlike barcodes and QR codes, FontCode doesn't mar the visual aesthetics of the printed material, and its presence can remain secret."

In addition to embedding information, FontCode can encrypt data to increase the level of security. The perturbations are stored in numbered locations in a codebook. Parties wishing to protect the encoded information would agree on a key that specifies the number or location of each perturbation.

In order to ensure that the original message can be recovered even when some letters are not recognized correctly, the team employed a 1,700-year-old Chinese remainder theorem that uses remainders and multiple divisors to identify unknown numbers.

"Imagine having three unknown variables," said Zheng. "With three linear equations, you should be able to solve for all three. If you increase the number of equations from three to five, you can solve the three unknowns as long as you know any three out of the five equations."

The researchers showed that messages with 25 percent of the perturbations unrecognized could still be recovered and they claim that, theoretically, the error rate could go even higher.

More information is available on the project website. The paper will be presented at SIGGRAPH in Vancouver, British Columbia, August 12-16.