Software by Design (tm)
Visit our new division - Infinite Gravity Data Services

Home
Up

 

Send us
Feedback & Ideas
FeedBack

Digital Timestamping

Vandenberg Systems is embarking on a corporate-wide project to use Digital Timestamping to protect our intellectual property rights; to provide a verifiable and extremely detailed audit trail of our activities, and to protect the rights of our clients and their data.

This page is not currently meant as a "production" page. It is a collection of our notes and ideas, and will be expanded as time goes on.

While we maintain copyright and intellectual rights on our work, we believe that in order for a timestamping system to be secure and to be legally accepted, the system and algorithm must be open and easily examined. That is the reason for placing this information on our web site.

 

moreinfo1.gif (2219 bytes)

Reasons for timestamping

There are many situation where it is important to be able to establish that a document existed on a certain date. Instances such as copyright disputes or patent infringement cases are common examples.

In a paper-based world, you can take your documents to a notary and have them sign & seal the documents. Or you could use a lawyer to safeguard copies of the information.

In a digital world, you could print the information and do the same, but there are other issues at stake as well. For example: How would you notarize a file that, when printed, might use up 10 million sheets of paper?

With Digital Timestamping, you can perform your own "notarization", and any time, as often as required. And with the right system & algorithms, you can "sandwich" the authentication certificate in time. You can say, unequivocally, that the document existed, for example,  after January 1, 1999 at 12:38pm and before January 3, 1999 at 3:45am.  You can, with sufficient inputs, authenticate a data set to within less than a second.

More commonly, and with considerably less data storage, you can lock a record in time within a 1-hour interval. For most office, R&D, and general business environments, this is sufficient.

Hash codes

A hash code (also called a message digest) can be thought of as a fingerprint for a set of data. Hash functions have long been used in computer science for a great many years to produce "fingerprints" of a data set. A simple hash function may take all the characters of a data set and add them up, producing a number. This is not a very good hash function, but it serves to illustrate. We know this is a bad hash function simply because it produces the same output no matter which order the characters are in. But we can say this: If two hash codes are not the same, then the input is not the same.

One-way hash functions are an extremely secure form of hash function. They are termed 'one-way' because it is easy to compute a hash value from a set of data, but it is extremely hard to create a set of data that will produce a particular hash value.

As an example of what a hash code looks like, we computed the hash of an example file (example 1). The value is:
B745ACE71DEB8B0DB0DEB55C31B379DA1DEE
26E894F5ADCCA4B56BDCBFD2183458D32B79

We then created a second example file (example 2). The two files are identical, except for one character. (Can you find it?). The hash value of this file is:
19010336D0F8A845BC52179A21770E268205E
F0AAD65311A70D414CF6374D3BAD4C18B23

Notice how different the hash codes are. The input files are different by only 1 bit. This is an example of a good characteristic in a hash function. Each bit in the input data set can and will affect each and every bit of the output. We have demonstrated that here.

We use two common and trusted hashing algorithms - SHA-1 and MD5. We concatenate the output of both functions into a 288-bit (36-byte) value. The reason we use both values is twofold: 1) we're just plain paranoid, and 2) it's been suggested that MD5 has some subtle weaknesses.

Using two algorithms strengthens our hashing functions. Cryptographers of the future will have to break both algorithms in order to break the strength of the combined hash functions.

For reference, the probability of finding a bitstring that produces the same hash code as another (different) bitstring is 1 in 2288, or 4.97 x 1086.

Hashing with passwords

We have demonstrated how changing a single bit in file radically changes the hash value. Adding extra information to the file does the same thing. Using a password in conjunction with a hashing function is trivial. Simply append or prepend your password to the data set you are hashing. The only way someone will be able to duplicate the hash value would be to have the same data set and the same password.

Chaining

Now that we know we can use a password with hash values, we have something upon which we can build the next level. We know that for any given hash value, there is only one data set that produces that value. We know that we could add some security to that by concatenating a password to the data set prior to hashing.

Now imagine what would happen if we chained the hash values together. That is, if, instead of a single password used for all data sets, we use the previous hash value as the password for the current data set.

Because every hash value is based on both the data set and the previous hash value, we can prove that the value was computed after the previous hash value. We can also prove the reverse - that a given hash value came prior to the next one.

This is essentially the same idea as Cipher Block Chaining (CBC) mode encryption, used extensively with symmetric key block ciphers.

Setting the bottom limit on the time frame

The bottom limit, or the earliest possible time, is based on the previous record. Because requests must, by definition, be processed sequentially, a given request cannot have been timestamped earlier than the one before it, in sequence.

To ensure that our timestamping intervals are not too long, we read data from the NIST servers and use the timestamp bitstring received as a request to the timestamping system.

We also chose to use extracts from a variety of web pages and email lists as a source of information to back up the NIST information. For example, the New York Times (nytimes.com) updates their top stories on 10-minute intervals. Copies of their web page are submitted for timestamping. Information & data & events occur on a very widely-witnessed scale, and it this widely-witnessed aspect that lends credibility to the bottom limit. We also store information from other newspapers and messages from government organizations as source of validation data.

Setting the upper limit on the time frame

The upper limit on an interval can be set by publishing a master 'hash of hashes'. Without publication, there would be nothing to stop us from changing a data set, and re-computing each and every hash value that came after.

By publishing a master hash value in a widely witnessed arena, we effectively "lock" the data upon which the master hash value depends. Remember, even a single-bit change in a data set will cause a complete change in the hash value. There is only one set of data that could create the master hash value, and by publishing the value, we are telling everyone what the value is. Now we can't go back and change anything, because the master hash value would be different.

At Vandenberg Systems, we will be publishing the information here on our web pages, in a newspaper, and in several news groups. See here for a list of hash codes posted to newsgroups.

Checkpoints

The 'master hash' is produced whenever the system checkpoints the timestamp signatures. During a checkpoint operation, all the timestamp signatures that were produced after the last checkpoint are concatenated into one long bitstring, a hash is produced from the bitstring, and that hash is timestamped.

The new master hash is computed by concatenating the checkpoint signature with the previous master hash value and then timestamping that.

Thus, all timestamps, including the checkpoints and master hash values are linked, and locked in time.

Each checkpoint has it's own ID number associated with it.

Repository

The files than have been timestamped are copied into a folder called the Repository. Each file is compressed in a ZIP-format archive to reduce space, and the file containing the timestamp certificate data ("certificate.txt") is stored in the archive as well.

Repository data is moved to CD on a regular basis.

Linking Protocol

The basic record-linking protocol is as follows:

Cn = H(n, IDn, Tn, Hn; n-1, IDn-1, Tn-1, Hn-1; H(n-1, IDn-1, Tn-1, Hn-1))

Hash function H() is the concatenation of the output of SHA-1 and MD5, and is 288 bits in length.

Cn is the hash value computed on the certificate data.

Checkpoints and master hash values are also stored in the same data file, and are subject to the same linking.

Certificates

A certificate contains all the data required to authenticate a timestamp. It includes all the elements used in the Linking Protocol described above, plus a unique name for the timestamp record. The name is based on the the SHA-1 component of Cn, and two 32-bit integer values representing the checkpoint group that the certificate is found in, and the index of the timestamp within that checkpoint group, respectively.

Future Uses

We envision uses for Digital Timestamps, particularly ones using cryptographic hash functions and linking protocols in the realm of on-line transaction processing, lottery ticket sales, auditing and many other areas where the security of the data and it's existence at a particular point in time are important.

Our primary use for this system is to protect our intellectual property rights as well as the rights of our clients.

 

© Copyright 1994-2005 Vandenberg Systems Incorporated. All Rights Reserved. Terms of Use.