Research Hell, um, I mean Research Query

By Holly Tucker (W&M Editor)

Hi everyone…we interrupt our regular programming for an urgent call for help.

I’m in the thick of writing and research for my next book, which is under contract with W.W. Norton. I’m not allowed to say much about it right now…other than that I may genuinely be more excited about this one than I was about my last one (Blood Work: A Tale of Medicine and Murder in the Scientific Revolution).

So here’s the challenge:

The current book requires me to consult thousands upon thousands of pages of court records and interrogation reports. Many of these reports exist only in manuscript scribble, which brings its own special form of suffering, err, delight. But for the moment, I’m focusing on about five hefty volumes of transcribed texts.

The documents are in PDF format and, another wrinkle, all in French. They contain testimonies from hundreds of people who were each ostensibly up to some fabulously wicked things, OR who were being accused of similarly fabulously wicked things.

To make sense of the avalanche of accounts, I need to put together an index or concordance of each witness and each person who is named in the accounts. This is critical for me to be able to figure out who was up to what, when, and why.

Really, I just want something simple like this:

Curie, Marie. 5, 29, 523, 1502.

Curie, Pierre. 16, 99, 504, 1412.

(Ideally, the page numbers would be hot-linked to their locations in the PDF itself.)

I could create a hand-coded index, a process that would take me well into my twilight years. Or I could extract each letter, report, and record contained in the PDFs, import each one into my database (Devonthink Pro Office, or DTPO), and tap DTPO’s “see also” AI functions. But, call me crazy, I’d like to finish this book at some point in the foreseeable future. (And so would my editor…who is expecting the manuscript in the next 15 months. Gulp.)

I’ve looked at book-indexing programs like Cindex ($500+!) and PDF Index Generator (which could work, if the computerized voice in the tutorials didn’t bug me so much), as well as many articles on indexing strategies (see, for example, this ProfHacker article).

Adobe Acrobat Pro will let me save individual searches into nifty files. But it doesn’t seem to be able to combine multiple searches into a single hot-linked document. So I could end up with hundreds of separate files for the hundreds of people and historical figures I’m trying to track right now. (Cue sound of head pounding on wall.)
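For anyone comfortable running a short script, here is a minimal sketch of the step Acrobat won't do: run many name searches and merge the results into one hot-linked index in the "Curie, Marie. 5, 29" format. It assumes the text has already been extracted from the PDF into a dict mapping page number to page text (the extraction step is not shown) and that you supply the list of names yourself; `build_index` is a hypothetical helper, not a feature of any existing tool. The links use the `#page=N` fragment from Adobe's PDF Open Parameters, which many (not all) PDF viewers honor. Note that these are PDF page positions, which may differ from the printed page numbers in the volumes.

```python
import re
from collections import defaultdict
from html import escape


def build_index(pages, names, pdf_file):
    """Return one HTML line per name, listing the pages that mention it.

    Hypothetical helper (an assumption, not an existing tool):
    pages    -- dict mapping page number -> already-extracted page text
    names    -- list of (last, first) tuples to search for
    pdf_file -- path or URL of the PDF, used to build #page=N links
    """
    hits = defaultdict(list)
    for last, first in names:
        # Match "Marie Curie" or "Curie, Marie", ignoring case, on word boundaries.
        pattern = re.compile(
            rf"\b(?:{re.escape(first)}\s+{re.escape(last)}"
            rf"|{re.escape(last)},\s*{re.escape(first)})\b",
            re.IGNORECASE,
        )
        for number in sorted(pages):
            if pattern.search(pages[number]):
                hits[(last, first)].append(number)

    lines = []
    for last, first in sorted(names):
        # Each page number becomes a link that opens the PDF at that page.
        links = ", ".join(
            f'<a href="{escape(pdf_file)}#page={n}">{n}</a>'
            for n in hits[(last, first)]
        )
        lines.append(f"<p>{escape(last)}, {escape(first)}. {links}</p>")
    return lines
```

Saving the returned lines into a single .html file next to the PDF would give one clickable index instead of hundreds of saved-search files. Matching only exact "First Last" and "Last, First" forms is a deliberate simplification; spelling variants would need their own entries in `names`.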

As those of you who know me can attest, I usually get excited about finding just the right tool for just the right research need. For the record, the programs I could not live without are Dropbox, Devonthink Pro Office (database), Scrivener (writing program), Sente (citation manager), and Aeon Timeline, which has been amazing as I plot the book’s chronology.


Alas, as much as I wish that I were a true Research Sensei, I admit only to being a clueless dolt. This indexing question is truly kicking me in the pants.

So, will you help this poor soul? Share tips? Strategies? (NB: No deep programming skills required, please. The whole point is to simplify the process, not to complicate it with a steep learning curve.)

And if all else fails, offers to buy me a drink and join me in commiseration?




  1. Heinrich C. Kuhn says

    Are these PDFs of yours text files (i.e., the manuscripts have already been transcribed), or just images?

    If they are text files, my approach would be to transfer them into MS Word files.

    Then write a macro that goes through your text and automatically indexes every two-word string in which both words start with a capital letter. (Because your texts are in French, only the names of persons and geographical items should be capitalised, and restricting the search to two-word strings should eliminate most of the geographical terms.)
    Then tell Word to create your index.

    You would then have to deal manually with different spellings of the name(s) of one and the same person, etc., but that should not take too much time.
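Heinrich's suggestion is a Word macro; as a rough equivalent outside Word, here is the same heuristic sketched in Python (swapped in for a macro language). It assumes per-page text has already been extracted into a dict mapping page number to text, flags every pair of adjacent capitalised words (overlapping pairs included, so "Madame Marie Curie" yields both "Madame Marie" and "Marie Curie"), and records the pages where each pair occurs. `candidate_names` and the accented-letter character classes are illustrative assumptions. As Heinrich notes, spelling variants still need manual merging; titles and sentence-initial words will also produce extra pairs to weed out, and names with lowercase particles such as "de" will be missed.

```python
import re
from collections import defaultdict

# One capitalised word: an uppercase letter (accented French capitals included)
# followed by lowercase letters, apostrophes or hyphens. Illustrative assumption.
CAP_WORD = r"[A-ZÀÂÄÇÉÈÊËÎÏÔÖÙÛÜŸŒÆ][a-zàâäçéèêëîïôöùûüÿœæ'’-]+"

# Lookahead + capture group, so overlapping pairs are all reported.
CAP_PAIR = re.compile(rf"(?=\b({CAP_WORD}\s+{CAP_WORD}))")


def candidate_names(pages):
    """Map each pair of adjacent capitalised words to the sorted pages it occurs on.

    pages -- dict mapping page number -> already-extracted page text (hypothetical input)
    """
    index = defaultdict(set)
    for number, text in pages.items():
        for match in CAP_PAIR.finditer(text):
            # Collapse line breaks and repeated spaces so "Marie\nCurie" == "Marie Curie".
            name = " ".join(match.group(1).split())
            index[name].add(number)
    return {name: sorted(nums) for name, nums in sorted(index.items())}
```

The output is a list of candidates, not a finished index: it is meant to be skimmed, pruned of false positives, and merged by hand before being turned into entries.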



    • says

      As I read through the primary docs, it doesn’t look like the names are given in a predictable (First Last) pattern. :( I’ll see if I can find other patterns to search for.

    • says

      Hmmm…food for thought. I’ll read through the text and get a much better sense of how the names are presented. (I’m not sure they’re always First Last.) Once I see a pattern, I’ll give this a try.

  2. mona everett says

    My head is spinning. Where do I send the drink money? LOL! No, wait. I’ll just drink it. You won’t have time. Sorry.

    • says

      Good whiskey? Count me in! I’m now trying to use Adobe Acrobat’s annotation tool. The trouble is getting the file into a format that the OCR likes. The idea is that I’ll read for names and annotate each one; then I can search the annotations for specific names. It’s still not as automated as I’d like, though…
