Dear Friends

 

We need information about a PDF metadata extractor, preferably one that is free and works offline. Please share details if anybody is using such a tool. It is meant to be used in combination with DSpace, but we cannot put our collection online.

Any help will be highly appreciated.

Thank you

Subeesh A C

Views: 1038

Replies to This Forum

Try ExifTool

http://www.sno.phy.queensu.ca/~phil/exiftool/

I have been using it to extract metadata from PDFs for use in DSpace. If you are familiar with command-line options, you can extract the metadata from all PDFs in one go.

S. Baskar

Thank you very much sir

But when I tried it, the tool seemed to extract data only from the document properties. Are you getting the appropriate data with ExifTool?

Subeesh A C

Hi,

Using the command below, you can extract all metadata (i.e. all metadata tags associated with each PDF document) from hundreds of PDF documents and save it as a CSV file, which can then be used for batch import into DSpace.

If you require only specific tags, you have to name the required metadata tags in the command. I have given an example below.

To extract all available metadata tags from the PDF documents and save them as a CSV file

---------------------------------------------------------------------------------------------------------------------

exiftool -csv  *.pdf > output.csv

To extract specific metadata tags from the PDF documents and save them as a CSV file

-----------------------------------------------------------------------------------------------------------------------------

exiftool -csv -d %Y-%m-%d -Title -Author -Producer -Subject -Description -Type -Keywords -ISBN -CreateDate -CourseID -FileSize -PageCount -PDFVersion *.pdf > output.csv

Hope this helps.


S. Baskar

LinuXpert Systems

ExifTool Tag Names

The tables listed below give the names of all tags recognized by ExifTool.

http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/index.html
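
The CSV written by the commands above uses ExifTool tag names (Title, Author and so on) as column headers, with the file path in a SourceFile column, while DSpace metadata CSVs use Dublin Core field names such as dc.title. A minimal sketch of that renaming step follows, assuming Python is available offline. The function name, the "filename" column and the TAG_TO_DC mapping are hypothetical choices for illustration, not part of ExifTool or DSpace, and the mapping should be adapted to your own metadata schema.

```python
import csv

# Hypothetical mapping from ExifTool column names to Dublin Core fields.
# This is an assumption for illustration; adjust it to your DSpace schema.
TAG_TO_DC = {
    "Title": "dc.title",
    "Author": "dc.contributor.author",
    "Keywords": "dc.subject",
    "Description": "dc.description",
    "CreateDate": "dc.date.issued",
}


def exiftool_csv_to_dc(src, dst, mapping=TAG_TO_DC):
    """Copy rows from an ExifTool CSV to a CSV with Dublin Core headers.

    `src` and `dst` are open text file objects. Only mapped columns that
    exist in the input are kept. SourceFile is carried over as 'filename'
    so each row can still be matched to its PDF. Returns the row count.
    """
    reader = csv.DictReader(src)
    present = reader.fieldnames or []
    kept = [(tag, dc) for tag, dc in mapping.items() if tag in present]

    fieldnames = ["filename"] + [dc for _, dc in kept]
    writer = csv.DictWriter(dst, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()

    count = 0
    for row in reader:
        out = {"filename": row.get("SourceFile", "")}
        for tag, dc in kept:
            # Empty or missing cells become empty strings.
            out[dc] = (row.get(tag) or "").strip()
        writer.writerow(out)
        count += 1
    return count
```

To use it, open output.csv for reading and a new file for writing (both with newline="") and pass the two file objects to exiftool_csv_to_dc. The result still needs whatever extra columns your DSpace import procedure requires.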

Thank you very much sir

I created a small utility for extracting information from PDF files a few years ago. It extracts data from all the files in a folder and saves it in a tab-delimited text file.

You can try it. I hope it helps. Please let me know.

I have uploaded the program to Google Drive. Click here to download

with regards

Mujib Rahiman

KV Kanjikode
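
The same idea (one row per PDF in a folder, saved as tab-delimited text) can also be sketched on top of ExifTool's JSON output. This is not the utility mentioned above, only an illustration. It assumes that exiftool is installed and on the PATH, and the function names and the FIELDS list are hypothetical.

```python
import csv
import json
import subprocess

# Columns to write; an assumed selection, change it as needed.
FIELDS = ["SourceFile", "Title", "Author", "Subject",
          "Keywords", "CreateDate", "PageCount"]


def read_folder_metadata(folder):
    """Run ExifTool on every PDF in `folder` and return a list of dicts.

    Uses the -j (JSON output) and -ext (extension filter) options.
    """
    result = subprocess.run(
        ["exiftool", "-j", "-ext", "pdf", folder],
        capture_output=True, text=True, check=False,
    )
    return json.loads(result.stdout) if result.stdout.strip() else []


def flatten(value):
    """Turn a tag value into a single-line string for a TSV cell."""
    if isinstance(value, list):
        # Multi-valued tags may come back as JSON arrays.
        return "; ".join(str(v) for v in value)
    return str(value).replace("\t", " ").replace("\n", " ")


def write_tsv(records, dst, fields=FIELDS):
    """Write one tab-delimited row per record; missing tags stay empty."""
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    for rec in records:
        writer.writerow([flatten(rec.get(f, "")) for f in fields])
    return len(records)
```

A typical call would pass the result of read_folder_metadata for your PDF folder to write_tsv, together with a file opened for writing with newline="". Like ExifTool itself, this reads only the metadata stored in the PDF, so files without embedded properties will give mostly empty rows.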

Thanks sir, I will surely let you know.

Regards 

Subeesh A C

Sir

I have checked your software; it is a great effort if you coded it yourself. As far as I can see, most software cannot identify PDF metadata in the way we require. I think the problem mostly revolves around the structure of the PDF files themselves. In my case, the PDF files do not have any standard structure (plus OCR) from which an algorithm could extract metadata as it would from a well-formed file. Since we are in a hurry and need more metadata for the current work, we are thinking of indexing the files first and filtering them later by category. Anyway, thanks for your reply.

Regards 

Subeesh A C

© 2026 Created by Dr. Badan Barman.