The task described in the question spans more than one area, so I'll narrow the question a bit in my answer.
You can of course load the PDF files offline and parse them to extract information, but that is fiddly and not the best approach because, for example, we wouldn't notice when papers get updated. To spare us that work, arXiv provides an API that lets us run search queries through a URL. It is neither very easy nor very hard to use; the documentation is here:
https://arxiv.org/help/api/user-manual#search_query_and_id_list
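To give a rough idea of what such a URL query looks like, here is a minimal sketch that only builds the query URL with the standard library and does not send any request. The helper name build_query_url and the default of 10 results are my own assumptions; the endpoint and the parameter names (search_query, id_list, start, max_results) are the ones that appear in the API's responses.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the API's own responses.
API_ENDPOINT = "http://export.arxiv.org/api/query"


def build_query_url(search_query="", id_list=(), start=0, max_results=10):
    """Return an arXiv API query URL (build_query_url is a hypothetical helper)."""
    params = {
        "search_query": search_query,
        # The API expects paper numbers as one comma-separated string.
        "id_list": ",".join(id_list),
        "start": start,
        "max_results": max_results,
    }
    # urlencode escapes spaces, colons and commas so the URL stays valid.
    return API_ENDPOINT + "?" + urlencode(params)


print(build_query_url(id_list=["1809.04468"]))
print(build_query_url(search_query="au:esentepe_ozgur AND cat:math.AC"))
```

Opening the resulting URL returns an Atom (XML) feed, which you would then have to parse yourself.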
There is a Python wrapper library written for this API, called arxiv. You can install it with
pip install arxiv
and use it to run all kinds of searches and get at the details of each paper. For example:
import arxiv
arxiv.query(id_list=["1809.04468"])
# Output
[{'id': 'http://arxiv.org/abs/1809.04468v5',
'guidislink': True,
'updated': '2020-03-31T17:23:21Z',
'updated_parsed': time.struct_time(tm_year=2020, tm_mon=3, tm_mday=31, tm_hour=17, tm_min=23, tm_sec=21, tm_wday=1, tm_yday=91, tm_isdst=0),
'published': '2018-09-12T14:06:25Z',
'published_parsed': time.struct_time(tm_year=2018, tm_mon=9, tm_mday=12, tm_hour=14, tm_min=6, tm_sec=25, tm_wday=2, tm_yday=255, tm_isdst=0),
'title': 'An Integrated First-Order Theory of Points and Intervals over Linear\n Orders (Part II)',
'title_detail': {'type': 'text/plain',
'language': None,
'base': 'http://export.arxiv.org/api/query?search_query=&id_list=1809.04468&start=0&max_results=1000&sortBy=relevance&sortOrder=descending',
'value': 'An Integrated First-Order Theory of Points and Intervals over Linear\n Orders (Part II)'},
'summary': 'There are two natural and well-studied approaches to temporal ontology and\nreasoning: point-based and interval-based. Usually, interval-based temporal\nreasoning deals with points as a particular case of duration-less intervals. A\nrecent result by Balbiani, Goranko, and Sciavicco presented an explicit\ntwo-sorted point-interval temporal framework in which time instants (points)\nand time periods (intervals) are considered on a par, allowing the perspective\nto shift between these within the formal discourse. We consider here two-sorted\nfirst-order languages based on the same principle, and therefore including\nrelations, as first studied by Reich, among others, between points, between\nintervals, and inter-sort. We give complete classifications of its\nsub-languages in terms of relative expressive power, thus determining how many,\nand which, are the intrinsically different extensions of two-sorted first-order\nlogic with one or more such relations. This approach roots out the classical\nproblem of whether or not points should be included in a interval-based\nsemantics. In this Part II, we deal with the cases of all dense and the case of\nall unbounded linearly ordered sets.',
'summary_detail': {'type': 'text/plain',
'language': None,
'base': 'http://export.arxiv.org/api/query?search_query=&id_list=1809.04468&start=0&max_results=1000&sortBy=relevance&sortOrder=descending',
'value': 'There are two natural and well-studied approaches to temporal ontology and\nreasoning: point-based and interval-based. Usually, interval-based temporal\nreasoning deals with points as a particular case of duration-less intervals. A\nrecent result by Balbiani, Goranko, and Sciavicco presented an explicit\ntwo-sorted point-interval temporal framework in which time instants (points)\nand time periods (intervals) are considered on a par, allowing the perspective\nto shift between these within the formal discourse. We consider here two-sorted\nfirst-order languages based on the same principle, and therefore including\nrelations, as first studied by Reich, among others, between points, between\nintervals, and inter-sort. We give complete classifications of its\nsub-languages in terms of relative expressive power, thus determining how many,\nand which, are the intrinsically different extensions of two-sorted first-order\nlogic with one or more such relations. This approach roots out the classical\nproblem of whether or not points should be included in a interval-based\nsemantics. In this Part II, we deal with the cases of all dense and the case of\nall unbounded linearly ordered sets.'},
'authors': ['Willem Conradie', 'Salih Durhan', 'Guido Sciavicco'],
'author_detail': {'name': 'Guido Sciavicco'},
'author': 'Guido Sciavicco',
'arxiv_comment': "This is Part II of the paper `An Integrated First-Order Theory of\n Points and Intervals over Linear Orders' arXiv:1805.08425v2. Therefore the\n introduction, preliminaries and conclusions of the two papers are the same.\n This version implements a few minor corrections and an update to the\n affiliation of the second author",
'links': [{'href': 'http://arxiv.org/abs/1809.04468v5',
'rel': 'alternate',
'type': 'text/html'},
{'title': 'pdf',
'href': 'http://arxiv.org/pdf/1809.04468v5',
'rel': 'related',
'type': 'application/pdf'}],
'arxiv_primary_category': {'term': 'cs.LO',
'scheme': 'http://arxiv.org/schemas/atom'},
'tags': [{'term': 'cs.LO',
'scheme': 'http://arxiv.org/schemas/atom',
'label': None},
{'term': '03B44', 'scheme': 'http://arxiv.org/schemas/atom', 'label': None},
{'term': 'F.4.1; I.2.4',
'scheme': 'http://arxiv.org/schemas/atom',
'label': None}],
'pdf_url': 'http://arxiv.org/pdf/1809.04468v5',
'affiliation': 'None',
'arxiv_url': 'http://arxiv.org/abs/1809.04468v5',
'journal_reference': 'Logical Methods in Computer Science, Volume 16, Issue 2, Logic for\n knowledge representation (April 1, 2020) lmcs:6260',
'doi': None}]
It looks messy at first glance, but the output is really just a list of length 1 whose only element is a dictionary. The information we are looking for is inside this dictionary.
a = arxiv.query(id_list=["1809.04468"])
print(a[0]['title'])
print(a[0]['authors'])
# Output
An Integrated First-Order Theory of Points and Intervals over Linear
 Orders (Part II)
['Willem Conradie', 'Salih Durhan', 'Guido Sciavicco']
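Since the title comes back with a line break in it, you may want to tidy each entry before using it. The following is a small sketch, not part of the library: summarize_entry is a hypothetical helper, and sample is a trimmed-down copy of the dictionary shown above with only the two keys used here.

```python
def summarize_entry(entry):
    """Return 'authors: title' for one result dict (hypothetical helper)."""
    # Titles can contain line breaks from the feed; collapse all whitespace runs.
    title = " ".join(entry["title"].split())
    authors = ", ".join(entry["authors"])
    return f"{authors}: {title}"


# Trimmed-down copy of the result dictionary, keeping only the keys used here.
sample = {
    "title": "An Integrated First-Order Theory of Points and Intervals over Linear\n Orders (Part II)",
    "authors": ["Willem Conradie", "Salih Durhan", "Guido Sciavicco"],
}

print(summarize_entry(sample))
```

With a real query result you would call summarize_entry(a[0]) instead of passing the sample.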
With this library you can also run more complex queries, not just look up a paper by its number. The library's GitHub page:
https://github.com/lukasschwab/arxiv.py
A few basic examples:
import arxiv
# Keyword query
# Set max_results to avoid timeout errors or getting more results than expected
arxiv.query(query="logic", max_results=100)
# Query with keywords in different fields: au = author, cat = category
# See the arXiv API documentation for how query terms are written
arxiv.query(query="au:esentepe_ozgur AND cat:math.AC")
# With more than one paper number
arxiv.query(id_list=["1807.05471", "1809.04468"])
Once you have this information, you can take the paper numbers from the files in your local archive and rename those files accordingly. How to do that deserves a separate question, I think.
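Still, as a rough starting point, here is a sketch of the first step: pulling a paper number out of a file name and proposing a new name from a title. It assumes your files carry new-style numbers such as 1809.04468 (optionally with a version like v5) somewhere in their names; find_arxiv_id and propose_name are hypothetical helpers, and nothing is renamed on disk.

```python
import re

# New-style arXiv identifiers such as 1809.04468, optionally followed by a version (v5).
ARXIV_ID = re.compile(r"(?<!\d)(\d{4}\.\d{4,5})(v\d+)?(?!\d)")


def find_arxiv_id(filename):
    """Return the arXiv ID embedded in a file name, or None (hypothetical helper)."""
    match = ARXIV_ID.search(filename)
    return match.group(1) if match else None


def propose_name(filename, title):
    """Suggest a new PDF name built from the ID and title; does not touch the disk."""
    paper_id = find_arxiv_id(filename)
    if paper_id is None:
        # No recognizable paper number: leave the name unchanged.
        return filename
    # Drop punctuation (hyphens and underscores survive), then join words with underscores.
    safe_title = "_".join(re.sub(r"[^\w\s-]", "", title).split())
    return f"{paper_id}_{safe_title}.pdf"


print(find_arxiv_id("1809.04468v5.pdf"))
print(propose_name("1809.04468v5.pdf", "Points and Intervals (Part II)"))
```

The ID returned by find_arxiv_id can go straight into id_list, and the title from the query result into propose_name.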