Lattes is a unique platform for academic curricula, and the largest of its kind. There you can find information about the academic work of all Brazilian scholars, including the institution where they obtained their PhD, their current employer, their field of work, metadata for all of their publications, and more. It is a unique and reliable source of information for bibliometric studies.
Package GetLattesData is a wrapper around functions I’ve been using to access these data. In the past, one could download the data directly, without any manual work. Currently (2024-11-12), a captcha must be solved manually. Therefore, using this package requires manually downloading the zip files with the XML data.
Let’s consider a simple example that accesses information from my academic CV and from that of a colleague. Both zip files ship with the package as examples. If you want to run this example for other scholars, you will have to download their XML zip files from Lattes. After opening a researcher's Lattes page (see an example here), click the XML button in the top right corner. Once you solve the captcha, a zip file with the XML content is downloaded.
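If you download files for several scholars into one folder, base R can collect them into a character vector of paths like the `f.in` used in the example below. This is a sketch under stated assumptions: the folder is hypothetical, empty placeholder files are created only so the snippet runs on its own, and the name check assumes downloads keep the `<16-digit Lattes id>.zip` naming of the package's example files.

```r
# Hypothetical folder holding manually downloaded Lattes zip files
lattes.dir <- file.path(tempdir(), "lattes_zips")
dir.create(lattes.dir, showWarnings = FALSE)

# Demo only: create empty placeholder files (skip this with real downloads)
invisible(file.create(file.path(lattes.dir, c("3262699324398819.zip",
                                              "8373564643000623.zip",
                                              "notes.txt"))))

# Keep only the zip files, with full paths
f.in <- list.files(lattes.dir, pattern = "\\.zip$", full.names = TRUE)

# Assumption: valid downloads are named "<16 digits>.zip"; warn about others
ok.name <- grepl("^[0-9]{16}\\.zip$", basename(f.in))
if (!all(ok.name)) {
  warning("Unexpected file names: ",
          paste(basename(f.in)[!ok.name], collapse = ", "))
}

print(basename(f.in))
```

With real downloads, drop the placeholder step and pass `f.in` to `gld_get_lattes_data_from_zip()` together with `field.qualis`.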
Since I work in the business department of UFRGS, the impact of my publications is assessed locally by the Qualis ranking for Management, Accounting and Tourism ('ADMINISTRAÇÃO PÚBLICA E DE EMPRESAS, CIÊNCIAS CONTÁBEIS E TURISMO').
Qualis is the local journal ranking in Brazil. You can read more about
Qualis in Wikipedia and here.
Now, given the zip files and the Qualis field, we use GetLattesData to read the information available in Lattes:
```r
library(GetLattesData)

# get files from pkg (you can download files of other researchers from the Lattes website)
f.in <- c(system.file('extdata/3262699324398819.zip', package = 'GetLattesData'),
          system.file('extdata/8373564643000623.zip', package = 'GetLattesData'))

# set qualis
field.qualis <- 'ADMINISTRAÇÃO PÚBLICA E DE EMPRESAS, CIÊNCIAS CONTÁBEIS E TURISMO'

# get data
l.out <- gld_get_lattes_data_from_zip(f.in,
                                      field.qualis = field.qualis)
```

```
##
## Reading 3262699324398819.zip - Marcelo Scherer Perlin
## Found 32 published papers
## Found 0 accepted paper(s)
## Found 13 supervisions
## Found 4 published books
## Found 1 book chapters
## Found 24 conference papers
## Found 10 employment registries
## Found 5 projects
## Found 89 coauthors
## Reading 8373564643000623.zip - Denis Borenstein
## Found 75 published papers
## Found 2 accepted paper(s)
## Found 97 supervisions
## Found 1 published books
## Found 6 book chapters
## Found 89 conference papers
## Found 44 employment registries
## Found 18 projects
## Found 198 coauthors
```
The output `l.out` is a list with the following dataframes:

```
## [1] "tpesq" "tpublic.published" "tpublic.accepted"
## [4] "tsupervisions" "tbooks" "tconferences"
## [7] "t_atprof" "tprojects" "tcoauthors"
```
The first, `tpesq`, is a dataframe with information about each researcher:

```
## tibble [2 × 18] (S3: tbl_df/tbl/data.frame)
## $ name : chr [1:2] "Marcelo Scherer Perlin" "Denis Borenstein"
## $ name_in_citations: chr [1:2] "PERLIN, M. S.;PERLIN, MARCELO;PERLIN, MARCELO SCHERER;PERLIN, MARCELO SCHERER;PERLIN, MARCELO S." "BORENSTEIN, D.;Borenstein, Denis;Denis Borenstein"
## $ last.update : Date[1:2], format: "2024-04-22" "2018-08-24"
## $ bsc.institution : chr [1:2] "Universidade Federal de Santa Maria" "Universidade Federal do Rio de Janeiro"
## $ bsc.start.year : chr [1:2] "2001" "1981"
## $ bsc.end.year : chr [1:2] "2005" "1986"
## $ bsc.course : chr [1:2] "Administração de empresas" "Engenharia Naval"
## $ msc.institution : chr [1:2] "Universidade Federal do Rio Grande do Sul" "Universidade Federal do Rio Grande do Sul"
## $ msc.start.year : chr [1:2] "2005" "1989"
## $ msc.end.year : chr [1:2] "2007" "1991"
## $ phd.institution : chr [1:2] "University of Reading" "University of Strathclyde"
## $ phd.start.year : num [1:2] 2007 1991
## $ phd.end.year : num [1:2] 2010 1995
## $ country.origin : chr [1:2] "Brasil" "Brasil"
## $ major.field : chr [1:2] "CIENCIAS_SOCIAIS_APLICADAS" "ENGENHARIAS"
## $ minor.field : chr [1:2] "Administração" "Engenharia de Produção"
## $ id.file : chr [1:2] "3262699324398819.zip" "8373564643000623.zip"
## $ last_update : Date[1:2], format: "2024-04-22" "2018-08-24"
```
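Note in the output above that the BSc and MSc years are stored as character (`chr`), while the PhD years are numeric. A minimal sketch of computing degree durations, using a small `data.frame` whose values are copied from that output as a stand-in for `l.out$tpesq`; the duration columns are hypothetical additions, not fields produced by the package:

```r
# Stand-in for l.out$tpesq, with values copied from the output above
tpesq <- data.frame(
  name = c("Marcelo Scherer Perlin", "Denis Borenstein"),
  bsc.start.year = c("2001", "1981"),   # character, as in the package output
  bsc.end.year = c("2005", "1986"),
  phd.start.year = c(2007, 1991),       # numeric, as in the package output
  phd.end.year = c(2010, 1995),
  stringsAsFactors = FALSE
)

# Character years must be converted before doing arithmetic
tpesq$bsc.years <- as.numeric(tpesq$bsc.end.year) - as.numeric(tpesq$bsc.start.year)
tpesq$phd.years <- tpesq$phd.end.year - tpesq$phd.start.year

print(tpesq[, c("name", "bsc.years", "phd.years")])
```

The same `$` operations work on the tibble returned by the package.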
The second dataframe, `tpublic.published`, contains metadata for all published papers, including their Qualis ranking and SJR (SCImago Journal Rank):

```
## Rows: 107
## Columns: 14
## $ id.file <chr> "3262699324398819.zip", "3262699324398819.zip", "32…
## $ name <chr> "Marcelo Scherer Perlin", "Marcelo Scherer Perlin",…
## $ article.title <chr> "Teoria do Caos aplicada aos Contratos de Café no M…
## $ year <dbl> 2006, 2009, 2007, 2011, 2013, 2013, 2013, 2013, 201…
## $ language <chr> "Português", "Inglês", "Inglês", "Inglês", "Portugu…
## $ journal.title <chr> "READ - Revista Eletrônica da Administração (UFRGS)…
## $ contry.publication <chr> "Brasil", "", "", "", "", "", "", "", "", "", "", "…
## $ ISSN <chr> "-", "1753-9641", "1413-2311", "1749-9135", "1679-0…
## $ order.aut <dbl> 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 2, 2, 1, 1, 3, 1, …
## $ n.authors <dbl> 2, 1, 2, 2, 1, 3, 3, 3, 2, 2, 1, 3, 2, 4, 5, 3, 2, …
## $ DOI <chr> "", "10.1057/jdhf.2009.4", "", "", "", "10.1007/s10…
## $ qualis <chr> NA, NA, "A3", NA, "A4", "A3", "A4", "A1", "A2", "A4…
## $ SJR <dbl> NA, NA, NA, NA, NA, 0.421, NA, 0.689, 0.163, NA, 0.…
## $ H.SJR <int> NA, NA, NA, NA, NA, 27, NA, 60, 13, NA, 12, 2, NA, …
```
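The `qualis` column is plain character, so comparisons such as "A3 or better" do not follow the ranking. One option is an ordered factor. This sketch assumes the A1 to A4, B1 to B4, C strata scale (worth checking against the Qualis edition you use) and takes as toy input the first eight `qualis` values visible in the output above:

```r
# Assumed Qualis strata, ordered from worst to best
qualis.levels <- c("C", "B4", "B3", "B2", "B1", "A4", "A3", "A2", "A1")

# First eight values of the qualis column shown above (NA = no Qualis entry)
qualis <- c(NA, NA, "A3", NA, "A4", "A3", "A4", "A1")

q <- factor(qualis, levels = qualis.levels, ordered = TRUE)

# Comparisons now respect the ranking; papers without Qualis are dropped
n.A3.or.better <- sum(q >= "A3", na.rm = TRUE)
print(n.A3.or.better)

# Counts per stratum, in rank order, keeping the NA group visible
print(table(q, useNA = "ifany"))
```

Any `qualis` value outside `qualis.levels` would silently become `NA`, so it is worth comparing `unique(qualis)` against the levels first.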
The other dataframes in `l.out` hold information about accepted papers, supervisions, books, conference papers, employment records, projects and coauthors.
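Both `tpesq` and `tpublic.published` carry an `id.file` column, so researcher attributes can be attached to each paper with a join. A minimal base-R sketch with toy stand-ins built from values in the outputs above (the first three papers of the first researcher); `years.since.phd` is a hypothetical derived column:

```r
# Toy stand-ins for l.out$tpesq and l.out$tpublic.published
tpesq <- data.frame(
  id.file = c("3262699324398819.zip", "8373564643000623.zip"),
  phd.end.year = c(2010, 1995),
  stringsAsFactors = FALSE
)
tpublic.published <- data.frame(
  id.file = rep("3262699324398819.zip", 3),
  year = c(2006, 2009, 2007),
  stringsAsFactors = FALSE
)

# id.file is the shared key; all.x = TRUE keeps every paper
pubs <- merge(tpublic.published, tpesq, by = "id.file", all.x = TRUE)

# Negative values: paper published before the PhD was concluded
pubs$years.since.phd <- pubs$year - pubs$phd.end.year
print(pubs)
```

`dplyr::left_join(tpublic.published, tpesq, by = "id.file")` is an equivalent choice if dplyr is already loaded.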
GetLattesData makes it easy to create academic reports for a large number of researchers. In the next example, we plot the number of publications of each researcher by Qualis ranking.
```r
library(ggplot2)

tpublic.published <- l.out$tpublic.published

p <- ggplot(tpublic.published, aes(x = qualis)) +
  geom_bar(position = 'identity') +
  facet_wrap(~name) +
  labs(x = paste0('Qualis: ', field.qualis))
print(p)
```
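`geom_bar()` counts rows, so each bar is the number of papers of one researcher in one Qualis stratum, and by default ggplot2 typically shows papers without a Qualis entry as a separate `NA` bar. To check those counts as a table, or to work without ggplot2, a base-R cross-tabulation gives the same numbers. The toy input below reuses the first eight `name` and `qualis` values visible in the output above:

```r
# Toy input: first eight rows of name and qualis from the output above
toy <- data.frame(
  name = rep("Marcelo Scherer Perlin", 8),
  qualis = c(NA, NA, "A3", NA, "A4", "A3", "A4", "A1"),
  stringsAsFactors = FALSE
)

# Researcher x stratum counts; useNA = "ifany" keeps papers without Qualis
counts <- table(toy$name, toy$qualis, useNA = "ifany")
print(counts)
```

To leave unranked papers out of the plot, filter them first, for example with `subset(tpublic.published, !is.na(qualis))`.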
We can also use dplyr for a simple assessment of academic productivity:

```r
library(dplyr)

my.tab <- tpublic.published %>%
  group_by(name) %>%
  summarise(n.papers = n(),
            max.SJR = max(SJR, na.rm = TRUE),
            mean.SJR = mean(SJR, na.rm = TRUE),
            n.A1.qualis = sum(qualis == 'A1', na.rm = TRUE),
            n.A2.qualis = sum(qualis == 'A2', na.rm = TRUE),
            median.authorship = median(as.numeric(order.aut), na.rm = TRUE))

knitr::kable(my.tab)
```
name | n.papers | max.SJR | mean.SJR | n.A1.qualis | n.A2.qualis | median.authorship |
---|---|---|---|---|---|---|
Denis Borenstein | 75 | 3.205 | 1.4516111 | 25 | 13 | 2 |
Marcelo Scherer Perlin | 32 | 1.269 | 0.5723684 | 9 | 7 | 1 |
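One caveat when applying this summary to other researchers: if someone has no paper with an SJR value, `max(SJR, na.rm = TRUE)` returns `-Inf` with a warning (and `mean(SJR, na.rm = TRUE)` returns `NaN`). A small guard, sketched here as a hypothetical helper `safe_max()` and tested on the first eight SJR values visible in the output above:

```r
# Hypothetical helper: NA instead of -Inf (plus warning) when all values are missing
safe_max <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  max(x)
}

# Toy input: first eight SJR values from the output above
sjr <- c(NA, NA, NA, NA, NA, 0.421, NA, 0.689)

print(safe_max(sjr))
print(safe_max(c(NA_real_, NA_real_)))
```

Inside `summarise()`, `max.SJR = safe_max(SJR)` would replace the original `max()` call.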