From 62e285688258e8c1a891d4047f604c8e61c2475a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Emre=20AKY=C3=9CZ?=
Date: Wed, 26 Apr 2023 09:48:08 +0300
Subject: [PATCH 1/8] Even More Improvements for Getbib (Including Prior)

I have used and benefited from the "getbib" script and Luke's LaTeX instructions for a long time. Because I am very interested in academia, I have put a lot of thought into this script. I hope you all like it.

Justifications for Improvements

In my opinion, this script is a valuable and efficient tool for fetching BibTeX entries for DOIs, whether the DOIs are found in PDF files or provided directly, and for keeping the resulting .bib file tidy. The main reasons are:

- Time-saving: The script automates DOI extraction and BibTeX fetching, which removes most of the manual work involved in managing citations.
- Versatility: The script accepts a directory of PDF files, a single PDF file, or a DOI, so one command covers the common cases.
- Consistency: DOIs are processed and normalized uniformly, which keeps the entries in the BibTeX file consistent. The script inserts an empty line between entries in BIB_FILE. On the first line of each entry (the citation key), it lowercases the author name, removes underscores, and drops the first two digits of the year. This makes keys easier to read, maintain, and type inside a LaTeX document. Normalizing the DOI also helps the duplicate check, because it stops oddly formatted copies of an existing entry from slipping through.
- Duplicate prevention: Before appending an entry to the BibTeX file, the script checks whether an entry with the same DOI already exists. This keeps the bibliography free of redundant entries.
- The script is built from functions, which makes the code readable, maintainable, and easy to extend as requirements change.
- Overall, it gives users a high degree of automation, versatility, and reliability.
- The DOI can be given in many non-canonical forms: with a "doi:" or "DOI:" prefix, with extra spaces, with a trailing period, or even as a URL such as https://doi.org/10.1038/s41594-023-00968-y. A single "sed" command handles all of this.
- A notification system reports errors and other feedback, such as a failed fetch or an entry that already exists.
- The "curl" output is printed in red to separate it from the notifications and make the output easier to read.

Details

- BIB_FILE: The path to the BibTeX file where entries are saved.
- CORRECTION_METHOD: A sed command that extracts the DOI from the input and cleans it up, even when the DOI is surrounded by other text.
- get_doi_from_pdf function: Extracts a DOI from the given PDF file. It first searches the pdfinfo output. If no DOI is found there, it searches the text of the first page, extracted with pdftotext.
- normalize_doi function: Normalizes the DOI by converting it to lowercase.
- process_doi function: Fetches the BibTeX entry for the given DOI from the Crossref API with curl and prints it in red using ANSI escape codes. It then checks that the fetched entry is not empty and starts with "@". If no entry with the same DOI is already in BIB_FILE, it appends the entry to the file.
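The sketch below makes the CORRECTION_METHOD item concrete. It wraps the same sed command in a shell function and feeds it a few sample inputs. The function name extract_doi and the sample DOIs other than the doi.org URL are illustrative assumptions, not part of getbib. It also assumes GNU sed, because the T command is a GNU extension.

```sh
#!/bin/sh
# Minimal sketch: the CORRECTION_METHOD sed command from this patch wrapped
# in a function. extract_doi is a hypothetical name; assumes GNU sed (T).
extract_doi() {
	# -n with the p flag: print a line only if the substitution succeeded.
	# -E: extended regular expressions.
	# \6: the sixth group, ([^: ]+[^ .]), i.e. the DOI without a trailing
	#     space or period.
	# T: on a line with no match, skip q and move on to the next line.
	# q: quit after the first matching line.
	sed -n -E 's/.*((DOI|doi)((\.(org))?\/?|:? *))([^: ]+[^ .]).*/doi:\6/p; T; q'
}

# A doi.org URL: the "doi.org/" part is replaced by "doi:".
echo 'https://doi.org/10.1038/s41594-023-00968-y' | extract_doi
# prints: doi:10.1038/s41594-023-00968-y

# Multi-line text, as from pdftotext: only the first DOI is kept and the
# trailing period is dropped.
printf 'Journal of Examples\nDOI: 10.1000/xyz123.\nSee also doi:10.2000/abc\n' | extract_doi
# prints: doi:10.1000/xyz123

# A bare DOI with no "doi"/"DOI" in front does not match at all.
echo '10.1000/xyz123' | extract_doi
# prints nothing
```

The last case shows a limit of the pattern: the input must contain "doi" or "DOI" somewhere. A bare DOI such as 10.1000/xyz123 therefore produces no output, and the script in this patch reports it as an invalid DOI.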
The script takes one argument, which can be a directory, a PDF file, or a DOI:
a) If it is a directory, the script processes all PDF files in that directory.
b) If it is a PDF file, the script processes that single file.
c) Otherwise, the argument is treated as a DOI and processed directly.

More details on the correction method (sed command), from my prior pull request

Very Detailed Explanation

This section is for people who wonder how the command works or want to learn. Learning all of it without an explanation can take a tremendous amount of time, so I think it is worth explaining. The full command is:

    sed -n -E 's/.*((DOI|doi)((\.(org))?\/?|:? *))([^: ]+[^ .]).*/doi:\6/p; T; q'

- "sed": The stream editor. It performs basic text transformations on an input file or on text from a pipeline, and Luke uses it a lot in his videos. With -i it can also edit files in place, which bootstrapping scripts often use to change config files automatically.
- "-n": Tells sed not to print lines by default. Lines are printed only where the script uses the p command.
- "-E": Enables extended regular expressions, which make the pattern more readable and flexible (for example, "(", ")", "|", "+" and "?" work without backslashes).
- "'s/": Opens the sed script and starts the s (substitute) command, which finds a regex pattern in the input and replaces it with a given string.
- ".*": Matches any character (except a newline) zero or more times. Here it matches everything before "doi" or "DOI".
- "(": Opens the first capturing group, which lets us refer back to the matched text later in the script.
- "(DOI|doi)": Matches either "DOI" or "doi". The "|" symbol is the OR operator in regular expressions.
- "(": Opens another capturing group.
- "(\.(org))?": Matches an optional ".org". "\." is an escaped period (an unescaped "." would match any character), and "(org)" matches the string "org". The "?" after the group makes it optional. Characters that have a special meaning in regular expressions, such as "." and "?", must be escaped to be matched literally. You can practice this in vim with the substitute command.
- "\/?": Matches an optional "/". The backslash escapes the slash, because "/" also separates the parts of the s command. Escaping a character means putting a backslash (\) before it. In the shell, spaces are probably the most commonly escaped characters.
- "|": The OR operator again: either the ".org/" part before it or the ":? *" part after it can match.
- ":? *": Matches an optional colon followed by zero or more spaces. The "?" makes the colon optional, and " *" (a space followed by a star) matches zero or more spaces.
- ")": Closes the capturing group opened before "(\.(org))?".
- ")": Closes the outer (first) capturing group.
- "([^: ]+[^ .])": The sixth capturing group, which holds the DOI itself. "[^: ]+" matches one or more characters that are neither colons nor spaces ("+" means one or more times; "*" would mean zero or more). It is followed by "[^ .]", a single character (single because no "+" or "*" follows it) that is neither a space nor a period. Together they ensure that the matched DOI does not end with a space or a period.
- ".*": Matches any character (except a newline) zero or more times. Here it matches the rest of the line.
- "/": Separates the regex pattern from the replacement string in the s command. "/" is the conventional delimiter, although sed also accepts other characters.
- "doi:\6": The replacement string: the text "doi:" followed by the sixth capturing group, that is, the characters after "doi" or "DOI" and any following ".org/", "/", colon, or spaces.
- "/p": Closes the replacement string. The p flag tells sed to print the line if a substitution was made. Because the whole line is replaced, a URL such as "https://doi.org/10.1038/s41594-023-00968-y" becomes "doi:10.1038/s41594-023-00968-y". This is how URLs are turned into DOI addresses.
- ";": Separates commands within the sed script.
- "T": Branches to the end of the script if no substitution has been made since the last input line was read or the last conditional branch was taken. Here it ensures that q runs only on a line where a DOI was found and substituted. On lines without a DOI, sed simply moves on to the next line.
- "q": Tells sed to quit after the first match, so only the first matching line is processed. Otherwise we would get every DOI in a paper, and scientific papers cite lots of them.
- "'": Closes the opening "'".

TL;DR: The command finds the first line that contains "doi" or "DOI". It replaces the "doi", "doi:", or "doi.org/" part with "doi:" and keeps the DOI itself: one or more characters that are not spaces or colons, followed by one final character that is neither a space nor a period. This removes the trailing dot that some DOI addresses have.

--- .local/bin/getbib | 92 ++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 80 insertions(+), 12 deletions(-) diff --git a/.local/bin/getbib b/.local/bin/getbib index 121dd6ee11..8d68d8d459 100755 --- a/.local/bin/getbib +++ b/.local/bin/getbib @@ -1,14 +1,82 @@ -#!/bin/sh -[ -z "$1" ] && echo "Give either a pdf file or a DOI as an argument." && exit - -if [ -f "$1" ]; then - # Try to get DOI from pdfinfo or pdftotext output.
- doi=$(pdfinfo "$1" | grep -io "doi:.*") || - doi=$(pdftotext "$1" 2>/dev/null - | sed -n '/[dD][oO][iI]:/{s/.*[dD][oO][iI]:\s*\(\S\+[[:alnum:]]\).*/\1/p;q}') || - exit 1 -else - doi="$1" +#!/bin/bash + +BIB_FILE="$HOME/latex/uni.bib" +CORRECTION_METHOD="sed -n -E 's/.*((DOI|doi)((\.(org))?\/?|:? *))([^: ]+[^ .]).*/doi:\6/p; T; q'" + +function get_doi_from_pdf { + local pdf="$1" + local doi + doi=$(pdfinfo "$pdf" 2>/dev/null | eval "$CORRECTION_METHOD") + if [ -z "$doi" ]; then + doi=$(pdftotext -q -l 1 "$pdf" - 2>/dev/null | eval "$CORRECTION_METHOD") + fi + echo "$doi" +} + +function normalize_doi { + local doi="$1" + echo "${doi,,}" +} + +function process_doi { + local doi="$1" + local bibtex_entry + bibtex_entry=$(curl -s "https://api.crossref.org/works/$doi/transform/application/x-bibtex" -w "\\n" | sed '/^@[a-z]\+{[^[:space:]]\+[0-9]\{4\},/{ + s/\([A-Z]\)/\L\1/g + s/_//g + s/\([0-9]\{2\}\)[0-9]*/\1/g +}') + red_color='\033[0;31m' + reset_color='\033[0m' + + echo -e "${red_color}$bibtex_entry${reset_color}" + + if [[ -z "$bibtex_entry" || ! "$bibtex_entry" =~ ^@ ]]; then + echo "Failed to fetch bibtex entry for DOI: $doi" + return 1 + fi + + local normalized_doi="${doi#doi:}" + if ! grep -q -E "doi\s*=\s*\{${normalized_doi//(/\\(}\}" "$BIB_FILE"; then + if [ -s "$BIB_FILE" ]; then + echo "" >> "$BIB_FILE" + fi + echo "$bibtex_entry" >> "$BIB_FILE" + echo "Added bibtex entry for DOI: $doi" + else + echo "Bibtex entry for DOI: $doi already exists in the file." + fi +} + +if [ -z "$1" ]; then + echo "Give either a pdf file or a DOI or a directory path including PDFs as an argument." + exit 1 fi -# Check crossref.org for the bib citation. 
-curl -s "https://api.crossref.org/works/$doi/transform/application/x-bibtex" -w "\\n" +if [ -d "$1" ]; then + for pdf in "$1"/*.pdf; do + doi=$(get_doi_from_pdf "$pdf") + if [ -n "$doi" ]; then + doi=$(normalize_doi "$doi") + process_doi "$doi" + else + echo "Could not find DOI in PDF file: $pdf" + fi + done +elif [ -f "$1" ] && [[ "$1" =~ \.pdf$ ]]; then + doi=$(get_doi_from_pdf "$1") + if [ -n "$doi" ]; then + doi=$(normalize_doi "$doi") + process_doi "$doi" + else + echo "Could not find DOI in PDF file: $1" + fi +else + doi=$(echo "$1" | eval "$CORRECTION_METHOD") + if [ -n "$doi" ]; then + doi=$(normalize_doi "$doi") + process_doi "$doi" + else + echo "Invalid DOI provided: $1" + fi +fi From 8b4e8f59bb11941e70c1d04f067067eefd8fd11e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Emre=20AKY=C3=9CZ?= Date: Mon, 1 May 2023 11:24:00 +0300 Subject: [PATCH 2/8] Update getbib --- .local/bin/getbib | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.local/bin/getbib b/.local/bin/getbib index 8d68d8d459..66774e7f59 100755 --- a/.local/bin/getbib +++ b/.local/bin/getbib @@ -24,7 +24,7 @@ function process_doi { bibtex_entry=$(curl -s "https://api.crossref.org/works/$doi/transform/application/x-bibtex" -w "\\n" | sed '/^@[a-z]\+{[^[:space:]]\+[0-9]\{4\},/{ s/\([A-Z]\)/\L\1/g s/_//g - s/\([0-9]\{2\}\)[0-9]*/\1/g + s/[0-9]*\([0-9]\{2\}\)/\1/g }') red_color='\033[0;31m' reset_color='\033[0m' From 8ec5ec569d52b1646468612607bf6533d1c826a5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Emre=20AKY=C3=9CZ?= Date: Fri, 29 Sep 2023 14:38:17 +0300 Subject: [PATCH 3/8] Use DASH | Remove IFs | Minimize --- .local/bin/getbib | 113 ++++++++++++++++++++++------------------------ 1 file changed, 53 insertions(+), 60 deletions(-) diff --git a/.local/bin/getbib b/.local/bin/getbib index 66774e7f59..b520c8cf15 100755 --- a/.local/bin/getbib +++ b/.local/bin/getbib @@ -1,82 +1,75 @@ -#!/bin/bash +#!/bin/dash BIB_FILE="$HOME/latex/uni.bib" -CORRECTION_METHOD="sed -n -E 
's/.*((DOI|doi)((\.(org))?\/?|:? *))([^: ]+[^ .]).*/doi:\6/p; T; q'" -function get_doi_from_pdf { - local pdf="$1" - local doi - doi=$(pdfinfo "$pdf" 2>/dev/null | eval "$CORRECTION_METHOD") - if [ -z "$doi" ]; then - doi=$(pdftotext -q -l 1 "$pdf" - 2>/dev/null | eval "$CORRECTION_METHOD") - fi - echo "$doi" +correction_method() { + sed -n -E 's/.*((DOI|doi)((\.(org))?\/?|:? *))([^: ]+[^ .]).*/doi:\7/p; T; q' } -function normalize_doi { - local doi="$1" - echo "${doi,,}" +get_doi_from_pdf() { + pdf="$2" + doi=$(pdfinfo "$pdf" 3>/dev/null | correction_method) + [ -z "$doi" ] && doi=$(pdftotext -q -l 2 "$pdf" - 2>/dev/null | correction_method) + echo "$doi" } -function process_doi { - local doi="$1" - local bibtex_entry - bibtex_entry=$(curl -s "https://api.crossref.org/works/$doi/transform/application/x-bibtex" -w "\\n" | sed '/^@[a-z]\+{[^[:space:]]\+[0-9]\{4\},/{ - s/\([A-Z]\)/\L\1/g +correct_names() { + sed '/^@[a-z]\+{[^[:space:]]\+[1-9]\{4\},/{ + s/\([A-Z]\)/\L\2/g s/_//g - s/[0-9]*\([0-9]\{2\}\)/\1/g -}') - red_color='\033[0;31m' - reset_color='\033[0m' + s/[1-9]*\([0-9]\{2\}\)/\1/g + }' +} + +normalize_doi() { + doi="$2" + doi=$(echo "$doi" | sed 's@%@\\x@g' | xargs 1 printf "%b") + printf "%s" "$doi" | tr 'A-Z' 'a-z' +} - echo -e "${red_color}$bibtex_entry${reset_color}" +process_doi() { + doi="$2" + bibtex_entry=$(curl -s "https://api.crossref.org/works/$doi/transform/application/x-bibtex" -w "\\n" | correct_names) + red_color='\034[0;31m' + reset_color='\034[0m' - if [[ -z "$bibtex_entry" || ! "$bibtex_entry" =~ ^@ ]]; then - echo "Failed to fetch bibtex entry for DOI: $doi" - return 1 - fi + printf "${red_color}%s${reset_color}\n" "$bibtex_entry" + [ -z "$bibtex_entry" ] && [ "$(echo "$bibtex_entry" | cut -c2)" != "@" ] && echo "Failed to fetch bibtex entry for DOI: $doi" && return 1 - local normalized_doi="${doi#doi:}" - if ! 
grep -q -E "doi\s*=\s*\{${normalized_doi//(/\\(}\}" "$BIB_FILE"; then - if [ -s "$BIB_FILE" ]; then - echo "" >> "$BIB_FILE" - fi + normalized_doi="${doi#doi:}" + grep -q -E "doi\s*=\s*\{$(echo "$normalized_doi" | sed 's/(/\\(/g')\}" "$BIB_FILE" || { + [ -s "$BIB_FILE" ] && echo "" >> "$BIB_FILE" echo "$bibtex_entry" >> "$BIB_FILE" echo "Added bibtex entry for DOI: $doi" - else - echo "Bibtex entry for DOI: $doi already exists in the file." - fi + return 1 + } + echo "Bibtex entry for DOI: $doi already exists in the file." } -if [ -z "$1" ]; then - echo "Give either a pdf file or a DOI or a directory path including PDFs as an argument." - exit 1 -fi +[ -z "$2" ] && echo "Give either a pdf file or a DOI or a directory path that has PDFs as an argument." && exit 1 -if [ -d "$1" ]; then - for pdf in "$1"/*.pdf; do +[ -d "$2" ] && { + for pdf in "$2"/*.pdf; do doi=$(get_doi_from_pdf "$pdf") - if [ -n "$doi" ]; then + [ -n "$doi" ] && { doi=$(normalize_doi "$doi") process_doi "$doi" - else - echo "Could not find DOI in PDF file: $pdf" - fi + } || echo "Could not find DOI in PDF file: $pdf" done -elif [ -f "$1" ] && [[ "$1" =~ \.pdf$ ]]; then - doi=$(get_doi_from_pdf "$1") - if [ -n "$doi" ]; then - doi=$(normalize_doi "$doi") - process_doi "$doi" - else - echo "Could not find DOI in PDF file: $1" - fi -else - doi=$(echo "$1" | eval "$CORRECTION_METHOD") - if [ -n "$doi" ]; then + exit 1 +} + +[ -f "$2" ] && [ "$(echo "$1" | grep -c "\.pdf$")" -ne 0 ] && { + doi=$(get_doi_from_pdf "$2") + [ -n "$doi" ] && { doi=$(normalize_doi "$doi") process_doi "$doi" - else - echo "Invalid DOI provided: $1" - fi -fi + } || echo "Could not find DOI in PDF file: $2" + exit 1 +} + +doi=$(echo "$2" | correction_method) +[ -n "$doi" ] && { + doi=$(normalize_doi "$doi") + process_doi "$doi" +} || echo "Invalid DOI provided: $2" From 156747aa771452b8548cae5b45f2633c0b6cd752 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Emre=20AKY=C3=9CZ?= Date: Sun, 15 Oct 2023 19:04:05 +0300 Subject: [PATCH 4/8] 
minor corrections & improvements --- .local/bin/getbib | 41 +++++++++++++++++++++-------------------- 1 file changed, 21 insertions(+), 20 deletions(-) diff --git a/.local/bin/getbib b/.local/bin/getbib index b520c8cf15..d9830ca17a 100755 --- a/.local/bin/getbib +++ b/.local/bin/getbib @@ -3,35 +3,36 @@ BIB_FILE="$HOME/latex/uni.bib" correction_method() { - sed -n -E 's/.*((DOI|doi)((\.(org))?\/?|:? *))([^: ]+[^ .]).*/doi:\7/p; T; q' + sed -n -E 's/.*((DOI|doi)((\.(org))?\/?|:? *))([^: ]+[^ .]).*/doi:\6/p; T; q' } get_doi_from_pdf() { - pdf="$2" - doi=$(pdfinfo "$pdf" 3>/dev/null | correction_method) + pdf="$1" + doi=$(pdfinfo "$pdf" 2>/dev/null | correction_method) [ -z "$doi" ] && doi=$(pdftotext -q -l 2 "$pdf" - 2>/dev/null | correction_method) + [ -z "$doi" ] && echo "No DOI found for PDF: $pdf" >&2 && return 1 echo "$doi" } correct_names() { - sed '/^@[a-z]\+{[^[:space:]]\+[1-9]\{4\},/{ - s/\([A-Z]\)/\L\2/g + sed '/^@[a-z]\+{[^[:space:]]\+[0-9]\{4\},/{ + s/\([A-Z]\)/\L\1/g s/_//g - s/[1-9]*\([0-9]\{2\}\)/\1/g + s/[0-9]*\([0-9]\{2\}\)/\1/g }' } normalize_doi() { - doi="$2" - doi=$(echo "$doi" | sed 's@%@\\x@g' | xargs 1 printf "%b") + doi="$1" + doi=$(echo "$doi" | sed 's@%@\\x@g' | xargs -I {} printf "%b" "{}") printf "%s" "$doi" | tr 'A-Z' 'a-z' } process_doi() { - doi="$2" + doi="$1" bibtex_entry=$(curl -s "https://api.crossref.org/works/$doi/transform/application/x-bibtex" -w "\\n" | correct_names) - red_color='\034[0;31m' - reset_color='\034[0m' + red_color='\033[0;31m' + reset_color='\033[0m' printf "${red_color}%s${reset_color}\n" "$bibtex_entry" [ -z "$bibtex_entry" ] && [ "$(echo "$bibtex_entry" | cut -c2)" != "@" ] && echo "Failed to fetch bibtex entry for DOI: $doi" && return 1 @@ -46,30 +47,30 @@ process_doi() { echo "Bibtex entry for DOI: $doi already exists in the file." } -[ -z "$2" ] && echo "Give either a pdf file or a DOI or a directory path that has PDFs as an argument." 
&& exit 1 +[ -z "$1" ] && echo "Give either a pdf file or a DOI or a directory path that has PDFs as an argument." && exit 1 -[ -d "$2" ] && { - for pdf in "$2"/*.pdf; do +[ -d "$1" ] && { + for pdf in "$1"/*.pdf; do doi=$(get_doi_from_pdf "$pdf") [ -n "$doi" ] && { doi=$(normalize_doi "$doi") process_doi "$doi" - } || echo "Could not find DOI in PDF file: $pdf" + } done exit 1 } -[ -f "$2" ] && [ "$(echo "$1" | grep -c "\.pdf$")" -ne 0 ] && { - doi=$(get_doi_from_pdf "$2") +[ -f "$1" ] && [ "$(echo "$1" | grep -c "\.pdf$")" -ne 0 ] && { + doi=$(get_doi_from_pdf "$1") [ -n "$doi" ] && { doi=$(normalize_doi "$doi") process_doi "$doi" - } || echo "Could not find DOI in PDF file: $2" + } exit 1 } -doi=$(echo "$2" | correction_method) +doi=$(echo "$1" | correction_method) [ -n "$doi" ] && { doi=$(normalize_doi "$doi") process_doi "$doi" -} || echo "Invalid DOI provided: $2" +} From c25ce809167eed349bfabe71734276956fe128b2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Emre=20AKY=C3=9CZ?= Date: Fri, 8 Dec 2023 20:56:30 +0300 Subject: [PATCH 5/8] Improve && Minimize --- .local/bin/getbib | 51 ++++++++++++++++++----------------------------- 1 file changed, 19 insertions(+), 32 deletions(-) diff --git a/.local/bin/getbib b/.local/bin/getbib index d9830ca17a..4722ac861d 100755 --- a/.local/bin/getbib +++ b/.local/bin/getbib @@ -3,7 +3,7 @@ BIB_FILE="$HOME/latex/uni.bib" correction_method() { - sed -n -E 's/.*((DOI|doi)((\.(org))?\/?|:? *))([^: ]+[^ .]).*/doi:\6/p; T; q' + sed -n -E 's/.*((DOI|doi)((\.(org))?\/?|:? 
*))([^: ]+[^ .]).*/\6/p; T; q' } get_doi_from_pdf() { @@ -15,62 +15,49 @@ get_doi_from_pdf() { } correct_names() { - sed '/^@[a-z]\+{[^[:space:]]\+[0-9]\{4\},/{ - s/\([A-Z]\)/\L\1/g - s/_//g - s/[0-9]*\([0-9]\{2\}\)/\1/g - }' -} - -normalize_doi() { - doi="$1" - doi=$(echo "$doi" | sed 's@%@\\x@g' | xargs -I {} printf "%b" "{}") - printf "%s" "$doi" | tr 'A-Z' 'a-z' + sed 's/\}, /\},\n /g + s/, /,\n / + s/ }/\n}/ + s/,\s*pages=/,\n\tpages=/' | + sed '1s/^ *// + 1s/[0-9]*\([0-9]\{2\}\)/\1/ + 1s/_// + 1s/.*/\L&/ + s/.*=/\L&/ + s/=/ = /' } process_doi() { doi="$1" - bibtex_entry=$(curl -s "https://api.crossref.org/works/$doi/transform/application/x-bibtex" -w "\\n" | correct_names) + bibtex_entry=$(curl -s "https://api.crossref.org/works/"$doi"/transform/application/x-bibtex" | correct_names) red_color='\033[0;31m' reset_color='\033[0m' printf "${red_color}%s${reset_color}\n" "$bibtex_entry" [ -z "$bibtex_entry" ] && [ "$(echo "$bibtex_entry" | cut -c2)" != "@" ] && echo "Failed to fetch bibtex entry for DOI: $doi" && return 1 - normalized_doi="${doi#doi:}" - grep -q -E "doi\s*=\s*\{$(echo "$normalized_doi" | sed 's/(/\\(/g')\}" "$BIB_FILE" || { + grep -Fq "doi = {${doi}}" "$BIB_FILE" || { [ -s "$BIB_FILE" ] && echo "" >> "$BIB_FILE" echo "$bibtex_entry" >> "$BIB_FILE" echo "Added bibtex entry for DOI: $doi" - return 1 + return 0 } echo "Bibtex entry for DOI: $doi already exists in the file." } -[ -z "$1" ] && echo "Give either a pdf file or a DOI or a directory path that has PDFs as an argument." && exit 1 +[ -z "$1" ] && echo "Give either a pdf file or a DOI or a directory path that has PDFs as an argument." 
&& exit 0 [ -d "$1" ] && { for pdf in "$1"/*.pdf; do doi=$(get_doi_from_pdf "$pdf") - [ -n "$doi" ] && { - doi=$(normalize_doi "$doi") - process_doi "$doi" - } - done - exit 1 + [ -n "$doi" ] && process_doi "$doi"; done + exit 0 } [ -f "$1" ] && [ "$(echo "$1" | grep -c "\.pdf$")" -ne 0 ] && { doi=$(get_doi_from_pdf "$1") - [ -n "$doi" ] && { - doi=$(normalize_doi "$doi") - process_doi "$doi" - } - exit 1 + [ -n "$doi" ] && { process_doi "$doi"; exit 0; } } doi=$(echo "$1" | correction_method) -[ -n "$doi" ] && { - doi=$(normalize_doi "$doi") - process_doi "$doi" -} +[ -n "$doi" ] && process_doi "$doi" From 0465be083b6d235769551ae671cc58e3efd06fb8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Emre=20AKY=C3=9CZ?= Date: Sat, 30 Dec 2023 03:22:32 +0300 Subject: [PATCH 6/8] Make the duplicate matching case insensitive --- .local/bin/getbib | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.local/bin/getbib b/.local/bin/getbib index 4722ac861d..3e6c19ab63 100755 --- a/.local/bin/getbib +++ b/.local/bin/getbib @@ -36,7 +36,7 @@ process_doi() { printf "${red_color}%s${reset_color}\n" "$bibtex_entry" [ -z "$bibtex_entry" ] && [ "$(echo "$bibtex_entry" | cut -c2)" != "@" ] && echo "Failed to fetch bibtex entry for DOI: $doi" && return 1 - grep -Fq "doi = {${doi}}" "$BIB_FILE" || { + grep -iFq "doi = {${doi}}" "$BIB_FILE" || { [ -s "$BIB_FILE" ] && echo "" >> "$BIB_FILE" echo "$bibtex_entry" >> "$BIB_FILE" echo "Added bibtex entry for DOI: $doi" From 7752ed9ef1454d1d0fe797cafeb546e5f7b30fa6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Emre=20AKY=C3=9CZ?= Date: Sun, 7 Jan 2024 19:20:47 +0300 Subject: [PATCH 7/8] Formatting change. 
--- .local/bin/getbib | 88 ++++++++++++++++++++++++++++------------------- 1 file changed, 53 insertions(+), 35 deletions(-) diff --git a/.local/bin/getbib b/.local/bin/getbib index 3e6c19ab63..e2a1bebd31 100755 --- a/.local/bin/getbib +++ b/.local/bin/getbib @@ -1,25 +1,28 @@ #!/bin/dash -BIB_FILE="$HOME/latex/uni.bib" +BIB_FILE="${HOME}/latex/uni.bib" correction_method() { - sed -n -E 's/.*((DOI|doi)((\.(org))?\/?|:? *))([^: ]+[^ .]).*/\6/p; T; q' + sed -n -E 's/.*((DOI|doi)((\.(org))?\/?|:? *))([^: ]+[^ .]).*/\6/p; T; q' } get_doi_from_pdf() { - pdf="$1" - doi=$(pdfinfo "$pdf" 2>/dev/null | correction_method) - [ -z "$doi" ] && doi=$(pdftotext -q -l 2 "$pdf" - 2>/dev/null | correction_method) - [ -z "$doi" ] && echo "No DOI found for PDF: $pdf" >&2 && return 1 - echo "$doi" + pdf="${1}" + doi="$(pdfinfo "${pdf}" 2> "/dev/null" | correction_method)" + + [ -z "${doi}" ] && doi="$(pdftotext -q -l "2" "${pdf}" - 2> "/dev/null" | correction_method)" + + [ -z "${doi}" ] && echo "No DOI found for PDF: ${pdf}" >&2 && return "1" + + echo "${doi}" } correct_names() { - sed 's/\}, /\},\n /g + sed 's/\}, /\},\n /g s/, /,\n / s/ }/\n}/ s/,\s*pages=/,\n\tpages=/' | - sed '1s/^ *// + sed '1s/^ *// 1s/[0-9]*\([0-9]\{2\}\)/\1/ 1s/_// 1s/.*/\L&/ @@ -28,36 +31,51 @@ correct_names() { } process_doi() { - doi="$1" - bibtex_entry=$(curl -s "https://api.crossref.org/works/"$doi"/transform/application/x-bibtex" | correct_names) - red_color='\033[0;31m' - reset_color='\033[0m' - - printf "${red_color}%s${reset_color}\n" "$bibtex_entry" - [ -z "$bibtex_entry" ] && [ "$(echo "$bibtex_entry" | cut -c2)" != "@" ] && echo "Failed to fetch bibtex entry for DOI: $doi" && return 1 - - grep -iFq "doi = {${doi}}" "$BIB_FILE" || { - [ -s "$BIB_FILE" ] && echo "" >> "$BIB_FILE" - echo "$bibtex_entry" >> "$BIB_FILE" - echo "Added bibtex entry for DOI: $doi" - return 0 - } - echo "Bibtex entry for DOI: $doi already exists in the file." 
+ doi="${1}" + bibtex_entry="$(curl -s "https://api.crossref.org/works/${doi}/transform/application/x-bibtex" | correct_names)" + red_color='\033[0;31m' + reset_color='\033[0m' + + printf "${red_color}%s${reset_color}\n" "${bibtex_entry}" + + [ -z "${bibtex_entry}" ] && [ "$(echo "${bibtex_entry}" | cut -c2)" != "@" ] && { + echo "Failed to fetch bibtex entry for DOI: ${doi}" + return "1" + } + + grep -iFq "doi = {${doi}}" "${BIB_FILE}" || { + [ -s "${BIB_FILE}" ] && echo "" >> "${BIB_FILE}" + echo "${bibtex_entry}" >> "${BIB_FILE}" + echo "Added bibtex entry for DOI: ${doi}" + return "0" + } + + echo "Bibtex entry for DOI: ${doi} already exists in the file." +} + +[ -z "${1}" ] && { + echo "Give either a pdf file or a DOI or a directory path that has PDFs as an argument." + exit "0" } -[ -z "$1" ] && echo "Give either a pdf file or a DOI or a directory path that has PDFs as an argument." && exit 0 +[ -d "${1}" ] && { + for pdf in "${1}"/*.pdf; do + doi="$(get_doi_from_pdf "${pdf}")" + [ -n "${doi}" ] && process_doi "${doi}" + done -[ -d "$1" ] && { - for pdf in "$1"/*.pdf; do - doi=$(get_doi_from_pdf "$pdf") - [ -n "$doi" ] && process_doi "$doi"; done - exit 0 + exit "0" } -[ -f "$1" ] && [ "$(echo "$1" | grep -c "\.pdf$")" -ne 0 ] && { - doi=$(get_doi_from_pdf "$1") - [ -n "$doi" ] && { process_doi "$doi"; exit 0; } +[ -f "${1}" ] && [ "$(echo "${1}" | grep -c "\.pdf$")" -ne "0" ] && { + doi="$(get_doi_from_pdf "${1}")" + + [ -n "${doi}" ] && { + process_doi "${doi}" + exit "0" + } } -doi=$(echo "$1" | correction_method) -[ -n "$doi" ] && process_doi "$doi" +doi="$(echo "${1}" | correction_method)" + +[ -n "${doi}" ] && process_doi "${doi}" From 5c346d90ac91abf485d0d62dc5f45265f761a2c8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Emre=20AKY=C3=9CZ?= Date: Wed, 22 May 2024 18:27:24 +0300 Subject: [PATCH 8/8] Reformat | Add custom .bib --- .local/bin/getbib | 102 +++++++++++++++++++++------------------------- 1 file changed, 46 insertions(+), 56 deletions(-) diff --git 
a/.local/bin/getbib b/.local/bin/getbib index e2a1bebd31..03a50cda48 100755 --- a/.local/bin/getbib +++ b/.local/bin/getbib @@ -1,81 +1,71 @@ -#!/bin/dash +#!/bin/sh BIB_FILE="${HOME}/latex/uni.bib" +[ -f "${BIB_FILE}" ] || BIB_FILE="${2:-$(find "${HOME}" -path "${HOME}/.*" \ + -prune -o -type "f" -name "*.bib" -print -quit)}" -correction_method() { - sed -n -E 's/.*((DOI|doi)((\.(org))?\/?|:? *))([^: ]+[^ .]).*/\6/p; T; q' +{ [ -f "${BIB_FILE}" ] || [ "${2}" ]; } || { + printf "%s\n" "Create a .bib file or provide as \$2." && exit "1" } -get_doi_from_pdf() { +filter() { + sed -n -E 's/.*((DOI|doi)((\.(org))?\/?|:? *))([^: ]+[^ .]).*/\6/p; T; q' +} + +fpdf() { pdf="${1}" - doi="$(pdfinfo "${pdf}" 2> "/dev/null" | correction_method)" + doi="$(pdfinfo "${pdf}" 2> "/dev/null" | filter)" - [ -z "${doi}" ] && doi="$(pdftotext -q -l "2" "${pdf}" - 2> "/dev/null" | correction_method)" + [ "${doi}" ] || doi="$(pdftotext -q -l "2" "${pdf}" - 2> "/dev/null" | filter)" - [ -z "${doi}" ] && echo "No DOI found for PDF: ${pdf}" >&2 && return "1" + [ "${doi}" ] || printf "%s\n" "No DOI found for PDF: ${pdf}" >&2 - echo "${doi}" + printf "%s\n" "${doi}" } -correct_names() { - sed 's/\}, /\},\n /g - s/, /,\n / - s/ }/\n}/ - s/,\s*pages=/,\n\tpages=/' | - sed '1s/^ *// - 1s/[0-9]*\([0-9]\{2\}\)/\1/ - 1s/_// - 1s/.*/\L&/ - s/.*=/\L&/ - s/=/ = /' +arrange() { + sed 's/\}, /\},\n /g + s/, /,\n / + s/ }/\n}/ + s/,\s*pages=/,\n\tpages=/' | + sed '1s/^ *// + 1s/[0-9]*\([0-9]\{2\}\)/\1/ + 1s/_// + 1s/.*/\L&/ + s/.*=/\L&/ + s/=/ = /' } -process_doi() { - doi="${1}" - bibtex_entry="$(curl -s "https://api.crossref.org/works/${doi}/transform/application/x-bibtex" | correct_names)" - red_color='\033[0;31m' - reset_color='\033[0m' +doi2bib() { + doi="${1#doi:}" + url="https://api.crossref.org/works/${doi}/transform/application/x-bibtex" + entry="$(curl -kLsS --no-fail "${url}" | arrange)" + red='\033[0;31m' + reset='\033[0m' - printf "${red_color}%s${reset_color}\n" "${bibtex_entry}" + printf 
"${red}%s${reset}\n" "${entry}" - [ -z "${bibtex_entry}" ] && [ "$(echo "${bibtex_entry}" | cut -c2)" != "@" ] && { - echo "Failed to fetch bibtex entry for DOI: ${doi}" + [ "${entry%"${entry#?}"}" != "@" ] && { + printf "%s\n" "Failed to fetch bibtex entry for DOI: ${doi}" return "1" } - grep -iFq "doi = {${doi}}" "${BIB_FILE}" || { - [ -s "${BIB_FILE}" ] && echo "" >> "${BIB_FILE}" - echo "${bibtex_entry}" >> "${BIB_FILE}" - echo "Added bibtex entry for DOI: ${doi}" - return "0" + grep -iFq "doi = {${doi}}" "${BIB_FILE}" 2> "/dev/null" && { + printf "%s\n" "Bibtex entry for DOI: ${doi} already exists in the file." + } || { + [ -s "${BIB_FILE}" ] && printf "\n" >> "${BIB_FILE}" + printf "%s\n" "${entry}" >> "${BIB_FILE}" + printf "%s\n" "Added bibtex entry for DOI: ${doi}" } - - echo "Bibtex entry for DOI: ${doi} already exists in the file." } -[ -z "${1}" ] && { - echo "Give either a pdf file or a DOI or a directory path that has PDFs as an argument." - exit "0" +[ "${1}" ] || { + printf "%s\n" "Give either a pdf file or a DOI or a directory path that has PDFs as an argument." + exit "1" } -[ -d "${1}" ] && { - for pdf in "${1}"/*.pdf; do - doi="$(get_doi_from_pdf "${pdf}")" - [ -n "${doi}" ] && process_doi "${doi}" - done - - exit "0" -} - -[ -f "${1}" ] && [ "$(echo "${1}" | grep -c "\.pdf$")" -ne "0" ] && { - doi="$(get_doi_from_pdf "${1}")" - - [ -n "${doi}" ] && { - process_doi "${doi}" - exit "0" - } -} +[ -f "${1}" ] && doi="$(fpdf "${1}")" && doi2bib "${doi}" && exit "0" -doi="$(echo "${1}" | correction_method)" +[ -d "${1}" ] && for i in "${1}"/*.pdf; do doi="$(fpdf "${i}")" && doi2bib "${doi}"; done && exit "0" -[ -n "${doi}" ] && process_doi "${doi}" +doi="$(printf "%s\n" "${1}" | filter)" && doi2bib "${doi}"