Database Rules
Intro
This is an overview of how the different database models (tables) interact to make the article_pipeline system function. I will start with an overview of all of the models that can be found in the models.py file.
AdminName
Stores an admin_name string that is inserted automatically into emails sent by the system, replacing “{+admin_name+}” in any email template.
-
admin_name
String containing the FULL NAME of the admin whose name will show up in the email.
-
currently_in_use
Whether this specific admin name shows up as an option when sending emails. If a team member leaves or stops performing admin duties, we can set this to false so that their name no longer shows up in our system.
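The behaviour of the two fields above can be sketched in plain Python. This is only an illustration, not the code from models.py: AdminNameStub, selectable_admin_names and render_email are hypothetical names, and only the “{+admin_name+}” placeholder and the two field names come from the description above.

```python
from dataclasses import dataclass

# Placeholder text quoted in the AdminName description.
PLACEHOLDER = "{+admin_name+}"


@dataclass
class AdminNameStub:
    """Hypothetical stand-in for the AdminName model (not the real Django class)."""
    admin_name: str
    currently_in_use: bool = True


def selectable_admin_names(admins):
    """Return only the names that should be offered when sending emails."""
    return [a.admin_name for a in admins if a.currently_in_use]


def render_email(template, admin):
    """Replace every occurrence of the placeholder with the admin's full name."""
    return template.replace(PLACEHOLDER, admin.admin_name)
```

For example, an admin with currently_in_use set to false is left out of selectable_admin_names, while render_email("Regards, {+admin_name+}", admin) fills in the chosen admin's full name.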
DefaultAdmin
Points to a single AdminName object at a time. This admin name is the name included by default in any email the system sends automatically. One pretty much always has to be set to prevent crashes, so if there is no DefaultAdmin it is set to “The NP-MRD Team” automatically during deployment via “management/commands/set_default_email_admin.py” in the entrypoint.sh scripts.
The default email admin can also be changed via the “Set default admin” dropdown actions on the Admin Names page in the Admin Tools.
NOT TO BE CONFUSED WITH THE BUILT-IN “DJANGO ADMINS”, which are profiles that exist within Django’s framework. We have only ever used a single one of these, which is the login used to access the admin tools.
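The fallback rule for the default admin can be sketched as follows. This is a simplified illustration and not the contents of set_default_email_admin.py: resolve_default_admin_name is a hypothetical helper, and only the fallback string “The NP-MRD Team” is taken from the description above.

```python
from typing import Optional

# Fallback name stated in the DefaultAdmin description.
FALLBACK_ADMIN_NAME = "The NP-MRD Team"


def resolve_default_admin_name(default_admin: Optional[str]) -> str:
    """Return the configured default admin name, or the fallback when none is set.

    Hypothetical helper: the real system stores the default as a DefaultAdmin
    row pointing at an AdminName object rather than as a plain string.
    """
    if default_admin is None or not default_admin.strip():
        return FALLBACK_ADMIN_NAME
    return default_admin
```

Guaranteeing that this lookup always returns some name is what keeps automatically sent emails from failing when no default has been chosen.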
DOISource / AnnotationSource / CitationSource / AuthorSalutation / DepositionType
Simple foreign key tables used to store all possible options for the corresponding fields in various models. Setting up the tables this way was a mistake for the most part, but changing them would require significant work, so they will remain for the time being. Each of these tables needs values in it to function and is filled automatically by the “deployment_key_tables” fixture.
This also means that if you ever need to add any values to these tables, please add them to the deployment_key_tables fixture and then re-deploy. This ensures parity between all deployments of the app.
ArticleData
This is the central table corresponding to each “entry” within our database (a published_article, presubmission, or private_submission). It stores various data about each entry and is also responsible for making entries show up on many pages in the admin tools. Any other table that pertains to an entry in the database is connected to ArticleData in some way, so it is often important to access while writing code. Each field does something pretty different, so I’ll go over each.
A programming quirk of accessing an ArticleData object in code is that, due to a poor naming decision, if you are trying to access it THROUGH another model object (e.g. an Article object) you have to do so by specifying “article_object.uuid”. This is because the foreign key fields in other models that point to ArticleData are all named “uuid”, so accessing that field returns the corresponding ArticleData object. This is why accessing the “UUID” value through an article_object instance looks like “article_object.uuid.uuid” (you first get the ArticleData object and THEN the uuid value). Unfortunately, this is far too deeply integrated into the code to change.
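The double “uuid” lookup described above can be illustrated with plain Python classes. These are hypothetical stand-ins, not the real Django models: ArticleDataStub and ArticleStub exist only for this sketch, and the UUID value is a made-up example.

```python
from dataclasses import dataclass


@dataclass
class ArticleDataStub:
    """Hypothetical stand-in for ArticleData; only two fields are shown."""
    uuid: str
    title: str = ""


@dataclass
class ArticleStub:
    """Hypothetical stand-in for a model whose foreign key to ArticleData is named 'uuid'."""
    uuid: ArticleDataStub  # the field is called "uuid" but holds an ArticleData object


# Made-up values, for illustration only.
article_data = ArticleDataStub(uuid="example-uuid-123", title="Example title")
article_object = ArticleStub(uuid=article_data)

related_entry = article_object.uuid    # the ArticleData object, not a string
uuid_value = article_object.uuid.uuid  # the actual UUID value
```

The first hop follows the foreign key and the second reads the field on ArticleData, which is why the name appears twice.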
-
uuid
The unique “id” value used for each ArticleData entry. It is generated automatically, and is unique for each entry in the database, by any of the functions which build ArticleData objects. This value is a key identifier for each entry in our database and is often used for searching.
-
journal_name
The name of the journal that the article comes from.
-
title
The title that corresponds to the current entry. For a published article it is the title of the article. For a presubmission it is the tentative title from the submitter. For a private deposition it is the name that the depositor entered upon starting their submission plus the UUID value of the article.
-
abstract
The abstract of the article.
-
abstract_addendum_date
Sometimes an article will have its DOI available online before some of its details are available. If we ingest such an article it will often lack an abstract, and we will later re-query it to attach one. This date field lets us keep track of when that occurred in case it is ever relevant.
-
citation_source
Foreign key field that points to the CitationSource table. Used to keep track of where a published article has been cited from.
-
rss_filename
The name of the file in the RSS ingestion S3 bucket from which a published article was ingested. Never really relevant but can be used to trace