#StackBounty: #postgresql #replication #security Security considerations of allowing logical replication subscribers on PostgreSQL

Bounty: 50

I run a non-profit dedicated to sharing data kind of like Wikipedia.

We recently had a client that wanted a replica of our database, and we realized that by using PostgreSQL’s new logical replication we can replicate tables of our DB to a server they control.

This would be great for fulfilling our mission of sharing the data. It’s 100× better than providing slow APIs.

We created a role for them like this:

CREATE ROLE some_client WITH REPLICATION LOGIN PASSWORD 'long-password';
GRANT SELECT ON TABLE some_table TO some_client;

And we created a PUBLICATION for them like this:

CREATE PUBLICATION testpublication FOR TABLE ONLY some_table;
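
For context, the client-side counterpart would be a subscription along these lines (a minimal sketch; the host, database name and credentials are placeholders, and some_table must already exist on their server with a compatible definition):

-- Run on the subscriber, i.e. the server the client controls:
CREATE SUBSCRIPTION testsubscription
    CONNECTION 'host=publisher.example.org dbname=ourdb user=some_client password=long-password'
    PUBLICATION testpublication;

For the initial copy and ongoing changes, the publishing role only needs the REPLICATION attribute plus SELECT on the published tables, which matches the GRANT above.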

Is there any risk of doing this? My analysis is that this gives them SELECT access to a table that they’re replicating to their own server, but that’s it. Any other concerns? If there are concerns, are there ways to make this work? We have tables we don’t want to share, but most of our tables only have public data.


Get this bounty!!!

#StackBounty: #postgresql #postgresql-10 Postgresql 10 – Parallel configuration

Bounty: 50

There are four configuration settings to enable parallelism and tune it, but the PostgreSQL documentation doesn’t say anything about how to choose or calculate their values. My questions are:

1- How do I calculate the values of max_parallel_workers, max_parallel_workers_per_gather and max_worker_processes?

2- work_mem can be calculated based on the number of connections and the available memory (RAM), but does work_mem need to change if I enable parallelism?

My supposition is: if the machine has 8 cores, then max_parallel_workers is 8 and the values for worker processes and per-gather are 32 (8*4); I took the number 4 from the original configuration, which is 4 gathers per parallel worker.
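
For what it’s worth, a common starting point (a sketch only, not an official formula) is to cap max_worker_processes and max_parallel_workers at roughly the core count and keep the per-gather value much smaller. On an 8-core machine that might look like:

-- Rough starting values for an 8-core machine; these are assumptions to tune,
-- not documented recommendations. Changing max_worker_processes needs a restart.
ALTER SYSTEM SET max_worker_processes = 8;             -- total background workers the server may start
ALTER SYSTEM SET max_parallel_workers = 8;             -- of those, how many may serve parallel queries
ALTER SYSTEM SET max_parallel_workers_per_gather = 2;  -- workers a single Gather node may request
SELECT pg_reload_conf();

As for work_mem: each parallel worker can allocate up to work_mem per sort or hash operation, so raising the number of workers effectively multiplies the memory a single query can use; the usual advice is to size work_mem with that multiplier in mind rather than change it automatically.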


Get this bounty!!!

#StackBounty: #postgresql #partitioning #postgresql-9.6 #postgresql-10 Postgres table partitioning – declarative vs inheritance

Bounty: 100

I have a table with over 70MM rows running on Postgres 9.6.6.
The table size is about 50GB (70GB with indexes) and is projected to triple in the next 3 months. The growth will slow after that.

The table has several varchar fields and 60+ numeric fields. Each row includes customer ID and every query uses customer ID. There are no JOINs – each query retrieves either a collection of rows, or aggregation over some collection of rows.

Any recommendations if I should

  1. keep 9.6.6 and use inheritance,
  2. upgrade to 10.4 and use declarative partitioning (sketched below),
  3. try something else?
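
If option 2 were chosen, a declarative setup partitioned by customer ID might look roughly like the sketch below (table and column names are made up; PostgreSQL 10 supports RANGE and LIST partitioning, and indexes still have to be created on each partition):

-- Hypothetical sketch for PostgreSQL 10 declarative partitioning.
CREATE TABLE events (
    customer_id bigint NOT NULL,
    recorded_at timestamp NOT NULL,
    metric_1    numeric
    -- ... the remaining varchar/numeric columns
) PARTITION BY RANGE (customer_id);

CREATE TABLE events_p0 PARTITION OF events FOR VALUES FROM (0)      TO (100000);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES FROM (100000) TO (200000);
-- ... more partitions as customer IDs grow

-- PG 10 does not cascade indexes to partitions, so each one needs its own:
CREATE INDEX ON events_p0 (customer_id, recorded_at);
CREATE INDEX ON events_p1 (customer_id, recorded_at);

Whether that beats inheritance mostly comes down to how many partitions there are and whether the planner can prune them using the customer-ID predicate in every query.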


Get this bounty!!!

#StackBounty: #python-3.x #postgresql #sqlalchemy #psycopg2 #python-multiprocessing multiprocessing / psycopg2 TypeError: can't pic…

Bounty: 100

I followed the code below in order to implement a parallel select query on a Postgres database:

https://tech.geoblink.com/2017/07/06/parallelizing-queries-in-postgresql-with-python/

My basic problem is that I have ~6k queries that need to be executed, and I am trying to optimise the execution of these select queries. Initially it was a single query whose where id in (...) clause contained all 6k predicate IDs, but I ran into issues with that query using up > 4GB of RAM on the machine it ran on, so I decided to split it into 6k individual queries which, when run synchronously, keep memory usage steady. However, it takes a lot longer to run, which is less of an issue for my use case. Even so, I am trying to reduce the time as much as possible.

This is what my code looks like:

class PostgresConnector(object):
    def __init__(self, db_url):
        self.db_url = db_url
        self.engine = self.init_connection()
        self.pool = self.init_pool()

    def init_pool(self):
        CPUS = multiprocessing.cpu_count()
        return multiprocessing.Pool(CPUS)

    def init_connection(self):
        LOGGER.info('Creating Postgres engine')
        return create_engine(self.db_url)

    def run_parallel_queries(self, queries):
        results = []
        try:
            for i in self.pool.imap_unordered(self.execute_parallel_query, queries):
                results.append(i)
        except Exception as exception:
            LOGGER.error('Error whilst executing %s queries in parallel: %s', len(queries), exception)
            raise
        finally:
            self.pool.close()
            self.pool.join()

        LOGGER.info('Parallel query ran producing %s sets of results of type: %s', len(results), type(results))

        return list(chain.from_iterable(results))

    def execute_parallel_query(self, query):
        con = psycopg2.connect(self.db_url)
        cur = con.cursor()
        cur.execute(query)
        records = cur.fetchall()
        con.close()

        return list(records)

However whenever this runs, I get the following error:

TypeError: can't pickle _thread.RLock objects

I’ve read lots of similar questions regarding the use of multiprocessing and pickleable objects, but I can’t for the life of me figure out what I am doing wrong.

The pool is generally one per process (which I believe is best practice), but it is shared per instance of the connector class so that a new pool isn’t created for each use of run_parallel_queries.

The top answer to a similar question:

Accessing a MySQL connection pool from Python multiprocessing

Shows an almost identical implementation to my own, except using MySql instead of Postgres.

Am I doing something wrong?

Thanks!

EDIT:

I’ve found this answer:

Python Postgres psycopg2 ThreadedConnectionPool exhausted

which is incredibly detailed, and it looks as though I had misunderstood what multiprocessing.Pool gives me versus a connection pool such as ThreadedConnectionPool. However, the first link doesn’t mention needing any connection pools, etc. This solution seems good, but it seems like A LOT of code for what I think is a fairly simple problem.

EDIT 2:

So the above link solves another problem, which I would likely have run into anyway, so I’m glad I found it, but it doesn’t solve the initial issue of not being able to use imap_unordered, due to the pickling error. Very frustrating.
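
For what it’s worth, one way around the pickling error (an untested sketch, not a drop-in replacement for the class above) is to make sure nothing that holds a lock, such as the SQLAlchemy engine or the pool itself, gets captured by the callable handed to imap_unordered. Using a module-level worker that opens its own connection means only the query strings and a DSN string ever cross the process boundary:

import multiprocessing
from itertools import chain

import psycopg2

DB_URL = 'postgresql://user:password@localhost/dbname'  # hypothetical DSN


def execute_query(query):
    # Runs inside a child process with its own short-lived connection.
    con = psycopg2.connect(DB_URL)
    try:
        with con.cursor() as cur:
            cur.execute(query)
            return cur.fetchall()
    finally:
        con.close()


def run_parallel_queries(queries):
    # Only plain strings are pickled, so no engine/pool/RLock is serialized.
    with multiprocessing.Pool(multiprocessing.cpu_count()) as pool:
        results = pool.imap_unordered(execute_query, queries)
        return list(chain.from_iterable(results))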


Get this bounty!!!

#StackBounty: #postgresql #pg-restore pg_restore old database backup

Bounty: 50

Running 9.5 on Ubuntu 16.04.
I’m unsure what version the database was backed up from; I think 8.4.

When I run pg_restore I get

pg_restore: implied data-only restore
--
-- PostgreSQL database dump
--


-- Started on 30608-10-13 11:53:01 MDT

SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'SQL_ASCII';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;
SET row_security = off;

-- Completed on 2018-09-06 11:12:06 MDT

--
-- PostgreSQL database dump complete
--

When I run pg_restore -l

;
; Archive created at 30608-10-13 11:53:01 MDT
;     dbname: 
;     TOC Entries: -1835365408
;     Compression: -1
;     Dump Version: 1.11-0
;     Format: CUSTOM
;     Integer: 4 bytes
;     Offset: 8 bytes
;
;
; Selected TOC Entries:
;

Obviously there is a timestamp issue, and the TOC entry count and compression values are clearly off.

Not sure where to go from here. The file size indicates it should be a complete backup. I have multiple backups from the same time frame, and they all report similar output when I try to restore them.

Is there any way to uncompress the data portion of the file? I can hex-edit the file and see the schema, but the data is in Postgres’s compressed custom (-Fc) format. I just need to find and verify a few entries, so if there’s a manual way to inspect and search, that would work.
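
One crude avenue for manual inspection (an assumption-laden sketch, not a supported tool): custom-format dumps normally compress each table’s data with zlib, so if the file itself is intact you can scan for zlib stream headers, inflate whatever decompresses cleanly, and grep the output. Roughly:

# salvage.py: scan a -Fc dump for zlib streams and print whatever inflates.
# Usage: python3 salvage.py backup.dump > recovered.txt
import sys
import zlib

data = open(sys.argv[1], 'rb').read()
pos = 0
while True:
    pos = data.find(b'\x78\x9c', pos)  # header of a default-compression zlib stream
    if pos == -1:
        break
    try:
        chunk = zlib.decompressobj().decompress(data[pos:])
        if chunk:
            sys.stdout.buffer.write(chunk)
    except zlib.error:
        pass
    pos += 2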

Any help is appreciated.


Get this bounty!!!

#StackBounty: #postgresql #amazon-web-services #amazon-redshift Remove ability to view/list all tables from a schema

Bounty: 50

Using a Postgres/Amazon Redshift database. I have a schema called ‘Public’, and another schema called ‘SchemaX’. I have created a user called ‘User1’ and given him access to ‘SchemaX’. I want to stop ‘User1’ from viewing or listing the available tables in my ‘Public’ schema. How does one go about doing this?
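
Not an authoritative answer, but the usual direction is to revoke usage on the schema (PostgreSQL syntax shown; Redshift’s GRANT/REVOKE is close but not identical). One caveat: as far as I know, table names stay visible through the system catalogs even when the contents are inaccessible.

-- Sketch only; names follow the question (user1, schema public).
REVOKE ALL ON ALL TABLES IN SCHEMA public FROM user1;
REVOKE USAGE ON SCHEMA public FROM user1;

-- In stock PostgreSQL the public schema is granted to PUBLIC by default,
-- so that broader grant may also need revoking (this affects every user):
REVOKE ALL ON SCHEMA public FROM PUBLIC;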


Get this bounty!!!

#StackBounty: #sql #postgresql #query-performance Query optimization with multi-column variant matching

Bounty: 100

TL;DR – I’m looking for advice on how to better write the query below.

Below is a pared down version of my table structure with some sample data. I don’t have control over the data
structure at all so recommendations on schema changes unfortunately won’t help me.

Problem

Given a building_level_key and a faction_key I need to return a record from building_levels joined to its
closest match from the building_culture_variants table.

For example, if I used goblin_walls & fact_blue I would expect the goblin_walls record joined to the culture variant with building_culture_variant_key 2.
building_culture_variant_key record 2.

An example structure of tables can be seen below:

Sample data with db structure

  • factions – a compacted version of the real table, since culture/subculture records are stored in different tables, but it gets the point across. This table is only really needed in the query so that the appropriate culture/subculture can be referenced for a given faction_key.

  • building_levels – acts as a base record for every building in the system. There is only one record per
    building.

  • building_culture_variants – acts as its name implies; there can be more than one record for each building_level_key and each variant record is matched against a building level using the building_level_key and a combination of faction_key, culture_key and subculture_key.

How matching works

Matching starts with finding the given building_level_key in the culture variants table. This is a hard match and is required in order to join a building level to a culture variant.

Each building level record will have at least one culture variant. Often there are several culture variants per building level, but on average no more than 4. The most common culture variant is a “generic” one, meaning that the faction_key, culture_key and subculture_key columns are all null, so the building will match any faction. However, any combination of the faction columns could have a key, so I need to match a given faction against each of the faction columns in the culture variant.

Side note: the culture variant keys are always consistent, meaning I’ll never have a scenario where a faction_key and subculture_key in the culture variants table don’t match a corresponding faction_key and subculture_key from the factions table (and subculture table, which has been omitted for clarity).
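
For concreteness, those matching rules can also be written directly as filters plus a score, roughly like the sketch below (same column names as the sample data, untested against the fiddle): a variant with a conflicting key is rejected outright, and of the survivors the one with the most non-null matches wins.

SELECT bl.building_level_key,
       bcv.building_culture_variant_key,
       bcv.name
FROM building_levels AS bl
CROSS JOIN LATERAL (
    SELECT v.*
    FROM building_culture_variants AS v
    JOIN factions AS f ON f.faction_key = 'fact_blue'
    WHERE v.building_level_key = bl.building_level_key
      AND (v.faction_key    IS NULL OR v.faction_key    = f.faction_key)
      AND (v.culture_key    IS NULL OR v.culture_key    = f.culture_key)
      AND (v.subculture_key IS NULL OR v.subculture_key = f.subculture_key)
    ORDER BY (v.faction_key    IS NOT NULL)::int
           + (v.culture_key    IS NOT NULL)::int
           + (v.subculture_key IS NOT NULL)::int DESC
    LIMIT 1
) AS bcv
WHERE bl.building_level_key = 'goblin_walls';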

What I’ve tried

I have provided a sql fiddle to play around with and included my version of the query below:

SELECT 
  "building_culture_variants"."building_culture_variant_key" AS qualified_key, 
  "building_levels"."building_level_key" AS building_key, 
  "building_levels"."create_time", 
  "building_levels"."create_cost", 
  "building_culture_variants"."name",
  'fact_blue'::text AS faction_key
FROM 
  "building_levels" 
  INNER JOIN "building_culture_variants" ON (
    "building_culture_variants"."building_culture_variant_key" IN (
      SELECT 
        "building_culture_variant_key" 
      FROM 
        (
          SELECT 
            "building_culture_variants"."building_culture_variant_key", 
            (
                CASE WHEN "building_culture_variants"."faction_key" = "building_factions"."faction_key" THEN 1 WHEN "building_culture_variants"."faction_key" IS NULL THEN 0 ELSE NULL END + 
                CASE WHEN "building_culture_variants"."culture_key" = "building_factions"."culture_key" THEN 1 WHEN "building_culture_variants"."culture_key" IS NULL THEN 0 ELSE NULL END + 
                CASE WHEN "building_culture_variants"."subculture_key" = "building_factions"."subculture_key" THEN 1 WHEN "building_culture_variants"."subculture_key" IS NULL THEN 0 ELSE NULL END
            ) AS match_count 
          FROM 
            "building_culture_variants" 
            INNER JOIN (
              -- This is a subquery because here I would join a couple more tables
              -- to collect all of the faction info
              SELECT 
                "factions"."faction_key", 
                "factions"."culture_key", 
                "factions"."subculture_key"
              FROM 
                "factions" 
            ) AS "building_factions" ON ("building_factions"."faction_key" = 'fact_blue')
          WHERE ("building_levels"."building_level_key" = "building_culture_variants"."building_level_key") 
          GROUP BY 
            match_count, 
            building_culture_variant_key 
          ORDER BY 
            match_count DESC NULLS LAST 
          LIMIT 
            1
        ) AS "culture_variant_match"
    )
  ) 
WHERE "building_levels"."building_level_key" = 'goblin_walls'
ORDER BY 
  "building_levels"."building_level_key"

Question

The query I provided above works and gets the job done, but I feel like I’m brute-forcing the problem by just nesting a bunch of queries. I get the feeling I’m not taking advantage of some SQL construct that would streamline, or greatly simplify, the query.

So what I’m really asking is: is there a better way to rewrite the query to make it more efficient?


Get this bounty!!!

#StackBounty: #postgresql #plpgsql #postgresql-triggers Postgresql: how to use variable settings in function triggers

Bounty: 300

I would like to record a user’s id in the session/transaction using SET, so that I can access it later in a trigger function using current_setting. Basically, I’m trying option no. 2 from a very similar question posted previously, with the difference that I’m using PG 10.1.

I’ve been trying 3 approaches to setting the variable:

  • SET local myvars.user_id = 4, thereby setting it locally in the transaction;
  • SET myvars.user_id = 4, thereby setting it in the session;
  • SELECT set_config('myvars.user_id', '4', false), which, depending on the last argument, is a shortcut for either of the previous two options.

None of them is usable in the trigger, which receives NULL when getting the variable through current_setting. Here is a script I’ve devised to troubleshoot it (can be easily used with the postgres docker image):

database=$POSTGRES_DB
user=$POSTGRES_USER
[ -z "$user" ] && user="postgres"

psql -v ON_ERROR_STOP=1 --username "$user" $database <<-EOSQL
    DROP TRIGGER IF EXISTS add_transition1 ON houses;
    CREATE TABLE IF NOT EXISTS houses (
        id SERIAL NOT NULL,
        name VARCHAR(80),
        created_at TIMESTAMP WITHOUT TIME ZONE DEFAULT now(),
        PRIMARY KEY(id)
    );

    CREATE TABLE IF NOT EXISTS transitions1 (
        id SERIAL NOT NULL,
        house_id INTEGER,
        user_id INTEGER,
        created_at TIMESTAMP WITHOUT TIME ZONE DEFAULT now(),
        PRIMARY KEY(id),
        FOREIGN KEY(house_id) REFERENCES houses (id) ON DELETE CASCADE

    );

    CREATE OR REPLACE FUNCTION add_transition1() RETURNS TRIGGER AS $$
        DECLARE
            user_id integer;
        BEGIN
            user_id := current_setting('myvars.user_id')::integer || NULL;
            INSERT INTO transitions1 (user_id, house_id) VALUES (user_id, NEW.id);
            RETURN NULL;
        END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER add_transition1 AFTER INSERT OR UPDATE ON houses FOR EACH ROW EXECUTE PROCEDURE add_transition1();

    BEGIN;
    %1% SELECT current_setting('myvars.user_id');
    %2% SELECT set_config('myvars.user_id', '55', false);
    %3% SELECT current_setting('myvars.user_id');
    INSERT INTO houses (name) VALUES ('HOUSE PARTY') RETURNING houses.id;
    SELECT * from houses;
    SELECT * from transitions1;
    COMMIT;
    DROP TRIGGER IF EXISTS add_transition1 ON houses;
    DROP FUNCTION IF EXISTS add_transition1;
    DROP TABLE transitions1;
        DROP TABLE houses;
EOSQL

The conclusion I came to was that the function is triggered in a different transaction and a different (?) session. Is this something that one can configure, so that all happens within the same context?
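
For what it’s worth, within a single connection a setting made with set_config is visible to triggers fired later in that same transaction, so the shape below usually works (same names as the script above). One thing worth flagging separately: the expression current_setting('myvars.user_id')::integer || NULL in the trigger body evaluates to NULL no matter what, because anything concatenated with NULL is NULL.

-- Minimal sketch, assuming every statement runs over the same connection:
BEGIN;
SELECT set_config('myvars.user_id', '55', true);  -- true = lasts only for this transaction

-- inside the trigger function, fired later in the same transaction:
--   user_id := current_setting('myvars.user_id', true)::integer;
--   (the two-argument form, available since 9.6, returns NULL instead of
--    raising an error when the setting has never been defined)

INSERT INTO houses (name) VALUES ('HOUSE PARTY');
COMMIT;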


Get this bounty!!!

#StackBounty: #ruby-on-rails #postgresql ActiveRecord::StatementInvalid, PG::UndefinedTable error, but generated SQL works

Bounty: 200

This has been tremendously frustrating. I’m trying to get a has_many through working, and I think I’m just too close to this to see something super obvious. Each step works correctly, and the SQL that Rails generates works, but put together in the console it doesn’t.

The one weird thing about this whole setup is that there are a couple of tables in a salesforce schema, and their table names and primary keys aren’t standard. Here’s the basic structure:

class Contact
  self.table_name =  'salesforce.contact'
  self.primary_key = 'sfid'

  has_many :content_accesses
  has_many :inventories, through: :content_accesses # I've tried inventory and inventorys, just to ensure it's not Rails magic
end


class ContentAccess
  belongs_to :inventory
  belongs_to :contact
end


class Inventory
  self.table_name =  'salesforce.inventory__c'
  self.primary_key = 'sfid'

  has_many :content_accesses, foreign_key: 'inventory_id'
end

Works:

c = Contact.first
c.content_accesses # works, gives the related items

c.content_accesses.first.inventory # works, gives the related Inventory item

Error:

c.inventories # Gives:

# ActiveRecord::StatementInvalid (PG::UndefinedTable: ERROR:  relation "content_accesses" does not exist)
# LINE 1: ..._c".* FROM "salesforce"."inventory__c" INNER JOIN "content_a...
#                                                          ^
# : SELECT  "salesforce"."inventory__c".* FROM "salesforce"."inventory__c" INNER JOIN "content_accesses" ON "salesforce"."inventory__c"."sfid" = "content_accesses"."inventory_id" WHERE "content_accesses"."contact_id" = $1 LIMIT $2

When I run that query through Postico, though, it works fine. 🤬

Edited to add:

  • I moved content_accesses into the salesforce schema, and set self.table_name on the model correctly, but the problem still happens. As such, I don’t think this is related to being cross-schema.
  • That only makes this problem weirder to me. 🙁
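
One diagnostic worth running (a sketch, not a known fix): compare what ActiveRecord thinks the join table is called with the search_path its own connection uses, since that connection can be configured differently from whatever Postico uses:

# In a Rails console; names taken from the models above.
puts ContentAccess.table_name
puts Contact.reflect_on_association(:inventories).options.inspect
puts ActiveRecord::Base.connection.select_value("SHOW search_path")

# If the Rails connection's search_path doesn't include the schema where
# content_accesses actually lives, naming it on the model is one way to
# rule that out (the schema here is an assumption):
#
#   class ContentAccess < ApplicationRecord
#     self.table_name = 'salesforce.content_accesses'
#   end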

DDL for the tables:

CREATE TABLE salesforce.inventory__c (
    createddate timestamp without time zone,
    isdeleted boolean,
    name character varying(80),
    systemmodstamp timestamp without time zone,
    inventory_unique_name__c character varying(255),
    sfid character varying(18),
    id integer DEFAULT nextval('salesforce.inventory__c_id_seq'::regclass) PRIMARY KEY,
    _hc_lastop character varying(32),
    _hc_err text
);

CREATE UNIQUE INDEX inventory__c_pkey ON salesforce.inventory__c(id int4_ops);
CREATE INDEX hc_idx_inventory__c_systemmodstamp ON salesforce.inventory__c(systemmodstamp timestamp_ops);
CREATE UNIQUE INDEX hcu_idx_inventory__c_sfid ON salesforce.inventory__c(sfid text_ops);


CREATE TABLE salesforce.contact (
    lastname character varying(80),
    mailingpostalcode character varying(20),
    accountid character varying(18),
    assistantname character varying(40),
    name character varying(121),
    mobilephone character varying(40),
    birthdate date,
    phone character varying(40),
    mailingstreet character varying(255),
    isdeleted boolean,
    assistantphone character varying(40),
    systemmodstamp timestamp without time zone,
    mailingstatecode character varying(10),
    createddate timestamp without time zone,
    mailingcity character varying(40),
    salutation character varying(40),
    title character varying(128),
    mailingcountrycode character varying(10),
    firstname character varying(40),
    email character varying(80),
    sfid character varying(18),
    id integer DEFAULT nextval('salesforce.contact_id_seq'::regclass) PRIMARY KEY,
    _hc_lastop character varying(32),
    _hc_err text
);

CREATE UNIQUE INDEX contact_pkey ON salesforce.contact(id int4_ops);
CREATE INDEX hc_idx_contact_systemmodstamp ON salesforce.contact(systemmodstamp timestamp_ops);
CREATE UNIQUE INDEX hcu_idx_contact_sfid ON salesforce.contact(sfid text_ops);

CREATE TABLE content_accesses (
    id BIGSERIAL PRIMARY KEY,
    inventory_id character varying(20),
    contact_id character varying(20),
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL
);

CREATE UNIQUE INDEX content_accesses_pkey ON content_accesses(id int8_ops);


Get this bounty!!!

#StackBounty: #python3 #postgresql #python-2.7 Error: could not determine PostgreSQL version from '10.4'

Bounty: 50

Getting an error while installing uniq.

Error message:

Downloading https://engci-maven-master.cisco.com/artifactory/api/pypi/apic-em-pypi-group/packages/86/fd/cc8315be63a41fe000cce20482a917e874cdc1151e62cb0141f5e55f711e/psycopg2-2.6.1.tar.gz (371kB)
    100% |████████████████████████████████| 378kB 458kB/s
    Complete output from command python setup.py egg_info:
    running egg_info
    creating pip-egg-info/psycopg2.egg-info
    writing pip-egg-info/psycopg2.egg-info/PKG-INFO
    writing dependency_links to pip-egg-info/psycopg2.egg-info/dependency_links.txt
    writing top-level names to pip-egg-info/psycopg2.egg-info/top_level.txt
    writing manifest file 'pip-egg-info/psycopg2.egg-info/SOURCES.txt'
    Error: could not determine PostgreSQL version from '10.4'


----------------------------------------

Command “python setup.py egg_info” failed with error code 1 in /tmp/pip-build-_cf1z03a/psycopg2/

Ubuntu 18.04

How can I install it correctly?

Command used:
pip3 install uniq==2.1.22.* --no-cache-dir --index-url=https://engci-maven-master.cisco.com/artifactory/api/pypi/apic-em-pypi-group/simple
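
For reference, this looks like the known limitation where psycopg2 2.6.x’s setup.py cannot parse the two-part version string reported by PostgreSQL 10 (‘10.4’). A possible workaround, assuming uniq does not strictly pin psycopg2 2.6.1, is to install a newer psycopg2 first:

# Sketch of a workaround; adjust the psycopg2 version constraint as needed.
pip3 install 'psycopg2>=2.7' --no-cache-dir
pip3 install 'uniq==2.1.22.*' --no-cache-dir \
    --index-url=https://engci-maven-master.cisco.com/artifactory/api/pypi/apic-em-pypi-group/simple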


Get this bounty!!!