#StackBounty: #postgresql #liquibase Postgres changeset with column TEXT not working with Liquibase 3.6.2 and Postgres 9.6

Bounty: 100

I am working with the new Spring Boot 2.1.0 release. In Spring Boot 2.1.0, Liquibase was updated from 3.5.5 to 3.6.2, and I've noticed that several things in my change sets are no longer working.

-- test_table.sql
CREATE TABLE test_table (
   id             SERIAL PRIMARY KEY,
   --Works fine as TEXT or VARCHAR with Liquibase 3.5 which is bundled with Spring Boot version 2.0.6.RELEASE
   --Will only work as VARCHAR with Liquibase 3.6.2 which is bundled with Spring Boot version 2.1.0.RELEASE and above
   worksheet_data TEXT
);
-- test_table.csv
id,worksheet_data
1,fff

-- Liquibase Changeset
    <changeSet id="DATA_01" author="me" runOnChange="false">
    <loadData
            file="${basedir}/sql/data/test_table.csv"
            tableName="test_table"/>
    </changeSet>

This does not work. I am presented with the odd stack trace below: it complains that it can't find liquibase/changelog/fff, which I'm not referencing anywhere in the changeset. The "fff" is exactly the data value in test_table.csv.

    org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'liquibase' defined in class path resource [org/springframework/boot/autoconfigure/liquibase/LiquibaseAutoConfiguration$LiquibaseConfiguration.class]: Invocation of init method failed; nested exception is liquibase.exception.MigrationFailedException: Migration failed for change set liquibase/changelog/data_nonprod.xml::DATA_NONPROD_02::scott_winters:
     Reason: liquibase.exception.DatabaseException: class path resource [liquibase/changelog/fff] cannot be resolved to URL because it does not exist
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1745) ~[spring-beans-5.1.3.RELEASE.jar:5.1.3.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:576) ~[spring-beans-5.1.3.RELEASE.jar:5.1.3.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:498) ~[spring-beans-5.1.3.RELEASE.jar:5.1.3.RELEASE]
    at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:320) ~[spring-beans-5.1.3.RELEASE.jar:5.1.3.RELEASE]
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222) ~[spring-beans-5.1.3.RELEASE.jar:5.1.3.RELEASE]
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:318) ~[spring-beans-5.1.3.RELEASE.jar:5.1.3.RELEASE]
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:199) ~[spring-beans-5.1.3.RELEASE.jar:5.1.3.RELEASE]
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:307) ~[spring-beans-5.1.3.RELEASE.jar:5.1.3.RELEASE]
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:199) ~[spring-beans-5.1.3.RELEASE.jar:5.1.3.RELEASE]
    at org.springframework.context.support.AbstractApplicationContext.getBean(AbstractApplicationContext.java:1083) ~[spring-context-5.1.3.RELEASE.jar:5.1.3.RELEASE]
    at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:853) ~[spring-context-5.1.3.RELEASE.jar:5.1.3.RELEASE]
    at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:546) ~[spring-context-5.1.3.RELEASE.jar:5.1.3.RELEASE]
    at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:142) ~[spring-boot-2.1.1.RELEASE.jar:2.1.1.RELEASE]
    at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:775) [spring-boot-2.1.1.RELEASE.jar:2.1.1.RELEASE]
    at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:397) [spring-boot-2.1.1.RELEASE.jar:2.1.1.RELEASE]
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:316) [spring-boot-2.1.1.RELEASE.jar:2.1.1.RELEASE]
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1260) [spring-boot-2.1.1.RELEASE.jar:2.1.1.RELEASE]
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1248) [spring-boot-2.1.1.RELEASE.jar:2.1.1.RELEASE]
    at net.migov.amar.MiAmarApiApplication.main(MiAmarApiApplication.java:33) [classes/:na]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_181]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_181]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_181]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_181]
    at org.springframework.boot.devtools.restart.RestartLauncher.run(RestartLauncher.java:49) [spring-boot-devtools-2.1.1.RELEASE.jar:2.1.1.RELEASE]
    Caused by: liquibase.exception.MigrationFailedException: Migration failed for change set liquibase/changelog/data_nonprod.xml::DATA_NONPROD_02::scott_winters:
         Reason: liquibase.exception.DatabaseException: class path resource [liquibase/changelog/fff] cannot be resolved to URL because it does not exist
        at liquibase.changelog.ChangeSet.execute(ChangeSet.java:637) ~[liquibase-core-3.6.2.jar:na]
        at liquibase.changelog.visitor.UpdateVisitor.visit(UpdateVisitor.java:53) ~[liquibase-core-3.6.2.jar:na]
        at liquibase.changelog.ChangeLogIterator.run(ChangeLogIterator.java:78) ~[liquibase-core-3.6.2.jar:na]
        at liquibase.Liquibase.update(Liquibase.java:202) ~[liquibase-core-3.6.2.jar:na]
        at liquibase.Liquibase.update(Liquibase.java:179) ~[liquibase-core-3.6.2.jar:na]
        at liquibase.integration.spring.SpringLiquibase.performUpdate(SpringLiquibase.java:353) ~[liquibase-core-3.6.2.jar:na]
        at liquibase.integration.spring.SpringLiquibase.afterPropertiesSet(SpringLiquibase.java:305) ~[liquibase-core-3.6.2.jar:na]
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1804) ~[spring-beans-5.1.3.RELEASE.jar:5.1.3.RELEASE]
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1741) ~[spring-beans-5.1.3.RELEASE.jar:5.1.3.RELEASE]
        ... 23 common frames omitted
    Caused by: liquibase.exception.DatabaseException: class path resource [liquibase/changelog/fff] cannot be resolved to URL because it does not exist
        at liquibase.statement.ExecutablePreparedStatementBase.applyColumnParameter(ExecutablePreparedStatementBase.java:191) ~[liquibase-core-3.6.2.jar:na]
        at liquibase.statement.ExecutablePreparedStatementBase.attachParams(ExecutablePreparedStatementBase.java:110) ~[liquibase-core-3.6.2.jar:na]
        at liquibase.statement.BatchDmlExecutablePreparedStatement.attachParams(BatchDmlExecutablePreparedStatement.java:51) ~[liquibase-core-3.6.2.jar:na]
        at liquibase.statement.ExecutablePreparedStatementBase.execute(ExecutablePreparedStatementBase.java:81) ~[liquibase-core-3.6.2.jar:na]
        at liquibase.executor.jvm.JdbcExecutor.execute(JdbcExecutor.java:115) ~[liquibase-core-3.6.2.jar:na]
        at liquibase.database.AbstractJdbcDatabase.execute(AbstractJdbcDatabase.java:1229) ~[liquibase-core-3.6.2.jar:na]
        at liquibase.database.AbstractJdbcDatabase.executeStatements(AbstractJdbcDatabase.java:1211) ~[liquibase-core-3.6.2.jar:na]
        at liquibase.changelog.ChangeSet.execute(ChangeSet.java:600) ~[liquibase-core-3.6.2.jar:na]
        ... 31 common frames omitted
Caused by: java.io.FileNotFoundException: class path resource [liquibase/changelog/fff] cannot be resolved to URL because it does not exist
    at org.springframework.core.io.ClassPathResource.getURL(ClassPathResource.java:195) ~[spring-core-5.1.3.RELEASE.jar:5.1.3.RELEASE]
    at liquibase.integration.spring.SpringLiquibase$SpringResourceOpener.getResourcesAsStream(SpringLiquibase.java:556) ~[liquibase-core-3.6.2.jar:na]
    at liquibase.statement.ExecutablePreparedStatementBase.getResourceAsStream(ExecutablePreparedStatementBase.java:281) ~[liquibase-core-3.6.2.jar:na]
    at liquibase.statement.ExecutablePreparedStatementBase.toCharacterStream(ExecutablePreparedStatementBase.java:241) ~[liquibase-core-3.6.2.jar:na]
    at liquibase.statement.ExecutablePreparedStatementBase.applyColumnParameter(ExecutablePreparedStatementBase.java:184) ~[liquibase-core-3.6.2.jar:na]
    ... 38 common frames omitted

If I change TEXT to VARCHAR it works. From my understanding these column types are effectively the same in Postgres, so I can work around this. However, it is frustrating, and I don't see this new behavior documented anywhere. In the release announcement, 3.6 is advertised as a "drop in" change (http://www.liquibase.org/2018/04/liquibase-3-6-0-released.html).
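One workaround I came across: Liquibase 3.6 appears to treat values loaded into CLOB/TEXT columns as paths to files it should read, which would explain why it goes looking for liquibase/changelog/fff. Declaring the column type explicitly in loadData should force the value to be taken literally. A sketch, assuming the STRING load-data type behaves as documented:

    <changeSet id="DATA_01" author="me" runOnChange="false">
        <loadData
                file="${basedir}/sql/data/test_table.csv"
                tableName="test_table">
            <!-- STRING tells loadData to insert the CSV value as-is
                 instead of treating it as a file path -->
            <column name="worksheet_data" type="STRING"/>
        </loadData>
    </changeSet>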

I would like to use the new features of Spring Boot 2.1.0, but I cannot pin Liquibase 3.5.5 in my build because Spring Boot will complain about incompatible versions. This is just one of several issues I'm seeing with change sets that worked in 3.5.5. Maybe the folks at Spring should consider rolling back the Liquibase version.

Any advice on this matter would be greatly appreciated. Thanks.

UPDATED
I have created a sample Spring Boot project to demonstrate this: https://github.com/pcalouche/postgres-liquibase-text


Get this bounty!!!

#StackBounty: #postgresql PostgreSQL: Backend processes are active for a long time

Bounty: 400

Now I am hitting a very big roadblock.

I use PostgreSQL 10 (10.5).
My system uses the declarative table partitioning introduced in PostgreSQL 10.

Sometimes many queries don't return, and when I check the backend processes with pg_stat_activity, many of them are active.
At first I thought these processes were just waiting for locks, but the transactions contain only SELECT statements, and no other backend runs any query that requires an ACCESS EXCLUSIVE lock. The queries themselves have good plans and usually run fine, and computer resources (CPU, memory, IO, network) are not a problem either, so these transactions should never conflict. I also thoroughly checked the locks of these transactions with pg_locks and pg_blocking_pids(), and I could not find any lock that would slow the queries down; most of the active backends hold only ACCESS SHARE locks, because they run only SELECTs.
Now I think this phenomenon is not caused by locking but by something related to the new table partitioning.

So, why are many backends active?
Could anyone help me?
Any comments are highly appreciated.
The figure below is a part of the result of pg_stat_activity.
If you want any additional information, please tell me.

[screenshot: partial pg_stat_activity output]
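This is roughly the query I run to see what the active backends are doing and what, if anything, they are waiting on (the wait_event columns are available in PostgreSQL 10):

    -- List non-idle backends, longest-running first
    SELECT pid, state, wait_event_type, wait_event,
           now() - query_start AS runtime,
           left(query, 60)     AS query_head
    FROM pg_stat_activity
    WHERE state <> 'idle'
    ORDER BY runtime DESC;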

EDIT

My query doesn't handle large data. The return type is like this:

uuid UUID
,number BIGINT
,title TEXT
,type1 TEXT
,data_json JSONB
,type2 TEXT
,uuid_array UUID[]
,count BIGINT

Because it has a JSONB column, I cannot calculate the exact row size, but the JSON is not large.
Normally these queries are moderately fast (around 1.5 s), so they are absolutely no problem; the phenomenon only happens when other processes are working.
If the statistics were wrong, the queries would always be slow, so I don't think bad statistics are the cause.
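For what it's worth, the JSONB sizes could be estimated with pg_column_size; a sketch, with a hypothetical table name:

    -- Rough estimate of the stored JSONB size per row
    SELECT avg(pg_column_size(data_json)) AS avg_bytes,
           max(pg_column_size(data_json)) AS max_bytes
    FROM my_table;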

EDIT2

This is the stat output. There are almost 100 connections, so I cannot show all of them.

[screenshot: pg_stat_activity output for the ~100 connections]


Get this bounty!!!

#StackBounty: #postgresql #replication Using Postgresql logical replication, how do you know that the subscriber is caught up?

Bounty: 50

PostgreSQL has some interesting tools for monitoring the new logical replication system's progress, but I don't really understand them. The two tools I'm aware of are:

pg_stat_replication

and its sibling:

pg_stat_subscription

I've read the documentation for these, but it doesn't say how to tell whether a replica is actually synced, and how to interpret these views isn't obvious to me. Can anybody explain?
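For context, my rough understanding is that the subscriber is caught up when its replayed LSN reaches the publisher's current WAL position, which I have been checking like this on the publisher (please correct me if this interpretation is wrong):

    -- On the publisher: how many bytes each subscriber is behind
    SELECT application_name,
           pg_current_wal_lsn()              AS publisher_lsn,
           replay_lsn,
           pg_current_wal_lsn() - replay_lsn AS lag_bytes  -- 0 means caught up
    FROM pg_stat_replication;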


Get this bounty!!!

#StackBounty: #php #ajax #laravel #postgresql Multiple duplicate HTTP requests/queries from single requests in Azure with Laravel

Bounty: 50

I have a Laravel app served on Azure. I am using an AJAX request to poll data for a JavaScript chart.

The AJAX requests a URL defined in my routes (web.php), thus:

Route::get('/rfp_chart_data', 'DataController@chart_data')->name('chart_data');

That controller method runs a postgresql query and returns a JSON file. This all works fine.

However, after experiencing some performance issues, I decided to monitor the postgres queries and discovered that the same query was running 3 times for each request to this URL.

This happens regardless of whether I:

  • access the URL via an AJAX request

  • go directly to the URL in a browser

  • access the URL via cURL

This (AFAIK) eliminates the possibility that this is some sort of missing img src issue (e.g. What can cause a double page request?)

Thanks very much for any help…

EDIT:

Image of the duplicate queries in postgres pg_stat_activity — this is from 1 web request:

image of duplicate requests
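For reference, this is roughly how I capture the duplicates in pg_stat_activity; the LIKE pattern is just an illustrative fragment of the chart query:

    -- Every backend currently running (a piece of) the chart query
    SELECT pid, state, query_start, left(query, 80) AS query_head
    FROM pg_stat_activity
    WHERE query LIKE '%relevant_dates%';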

EDIT 2:

Full controller code:

<?php

namespace App\Http\Controllers;

use App\AllRfpEntry;
use DB;
use Illuminate\Http\Request;
use Yajra\Datatables\Facades\Datatables;

class DataController extends Controller {
    /**
     * Displays datatables front end view
     *
     * @return Illuminate\View\View
     */

    //|| result_url || '">' || result_title || '</a>'

    public function rfp_chart_data(Request $request) {

        $binding_array = array();

        $chart_data_sql = "SELECT relevant_dates.date::date,
            CASE WHEN award_totals.sum IS NULL
            THEN 0
            ELSE award_totals.sum
            END
            as sum

            ,


            CASE WHEN award_totals.transaction_count IS NULL
            THEN 0
            ELSE award_totals.transaction_count
            END
            as transaction_count FROM

            (
            SELECT * FROM generate_series('" . date('Y-m-01', strtotime('-15 month')) . "'::date, '" . date('Y-m-01') . "'::date, '1 month') AS date
            )  relevant_dates

            LEFT JOIN

            (
            SELECT extract(year from awarded_date)::text || '-' || RIGHT('0' || extract(month from awarded_date)::text, 2) || '-01'  as date, sum(award_amount)::numeric as sum, COUNT(award_amount) as transaction_count FROM all_rfp_entries

            WHERE awarded_date >= '" . date('Y-m-01', strtotime('-15 month')) . "'

            AND awarded_date <= '" . date("Y-m-d") . "' AND award_status = 'AWARDED'
            AND award_amount::numeric < 10000000000";

        if ($request->get('rfp_company_filter')) {

            $binding_array['rfp_company_filter'] = $request->get('rfp_company_filter');

            $chart_data_sql .= " AND company = :rfp_company_filter";

        };

        if ($request->get('rfp_source_filter')) {

            $binding_array['rfp_source_filter'] = $request->get('rfp_source_filter');

            $chart_data_sql .= " AND rfp_source = :rfp_source_filter";

        }

        if ($request->get('exclude_fed_rev')) {

            $chart_data_sql .= " AND rfp_source != 'US FED REV' ";

        }

        if ($request->get('rfp_year_filter')) {

            $binding_array['rfp_year_filter'] = $request->get('rfp_year_filter');

            $chart_data_sql .= " AND year = :rfp_year_filter";

        }

        if ($request->get('rfp_priority_level_filter')) {

            $binding_array['rfp_priority_level_filter'] = $request->get('rfp_priority_level_filter');

            $chart_data_sql .= " AND priority_level = :rfp_priority_level_filter";

        }

        if ($request->get('rfp_search_input_chart')) {

            $binding_array['rfp_search_input_chart'] = $request->get('rfp_search_input_chart');

            $chart_data_sql .= " AND search_document::tsvector @@ plainto_tsquery('simple', :rfp_search_input_chart)";

        }

        $chart_data_sql .= " GROUP BY extract(year from awarded_date), extract(month from awarded_date)
        ) award_totals
        on award_totals.date::date = relevant_dates.date::date

        ORDER BY extract(year from relevant_dates.date::date), extract(month from relevant_dates.date::date)
        ";

        return json_encode(DB::select($chart_data_sql, $binding_array));


    }

    public function data(Request $request) {

        $query = AllRfpEntry::select('id', 'year', 'company', 'result_title', 'award_amount', 'edit_column', 'doc_type', 'rfp_source', 'posted_date', 'awarded_date', 'award_status', 'priority_level', 'word_score', 'summary', 'contract_age', 'search_document', 'link');

        if ($request->get('exclude_na')) {

            $query->where('all_rfp_entries.company', '!=', 'NA');

        }

        if ($request->get('clicked_date')) {

            $query->where('all_rfp_entries.awarded_date', '>', $request->get('clicked_date'));

            $query->where('all_rfp_entries.awarded_date', '<=', $request->get('clicked_date_plus_one_month'));

        }

        if ($request->get('filter_input')) {

            $query->whereRaw("search_document::tsvector @@ plainto_tsquery('simple', '" . $request->get('filter_input') . "')");

        }

        $datatables_json = datatables()->of($query)

            ->rawColumns(['result_title', 'edit_column', 'link'])

            ->orderColumn('award_amount', 'award_amount $1 NULLS LAST')
            ->orderColumn('priority_level', 'priority_level $1 NULLS LAST');

        if (!$request->get('filter_input')) {

            $datatables_json = $datatables_json->orderByNullsLast();

        }

        if (!$request->get('filter_input') and !$request->get('clicked_date')) {

            $count_table = 'all_rfp_entries';

            $count = DB::select(DB::raw("SELECT n_live_tup FROM pg_stat_all_tables WHERE relname = :count_table "), array('count_table' => $count_table))[0]->n_live_tup;

            $datatables_json = $datatables_json->setTotalRecords($count);

        }

            $datatables_json = $datatables_json->make(true);

        return $datatables_json;

    }


}


Get this bounty!!!

#StackBounty: #excel #postgresql #certificate Excel 2016: Cannot query PostgreSQL database: Server certificate not accepted

Bounty: 50

I want to import some data into Excel 2016 from a PostgreSQL table. I have tried it by clicking "New Query" and selecting From Database -> From PostgreSQL Database:

[screenshot: Excel's New Query menu, From Database -> From PostgreSQL Database]

But then I receive the following error:

Details: “TlsClientStream.ClientAlertException: CertificateUnknown: Server certificate was not accepted. Chain status: A certificate chain processed, but terminated in a root certificate which is not trusted by the trust provider.
. The specified hostname was not present in the certificate.
at TlsClientStream.TlsClientStream.ParseCertificateMessage(Byte[] buf, Int32& pos)
at TlsClientStream.TlsClientStream.TraverseHandshakeMessages()
at TlsClientStream.TlsClientStream.GetInitialHandshakeMessages(Boolean allowApplicationData)
at TlsClientStream.TlsClientStream.PerformInitialHandshake(String hostName, X509CertificateCollection clientCertificates, RemoteCertificateValidationCallback remoteCertificateValidationCallback, Boolean checkCertificateRevocation)”

Any suggestions on how to solve this? Thank you so much in advance!


Get this bounty!!!

#StackBounty: #postgresql #memory #max-connections How can I find the source of postgresql per-connection memory leaks?

Bounty: 100

I'm using PostgreSQL 9.5.4 on Amazon RDS with ~1300 persistent connections from Rails 4.2 with "prepared_statements: false". Over the course of hours and days, the "Freeable Memory" RDS stat goes down indefinitely, but it jumps back up to a relatively small working set every time we reconnect (restart our servers). If we let it go too long, it goes all the way to zero, the database instance really does start to swap, and it eventually fails. Subtracting the freeable memory over days from the peaks when we restart, we see tens of MB per connection on average.

[graph: RDS Freeable Memory declining over days, recovering on reconnect]

Digging into the per-pid RSS from enhanced monitoring, we see the same slow growth on example connection pids but the total RSS seems to just be a proxy for actual memory usage per connection (https://www.depesz.com/2012/06/09/how-much-ram-is-postgresql-using/).

[graph: per-PID RSS growth from RDS enhanced monitoring]

How can I:

  1. Change the default.postgres9.5 parameters (screenshot below) to avoid unbounded per-connection memory growth; the settings I suspect matter are sketched after this list
  2. Determine what queries cause this unbounded growth and change them to prevent it
  3. Determine what type of buffering/caching is causing this unbounded growth so that I can use that to do either of the above
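For reference, these are the settings I would look at first when reasoning about per-backend memory (my guess at the relevant knobs, not a definitive list):

    -- Settings that bound or influence per-backend memory
    SELECT name, setting, unit
    FROM pg_settings
    WHERE name IN ('work_mem', 'temp_buffers', 'maintenance_work_mem',
                   'shared_buffers', 'max_connections');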

[screenshot: default.postgres9.5 parameter group values]


Get this bounty!!!

#StackBounty: #postgresql #replication #security Security considerations of allowing logical replication subscribers on PostgreSQL

Bounty: 50

I run a non-profit dedicated to sharing data, kind of like Wikipedia.

We recently had a client that wanted a replica of our database, and we realized that by using PostgreSQL’s new logical replication we can replicate tables of our DB to a server they control.

This would be great for fulfilling our mission of sharing the data. It’s 100× better than providing slow APIs.

We created a role for them like this:

CREATE ROLE some_client WITH REPLICATION LOGIN PASSWORD 'long-password';
GRANT SELECT ON TABLE some_table TO some_client;

And we created a PUBLICATION for them like this:

CREATE PUBLICATION testpublication FOR TABLE ONLY some_table;

Is there any risk in doing this? My analysis is that it gives them SELECT access to a table that they are replicating to their own server, but that's it. Any other concerns? If there are, are there ways to make this work? We have tables we don't want to share, but most of our tables contain only public data.
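For what it's worth, this is how I double-check what the publication actually exposes (nothing beyond some_table should show up):

    -- Confirm the publication covers only the intended table
    SELECT * FROM pg_publication_tables
    WHERE pubname = 'testpublication';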


Get this bounty!!!

#StackBounty: #postgresql #postgresql-10 Postgresql 10 – Parallel configuration

Bounty: 50

There are four configuration settings for enabling and tuning parallel query, but the PostgreSQL documentation doesn't say anything about what values to use or how to calculate them. My questions are:

1- How do I calculate the values of max_parallel_workers,
max_parallel_workers_per_gather and max_worker_processes?

2- work_mem can be calculated based on the number of connections and
the memory (RAM), but does work_mem need to change if I enable
parallel query?

My supposition is: if the machine has 8 cores, then max_parallel_workers is 8, and the values of max_worker_processes and max_parallel_workers_per_gather are 32 (8 * 4); I took the number 4 from the original configuration, which allows 4 gathers per parallel worker.
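For concreteness, this is the kind of configuration I am experimenting with on an 8-core machine (my own guesses, not values from the documentation):

    -- Experimental parallel settings for an 8-core machine (guesses)
    ALTER SYSTEM SET max_worker_processes = 8;            -- total background workers; needs a restart
    ALTER SYSTEM SET max_parallel_workers = 8;            -- how many of those may serve parallel queries
    ALTER SYSTEM SET max_parallel_workers_per_gather = 4; -- workers one Gather node may use
    SELECT pg_reload_conf();                              -- applies the last two without a restart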


Get this bounty!!!

#StackBounty: #postgresql #partitioning #postgresql-9.6 #postgresql-10 Postgres table partitioning – declarative vs inheritance

Bounty: 100

I have a table with over 70 million rows running on Postgres 9.6.6.
The table is about 50 GB (70 GB with indexes) and is projected to triple in size over the next 3 months, after which growth will slow.

The table has several varchar fields and 60+ numeric fields. Each row includes a customer ID, and every query filters on customer ID. There are no JOINs: each query retrieves either a collection of rows or an aggregation over some collection of rows.

Any recommendations on whether I should

  1. keep 9.6.6 and use inheritance-based partitioning,
  2. upgrade to 10.4 and use declarative partitioning (sketched below), or
  3. try something else?
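For reference, option 2 would look roughly like this; the table and column names are made up, and the range bounds would need tuning to the real customer ID distribution:

    -- Hypothetical declarative partitioning by customer ID (PostgreSQL 10)
    CREATE TABLE big_table (
        customer_id bigint NOT NULL,
        label       varchar(100),
        metric_1    numeric
        -- ... the remaining varchar/numeric columns ...
    ) PARTITION BY RANGE (customer_id);

    CREATE TABLE big_table_p0 PARTITION OF big_table
        FOR VALUES FROM (0) TO (1000000);
    CREATE TABLE big_table_p1 PARTITION OF big_table
        FOR VALUES FROM (1000000) TO (2000000);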


Get this bounty!!!

#StackBounty: #python-3.x #postgresql #sqlalchemy #psycopg2 #python-multiprocessing multiprocessing / psycopg2 TypeError: can't pic…

Bounty: 100

I followed the code at the link below to implement parallel SELECT queries against a Postgres database:

https://tech.geoblink.com/2017/07/06/parallelizing-queries-in-postgresql-with-python/

My basic problem is that I have ~6k SELECT queries that need to be executed, and I am trying to optimise their execution. Initially it was a single query whose where id in (...) clause contained all 6k predicate IDs, but that query used over 4 GB of RAM on the machine it ran on, so I split it into 6k individual queries, which keeps memory usage steady when run synchronously. However, it takes a lot longer to run, which is less of an issue for my use case; even so, I am trying to reduce the time as much as possible.

This is what my code looks like:

import logging
import multiprocessing
from itertools import chain

import psycopg2
from sqlalchemy import create_engine

LOGGER = logging.getLogger(__name__)


class PostgresConnector(object):
    def __init__(self, db_url):
        self.db_url = db_url
        self.engine = self.init_connection()
        self.pool = self.init_pool()

    def init_pool(self):
        CPUS = multiprocessing.cpu_count()
        return multiprocessing.Pool(CPUS)

    def init_connection(self):
        LOGGER.info('Creating Postgres engine')
        return create_engine(self.db_url)

    def run_parallel_queries(self, queries):
        results = []
        try:
            for i in self.pool.imap_unordered(self.execute_parallel_query, queries):
                results.append(i)
        except Exception as exception:
            LOGGER.error('Error whilst executing %s queries in parallel: %s', len(queries), exception)
            raise
        finally:
            self.pool.close()
            self.pool.join()

        LOGGER.info('Parallel query ran producing %s sets of results of type: %s', len(results), type(results))

        return list(chain.from_iterable(results))

    def execute_parallel_query(self, query):
        con = psycopg2.connect(self.db_url)
        cur = con.cursor()
        cur.execute(query)
        records = cur.fetchall()
        con.close()

        return list(records)

However, whenever this runs, I get the following error:

TypeError: can't pickle _thread.RLock objects

I've read lots of similar questions regarding the use of multiprocessing and picklable objects, but I can't for the life of me figure out what I am doing wrong.

The pool is generally one per process (which I believe is best practice), but it is shared per instance of the connector class so that a new pool is not created for each use of the run_parallel_queries method.

The top answer to a similar question:

Accessing a MySQL connection pool from Python multiprocessing

shows an almost identical implementation to my own, except using MySQL instead of Postgres.

Am I doing something wrong?

Thanks!

EDIT:

I’ve found this answer:

Python Postgres psycopg2 ThreadedConnectionPool exhausted

which is incredibly detailed and makes me think I had misunderstood what multiprocessing.Pool gives me versus a connection pool such as ThreadedConnectionPool. However, the first link doesn't mention needing any connection pools etc. That solution seems good, but it is a lot of code for what I think is a fairly simple problem?

EDIT 2:

So the above link solves another problem, which I would likely have run into anyway, so I'm glad I found it, but it doesn't solve the initial issue of not being able to use imap_unordered because of the pickling error. (My current suspicion is that passing the bound method self.execute_parallel_query to imap_unordered pickles self, including the SQLAlchemy engine and its internal locks, but I haven't confirmed this.) Very frustrating.


Get this bounty!!!