Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bulk copy command fails with generated spoint column #136

Open
JeremyMcCormick opened this issue Jan 14, 2025 · 11 comments
Open

Bulk copy command fails with generated spoint column #136

JeremyMcCormick opened this issue Jan 14, 2025 · 11 comments

Comments

@JeremyMcCormick
Copy link

I have a generated column with type spoint in my table which seems to be preventing me from using the \copy command to load data.

I get the error:

ERROR: no binary input function available for type point

I have investigated the problem for a few hours, and it seems that the cause is pgsphere not implementing binary input for this type. An equivalent insert statement using test data works fine, so I don't think there is any fundamental problem with my data or how the generated column is working.

I would really like to get \copy working with my tables which use these generated columns having a definition like:

ALTER TABLE "MyTable" ADD COLUMN radec spoint GENERATED ALWAYS AS (spoint(radians(ra), radians(dec))) STORED;

To work around this, I would have to use less desirable methods like copying from staging tables, using insert commands instead, etc. and I would lose the efficiency of the bulk copy operation.

Any insight or advice would be appreciated, thanks!

@esabol
Copy link
Contributor

esabol commented Jan 14, 2025

Which version of PostgreSQL are you using, @JeremyMcCormick ? And which version of PgSphere? Thanks!

P.S. I think you can simplify your ALTER TABLE statement by using spoint_deg(), like this: ALTER TABLE "MyTable" ADD COLUMN radec spoint GENERATED ALWAYS AS (spoint_deg(ra, dec)) STORED;

@JeremyMcCormick
Copy link
Author

JeremyMcCormick commented Jan 15, 2025

@esabol

I am running this PostgreSQL version:

ppdb=# SELECT version();
                                                       version                                                       
---------------------------------------------------------------------------------------------------------------------
 PostgreSQL 16.6 (Debian 16.6-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit

And PgSphere I download and build from here in my Docker container:

https://github.com/postgrespro/pgsphere/archive/refs/tags/1.5.1.tar.gz

P.S. I think you can simplify...

Okay, thanks for the tip!

@esabol
Copy link
Contributor

esabol commented Jan 15, 2025

OK. My googling of this error message found a message from Tom Lane who said he'd fixed something in some PostgreSQL 13.x release that seemed to be related to this, but it must have been something else.

I haven't found any documentation regarding how to add support for this, unfortunately. My google-fu is failing me... Anyone have any pointers?

@JeremyMcCormick
Copy link
Author

I haven't found any documentation regarding how to add support for this, unfortunately.

Adding binary input functions is covered here:

https://www.postgresql.org/docs/current/sql-createtype.html

If this is already present, then my problem lies elsewhere...

@JeremyMcCormick
Copy link
Author

JeremyMcCormick commented Jan 15, 2025

This is probably just ChatGPT hallucinating, but this may be helpful...

Implement the binary input function

#include "postgres.h"
#include "fmgr.h"
#include "libpq/pqformat.h"  // For pq_getmsgfloat8

#include "point.h" // include whatever heads are needed for point type, etc.

PG_FUNCTION_INFO_V1(spoint_recv);

/* Binary input function for spoint */
Datum
spoint_recv(PG_FUNCTION_ARGS)
{
    StringInfo buf = (StringInfo) PG_GETARG_POINTER(0); // Binary input buffer
    SPoint *result = (SPoint *) palloc(sizeof(SPoint)); // Allocate memory for spoint

    // Read longitude (lng) and latitude (lat) from the binary buffer
    result->lng = pq_getmsgfloat8(buf); // Longitude in radians
    result->lat = pq_getmsgfloat8(buf); // Latitude in radians

    // Validate longitude and latitude
    if (!sphereline_latitude(result->lat) || !sphereline_longitude(result->lng))
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("Invalid SPoint: longitude or latitude out of range")));

    PG_RETURN_POINTER(result);
}

Register the function

This is probably not even close to being correct but may provide some hints:

CREATE TYPE spoint (
    INPUT = spoint_in,
    OUTPUT = spoint_out,
    RECEIVE = spoint_recv,  -- Binary input function
    SEND = spoint_send,     -- Optional binary output function
    INTERNALLENGTH = 16,    -- 2 x float8 = 16 bytes
    ALIGNMENT = double,
    STORAGE = plain
);

It is also recommended to implement binary output as well, which is a similar process.

There should be working examples of binary IO implementations for custom types in extensions - I can look around if that would help. I don't think this is all that difficult to implement for someone who knows PostgreSQL internals, especially since spoint is (to my knowledge) just comprised of two float values, so the IO should be straighforward.

@df7cb
Copy link
Contributor

df7cb commented Jan 15, 2025

Are you using binary anywhere in your \copy commands? Because by default, COPY should be using the text format.

@JeremyMcCormick
Copy link
Author

JeremyMcCormick commented Jan 15, 2025

Are you using binary anywhere in your \copy commands? Because by default, COPY should be using the text format.

I'm importing data like this, which is activated by an equivalent \copy command:

COPY  myschema.mytable (${COLUMNS}) FROM STDIN WITH (FORMAT csv, HEADER);

Input data is a CSV file. But I think what is going on here is that it is trying to automatically update the generated column, which requires binary input.

Do you think there are different options on COPY that I could try here which might work?

Apologies that I cannot share my exact data, as it is proprietary, but a simple test could be arranged pretty easily.

@df7cb
Copy link
Contributor

df7cb commented Jan 15, 2025

COPY can only use the binary format when the client is prepared to send the data in that specific format, your CSV would have to be coded like that. If plain COPY would require binary format, pg_dump/restore wouldn't work and people would have complained about that years ago because they can't backup their pgsphere databases.

TBH I have no idea why binary format comes into play at all here. 😸

That said, we should just implement it.

@esabol
Copy link
Contributor

esabol commented Jan 15, 2025

@JeremyMcCormick wrote:

There should be working examples of binary IO implementations for custom types in extensions - I can look around if that would help.

That might be helpful. The first thing I did was to check q3c's code to see if it implemented anything for this, and I didn't see anything.

Apologies that I cannot share my exact data, as it is proprietary, but a simple test could be arranged pretty easily.

A simple test we can run would also be helpful. We would need to add something to the test suite regardless, so I recommend looking at the existing tests and seeing if you can modify or adapt one of the existing tests.

@JeremyMcCormick
Copy link
Author

JeremyMcCormick commented Jan 15, 2025

COPY can only use the binary format when the client is prepared to send the data in that specific format, your CSV would have to be coded like that. If plain COPY would require binary format, pg_dump/restore wouldn't work and people would have complained about that years ago because they can't backup their pgsphere databases.

Even though I have left out the generated column from the column list in the COPY command, internally it looks like binary data is being sent for populating the column. I also don't know if anyone has really used generated columns in a production system, backed up with pg_dump, and then attempted to restore. It may represent a kind of special case.

TBH I have no idea why binary format comes into play at all here. 😸

I don't know either, as it seems like PostgreSQL should be able to populate the generated column from my two coordinate columns. But it looks like binary input needs to be implemented for this to work. That's what the error message suggests to me at least.

A simple test we can run would also be helpful.

I can come up with a toy example to reproduce the problem. My production database has a lot of columns, so I can just include ra, dec and the generated radec column in an example to reproduce the problem, which I think might be helpful.

@vitcpp
Copy link
Contributor

vitcpp commented Jan 17, 2025

I tried a simple test and it passed ok on 16.6.

CREATE EXTENSION IF NOT EXISTS pg_sphere;

CREATE TABLE IF NOT EXISTS t(
	ra double precision, 
	dec double precision,
	radec spoint GENERATED ALWAYS AS (spoint_deg(ra, dec)) STORED
);

TRUNCATE TABLE t;

\copy t(ra,dec) FROM STDIN WITH (FORMAT csv, HEADER);
ra,dec,radec
1.2233,1.345
0.2324,0.5435345
\.

SELECT ra, dec, radec FROM t;

From the original message I found that the message below describes some load problems with type point, not spoint. Type point is a postgresql type.

ERROR: no binary input function available for type **point**

May be, the problem is not related to pgsphere's spoint?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants