+mdb atlas vectordb [clean_final] #3000

Open
wants to merge 20 commits into
base: main

Conversation

ranfysvalle02

Why are these changes needed?

MongoDB Atlas Vector Search received the highest developer NPS in Retool's State of AI 2023 survey (https://www.mongodb.com/blog/post/atlas-vector-search-commands-highest-developer-nps-retool-state-ai-2023-survey), so it is worth adding MongoDB vector search as an option for AutoGen RAG.

You can get started with MongoDB vector search on a free-tier (M0) MongoDB Atlas cluster, which provides the full functionality of MongoDB vector search: https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-overview/

But why is MongoDB such a standout? Well, there are a few key reasons.

MongoDB Atlas integrates smoothly with existing databases. For organizations already using MongoDB, this means a seamless expansion into vector storage, with no major system overhauls required!
MongoDB Atlas is built to handle operational heavy-lifting. It excels when serving large-scale, mission-critical applications, offering robustness and reliability where it counts.
MongoDB's flexibility in handling a variety of data types and structures makes it perfectly suited to the complexity of vector embeddings.

As such, implementing MongoDB as a Retrieval Agent can unlock new potential in your AI applications, bringing the full power of vector storage to bear.

Related issue number: 711

Closes #711
Closes #2996

Checks

@ranfysvalle02
Author

Sorry @thinkall @Hk669 -- I know I have been a bit annoying, but I think this time we got it 👍

@thinkall
Collaborator

Hi @ranfysvalle02, thank you. I can see the issues with the notebook have not been addressed. See comments here: #2996

@ranfysvalle02
Author

The simple change of

    assert collection.name == collection_name
    assert collection.collection_name == collection_name

Highlighted that I need to create a "wrapper" class around the MongoDB collection, similar to what pgvector did.

class Collection:
    """
    A Collection object for PGVector.

    Attributes:
        client: The PGVector client.
        collection_name (str): The name of the collection. Default is "documents".
        embedding_function (Callable): The embedding function used to generate the vector representation.
            Default is None. SentenceTransformer("all-MiniLM-L6-v2").encode will be used when None.
            Models can be chosen from:
            https://huggingface.co/models?library=sentence-transformers
        metadata (Optional[dict]): The metadata of the collection.
        get_or_create (Optional): The flag indicating whether to get or create the collection.
    """

but for MongoDB. Will be working on this @Hk669

@ranfysvalle02
Author

@thinkall - Can you please tell me how to 'fix' the notebook? Or perhaps have it as a 'suggested commit'? I'll be addressing the notebook and any final touches later today.

@thinkall
Collaborator

thinkall commented Jun 22, 2024

What about running it successfully in your local env and removing only the sensitive info? A new user should be able to run it by filling in the missing information, which should only be the config_list.

So, the connection string of mongodb should not be empty; the one I suggested in your last PR worked for me. Does it work for you? The one you previously used didn't work for me and was not connecting to the docker container.

The output of the chat in the last cell is not correct. Could you please check my previous comments and the pgvector notebook example?

@thinkall
Collaborator

It's OK, no need to wrap a Collection class for mongodb. PGVector doesn't have a collection class, that's why we have it in AutoGen vector implementation. The key for AutoGen vectordb is to follow the protocols in base.py.
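
To illustrate what following a protocol means in practice, here is a minimal sketch using `typing.Protocol`. The `VectorDBLike` protocol and the `InMemoryDB` class are hypothetical and far smaller than the real interface in base.py; only the idea of structural typing carries over.

```python
from typing import Any, Dict, List, Protocol, runtime_checkable


@runtime_checkable
class VectorDBLike(Protocol):
    """Hypothetical, reduced stand-in for the protocol defined in base.py."""

    def create_collection(
        self, collection_name: str, overwrite: bool = False, get_or_create: bool = True
    ) -> Any: ...

    def insert_docs(self, docs: List[Dict[str, Any]], collection_name: str) -> None: ...


class InMemoryDB:
    """No inheritance needed: providing the same methods satisfies the protocol."""

    def __init__(self) -> None:
        self.collections: Dict[str, List[Dict[str, Any]]] = {}

    def create_collection(
        self, collection_name: str, overwrite: bool = False, get_or_create: bool = True
    ) -> Any:
        # Start from an empty collection when overwriting or when it is new.
        if overwrite or collection_name not in self.collections:
            self.collections[collection_name] = []
        return self.collections[collection_name]

    def insert_docs(self, docs: List[Dict[str, Any]], collection_name: str) -> None:
        self.collections[collection_name].extend(docs)


db = InMemoryDB()
db.create_collection("demo")
db.insert_docs([{"id": "1", "content": "hello"}], "demo")
```

A MongoDB-backed class would satisfy the same check by exposing the methods directly, without an extra Collection wrapper.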

@ranfysvalle02
Author

I see what you mean @thinkall !

I found the issue with the notebook and notebook output!

 # MongoDB Atlas database
        },
        "get_or_create": False,  # set to False if you don't want to reuse an existing collection
        "overwrite": True,  # set to True if you want to overwrite an existing collection
    },

vs

 # MongoDB Atlas database
        },
        "get_or_create": True, 
        "overwrite": True,  # set to True if you want to overwrite an existing collection
    },

We are close!!!

I'll push the fix/code later today

@thinkall
Collaborator

No errors here. I've fixed this and made a commit.

Comment on lines 216 to 220
"VectorDB returns doc_ids: [[]]\n",
"\u001b[32mNo more context, will terminate.\u001b[0m\n",
"\u001b[33mragproxyagent\u001b[0m (to assistant):\n",
"\n",
"TERMINATE\n",
Collaborator

The retrieve_docs is not working as expected. No doc is returned. Either the query pipeline or the atlas local env is not functional.

Author

On it! Thank you so much for helping debug! There is something wrong with the implementation I believe -- will debug this shortly.

Author

@thinkall - It is an implementation bug! It's an issue with the index_name :) Will fix shortly.

@ranfysvalle02
Author

@thinkall finally tracked this down. It's all about the index! The create_collection method

def create_collection(collection_name: str,
                        overwrite: bool = False,
                        get_or_create: bool = True) -> Any

does not use 'index_name' or 'similarity' -- which I had added. Working on a fix!
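
As a rough sketch of what carrying `index_name` and `similarity` through could look like, the helper below builds a vector index description as a plain dict. The function name and its defaults are hypothetical; the `fields` layout mirrors the index definition printed in the notebook output of this PR (384 dimensions, `embedding` path, `cosine` similarity).

```python
from typing import Any, Dict


def build_vector_index_model(
    index_name: str = "vector_index",
    similarity: str = "cosine",
    dimensions: int = 384,
    path: str = "embedding",
) -> Dict[str, Any]:
    """Build a vector search index description as a plain dict (hypothetical helper)."""
    # Similarity functions documented for Atlas Vector Search indexes.
    allowed = {"cosine", "euclidean", "dotProduct"}
    if similarity not in allowed:
        raise ValueError(f"similarity must be one of {sorted(allowed)}")
    return {
        "name": index_name,
        "type": "vectorSearch",
        "definition": {
            "fields": [
                {
                    "type": "vector",
                    "path": path,
                    "numDimensions": dimensions,
                    "similarity": similarity,
                }
            ]
        },
    }


model = build_vector_index_model(index_name="default_index", similarity="cosine")
```

Passing the resulting description to the driver is left out here, since the exact call depends on the PyMongo version in use.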

@ranfysvalle02
Author

ranfysvalle02 commented Jun 23, 2024

@thinkall - I finally got it to run, but I have to add a strange, arbitrary programmatic delay for things to work. I am working on a more elegant solution. After ~15 seconds it works. Anything 5 seconds or less fails.

Screenshot 2024-06-22 at 9 40 55 PM Screenshot 2024-06-22 at 9 20 05 PM
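
One possible alternative to a fixed delay is to poll until the index reports itself as queryable. The sketch below is an assumption-laden illustration, not the fix used in this PR: `wait_until_queryable` is a hypothetical helper, and it takes a callable so it can be exercised without a database. The `name` and `queryable` keys match the index status document shown in the notebook output.

```python
import time
from typing import Any, Callable, Dict, Iterable


def wait_until_queryable(
    list_indexes: Callable[[], Iterable[Dict[str, Any]]],
    index_name: str,
    timeout: float = 60.0,
    interval: float = 1.0,
) -> bool:
    """Poll until the named index reports queryable=True, or give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        for index in list_indexes():
            if index.get("name") == index_name and index.get("queryable"):
                return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With a PyMongo version that provides `Collection.list_search_indexes`, the callable could be something like `lambda: collection.list_search_indexes(index_name)`; that wiring is an assumption and is not shown in this thread.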

@codecov-commenter

codecov-commenter commented Jun 23, 2024

Codecov Report

Attention: Patch coverage is 79.68750% with 39 lines in your changes missing coverage. Please review.

Project coverage is 19.88%. Comparing base (03259b2) to head (95e2f79).
Report is 10 commits behind head on main.

Files Patch % Lines
autogen/agentchat/contrib/vectordb/mongodb.py 80.85% 22 Missing and 14 partials ⚠️
autogen/agentchat/contrib/vectordb/base.py 25.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #3000       +/-   ##
===========================================
- Coverage   32.45%   19.88%   -12.58%     
===========================================
  Files          93       95        +2     
  Lines       10109    10426      +317     
  Branches     2172     2388      +216     
===========================================
- Hits         3281     2073     -1208     
- Misses       6544     8214     +1670     
+ Partials      284      139      -145     
Flag Coverage Δ
unittests 19.83% <79.68%> (-12.63%) ⬇️


@Hk669
Collaborator

Hk669 commented Jun 23, 2024

https://github.com/microsoft/autogen/actions/runs/9629801423/job/26560601555?pr=3000 -> can you look into the tests that are failing. thanks @ranfysvalle02

Collaborator

@thinkall thinkall left a comment

Hi @ranfysvalle02, there are still some issues with the notebook and tests. Could you please help investigate? Btw, please use pre-commit run --all-files to make sure the format is good. Thank you so much!

@@ -171,7 +171,7 @@ def get_docs_by_ids(
ids: List[ItemID] | A list of document ids. If None, will return all the documents. Default is None.
collection_name: str | The name of the collection. Default is None.
include: List[str] | The fields to include. Default is None.
-If None, will include ["metadatas", "documents"], ids will always be included.
+If None, will include ["metadata", "content"], ids will always be included. # TODO - Confirm keys
Collaborator

Changing this will lead to changes to all the other dbs. It's better to update the keys in the results of mongodb wrapper.
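
A minimal sketch of that idea: keep the shared `include` names unchanged and translate them to MongoDB field names inside the wrapper. The mapping, the `to_projection` helper, the `"embeddings"` entry and the `id` field are assumptions for illustration; only the `metadatas`/`documents` and `metadata`/`content` names come from the diff above.

```python
from typing import Dict, List, Optional

# Shared include names (left) mapped to assumed MongoDB field names (right).
INCLUDE_TO_FIELD = {
    "metadatas": "metadata",
    "documents": "content",
    "embeddings": "embedding",  # assumption: not named in the diff above
}


def to_projection(include: Optional[List[str]] = None) -> Dict[str, int]:
    """Translate shared include names into a MongoDB projection; ids are always included."""
    if include is None:
        include = ["metadatas", "documents"]
    projection = {"id": 1}
    for key in include:
        if key not in INCLUDE_TO_FIELD:
            raise ValueError(f"Unknown include key: {key!r}")
        projection[INCLUDE_TO_FIELD[key]] = 1
    return projection
```

Keeping the translation in one place means the docstring in base.py, and every other database implementation, can stay as they are.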

Comment on lines +42 to +49
" mongodb:\n",
" image: mongodb/mongodb-atlas-local:latest\n",
" restart: unless-stopped\n",
" ports:\n",
" - \"27017:27017\"\n",
" environment:\n",
" MONGODB_INITDB_ROOT_USERNAME: mongodb_user\n",
" MONGODB_INITDB_ROOT_PASSWORD: mongodb_password\n",
Collaborator

With this setting, I always get empty returns like below:

2024-06-30 21:18:02,516 - autogen.agentchat.contrib.vectordb.mongodb - INFO - Search index vector_index created successfully.
VectorDB returns doc_ids:  [[]]
No more context, will terminate.

Are you sure the free version of mongodb works?

" \"vector_db\": \"mongodb\", # MongoDB Atlas database\n",
" \"collection_name\": \"flaml_collection\",\n",
" \"db_config\": {\n",
" \"connection_string\": \"<connection_string>\", # MongoDB Atlas connection string\n",
Collaborator

Can you run the notebook with a mongodb instance deployed with the docker-compose.yml provided at the beginning and share the actual connection_string you used?

"\u001b[32mUpdating context and resetting conversation.\u001b[0m\n",
"index is ready to use.\n",
"{'id': '6677781cbb83ea33c40099e1', 'name': 'default_index', 'type': 'vectorSearch', 'status': 'READY', 'queryable': True, 'latestDefinitionVersion': {'version': 0, 'createdAt': datetime.datetime(2024, 6, 23, 1, 19, 24, 336000)}, 'latestDefinition': {'fields': [{'type': 'vector', 'numDimensions': 384, 'path': 'embedding', 'similarity': 'cosine'}]}, 'statusDetail': [{'hostname': 'shared-shard-00-search-onamml', 'status': 'READY', 'queryable': True, 'mainIndex': {'status': 'READY', 'queryable': True, 'definitionVersion': {'version': 0, 'createdAt': datetime.datetime(2024, 6, 23, 1, 19, 24)}, 'definition': {'fields': [{'type': 'vector', 'path': 'embedding', 'numDimensions': 384, 'similarity': 'cosine'}]}}}, {'hostname': 'shared-shard-00-search-6xag8e', 'status': 'READY', 'queryable': True, 'mainIndex': {'status': 'READY', 'queryable': True, 'definitionVersion': {'version': 0, 'createdAt': datetime.datetime(2024, 6, 23, 1, 19, 24)}, 'definition': {'fields': [{'type': 'vector', 'path': 'embedding', 'numDimensions': 384, 'similarity': 'cosine'}]}}}]}\n",
"Now running pipeline: [{'$vectorSearch': {'index': 'default_index', 'limit': 60, 'numCandidates': 60, 'queryVector': [-0.08256451040506363, -0.07900252193212509, -0.05290786176919937, 0.021982736885547638, 0.046406690031290054, 0.027769701555371284, -0.02768588438630104, -0.020102187991142273, -0.05407266318798065, -0.061684805899858475, -0.03940979018807411, -0.029285598546266556, -0.1118478998541832, -0.03136416897177696, -0.04099257290363312, -0.07897000014781952, -0.02522769570350647, 0.043702732771635056, -0.030820483341813087, -0.041595760732889175, 0.10552595555782318, 0.0023172772489488125, 0.08983399718999863, 0.10865391790866852, -0.06146957352757454, 0.04154617711901665, 0.015428234823048115, 0.016568025574088097, 0.013623313046991825, -0.06059451401233673, 0.08428270369768143, 0.009563339874148369, -0.002620439976453781, 0.016997039318084717, -0.07201018929481506, -0.010901586152613163, -0.030768705531954765, -0.04398634657263756, -0.026716720312833786, -0.019298473373055458, 0.029043301939964294, -0.03137688338756561, -0.0516120120882988, -0.033414166420698166, 0.05385608226060867, -0.025596346706151962, -0.02077491395175457, -0.0634346529841423, 0.03223349153995514, 0.02784794755280018, -0.06079091876745224, -0.012161108665168285, -0.0933445394039154, -0.018985357135534286, -0.022000310942530632, 0.08059032261371613, 0.03905639797449112, 0.008981743827462196, -0.04856802150607109, -0.0195226538926363, -0.016003113240003586, -0.10165907442569733, -0.004733760375529528, 0.030122995376586914, -0.038355227559804916, 0.03839924931526184, -0.028533125296235085, 0.01822500303387642, 0.0707336813211441, -0.02592848241329193, 0.02241864986717701, 0.022557010874152184, 0.007257631979882717, 0.03511698544025421, 0.008497730828821659, 0.06233576685190201, 0.06869452446699142, 0.06520985811948776, -0.018009020015597343, 0.008016299456357956, -0.09440284222364426, -0.06914988905191422, -0.016991959884762764, -0.004849573597311974, 0.015289856120944023, 
-0.05368100106716156, -0.07648778706789017, 0.04355047643184662, -0.013986689038574696, 0.03536888584494591, 0.03178128972649574, 0.03904074802994728, 0.027542345225811005, 0.021311746910214424, -0.08981165289878845, 0.050620175898075104, 0.006543598137795925, 0.07310184836387634, -0.033499374985694885, -0.01851048693060875, -0.07171830534934998, -0.07049573212862015, -0.02946554869413376, 0.04081925004720688, -0.015752671286463737, -0.05440584942698479, -0.00638421019539237, -0.027693038806319237, -0.015809008851647377, -0.0794110968708992, 0.08307767659425735, -0.010127314366400242, 0.031197702512145042, -0.0325561985373497, 0.028586456552147865, 0.05326930806040764, -0.04397851228713989, -0.06359461694955826, 0.003676487598568201, 0.06998850405216217, -0.02999182790517807, 0.03461074084043503, 0.05651488155126572, -0.05784572660923004, 0.02231559529900551, -0.07732831686735153, -0.029416916891932487, 1.8518434945716996e-33, 0.0358523465692997, -0.002374001545831561, 0.009263500571250916, -0.05580880120396614, 0.030508413910865784, -0.037797845900058746, 0.01508091390132904, 0.02779262885451317, -0.04756521061062813, 0.010429342277348042, -0.005697719287127256, 0.03368696570396423, -0.014907917007803917, -0.02615354210138321, -0.05337945744395256, -0.08737822622060776, 0.04612358659505844, 0.016435381025075912, -0.03597096726298332, -0.06492944061756134, 0.11139646172523499, -0.04470240697264671, 0.013333962298929691, 0.06944458186626434, 0.04924115538597107, 0.021988168358802795, -0.0033458129037171602, -0.021327221766114235, 0.04618706554174423, 0.09092214703559875, -0.009819227270781994, 0.03574197739362717, -0.02589249238371849, 0.015359507873654366, 0.01923568733036518, 0.009884021244943142, -0.0687863752245903, 0.008688706904649734, 0.0003024878678843379, 0.006991893518716097, -0.07505182921886444, -0.045765507966279984, 0.005778071004897356, 0.0200499240309, -0.07049272209405899, -0.06168036535382271, 0.044801026582717896, 0.026470575481653214, 
0.01803005486726761, 0.04355733096599579, 0.034672655165195465, -0.08011800795793533, 0.03965161740779877, 0.08112046867609024, 0.07237163931131363, 0.07554267346858978, -0.0966770201921463, 0.05703232064843178, 0.007653184700757265, 0.09404793381690979, 0.02874479629099369, 0.032439567148685455, -0.006544401869177818, 0.0747322142124176, -0.06779398024082184, -0.03769124671816826, 0.018574388697743416, -0.0027497054543346167, 0.05186106637120247, 0.045869190245866776, 0.052037931978702545, 0.00877095852047205, 0.00956355594098568, 0.06010708585381508, 0.07063814997673035, -0.05281956121325493, 0.11385682970285416, 0.0014734964352101088, -0.13000114262104034, 0.04160114377737045, 0.002756801201030612, -0.03354136645793915, -0.012316903099417686, -0.04667062684893608, -0.021649040281772614, 0.009122663177549839, 0.07305404543876648, 0.050488732755184174, 0.0037498027086257935, 0.06742933392524719, -0.09808871150016785, -0.02533995360136032, 0.07752660661935806, -0.008930775336921215, -0.020734407007694244, -8.718873943854186e-34, 0.030775681138038635, -0.04046367108821869, -0.07485030591487885, 0.06837300956249237, 0.03777360916137695, 0.03171695023775101, 0.038366734981536865, -0.009698187932372093, -0.06721752882003784, 0.03483430668711662, -0.03264770656824112, -0.004821446258574724, 0.017873667180538177, -0.01217806525528431, -0.06693356484174728, -0.042935941368341446, 0.07182027399539948, -0.023592444136738777, 0.010779321193695068, 0.03270953893661499, -0.03838556632399559, -0.010096886195242405, -0.058566078543663025, -0.06304068863391876, -0.013382021337747574, -0.011351224966347218, -0.08517401665449142, 0.007304960861802101, -0.04197632893919945, -0.008837309665977955, 0.000581165833864361, 0.009765408001840115, -0.02323746308684349, -0.07040572166442871, -0.0630621388554573, -0.01030951738357544, 0.07319610565900803, -0.002567168092355132, -0.00982675701379776, 0.08009836822748184, 0.06278694421052933, -0.053986601531505585, -0.13036444783210754, 
-0.05632428079843521, -0.012127791531383991, -0.00034488266101107, -0.05524465814232826, -0.019998280331492424, -0.041557829827070236, 0.07457990199327469, -0.004864905495196581, 0.0744631364941597, -0.038698967546224594, 0.11076352000236511, 0.08321533352136612, -0.1319902539253235, 0.05189663544297218, -0.08637715131044388, -0.047119464725255966, 0.0712425485253334, 0.038989413529634476, -0.06715074181556702, 0.0770900622010231, -0.016237575560808182, 0.16853967308998108, -0.003975923638790846, 0.11307050287723541, 0.07726389169692993, -0.028748558834195137, 0.04492560029029846, 0.0768602192401886, 0.0852692499756813, 0.021246735006570816, 0.11719376593828201, 0.0029091970063745975, -0.011192459613084793, -0.09389575570821762, 0.021549541503190994, -0.0055024465546011925, 0.032183919101953506, 0.0651387944817543, -0.0652405172586441, 0.03021097555756569, 0.1095665693283081, -0.02563057281076908, 0.05070950835943222, 0.09074468910694122, 0.08164751529693604, 0.039858028292655945, -0.045717816799879074, -0.01968374475836754, -0.01942502148449421, 0.020252034068107605, 0.028495490550994873, -0.014108758419752121, -2.6071681702433125e-08, -0.004948799964040518, -0.03374723717570305, -0.006966953631490469, 0.04770921543240547, 0.060589514672756195, 0.039017271250486374, -0.06870992481708527, 0.04758283868432045, -0.04153140261769295, -0.009761914610862732, 0.05678777024149895, -0.024886248633265495, 0.08310353755950928, 0.04019981995224953, 0.04347654804587364, -0.016476230695843697, 0.02281028777360916, 0.044384729117155075, 0.012391419149935246, 0.03150279074907303, 0.03414358198642731, 0.023670021444559097, -0.035867370665073395, 0.00584121560677886, 0.03878429904580116, -0.03416749835014343, 0.0317315049469471, 0.014832393266260624, 0.06329585611820221, -0.07007385790348053, -0.11312873661518097, -0.0667077898979187, 0.031542230397462845, 0.03318323940038681, -0.05146196484565735, -0.04369741305708885, 0.030556850135326385, 0.05148332566022873, 
-0.09324397146701813, 0.08804989606142044, -0.05473781377077103, 0.02356131188571453, -0.0072563826106488705, -0.013308629393577576, 0.022258494049310684, 0.047823697328567505, -0.014027439057826996, -0.018331162631511688, -0.02744504064321518, 0.027262693271040916, -0.03694259002804756, 0.04492212459445, 0.04835069552063942, 0.09086570143699646, -0.0022586847189813852, -0.03940355032682419, -0.005774456076323986, -0.06551025062799454, -0.04700932279229164, -0.00200175354257226, -0.039275478571653366, -0.04998438432812691, -0.08698498457670212, 0.015872927382588387], 'path': 'embedding'}}, {'$project': {'score': {'$meta': 'vectorSearchScore'}}}, {'$lookup': {'from': 'flaml_collection', 'localField': '_id', 'foreignField': '_id', 'as': 'full_document_array'}}, {'$addFields': {'full_document': {'$arrayElemAt': [{'$map': {'input': '$full_document_array', 'as': 'doc', 'in': {'id': '$$doc.id', 'content': '$$doc.content'}}}, 0]}}}, {'$project': {'full_document_array': 0, 'embedding': 0}}]\n",
Collaborator

Can you avoid printing out all the embeddings?
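
One way to keep the pipeline in the logs without the raw vector is to log a redacted copy. This is a sketch only; `redact_query_vectors` is a hypothetical helper, and it relies on the `$vectorSearch` / `queryVector` layout visible in the output above.

```python
import copy
from typing import Any, Dict, List


def redact_query_vectors(pipeline: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a deep copy of the pipeline with each queryVector replaced by a short summary."""
    redacted = copy.deepcopy(pipeline)
    for stage in redacted:
        search = stage.get("$vectorSearch")
        if isinstance(search, dict) and "queryVector" in search:
            search["queryVector"] = f"<{len(search['queryVector'])} floats omitted>"
    return redacted


pipeline = [
    {
        "$vectorSearch": {
            "index": "default_index",
            "limit": 60,
            "numCandidates": 60,
            "queryVector": [0.0] * 384,
            "path": "embedding",
        }
    },
    {"$project": {"score": {"$meta": "vectorSearchScore"}}},
]
loggable = redact_query_vectors(pipeline)
```

The original pipeline is left untouched, so the same object can still be sent to the database after logging `loggable`.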


logger = logging.getLogger(__name__)

MONGODB_URI = os.environ.get("MONGODB_URI")
Collaborator

A default value should be given.
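
A small sketch of reading the variable with a fallback. The default shown is an assumption (a local instance on port 27017, the port exposed by the docker-compose snippet in this PR); a deployment that requires credentials would need them in the URI.

```python
import os
from typing import Mapping

# Hypothetical default: a local MongoDB instance on the standard port.
DEFAULT_MONGODB_URI = "mongodb://localhost:27017"


def get_mongodb_uri(environ: Mapping[str, str] = os.environ) -> str:
    """Return MONGODB_URI from the environment, falling back to the default when unset or empty."""
    return environ.get("MONGODB_URI") or DEFAULT_MONGODB_URI
```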

Collaborator

@thinkall thinkall left a comment

Hi @ranfysvalle02 , I've fixed the mongodb test errors. It looks like the distance_threshold is not working. I'm not sure if it's not working for free mongodb or there is some issue with the code.


# Empty list of queries returns empty list of results
queries = ["Some good pets", "What kind of Sandwich?"]
results = db.retrieve_docs(queries=queries, collection_name=MONGODB_COLLECTION, n_results=2)
Collaborator

@thinkall thinkall Jun 30, 2024

Please add a test with distance_threshold set. It's not working in my local env.
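
As a rough illustration of what such a test could check, the sketch below filters results by distance and asserts on the outcome. It runs against plain data rather than MongoDB, and it assumes results are lists of `(document, distance)` pairs where a smaller distance means a closer match and a negative threshold disables filtering. `filter_by_distance` is a hypothetical stand-in, not the function used in this PR.

```python
from typing import Any, Dict, List, Tuple

# One inner list per query; each entry is (document, distance).
QueryResults = List[List[Tuple[Dict[str, Any], float]]]


def filter_by_distance(results: QueryResults, distance_threshold: float = -1) -> QueryResults:
    """Keep only pairs whose distance is below the threshold; a negative threshold keeps all."""
    if distance_threshold < 0:
        return results
    return [
        [pair for pair in per_query if pair[1] < distance_threshold]
        for per_query in results
    ]


def test_distance_threshold() -> None:
    results = [[({"id": "1", "content": "cat"}, 0.2), ({"id": "2", "content": "sandwich"}, 0.9)]]
    # Only the close match survives a threshold of 0.5.
    assert filter_by_distance(results, 0.5) == [[({"id": "1", "content": "cat"}, 0.2)]]
    # The default (negative) threshold leaves the results untouched.
    assert filter_by_distance(results) == results
```

A real test would call `db.retrieve_docs(...)` with `distance_threshold` set and compare the number of returned documents with and without the threshold.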

# Compute embedding vector from semantic query
query_vector = np.array(self.embedding_function([query_text])).tolist()[0]
# Find documents with similar vectors using the specified index
query_result = _vector_search(
Collaborator

Don't return embedding by default.
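
A sketch of one way to make the embedding opt-in: build the aggregation pipeline with a flag that adds an exclusion `$project` stage unless embeddings are requested. `build_search_pipeline` and its parameters are hypothetical; the `$vectorSearch` fields and the `vectorSearchScore` metadata name match the pipeline printed in the notebook output.

```python
from typing import Any, Dict, List


def build_search_pipeline(
    query_vector: List[float],
    index_name: str,
    n_results: int = 10,
    include_embedding: bool = False,
    path: str = "embedding",
) -> List[Dict[str, Any]]:
    """Build a vector search pipeline that omits the stored embedding unless requested."""
    pipeline: List[Dict[str, Any]] = [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": n_results,
                "limit": n_results,
            }
        },
        # Attach the similarity score to each returned document.
        {"$set": {"score": {"$meta": "vectorSearchScore"}}},
    ]
    if not include_embedding:
        # Exclusion projection: every field except the embedding is kept.
        pipeline.append({"$project": {path: 0}})
    return pipeline
```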

Comment on lines +91 to +108
# if overwrite is False and get_or_create is False, raise a ValueError
if not overwrite and not get_or_create:
    raise ValueError("If overwrite is False, get_or_create must be True.")

collection_names = self.db.list_collection_names()
if collection_name not in collection_names:
    # Create a new collection
    return self.db.create_collection(collection_name)

if overwrite:
    self.db.drop_collection(collection_name)

if get_or_create:
    # The collection already exists, return it.
    return self.db[collection_name]
else:
    # get_or_create is False and the collection already exists, raise an error.
    raise ValueError(f"Collection {collection_name} already exists.")
Collaborator

Suggested change
-# if overwrite is False and get_or_create is False, raise a ValueError
-if not overwrite and not get_or_create:
-    raise ValueError("If overwrite is False, get_or_create must be True.")
-collection_names = self.db.list_collection_names()
-if collection_name not in collection_names:
-    # Create a new collection
-    return self.db.create_collection(collection_name)
-if overwrite:
-    self.db.drop_collection(collection_name)
-if get_or_create:
-    # The collection already exists, return it.
-    return self.db[collection_name]
-else:
-    # get_or_create is False and the collection already exists, raise an error.
-    raise ValueError(f"Collection {collection_name} already exists.")
+# if overwrite is False and get_or_create is False, raise a ValueError
+if overwrite:
+    self.db.drop_collection(collection_name)
+collection_names = self.db.list_collection_names()
+if collection_name not in collection_names:
+    # Create a new collection
+    return self.db.create_collection(collection_name)
+if get_or_create:
+    # The collection already exists, return it.
+    return self.db[collection_name]
+else:
+    # get_or_create is False and the collection already exists, raise an error.
+    raise ValueError(f"Collection {collection_name} already exists.")
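
The effect of the reordering can be checked without a database. In the original order, calling with `overwrite=True` and `get_or_create=False` on an existing collection drops it and then raises "already exists"; dropping first avoids that. The sketch below replays the suggested order against a hypothetical in-memory `FakeDB` that exposes only the four calls used in the snippet.

```python
from typing import Dict, List


class FakeDB:
    """Hypothetical in-memory stand-in; collections are plain lists."""

    def __init__(self) -> None:
        self._collections: Dict[str, list] = {}

    def list_collection_names(self) -> List[str]:
        return list(self._collections)

    def create_collection(self, name: str) -> list:
        self._collections[name] = []
        return self._collections[name]

    def drop_collection(self, name: str) -> None:
        self._collections.pop(name, None)

    def __getitem__(self, name: str) -> list:
        return self._collections[name]


def create_collection(
    db: FakeDB, collection_name: str, overwrite: bool = False, get_or_create: bool = True
) -> list:
    # Drop first, so an overwritten collection is always recreated below.
    if overwrite:
        db.drop_collection(collection_name)
    if collection_name not in db.list_collection_names():
        return db.create_collection(collection_name)
    if get_or_create:
        # The collection already exists, return it.
        return db[collection_name]
    raise ValueError(f"Collection {collection_name} already exists.")


db = FakeDB()
create_collection(db, "docs").append("x")
reused = create_collection(db, "docs")  # get_or_create reuses the existing data
fresh = create_collection(db, "docs", overwrite=True, get_or_create=False)  # recreated, no error
```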


5 participants